Re: Nautilus vs. command line scp



On Friday, 2006-09-08, at 02:31 -0400, Saad Shakhshir wrote:
> I've noticed some strange behavior when connecting to a server through
> nautilus and downloading files as opposed to running scp straight from
> the command line.  I ran several tests that involved downloading a
> large file from a remote server both through the nautilus GUI and from
> the command line.  This confirmed my suspicions which are highlighted
> in the below linked screenshots.  
> 
> The download behavior of nautilus is extremely erratic, with download
> speeds continually spiking from zero to 160kb/s and then back to zero.
> When comparing the nautilus graph to the scp graph, the difference is
> incredible.  Command-line scp is nearly impeccably smooth, with few
> but very regular spikes.  These tests were run at several different
> times of the day and continued to produce similar results. 
> 
> Does anyone have working knowledge of the nautilus code that might
> have an idea as to what is going on here?  Note that in terms of
> average speed they were both roughly the same (commandline scp was
> slightly faster on average). 
> 
> nautilus: http://img215.imageshack.us/img215/3236/nautilusqq6.png
> scp: http://img237.imageshack.us/img237/5474/scplr0.png

Let me CC the gnome-vfs development mailing list as I think this is of
general interest:

Nautilus (in fact GnomeVFS) uses a pull strategy by dividing the source
file size into MIN(~500 kb, filesize) chunks (sftp-method.c:
default_req_len * max_req), processing one of these chunks at a time.

It then divides each of the MIN(500 kb, filesize) blocks further into
default_req_len-sized (i.e. 32 kb) chunks and issues parallel requests
for each of these chunks (*). Once all of the chunks have been received
and we have transferred MIN(500 kb, filesize) of data, the data is
passed to the GnomeVFS XFer glue code, which issues a seek, writes
the data to the target, and requests the next MIN(500 kb, filesize - 500 kb)
data chunk.

IIRC, in contrast, the OpenSSH code also issues parallel requests (**),
but is more push-like: as a 32 kb block arrives, it seeks within the
target, writes the chunk, and immediately issues a new request. Thus it
usually has multiple pending requests at any time, while GnomeVFS has
no pending requests while the data is being written.

If you think it is a major problem, I thought about two solutions:

a) I had plans for implementing a push data transfer for GnomeVFS, where
the glue code would ensure that we always have multiple data requests
pending, but this would only be relevant for the sftp method, so it
might be a waste of energy. Also, some target file systems might not be
able to seek(), or might seek() very slowly. The idea would have been
that some module globals describe whether (and how many) IO_BLOCK_SIZE-
sized read() requests may be pending in parallel, whether the target
method supports seek(), and additionally whether it wants to
"seek&write".

b) We could invent something that returns successfully received 32 kb
(i.e. default_req_len) chunks from sftp's read() as they arrive, as long
as they are located at the beginning of the ~500 kb chunk.

For both solutions, one has to consider the following sftp pitfall:
the whole GnomeVFS sftp code currently assumes that sftp access is
serial, though, i.e. there are no outstanding requests when any
function (like read()) returns, because you can't rely on the order of
the incoming data when issuing an ls() and a read() in parallel. My
personal opinion is that supporting multiple parallel requests would
help us make some operations (ls(), for instance) far more responsive
and less memory-intensive, but it requires some hours of hacking.

(*) which is a special SFTP facility for getting high transfer rates
(**) actually, we copied and pasted lots of OpenSSH code

-- 
Christian Neumair <chris gnome-de org>
