A tale of waiting

I have been on a quest to improve performance of the file chooser
lately. This post is about this process: what I measured, what I
learned and what I patched.

I'll start with my motivation. Just like all my applications do with
their dotfiles, I use my home directory as a dumping ground for stuff.
PDFs of specs, files someone sent me or that I downloaded somewhere,
code I wrote, etc. All the stuff that doesn't have a place anywhere,
but is too important to throw away gets dumped into my home directory.
Currently on this laptop, my home directory contains 3380 files. And
whenever an application opens the file chooser, it shows my home
directory - either because it's the default directory or because I
explicitly told it to - it is my default dumping ground after all.
With a warm cache, the file chooser takes 4-6 seconds to display the
full list. It sometimes displays a partial list first, but that is
utterly useless, as the partial list keeps changing rapidly while more
files get added. Having to wait 4 seconds to select a file is not a
good thing.
In fact it has become so annoying that last Friday I set out to change
it - after deciding that sorting my 3500 files would probably take
longer and wouldn't help other people either.

The first thing I did was set a goal. I decided to target the ls
command, because I know it to be fast and because there's no reason my
terminal should be faster than my file chooser for listing files -
even if Behdad and the Chris'es maintain it. So I ran "time ls -la >
/dev/null"; it took about 0.1s. This means that everything that takes
more than 2% of total CPU time needs to go away. I should note that I do
not care about memory usage a lot, only about execution speed, as a
file chooser should go away quickly and give you back your memory.

Next, I set up a way to measure the performance of the file chooser.
For this I created a test directory containing 40.000 files like so:
  mkdir test-directory
  for i in `seq 1 20000`; do echo "This is a text file" > test-directory/file$i.txt; done
  for i in `seq 1 20000`; do cp small-image.png test-directory/image$i.png; done
Of course, this is only a test directory that might have
pathologically bad performance in certain cases, so keep this in mind.
I nonetheless think it does a pretty good job as a - even if somewhat
exaggerated - performance measurement for big directories. In
particular it does a very good job of exposing O(n²) behavior.

For measuring I mostly used my watch. When posting overly accurate
results, I probably used time instead. The computer I'm running this
on is a 2nd-generation MacBook with a dual-core 1.83GHz CPU and enough
memory to always run the tests with a warm cache, so there is no
hard-disk IO happening. For finding the bottlenecks I used sysprof.

First I measured the time it takes in various applications to list the
test directory:
 *  0.8s - ls -la
 *  1.8s - gvfs-ls -a
 * 23s   - nautilus
 * 60s   - thunar
 * 60s   - the file chooser

Next, I looked at where most of the time was spent. These things stood out:
 * Getting the mime type
Getting the mime type for a file requires opening the file and
checking the first bytes against the known patterns for files. Both
operations take time. The mime type is also usually not required.
 * Getting the icons
Actually loading the icon takes time. It's faster when the icon cache
can be used - which is not true for thumbnails, or in this case half
of the files - but even then it's slow. In fact, using the icon cache
is only about 30% faster than just loading the icon from disk. (see
bug 586161)
 * Sorting the list
The file chooser uses a GtkTreeModelSort to sort. A huge amount of CPU
time was spent sorting. Judging by the time ls takes (and it does sort
its output), this shouldn't happen.
 * the GtkTreeCellDataFuncs
The file chooser used custom cell data funcs that computed the
required values on demand from a GFileInfo. Unfortunately, some of
these values are complex to compute - like the thumbnail. It would
make more sense to compute the values only once, cache them and use
gtk_tree_view_column_set_attributes() instead.

So the first thing I decided to do was rip all these things out and
make a tree model that quickly lists all the files, unsorted, in
whatever order they arrive. For this I pretty much rewrote
GtkFileSystemModel, as the
previous incarnation supported a tree-like layout (even though that
wasn't used) and a linked list for storage.
While I did that, I realized that a lot of code in the file chooser is
triplicated - it exists once each for browsing, recent files and
search. I tried to come up with a tree model that abstracts away these
differences so that the file chooser code could be simplified. So I
modified the GtkFileSystemModel API to look more like a GtkListStore.
By the time I had done that, it was Sunday evening: I had removed a
net 700+ lines from gtkfilechooserdefault.c, had rewritten
gtkfilesystemmodel.c and the file chooser opened my test directory in
5 seconds.

Next I implemented sorting. Because I use an array, I can use qsort(),
which is fast. I had thought about switching to GSequence to get even
closer to GtkListStore, but I decided against it - one reason being
that GSequence uses insertion sort, which is quite slow: sorting
40.000 elements took over a second in a small test I did.
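For reference, this is the shape of qsort() call I mean - a generic
sketch sorting an array of file names with a strcmp-based comparator,
not the actual GtkFileSystemModel code:

```c
#include <stdlib.h>
#include <string.h>

/* comparator for an array of char *: qsort() passes pointers to the
 * array elements, so each argument is really a char ** */
static int compare_names(const void *a, const void *b)
{
    const char *const *na = a;
    const char *const *nb = b;
    return strcmp(*na, *nb);
}

/* usage: qsort(names, n_names, sizeof(char *), compare_names); */
```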
After I had implemented sorting, the file chooser was abysmally slow
again - taking around 25s. This was for two reasons:
 * The sort function was called way too often. So I added the option
to freeze and thaw the tree view. Now the sort function was only
called once or twice.
 * gtk_tree_model_get() being called in the iter comparison functions
took almost all the CPU. 25% of the time was due to needlessly copying
GValues, but 75% of the time was spent in lookup of the GtkTreeModel
interface. As the comparison function is in a critical spot for
performance, this was unacceptable. Also, using
gtk_tree_model_get_value() directly didn't get rid of any of these
problems. So I added an access function to GtkFileSystemModel and the
problem was gone.

And today I implemented loading only the visible icons, after
spending almost a whole day wondering how to do it and almost giving
up. The
idea was this: Only load the icon when it is displayed. Hardcode the
cell renderer size (icon sizes are known after all) and be happy.
Unfortunately, the tree view "validates" all rows (for computing
ideal column widths and scrollbar size) and in the process accesses
every value in the model. Fortunately
gtk_tree_view_get_visible_range() is very fast, so I could use it in
the value lookup function to decide if an icon should be loaded,
without any noticeable performance impact.
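The gist of it, as a plain-C sketch - the function names are mine; in
the real code the bounds come from gtk_tree_view_get_visible_range()
and the loader hits the icon cache or the disk:

```c
#include <stddef.h>

static int icons_loaded = 0;

/* stands in for the expensive loader; here it only counts calls */
static const char *load_icon_for(int row)
{
    (void)row;
    icons_loaded++;
    return "icon";
}

/* only rows inside the visible range get a real icon; everything
 * else gets a placeholder and the expensive load is skipped */
static const char *get_icon(int row, int first_visible, int last_visible)
{
    if (row < first_visible || row > last_visible)
        return NULL;            /* row not on screen: don't load */
    return load_icon_for(row);
}
```

With 40.000 rows but only a screenful visible, this turns tens of
thousands of icon loads into a handful.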

And that is the current state. Opening my home directory now takes
roughly 1 second; the test directory takes 6 seconds. This is still
way too slow for my taste, but already so much faster that the work is
useful. I pushed it for your reviewing
pleasure to the filesystemmodel branch in the gtk+ git repository.

What is left to do?
 * reimplement directory monitoring. I never got around to adding that code.
 * Fix the usage of a filter on the mime type. Currently we don't
query the mime type (it requires file sniffing after all), so we never
get matches.
 * File bug(s) about the GValue and interface slowness
 * File a bug about this work
 * evaluate fixed height mode for the tree view, it might make things
even faster without losing features.
 * port search and recent files to GtkFileSystemModel, to get rid of
more special casing code and make them faster. (Woohoo!)

There are some remaining performance issues that I didn't touch yet,
but that are quite important for really catching up to ls.
 * 50+% of the remaining CPU is spent in GtkTreeView's validate_row().
I have no clue what that function even does. Is there a way to get rid
of it?
 * 20+% of time is spent in enumerating the files, with
lookup_attribute() in gio/gfileinfo.c accounting for 1/3rd of that
time. There should be a way to use the numeric ids directly instead of
always looking them up. In GLocalFileInfo it would make sense to me to
use numeric ids directly. This is an even bigger problem in Nautilus
(remember: it took 23s in the above test), as it queries a lot more
attributes.
 * The async implementation of g_file_enumerator_next_files_async() is
very suboptimal, as it stops after N files are enumerated, and waits
for another call to the function to resume. It would be better if it
just kept going, so the next call can return already stored files.
The current behavior can also cause excessive sorting.

And now the obvious question: Should I just merge it to master when
I'm done with the regressions?

