Re: [Nautilus-list] branch the branch?

From: Darin Adler <darin bentspoon com>
To: Maciej Stachowiak <mjs noisehavoc org>
Cc: nautilus-list lists eazel com
Subject: Re: [Nautilus-list] branch the branch?
Date: Tue, 21 Aug 2001 08:43:49 -0700

On Tuesday, August 21, 2001, at 12:43  AM, Maciej Stachowiak wrote:

Darin, any opinions?

My personal preference is to get this all on HEAD as soon as possible.Using a branch to the branch sounds like a fine way to keep things goingin the mean time.

The changes are getting big because my attempts to improve performance
are turning into a game of whack-a-mole - fix one thing and another
pops up really high on the profile.

That's exciting to hear. That makes me confident that it's real productivework, since optimization usually works like that. I'm so excited to hearabout this.

The one I'm currently workign on is N^2 behavior in
nautilus-directory-async.c - I think to get good performance on large
directories it needs a queue of files that are missing at least one
piece of info instead of scanning the whole file list each time.

I knew that this might come up eventually. When you start on this change,you might want to take an extra moment consider exactly what datastructures to use once you go beyond the simple-minded ones.


Here are some thoughts that you might find useful, but are free to ignore:

For example, if we didn't mind making all NautilusFile objects bigger, wecould put a link to the location in the "files that need some I/O" queueinside the NautilusFile, which would make removal fast. But if we startgoing that direction, then we might want to use an intrusive circular listinstead of using GList at all.

If we want to avoid making the Nautilus objects bigger, then a hash tablemight be the right choice because it's not O(n) to remove items, but onthe other hand, g_list_remove might be really fast even if it is O(n). Theexpensive operation is probably evaluating the state of each NautilusFileobject over and over again to decide what work is needed.


Another issue is the ordering of operations in nautilus-directory-async.c.

Currently it's a bit randomish, but if we want reliable behavior we mightwant to consider the algorithm for what work to do first. Implementingthat algorithm might lead to a specific kind of data structure. One datastructure that keeps an ordering, but doesn't have O(n) remove is a GListand a GHashTable with entries that point to the list nodes. That'sactually the kind of data structure used for the file list, although thehash table keys are file names, not file object pointers.

Another way to go at this is to keep a separate queue of objects that needeach type of work. The "changed" function would reevaluate whether theobject belongs in each queu. So if 100 objects need file info, but onlyone needs a directory MIME type scan, then there would be two separatelists instead of one list with 101 objects.


    -- Darin

PS: I'm going to write some documentation about I/O in Nautilus today, aswe discussed, to try to help future hackers who need to work in this area.

References:
- [Nautilus-list] branch the branch?
  - From: Maciej Stachowiak

[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]