Re: [TreeView] Performance of TreeView with huge lists

From: Kristian Rietveld <kris gtk org>
To: Thorsten Wilmer <thorsten wilmer udo edu>
Cc: gtk-devel-list gnome org
Subject: Re: [TreeView] Performance of TreeView with huge lists
Date: Mon, 11 Feb 2008 22:20:42 +0100
Hi,

Comments inline.

On Sun, Feb 10, 2008 at 08:26:01PM +0100, Thorsten Wilmer wrote:
> I want to Display  a really _huge_ List with TreeView. My result is,
> that my application consumes a even more huge amount of memory -- more
> than Virtual Memory is available. The responsiveness is also not as
> expected. I have only a few lines to be displayed, I expect that the
> TreeView does not iterate over the complete tree before showing
> anything, even though the Data are not accessed. It just takes several
> seconds to iterate over 1000000 elements and I don't want to wait for
> 100000000 elements.  I want a fixed_height display and my Model is
> marked as a list. By the way 100000000 elements are only a few Megabytes
> and my Harddisks stores Gigabytes...

Just as random thoughts, is there any way that giving a user a list of
100 million items is going to be useful?  If you already know that the
user will never see all of the list, why try to display it?  And
searching through this list for a specific item using the scrollbar is
going to be very tedious, if not impossible, for the user.

> I tracked the problem down and found that gtk_tree_view_build_tree
> effectively copies the complete structure of my List. This is a very bad
> idea. My model is able to load the needed data on demand from disk, but
> it is not possible to store the complete list in memory. As only a small
> amount of the data are visible at any given time, the view should only
> load the data to be displayed. And the view shall never use the
> iteratores to iterate over the complete list, if not requested by the
> user. I expect that the user will never see the complete list, though.
> 
> Then I thought, why not optimize TreeView? If its known that the Model
> is a list and each element shall have a fixed_height, then it is known
> at any time, what Data are to be displayed. It is possible to map the
> display coordinates to the appropriate item in the model.

For your case this is indeed possible.  But GtkTreeView is a very
general widget that has a multitude of use cases.  We cannot "optimize"
the tree view by just ripping out the RBTree, this is impossible.  As
you've figured the RBTree is used for storing data about nodes, such as
selection state.  But the RBTree also plays a very important role in
non-fixed-height views and views that use row hinting (we need the parity
property for this).  The most important part remains the bidirectional
mapping between paths and the actual location in coordinates of nodes in
the view.

> The next problem ist, what is to be done with the several masks and
> attributes which are stored in the rb_tree. For my application it is
> sufficient to have a single selection, but it might also be feasible to
> store a selection as a list of selections and deselections done by the
> user. And trying to compact the List after each add, so that if I select
> everything, it just states: selected [0-n] unselected [].
> 
> Quite a similar optimization might exist, to represent the collapsed and
> expanded states of the tree. Then you do not need to copy the complete
> structure of the Tree into the private tree and node fields. You can
> calculate the ranges, how to access a complete list in the tree.

But instead we will have to search lists to see if a node is selected
and expanded during drawing routines?  I think this can especially be a
problem if we have a list of 10000 nodes that have been expanded.

> I have several questions:
>  - How has this problem been solved by others. Is there an existing
> Gtk-Solution to my Problem?
>  - Is there anybody other than me trying to solve this kind of problem?

I don't know of anybody trying to display 100 million nodes in a list.
IIRC 4 million has been done though.

>  - What do you suggest:

For all of these suggestions I think you should keep in mind that
displaying such amounts of rows is not common at all and can be
considered a very special case that requires a specialized widget.

>  -- write a new simple specialized Widget to ViewHugeLists, which uses
> the TreeViewList or another model (Easy, might not be accepted to be
> integrated into gtk, because TreeView also displays lists)

>  -- change each TreeView function, which uses ->priv->tree or
> ->priv->node if it is a fixed_height list, dynamically recalculate
> everything needed and leave the ->priv->tree ->priv->node as they are
> now. (Hard, might not be accepted to be integrated into gtk, because 
> duplicated code and limited to Lists of fixed height)Q

Here you are basically going to rewrite all parts that use the rbtree
(which is a *lot*) into code that is completely tied to the fixed height
concept.  This is indeed at least very hard, if not impossible.  You
might even be better off writing a small specialized widget instead.

>  -- rewrite the complete TreeView to get completely rid of tree and node
> and recalculate only the needed information from the compacted current
> state of the tree and the model? (Master, might take a long hard time to
> implement, might become accepted if it works, the external interface is
> not changed and passes the TreeView test). This also works only for Rows
> of fixed_height because to calculate the Maximum rubber position you
> need to know the Height of the complete tree, an if the height of the
> rows differ it is not possible to precalculate the height.

This will never be accepted, because it would completely lose the
non-fixed-height functionality of the tree view ...

> Another problem is, how to evolve the nice existing TreeView into a
> TreeView which takes full advantage of the fixed_height ?  

This is a very hard problem and I haven't really "researched" this yet.
Maybe doing something with incrementally filling the RBTree is a
possibility, though this is a very wild guess.

> Is it feasible, to Modify each function, to first test if the Flag for
> HugeTree is set, then call the fixed_height counter part or otherwise
> leave the function as it is? The HugeTree optimization can only be
> activated, if it is fixed_height. The fixed_height counter part can
> test,  if the function is already capable to cope with the needed
> functionality. If a not supported function is needed, the optimization
> has to be disabled and the tree has to reinitialize itself or do
> nothing. As the gtk-user has to activate the HugeTree optimization and
> can test if everything needed is already working as expected.

I don't think this is really feasible.  I am not keen at all on having a
GtkTreeView that can choose between two completely different data
structures for keeping the internal tree data depending on how large the
data source is.  Note that the GtkTreeView source code is already close
to 500K.

> Or is it better to write a HugeTreeView from scratch, using the TreeView
> public interface, which first only handles Lists but can be extended to
> handle HugeTrees later?

If you are still inclined on showing lists of 100 million rows, I think
the best thing you can do right now is to write your own specialized
widget.



regards,

-kris.
Follow-Ups:
- Re: [TreeView] Performance of TreeView with huge lists
  - From: Morten Welinder
- Re: [TreeView] Performance of TreeView with huge lists
  - From: Andrew Cowie
References:
- [TreeView] Performance of TreeView with huge lists
  - From: Thorsten Wilmer
[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]