Re: Possible Glib BTree substitute




Hi,

I'll let Josh or some other expert comment on the data structure in
general, but some notes with respect to my text widget:

 - The BTree data structure and its nodes are entirely concealed
   in the single file frootkxtbtree.c in my widget, so it's 
   certainly possible to experiment with skip lists while retaining
   all the other parts of the widget. It might be nice to rename 
   the BTree object, since its API does not require a BTree.
   But even the BTree object is not public. So basically you can 
   change my widget to use a BTree and keep it binary/source 
   compatible.

   Anyway, I think it would be wrong to compare text widgets on the 
   basis of the data structure used, the data structure is not 
   by any means the only important code in a text widget.

 - The tree nodes are not really a memory problem; they contain 
   9 32-bit values, and a 50,000-line buffer results in about 
   7,700 nodes; 36 bytes per node times 8000 is under 300K. 
   Of the 9 elements in the node, all 9 (I think) are also 
   required in a skip list node. So you can only improve on 
   memory by reducing the number of nodes, and even then the 
   nodes are a small percentage of total memory used once you 
   count lines, wrapped display lines, character data, and other 
   stuff that either data structure will have.

 - The only user-visible speed problem with the BTree, with a 
   50,000-line buffer, is re-wrapping a lot of lines. gprof
   reveals most time spent in gdk_char_width(), with the second 
   largest amount of time spent retrieving a list of tags 
   applied to a given point in the buffer.

   Pango will increase the gdk_char_width() time. This is really 
   the huge bottleneck.

   Finding the list of tags involves walking up the tree from a leaf, 
   and at each parent node adding up the number of tag toggles below 
   that node (if you look at the code, in frootkxtbtree.c, inc_count()
   is the function that shows up most in the profiling). Basically 
   this operation is O(n) in the number of parent nodes a line has.
   So, this will be faster if the skip list is shallower than the
   tree.

 - I think you'll find that most of the complexity of frootkxtbtree.c
   is not the tree (all the tree complexity is in
   froo_tkxt_btree_rebalance()), rather the complexity is in updating
   all the index information at the nodes (char_count, line_count,
   tag summaries, max width, total height). There is also 
   significant complexity in the lines and segments, but a skip list
   would have lines and segments as well. 

   So a different data structure isn't going to help that much with 
   the complexity that exists in this application.

 - The frootkxtbtree.c code could be substantially improved by
   rewriting it with a generic tree data structure which the 
   widget uses, most likely.

Some thoughts. Anyway, basic points are:
 - the memory problem has nothing to do with nodes
 - the speed problem is in rewrapping lines; this involves
   font metrics, and it involves figuring out which tags apply 
   to a given region of text. You can speed this up potentially 
   by reducing the depth of your data structure, but the 
   character metrics stuff will continue to be the main problem 
   (Pango makes it worse, too).
 - the Tk widget isn't really tied to the BTree, in fact the BTree
   and Node structures are not defined in any header file.
 - the BTree code in the Tk-based widget could use a good cleanup, 
   and a generic glib-style container is likely a good start.

Havoc



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]