Aggregates for GSequence



At

    http://www.daimi.au.dk/~sandmann/aggregates.patch

there is a patch to add aggregates to GSequence. 

In a tree data structure, an 'aggregate' is information stored in a
node which is computed from that node's children. This field from
GtkRBTree:

    /* count is the number of nodes beneath us, plus 1 for ourselves.
     * i.e. node->left->count + node->right->count + 1
     */
    gint count;

is an example.

There are two types of information involved, "atomic" and
"aggregate". Atomic information is information that directly describes
one node, where as aggregate information describes more than one
node. 

In the "count" example, the atomic information is simply "1" per node,
whereas the aggregate is the sum of all aggregates.

In the proposed API, there is an AggregateFunction which is called for
subsequences of the sequence and passed aggregates for the left and
right parts of the subsequence, and also a pointer to the user data
for a node in the middle of the subsequence.

I believe the API I am proposing is general enough that GSequence
could replace the trees used in GtkTextView and GtkTreeView. There are
some caveats though:

- The common operation of finding the first node with a certain
  property, say the first 'not validated' node in a GtkRBTree, can be
  done in time O(log n), but the implementation with the proposed API
  is O(log^2 n).  This is due to the "list" API of GSequence where
  there is no natural concepts of "left", "right" and "children". I
  don't see any way of making this operation O(log n) without
  complicating the API a lot. On the other hand, I don't think the
  difference between O(log n) and O(log^2 n) is huge in most cases.

- It adds two more fields to GSequenceNode, which brings its size up
  to 7 words from 5. This could probably be fixed by adding an extra
  indirection on the data pointer, say.

- For "invertible" aggregate functions such as "add", aggregates can
  be maintained without storing any atomic information for individual
  nodes. For example after a left rotation of this tree:

      a
     / \
    b   c
       / \
      d   e

  the new aggregate information would be (using ' to indicate "new")

    a' = (a - c) + d
    c' = (c - d) + a'

  which only makes use of aggregate information and no atomic
  information.

  GtkRBTree uses this technique with its "offset" aggregate. There,
  the atomic information is the height of individual rows, and the
  aggregate is the sum. The height of a row is expensive to compute
  because it calls into pango.

  The implementation in the patch does not support this. Adding such
  support is not impossible, but would require a bit of somewhat
  complicated API involving functions that can invert the aggregate to
  the left and right.

- I am not personally going to port GtkTextView or GtkTreeView.


The API is here:

/* Aggregates */
typedef gpointer (* GSequenceAggregateFunc) (gconstpointer left_agg,
                                             gpointer      middle_data,
                                             gconstpointer right_agg,
                                             gpointer      user_data);

void g_sequence_set_aggregate_function  (GSequence   *seq,
                                         GSequenceAggregateFunc  func,
                                         gpointer    data,
                                         GDestroyNotify agg_destroy);

void g_sequence_aggregate_changed       (GSequenceIter *iter);
gconstpointer  g_sequence_get_aggregate (GSequenceIter *begin,
                                         GSequenceIter *end);


If there is any interest in this, I'll write documentation and test
cases.


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]