Fwd: Persistent Binary Search Trees
- From: Dana Jansens <dana cg scs carleton ca>
- To: gtk-devel-list gnome org
- Subject: Fwd: Persistent Binary Search Trees
- Date: Mon, 5 Oct 2009 09:05:56 -0400
I am interested in contributing code for persistent binary search
trees to the GLib project. This would actually count as credit
towards a grad level data structures class for me, but I have a lot of
experience working with GLib in the past, and developing C-based open
source projects. First some background on what persistence means and
why it would be nice to have available in an open source data
A persistent binary search tree is simply a BST that is versioned over
changes, such that you can do a lookup in any previous version of the
tree. An efficient implementation of this data structure requires
O(1) additional space per insertion into/deletion from the tree. That
is, after n insertions, creating n different versions of the tree, the
total space required would be roughly 2*(space for a single version),
while allowing you to search in any version of the tree. Searching in
any version takes the same number of comparisons and pointer
redirections as a tree with only a single version.
Persistent BSTs are used for operations such as a Plane Sweep  in
Computational Geometry. At the same time, the code behind the
persistence can be generalized to provide an additional data structure
that performs Fractional Cascading  for no extra cost to the BST
There are currently no readily available open source implementations
of a persistent BST. This would be a great tool for students,
academic researchers, and other developers.
I have noticed that GLib has a Binary Search Tree data structure,
which currently uses an AVL tree. AVL trees can be made persistent,
but the space requirement becomes O(log n) per insert/delete instead
There are 2 other BSTs that would work effectively to provide a
persistent BST, while also maintaining a very efficient implementation
for when persistence is not used. They are:
- Red-Black Trees
Treaps can be expected to outperform Red-Black Trees for any sized
tree. The constant in the search cost for a Treap is Approximately
equal to AVL trees (and slightly less for large trees), and is always
smaller than Red-Black Trees. (It's a small constant regardless.)
Treaps are also much simpler to implement, requiring less complicated
code. However Treaps are a randomized data structure, which some
people try to stay away from.
My proposed changes to the GTree API are the addition of the following
// returns the latest version of the tree available (starts from 0)
guint g_tree_current_version (GTree *tree);
// creates a new version of the tree, changes to the tree will no longer
// affect the previous versions. returns the new current version
guint g_tree_next_version (GTree *tree);
For each lookup/traversal function, a new function which takes
arguments (GTree *tree, guint version, ...) would be added, with
existing functions becoming macros pointing to the new ones.
This API allows a version to contain more than a single insert/delete,
if desired, making them more of a logical entity.
So my questions are this:
1) Is there interest in having a persistent binary search tree in the
GLib library ?
2) If so, would there be resistance to using a random data strucuture,
i.e. a Treap to implement them ?
Thanks very much,
p.s. I sent this before subscribing, but its been stuck in the
moderator queue for more than a week now, so I'm sending again.
Apologies if you see this twice.
] [Thread Prev