[Gnome-devtools] implementing versioned objects



Hello,
  I've been working on glf a bit, more specifically the implementation
of versioned objects. The issue that is the most important in the
implementation of self versioned documents is storage space, since the
entire document is usually wanted in memory. What I'm looking for is
comments on the data structures I've written, and if possible better
ideas on the implementation. I'm not going to go into the detail of each
objects API just yet, or the algorithms used on the structures, so you
have to trust that all the fields are necessary.

Basically a versioned object in glf is a log of values of the versioned
object's data. Each value has a global version associated with it.
Currently my versioned object is stored like this:

struct _GlfVObject {
  int        log_index;
  guint8     *data;
  guint       elt_size;
  GlfVersion *vlog;
  guint       len;
  /* change fields */
  unsigned int changed : 1;
};

I think this is a reasonable structure for a versioned object.

Now here is the versioned object in use:

typedef struct _GlfNode {
  GtkObject parent_object;
  GlfVObject symbol;
  GlfVObject arity;
  GlfVObject length;
  GlfVObject parent;
  GlfVObject children;
  unsigned int nested_changes : 1;
} GlfNode;

As you can see this is a very heavy structureinitially, the size should
grow slowly in theory with user modifications. Assuming the versioned
object itself is about 18 bytes with padding, the full node would be 160
bytes + sizeof(GtkObject) + arity * 4 bytes. One optimization suggested
by the wagner paper is to use a temporary representation at creation,
and use it until a the object is modified. Just for the versioned object
we can have something like this:

struct _GlfTemporaryVObject {
  int        log_index = -1; /* distinguishes it from a real versioned
object */
  guint8     *data;
  guint       elt_size;
};

Using this structure initially in the above node structure we need 80
bytes + sizeof(GtkObject) + arity * 4 bytes. Ass you can see this is a
big improvement for an initial structure.

applying this idea to this idea further to the node we can achieve the
following:

typedef struct _GlfNode {
  GtkObject parent_object;
  int symbol;
  int arity;
  int length;
  GlfNode *parent;
  GlfNode **children;
  unsigned int nested_changes : 1;
} GlfNode;

By contrast this struct takes 24 bytes + sizeof(GtkObject) + arity * 4
bytes. So a huge savings over the first structure, and a good savings
over the above structure. I think this would be default representation
for an initially loaded document, as most of the document is likely to
stay unchanged while editing.

The only problem with using two structures like this is the type
identification, their can't be two node representations. i think we can
solve the problem by have two derived nodes that derive from a common
empty node.

something like this:


                  GlfNode
                  /     \
                 /       \
      GlfStaticNode      GlfVersionedNode

It would be a bit more work in the node API and subsequent derived
objects, but the memory savings would be more than worth it I think.

Anyway, let me know what you think.

Mark




[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]