Re: Serialization in libgobject

From: "Andrew Paprocki" <andrew ishiboo com>
To: timj imendio com
Cc: gtk-devel-list gnome org
Subject: Re: Serialization in libgobject
Date: Wed, 24 May 2006 11:09:34 -0400 (EDT)
I'll outline the implementation of a real-world generic GObject serialization/deserialization system that we wrote from scratch to handle these issues.

> a) the GType system is not self-contained enough to recursively (de-)serialize
>     nested structures and objects (e.g. POINTER or BOXED types).

We added the notion of GAttributes, which can be placed at the class, property, or signal level and work almost exactly like .NET attributes. They are created lazily: an object does not instantiate its attached attributes when it is created, only when an external piece of code explicitly asks for them. We have custom serialization/deserialization attributes meant for situations such as POINTER/BOXED properties, where the type system itself cannot infer what to do. Although this works, we generally avoid creating such properties.
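A rough sketch of the lazy instantiation, in plain C so it stands alone. The GAttribute API itself is not shown in this message, so `Attribute`, `AttributeFactory`, `PropertyAttributeSlot` and `get_attribute()` are hypothetical names. In real GObject code the slot would more likely be stored on the `GParamSpec`, e.g. with `g_param_spec_set_qdata()`. The point is only that registering a factory costs nothing until someone asks for the attribute.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical attribute instance: a tagged value created on demand. */
typedef struct {
    const char *kind;   /* e.g. "custom-serializer" */
    int payload;
} Attribute;

/* Hypothetical factory registered on a property at class_init time;
 * it is not run until the attribute is first requested. */
typedef Attribute *(*AttributeFactory)(void);

typedef struct {
    const char *property_name;
    AttributeFactory factory;
    Attribute *cached;          /* NULL until first access */
} PropertyAttributeSlot;

static int factory_calls = 0;  /* counts instantiations, for illustration */

static Attribute *make_custom_serializer_attr(void)
{
    Attribute *attr = malloc(sizeof *attr);
    assert(attr != NULL);
    attr->kind = "custom-serializer";
    attr->payload = 42;
    factory_calls++;
    return attr;
}

/* Return the attribute for a property, instantiating it on first use only. */
static Attribute *get_attribute(PropertyAttributeSlot *slot)
{
    if (slot->cached == NULL && slot->factory != NULL)
        slot->cached = slot->factory();
    return slot->cached;
}
```

Declaring `PropertyAttributeSlot slot = { "data", make_custom_serializer_attr, NULL };` runs nothing; the factory fires on the first `get_attribute(&slot)` and later calls return the cached instance.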

> b.1) should the target stream contain binary or text representations of the
>       values?

We use XML for representing the serialized data, and routinely serialize/deserialize from files/database records. As for transmission on the wire, I would suggest some other available tool be used to translate the serialized XML into, say, binary XML.
(http://www.w3.org/TR/wbxml/ ?)

> b.2) how are incompatibilities between object types/implementations and the
>       file format going to be handled?

All of our serialized XML data is versioned at the file and the object class level. I say object class, because a serialized GtkLabel could be using GtkLabel "v1" but GtkWidget "v2". We created a singleton registrar that allows classes to register GType+version to/from conversion routines. An API is provided so that inside the conversion routines, classes can easily transform properties. For example, in a conversion routine to go from "v1" to "v2", I could introspect for a property named "foobar", pull it out of the destination, and instead insert two separate "foo" and "bar" properties with the appropriate values. This system is extremely flexible and allows for correction of any API "mistakes" that later need to be fixed. 
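A minimal sketch of such a registrar, assuming an in-memory property list instead of real XML nodes and a process-wide table as the singleton. All names (`register_converter()`, `upgrade()`, `SerializedObject`) are hypothetical, and storing "foobar" as a comma-joined "foo,bar" string is an assumption made only for illustration. Each converter upgrades exactly one version step; a step with no registered converter falls back to plain name-based property mapping, and objects older than the oldest supported version are rejected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PROPS 8
#define MAX_CONVERTERS 16

/* Hypothetical in-memory form of one serialized object. */
typedef struct { char name[32]; char value[64]; } Prop;

typedef struct {
    const char *type_name;   /* e.g. "GtkWidget" */
    int version;             /* class-level version found in the file */
    Prop props[MAX_PROPS];
    int n_props;
} SerializedObject;

/* Upgrades an object from version N to version N + 1. */
typedef void (*Converter)(SerializedObject *obj);

typedef struct { const char *type_name; int from_version; Converter convert; } ConverterEntry;

/* Process-wide table standing in for the singleton registrar. */
static ConverterEntry registry[MAX_CONVERTERS];
static int n_registered = 0;

static void register_converter(const char *type_name, int from_version, Converter fn)
{
    assert(n_registered < MAX_CONVERTERS);
    registry[n_registered++] = (ConverterEntry){ type_name, from_version, fn };
}

static Converter find_converter(const char *type_name, int from_version)
{
    for (int i = 0; i < n_registered; i++)
        if (registry[i].from_version == from_version &&
            strcmp(registry[i].type_name, type_name) == 0)
            return registry[i].convert;
    return NULL;
}

/* Helpers a converter can use to reshape properties. */
static Prop *find_prop(SerializedObject *obj, const char *name)
{
    for (int i = 0; i < obj->n_props; i++)
        if (strcmp(obj->props[i].name, name) == 0)
            return &obj->props[i];
    return NULL;
}

static void remove_prop(SerializedObject *obj, const char *name)
{
    Prop *p = find_prop(obj, name);
    if (p == NULL)
        return;
    int idx = (int)(p - obj->props);
    memmove(&obj->props[idx], &obj->props[idx + 1],
            (size_t)(obj->n_props - idx - 1) * sizeof(Prop));
    obj->n_props--;
}

static void add_prop(SerializedObject *obj, const char *name, const char *value)
{
    assert(obj->n_props < MAX_PROPS);
    Prop *p = &obj->props[obj->n_props++];
    snprintf(p->name, sizeof p->name, "%s", name);
    snprintf(p->value, sizeof p->value, "%s", value);
}

/* Example v1 -> v2 converter: split "foobar" ("foo,bar") into two properties. */
static void widget_v1_to_v2(SerializedObject *obj)
{
    Prop *p = find_prop(obj, "foobar");
    if (p == NULL)
        return;
    char buf[64];
    snprintf(buf, sizeof buf, "%s", p->value);  /* copy before p is invalidated */
    remove_prop(obj, "foobar");
    char *comma = strchr(buf, ',');
    if (comma != NULL)
        *comma = '\0';
    add_prop(obj, "foo", buf);
    add_prop(obj, "bar", comma != NULL ? comma + 1 : "");
}

/* Walk the object up one version at a time. Returns -1 if it is too old. */
static int upgrade(SerializedObject *obj, int min_supported, int current_version)
{
    if (obj->version < min_supported)
        return -1;
    while (obj->version < current_version) {
        Converter fn = find_converter(obj->type_name, obj->version);
        if (fn != NULL)
            fn(obj);   /* otherwise rely on name-based property mapping */
        obj->version++;
    }
    return 0;
}
```

A class would call `register_converter("GtkWidget", 1, widget_v1_to_v2)` from its class_init, next to where it sets its current version.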

Future library releases can advertise "GtkWidget v1 is no longer supported" and it should be treated the same as an API change. Objects are responsible for maintaining their version number. In the same place in class_init where the objects register any necessary converters, that object sets its current version. Thus, bumping version numbers is a code change and should be treated as an API change. Object-level versions _only_ need to be bumped when an incompatible change is being made, and _only_ when converters are necessary. By default, no object _needs_ converters because introspection will appropriately map everything. 

Our system is lax in the sense that class names can be changed without requiring extensive converters. I could simply map "GtkLabel" to "GtkNewLabel", change GtkLabel's class name, and bump its version number without providing converters, and the deserializer will automatically map properties to ones of the same name in the new class if they exist. (Not that this is ever really used, but it has proved invaluable in the past for getting out of some sticky situations.)
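The renaming case might look roughly like this. The names are hypothetical, and a hard-coded property list stands in for introspection (with real GObject classes one would query `g_object_class_find_property()` instead). In this sketch, serialized properties whose names no longer exist on the new class are skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rename table: serialized class name -> current class name. */
typedef struct { const char *old_name; const char *new_name; } ClassRename;

static const ClassRename renames[] = {
    { "GtkLabel", "GtkNewLabel" },
};

static const char *resolve_class_name(const char *serialized_name)
{
    for (size_t i = 0; i < sizeof renames / sizeof renames[0]; i++)
        if (strcmp(renames[i].old_name, serialized_name) == 0)
            return renames[i].new_name;
    return serialized_name;   /* not renamed */
}

/* Hypothetical stand-in for the new class's introspected property names. */
static const char *const new_label_props[] = { "label", "use-markup", "ellipsize", NULL };

static int class_has_property(const char *const *props, const char *name)
{
    for (; *props != NULL; props++)
        if (strcmp(*props, name) == 0)
            return 1;
    return 0;
}

/* Keep only the serialized property names that still exist on the
 * target class; returns how many were kept. */
static size_t map_properties(const char *const *serialized, size_t n,
                             const char *const *target_props,
                             const char **kept)
{
    size_t n_kept = 0;
    for (size_t i = 0; i < n; i++)
        if (class_has_property(target_props, serialized[i]))
            kept[n_kept++] = serialized[i];
    return n_kept;
}
```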

While all this flexibility exists, even class-level versions very rarely bump. I believe that in the past year we have only needed to bump 2 class-level versions and provide converters. So if you are picturing code bloated with hundreds of converters all over the place, this is not as scary as it sounds.

> b.3) how's file versioning going to be handled?

As I said above, we have a global file-level version as well as object class versions. To provide a real-world example, we have been serializing GObjects for years now and our file-level version is still 1. IMO, the file-level version is only a nice-to-have in case the format is ever going to be moved away from XML to a better (?) format in the future.

> b.4) how is data being handled that isn't reflected by the object property
>       interface?

We try to avoid this whenever possible because of the problem it poses to deserialization, but we finally bit the bullet and made a system to handle this. We have an attribute that we can place on the property level which allows us to modify the deserialization order of the properties. Not pretty, but it works fine.
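One way to picture the ordering attribute, assuming it boils down to an integer priority per property (an assumption; the message does not say how the order is expressed). Properties are sorted by priority before their values are applied, with declaration order breaking ties because `qsort()` is not stable. `PropertyOrder` and `sort_for_deserialization()` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-property record; `order` mimics the ordering attribute.
 * Properties without the attribute keep order 0. */
typedef struct {
    const char *name;
    int order;      /* lower values are applied first */
    int index;      /* declaration position, filled in below */
} PropertyOrder;

static int compare_order(const void *a, const void *b)
{
    const PropertyOrder *pa = a, *pb = b;
    if (pa->order != pb->order)
        return (pa->order > pb->order) - (pa->order < pb->order);
    /* qsort is not stable, so break ties by declaration position. */
    return (pa->index > pb->index) - (pa->index < pb->index);
}

static void sort_for_deserialization(PropertyOrder *props, size_t n)
{
    for (size_t i = 0; i < n; i++)
        props[i].index = (int)i;
    qsort(props, n, sizeof *props, compare_order);
}
```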

> b.5) how is storage of defaulting properties handled?

We handle this in two ways. We only save non-default values "by default". At serialization time, an instance of an object is created via the type system, and the properties of the current object are compared against this "default" object instance. Anything that is different is automatically serialized. To tweak this behavior, we have an attribute that can be placed on the property level to force it to always serialize without comparing against the "default" value. Only one default instance of any given type exists at a time during serialization, and this is very fast (even though it sounds like it might not be).
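A sketch of the "only non-default values" rule, simplified to integer properties. `IntProp` and `serialize_non_default()` are hypothetical. The `defaults` array plays the role of the single default instance per type (in GObject, one created with `g_object_new()` and compared property by property, e.g. with `g_param_values_cmp()`), and `always_serialize` stands in for the force-serialize attribute.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flattened view of an object's int properties. */
typedef struct {
    const char *name;
    int value;
    int always_serialize;   /* mimics the "always serialize" attribute */
} IntProp;

/* Write name="value" pairs for every property that differs from the
 * default instance, or that is forced. Returns the number written,
 * or -1 if `out` is too small. */
static int serialize_non_default(const IntProp *obj, const IntProp *defaults,
                                 size_t n, char *out, size_t out_size)
{
    int written = 0;
    size_t used = 0;
    assert(out_size > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        assert(strcmp(obj[i].name, defaults[i].name) == 0);
        if (!obj[i].always_serialize && obj[i].value == defaults[i].value)
            continue;   /* same as the default: skip it */
        int len = snprintf(out + used, out_size - used, "%s%s=\"%d\"",
                           written ? " " : "", obj[i].name, obj[i].value);
        if (len < 0 || (size_t)len >= out_size - used)
            return -1;
        used += (size_t)len;
        written++;
    }
    return written;
}
```

With defaults `width=-1, height=-1, visible=0` (visible forced) and an object whose width is 200, the output is `width="200" visible="0"`.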

> b.6) how are properties classified to distinguish between ones that are used
>       at the GUI, as programming interface or to reflect serializable object
>       state, and any combinations thereof?

We handle this by attaching appropriate attributes to the properties we need to classify a certain way; we can then instruct the serializer to "only serialize properties on a class if they have an attribute of GType <type here>". This way, the same system also handles "deep copy" as well as some other exceptional cases.
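A sketch of attribute-based selection, with a small enum standing in for attribute GTypes (the `AttrType` values and function names are hypothetical). The same selection routine can drive persistence or deep copy just by changing the requested attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute type IDs; in the described system these are GTypes. */
typedef enum { ATTR_NONE = 0, ATTR_PERSIST, ATTR_DEEP_COPY, ATTR_GUI_ONLY } AttrType;

#define MAX_ATTRS 4

typedef struct {
    const char *name;
    AttrType attrs[MAX_ATTRS];   /* unused slots are ATTR_NONE */
} TaggedProperty;

static int has_attribute(const TaggedProperty *p, AttrType wanted)
{
    for (size_t i = 0; i < MAX_ATTRS && p->attrs[i] != ATTR_NONE; i++)
        if (p->attrs[i] == wanted)
            return 1;
    return 0;
}

/* Collect the names of properties carrying `wanted`; returns the count. */
static size_t select_properties(const TaggedProperty *props, size_t n,
                                AttrType wanted, const char **out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (has_attribute(&props[i], wanted))
            out[count++] = props[i].name;
    return count;
}
```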

> b.7) how are object and structure pointers being handled?
>       - can/should they be saved by reference or recursively by value?
>       - and when restoring, factories and lookup mechanisms are needed to
>         resolve references. also, when restoring circular references on object
>         trees, properties can not anymore be restored in order.

We don't provide a system at the moment for serializing by reference, but individual objects can do it freely by using our custom serializer/deserializer attributes on the class level. Inside the custom functions, a singleton object coordinator can be used to provide "reference ids" if a by-reference object has already been serialized in the document. These ids will be resolved by the same singleton in the custom deserializer. Right now no one here does this, but we will need to provide this generic coordinator because we currently have objects that need by-reference serialization.
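The reference-id coordinator could be as simple as the following hypothetical sketch. It uses an explicit per-document struct rather than a global singleton so that two documents can be processed independently. Ids are assigned in the order objects are first written, and a `NULL` from `ref_resolve()` during deserialization signals a forward (possibly circular) reference that the caller has to patch up later.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REFS 64

/* Hypothetical coordinator: one instance per document; zero-initialize it. */
typedef struct {
    const void *objects[MAX_REFS];   /* slot i holds the object with id i + 1 */
    size_t count;
} RefCoordinator;

/* Serialization side: returns 1 and the existing id if `obj` was already
 * written (emit a reference), or registers it and returns 0 with its new
 * id (emit the object by value, tagged with that id). */
static int ref_lookup_or_register(RefCoordinator *c, const void *obj, size_t *id_out)
{
    for (size_t i = 0; i < c->count; i++)
        if (c->objects[i] == obj) {
            *id_out = i + 1;
            return 1;
        }
    assert(c->count < MAX_REFS);
    c->objects[c->count++] = obj;
    *id_out = c->count;
    return 0;
}

/* Deserialization side: record the object rebuilt for `id`. */
static void ref_bind(RefCoordinator *c, size_t id, const void *obj)
{
    assert(id >= 1 && id <= MAX_REFS);
    c->objects[id - 1] = obj;
    if (id > c->count)
        c->count = id;
}

/* Returns the object for `id`, or NULL if it has not been rebuilt yet. */
static const void *ref_resolve(const RefCoordinator *c, size_t id)
{
    if (id < 1 || id > c->count)
        return NULL;
    return c->objects[id - 1];
}
```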

> the border line is, serialization/deserialisation is far from being easily
> solved generically. and any non-generic implementation of this should better
> make sure to define exact usage cases to cover, otherwise it will fall short
> on most applications and overdo central abstractions.

I don't believe it is that difficult; it is just a sub-system that needs to be planned out with care. Our entire system has no pieces that depend on anything other than GObject and our GAttribute class, and we have not come across any case that is not handled (aside from the missing by-reference support, for which we can easily implement a solution).

In short, serialization/deserialization is not something you want to tackle without the power and flexibility of .NET-style class/property/signal attributes. Attributes let you achieve nearly anything you desire by tagging introspectable metadata on introspectable pieces of GObject. The attributes themselves support attributes, so you can nest them indefinitely if you have a situation that warrants it (just like in .NET). Many articles/books on .NET attributes exist, but here is a good description if anyone is interested:
http://www.csharphelp.com/archives3/archive558.html

Andrew Paprocki
Bloomberg LP
References:
- Serialization in libgobject
  - From: Tim Janik