Re: GVariant for prez!



Hi,

On Sun, Apr 12, 2009 at 9:14 AM, Christian Dywan <christian imendio com> wrote:
> You are asserting that something like a "gint" or "guint" is not
> something that can be saved to disk.

I'm not saying that; I'm saying they can only be saved to disk by
converting them to a fixed-size integer.

"int" is a bad example because for all machines glib runs on, int is
equal to int32.

But let's look at "long"; "long" means different things on different
machines. This makes it totally broken to save "long" to disk or push
it over the network. It's OK to autoconvert "long" to "int32" ... but
there's no meaningful "long" type once we're serialized, imo.
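
To make the "long" problem concrete, here is a minimal sketch using Python's standard struct module. It is only an illustration, not anything from glib or GVariant; the helper names save_as_int32 and load_int32 are made up for this example.

```python
import struct

# Native mode ("@"): "l" follows the C compiler's idea of long, so its size
# depends on the platform (commonly 4 or 8 bytes).
native_long_size = struct.calcsize("@l")

# Standard mode ("<" = little-endian): "i" is always 4 bytes and "q" is
# always 8 bytes, whatever machine this runs on.
fixed_int32_size = struct.calcsize("<i")
fixed_int64_size = struct.calcsize("<q")


def save_as_int32(value):
    """Serialize an integer as a little-endian int32 (hypothetical helper)."""
    return struct.pack("<i", value)


def load_int32(data):
    """Read back a little-endian int32 (hypothetical helper)."""
    return struct.unpack("<i", data)[0]
```

A file written with the native "l" layout may not be readable on a machine with a different long; the "<i" layout is the same 4 bytes everywhere.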

> That said, it is all too common, for better or worse, to work with
> numbers of unknown physical size, including storage to disk.

It's impossible to do this, imo. Once you save to disk the physical
size is "locked in".

Yes, you can write a super-broken program that saves to disk at a
different size depending on the cpu it's running on, but that
super-broken program is still going to be picking either int32 or
int64 when it goes to save the long.

Basically, "typedefs" don't mean anything once you're serialized. A
serialized format could support time_t, size_t, long, ssize_t, pid_t,
etc. etc. (this goes on forever); or it could support int32, int64,
uint32, uint64 and that's it. I would advocate the second; have the
type system of a serialized format describe the actual binary data
types, not every way those binary data types can be interpreted or
mapped into language types.
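
As a rough sketch of the second option, the type system of a format could be nothing more than a small table of binary layouts. The table name FORMAT_TYPES and the encode/decode helpers are assumptions for illustration, not part of dbus or GVariant.

```python
import struct

# Hypothetical type codes for a serialized format: they name binary layouts,
# not language-level typedefs such as time_t or size_t.
FORMAT_TYPES = {
    "int32": "<i",
    "int64": "<q",
    "uint32": "<I",
    "uint64": "<Q",
}


def encode(type_name, value):
    """Pack value using the fixed layout registered for type_name."""
    return struct.pack(FORMAT_TYPES[type_name], value)


def decode(type_name, data):
    """Unpack a value that was stored with the layout for type_name."""
    return struct.unpack(FORMAT_TYPES[type_name], data)[0]
```

A program that wants to store a time_t has to pick one of the four layouts itself, for example encode("int64", timestamp); the format never learns that the number was a time_t.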

This is a pretty fundamental decision; I went into it more in my other
mail, with two slightly different angles on it or ways to frame it:

a) union of all type systems (GType, python, JavaScript, Java, etc.)
or rough intersection of primitive types they have in common
b) round-trip full "native" (GType, python, JS, Java, etc.) type
knowledge through the serialization, or preserve the binary storage
format only

For disk formats and ipc formats, I think intersection / binary format
only is the right way to lean, rather than
union/full-type-annotation-knowledge.

Something like Python pickle or Java serialization round-trips the full
native type knowledge, but they are completely specific to the
particular native type system. Reading a Python pickle file or Java
serialization from another language is a headache at best. The upside
to the pickle/Java-serialization approach is total convenience when
working in that language; these formats are really good for quick
hacks.

If you add to something like the dbus format the ability to add custom
type annotations, so you could say "this int32 is really G_TYPE_LONG"
or "this int64 is really time_t", then what you're doing in effect is
most likely creating a dialect of the serialization format for each
language runtime or framework; basically, you'd be sharing code among a
collection of language/framework-specific python-pickle-like systems.

For the two use-cases at hand here (dbus, mmap'd cache files), being
extremely GType-specific seems bad, for the same reasons that Python
pickle or Java serialization would be poor choices.

There's nothing wrong with supporting autoconversion of native types
to serializable types. So for example if you store G_TYPE_LONG or
G_TYPE_INT, those could be automatically serialized as INT32. However,
I don't think you should get those types back out when you read;
reading the file back should give you INT32 only. And LONG should go
to INT32 consistently, regardless of the CPU it's running on. The
round-trip should not be supported, in short.
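
A minimal sketch of that one-way autoconversion, under my own assumptions: the mapping table, the tag-plus-payload layout, and the function names are invented for illustration, and the native types are represented only by their names as strings.

```python
import struct

# Assumed mapping from native type names to serialized types (illustrative).
# Both native types collapse to the same fixed-size serialized type.
NATIVE_TO_SERIALIZED = {"G_TYPE_INT": "INT32", "G_TYPE_LONG": "INT32"}


def write_value(native_type, value):
    """Serialize value as INT32 regardless of which native type it came from."""
    serialized_type = NATIVE_TO_SERIALIZED[native_type]
    # struct.error is raised if value does not fit in 32 bits.
    payload = struct.pack("<i", value)
    return serialized_type.encode("ascii") + b"\0" + payload


def read_value(data):
    """Return (serialized_type, value); the native type is not recoverable."""
    type_name, payload = data.split(b"\0", 1)
    return type_name.decode("ascii"), struct.unpack("<i", payload)[0]
```

Writing a G_TYPE_LONG and writing a G_TYPE_INT produce identical bytes here, so the reader can only ever report INT32.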

The other thing one could do is build a GType-based pickle-like
feature *on top* of a more interoperable binary format. What you'd do
in this case is that you would store a GType name as a string, plus a
GVariant value. You could then serialize/unserialize GTypes with full
information preserved through the round-trip.
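
Roughly, such a layer could look like the sketch below. It assumes a base format that only knows a length-prefixed string and an int32; the function names and the use of the string "G_TYPE_LONG" as a stored type name are illustrative, not how GVariant actually does anything.

```python
import struct


def pack_string(text):
    """Base format: uint32 byte length followed by UTF-8 bytes."""
    utf8 = text.encode("utf-8")
    return struct.pack("<I", len(utf8)) + utf8


def pack_int32(value):
    """Base format: little-endian int32."""
    return struct.pack("<i", value)


def pickle_like_dump(native_type_name, value):
    """Upper layer: store the native type name next to the plain value."""
    return pack_string(native_type_name) + pack_int32(value)


def pickle_like_load(data):
    """Upper layer: recover (native_type_name, value) from the base format."""
    (name_len,) = struct.unpack_from("<I", data, 0)
    name = data[4:4 + name_len].decode("utf-8")
    (value,) = struct.unpack_from("<i", data, 4 + name_len)
    return name, value
```

A reader that knows nothing about GType still sees a valid string followed by a valid int32; only the upper layer interprets the string as a type name.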

That illustrates really well, perhaps, how there are two layers here.
There's a binary format and a type system that describes the *format*;
and then there's the type system of the language runtime or framework,
which could be Python, GType, Java, or whatever.

Havoc

