Re: Introspection binary format comments



I have tried to incorporate most of the feedback in the next draft of
the binary metadata format. One question I have not decided yet is 
hashes vs directories: 

- do we want hashes ?
- if yes, where do we want them ? Likely candidates are a hash to 
  lookup blobs listed in the directory, and a hash per interface
  to look up the functions in that interface. With the extensibility
  precautions added in the new draft, we should be able to add hashes
  as they turn out to be necessary.

- are they optional or mandatory ?
- if they are mandatory, do we still need the arrays which they   
  complement ?

Here is the list of changes with respect to the first draft:

- drop comments about _GOBJ_METADATA
- drop string pool, strings can appear anywhere
- use 'blob' as collective name for the various blob types
- rename 'type' field in blobs to 'blob_type'
- rename 'type_name' and 'type_init' fields to 'gtype_name',
'gtype_init'
- shrink directory entries to 12 bytes
- merge struct and boxed blobs
- split interface blobs into enum, object and interface blobs
- add an 'unregistered' flag to struct and enum blobs
- add a 'wraps_vfunc' flag to function blobs and link them to
  the vfuncs they wrap
- restrict value blobs to only occur inside enums and flags again
- add constant blobs, allow them toplevel, in interfaces and in objects
- rename 'receiver_owns_value' and 'receiver_owns_container' to
  'transfer_ownership' and 'transfer_container_ownership'
- add a 'struct_offset' field to virtual function and field blobs
- add 'dipper' and 'optional' flags to arg blobs
- add a 'true_stops_emit' flag to signal blobs
- add variable blob sizes to header
- store offsets to signature blobs instead of including them directly
- change the type offset to be measured in words rather than bytes


Matthias


Version 0.2

Changes since 0.1:

- drop comments about _GOBJ_METADATA
- drop string pool, strings can appear anywhere
- use 'blob' as collective name for the various blob types
- rename 'type' field in blobs to 'blob_type'
- rename 'type_name' and 'type_init' fields to 'gtype_name', 'gtype_init'
- shrink directory entries to 12 bytes 
- merge struct and boxed blobs
- split interface blobs into enum, object and interface blobs
- add an 'unregistered' flag to struct and enum blobs
- add a 'wraps_vfunc' flag to function blobs and link them to 
  the vfuncs they wrap
- restrict value blobs to only occur inside enums and flags again
- add constant blobs, allow them toplevel, in interfaces and in objects
- rename 'receiver_owns_value' and 'receiver_owns_container' to
  'transfer_ownership' and 'transfer_container_ownership'
- add a 'struct_offset' field to virtual function and field blobs
- add 'dipper' and 'optional' flags to arg blobs
- add a 'true_stops_emit' flag to signal blobs
- add variable blob sizes to header
- store offsets to signature blobs instead of including them directly
- change the type offset to be measured in words rather than bytes

GObject binary metadata
-----------------------

The format of GObject metadata is strongly influenced by the Mozilla XPCOM 
format. 

Some of the differences to XPCOM include:
- Type information is stored not quite as compactly. XPCOM stores it inline 
  in function descriptions in variable-sized blobs of 1 to n bytes. We store 
  16 bits of type information for each parameter, which is enough to encode 
  simple types inline. Complex (e.g. recursive) types are stored out of line 
  in a separate list of types.
- String data is stored outside of interface blobs, string references are 
  stored as offsets relative to the start of the metadata. One possibility
  is to store all strings in a string pool at the end of the metadata.

Overview
--------

The metadata has the following general format.

metadata ::= header, directory, blobs, types, annotations

directory ::= list of entries

entry ::= blob type, name, namespace, offset

blob ::= function|callback|struct|boxed|enum|flags|object|interface|constant

annotations ::= list of annotations, sorted by offset 

annotation ::= offset, key, value


Details
-------

We describe the fragments that make up the metadata in the form of C structs 
(although some fall short of being valid C structs since they contain multiple
flexible arrays).

Header (74 bytes)

struct Header 
{
  gchar[16] magic;
  guint8    major_version;
  guint8    minor_version;
  guint16   reserved;

  guint16   n_entries;
  guint16   n_local_entries;
  guint32   directory;
  guint32   types;
  guint32   annotations;

  guint32   size;
  guint32   namespace;
  
  guint16   entry_blob_size;      /* 12 */
  guint16   function_blob_size;   /* 16 */
  guint16   callback_blob_size;   /* 12 */
  guint16   signal_blob_size;     /* 12 */
  guint16   vfunc_blob_size;      /* 16 */
  guint16   arg_blob_size;        /*  8 */
  guint16   property_blob_size;   /*  8 */
  guint16   field_blob_size;      /*  8 */  
  guint16   value_blob_size;      /* 16 */
  guint16   annotation_blob_size; /* 12 */
  guint16   constant_blob_size;   /* 16 */

  guint16   signature_blob_size;  /*  4 */
  guint16   enum_blob_size;       /* 22 */
  guint16   struct_blob_size;     /* 36 */
  guint16   interface_blob_size;  /* 28 */
}

magic:    The string "GOBJ\nMETADATA\r\n\032". This was inspired by XPCOM, 
          which in turn borrowed from PNG.

major_version, 
minor_version: 
          The version of the metadata format. Minor version changes indicate 
          compatible changes and should still allow the metadata to be parsed 
          by a parser designed for the same major_version.
      
n_entries: 
          The number of entries in the directory. 

n_local_entries:
          The number of entries referring to blobs in this metadata. The
          local entries must occur before the unresolved entries.

directory: 
          Offset of the directory in the metadata. 
          FIXME: need to specify if and how the directory is sorted

types:    Offset of the list of types in the metadata.

annotations: 
          Offset of the list of annotations in the metadata.

size:     The size of the metadata.

namespace:
          Offset of the namespace string in the metadata. 

entry_blob_size:
function_blob_size:
callback_blob_size:
signal_blob_size:
vfunc_blob_size:
arg_blob_size:
property_blob_size:
field_blob_size:
value_blob_size:
annotation_blob_size:
constant_blob_size:
          The sizes of fixed-size blobs. Recording this information here
          allows writing parsers which continue to work if the format is
          extended by adding new fields to the end of the fixed-size blobs.

signature_blob_size: 
enum_blob_size:
struct_blob_size:
interface_blob_size:
          For variable-size blobs, the size of the struct up to the first
          flexible array member. Recording this information here allows 
          writing parsers which continue to work if the format is extended by 
          adding new fields before the first flexible array member in 
          variable-size blobs.
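
A parser would typically start by checking the magic and major_version 
fields described above. The following is only a sketch: the helper name 
metadata_is_compatible is hypothetical, and the field positions (magic at 
bytes 0..15, major_version at byte 16, minor_version at byte 17) are read 
off the Header struct assuming no padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The 16 magic bytes from the Header description. The string literal adds
 * a trailing NUL, which is not part of the stored magic. */
static const char METADATA_MAGIC[] = "GOBJ\nMETADATA\r\n\032";

/* Hypothetical helper: returns 1 if buf starts with the magic and carries
 * the major version this parser was written for, 0 otherwise. */
static int
metadata_is_compatible (const uint8_t *buf, size_t len, uint8_t expected_major)
{
  assert (buf != NULL);

  if (len < 18)                       /* magic + major + minor */
    return 0;
  if (memcmp (buf, METADATA_MAGIC, 16) != 0)
    return 0;

  /* A differing minor version (buf[17]) is a compatible change, so only
   * the major version is compared. */
  return buf[16] == expected_major;
}
```

The minor version is deliberately ignored here, since minor changes are 
meant to stay parseable by a parser for the same major version.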

Directory entry (12 bytes)

struct DirectoryEntry 
{
  guint16 blob_type;

  guint   is_local : 1;
  guint   reserved :15;

  guint32 name;
  guint32 offset;
}

blob_type:
          The type of blob this entry points to:
          1 function
          2 callback
          3 struct
          4 boxed
          5 enum
          6 flags
          7 object
          8 interface
          9 constant

is_local: Whether this entry refers to a blob in this metadata.

name:     The name of the entry.

offset:   If is_local is set, this is the offset of the blob in the metadata.
          Otherwise, it is the offset of the namespace in which the blob has
          to be looked up by name.
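
Since the entry size is recorded in the header, a parser can step through 
the directory without hardcoding the 12 bytes of this draft. The sketch 
below makes several assumptions that the draft does not settle: multi-byte 
fields are little-endian, directory indices are 0-based, and the Header 
fields sit at the byte offsets implied by the struct without padding. The 
helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: multi-byte fields are stored little-endian. */
static uint16_t
read_u16 (const uint8_t *p)
{
  return (uint16_t) (p[0] | (p[1] << 8));
}

static uint32_t
read_u32 (const uint8_t *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
         ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Byte offsets of Header fields, derived from the Header struct. */
enum
{
  HDR_N_ENTRIES       = 20,
  HDR_DIRECTORY       = 24,
  HDR_ENTRY_BLOB_SIZE = 44
};

/* Hypothetical helper: byte offset of directory entry `index` (0-based,
 * an assumption), or 0 if the index is out of range. */
static uint32_t
directory_entry_offset (const uint8_t *metadata, uint16_t index)
{
  assert (metadata != NULL);

  uint16_t n_entries = read_u16 (metadata + HDR_N_ENTRIES);
  uint32_t directory = read_u32 (metadata + HDR_DIRECTORY);
  uint16_t stride    = read_u16 (metadata + HDR_ENTRY_BLOB_SIZE);

  if (index >= n_entries)
    return 0;

  /* Use the recorded size as the stride, so entries that grew new
   * trailing fields are still stepped over correctly. */
  return directory + (uint32_t) index * stride;
}
```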


All blobs pointed to by a directory entry start with the same layout for 
the first 8 bytes (the reserved flags may be used by some interface types):

struct InterfacePrefix 
{
  guint16 blob_type; 
  guint   deprecated : 1;
  guint   reserved   :15;
  guint32 name;
}

blob_type: 
          An integer specifying the type of the blob, see DirectoryEntry 
          for details.

deprecated: 
          Whether the blob is deprecated.

name:     The name of the blob.


The SignatureBlob is shared between Functions, 
Callbacks, Signals and VirtualFunctions.

SignatureBlob (4 + 8 * n_arguments bytes)

struct SignatureBlob 
{
  SimpleTypeBlob    return_type;

  guint             may_return_null              : 1;
  guint             caller_owns_return_value     : 1;
  guint             caller_owns_return_container : 1;
  guint             reserved                     : 5;

  guint8            n_arguments;
  ArgBlob[]         arguments;
}
 

return_type:
          Describes the type of the return value. See details below.

may_return_null:
          Only relevant for pointer types. Indicates whether the caller
          must expect NULL as a return value.

caller_owns_return_value:
          If set, the caller is responsible for freeing the return value
          if it is no longer needed.

caller_owns_return_container:
          This flag is only relevant if the return type is a container type.
          If the flag is set, the caller is responsible for freeing the 
          container, but not its contents.
                     
n_arguments:
          The number of arguments that this function expects, also the length 
          of the array of ArgBlobs.

arguments: 
          An array of ArgBlob for the arguments of the function.
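
The size formula in the SignatureBlob heading can be written in terms of 
the sizes recorded in the header, which keeps it valid if either blob 
grows. This is a sketch with hypothetical helper names. It assumes the 
ArgBlobs follow the fixed part of the SignatureBlob directly.

```c
#include <assert.h>
#include <stdint.h>

/* Total size in bytes of a SignatureBlob with n_arguments arguments.
 * With the sizes of this draft (4 and 8) this is 4 + 8 * n_arguments. */
static uint32_t
signature_blob_length (uint16_t signature_blob_size,
                       uint16_t arg_blob_size,
                       uint8_t  n_arguments)
{
  assert (signature_blob_size >= 4);   /* fixed part never shrinks */
  return (uint32_t) signature_blob_size +
         (uint32_t) n_arguments * arg_blob_size;
}

/* Byte offset of argument i, given the offset of its SignatureBlob. */
static uint32_t
signature_arg_offset (uint32_t signature,
                      uint16_t signature_blob_size,
                      uint16_t arg_blob_size,
                      uint8_t  i)
{
  return signature + signature_blob_size + (uint32_t) i * arg_blob_size;
}
```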


FunctionBlob (16 bytes)

struct FunctionBlob 
{
  guint16 blob_type;  /* 1 */

  guint   deprecated     : 1;
  guint   is_setter      : 1; 
  guint   is_getter      : 1;
  guint   is_constructor : 1;
  guint   wraps_vfunc    : 1;
  guint   reserved       : 1;
  guint   index          :10;

  guint32 name;
  guint32 c_name;
  guint32 signature;
}

c_name:   The symbol which can be used to obtain the function pointer with 
          dlsym().

deprecated:
          The function is deprecated.

is_setter:
          The function is a setter for a property. Language bindings may 
          prefer to not bind individual setters and rely on the generic 
          g_object_set().

is_getter:
          The function is a getter for a property. Language bindings may 
          prefer to not bind individual getters and rely on the generic 
          g_object_get().

is_constructor:
          The function acts as a constructor for the object it is contained 
          in.

wraps_vfunc: 
          The function is a simple wrapper for a virtual function.

index:    Index of the property that this function is a setter or getter of 
          in the array of properties of the containing interface, or index
          of the virtual function that this function wraps.

signature: 
          Offset of the SignatureBlob describing the parameter types and the 
          return value type.


CallbackBlob (12 bytes)

struct CallbackBlob 
{
  guint16 blob_type;  /* 2 */

  guint   deprecated : 1;
  guint   reserved   :15;

  guint32 name;
  guint32 signature;
}

signature: 
          Offset of the SignatureBlob describing the parameter types and the 
          return value type.


ArgBlob (8 bytes)

struct ArgBlob 
{
  guint32 name;

  guint          in                           : 1;
  guint          out                          : 1;
  guint          dipper                       : 1;
  guint          null_ok                      : 1;
  guint          optional                     : 1;
  guint          transfer_ownership           : 1;
  guint          transfer_container_ownership : 1;
  guint          is_return_value              : 1;
  guint          reserved                     : 8;

  SimpleTypeBlob arg_type;
}

name:     A suggested name for the parameter. 

in:       The parameter is an input to the function.

out:      The parameter is used to return an output of the function. 
          Parameters can be both in and out. Out parameters implicitly 
          add another level of indirection to the parameter type, i.e. if 
          the type is uint32 in an out parameter, the function actually 
          takes a uint32*.

dipper:   The parameter is a pointer to a struct or object that will 
          receive an output of the function. 

null_ok:  Only meaningful for types which are passed as pointers.
          For an in parameter, indicates whether it is ok to pass NULL in; for 
          an out parameter, whether it may return NULL. Note that NULL is a 
          valid GList and GSList value, thus null_ok will normally be set for 
          parameters of these types.

optional:          
          For an out parameter, indicates that NULL may be passed in
          if the value is not needed.

transfer_ownership:
          For an in parameter, indicates that the function takes over 
          ownership of the parameter value. For an out parameter, it 
          indicates that the caller is responsible for freeing the return 
          value.

transfer_container_ownership:
          For container types, indicates that the ownership of the container, 
          but not of its contents is transferred. This is typically the case 
          for out parameters returning lists of statically allocated things.

is_return_value:
          The parameter should be considered the return value of the function. 
          Only out parameters can be marked as return value, and there can be 
          at most one per function call. If an out parameter is marked as 
          return value, the actual return value of the function should be 
          either void or a boolean indicating the success of the call.
  
arg_type:
          Describes the type of the parameter. See details below.


Types are specified by two bytes. If the high byte is zero, the low byte 
describes a basic type, otherwise the 16bit number is an offset relative to 
header->types and points to a TypeBlob. 


SimpleTypeBlob (2 bytes)

union SimpleTypeBlob 
{
  struct 
  {
    guint8 reserved;      /* 0 */
    guint  is_pointer :1;
    guint  reserved   :2;
    guint  tag        :5;
  };
  guint16 offset;
}

is_pointer: 
          Indicates whether the type is passed by reference. 

tag:      specifies what kind of type is described, as follows:
          0  void
          1  boolean (booleans are passed as ints)
          2  int8
          3  uint8
          4  int16
          5  uint16
          6  int32
          7  uint32
          8  int32
          9  int64
         10  uint64
         11  float
         12  double
         13  string  (these are zero-terminated char* and assumed to be 
                      in UTF-8, for other data, use uint8[])
         14  GString 

         For string and GString, is_pointer will always be set.

offset:  Offset relative to header->types that points to a TypeBlob. 
         Unlike other offsets, this is in words (ie 32bit units) rather
         than bytes.
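
The distinction between inline types and out-of-line TypeBlobs can be 
sketched as follows. The helper names are hypothetical, and the code works 
on the 16bit value after it has been read from the metadata, so it makes 
no statement about byte order or about how the bitfields are laid out 
within the low byte.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the 16bit type value describes a basic type inline,
 * i.e. if its high byte is zero. */
static int
simple_type_is_basic (uint16_t value)
{
  return (value >> 8) == 0;
}

/* Byte offset of the TypeBlob referred to by a non-basic type value.
 * `types` is the header->types offset. The stored offset counts 32bit
 * words, so it is multiplied by 4 to get bytes. */
static uint32_t
type_blob_byte_offset (uint32_t types, uint16_t value)
{
  assert (!simple_type_is_basic (value));
  return types + 4u * (uint32_t) value;
}
```

Measuring the offset in words lets 16 bits address up to 256k of type 
data instead of 64k.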


TypeBlob (4 or more bytes)

union TypeBlob
{
  ArrayTypeBlob        array_type;
  InterfaceTypeBlob    interface_type;
  OneParameterTypeBlob one_parameter_type;  
  TwoParameterTypeBlob two_parameter_type;  
  ErrorTypeBlob        error_type;  
}


ArrayTypeBlob (4 bytes)

Arrays have a tag value of 20. They are passed by reference, thus is_pointer 
is always 1.

struct ArrayTypeBlob 
{
  guint          is_pointer      :1; /* 1 */
  guint          reserved        :2;
  guint          tag             :5; /* 20 */
  guint          zero_terminated :1;
  guint          has_length      :1;
  guint          length          :6;  

  SimpleTypeBlob type;
}

zero_terminated: 
          Indicates that the array must be terminated by a suitable NULL 
          value. 

has_length: 
          Indicates that length points to a parameter specifying the length 
          of the array. If both has_length and zero_terminated are set, the 
          convention is to pass -1 for the length if the array is 
          zero-terminated. 
          FIXME: what does this mean for types of field and properties ?

length:   The index of the parameter which is used to pass the length of the 
          array. The parameter must be an integer type and have the same 
          direction as this one. 

type:     The type of the array elements.


InterfaceTypeBlob (4 bytes)

struct InterfaceTypeBlob 
{
  guint   is_pointer :1; 
  guint   reserved   :2;
  guint   tag        :5; /* 21 */
  guint8  reserved;

  guint16 interface;
}

Types which are described by an entry in the metadata have a tag value of 21. 
If the interface is an enum or flags type, is_pointer is 0, otherwise it is 1.

interface: 
          Index of the directory entry for the interface.
    

OneParameterTypeBlob (4 bytes)

GLists have a tag value of 22, GSLists have a tag value of 23. They are passed 
by reference, thus is_pointer is always 1.

struct OneParameterTypeBlob 
{
  guint          is_pointer :1; /* 1 */
  guint          reserved   :2;
  guint          tag        :5; /* 22 or 23 */
  guint8         reserved;

  SimpleTypeBlob type;
}

type:     Describes the type of the list elements.


TwoParameterTypeBlob (8 bytes)

GHashTables have a tag value of 24. They are passed by reference, thus 
is_pointer is always 1.

struct TwoParameterTypeBlob 
{
  guint          is_pointer :1; /* 1 */
  guint          reserved   :2;
  guint          tag        :5; /* 24 */
  guint8         reserved;

  guint16        reserved;

  SimpleTypeBlob type1;
  SimpleTypeBlob type2;
}

type1:    Describes the type of the keys in the table. This will most commonly
          be int or string.

type2:    Describes the type of the value in the table. This may be void*.


ErrorTypeBlob (4 + 4 * n_domains bytes)

struct ErrorTypeBlob
{
  guint           is_pointer :1; /* 1 */
  guint           reserved   :2;
  guint           tag        :5; /* 25 */
  
  guint8          reserved;

  guint16         n_domains;

  guint32         domains[];
}

n_domains:
          The number of domains to follow.

domains:  Offsets of ErrorDomainBlobs describing the possible error domains.


ErrorDomainBlob (10 bytes)

struct ErrorDomainBlob
{
  guint32        name;
  guint32        get_quark;
  guint16        error_codes;
}

name:     The name of the error domain

get_quark:
          The symbol name of the function which must be called to obtain the 
          GQuark for the error domain.

error_codes:
          Index of the InterfaceBlob describing the enumeration which lists
          the possible error codes.


PropertyBlob (8 bytes)

struct PropertyBlob
{
  guint32        name;

  guint          deprecated     :1;
  guint          readable       :1;
  guint          writable       :1;
  guint          construct      :1;
  guint          construct_only :1;
  guint          reserved       :11;

  SimpleTypeBlob type;
}

name:     The name of the property. 

readable:
writable: 
construct: 
construct_only: 
          The ParamFlags used when registering the property.

type:     Describes the type of the property.


SignalBlob (12 bytes)

struct SignalBlob 
{
  guint32 name;

  guint   deprecated        : 1;
  guint   run_first         : 1;
  guint   run_last          : 1;
  guint   run_cleanup       : 1;
  guint   no_recurse        : 1;
  guint   detailed          : 1;
  guint   action            : 1;
  guint   no_hooks          : 1;
  guint   has_class_closure : 1;
  guint   true_stops_emit   : 1;
  guint   reserved          : 6;

  guint16 class_closure;
  guint32 signature; 
}

name:     The name of the signal.

run_first:
run_last:
run_cleanup:
no_recurse:
detailed:
action:
no_hooks: The flags used when registering the signal.

has_class_closure: 
          Set if the signal has a class closure.

true_stops_emit:
          Whether the signal has true-stops-emit semantics.

class_closure: 
          The index of the class closure in the list of virtual functions
          of the interface on which the signal is defined.

signature: 
          Offset of the SignatureBlob describing the parameter types and the 
          return value type.


VirtualFunctionBlob (16 bytes)

struct VirtualFunctionBlob 
{
  guint32 name;

  guint   must_chain_up           : 1;
  guint   must_be_implemented     : 1;
  guint   must_not_be_implemented : 1;
  guint   is_class_closure        : 1;
  guint   reserved                :12;

  guint16 signal;
  guint16 struct_offset;
  guint16 reserved;  
  guint32 signature;
}

name:     The name of the virtual function.

must_chain_up:
          If set, every implementation of this virtual function must
          chain up to the implementation of the parent class. 

must_be_implemented:
          If set, every derived class must override this virtual function.

must_not_be_implemented:
          If set, derived classes must not override this virtual function.

is_class_closure:
          Set if this virtual function is the class closure of a signal.

signal: 
          The index of the signal in the list of signals of the interface 
          to which this virtual function belongs.

struct_offset:
          The offset of the function pointer in the class struct.

signature: 
          Offset of the SignatureBlob describing the parameter types and the 
          return value type.


FieldBlob (8 bytes)

struct FieldBlob 
{
  guint32        name;

  guint          readable : 1; 
  guint          writable : 1;
  guint          reserved : 6;
  guint8         bits;
  
  guint16        struct_offset;      
  guint16        reserved;
	
  SimpleTypeBlob type;
}

name:     The name of the field.

readable:
writable: How the field may be accessed.

bits:     If this field is part of a bitfield, the number of bits which it
          uses, otherwise 0.

struct_offset:
          The offset of the field in the struct.

type:     The type of the field.


ValueBlob (16 bytes)

Values occur only inside enums and flags. Constants are described by 
ConstantBlobs instead.

struct ValueBlob
{
  guint   deprecated : 1;
  guint   reserved   :31;
  guint32 name;

  guint32 short_name;
  guint32 value;
}

short_name: 
          A short name for the value.

value:    The numerical value.


GTypeBlob (8 bytes)

struct GTypeBlob 
{
  guint32 gtype_name;
  guint32 gtype_init;
}

gtype_name: 
          The name under which the interface is registered with GType.

gtype_init:
          The symbol name of the get_type() function which registers the type.


StructBlob (12 + 8 * n_fields + x * n_functions)

struct StructBlob 
{
  guint16      blob_type; /* 3: struct, 4: boxed */
  guint        deprecated   : 1;
  guint        unregistered : 1;
  guint        reserved     :14;
  guint32      name;

  GTypeBlob    gtype;

  guint16      n_fields;
  guint16      n_functions;

  FieldBlob    fields[];   
  FunctionBlob functions[];  
}

unregistered: 
          If this is set, the type is not registered with GType.

gtype:    For types which are registered with GType, contains the 
          information about the GType. Otherwise unused.

n_fields: 
n_functions: 
          The lengths of the arrays.

fields:   An array of n_fields FieldBlobs. 

functions:
          An array of n_functions FunctionBlobs. The described functions 
          should be considered as methods of the struct. 


EnumBlob (22 + 16 * n_values)

struct EnumBlob
{
  guint16   blob_type;  /* 5: enum, 6: flags */
  guint     deprecated   : 1; 
  guint     unregistered : 1;
  guint     reserved     :14;
  guint32   name; 

  GTypeBlob gtype;
  guint16   n_values;
  guint16   reserved;

  ValueBlob values[];    
}

unregistered: 
          If this is set, the type is not registered with GType.

gtype:    For types which are registered with GType, contains the 
          information about the GType. Otherwise unused.

n_values:
          The length of the values array.

values:   Describes the enum values. 


ObjectBlob (36 + x bytes)

struct ObjectBlob
{
  guint16 blob_type;  /* 7 */
  guint   deprecated   : 1; 
  guint   reserved     :15;
  guint32 name; 

  GTypeBlob gtype;

  guint16 parent;

  guint16 n_interfaces;
  guint16 n_fields;
  guint16 n_properties;
  guint16 n_methods;
  guint16 n_signals;
  guint16 n_virtual_functions;
  guint16 n_constants;
  guint16 reserved;

  guint16 interfaces[];
 
  FieldBlob           fields[];
  PropertyBlob        properties[];
  FunctionBlob        methods[];
  SignalBlob          signals[];
  VirtualFunctionBlob virtual_functions[];
  ConstantBlob        constants[];
} 

gtype:    Contains the information about the GType.

parent:   The directory index of the parent interface. This is only set for 
          objects.

n_interfaces:
n_fields: 
n_properties:
n_methods:
n_signals:
n_virtual_functions:
n_constants:
          The lengths of the arrays.

Up to 16bits of padding may be inserted between the arrays to ensure that they
start on a 32bit boundary.

interfaces:
          An array of indices of directory entries for the implemented 
          interfaces.

fields:   Describes the fields. 

methods:
          Describes the methods, constructors, setters and getters. 

properties:
          Describes the properties. 

signals:  Describes the signals. 

virtual_functions:
          Describes the virtual functions. 

constants:
          Describes the constants.
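
The padding rule for the arrays of an ObjectBlob can be sketched as 
follows. The helper names are hypothetical. The sketch assumes that 
alignment is measured from the start of the metadata and that the size of 
the fixed part is passed in by the caller rather than hardcoded.

```c
#include <assert.h>
#include <stdint.h>

/* Round an offset up to the next 32bit boundary. */
static uint32_t
align_to_32bit (uint32_t offset)
{
  assert (offset <= UINT32_MAX - 3u);   /* avoid wrap-around */
  return (offset + 3u) & ~(uint32_t) 3u;
}

/* Byte offset of the fields[] array of an ObjectBlob located at `blob`.
 * interfaces[] holds n_interfaces 16bit indices, so an odd count leaves
 * the next array misaligned and 16 bits of padding are needed. */
static uint32_t
object_fields_offset (uint32_t blob,
                      uint16_t object_fixed_size,
                      uint16_t n_interfaces)
{
  return align_to_32bit (blob + object_fixed_size + 2u * n_interfaces);
}
```

With a fixed part of 36 bytes, three interfaces end at byte 42, so the 
fields start at byte 44. With two interfaces no padding is needed and the 
fields start at byte 40.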


InterfaceBlob (28 + x bytes)

struct InterfaceBlob
{
  guint16 blob_type;  /* 8 */
  guint   deprecated   : 1; 
  guint   reserved     :15;
  guint32 name; 

  GTypeBlob gtype;

  guint16 n_prerequisites;
  guint16 n_properties;
  guint16 n_methods;
  guint16 n_signals;
  guint16 n_virtual_functions;
  guint16 n_constants;  

  guint16 prerequisites[];
 
  PropertyBlob        properties[];
  FunctionBlob        methods[];
  SignalBlob          signals[];
  VirtualFunctionBlob virtual_functions[];
  ConstantBlob        constants[];
} 

n_prerequisites:
n_properties:
n_methods:
n_signals:
n_virtual_functions:
n_constants:
          The lengths of the arrays.

Up to 16bits of padding may be inserted between the arrays to ensure that they
start on a 32bit boundary.

prerequisites:
          An array of indices of directory entries for required interfaces.

methods:
          Describes the methods, constructors, setters and getters. 

properties:
          Describes the properties. 

signals:  Describes the signals. 

virtual_functions:
          Describes the virtual functions. 

constants:
          Describes the constants.


ConstantBlob (16 bytes)

struct ConstantBlob
{
  guint16        blob_type;  /* 9 */
  guint          deprecated   : 1; 
  guint          reserved     :15;
  guint32        name; 

  SimpleTypeBlob type;
  guint16        size;
  guint32        offset;
}

type:     The type of the value. In most cases this should be a numeric
          type or string.

size:     The size of the value in bytes.

offset:   The offset of the value in the metadata.


AnnotationBlob (12 bytes)

struct AnnotationBlob
{ 
  guint32 offset;
  guint32 name;
  guint32 value;
}

offset:   The offset of the interface to which this annotation refers. 
          Annotations are kept sorted by offset, so that the annotations 
          of an interface can be found by a binary search.

name:     The name of the annotation, a string.

value:    The value of the annotation (also a string)
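
The binary search mentioned for the offset field might look like the 
sketch below. Note the assumptions: the draft header does not record the 
number of annotations, so the count n is passed in by the caller. Fields 
are taken to be little-endian, and the stride is the annotation_blob_size 
from the header. The helper name find_first_annotation is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: multi-byte fields are stored little-endian. */
static uint32_t
read_u32 (const uint8_t *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
         ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Returns the index of the first annotation whose offset field equals
 * target, or n if there is none. Further annotations of the same blob
 * follow directly, since the list is sorted by offset. */
static size_t
find_first_annotation (const uint8_t *annotations, size_t n,
                       size_t stride, uint32_t target)
{
  assert (stride >= 12);   /* an AnnotationBlob is at least 12 bytes */

  size_t lo = 0, hi = n;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (read_u32 (annotations + mid * stride) < target)
        lo = mid + 1;
      else
        hi = mid;
    }

  if (lo < n && read_u32 (annotations + lo * stride) == target)
    return lo;
  return n;
}
```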





