Re: Introspection binary format comments

From: "Gustavo J. A. M. Carneiro" <gjc inescporto pt>
To: Owen Taylor <otaylor redhat com>
Cc: gtk-devel-list gnome org, Matthias Clasen <mclasen redhat com>
Subject: Re: Introspection binary format comments
Date: Mon, 28 Feb 2005 00:26:51 +0000

On Tue, 2005-03-01 at 01:21 -0500, Owen Taylor wrote:
>On Sun, 2005-02-27 at 15:23 +0000, Gustavo J. A. M. Carneiro wrote:
>> On Sat, 2005-02-26 at 12:21 -0500, Matthias Clasen wrote:
>> >I have tried to incorporate most of the feedback in the next draft of
>> >the binary metadata format. One question I have not decided yet is 
>> >hashes vs directories: 
>> >
>> >- do we want hashes ?
>> >- if yes, where do we want them ? Likely candidates are a hash to 
>> >  lookup blobs listed in the directory, and a hash per interface
>> >  to look up the functions in that interface. With the extensibility
>> >  precautions added in the new draft, we should be able to add hashes
>> >  as they turn out to be necessary.
>> 
>>   An interesting idea is to have lazy hashes.  The hash table would be
>> initially empty.  When a blob is requested by name, glib would:
>> 	1. lookup the name in the hash table
>> 	2. if lookup succeeds:
>> 	2.a    return blob
>> 	3. do linear scan for the blob
>> 	4. add name->blob to hash table
>> 	5. return blob
>> 
>>   This would save on memory (since hash tables cannot be shared between
>> multiple processes, unlike the blobs) but still provide adequate speed
>> for frequently accessed items.
>
>If you are going to reserve space in the blob for the hash table, you
>might as well just put the hash table there :-). (It's actually pretty
>hard to share stuff between processes unless it's read-only.)

  I actually thought this whole introspection metadata was to be
mmap'ed read-only and shared between processes.  I guess sharing this
is not that important if it is not that big.  I feel that anything below
100K is not worth sharing.
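  To make the lazy hash idea (steps 1-5 quoted above) a bit more
concrete, here is a minimal Python sketch.  It assumes the directory is
an immutable sequence standing in for the mmap'ed, shared metadata, and
that each process keeps its own private dict that starts empty and is
filled only on demand.  The Blob type, its offset field and the scan
counter are hypothetical, purely for illustration:

```python
# Lazy hash sketch: a private per-process cache in front of a linear scan
# over a shared, read-only directory. Blob/offset are hypothetical names.
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class Blob:
    name: str
    offset: int  # hypothetical: where the blob lives in the metadata file


class LazyDirectory:
    def __init__(self, blobs: Sequence[Blob]) -> None:
        self._blobs = blobs                 # shared, read-only directory
        self._cache: Dict[str, Blob] = {}   # private hash, initially empty
        self.scans = 0                      # counts linear scans (illustration)

    def lookup(self, name: str) -> Optional[Blob]:
        # 1-2. Look the name up in the hash table; on a hit, return the blob.
        blob = self._cache.get(name)
        if blob is not None:
            return blob
        # 3. Miss: fall back to a linear scan of the directory.
        self.scans += 1
        for candidate in self._blobs:
            if candidate.name == name:
                # 4. Remember name -> blob for next time.
                self._cache[name] = candidate
                # 5. Return the blob.
                return candidate
        return None  # not found; misses are not cached in this sketch
```

  Only names that are actually requested ever occupy private memory, so
a process that touches a handful of types pays for a handful of entries.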

>
>>   One more thing, perhaps it pays off to have a single hash table for
>> interface types + methods.   Instead of a tree of hash tables, we could
>> have a flat hash table, with name mangling, eg. GtkLabel::set_text.
>> Personally I'm not 100% sure this is a good idea, but I leave the
>> thought for consideration...
>
>The question here is whether looking up a method without having the
>InterfaceBlob is going to be common. If so, then what you are proposing
>is more efficient, since it only involves one hash table lookup. But
>if that isn't a common lookup, then you need to spend time making
>strings and hashing them. (The table could, of course, be hashed with
>a non-single-string hash table, but that again is additional complexity)

  I honestly don't know if this is going to be a common scenario.  But
if we leave the option open without too much additional
effort/complexity, that would be great.
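  For comparison, the three layouts we've been discussing can be
sketched in Python as below.  This is only a toy, assuming placeholder
strings in place of real blob offsets; the interface and method names
are just examples.  The nested variant needs two lookups (or one, if the
caller already holds the per-interface table), the mangled variant needs
one lookup but has to build a string first, and the pair-keyed variant
is Owen's "non-single-string" table:

```python
# Toy comparison of method-lookup layouts; values are placeholders
# standing in for blob offsets.
from typing import Dict, Tuple

methods = {
    "GtkLabel": ["set_text", "get_text"],
    "GtkButton": ["set_label", "clicked"],
}

# (a) Tree of hash tables: interfaces -> per-interface method table.
nested: Dict[str, Dict[str, str]] = {
    iface: {m: f"blob:{iface}.{m}" for m in names}
    for iface, names in methods.items()
}

# (b) Flat table keyed by a mangled string such as "GtkLabel::set_text".
flat: Dict[str, str] = {
    f"{iface}::{m}": f"blob:{iface}.{m}"
    for iface, names in methods.items()
    for m in names
}

# (c) Flat table keyed by an (interface, method) pair.
paired: Dict[Tuple[str, str], str] = {
    (iface, m): f"blob:{iface}.{m}"
    for iface, names in methods.items()
    for m in names
}


def lookup_nested(iface: str, method: str) -> str:
    return nested[iface][method]        # two lookups


def lookup_flat(iface: str, method: str) -> str:
    return flat[f"{iface}::{method}"]   # one lookup, but builds a key string


def lookup_paired(iface: str, method: str) -> str:
    return paired[(iface, method)]      # one lookup on a composite key
```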

  Let me explain why the scenario might be important.  It's all about
speed/memory trade-offs, as always.  Taking the Python bindings as an
example, we could have two approaches:

	1. the "speed" option: when a library is loaded, we walk all object
types and create a PyTypeObject for each GObject type.  To each type we
add method objects of a special callable Python type whose instance C
structure holds a pointer to the InterfaceBlob.

	2. the "memory" option: when a library is loaded, we create a special
module object, derived from PyModule_Type, whose attribute getter
(tp_getattro slot) looks up types at runtime; hence the need for the
hash table here.  Types aren't registered at module import; they are
created on demand, as they are requested from the module.  A similar
approach could be taken for loading each object's methods on demand.
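  A rough pure-Python sketch of both options follows.  It is not what
the C-level bindings would look like: the TYPELIB dict is a hypothetical
stand-in for the metadata, and a types.ModuleType subclass with
__getattr__ plays the role of the tp_getattro slot (Python only calls
__getattr__ when normal attribute lookup fails, i.e. when the type has
not been created yet):

```python
# Eager vs on-demand type creation. TYPELIB is a hypothetical stand-in
# for the metadata: type name -> list of method names.
import types
from typing import Callable, Dict, List

TYPELIB: Dict[str, List[str]] = {
    "Label": ["set_text", "get_text"],
    "Button": ["set_label"],
}


def make_type(name: str) -> type:
    # Build a class whose methods are placeholders for wrappers that
    # would call through the metadata blob.
    def make_method(meth: str) -> Callable:
        def method(self):
            return f"{name}.{meth}"
        method.__name__ = meth
        return method
    return type(name, (), {m: make_method(m) for m in TYPELIB[name]})


# Option 1 ("speed"): create every type when the module is loaded.
def load_eager(modname: str) -> types.ModuleType:
    mod = types.ModuleType(modname)
    for name in TYPELIB:
        setattr(mod, name, make_type(name))
    return mod


# Option 2 ("memory"): create a type only when it is first requested.
class LazyModule(types.ModuleType):
    def __getattr__(self, name: str) -> type:
        if name not in TYPELIB:      # the hash lookup in the metadata
            raise AttributeError(name)
        cls = make_type(name)
        setattr(self, name, cls)     # cache it; later access skips __getattr__
        return cls
```

  After LazyModule("demo").Label has been accessed once, the class sits
in the module dict like any eagerly created one, so the lookup cost is
paid only on first use.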

  Obviously option 1 brings greater execution speed, since no hash table
lookups are ever needed.  On the other hand, it consumes much more
memory, and increases module import time.

  Option 2 consumes much less memory and gives better import times, but
it is slower at run time.

  Nonetheless, option 1 is much simpler to implement, less error-prone,
more robust, etc., so it is the more likely to be implemented first.
However, for embedded devices, option 2 could prove invaluable.  Also, I
recall that, in C, types are registered on an as-needed basis too, by
calling gtk_xxx_get_type().

  I think this answers Matthias' initial question "do we want hashes?".
The answer is just a definite maybe, IMHO.

  Regards.

-- 
Gustavo J. A. M. Carneiro
<gjc inescporto pt> <gustavo users sourceforge net>
The universe is always one step beyond logic

Follow-Ups:
- Re: Introspection binary format comments
  - From: muppet

References:
- Introspection binary format comments
  - From: Owen Taylor
- Re: Introspection binary format comments
  - From: Matthias Clasen
- Re: Introspection binary format comments
  - From: Gustavo J. A. M. Carneiro
- Re: Introspection binary format comments
  - From: Owen Taylor