simplifying closures



Hi,

These are some design comments Owen and I talked about (but I wrote
them down, so any mangled ideas are most likely my fault). It's
probably a bit out of the blue for people who haven't seen the current
closure design, so I'll try to summarize that briefly.

A GClosure is an abstract callable object, usually a wrapper around 
a callback function, but in language bindings a wrapper around some
type of native callable object.

The closure stores:
 - a marshaller
 - a callback to be marshalled
 - data to pass to the callback
 - various kinds of notifier callbacks:
   . notification that the closure is destroyed, so the data 
     can be freed (GDestroyNotify)
   . notification that the closure has been "invalidated," which
     means that if the closure is connected to a signal for 
     example, the signal should be disconnected
   . pre-invoke notification to ref the user data before it's passed
     into the callback
   . post-invoke notification to unref the user data after it's
     passed into the callback

The current plan proposed by Tim and Karl is to allow arbitrary
numbers of notification callbacks of each type. You can add multiple
destroy notifiers, multiple invalidation notifiers, etc.

This is implemented as follows: the closure stores an allocated array
of notifiers, with a type tag on each notifier indicating whether it's
an invalid notifier, destroy notifier, etc. The notifiers are sorted
by type.
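
To make the array layout concrete, here is a rough sketch in plain C.
Every name in it (SketchNotifier, sketch_array_closure_add, and so on)
is made up for illustration and is not the GLib API; the separate type
field is also an assumption, since a real implementation could pack
the tag more tightly. It only shows the idea of one allocated array of
tagged notifiers kept sorted by type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* All names here are hypothetical; this is not the GLib API. */
typedef enum {
  SKETCH_NOTIFY_DESTROY,
  SKETCH_NOTIFY_INVALIDATE,
  SKETCH_NOTIFY_PRE_INVOKE,
  SKETCH_NOTIFY_POST_INVOKE
} SketchNotifyType;

typedef void (*SketchNotifyFunc) (void *data);

typedef struct {
  SketchNotifyType type;   /* tag: which kind of notifier this is */
  SketchNotifyFunc func;
  void            *data;
} SketchNotifier;

typedef struct {
  SketchNotifier *notifiers;    /* allocated array, sorted by type */
  size_t          n_notifiers;
} SketchArrayClosure;

/* Example notifier used for illustration: bumps an int counter. */
static void
sketch_bump (void *data)
{
  ++*(int *) data;
}

/* Insert a notifier after existing entries of the same type, so the
 * array stays sorted by type. Returns 0 on success, -1 if out of memory. */
static int
sketch_array_closure_add (SketchArrayClosure *c, SketchNotifyType type,
                          SketchNotifyFunc func, void *data)
{
  SketchNotifier *grown;
  size_t pos = 0;

  assert (c != NULL);
  grown = realloc (c->notifiers, (c->n_notifiers + 1) * sizeof *grown);
  if (grown == NULL)
    return -1;
  c->notifiers = grown;

  while (pos < c->n_notifiers && c->notifiers[pos].type <= type)
    pos++;
  memmove (&c->notifiers[pos + 1], &c->notifiers[pos],
           (c->n_notifiers - pos) * sizeof *grown);
  c->notifiers[pos].type = type;
  c->notifiers[pos].func = func;
  c->notifiers[pos].data = data;
  c->n_notifiers++;
  return 0;
}

/* Call every notifier of one type; returns how many were called. */
static size_t
sketch_array_closure_notify (const SketchArrayClosure *c,
                             SketchNotifyType type)
{
  size_t i, n_called = 0;

  for (i = 0; i < c->n_notifiers; i++)
    if (c->notifiers[i].type == type && c->notifiers[i].func != NULL)
      {
        c->notifiers[i].func (c->notifiers[i].data);
        n_called++;
      }
  return n_called;
}
```

Once the array exists, adding a second notifier of any type costs one
more array slot, which is the "for free" property described next.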

If you need multiple instances of _any_ notifier type, then, you can
get multiple instances of all of them "for free." There's no penalty
for allowing multiple destroy notifiers, for example, if you need
invalidity notifiers already.

However, if we can get away with only one instance of each notifier
type, then we can lose the whole allocated array, making GClosure
lighter-weight; and we can also simplify the system as a whole, since
you only have to think about one chunk of user data with one destroy
notifier, one invalid notifier, etc., and maybe more importantly each
closure can only be connected to one signal.

This is possible with certain changes to the GClosure design.
Here are those changes:

1) A closure can only be connected to one signal, main loop source,
   or whatever. i.e. only one "connection" per closure.
   This means you only need one invalidity notifier to "disconnect" 
   the closure.

   In the current plan, you can connect one closure to multiple
   signals. There are two arguments for this. 

   One is saving space; by reusing closures, you can have fewer of
   them. However, I think it's clear that closures can only be reused
   about 5% of the time, and if we can avoid multiple notifiers, we
   can save more space the other 95% of the time.

   The second argument is that we want to use closures for the default
   handlers in the class struct. This allows language bindings to
   replace the default handler with a closure, avoiding the
   subclass-and-proxy hack Gtk-- and Inti currently use. I don't
   actually see how this results in more than one connection; the
   object class would own the closure, and invalidating the closure
   would (I guess) remove the default handler.

   Tim argues that you still need an additional invalidity notifier
   regardless of multiple connections because of "alive objects." An
   alive object is user data that disconnects the signal when it's
   destroyed. You would implement this by storing, on the alive
   object, a list of the closures that hold it as user data. When the
   alive object is destroyed, it goes through that list and
   invalidates each closure, disconnecting it. You then need to
   remove the closures from the list on the alive object;
   in Tim's scheme this would be done by an invalidate handler that
   disconnects the closure from the alive object.

   However, this can alternatively be solved as follows: when a
   closure is added to the alive object's list, connect a destroy
   notifier on the closure that removes the closure from that list.
   When the alive object is destroyed, for each closure stored on
   the object, first remove that destroy notifier from the closure,
   then invalidate the closure. (The basic summary of this
   approach is: use the destroy notifier to strip the closure
   from the alive object.)
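
The alternative just described can be sketched roughly as follows.
This is only an illustration under several assumptions: all the names
(SketchClosure, SketchAlive, sketch_alive_watch, ...) are invented,
the list link is stored directly in the closure to keep the example
short, and sketch_closure_destroy stands in for whatever happens when
the last reference to a closure goes away. The closure has exactly one
destroy notifier and one invalidate notifier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types; not the GLib API. */
typedef struct SketchClosure SketchClosure;
typedef void (*SketchNotify) (SketchClosure *closure, void *data);

struct SketchClosure {
  void          *data;          /* the one user data object */
  SketchNotify   destroy;       /* the one destroy notifier */
  SketchNotify   invalidate;    /* the one invalidate notifier */
  int            is_valid;
  int            connected;     /* stands in for the single connection */
  SketchClosure *next_on_alive; /* link in the alive object's list */
};

typedef struct {
  SketchClosure *closures;      /* closures using this object as data */
} SketchAlive;

/* Invalidate notifier: "disconnect" the closure's single connection. */
static void
sketch_disconnect (SketchClosure *closure, void *data)
{
  (void) data;
  closure->connected = 0;
}

static void
sketch_closure_init (SketchClosure *c)
{
  c->data = NULL;
  c->destroy = NULL;
  c->invalidate = sketch_disconnect;
  c->is_valid = 1;
  c->connected = 1;
  c->next_on_alive = NULL;
}

static void
sketch_closure_invalidate (SketchClosure *c)
{
  if (!c->is_valid)
    return;
  c->is_valid = 0;
  if (c->invalidate != NULL)
    c->invalidate (c, c->data);
}

/* Stand-in for the closure going away: invalidate, then notify. */
static void
sketch_closure_destroy (SketchClosure *c)
{
  sketch_closure_invalidate (c);
  if (c->destroy != NULL)
    c->destroy (c, c->data);
}

/* Destroy notifier: strip the closure from the alive object's list. */
static void
sketch_strip_from_alive (SketchClosure *closure, void *data)
{
  SketchAlive *alive = data;
  SketchClosure **p;

  for (p = &alive->closures; *p != NULL; p = &(*p)->next_on_alive)
    if (*p == closure)
      {
        *p = closure->next_on_alive;
        closure->next_on_alive = NULL;
        return;
      }
}

/* Register a closure whose user data is the alive object. */
static void
sketch_alive_watch (SketchAlive *alive, SketchClosure *c)
{
  c->data = alive;
  c->destroy = sketch_strip_from_alive;
  c->next_on_alive = alive->closures;
  alive->closures = c;
}

/* Alive object dies: remove each destroy notifier first, then
 * invalidate, so no closure calls back into the dying object. */
static void
sketch_alive_destroy (SketchAlive *alive)
{
  while (alive->closures != NULL)
    {
      SketchClosure *c = alive->closures;

      alive->closures = c->next_on_alive;
      c->next_on_alive = NULL;
      c->destroy = NULL;
      c->data = NULL;
      sketch_closure_invalidate (c);
    }
}
```

Either side can go away first: a closure destroyed early strips
itself from the list, and an alive object destroyed early disconnects
every closure still on it, all with a single notifier of each type.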

2) Only one user data object is allowed in the closure. 

   In the current plan, you can have multiple chunks of data, 
   and multiple destroy notifiers.

   It's always possible to get around the need for this by chaining a
   single destroy notifier to a proxy object containing more notifiers
   and multiple data. So it's simply not necessary to have more than
   one chunk of data.

   The only reason you might want multiple chunks of data is the case
   where two unrelated sections of code independently add data/dnotify
   to the closure. In that case, you couldn't create a special proxy
   object containing the list of data, since neither section of code
   would know about the other's proxy object. However, this case
   doesn't happen anywhere in GTK+ that I can see, and language
   bindings can simply add support for multiple data by creating a
   proxy object on every closure the binding creates.
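
As an illustration of the proxy idea in this item, here is a small
sketch in plain C. The names (SketchProxy, sketch_proxy_add,
SketchOneDataClosure, ...) are hypothetical and not part of any real
API. The closure keeps one data pointer and one destroy notifier; the
proxy object behind that pointer carries as many data/dnotify pairs
as the caller needs.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*SketchDestroyFunc) (void *data);

/* One data/dnotify pair held by the proxy (hypothetical type). */
typedef struct SketchProxyItem SketchProxyItem;
struct SketchProxyItem {
  void              *data;
  SketchDestroyFunc  destroy;
  SketchProxyItem   *next;
};

typedef struct {
  SketchProxyItem *items;
} SketchProxy;

/* The closure side: exactly one data object, one destroy notifier. */
typedef struct {
  void              *data;
  SketchDestroyFunc  destroy;
} SketchOneDataClosure;

/* Example notifier used for illustration: bumps an int counter. */
static void
sketch_count_destroy (void *data)
{
  ++*(int *) data;
}

static SketchProxy *
sketch_proxy_new (void)
{
  return calloc (1, sizeof (SketchProxy));
}

/* Returns 0 on success, -1 if out of memory. */
static int
sketch_proxy_add (SketchProxy *proxy, void *data, SketchDestroyFunc destroy)
{
  SketchProxyItem *item = malloc (sizeof *item);

  assert (proxy != NULL);
  if (item == NULL)
    return -1;
  item->data = data;
  item->destroy = destroy;
  item->next = proxy->items;
  proxy->items = item;
  return 0;
}

/* The single destroy notifier: fan out to every stored pair,
 * then free the proxy itself. */
static void
sketch_proxy_destroy (void *proxy_ptr)
{
  SketchProxy *proxy = proxy_ptr;
  SketchProxyItem *item = proxy->items;

  while (item != NULL)
    {
      SketchProxyItem *next = item->next;

      if (item->destroy != NULL)
        item->destroy (item->data);
      free (item);
      item = next;
    }
  free (proxy);
}

/* Stand-in for the closure being destroyed. */
static void
sketch_closure_finish (SketchOneDataClosure *c)
{
  if (c->destroy != NULL)
    c->destroy (c->data);
  c->data = NULL;
  c->destroy = NULL;
}
```

A binding that wants multiple data would set the closure's data to a
proxy from sketch_proxy_new and its destroy notifier to
sketch_proxy_destroy on every closure it creates.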


Adding these two limitations simplifies the closure/signal code, makes
it easier to implement, makes the code easier to maintain and extend
in the future, and makes it easier for people using the API to get
everything right.

The current plan has:

 - one closure tied to multiple connections, which can be
   connections to the main loop, signals, or whatever
 - one closure has multiple data objects, which may be the 
   same as the signal-emitting object for example
 - one data object can be used by multiple closures

Then we try to build an elaborate system of weak references to make
the whole network of connections, closures, and objects go away when
any one of them goes away. All of it has to be reentrant, since
closures can be invalidated or destroyed while being invoked or while
a signal is being emitted. Moreover, we take an efficiency hit because
closures store this allocated array of callbacks, which is at minimum
4 bytes of malloc overhead plus 8 bytes for an invalidate notifier,
plus the 4-byte pointer to the array in the closure struct.  I'm also
concerned about the interaction of all this with unrelated refcount
and garbage collection code, such as the Python refcount system or
simply an application's own refcount strategies; I can imagine some
really interesting and hard-to-debug refcount cycles in a large
application.

If we simplify it, then we have:

 - each closure corresponds to one connection to main loop or signal
 - each closure has one data object
 - one data object can be used by multiple closures

At that point you have four callbacks in the closure structure
(dnotify/inotify/preinvoke/postinvoke), which comes out to 16 bytes,
exactly the same as the minimum case with the allocated array. The
maximum case, though, is smaller than with the allocated array: you
still use only 16 bytes to store pre/post/inotify/dnotify, while the
allocated array has to add 3 more notifier objects in that case, an
estimated 24 more bytes.
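
The arithmetic above can be written out as a small sketch. The names
are hypothetical, and the model assumes what the numbers in this mail
imply: a notifier in the array costs two pointers (8 bytes with 4-byte
pointers), the array costs one pointer in the closure plus the malloc
overhead, and the simplified closure stores four callback pointers
inline.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*SketchNotifyFunc) (void *data);

/* Simplified design: one slot per notifier type, stored inline.
 * Hypothetical layout, not the real GClosure struct. */
typedef struct {
  SketchNotifyFunc destroy_notify;
  SketchNotifyFunc invalidate_notify;
  SketchNotifyFunc pre_invoke;
  SketchNotifyFunc post_invoke;
} SketchInlineNotifiers;

/* Bytes used by four inline callback pointers. */
static size_t
sketch_inline_bytes (size_t pointer_size)
{
  assert (pointer_size > 0);
  return 4 * pointer_size;
}

/* Bytes used by the allocated-array design: the pointer to the array,
 * the malloc overhead, and n notifiers assumed to be two pointers each. */
static size_t
sketch_array_bytes (size_t pointer_size, size_t malloc_overhead,
                    size_t n_notifiers)
{
  size_t notifier_size = 2 * pointer_size;

  assert (pointer_size > 0);
  return pointer_size + malloc_overhead + n_notifiers * notifier_size;
}
```

With 4-byte pointers and 4 bytes of malloc overhead,
sketch_array_bytes (4, 4, 1) and sketch_inline_bytes (4) both give 16,
while sketch_array_bytes (4, 4, 4) gives 40, i.e. the 24 extra bytes
estimated above.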

(BTW, I don't think these byte counts are that important, the main
issue here is system simplicity; but the byte count issue has come up
as a reason to avoid having the callback pointers in the GClosure
struct, rather than dynamically allocating only the ones you need, so
I'm discussing it and pointing out that the simpler version is
superior on this front. I don't think it hurts to avoid the extra
malloc() either, if you have lots of closures e.g. a closure for each
default signal handler...)


So the interesting issue for discussion here is to find an example
case where you _need_ the multiple notifiers. IMHO having them be
_convenient_ is not enough; the issue is whether we _need_ them. Above
I've tried to explain simple ways to avoid needing them in the cases
we've seen so far. The extra complexity is IMHO actively bad, so it
has to be justified, with the presumption against it.


Comments? Can we make the simplification work?

Havoc





