Re: simplifying closures



Your message seemed kind of fragmented, such that is is a little hard
to respond.   Sorry if the response seems a little scattered.


> The closure stores:
>  - a marshaller
>  - a callback to be marshalled
>  - data to pass to the callback
>  - various kinds of notifier callbacks:
>    . notification that the closure is destroyed, so the data 
>      can be freed (GDestroyNotify)
>    . notification that the closure has been "invalidated," which
>      means that if the closure is connected to a signal for 
>      example, the signal should be disconnected
>    . pre-invoke notification to ref the user data before it's passed
>      into the callback
>    . post-invoke notification to unref the user data after it's
>      passed into the callback

I believe some of these things are only present in derived
closures, but basically correct.


> The current plan proposed by Tim and Karl is to allow arbitrary
> numbers of notification callbacks of each type. You can add multiple
> destroy notifiers, multiple invalidation notifiers, etc.
> 
> This is implemented as follows: the closure stores an allocated array
> of notifiers, with a type tag on each notifier indicating whether it's
> an invalid notifier, destroy notifier, etc. The notifiers are sorted
> by type.

Lets look at this in the most general terms.

The actual implementation is not really an issue.  The point is that it
should have 2 types of destructors.  One type is "invalid", meaning
to send to all interested parties and tell them that this closure is 
no longer callable.  The second is "free", to remove the resources which are
stored in the closure itself when the reference count of the closure
reaches zero.

The things which call the "invalid" list are the objects pointed to
by the closure.  There may be many of them.  

The things on the "invalid" list are 
  - removing the dependencies if multiple objects can invalidate the
    closure. (example two objects are in a structure and when 
    either of them are removed the closure should not be called.)
  - the parent (or parents) of the closure which may need immediate
    action to save resources.  (example: a timeout routine needs to 
    clean up in the event that the closure becomes invalid because 
    when the timeout gets called the closure won't be there.)

These functions are supplied at different times (one by the creator of
the closure and one by user or the closure) and thus unless you want to 
make them chain, they need to be stored separately.

(chaining is a major pain.  You can see chaining used in a number
of places in gtk+ currently where the user asked for a connection
and we instead create another connection to proxy over to the 
first function.) 

 
> If you need multiple instances of _any_ notifier type, then, you can
> get multiple instances of all of them "for free." There's no penalty
> for allowing multiple destroy notifiers, for example, if you need
> invalidity notifiers already.

The point in the implementation is "caveman counting".  A caveman 
sees no objects, one object, many objects.  That is that if you 
are going to require more than one routine you might as well pay
the cost for as many as needed.  Considering the cost of 
two and many is approximately the same this makes sense.

 
> However, if we can get away with only one instance of each notifier
> type, then we can lose the whole allocated array, making GClosure
> lighter-weight; and we can also simplify the system as a whole, since
> you only have to think about one chunk of user data with one destroy
> notifier, one invalid notifier, etc., and maybe more importantly each
> closure can only be connected to one signal.
> 
> This is possible with certain changes to the GClosure design.
> Here are those changes:
> 
> 1) A closure can only be connected to one signal, main loop source,
>    or whatever. i.e. only one "connection" per closure.
>    This means you only need one invalidity notifier to "disconnect" 
>    the closure.

This is bad because there are multiple partners which need to be
informed in the case of multiple pointed to objects or multiple
parents.


>    In the current plan, you can connect one closure to multiple
>    signals. There are two arguments for this. 
> 
>    One is saving space; by reusing closures, you can have
>    fewer. However I think it's clear that closures can only be reused
>    about 5% of the time, and if we can avoid multiple notifiers, we
>    can save more space the other 95% of the time.

What numbers to you have to support this?  

Assuming that to create multiple owners in a system which supports
only one costs 3 times more (2 proxies to the real callback plus
the original callback), what must the savings be such that the
5% is should just be a special case.
 
   (0.05)*2*x + (0.95)*x < 1

x must be at least a 10% savings.
  

This was not the only reason for reuse of closures.  A classic example
of reuse of closures is systems like MDI where an item is used over and
over repeatedly be the system.  Everytime the info is used to build a
menu the closure gets another connection.  The user may not know that 
the closure is being used more than once.  I believe having this
ability is likely going to be important.


 
>    Tim argues that you still need an additional invalidity notifier
>    regardless of multiple connections because of "alive objects." An
>    alive object is user data that disconnects the signal when it's
>    destroyed. You would implement this by storing a list of closures
>    the alive object is contained within on the alive object. When the
>    alive object is destroyed, it will go through the list of closures
>    and invalidate each one, disconnecting them. Then you need to 
>    remove the closures from the list on the alive object; 
>    in Tim's scheme this would be done by an invalidate handler that 
>    disconnects the closure from the alive object.
  
>    However this can be alternatively solved as follows: when the 
>    alive object is destroyed, for each closure stored on the 
>    object, first remove the destroy notifier from the closure,
>    then invalidate the closure. Connect a destroy notifier 
>    from the closure to the alive object which removes the closure
>    from the list on the object. (The basic summary of this 
>    approach is: use the destroy notifier to strip the closure 
>    from the alive object.)

I am not sure which you mean be the destroy notifier here.
The destroy notifier is often a call to g_free to free
the structure pointed to.  

How does this handle the case when 2 object have a peice of
data which points to the closure to tell the closure to 
invalidate

    A
     \ 
      C
     /
    B

How does the A going away remove B's invalid call?


> 2) Only one user data object is allowed in the closure. 
> 
>    In the current plan, you can have multiple chunks of data, 
>    and multiple destroy notifiers.
> 
>    It's always possible to get around the need for this by chaining a
>    single destroy notifier to a proxy object containing more notifiers
>    and multiple data. So it's simply not necessary to have more than
>    one chunk of data.

Chaining is error prone.  It means that each time you connect you
must store the old routine.  Where do you plan to store it?  
In other words you just pushed to problem to the user to store
where before we stored it in our lists.  In other words, you
are going to have to add some sort of data list to store the chains.
(Note, this is what libsigc++ does.  It uses chaining and has to
provide a chaining list to store its data.)


>    The only reason you might want multiple chunks of data is the case
>    where two unrelated sections of code independently add data/dnotify
>    to the closure. In that case, you couldn't create a special proxy
>    object which contained the list of data, since both sections 
>    of code wouldn't know about the proxy object. However, this case 
>    doesn't happen anywhere in GTK+ that I see, and language bindings
>    can simply add support for multiple data by creating a proxy object
>    on all closures that binding creates.

There are cases in the gtk+ code where the closure was created
with some destruction routine then the caller who receives the
closure adds another.   If we have to proxy and chain here the
the cost of the system multiples by each object in the chain.

 
 
> Adding these two limitations simplifies the closure/signal code, makes
> it easier to implement, makes the code easier to maintain and extend
> in the future, and makes it easier for people using the API to get
> everything right.

> The current plan has:
> 
>  - one closure tied to multiple connections, which can be
>    connections to the main loop, signals, or whatever
>  - one closure has multiple data objects, which may be the 
>    same as the signal-emitting object for example
>  - one data object can be used by multiple closures
> 
> Then we try to do an elaborate system of weak references to make the
> whole network of connections, closures, objects go away when any of
> them go away. 

This is not the case in gtk+.  In general the functions are global
and can't go away and the same applied to the data.  Thus the
normal case would be for the list to be empty.  Thus no malloc
cost only the pointer and the bitfields.

> All of it has to be reentrant, since closures can be
> invalidated or destroyed while in the invocation process or while a
> signal is being emitted. Moreover we have an efficiency hit because
> closures store this allocated array of callbacks, which is at minimum
> 4 bytes of malloc overhead plus 8 bytes for an invalidate notifier,
> plus the 4 byte pointer to the array in the closure struct.  I'm also
> concerned about the interaction of all this with unrelated refcount
> and garbage collection code, such as the Python refcount system or
> simply an application's own refcount strategies; I can imagine some
> really interesting and hard-to-debug refcount cycles in a large
> application.

Requiring more proxies will mean more cycles and clean up problems
over this black box.


> If we simplify it, then we have:
> 
>  - each closure corresponds to one connection to main loop or signal
>  - each closure has one data object
>  - one data object can be used by multiple closures
> 
> At that point you have four callbacks in the closure structure
> (dnotify/inotify/preinvoke/postinvoke) which comes out to 16 bytes, 
> exactly the same as the minimum case with the allocated array. The 
> maximum case though is smaller than the allocated array; you still use
> only 16 bytes to store pre/post/inotify/dnotify, while the allocated 
> array has to add 3 more notifier objects to the array in that case, 
> estimated 24 more bytes. 

We would need frequencies of the system to make that sort of judgement.
How often to be have 0, 1, 2, n routines, and what is the cost
in each system of having them?


> So the interesting issue for discussion here is to find an example
> case where you _need_ the multiple notifiers. IMHO having them be
> _convenient_ is not enough; the issue is whether we _need_ them. Above
> I've tried to explain simple ways to avoid needing them in the cases
> we've seen so far. The extra complexity is IMHO actively bad, so has
> to be justified with the presumption against it.

I would argue that being about to take a closure and reuse it is
enough of a justification for multiple parents.  This is a feature
which libsigc++ lacked and has become a sore point because it actually
did occur.  (It has become the most outstanding request from 
libsigc++ users).  The alternative to allowing multiple parents, means 
creating special closures which have delete routines which handle 
multiple parents. However, you are passing much of the work of making sure 
the deletion order is correct on the user.   Although this currently 
is a fairly rare case in gtk+ I would assume that if the feature
of reusing closures were allowed it would become more frequent.


The second case of multiple objects being pointed to although rare
is hard to deal with and very error prone.  Currently, gtk+ allows
for one and one only.  We need to support at least one.  

  
I think you are also missing a major point of a library.  The
point of a library is to solve hard problems so that the user
doesn't have to.  Thus no matter how complex the inter workings
of the implementation, so long as the interface is simple
it is good.  You are proposing that the same things can be accomplished
by clever use of the closure system, however, this means that every
user who needs these functions must reinvent their own kludges
around the basic functionality.  Considering they may not be 
as well informed about the problems caused by destruction order,
they are more likely to have problems.  Thus the users life is harder.
In short, it is a library writers job to take on hard problems and
solve them well so the user doesn't have to.

--Karl






[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]