Garbage collection in Gtk



This is cross-posted to ruby-gnome2-devel-en lists sourceforge net
and gtk-devel-list gnome org, because the problem was initially found
with Ruby/GNOME2, but it seems extremely difficult to fix without serious
changes to Gtk itself.

The post is probably unnecessarily long, but I want to
avoid any possible misunderstanding of the problem.

=== Links from Gtk objects to objects in other languages ===

Gtk is not merely a library for displaying stuff on the screen,
like for example SDL or OpenGL. Such libraries can be easily
integrated with any language and any garbage collection system.

Gtk widgets are object. They're connected with other objects,
they call installed signal handlers etc.

To be used in a language different than C, every Gtk object
refered to from the language must be associated with an object
X of that language. In a very simple case, that associated object
can be simply a pointer to Gtk object plus a type information
(at least some kind of "this is a Gtk object, call this function
for further information"). In typical cases, such associated
object will be more complex.

In case of some libraries we can simply create those objects
when returning a Gtk object value from a function and throw away
when no longer used. However, when we use Gtk, we would like to
associate some additional information with a Gtk object,
that is - add pointers from Gtk objects to non-Gtk objects.

One such case is subclassing a Gtk object. For example we
want to make a class OurApp, that's a subclass of GtkWindow.
Of course, OurApp must function as a real object of our language,
one that can have methods, attributes (links to other object) etc.
On the other hand it must function a Gtk object, so we can connect
event handlers etc. to it. Whether OurApp subclasses GtkWindow,
or merely links to GtkWindow through one of its attributes,
a GtkWindow object must contain a link to OurApp.
When a Gtk event happens, a callback function must be able
to access OurApp's attributes somehow.
We can do that using g_object_set_qdata on the GtkWindow
(what Ruby/GNOME2 currently does), or by passing a link inside
a closure connected to the Gtk event.

Even without subclassing, merely connecting an event handler makes
a Gtk object links to a (closure) object of the other language.

=== Garbage collection ===

Not only may the Gtk objects link to the language's objects,
but they can have the only existing links.

In happens when we drop all local variables pointing to the object
associated with the Gtk object, but which should still
be available in event handlers, or when we get the Gtk object
through method calls on other Gtk objects.

In also happens when we didn't have a local variable in the first
place. Typically we install closures as event handlers without
putting them in variables first.

Not only those objects must not be freed, objects that can
be accessed by following links from them must stay alive.

We can't assume that objects associated with Gtk object will
always stay alive, but have to trace their status.

The "free the object when Gtk object is freed" rule will not work.
If the object own a normal reference Gtk object,
then the Gtk object will never be freed,
and if it doesn't, and merely owns a weak reference,
then the Gtk object will be freed before we can even use it,
as nothing else links to it.

An interesting way of handling this problem, would be to
always have a strong reference is one way, and a weak reference
in another. So after object pair's creation, language's
object would hold a strong reference to associated Gtk object,
but the reference in the other way would be weak (language's
GC would merely not be informed about its existence).
On the other hand when Gtk object gets refered by anything
from the Gtk world, it gets a strong reference to the
language's object (GC is informed about its existence),
while the reference from language's object to Gtk object
becomes "virtually weak". That is, when the last reference
from the Gtk to the object is dropped, we weaken th Gtk object
-> language object reference again (whether it's easier
to do that by weakening the "virtually weak" reference
and checking when reference count goes to zero, or not weakening
it and checking when is goes to one, is just an implementation detail).

While such an algorithm can be a reasonable temporary workaround,
it's not a real solution. But none of the language bindings that
I'm aware of implement even this.

In any reasonably complex program, there will be circular links.
Objects associated with widgets will, through chains of references,
link to objects associated with their parent widgets.

For example a button that does something to it's dialog window,
must have a reference to the window. It's actually quite difficult
to imagine an useful composite widget without such circular links.
In this algorithm, and actually any safe (one which guarantee
that objects that can be accessed won't be freed) algorithm,
such widgets will leak.

To correctly garbage collect them, we need to be able to
learn status of all objects by propagating marks from the root set
(or by some essentially equivalent scheme).

Unfortunately the highly C-centric object model of glib does
not provide us with any functionality of propagating marks
through Gtk objects. For objects as big as Gtk widgets,
mark&sweep GC is extremely efficient (it's somewhat problematic
on small objects, at least without some optimizations),
probably more so that the reference counting scheme used now.

=== Solutions ===

Of course Gtk does not have to rely on any particular GC,
it merely has to cooperate with various garbage collectors,
that is - either implement a mark forwarding, or provide
the functionality necessary to implement it in toolkits.

I suspect that the easiest way of doing it would be
changing the reference system in a way that lets us
iterate over all references own by the glib object.
This does not affect the referred object in any way,
in particular it does not require any changes to
the functions that update the reference count
in a thread-safe manner. As the reference counter
is a conservative approximation of the garbage collector,
keeping it running in parallel with a real GC will
only affect performance (because the objects are big,
the effect is unlikely to be significant).

Objects would also require a mark field of some kind
(1 or 2 bits), a way of getting a list of all
unmarked objects, and an ability to destroy those objects.

The mark field can be implemented easily as a hash table
in GC code without modifying the Gtk object's layout,
and Gobject must already have a type-independent
destroy for the reference counter to work,
so the only things that requires assistance is
in keeping a list of all objects.

A pair of "glib_object_created", "glib_object_distroyed"
callbacks would probably be more efficient
than a monolithic "glib_list_all_objects" function,
and at the same time easier to implement.

I don't know glib's internals well enough to be
able to assess how much work would such modifications take,
but they're unavoidable if Gtk is to be used efficiently
in garbage-collected languages.

If the changes are likely to take much time,
I think all Gtk bindings to garbage-collected (and even reference-counted)
languages should use the algorithm presented above,
or one of equivalent power.



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]