question about glib/gmain.c for a ruby-gnome2-caused abort in glib



Hello,

I'd like to ask a couple of questions about the main loop code in
glib/gmain.c and an assertion failure that ruby-gnome2 triggers under
certain circumstances. I have tried to understand the glib/gmain.c
code as well as I can, but my understanding is far from complete, so
please excuse any mistakes.

First, you need to know that ruby features cooperative threads
implemented purely in the interpreter. Basically, there is a "main
thread", plus possibly other threads started with a ruby pseudo
function call, and ruby is responsible for scheduling between them.
Since scheduling happens purely in the interpreter, wrapped C
libraries need no synchronization: a call into C cannot be
interrupted. However, when wrapping a mainloop-based toolkit such as
gtk, the problem is how to give control back to the ruby threads,
because during a regular call to gtk_main(), nothing gives ruby an
opportunity to schedule other threads.

Back in 2000, the ruby/GTK developers chose to install a custom poll
function with g_main_set_poll_func(); from it they make a ruby
select-like call, which allows thread scheduling[1].

In the end, this means that ruby threads, scheduled from the custom
poll function, will call glib/gtk functions. My first question is
whether this is considered legal. In particular, is calling
gtk_main_iteration() legal in that situation? I believe this is the
cause of the abort I am seeing, which I will try to explain.

I have tried to write down the scheduling path that leads to the
abort; you can have a look at it in the following screenshot. There
is almost nothing ruby-specific in it, so it should be fairly
understandable.

http://zarb.org/~gc/t/ruby-gnom2-thr-pb.png

My understanding is that gmain's g_main_dispatch partially modifies
the pending dispatches data structure (line 1896); the dispatch
function it calls then ends up continuing a main iteration that had
been interrupted in the custom poll function, which reenters
g_main_dispatch. That second invocation "sees" that the pending
dispatches data structure is inconsistent (line 1897) and aborts for
that reason. I hope I'm making myself clear :)

Obviously, this is related to ruby scheduling, which introduces an
unusual (not purely stack-based) code path in glib.

However, I cannot see any obvious way to "fix" this in ruby-gnome2,
and I am also not sure that the partial modification of the pending
dispatches in g_main_dispatch is necessary. In particular, in a pure
C/glib application, it seems that calling the main iteration from a
dispatch is legal (for example in gtk, calling gtk_main_iteration from
a gtk callback, if I'm correct), and this is handled by
unconditionally removing all pending dispatches in
g_main_context_prepare (line 2225). So my second question is: would it
be an option to make g_main_dispatch finish modifying the pending
dispatches before calling any of the dispatch functions? I have a very
small patch doing that in [2]. With it, I have verified that the abort
I see with ruby-gnome2 goes away and that no new problem is apparent;
however, it is a very naive implementation, far from optimal, meant
mostly to show my idea in source code. Could a glib mainloop
specialist comment on whether this is an appropriate idea (given a
better implementation), and in particular on whether it might
introduce other problems in the long run?

Thanks,

[1] http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/1621
[2] http://zarb.org/~gc/t/glib-reentrant-g_main_dispatch.diff
-- 
Guillaume Cottenceau - http://zarb.org/~gc/


