Re: [REGRESSION] perl exceptions in callbacks now break c/gtk+2



Thierry Vignaud said:
however, dying in a gtk callback now results in unexpected behaviour
(that is, from a perl programmer's viewpoint; of course gtk+, being
developed in c, is not supposed to support exceptions):

this sparked a good amount of offlist discussion over the last few days.  with
this message i hope to update you (the list) on what's going on and what
impact this might have on your code.  if you don't like the fix proposed here,
*please speak up now*!


the problem thierry described is actually two problems, one with signal
emission, the other with mainloops, both effects of the same cause.  see below
for a technical discussion of the gory details.

the upshot is that to fix this, we may have to break some perl rules, unless
somebody can point out a magical fix that makes everything work correctly.


specifically, when eval is in effect, a die in a callback may not immediately
exit the current block as you would expect.  this is because g_signal_emit
*must* return normally or the glib library can get into an unstable state,
resulting in the crash pointed out in the previous message.

this behavior is not ideal, and in fact is most unexpected from the perl
developer's point of view, but has the advantage that it does not crash
entirely.  :-/

for example:

  use Gtk2 -init;
  my $button = Gtk2::Button->new ('die');

  # contrived example of a callback in which something goes wrong...
  $button->signal_connect (clicked => sub {die "whee"});

  # some way to trigger that to happen.
  Glib::Timeout->add (1000, sub {
                   # this next statement calls g_signal_emit under the hood
                   $button->clicked;
                   print "this line should not be reached\n";
                   0});

  # now try to trap that exception:
  eval {
      # this will block for 1000 milliseconds, until the timeout
      # callback dies.
      Gtk2->main;
      print "nor should this be reached\n";
  };
  # we should wind up here.

by normal perl rules this should print nothing, but with the planned fixes it
will wind up printing 'this line should not be reached', because the signal
emission *must* return correctly.  depending on how we fix the mainloop thing,
'nor should this be reached' may print as well.


i have code for Glib in cvs, under the branch "exceptions", to fix the signal
side of the issue.  if you're feeling brave, please check this code out and
try it.  comments on all this would be most appreciated.

if anoncvs and cvsweb ever update, you'll want the exceptions branch of
GClosure.xs and the new tests t/8.t and t/9.t.


old gtk-perl and gtk2-perl recovery systems:

as noted previously, the old gtk2-perl recovery system is not suitable for
current versions of gtk2-perl, because Glib should know nothing about Gtk2.  a
more generic system to provide exception handlers may be in order.

always running callback handlers in G_EVAL is a no-no, as it hides broken code
when no eval block is present in your program.  the proper solution is to run
callbacks in G_EVAL only when we are already in eval context.  furthermore,
the specialized handling is only necessary for callbacks run as part of a
signal emission.

- if we catch an exception:
  o if we already caught another exception, print a "sorry folks"
    message
  o else we save the exception that needs to be croaked again and we
    run gtk_main_quit
  o once c gtk_main has exited, perl gtk_main croaks the pending
    exception
this logic needs to be made more generic.  yosh (on #gtk+) suggested
installing a custom GSource which watches for exceptions.


but this mechanism was not kept when switching from inline to xs (due
to saner/better design of new tree).

specifically, the G_EVAL was removed because it caused many, many problems in
the early days of gtk2-perl-xs when many functions were not implemented --
basically, your callbacks would always succeed, even though their code would
not run.  by removing this G_EVAL, bad code in a callback (without a
surrounding eval block) correctly kills the program.


old perl-GTK (that is, the gtk+-1.0/1.2 perl binding) also had such a
mechanism (die in callback did work).

on the contrary, gtk-perl actually had no safety net at all, and its apparent
good handling of exceptions was really just a happy coincidence.  in fact,
trapping an exception outside the mainloop in which it was thrown typically
resulted in an unkillable mainloop.



the gory technical details
==========================

perl's exception handling is implemented with the c library functions setjmp()
and longjmp().  unlike goto, which allows you to jump from one instruction to
another within a function, longjmp lets you jump to a completely different
place within the code.  the difference is that goto doesn't require you to
change stack frames, but longjmp does.  you set the point to which you want to
come back with setjmp, and if something goes horribly, horribly wrong, you use
longjmp to get back there.
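
for anybody who hasn't played with these functions, here is a minimal c sketch
of that jump across stack frames.  it is only an illustration, not code from
perl or glib; recovery_point, inner, outer and run_with_recovery are names i
made up.

```c
#include <setjmp.h>

static jmp_buf recovery_point;  /* where longjmp will land */

static void inner (void)
{
    /* something went horribly wrong: jump straight back to the setjmp */
    longjmp (recovery_point, 1);
}

static void outer (int *reached_end)
{
    inner ();
    *reached_end = 1;           /* skipped: inner () never returns here */
}

/* returns 1 if control came back via longjmp, 0 if outer () returned */
static int run_with_recovery (int *reached_end)
{
    *reached_end = 0;
    if (setjmp (recovery_point) == 0) {
        outer (reached_end);    /* first pass: setjmp returned 0 */
        return 0;
    }
    return 1;                   /* second pass: we arrived via longjmp */
}
```

run_with_recovery returns 1 and leaves *reached_end at 0: the rest of outer's
stack frame is simply abandoned, with no chance to run anything on the way out.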

this is a lot like exceptions in C++, with a major difference --- whereas C++
"unwinds" the stack, calling destructors on all objects deallocated on the way
from the current stack frame to the one containing the enclosing "try" block,
C doesn't know about objects, so it just discards the skipped stack.

traditionally, this means that stuff you allocated on the heap is now leaked,
and any recursion is now unrecoverable.


it just so happens that g_signal_emit is a recursive function.  you can emit a
signal whose handler emits another signal.  you can stop a signal's emission
from the outside, but all that does is set a flag in some heap data structures
to tell g_signal_emit to stop calling handlers; this does nothing if you don't
*finish* the call to g_signal_emit for that stack frame.  if you longjmp() from
the handler called by g_signal_emit to some place farther down the stack, the
cleanup code for the skipped signal emission never gets called.  that leaves
the private data structures in an odd state, and invariably results in either
an assertion failure or a memory fault at some later time.
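
a toy c model of that failure, assuming nothing about glib's real internals:
emissions_in_progress stands in for the private bookkeeping, toy_emit for
g_signal_emit, and the setjmp in emit_in_eval for a perl eval block.  all of
these names are made up for the sketch.

```c
#include <setjmp.h>

typedef void (*handler_fn) (void);

static jmp_buf eval_point;          /* toy stand-in for perl's eval {} */
static int emissions_in_progress;   /* toy stand-in for glib's private state */

/* toy emission: set up, call the handler, clean up */
static void toy_emit (handler_fn handler)
{
    emissions_in_progress++;        /* setup */
    handler ();
    emissions_in_progress--;        /* cleanup: runs only if handler returns */
}

static void polite_handler (void) { }

static void dying_handler (void)
{
    longjmp (eval_point, 1);        /* "die": skips toy_emit's cleanup */
}

/* emit inside a toy eval; returns the bookkeeping count left behind */
static int emit_in_eval (handler_fn handler)
{
    emissions_in_progress = 0;
    if (setjmp (eval_point) == 0)
        toy_emit (handler);
    return emissions_in_progress;
}
```

emit_in_eval (polite_handler) leaves the count at 0, while emit_in_eval
(dying_handler) leaves it at 1: as far as the toy library can tell, an emission
that will never finish is still running.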

similarly, gtk_main is a recursive function, which stores a list of GMainLoop
objects.  the first one on the list is the current one, and the length of the
list is returned by g_main_level().  gtk_main does not return until some code
called *by* it calls gtk_main_quit, at which point the function continues,
removing the GMainLoop object and returning to the enclosing scope.  if you
longjmp past the stack frame containing a gtk_main call, the mainloop started
there now cannot be killed.

thus, we need to ensure that perl does not jump past the stack frames of calls
to gtk_main and g_signal_emit.  it remains to be seen whether this can be
achieved with GMainLoop code alone, or indeed whether the situation can be
solved without help from the application (e.g., $SIG{__DIE__} = sub
{Gtk2->main_quit}).


-- 
muppet <scott at asofyet dot org>


