Re: bug (?) in libbonobo that is breaking evolution



On 9/7/01 8:27 PM, Jon Trowbridge at trow ximian com wrote:

> In the evolution addressbook, we've been plagued by 100%-cpu-sucking
> lock-ups.  I've traced the problem to a race condition in which horrible
> things happen when impl_Bonobo_Control_realize gets called at a
> particularly inopportune time.[1]
> 
> We are spinlocking somewhere inside of the gtk_events_pending() call in:
> 
>  if (!control->priv->is_local) {
>   while (gtk_events_pending ())
>    gtk_main_iteration ();
>   gdk_flush ();
>  }
> 
> (this is from bonobo/bonobo-control.c, around line 460)
> 
> at the bottom of a long, convoluted stack trace.  If I comment this code
> out, the problem disappears without any obvious adverse effects.

We ran into what looked like the same problem in Nautilus. Although I
have had no luck convincing Michael of this, I think it is one symptom of a
general problem: race conditions caused by badly timed incoming CORBA
calls. It's incredibly difficult to write code under a model that allows any
incoming CORBA call to arrive while any outgoing CORBA call is in progress,
although I understand that it's hard to design it any other way.
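To make the hazard concrete, here is a minimal plain-C model. It is not ORBit or Bonobo code, and every name in it is hypothetical. `outgoing_call` stands in for a blocking CORBA invocation that, like the `gtk_events_pending ()` loop quoted above, services pending incoming requests while it waits. The incoming request therefore runs in the middle of the caller's two-step update and sees half-updated state.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ITEMS 8

/* Hypothetical component whose state must stay consistent across calls. */
typedef struct {
    int items[MAX_ITEMS];
    int count;      /* number of valid entries in items */
    int updating;   /* nonzero while an update is only half done */
} Component;

typedef void (*IncomingHandler)(Component *c);

/* At most one pending incoming request, to keep the model small. */
static IncomingHandler pending_incoming = NULL;

/* Set if an incoming handler ever observes a half-updated component. */
static int saw_inconsistent_state = 0;

/* An incoming request that merely inspects the component. */
static void incoming_inspect(Component *c)
{
    if (c->updating)
        saw_inconsistent_state = 1;
}

/* Stand-in for a blocking outgoing call: while "waiting for the reply"
 * it services whatever incoming request has arrived, much as the
 * gtk_events_pending () loop services pending events. */
static void outgoing_call(Component *c)
{
    if (pending_incoming != NULL) {
        IncomingHandler h = pending_incoming;
        pending_incoming = NULL;
        h(c);                       /* runs re-entrantly, mid-operation */
    }
}

/* A two-step update with an outgoing call in between. */
static void add_item(Component *c, int value)
{
    assert(c->count < MAX_ITEMS);
    c->updating = 1;
    c->items[c->count] = value;     /* step 1 */
    outgoing_call(c);               /* an incoming call can run here */
    c->count++;                     /* step 2 */
    c->updating = 0;
}
```

Setting `pending_incoming = incoming_inspect` before calling `add_item` makes the handler run between step 1 and step 2, so it observes `updating == 1` even though nothing in `add_item` itself looks wrong.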

I can't remember which of our many workarounds was the one that made this go
away for Nautilus components. It's probably the way we changed the
NautilusView class so that we defer actually doing any work until idle time,
but I'm not sure. See nautilus-view.c and nautilus-idle-queue.c for details
of that one.
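Deferring work until idle time can be sketched without GLib as a small FIFO queue. An incoming request only records what needs doing. The queue is drained once control is back in the main loop, when none of our own outgoing calls is in progress. This is a hypothetical illustration of the idea, not the code in nautilus-view.c or nautilus-idle-queue.c. In a GTK program the drain function would typically be scheduled with GLib's `g_idle_add ()`.

```c
#include <assert.h>
#include <stddef.h>

#define IDLE_QUEUE_MAX 16

typedef void (*IdleFunc)(void *data);

/* Hypothetical fixed-size FIFO of deferred work items. */
typedef struct {
    IdleFunc funcs[IDLE_QUEUE_MAX];
    void    *data[IDLE_QUEUE_MAX];
    size_t   head;   /* index of the next item to run */
    size_t   tail;   /* index where the next item will be stored */
} IdleQueue;

/* Called from an incoming request handler: record the work, do nothing else.
 * Returns 0 if the queue is full. */
static int idle_queue_add(IdleQueue *q, IdleFunc func, void *data)
{
    if (q->tail == IDLE_QUEUE_MAX)
        return 0;
    q->funcs[q->tail] = func;
    q->data[q->tail] = data;
    q->tail++;
    return 1;
}

/* Called only from the main loop when it is idle. */
static void idle_queue_run(IdleQueue *q)
{
    while (q->head < q->tail) {
        size_t i = q->head++;
        q->funcs[i](q->data[i]);  /* work may enqueue more; the loop picks it up */
    }
    q->head = q->tail = 0;        /* reset for reuse */
}

/* Example deferred work item: log which request it came from. */
static int log_buf[IDLE_QUEUE_MAX];
static size_t log_len = 0;

static void log_request(void *data)
{
    log_buf[log_len++] = *(int *)data;
}
```

Because `idle_queue_add` never does the real work, it is safe to call at any moment, including while this process is blocked in an outgoing call. All state changes happen later, in `idle_queue_run`, in the order the requests arrived.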

In my opinion, the best way to handle this is some kind of change in how
ORBit handles incoming calls from another process. It seems to me that
there's no reason we need to handle this particular call (you don't say
which it is) from the client process until the server process gets back to
its main loop. But I can't say this is true for all types of incoming calls.
I'm pretty sure it's true for all the high-level calls in the UI part of
Bonobo, but not for low-level ones like ref and unref, so that's where we'd
run into trouble, I think.
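The proposal can be pictured as a dispatch policy inside the ORB. Each incoming call is classified. The few that cannot wait run at once, and the rest are queued until the server is back in its main loop. This is a hypothetical sketch, not ORBit's API: the three call kinds and the `Server` struct are invented for illustration. Following the post, it treats ref and unref as calls that may not be safe to postpone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kinds of incoming call. */
typedef enum { CALL_REF, CALL_UNREF, CALL_UI_REALIZE } CallKind;

typedef struct {
    int refcount;
    int realized;
    CallKind deferred[8];   /* high-level calls waiting for the main loop */
    size_t n_deferred;
} Server;

/* Assumption taken from the post: lifetime calls such as ref and unref
 * run immediately; high-level UI calls can wait. */
static int must_run_now(CallKind kind)
{
    return kind == CALL_REF || kind == CALL_UNREF;
}

static void run_call(Server *s, CallKind kind)
{
    switch (kind) {
    case CALL_REF:        s->refcount++; break;
    case CALL_UNREF:      s->refcount--; break;
    case CALL_UI_REALIZE: s->realized = 1; break;
    }
}

/* Entry point for a call arriving from another process, possibly while
 * this process is itself blocked in an outgoing call. */
static void dispatch_incoming(Server *s, CallKind kind)
{
    if (must_run_now(kind)) {
        run_call(s, kind);
    } else {
        assert(s->n_deferred < sizeof s->deferred / sizeof s->deferred[0]);
        s->deferred[s->n_deferred++] = kind;
    }
}

/* Called once the server is back in its main loop. */
static void dispatch_deferred(Server *s)
{
    for (size_t i = 0; i < s->n_deferred; i++)
        run_call(s, s->deferred[i]);
    s->n_deferred = 0;
}
```

The hard part the post points to sits in `must_run_now`: any call kind that returns 1 there can still arrive at an inopportune time, so the policy narrows the problem rather than removing it.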

If we don't change the basic model, we'll just be fighting fires like this
one from now on. (Only two major applications use Bonobo heavily, and the
developers of both have run into this problem.) I think that this problem
and the orphaned-process problem would prevent me from starting a new Bonobo
application until both are resolved.

> I'm a bit confused by the fact that is_local must be false when this
> happens, since my factory and the embedding app are always both running
> on the same machine.  Is this the meta-bug (i.e.
> bonobo_gtk_widget_from_x11_id is failing to correctly distinguish
> between the local and the remote case), or have I totally misunderstood
> what is going on here?

The is_local flag in this case doesn't mean "same machine"; it means
"within the same process".

    -- Darin




