Re: Serious Bonobo Problem for Sun



Hi Brian,

        Thanks for your mail; luckily, I think the problem is not as
acute as it may appear at first glance.

> This problem can be demonstrated using the Bonobo test program
> test-container which is found in the Bonobo source tree under tests.
> First I run oaf-slay to make sure no oaf processes are running (and
> then I verify this by running "ps -ef | grep oaf | grep -v "grep oaf")
> Then I follow these steps:
>
> $ test-container

        test-container is _incredibly_ old, horribly stale code that is
not intended to be used by the public; I have just disabled it in the CVS
build and added some horror-type warnings about basing new code on it.
 
        If you try samples/controls/sample-control-container or
samples/compound_doc/container/sample-container, you will notice that
these clean up nicely - as expected.
  
On Wed, 4 Apr 2001, Brian Cameron wrote:
> Here at Sun we have serious problems running programs which use the
> Bonobo architecture (like Evolution and Nautilus).  When a program
> that uses Bonobo exits (either by exiting normally or by crashing),
> oaf processes are left running in the background.
 
        Ok, so, there are several problems here. The first problem is
built into the referencing scheme. An object has a reference count, but
this contains no concept of ownership whatsoever, i.e. it is impossible
to tell who owns the 5 references it has. Consequently, if someone comes
along, references the server, and then crashes or simply leaks the
reference, there is no possible way to detect this.
 
        Looked at from another angle, a process can be serving 10 other
processes with controls. Should 1 process die, it is not correct to go
round killing all the processes that it communicated with - even if this
were possible.
 
        Consequently, in the case of pathological component failure, we
will get process leaks. The only solution to this is to minimise the
likelihood of component failure.
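
        To illustrate the ownership problem, here is a minimal,
hypothetical C sketch (plain C, not the real Bonobo/CORBA API): a bare
reference count records only how many references exist, never who holds
them, so a reference leaked by a crashed client looks exactly like a
live one.

/* Hypothetical sketch; NOT the real Bonobo object API. */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
        int ref_count;  /* how many references exist - but not whose */
} Server;

static Server *
server_new (void)
{
        Server *s = malloc (sizeof (Server));
        s->ref_count = 1;
        return s;
}

static void
server_ref (Server *s)
{
        /* No record is kept of which client took this reference. */
        s->ref_count++;
}

static void
server_unref (Server *s)
{
        if (--s->ref_count == 0) {
                printf ("last reference dropped; server exits\n");
                free (s);
        }
}

int
main (void)
{
        Server *s = server_new ();

        server_ref (s);   /* client A */
        server_ref (s);   /* client B ... which then crashes */

        server_unref (s); /* client A cleans up */
        server_unref (s); /* the launching process cleans up */

        /* ref_count is stuck at 1: nothing ties the remaining
         * reference to the dead client B, so the server can never
         * know it is safe to exit. */
        printf ("leaked ref_count = %d\n", s->ref_count);
        return 0;
}

        The same anonymity is why, conversely, a server cannot simply
die with its first client: any of its remaining references may belong
to a perfectly healthy process.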
 
>  This may not be a serious problem for single user machines, but on
> multi-user servers this quickly becomes critical.  We have seen
> multi-user servers with hundreds and sometimes thousands of oaf
> processes left behind,
 
        Now, I suspect that the more telling problem is that AFAIK oafd
doesn't time out after a while and shut down - and oafd can chew some
serious resources, ~1MB+ per process on my system.
          
        So, I suspect that 'oafd' proliferation is the real problem -
could this be the case? If so, we need to ensure that it shuts itself
down after a while [ this may already have been done ].

> We notice that when programs that use the Bonobo architecture crash,
> the oaf processes are sometimes left in an unusable state.  Trying to
> re-run the program causes the program to immediately crash or behave
> strangely.  Running oaf-slay and then restarting corrects this
> problem.

        oaf-slay is a program that causes a discontinuity in your
component world. When you run it, you inform the system that although
idle references may still exist, this is not due to idle or slow TCP
connections, that there are no clients currently using these services,
and thus everything can safely be razed.
 
> In both cases, the Bonobo architecture should be robust enough to
> handle the situation.  When the program that launched the oaf process
> exits (whether by choice or by crash), the oaf processes should
> recognize this and quit.

        I agree that oafd should time out after a while and exit.
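
        Something like the following idle-timeout pattern would do;
this is a minimal sketch in plain C with invented names and an
arbitrary 2-minute threshold, not what oafd actually does (or will do):

/* Hypothetical idle-timeout sketch; not oafd's real code. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define IDLE_TIMEOUT 120  /* seconds - arbitrary, for illustration */

static time_t last_activity;

/* Would be called from every incoming request / registration. */
static void
note_activity (void)
{
        last_activity = time (NULL);
}

/* Checked periodically from the main loop. */
static void
maybe_shutdown (void)
{
        if (time (NULL) - last_activity > IDLE_TIMEOUT) {
                fprintf (stderr, "idle for %d seconds, exiting\n",
                         IDLE_TIMEOUT);
                exit (0);
        }
}

int
main (void)
{
        note_activity ();
        for (;;) {
                /* ... serve requests, each calling note_activity() ... */
                sleep (5);
                maybe_shutdown ();
        }
}

        In a GLib-based daemon like oafd the periodic check would more
naturally hang off a g_timeout_add () in the main loop rather than a
sleep loop, but the principle is identical.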
 
> It would be very useful if we could ship a version of Bonobo without
> this problem with the Sun version of Gnome 1.4.  If it is possible to
> correct this problem in the very near future, then perhaps we could
> explore the possibility of doing this.  Any other ideas/suggestions
> would be appreciated.

        It should in fact be relatively simple, a few hacks in oaf/oafd,
but perhaps it has been done already and/or is underway; Maciej?

	Since we'll all be at GUADEC for a while, it will probably look
like we're ignoring you, but we're really not ... :-)

        Regards,

                Michael.

-- 
 mmeeks@gnu.org  <><, Pseudo Engineer, itinerant idiot




