Re: An interesting deadlock in ORBit2...
- From: Michael Meeks <michael ximian com>
- To: Justin Schoeman <justin expertron co za>
- Cc: orbit <orbit-list gnome org>
- Subject: Re: An interesting deadlock in ORBit2...
- Date: Sat, 06 Dec 2003 01:30:10 +0000
Hi Justin,
On Tue, 2003-11-18 at 09:19, Justin Schoeman wrote:
> I have just run into an interesting deadlock in ORBit2... In a fairly
> heavily multithreaded environment, after about 4 days of operation, the
> system hangs up in the following state:
So, I just had a look at this:
> Thread 3 (Thread 2051 (LWP 13362)):
> #0 0x4017a384 in write () from /lib/libc.so.6
> #1 0x402479d4 in __DTOR_END__ () from
> /home/justin/orbit2/lib/libORBit-2.so.0
> #2 0x4020fa89 in giop_thread_push_recv (ent=0xbffff210) at giop.c:568
This seems likely to be:
giop_incoming_signal_T with wakeup_mainloop inlined inside it (I
guess).
wakeup_mainloop spins trying to write an 'A' to the wakeup pipe of the
'main context' - we should really be using something associated with the
'wake_context' itself here I think; but that's not the problem.
Unfortunately, if the pipe buffer is full we spin indefinitely - which is
pretty silly, since if we get EAGAIN, we know that the mainloop is
already being woken up :-)
OTOH - the cause of the problem for you looks to be that you're not
running the glib mainloop in the first thread; either through
CORBA_ORB_run or a g_main_loop_do_foo type thing.
> #12 0x0805b313 in ien_init_fn (id=0) at hc_input_core.c:2196
> #13 0x08062a28 in threadpool_init_item (item=0x80b6480, pool=0xbffff580,
> id=0)
> at threadpool.c:119
> #14 0x08062ebe in threadpool_retrieve (pool=0xbffff580) at threadpool.c:236
> #15 0x0805c45e in main (argc=1, argv=0xbffff744) at hc_input_core.c:2531
> #16 0x400b4280 in __libc_start_main () from /lib/libc.so.6
This main thread looks slightly odd to me :-)
> The system deadlocks at this point, with all threads permanently stuck
> in that state.
Sure; it's spinning on EAGAIN. I think I'll stop it retrying on EAGAIN
and have it retry only on EINTR - that may fix the immediate problem for you.
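To make the intended behaviour concrete, here is a minimal sketch in C.
It is not the ORBit2 source: the names wakeup_write and
wakeup_full_pipe_demo are made up for illustration, and it assumes the
write end of the wakeup pipe is non-blocking. The write is retried only
on EINTR; EAGAIN is treated as "a wakeup is already pending".

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not ORBit2 code): poke a non-blocking wakeup pipe.
 * Returns 0 if a wakeup is pending afterwards, -1 on a real error. */
int wakeup_write(int fd)
{
    const char c = 'A';
    ssize_t res;

    do {
        res = write(fd, &c, sizeof c);
    } while (res < 0 && errno == EINTR);   /* retry only when interrupted */

    if (res == 1)
        return 0;                          /* byte queued */
    if (res < 0 && errno == EAGAIN)
        return 0;                          /* pipe full: unread wakeups remain */
    return -1;                             /* anything else is a real failure */
}

/* Hypothetical demo: fill a pipe that nobody reads, then poke it again.
 * Returns what wakeup_write reports for the full pipe. */
int wakeup_full_pipe_demo(void)
{
    int fds[2];
    const char c = 'A';
    int flags, rc;

    if (pipe(fds) != 0)
        return -1;

    flags = fcntl(fds[1], F_GETFL);
    if (flags < 0 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* One-byte writes succeed until the pipe buffer is full. */
    for (;;) {
        if (write(fds[1], &c, 1) < 0 && errno != EINTR)
            break;
    }

    rc = wakeup_write(fds[1]);             /* returns instead of spinning */
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

Calling wakeup_full_pipe_demo() exercises the case from the backtrace:
the pipe is full because nothing drains it, and the write returns
immediately rather than looping.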
> The one thing I do see is that giop_incoming_signal_T is being called
> from giop_thread_push_recv, apparently without the lock being held. I
> am correct that this is the problem, or does anybody else have any other
> ideas as to what I should look at?
I don't think that's an issue - we can signal the condition without
problems and using the wakeup is fairly rare and safe anyhow.
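As a rough illustration of why signalling without the lock is fine, here
is a sketch using POSIX threads rather than the GLib primitives ORBit2
uses (that substitution, and every name in it, is an assumption made for
the example). The predicate is still changed under the mutex; only the
signal itself is sent after unlocking, which POSIX permits.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical state for the example, not taken from ORBit2. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  demo_cond = PTHREAD_COND_INITIALIZER;
static int demo_ready = 0;

/* Waiter: re-checks the predicate under the lock, so a signal sent
 * before it starts waiting is not lost. */
void *demo_waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    while (!demo_ready)
        pthread_cond_wait(&demo_cond, &demo_lock);
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Sets the predicate under the lock, then signals with the lock released.
 * Returns 0 once the waiter has woken up and exited. */
int demo_signal_without_lock(void)
{
    pthread_t t;

    if (pthread_create(&t, NULL, demo_waiter, NULL) != 0)
        return -1;

    pthread_mutex_lock(&demo_lock);
    demo_ready = 1;                     /* predicate changed under the lock */
    pthread_mutex_unlock(&demo_lock);

    pthread_cond_signal(&demo_cond);    /* lock not held here */

    if (pthread_join(t, NULL) != 0)
        return -1;
    return 0;
}
```

The waiter cannot miss the wakeup because it tests demo_ready while
holding the mutex before it ever sleeps.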
HTH,
Michael.
--
michael@ximian.com <><, Pseudo Engineer, itinerant idiot