An interesting deadlock in ORBit2...

From: Justin Schoeman <justin expertron co za>
To: "orbit-list gnome org" <orbit-list gnome org>,Michael Meeks <michael ximian com>
Subject: An interesting deadlock in ORBit2...
Date: Tue, 18 Nov 2003 11:19:31 +0200

Hi all,

I have just run into an interesting deadlock in ORBit2... In a fairly 
heavily multithreaded environment, after about 4 days of operation, the 
system hangs up in the following state:

Thread 3 (Thread 2051 (LWP 13362)):
#0  0x4017a384 in write () from /lib/libc.so.6
#1  0x402479d4 in __DTOR_END__ () from /home/justin/orbit2/lib/libORBit-2.so.0
#2  0x4020fa89 in giop_thread_push_recv (ent=0xbffff210) at giop.c:568
#3  0x402125f7 in handle_reply (buf=0x80b43e0) at giop-recv-buffer.c:1092
#4  0x40212a81 in giop_connection_handle_input (lcnx=0x80b6598)
     at giop-recv-buffer.c:1271
#5  0x402323b2 in link_connection_io_handler (gioc=0x0, condition=G_IO_IN,
     data=0x80b6598) at linc-connection.c:1256
#6  0x4023417d in link_source_dispatch (source=0x80b63f8,
     callback=0x40232330 <link_connection_io_handler>, user_data=0x80b6598)
     at linc-source.c:56
#7  0x402c4055 in g_main_dispatch (context=0x80bd5f0) at gmain.c:1720
#8  0x402c4f71 in g_main_context_dispatch (context=0x80bd5f0) at gmain.c:2268
#9  0x402c5329 in g_main_context_iterate (context=0x80bd5f0, block=1,
     dispatch=1, self=0x80b59c0) at gmain.c:2349
#10 0x402c5a20 in g_main_loop_run (loop=0x80bd6b0) at gmain.c:2569
#11 0x40230570 in link_io_thread_fn (data=0x0) at linc.c:342
#12 0x402da919 in g_thread_create_proxy (data=0x80b59c0) at gthread.c:551
#13 0x402930ce in pthread_start_thread (arg=0xbf5ffc00) at manager.c:291

Thread 2 (Thread 2049 (LWP 13360)):
#0  0x4017fc74 in poll () from /lib/libc.so.6
#1  0x40292e16 in __pthread_manager (arg=0xe) at manager.c:142

Thread 1 (Thread 1024 (LWP 13359)):
#0  0x400c66b2 in sigsuspend () from /lib/libc.so.6
#1  0x40295df0 in __pthread_wait_for_restart_signal (self=0x4029fa80)
     at pthread.c:969
#2  0x40297e07 in __pthread_alt_lock (lock=0x80b0e70, self=0x0) at restart.h:34
#3  0x40294015 in __pthread_mutex_lock (mutex=0x80b0e60) at mutex.c:120
#4  0x40291fc4 in pthread_cond_wait (cond=0x80b0e80, mutex=0x80b0e60)
     at condvar.c:132
#5  0x40211beb in giop_recv_buffer_get (ent=0xbffff210)
     at giop-recv-buffer.c:704
#6  0x40216464 in ORBit_small_invoke_stub (obj=0x80b68f8, m_data=0x80928e0,
     ret=0x0, args=0xbffff2f0, ctx=0x0, ev=0x80becc8) at orbit-small.c:646
#7  0x40216201 in ORBit_small_invoke_stub_n (object=0x80b68f8,
     methods=0x8092984, index=1, ret=0x0, args=0xbffff2f0, ctx=0x0,
     ev=0x80becc8) at orbit-small.c:571
#8  0x4022ada2 in ORBit_c_stub_invoke (obj=0x80b68f8, methods=0x8092984,
     method_index=1, ret=0x0, args=0xbffff2f0, ctx=0x0, ev=0x80becc8,
     class_id=0, method_offset=8,
     skel_impl=0x8063880 <_ORBIT_skel_small_Trace_TraceInt_Trace>) at poa.c:2595
#9  0x080637fc in Trace_TraceInt_Trace (_obj=0x80b68f8, req=0xbffff440,
     resp=0xbffff43c, ev=0x80becc8) at trace-stubs.c:33
#10 0x08063707 in trace_trace (handle=0x80becc8, req=0xbffff440,
     resp=0xbffff43c) at trace-client.c:23
#11 0x08063c0a in trace_trace_wf (handle=0x80becc8, trans_id=0, trace_code=2,
     id=0x80a8940 "hc_input_core_airtime",
     fmt=0x807fce0 "Tracing for IEN pool thread %d started.")
     at trace-client-wrap.c:104
#12 0x0805b313 in ien_init_fn (id=0) at hc_input_core.c:2196
#13 0x08062a28 in threadpool_init_item (item=0x80b6480, pool=0xbffff580, id=0)
     at threadpool.c:119
#14 0x08062ebe in threadpool_retrieve (pool=0xbffff580) at threadpool.c:236
#15 0x0805c45e in main (argc=1, argv=0xbffff744) at hc_input_core.c:2531
#16 0x400b4280 in __libc_start_main () from /lib/libc.so.6

I am not entirely sure about these two frames

#0  0x4017a384 in write () from /lib/libc.so.6
#1  0x402479d4 in __DTOR_END__ () from /home/justin/orbit2/lib/libORBit-2.so.0

at the top of Thread 3. Frame #1 should actually be a call to 
giop_incoming_signal_T (in fact, I think it is, but the symbol has been 
optimised out in some way?).

The system deadlocks at this point, with all threads permanently stuck 
in that state.

The one thing I do see is that giop_incoming_signal_T is being called 
from giop_thread_push_recv, apparently without the lock being held.  Am 
I correct that this is the problem, or does anybody have any other 
ideas as to what I should look at?
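
For reference, this is roughly the handoff pattern I would expect between 
the I/O thread and a waiting caller. It is only a minimal sketch using 
plain pthreads, not ORBit2's actual code: reply_slot, push_reply, 
wait_for_reply and run_handoff_demo are hypothetical names made up for 
illustration. The signalling side updates the predicate and signals while 
holding the same mutex that the waiter passes to pthread_cond_wait, and 
the waiter re-checks the predicate in a loop.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-request reply slot; not an ORBit2 type. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            ready;  /* predicate, guarded by lock */
    int             reply;  /* payload, guarded by lock */
} reply_slot;

/* I/O-thread side: publish the reply and signal while holding the lock,
 * so the waiter cannot miss the wakeup between its check and its wait. */
static void *push_reply(void *arg)
{
    reply_slot *slot = arg;

    pthread_mutex_lock(&slot->lock);
    slot->reply = 42;
    slot->ready = true;
    pthread_cond_signal(&slot->cond);
    pthread_mutex_unlock(&slot->lock);
    return NULL;
}

/* Caller side: wait on the predicate in a loop; pthread_cond_wait drops
 * the mutex while sleeping and re-acquires it before returning. */
static int wait_for_reply(reply_slot *slot)
{
    int value;

    pthread_mutex_lock(&slot->lock);
    while (!slot->ready)
        pthread_cond_wait(&slot->cond, &slot->lock);
    value = slot->reply;
    pthread_mutex_unlock(&slot->lock);
    return value;
}

/* Runs one handoff between a helper thread and the calling thread and
 * returns the value that was handed over. */
int run_handoff_demo(void)
{
    reply_slot slot;
    pthread_t  io_thread;
    int        rc, value;

    pthread_mutex_init(&slot.lock, NULL);
    pthread_cond_init(&slot.cond, NULL);
    slot.ready = false;
    slot.reply = 0;

    rc = pthread_create(&io_thread, NULL, push_reply, &slot);
    assert(rc == 0);
    (void) rc;

    value = wait_for_reply(&slot);

    pthread_join(io_thread, NULL);
    pthread_cond_destroy(&slot.cond);
    pthread_mutex_destroy(&slot.lock);
    return value;
}
```

Calling run_handoff_demo() from a main() built with -pthread exercises one 
complete handoff. Because the predicate is only read and written under the 
lock, the result does not depend on which thread gets there first.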

Thanks,

Justin