An interesting deadlock in ORBit2...
- From: Justin Schoeman <justin expertron co za>
- To: "orbit-list gnome org" <orbit-list gnome org>,Michael Meeks <michael ximian com>
- Subject: An interesting deadlock in ORBit2...
- Date: Tue, 18 Nov 2003 11:19:31 +0200
Hi all,
I have just run into an interesting deadlock in ORBit2... In a fairly
heavily multithreaded environment, after about 4 days of operation, the
system hangs up in the following state:
Thread 3 (Thread 2051 (LWP 13362)):
#0 0x4017a384 in write () from /lib/libc.so.6
#1 0x402479d4 in __DTOR_END__ () from
/home/justin/orbit2/lib/libORBit-2.so.0
#2 0x4020fa89 in giop_thread_push_recv (ent=0xbffff210) at giop.c:568
#3 0x402125f7 in handle_reply (buf=0x80b43e0) at giop-recv-buffer.c:1092
#4 0x40212a81 in giop_connection_handle_input (lcnx=0x80b6598)
at giop-recv-buffer.c:1271
#5 0x402323b2 in link_connection_io_handler (gioc=0x0, condition=G_IO_IN,
data=0x80b6598) at linc-connection.c:1256
#6 0x4023417d in link_source_dispatch (source=0x80b63f8,
callback=0x40232330 <link_connection_io_handler>, user_data=0x80b6598)
at linc-source.c:56
#7 0x402c4055 in g_main_dispatch (context=0x80bd5f0) at gmain.c:1720
#8 0x402c4f71 in g_main_context_dispatch (context=0x80bd5f0) at
gmain.c:2268
#9 0x402c5329 in g_main_context_iterate (context=0x80bd5f0, block=1,
dispatch=1, self=0x80b59c0) at gmain.c:2349
#10 0x402c5a20 in g_main_loop_run (loop=0x80bd6b0) at gmain.c:2569
#11 0x40230570 in link_io_thread_fn (data=0x0) at linc.c:342
#12 0x402da919 in g_thread_create_proxy (data=0x80b59c0) at gthread.c:551
#13 0x402930ce in pthread_start_thread (arg=0xbf5ffc00) at manager.c:291
Thread 2 (Thread 2049 (LWP 13360)):
#0 0x4017fc74 in poll () from /lib/libc.so.6
#1 0x40292e16 in __pthread_manager (arg=0xe) at manager.c:142
Thread 1 (Thread 1024 (LWP 13359)):
#0 0x400c66b2 in sigsuspend () from /lib/libc.so.6
#1 0x40295df0 in __pthread_wait_for_restart_signal (self=0x4029fa80)
at pthread.c:969
#2 0x40297e07 in __pthread_alt_lock (lock=0x80b0e70, self=0x0) at
restart.h:34
#3 0x40294015 in __pthread_mutex_lock (mutex=0x80b0e60) at mutex.c:120
#4 0x40291fc4 in pthread_cond_wait (cond=0x80b0e80, mutex=0x80b0e60)
at condvar.c:132
#5 0x40211beb in giop_recv_buffer_get (ent=0xbffff210)
at giop-recv-buffer.c:704
#6 0x40216464 in ORBit_small_invoke_stub (obj=0x80b68f8, m_data=0x80928e0,
ret=0x0, args=0xbffff2f0, ctx=0x0, ev=0x80becc8) at orbit-small.c:646
#7 0x40216201 in ORBit_small_invoke_stub_n (object=0x80b68f8,
methods=0x8092984, index=1, ret=0x0, args=0xbffff2f0, ctx=0x0,
ev=0x80becc8) at orbit-small.c:571
#8 0x4022ada2 in ORBit_c_stub_invoke (obj=0x80b68f8, methods=0x8092984,
method_index=1, ret=0x0, args=0xbffff2f0, ctx=0x0, ev=0x80becc8,
class_id=0, method_offset=8,
skel_impl=0x8063880 <_ORBIT_skel_small_Trace_TraceInt_Trace>) at
poa.c:2595
#9 0x080637fc in Trace_TraceInt_Trace (_obj=0x80b68f8, req=0xbffff440,
resp=0xbffff43c, ev=0x80becc8) at trace-stubs.c:33
#10 0x08063707 in trace_trace (handle=0x80becc8, req=0xbffff440,
resp=0xbffff43c) at trace-client.c:23
#11 0x08063c0a in trace_trace_wf (handle=0x80becc8, trans_id=0,
trace_code=2,
id=0x80a8940 "hc_input_core_airtime",
fmt=0x807fce0 "Tracing for IEN pool thread %d started.")
at trace-client-wrap.c:104
#12 0x0805b313 in ien_init_fn (id=0) at hc_input_core.c:2196
#13 0x08062a28 in threadpool_init_item (item=0x80b6480, pool=0xbffff580,
id=0)
at threadpool.c:119
#14 0x08062ebe in threadpool_retrieve (pool=0xbffff580) at threadpool.c:236
#15 0x0805c45e in main (argc=1, argv=0xbffff744) at hc_input_core.c:2531
#16 0x400b4280 in __libc_start_main () from /lib/libc.so.6
I am not entirely sure about these two frames at the top of Thread 3:
#0 0x4017a384 in write () from /lib/libc.so.6
#1 0x402479d4 in __DTOR_END__ () from
/home/justin/orbit2/lib/libORBit-2.so.0
The __DTOR_END__ frame should actually be a call to
giop_incoming_signal_T (in fact, I think it is, but the symbol has been
optimised out in some way?).
The system deadlocks at this point, with all threads permanently stuck
in that state.
The one thing I do see is that giop_incoming_signal_T is being called
from giop_thread_push_recv, apparently without the lock being held. Am
I correct that this is the problem, or does anybody have any other
ideas as to what I should look at?
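
To make the question concrete, here is a minimal sketch of the locking
pattern I would expect around a condition variable: the signalling side
updates the shared state and signals while holding the same mutex that
the waiting side passes to pthread_cond_wait. This is plain pthreads and
not ORBit2 code; demo_queue, demo_push_reply, demo_wait_reply and
demo_io_thread are hypothetical names made up for illustration, and I
am only assuming that the real code is meant to follow this shape.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-thread receive queue (not ORBit2's real type). */
typedef struct {
    pthread_mutex_t lock;        /* protects reply_ready */
    pthread_cond_t  incoming;    /* signalled when a reply arrives */
    bool            reply_ready; /* the predicate the waiter checks */
} demo_queue;

static void demo_queue_init(demo_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->incoming, NULL);
    q->reply_ready = false;
}

/* I/O thread side: publish the reply and signal with the lock held. */
static void demo_push_reply(demo_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->reply_ready = true;
    pthread_cond_signal(&q->incoming);
    pthread_mutex_unlock(&q->lock);
}

/* Caller side: wait for the reply; the loop guards against spurious wakeups. */
static void demo_wait_reply(demo_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (!q->reply_ready)
        pthread_cond_wait(&q->incoming, &q->lock);
    q->reply_ready = false;      /* consume the reply */
    pthread_mutex_unlock(&q->lock);
}

/* Thread entry point that plays the role of the I/O thread. */
static void *demo_io_thread(void *arg)
{
    demo_push_reply((demo_queue *)arg);
    return NULL;
}
```

Starting demo_io_thread with pthread_create and then calling
demo_wait_reply from the main thread should return once the reply has
been pushed, whichever of the two threads runs first, because the
predicate is only ever read or written under the lock.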
Thanks,
Justin