Re: An interesting deadlock in ORBit2...
OK: here is the first deadlock. It seems the main loop grabbed the
response, or something similarly strange: linc is explicitly iterating
the main loop, but under a different context.

(gdb) t a a bt

Thread 3 (Thread 1026 (LWP 25541)):
#0  0x40180c74 in poll () from /lib/libc.so.6
#1  0x402c6d60 in g_main_context_poll (context=0x80a1a48, timeout=-1,
     priority=2147483647, fds=0x80a8b60, n_fds=4) at gmain.c:2667
#2  0x402c62fe in g_main_context_iterate (context=0x80a1a48, block=1,
     dispatch=1, self=0x80a8c88) at gmain.c:2344
#3  0x402c6a20 in g_main_loop_run (loop=0x80a4488) at gmain.c:2569
#4  0x0805ba58 in _corba_client_main_loop_thread_handler (arg=0x0)
     at corba_common.c:107
#5  0x402940ce in pthread_start_thread (arg=0xbf7ffc00) at manager.c:291

Thread 2 (Thread 2049 (LWP 25540)):
#0  0x40180c74 in poll () from /lib/libc.so.6
#1  0x40293e16 in __pthread_manager (arg=0xa) at manager.c:142

Thread 1 (Thread 1024 (LWP 25539)):
#0  0x40180c74 in poll () from /lib/libc.so.6
#1  0x402c6d60 in g_main_context_poll (context=0x80a0e68, timeout=-1,
     priority=2147483647, fds=0x80a8ab8, n_fds=3) at gmain.c:2667
#2  0x402c62fe in g_main_context_iterate (context=0x80a0e68, block=1,
     dispatch=1, self=0x809d2c8) at gmain.c:2344
#3  0x402c64b0 in g_main_context_iteration (context=0x80a0e68, may_block=1)
     at gmain.c:2408
#4  0x40231363 in link_main_iteration (block_for_reply=1) at linc.c:231
#5  0x40212c0a in giop_recv_buffer_get (ent=0xbffff400)
     at giop-recv-buffer.c:714
#6  0x40217469 in ORBit_small_invoke_stub (obj=0x80a8f50, m_data=0x807f9e0,
     ret=0x0, args=0xbffff4e0, ctx=0x0, ev=0x80a4210) at orbit-small.c:649
#7  0x40217211 in ORBit_small_invoke_stub_n (object=0x80a8f50,
     methods=0x807fa84, index=1, ret=0x0, args=0xbffff4e0, ctx=0x0,
     ev=0x80a4210) at orbit-small.c:575
#8  0x4022bdb2 in ORBit_c_stub_invoke (obj=0x80a8f50, methods=0x807fa84,
     method_index=1, ret=0x0, args=0xbffff4e0, ctx=0x0, ev=0x80a4210,
     class_id=0, method_offset=8,
     skel_impl=0x80596f0 <_ORBIT_skel_small_Trace_TraceInt_Trace>) at poa.c:2595
#9  0x0805966c in Trace_TraceInt_Trace (_obj=0x80a8f50, req=0xbffff530,
     resp=0xbffff52c, ev=0x80a4210) at trace-stubs.c:33
#10 0x08059577 in trace_trace (handle=0x80a4210, req=0xbffff530,
     resp=0xbffff52c) at trace-client.c:23
#11 0x08059851 in trace_trace_w (handle=0x80a4210, trans_id=0, trace_code=2,
     txt=0x806e986 "Could not receive new sms.",
     id=0x8095f60 "gsm_input_core_all") at trace-client-wrap.c:37
#12 0x08051d63 in _gsm_recv (sms=0x80a8fa0, to=1) at gsm_input_core.c:282
#13 0x08057e57 in main (argc=2, argv=0xbffff784) at gsm_input_core.c:2084
#14 0x400b5280 in __libc_start_main () from /lib/libc.so.6

The OOB messages do not result in a core dump, but are they perhaps also
a consequence of the two separate main-loop iterations?

-justin

Michael Meeks wrote:
> Hi Justin,
> 
> On Tue, 2003-12-09 at 07:21, Justin Schoeman wrote:
> 
>>I will have to look into starting a separate glib mainloop.  The 
>>deadlock still occurs in 2.9.2... (the call trace is still the same as 
>>before).
> 
> 
> 	Ok; I just went over that again, and since we know the client is
> polling for us we should never poke the wakeup pipe; I also set NONBLOCK
> on the pipe as well ( the missing piece from the last fix ).
> 
> 	So; HEAD should fix that nicely now, if you update.
> 
> 
>>Ouch. I just tried creating a glib mainloop in my clients, and the 
>>results are fairly ugly... Sporadically hangs up waiting for client 
>>responses, sometimes generates errors like 'OOB incoming msg header 
>>data', and sometimes causes memory corruption... Unfortunately these
>>are rather unpredictable errors.  The memory corruption never seems to
>>happen under valgrind, and the OOB messages never under gdb.  Very
>>difficult to track down exactly what is going wrong here :-(
> 
> 
> 	Wow - that's extraordinarily vicious; if you run with --g-fatal-errors
> (there is some way to get them to bomb out), perhaps we can get some
> core dumps that we can use to fix this.
> 
> 	Thanks,
> 
> 		Michael.
> 
