Re: An interesting deadlock in ORBit2...
- From: Justin Schoeman <justin expertron co za>
- To: Michael Meeks <michael ximian com>
- Cc: orbit <orbit-list gnome org>
- Subject: Re: An interesting deadlock in ORBit2...
- Date: Wed, 10 Dec 2003 16:51:16 +0200
OK: First deadlock. It seems the main loop grabbed the response, or
something strange... linc is explicitly iterating the main loop, but
under a different context.
(gdb) t a a bt
Thread 3 (Thread 1026 (LWP 25541)):
#0 0x40180c74 in poll () from /lib/libc.so.6
#1 0x402c6d60 in g_main_context_poll (context=0x80a1a48, timeout=-1,
priority=2147483647, fds=0x80a8b60, n_fds=4) at gmain.c:2667
#2 0x402c62fe in g_main_context_iterate (context=0x80a1a48, block=1,
dispatch=1, self=0x80a8c88) at gmain.c:2344
#3 0x402c6a20 in g_main_loop_run (loop=0x80a4488) at gmain.c:2569
#4 0x0805ba58 in _corba_client_main_loop_thread_handler (arg=0x0)
at corba_common.c:107
#5 0x402940ce in pthread_start_thread (arg=0xbf7ffc00) at manager.c:291
Thread 2 (Thread 2049 (LWP 25540)):
#0 0x40180c74 in poll () from /lib/libc.so.6
#1 0x40293e16 in __pthread_manager (arg=0xa) at manager.c:142
Thread 1 (Thread 1024 (LWP 25539)):
#0 0x40180c74 in poll () from /lib/libc.so.6
#1 0x402c6d60 in g_main_context_poll (context=0x80a0e68, timeout=-1,
priority=2147483647, fds=0x80a8ab8, n_fds=3) at gmain.c:2667
#2 0x402c62fe in g_main_context_iterate (context=0x80a0e68, block=1,
dispatch=1, self=0x809d2c8) at gmain.c:2344
#3 0x402c64b0 in g_main_context_iteration (context=0x80a0e68, may_block=1)
at gmain.c:2408
#4 0x40231363 in link_main_iteration (block_for_reply=1) at linc.c:231
#5 0x40212c0a in giop_recv_buffer_get (ent=0xbffff400)
at giop-recv-buffer.c:714
#6 0x40217469 in ORBit_small_invoke_stub (obj=0x80a8f50, m_data=0x807f9e0,
ret=0x0, args=0xbffff4e0, ctx=0x0, ev=0x80a4210) at orbit-small.c:649
#7 0x40217211 in ORBit_small_invoke_stub_n (object=0x80a8f50,
methods=0x807fa84, index=1, ret=0x0, args=0xbffff4e0, ctx=0x0,
ev=0x80a4210) at orbit-small.c:575
#8 0x4022bdb2 in ORBit_c_stub_invoke (obj=0x80a8f50, methods=0x807fa84,
method_index=1, ret=0x0, args=0xbffff4e0, ctx=0x0, ev=0x80a4210,
class_id=0, method_offset=8,
skel_impl=0x80596f0 <_ORBIT_skel_small_Trace_TraceInt_Trace>)
at poa.c:2595
#9 0x0805966c in Trace_TraceInt_Trace (_obj=0x80a8f50, req=0xbffff530,
resp=0xbffff52c, ev=0x80a4210) at trace-stubs.c:33
#10 0x08059577 in trace_trace (handle=0x80a4210, req=0xbffff530,
resp=0xbffff52c) at trace-client.c:23
#11 0x08059851 in trace_trace_w (handle=0x80a4210, trans_id=0,
trace_code=2,
txt=0x806e986 "Could not receive new sms.",
id=0x8095f60 "gsm_input_core_all") at trace-client-wrap.c:37
#12 0x08051d63 in _gsm_recv (sms=0x80a8fa0, to=1) at gsm_input_core.c:282
#13 0x08057e57 in main (argc=2, argv=0xbffff784) at gsm_input_core.c:2084
#14 0x400b5280 in __libc_start_main () from /lib/libc.so.6
The OOB messages do not result in a core dump, but are they likely also
a result of the two separate main loop iterations?
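
A minimal sketch of the suspected failure mode, using only POSIX calls and
no ORBit2 or glib code. It assumes two waiters share one descriptor: the
first one to wake reads the reply, and the second finds nothing left. The
names wait_readable() and second_waiter_starves() are hypothetical, and the
100 ms timeout is only there so the sketch terminates; in the trace above
both polls use timeout=-1, so a starved waiter would block forever.

```c
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    int r = poll(&p, 1, timeout_ms);

    if (r < 0)
        return -1;
    return (r > 0 && (p.revents & POLLIN)) ? 1 : 0;
}

/* Hypothetical demonstration: one reply, two waiters on the same fd.
 * Returns 1 if the first waiter got the reply and the second timed out,
 * 0 if that did not happen, -1 on setup failure. */
static int second_waiter_starves(void)
{
    int fds[2];
    char reply = 'R';
    char buf;
    int first_got_reply;
    int second_status;

    if (pipe(fds) == -1)
        return -1;

    /* A single "reply" arrives on the shared descriptor. */
    if (write(fds[1], &reply, 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Waiter A (standing in for the extra main loop) consumes it. */
    first_got_reply = wait_readable(fds[0], 100) == 1 &&
                      read(fds[0], &buf, 1) == 1;

    /* Waiter B (standing in for the invoking thread) now sees nothing. */
    second_status = wait_readable(fds[0], 100);

    close(fds[0]);
    close(fds[1]);
    return (first_got_reply && second_status == 0) ? 1 : 0;
}
```

Calling second_waiter_starves() is expected to return 1 on a POSIX system:
the reply is gone by the time the second waiter polls.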
-justin
Michael Meeks wrote:
> Hi Justin,
>
> On Tue, 2003-12-09 at 07:21, Justin Schoeman wrote:
>
>>I will have to look into starting a separate glib mainloop. The
>>deadlock still occurs in 2.9.2... (the call trace is still the same as
>>before).
>
>
> Ok; I just went over that again, and since we know the client is
> polling for us we should never poke the wakeup pipe; I also set NONBLOCK
> on the pipe (the missing piece from the last fix).
>
> So; HEAD should fix that nicely now, if you update.
>
>
>>Ouch. I just tried creating a glib mainloop in my clients, and the
>>results are fairly ugly... Sporadically hangs up waiting for client
>>responses, sometimes generates errors like 'OOB incoming msg header
>>data', and sometimes causes memory corruption... Unfortunately these
>>are rather unpredictable errors. The memory corruption never seems to
>>happen under valgrind, and the OOB messages never appear under gdb. Very
>>difficult to track down exactly what is going wrong here :-(
>
>
> Wow - that's extraordinarily vicious; if you run with --g-fatal-warnings
> (there is some way to get them to bomb out), perhaps we can get some
> core dumps that we can use to fix this.
>
> Thanks,
>
> Michael.
>
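
The NONBLOCK change Michael mentions for the wakeup pipe can be sketched
roughly as follows. This is not the actual linc code: set_nonblocking(),
poke_wakeup_pipe() and fill_pipe_without_blocking() are hypothetical names,
and the sketch only assumes a plain POSIX pipe. The point is that once
O_NONBLOCK is set, a poke on a full pipe fails with EAGAIN instead of
blocking the writer.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Set O_NONBLOCK on fd, preserving its other status flags.
 * Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: poke a wakeup pipe without ever blocking.
 * Returns 1 if a byte was written, 0 if the pipe was already full
 * (so a wakeup is pending anyway), -1 on any other error. */
static int poke_wakeup_pipe(int write_fd)
{
    char c = 'A';
    ssize_t n;

    do {
        n = write(write_fd, &c, 1);
    } while (n == -1 && errno == EINTR);

    if (n == 1)
        return 1;
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return -1;
}

/* Keep poking until the pipe is full. With a blocking pipe this loop
 * would hang on the final write; with O_NONBLOCK it ends cleanly.
 * Returns 0 if the loop stopped because the pipe was full, -1 on error. */
static int fill_pipe_without_blocking(void)
{
    int fds[2];
    int status;

    if (pipe(fds) == -1)
        return -1;
    if (set_nonblocking(fds[0]) == -1 || set_nonblocking(fds[1]) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    while ((status = poke_wakeup_pipe(fds[1])) == 1)
        ;   /* nobody is reading, so the pipe eventually fills up */

    close(fds[0]);
    close(fds[1]);
    return status;
}
```

Treating a full pipe as success is a design choice in this sketch: if unread
bytes are already queued, the reader has a wakeup pending and dropping the
extra poke loses nothing.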