Re: compiler optimization causing issues with glib



On Fri, Jun 1, 2012 at 11:17 AM, Mike <puffy taco gmail com> wrote:
>>>>>> I am using glib 2.26.0
>>>>> [...]
>>>>>>
>>>>>> The application I'm having issues with uses gdbus
>>>>> [...]
>>>>>> - Thread stuck in futex wait inside kernel
>>>>
>>>> When I've attached GDB (or done SysRq), it's the gdbus thread that is stuck.
>>>>
>>>>> If your application is calling GDBus APIs from more than one thread
>>>>> (it's unclear from your message whether it is or not), you should
>>>>> certainly upgrade. GDBus earlier than 2.32 has known thread-safety bugs,
>>>>> notably <https://bugzilla.gnome.org/show_bug.cgi?id=665211>.
>>>>
>>>> I use g_dbus_connection_emit_signal in the callback from
>>>> g_child_watch_add.  Does that occur in the main loop?  Does the gdbus
>>>> thread ever call back in to my code, or does it also go through the
>>>> main loop?
>>>>
>>>>> If you can't upgrade all the way to 2.32 right now for whatever reason,
>>>>> 2.28 or 2.30, or perhaps even a later 2.26.x version, would still be an
>>>>> improvement: for instance,
>>>>> <https://bugzilla.gnome.org/show_bug.cgi?id=651268> and
>>>>> <https://bugzilla.gnome.org/show_bug.cgi?id=662100> were both fixed
>>>>> since 2.28.0.
>>>>
>>>> I did attempt 2.28.0, same issue.  I'll just have to upgrade my host
>>>> tools and build 2.32.  Still of course unclear whether I'm looking at
>>>> a kernel or userspace issue.  The part that bothers me the most is
>>>> that simply sending a SIGCHLD to the gdbus thread causes it to wake up
>>>> and function.
>>>
>>> So I upgraded to 2.32.3.  I seem to have the original problem and more
>>> unexplainable errors (there's a bit of time between each message, a
>>> second or so):
>>>
>>> (process:494): GLib-GObject-CRITICAL **: g_object_unref: assertion
>>> `G_IS_OBJECT (object)' failed
>>>
>>> (process:494): GLib-GObject-CRITICAL **: g_object_unref: assertion
>>> `G_IS_OBJECT (object)' failed
>>>
>>> (process:494): GLib-GObject-CRITICAL **: g_object_unref: assertion
>>> `G_IS_OBJECT (object)' failed
>>>
>>> (process:494): GLib-GObject-CRITICAL **: g_object_unref: assertion
>>> `G_IS_OBJECT (object)' failed
>>>
>>> (process:494): GLib-GIO-CRITICAL **: g_dbus_message_new_method_reply:
>>> assertion `G_IS_DBUS_MESSAGE (method_call_message)' failed
>>>
>>> (process:494): GLib-GIO-CRITICAL **: g_dbus_message_set_body:
>>> assertion `G_IS_DBUS_MESSAGE (message)' failed
>>>
>>> (process:494): GLib-GIO-CRITICAL **: g_dbus_connection_send_message:
>>> assertion `G_IS_DBUS_MESSAGE (message)' failed
>>>
>>> At the end is when it segfaulted.  GDB claimed is crashed here
>>>
>>> #0  g_dbus_method_invocation_return_value_internal
>>> (invocation=0x28a00, parameters=0xbeb248d4, fd_list=0x0) at
>>> gdbusmethodinvocation.c:357
>>> #1  0x00008fc4 in handle_method_call (connection=0x1d800,
>>> sender=0x2a238 ":1.11", object_path=0x2a8e0 "/",
>>> interface_name=0x28538 "org.example.Interface", method_name=0x2a770
>>> "Execute",
>>>    parameters=0x18878, invocation=0x28a00, user_data=0x0) at main.c:166
>>> #2  0x400fa470 in validate_and_maybe_schedule_method_call
>>> (connection=0x2a238, message=0x28a00, registration_id=100472,
>>> subtree_registration_id=<value optimized out>, interface_info=0x2a6f0,
>>>    vtable=0x402e3330, main_context=0x8, user_data=0x40210de4) at
>>> gdbusconnection.c:4733
>>> #3  0x402127ec in g_main_dispatch (context=0x0) at gmain.c:2539
>>> #4  g_main_context_dispatch (context=0x0) at gmain.c:3075
>>> #5  0x40215068 in g_main_context_iterate (context=0x19b70, block=1,
>>> dispatch=1, self=<value optimized out>) at gmain.c:3146
>>> #6  0x402152d0 in g_main_loop_run (loop=0x1d2e8) at gmain.c:3340
>>> #7  0x00009154 in main () at main.c:234
>>>
>>> The parameters variable is pointing at the stack -- no idea why as I
>>> recorded the value sent to that function and it definitely is not that
>>> one.  I'm recompiling with "-O0" to see if I can trap it with better
>>> debug.
>>>
>>> I'm setting up a laptop with a newer glib version and will run the
>>> same test there (of course, that is x86 not the target arch which is
>>> arm), just to make sure.  I may end up recompiling everything just to
>>> make sure I don't have someone odd laying around.
>>
>> I attached a very simple example program that is causing me issues.
>> The issue seems to somehow relate to calling fork.  If I don't call
>> fork, the program seems to run forever (where forever means I've run
>> for a few hours).  If I call fork, eventually (on the order of a
>> couple minutes) I will either segv or hang on a futex on the gdbus
>> thread.  I've reduced it down to this from calling g_spawn_async just
>> to keep it simple.  This is running on an arm9, linux 2.6.33.20 and
>> glib 2.32.3.
>>
>> Now, I'm running the same application on an x86 machine with linux 3.2
>> and some version of glib 2.32.  I have experienced no problems there
>> (couple days runtime).
>>
>>
>> This is my test script:
>>
>> #!/bin/sh
>>
>> while [ true ]; do
>>        dbus-send --session --print-reply --dest=com.example.UpgradeServer
>> --type=method_call / com.example.UpgradeInterface.ExecuteCommand
>> done
>>
>>
>> Any ideas on this?  Or does this seem more like a compiler or kernel issue?
>
> I should also add that I added code as a test that verified that in
> the method call, calling G_IS_DBUS_MESSAGE on invocation->message was
> still true.  This is the first GLib-GIO-CRITICAL error I generally
> get.  It was always true first thing in the method call, but if I
> check it after forking, it will be false on the iteration that it
> crashed.  The only thing in between is a call (in the original app) to
> g_spawn_async.  So, somehow it gets trashed.

Finally got my libc symbols figured out.  Here's a backtrace of the
gdbus thread when it's blocked on a futex:

#0  0x40518ed4 in __lll_lock_wait_private () from /lib/libc.so.6
#1  0x404a83dc in malloc_atfork () from /lib/libc.so.6
#2  0x404a7550 in calloc () from /lib/libc.so.6
#3  0x402d6404 in g_malloc0 (n_bytes=32) at gmem.c:189
#4  0x402d6854 in g_malloc0_n (n_blocks=1, n_block_bytes=32) at gmem.c:385
#5  0x4016b598 in _g_socket_read_with_control_messages_ready
(socket=0x4016b598, condition=1096440764, user_data=0x401ef8b0) at
gdbusprivate.c:140
#6  0x4016cb04 in _g_dbus_worker_do_read_unlocked (worker=0x24800) at
gdbusprivate.c:844
#7  0x4016c8c4 in _g_dbus_worker_do_read_cb (input_stream=0x1e998,
res=0x27ba0, user_data=0x24800) at gdbusprivate.c:761
#8  0x400cea88 in g_simple_async_result_complete (simple=0x415a5ca4)
at gsimpleasyncresult.c:746
#9  0x400cead4 in g_simple_async_result_complete (simple=0x1) at
gsimpleasyncresult.c:752
#10 0x402d0154 in g_idle_dispatch (source=0x257d0, callback=0x400ceab4
<g_simple_async_result_complete+52>, user_data=0x27ba0) at
gmain.c:4657
#11 0x402ccbcc in g_main_dispatch (context=0x24bd8) at gmain.c:2539
#12 0x402cdbb4 in g_main_context_dispatch (context=0x24bd8) at gmain.c:3075
#13 0x402cddf4 in g_main_context_iterate (context=0x24bd8, block=1,
dispatch=1, self=0x201b0) at gmain.c:3146
#14 0x402ce3c8 in g_main_loop_run (loop=0x24bc8) at gmain.c:3340
#15 0x4016ba30 in ensure_type (gtype=154960) at gdbusprivate.c:242
#16 0x403055b4 in g_thread_proxy (data=0x201b0) at gthread.c:801
#17 0x40417a90 in start_thread () from /lib/libpthread.so.0
#18 0x4050afb8 in clone () from /lib/libc.so.6

And this is the main thread:

#0  0x404fc550 in poll () from /lib/libc.so.6
#1  0x402e2530 in g_poll (fds=0x1cd40, nfds=1, timeout=-1) at gpoll.c:132
#2  0x402ce748 in g_main_context_poll (context=0x194d8, timeout=-1,
priority=2147483647, fds=0x1cd40, n_fds=1) at gmain.c:3440
#3  0x402cddc0 in g_main_context_iterate (context=0x194d8, block=1,
dispatch=1, self=0x1b020) at gmain.c:3141
#4  0x402ce3c8 in g_main_loop_run (loop=0x1cd20) at gmain.c:3340
#5  0x00008b94 in main () at main.c:114

These are the only two threads in the application, and this is the app
that I attached earlier.  I'm getting the feeling that somehow
g_spawn_async/fork is not thread safe with the gdbus thread.


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]