Forks, slices and threads: Can you make GSlice deadlock?



Hello,

can g_slice_alloc() be made to deadlock simply by some bad sequence of
GLib function calls, considering the calling program does not, of
course, hold any GLib lock explicitly?  (Without a GLib bug, that is.)

I am starting to suspect a problem in GSlice interaction with threads.
But I cannot report anything to bugzilla because I am unable to get to
the core of the problem.

My program (test program for a library) does g_test_trap_fork() and the
child creates worker threads with g_thread_new(), sends them tasks with
GAsyncQueue and cancels the tasks using GCancellables.

Occasionally, a seemingly innocent g_thread_new() call deadlocks in
g_slice_alloc(); see the backtraces below for exactly how and where.
When it happens, it happens in the child soon after forking.

I canNOT reproduce any deadlock if:
- G_SLICE=always-malloc is set,
- g_test_trap_fork() is not used and the test is run directly in the
  main program,
- the program runs under valgrind (which also reports no errors),
- I print anything to stderr in g_slice_alloc() (infuriating, but that
  is how it behaves).


When it deadlocks, the main thread looks like this:
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x0000003451009c8c in _L_lock_1024 () from /lib64/libpthread.so.0
#2  0x0000003451009c35 in __pthread_mutex_lock (mutex=0x25442e0) at pthread_mutex_lock.c:105
#3  0x00007fe496b01fb6 in g_mutex_lock (mutex=0x7fe496d9f8e0) at gthread-posix.c:208
#4  0x00007fe496adef29 in magazine_cache_pop_magazine (ix=4, countp=0x2544368) at gslice.c:718
#5  0x00007fe496adf235 in thread_memory_magazine1_reload (tmem=0x2544310, ix=4) at gslice.c:794
#6  0x00007fe496adf4df in g_slice_alloc (mem_size=72) at gslice.c:992
#7  0x00007fe496adf572 in g_slice_alloc0 (mem_size=72) at gslice.c:1032
#8  0x00007fe496b02981 in g_system_thread_new (thread_func=0x7fe496ae8fc9 <g_thread_proxy>, stack_size=0, error=0x7fffb1ded5a0) at gthread-posix.c:1101
#9  0x00007fe496ae9207 in g_thread_new_internal (name=0x491da9 "canceller", proxy=0x7fe496ae8fc9 <g_thread_proxy>, func=0x461790 <cancel_cancel>, data=0x25822f0, stack_size=0, 
    error=0x7fffb1ded5a0) at gthread.c:884
#10 0x00007fe496ae90d5 in g_thread_new (name=0x491da9 "canceller", func=0x461790 <cancel_cancel>, data=0x25822f0) at gthread.c:835
#11 0x0000000000461667 in master_cancel_one (nproc=1) at master.c:232
#12 master_cancel_one (nproc=1) at master.c:208
#13 0x00007fe496ae792d in test_case_run (tc=0x2555c90) at gtestutils.c:1679
#14 g_test_run_suite_internal (suite=suite@entry=0x2557c80, path=<optimized out>, path@entry=0x7fe496b5c2be "") at gtestutils.c:1732
#15 0x00007fe496ae7aa6 in g_test_run_suite_internal (suite=suite@entry=0x2557c60, path=<optimized out>, path@entry=0x7fe496b5c2be "") at gtestutils.c:1743
#16 0x00007fe496ae7aa6 in g_test_run_suite_internal (suite=suite@entry=0x2557c40, path=<optimized out>, path@entry=0x7fffb1deef4d "/master") at gtestutils.c:1743
#17 0x00007fe496ae7aa6 in g_test_run_suite_internal (suite=suite@entry=0x2545c20, path=<optimized out>, path@entry=0x7fffb1deef43 "testlibgwy/master") at gtestutils.c:1743
#18 0x00007fe496ae7e0b in g_test_run_suite (suite=0x2545c20) at gtestutils.c:1788
#19 0x00007fe496ae7e55 in g_test_run () at gtestutils.c:1308
#20 0x0000000000412e11 in main (argc=1, argv=0x7fffb1dedae8) at testlibgwy.c:88


There is also a worker thread waiting on my own GCond with my own lock
at the moment:
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:166
#1  0x00007fe496b02575 in g_cond_wait (cond=0x258b5a8, mutex=0x258b5a0) at gthread-posix.c:746
#2  0x00007fe496a9b09b in g_async_queue_pop_intern_unlocked (queue=queue@entry=0x258b5a0, wait=wait@entry=1, end_time=end_time@entry=-1) at gasyncqueue.c:421
#3  0x00007fe496a9b546 in g_async_queue_pop (queue=queue@entry=0x258b5a0) at gasyncqueue.c:455
#4  0x00007fe4973b60e3 in worker_thread_main (thread_data=<optimized out>) at master.c:190
#5  0x00007fe496ae9082 in g_thread_proxy (data=0x258e850) at gthread.c:797
#6  0x0000003451007d14 in start_thread (arg=0x7fe49534c700) at pthread_create.c:309
#7  0x00000034508f168d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115


and several threads are in the following state:
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x0000003451009c8c in _L_lock_1024 () from /lib64/libpthread.so.0
#2  0x0000003451009c35 in __pthread_mutex_lock (mutex=0x25442e0) at pthread_mutex_lock.c:105
#3  0x00007fe496b01fb6 in g_mutex_lock (mutex=0x7fe496d9f8e0) at gthread-posix.c:208
#4  0x00007fe496adf163 in private_thread_memory_cleanup (data=0x7fe4740008c0) at gslice.c:774
#5  0x0000003451007b12 in __nptl_deallocate_tsd () at pthread_create.c:157
#6  0x0000003451007d22 in start_thread (arg=0x7fe495b4d700) at pthread_create.c:316
#7  0x00000034508f168d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115


That's all the threads.

So, they are all trying to lock allocator->slab_mutex in GSlice.
But nothing seems to hold it.
Could that be messed up by forking somehow?
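
To make the forking question concrete, here is a minimal sketch of the
generic mechanism I have in mind. It uses plain pthreads, not GLib, and
the names holder() and child_sees_orphaned_lock() are made up for the
illustration. It assumes Linux/glibc, as in the backtraces above. Whether
any thread in the parent actually holds the GSlice mutex at fork time in
my program is an assumption I have not verified.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t locked;    /* posted by the helper once it holds the lock */
static sem_t release;   /* posted by the parent to let the helper unlock */

/* Helper thread: takes the mutex and keeps it until told to let go. */
static void *holder(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&lock);
    sem_post(&locked);
    sem_wait(&release);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Returns 1 if the forked child found the mutex locked, 0 if it did
 * not, and -1 on error. */
int child_sees_orphaned_lock(void)
{
    pthread_t thread;
    int status = 0;
    pid_t pid;

    if (sem_init(&locked, 0, 0) != 0 || sem_init(&release, 0, 0) != 0)
        return -1;
    if (pthread_create(&thread, NULL, holder, NULL) != 0)
        return -1;
    sem_wait(&locked);          /* the helper thread now holds the mutex */

    pid = fork();
    if (pid == 0) {
        /* Child: only the forking thread exists here, but the mutex
         * memory was copied in its locked state.  A blocking lock would
         * wait forever, so only try it and report the result. */
        _exit(pthread_mutex_trylock(&lock) == EBUSY ? 1 : 0);
    }

    /* Parent: let the helper unlock and finish, then collect the child. */
    sem_post(&release);
    pthread_join(thread, NULL);
    if (pid < 0 || waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

In the child, the thread that took the mutex does not exist, so a
debugger would show waiters on a lock that no thread seems to hold,
which resembles the picture above.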

Could you please advise how to debug it further, to rule out the
possibility of a GLib bug, if nothing else?

Thank you,

Yeti


