Re: [evolution-patches] Re: [Evolution-hackers] Thread locking in gtkhtml!?



On Fri, 2005-02-25 at 09:45 +0100, Philip Van Hoof wrote:
On Fri, 2005-02-25 at 11:22 +0800, Not Zed wrote:
There shouldn't be any threading involved at all with these functions.  This code should all run exclusively in the main thread.  That doesn't mean there isn't race-like problems in the code though.
Is a remote ORBit-2 function working on the spell_errors list? Or a function that has been called by ORBit-2? You can create a POA that will, in the background, launch a new thread to handle such a remote procedure call. In which case the developer might not have noticed it, but then it is going to use multiple threads and it will run in parallel.

If I remember it correctly, it happens like this when you initialize the POA with ORBIT_THREAD_HINT_PER_REQUEST.
Yep, it does.  But none of this code can work that way so it would be running against the mainloop.  Not to say something else isn't invalidly using threads I guess.
Its probably "corba re-entrancy" or "glib re-entrancy", i.e. something is calling a corba method or a glib function, which then goes back into the mainloop and ends up invoking a callback/signal, while it is running a loop on the same data somewhere up the stack.  These problems are often more tricky to track down and fix than real race conditions ... and often harder to fix since you can't just use a simple lock.

Okay. My patch might have made it a little but more stable then. But I agree that if thats the case, this patch isn't a real fix for the problem. I think it's more stable for me now, because there's a smaller timeframe in which problems can happen. The unpatched version can introduce problems during the time it's looping the spell_errors. Mine will during that time just collect those items that are to be removed. It can only create problems while it's actually removing the collected items. Which takes less time. Overall will the remove_spell_errors function take longer, but there's a shorter timetime in which it can create problems.

It's not removing the problem, it's just making it happen less frequently. Which isn't going to help debugging this case :-), of course.
Any chance of trying it in valgrind and seeing if that spots anything?  It just looks like its bad list code somewhere, but nothing is obvious in the code.

If you can build glib/gmem.c with DISABLE_MEM_POOLS defined and run evolution against that, you will likely get more information out of valgrind.


This might also fix it. But then it's still using a hackish way for removing an item from the GList.

static GList*
remove_one (GList *list, GList *link)
{
    GList *retval = g_list_remove_link (list, link);
    spell_error_destroy ((SpellError *) link->data);
    return retval;
}
I doubt it would make any difference, spell_error_destroy() is just a g_free() and nothing more.

Which leaves the pointer at link->data undefined, while it's possible that another thread is still working on that pointer at the same time (of course only in case something else is running in parallel).
Although this is true, if it was threaded, the fact you're removing a link is just as serious a problem (i.e. the link pointers become just as invalid as the data pointer).  So the order isn't really that important (the fact g_list_remove_link nulls out the link's pointers may reduce the 'invalidity window', but not to any extent that makes it a valid execution path).




[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]