Re: [evolution-patches] Re: [Evolution-hackers] Thread locking in gtkhtml!?



On Fri, 2005-02-25 at 11:22 +0800, Not Zed wrote:
There shouldn't be any threading involved at all with these functions.  This code should all run exclusively in the main thread.  That doesn't mean there isn't race-like problems in the code though.
Is a remote ORBit-2 function working on the spell_errors list? Or a function that has been called by ORBit-2? You can create a POA that will, in the background, launch a new thread to handle such a remote procedure call. In which case the developer might not have noticed it, but then it is going to use multiple threads and it will run in parallel.

If I remember it correctly, it happens like this when you initialize the POA with ORBIT_THREAD_HINT_PER_REQUEST.

Its probably "corba re-entrancy" or "glib re-entrancy", i.e. something is calling a corba method or a glib function, which then goes back into the mainloop and ends up invoking a callback/signal, while it is running a loop on the same data somewhere up the stack.  These problems are often more tricky to track down and fix than real race conditions ... and often harder to fix since you can't just use a simple lock.

Okay. My patch might have made it a little but more stable then. But I agree that if thats the case, this patch isn't a real fix for the problem. I think it's more stable for me now, because there's a smaller timeframe in which problems can happen. The unpatched version can introduce problems during the time it's looping the spell_errors. Mine will during that time just collect those items that are to be removed. It can only create problems while it's actually removing the collected items. Which takes less time. Overall will the remove_spell_errors function take longer, but there's a shorter timetime in which it can create problems.

It's not removing the problem, it's just making it happen less frequently. Which isn't going to help debugging this case :-), of course.

Well i just had a look at the code,and maybe it isn't, none of the loops do anything but purely memory operations inside of them.
A possible reason why the previous method had problems might be because of this:

static GList*
remove_one (GList *list, GList *link)
{
(a) spell_error_destroy ((SpellError *) link->data);
        -- race conditions can happen here --
(b) return g_list_remove_link (list, link);
}

As you can see it's first destroying and they removing the link from the GList. It's possible my kernel is going to interrupt my process between (a) and (b) and that the other process is going to try playing with the pointer link->data during the time it was given to play on my CPU.

Yeah but it can't be a thread issue :)  (or it better not be, if it is, there are bigger problems than this happening).


This might also fix it. But then it's still using a hackish way for removing an item from the GList.

static GList*
remove_one (GList *list, GList *link)
{
    GList *retval = g_list_remove_link (list, link);
    spell_error_destroy ((SpellError *) link->data);
    return retval;
}
I doubt it would make any difference, spell_error_destroy() is just a g_free() and nothing more.

Which leaves the pointer at link->data undefined, while it's possible that another thread is still working on that pointer at the same time (of course only in case something else is running in parallel).

If that other routine, being run in parallel, is doing something like this:

while (spell_errors)
{
    SPellError *e = spell_errors->data;

    /* Do stuff with e */

    spell_errors = g_list_next (spell_errors);
}

It can't be sure whether or not e has been g_free't by the other thread during one such loop or not. Each time it would touch e, it would have to check for it (and then still, an "if" can also get scheduled). Therefor, just like you said, it's necessary to introduce locking and lock the other thread each atomic block of operations it's going to perform on items in the spell_errors list.

remove_one() should also free the link I think, since it looks like it is just leaked currently.

Indeed. I've noticed it too. But rather than fixing remove_one, I decided to rewrite it's calling-function ;-). There's only one function calling remove_one, thats the remove_spell_errors.

BTW none of these fixes would be any help if it was threaded - it would definitely need a lock.

Indeed.

The remove_spell_errors() looks like a normal iterative-with-modify loop, I can't imagine it is a problem as such, assuming nothing silly is going on with the list (e.g. if the list contains freed nodes or invalid back-pointers).  Valgrind could check this - only if you changed glib not to use memchunks for the list nodes though.



-- 
Philip Van Hoof, Software Developer @ Cronos
home: me at freax dot org
gnome: pvanhoof at gnome dot org
work: philip dot vanhoof at cronos dot be
junk: philip dot vanhoof at gmail dot com
http://www.freax.be, http://www.freax.eu.org


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]