[evolution-patches] Re: [Evolution-hackers] Thread locking in gtkhtml!?




Hi Philip,

On Thu, 2005-02-24 at 22:18 +0100, Philip Van Hoof wrote:

Hi there,

While trying to figure out why at some race condition the Evolution composer was crashing (well, actually it was hanging in a loop), I decided to go fishing in the code of gtkhtml.

I've found that there's a doubly-linked list being passed to some functions (but you can assume it's a global variable since it appears to be always the same pointer being passed): spell_errors. It contains SpellError objects.

The original author of remove_spell_errors used a clever hack for removing what I assume where "old" spelling errors. So spellings errors that now have been corrected (Am I right)?

However. That clever hack might have worked if there wasn't any threading involved. The hack removed an element from the list while already keeping a pointer to the next item in the list. So it's removing the current list-element, and by prefetching keeping a useful pointer to the next element. The next loop would work on the "next" pointer. While it has removed the current item from the global spell_errors list.
There shouldn't be any threading involved at all with these functions.  This code should all run exclusively in the main thread.  That doesn't mean there isn't race-like problems in the code though.
Sounds like always correct. However. I've experienced moments when Evolution got into an endless loop at the remove_spell_errors routine: http://bugzilla.ximian.com/show_bug.cgi?id=72988

So I'm assuming that there's other functions who are manipulating the spell_errors list. And whats worse is that they are probably doing so in a different thread. Which in turn leads to race conditions in one of both parallel running processes. I don't know which thread or whatever. It's just a thought of mine.

So I decided to redo this remove_spell_errors function in a less clever but also less hackish way. Easily by keeping a new doubly-linked list which I called toremove and creating a copy of the global list spell_errors and utilising that copy during the first loop, rather than working on a pointer to the global list.

This E-mail and it's old spelling errors -- hehe -- are the first test of this function. So far it hasn't crashed on me once. Which is, for me at least, a whole new Evolution experience since a few weeks.

Notice: If it's known which threads are working on this spell_errors list, some mutex-locking would be necessary here. You can't just remove an item from a list while another thread is continuing relying on it's contents. and GList hacks aren't to way to circumvent it. Why not just lock it and unlock when it's done? The way it's after this patch, shouldn't have any problems even without locking mechanisms. Since it's destroying the SpellError after removing it from that global spell_errors list.
Its probably "corba re-entrancy" or "glib re-entrancy", i.e. something is calling a corba method or a glib function, which then goes back into the mainloop and ends up invoking a callback/signal, while it is running a loop on the same data somewhere up the stack.  These problems are often more tricky to track down and fix than real race conditions ... and often harder to fix since you can't just use a simple lock.

Well i just had a look at the code,and maybe it isn't, none of the loops do anything but purely memory operations inside of them.
A possible reason why the previous method had problems might be because of this:

static GList*
remove_one (GList *list, GList *link)
{
(a) spell_error_destroy ((SpellError *) link->data);
        -- race conditions can happen here --
(b) return g_list_remove_link (list, link);
}

As you can see it's first destroying and they removing the link from the GList. It's possible my kernel is going to interrupt my process between (a) and (b) and that the other process is going to try playing with the pointer link->data during the time it was given to play on my CPU.
Yeah but it can't be a thread issue :)  (or it better not be, if it is, there are bigger problems than this happening).

This might also fix it. But then it's still using a hackish way for removing an item from the GList.

static GList*
remove_one (GList *list, GList *link)
{
    GList *retval = g_list_remove_link (list, link);
    spell_error_destroy ((SpellError *) link->data);
    return retval;
}
I doubt it would make any difference, spell_error_destroy() is just a g_free() and nothing more.

remove_one() should also free the link I think, since it looks like it is just leaked currently.

BTW none of these fixes would be any help if it was threaded - it would definitely need a lock.

The remove_spell_errors() looks like a normal iterative-with-modify loop, I can't imagine it is a problem as such, assuming nothing silly is going on with the list (e.g. if the list contains freed nodes or invalid back-pointers).  Valgrind could check this - only if you changed glib not to use memchunks for the list nodes though.



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]