Re: [xml] libxml2 2.9.2 hangs running multi-threaded on Windows

  Thanks, that's now committed in git :-) !


On Tue, Mar 03, 2015 at 09:55:30AM +0000, Steven Nairn wrote:

After recently upgrading to 2.9.2 for use in our (multi-threaded)
program we started to experience occasional hangs on Windows (the
other platforms were fine).

Attaching to the hung process with gdb showed that all the threads
were waiting on the xmlDictMutex xmlRMutex. The count field of the
xmlRMutex was zero, which indicated that the mutex should not have
been locked. However, the cs field showed that the CriticalSection was
locked and was held by a thread that had completed. So, that thread
had locked the mutex but not unlocked it.

Eventually the problem was tracked down to the maintenance of the
count field. The relevant code fragments (with non-Windows stuff
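removed) are:

typedef struct _xmlRMutex {
    CRITICAL_SECTION cs;
    unsigned int count;
} *xmlRMutexPtr;

void xmlRMutexLock(xmlRMutexPtr tok) {
    EnterCriticalSection(&tok->cs);
    tok->count++;
}

void xmlRMutexUnlock(xmlRMutexPtr tok) {
    if (tok->count > 0) {
        LeaveCriticalSection(&tok->cs);
        tok->count--;
    }
}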

So, when locking the mutex the count field is incremented inside the
critical section but when unlocking the count field is decremented
outside the critical section. The increment/decrement is not atomic so
if one thread is locking the mutex while another is unlocking it the
count field might not be updated properly. This is what was happening
in our case, leading to a call to xmlRMutexUnlock returning without
leaving the critical section.
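
To make the lost update concrete, here is a small single-threaded replay
of one bad interleaving. It is only a sketch: model_rmutex, its held flag
and replay_lost_update are hypothetical names invented for illustration,
and held merely stands in for the state of the CriticalSection. It is not
libxml2 code and does not use real threads.

```c
#include <assert.h>

/* Hypothetical model, not libxml2 code: `held` stands in for the state of
 * the CRITICAL_SECTION, `count` for the count field of the xmlRMutex. */
typedef struct {
    int held;            /* 1 while some thread is inside the section */
    unsigned int count;  /* recursion count as maintained by lock/unlock */
} model_rmutex;

/* Replays, step by step, one interleaving of thread A unlocking while
 * thread B locks, with the decrement done outside the critical section.
 * Returns the final value of count. */
static unsigned int replay_lost_update(model_rmutex *m)
{
    unsigned int a_tmp;

    /* Starting point: thread A holds the mutex once. */
    m->held = 1;
    m->count = 1;

    /* Thread A, unlock: count > 0, so it leaves the section first... */
    m->held = 0;
    /* ...then starts the non-atomic decrement by loading the old value. */
    a_tmp = m->count;            /* A sees 1 */

    /* Thread B, lock: enters the section and increments inside it. */
    m->held = 1;
    m->count = m->count + 1;     /* count is now 2 */

    /* Thread A stores its stale result, wiping out B's increment. */
    m->count = a_tmp - 1;        /* count is now 0 */

    /* Thread B, unlock: the count > 0 test fails, so the section is
     * never left. */
    if (m->count > 0) {
        m->held = 0;
        m->count--;
    }
    return m->count;
}
```

After the replay the model shows count at zero while the section is still
held, which is the same state gdb showed in the hung process.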

The fix is simple. When unlocking the xmlRMutex decrement the count
field before leaving the critical section. That is:

void xmlRMutexUnlock(xmlRMutexPtr tok) {
    if (tok->count > 0) {
        tok->count--;
        LeaveCriticalSection(&tok->cs);
    }
}
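
For anyone who wants to try the corrected ordering away from Windows, the
following is a rough portable sketch. It assumes a POSIX recursive mutex
as a stand-in for the CriticalSection; sketch_rmutex, sketch_stress and
the other names are hypothetical and are not part of libxml2.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the Windows xmlRMutex. */
typedef struct {
    pthread_mutex_t cs;   /* recursive; plays the role of the CRITICAL_SECTION */
    unsigned int count;   /* only modified while cs is held */
} sketch_rmutex;

static int sketch_rmutex_init(sketch_rmutex *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    m->count = 0;
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    rc = pthread_mutex_init(&m->cs, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

static void sketch_rmutex_lock(sketch_rmutex *m)
{
    pthread_mutex_lock(&m->cs);
    m->count++;                       /* incremented inside the section */
}

/* Must only be called by a thread that currently holds the mutex. */
static void sketch_rmutex_unlock(sketch_rmutex *m)
{
    if (m->count > 0) {
        m->count--;                   /* decremented before leaving */
        pthread_mutex_unlock(&m->cs);
    }
}

typedef struct { sketch_rmutex *m; int iters; } worker_arg;

static void *worker(void *p)
{
    worker_arg *a = p;
    for (int i = 0; i < a->iters; i++) {
        sketch_rmutex_lock(a->m);
        sketch_rmutex_lock(a->m);     /* recursive acquisition */
        sketch_rmutex_unlock(a->m);
        sketch_rmutex_unlock(a->m);
    }
    return NULL;
}

/* Runs up to 16 threads hammering one mutex. Returns 0 if, afterwards,
 * count is zero and the mutex is free; nonzero otherwise. */
static int sketch_stress(int nthreads, int iters)
{
    sketch_rmutex m;
    pthread_t tids[16];
    worker_arg arg;
    int started = 0, ok;

    if (nthreads > 16)
        nthreads = 16;
    if (sketch_rmutex_init(&m) != 0)
        return -1;
    arg.m = &m;
    arg.iters = iters;
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tids[started], NULL, worker, &arg) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    /* The mutex is free only if every lock was matched by a real unlock. */
    ok = (m.count == 0) && (pthread_mutex_trylock(&m.cs) == 0);
    if (ok)
        pthread_mutex_unlock(&m.cs);
    pthread_mutex_destroy(&m.cs);
    return ok ? 0 : 1;
}
```

Calling sketch_stress(4, 10000) should return 0. Because count is now only
written while the mutex is held, no increment or decrement can be lost.
Older toolchains may need -pthread when building.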

This problem was introduced in commit id
8854e4631844eac8dbae10cc32904f27d5268af7 for bug 737851. Prior to the
change the Windows CriticalSections were definitely not being left
properly when xmlRMutexes were used recursively. However, at least in
the way we use libxml2, that problem was masked since xmlRMutexes were
not used recursively.

I've added a comment to the bug in bugzilla.

