[xml] libxml2 thread safety



Hi,

I've proposed to Daniel that there should be a project to make libxml2 thread safe. I briefly outlined to Daniel a way to do this with as little disruption as possible and he suggested that I send a fuller proposal to the list to solicit feedback. So here it is, complete with some code samples. My experience is primarily on the Solaris platform, although I have written code for Linux and Win32 platforms (not recently, though), so any feedback on difficulties likely to be experienced on other platforms would be particularly appreciated.

Proposal

Re-work libxml2 to be thread-safe and ready for use in a multi-threaded environment.

Goals

1. To make libxml2 thread safe whilst preserving backwards compatibility as much as possible.
2. One library, which implements all required threaded and non-threaded behaviours.
3. A guide to migrating existing code (hopefully, this will be "recompile existing multi-threaded apps, else do nothing").


Approach

0. Use the same widely implemented threads package on all supported platforms to minimize porting work. Currently only pthreads is under consideration.
1. Where possible, redefine global state so that it is accessed via a macro from multi-threaded applications and is untouched for single-threaded applications. (The approach currently under consideration for dealing with global state is modelled on the treatment of errno on many platforms.)

2. Add extra functions to allow explicit locking of parser state by application clients to ensure synchronisation is safe.

Generally speaking, adding support for threads to existing libraries is a complex undertaking, and it can be made much easier by adopting a "token" passing scheme based around some unit of synchronisation. I'm planning to offer synchronisation at the parser level, i.e. each individual parse tree is synchronised, allowing many parses to execute in parallel. If this is not possible, synchronisation will be implemented at the level of the library instead. I have no plans to synchronise at a lower level than a parse tree, as this would be too intrusive on the existing structures.
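To make the idea of a per-parse-tree "token" concrete, here is a minimal sketch, assuming a wrapper structure that carries one mutex per tree. The names lockedDoc, lockedDocInit, editWorker and runEditors are hypothetical and not part of libxml2; the edits counter merely stands in for real tree state.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SKETCH_MAX_THREADS	16
#define SKETCH_EDITS_PER_THREAD	1000

/* Hypothetical wrapper (not libxml2 API): one lock per parse tree. */
typedef struct {
    pthread_mutex_t lock;  /* the "token" guarding this tree */
    void *doc;             /* would be an xmlDocPtr in real code */
    int edits;             /* stand-in for mutable tree state */
} lockedDoc;

static int
lockedDocInit(lockedDoc *ld, void *doc)
{
    ld->doc = doc;
    ld->edits = 0;
    return (pthread_mutex_init(&ld->lock, NULL));
}

/* Each worker takes the tree's lock around every modification. */
static void *
editWorker(void *arg)
{
    lockedDoc *ld = arg;
    int i;

    for (i = 0; i < SKETCH_EDITS_PER_THREAD; i++) {
        (void) pthread_mutex_lock(&ld->lock);
        ld->edits++;
        (void) pthread_mutex_unlock(&ld->lock);
    }
    return (NULL);
}

/* Run nthreads workers against one tree; return the edit count or -1. */
static int
runEditors(int nthreads)
{
    lockedDoc ld;
    pthread_t tid[SKETCH_MAX_THREADS];
    int i, started = 0;

    assert(nthreads >= 0 && nthreads <= SKETCH_MAX_THREADS);
    if (lockedDocInit(&ld, NULL) != 0)
        return (-1);
    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[i], NULL, editWorker, &ld) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        (void) pthread_join(tid[i], NULL);
    (void) pthread_mutex_destroy(&ld.lock);
    return (started == nthreads ? ld.edits : -1);
}
```

Because the lock lives with the tree rather than with the library, threads working on different trees would never contend with each other; only threads sharing one tree serialise.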

Example

(Sorry it's long, but I'm trying to provide as many details as possible.)

To test out the above approach (in particular the suggestion for dealing with global state) I have modified libxml2-2.4.5 on my local machine to provide a thread-specific data version of xmlDoValidityCheckingDefaultValue.

libxml2 exports an int variable called xmlDoValidityCheckingDefaultValue. Currently it is defined in parserInternals.c as follows:

int xmlDoValidityCheckingDefaultValue = 0;

This can be reworked as follows:

1. Introduce a new header file to contain declarations appropriate to the migrated globals. Delete the existing definition of the global.
e.g. GlobalFunctions.h

#ifndef __XML_THREAD_H
#define    __XML_THREAD_H
#if defined(_REENTRANT) || (_POSIX_C_SOURCE - 0 >= 199506L)
extern int *__xmlDoValidityCheckingDefaultValue();
#define    xmlDoValidityCheckingDefaultValue \
(*(__xmlDoValidityCheckingDefaultValue()))
#else
extern int xmlDoValidityCheckingDefaultValue;
#endif
#endif

The above is checking to see if the code is to be compiled multi-threaded. If so, redefine xmlDoValidityCheckingDefaultValue to be a macro, otherwise leave the definition alone. The above header should be included by parser.h to ensure that the definitions are correct for all libxml2 invocations.
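The reason the macro keeps existing source compiling is that it expands to a dereferenced pointer, which is an lvalue, so reads, assignments and address-of all still work. A minimal single-threaded sketch of just that property follows; demoValue, demoValueLocation and demoStorage are hypothetical names, and the static int only stands in for per-thread storage.

```c
#include <assert.h>

static int demoStorage = 0;	/* stands in for per-thread storage */

/* Accessor returning the address of the "global". */
static int *
demoValueLocation(void)
{
    return (&demoStorage);
}

/* Same shape as the errno-style macro: a dereferenced function result. */
#define	demoValue	(*(demoValueLocation()))

static int
exerciseDemoValue(void)
{
    int *p;

    demoValue = 5;		/* (*(demoValueLocation())) = 5 */
    demoValue += 2;		/* compound assignment works too */
    p = &demoValue;		/* address-of yields the accessor's pointer */
    assert(p == demoValueLocation());
    (*p)++;
    return (demoValue);		/* 5 + 2 + 1 */
}
```

One consequence worth keeping in mind: a pointer obtained with & refers to the storage of whichever thread evaluated the macro, so it should not be cached and handed to another thread.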

2. Write new functions to support thread-specific data (TSD from here on) for globals.
The function __xmlDoValidityCheckingDefaultValue is implemented here.
e.g. GlobalFunctions.c
#include <stdlib.h>
#include <pthread.h>
#include <note.h>
#include "GlobalFunctions.h"
/* Static Prototypes */
static pthread_mutex_t    keylock = PTHREAD_MUTEX_INITIALIZER;
static pthread_key_t    globalkey;
static int        keyonce = 0;
NOTE(DATA_READABLE_WITHOUT_LOCK(keyonce))
#undef xmlDoValidityCheckingDefaultValue

int xmlDoValidityCheckingDefaultValue = 0;
static void tsd_free(void *v);

int *
__xmlDoValidityCheckingDefaultValue()
{
   int *globalval;
   if (keyonce == 0) {
       (void) pthread_mutex_lock(&keylock);
       if (keyonce == 0) {
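           /*
            * Create the key first; setting keyonce before the key exists
            * would let another thread skip the lock and use an
            * uninitialised key.
            */
           (void) pthread_key_create(&globalkey, tsd_free);
           keyonce++;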
       }
       (void) pthread_mutex_unlock(&keylock);
   }
   if ((globalval = (int *)pthread_getspecific(globalkey)) == NULL) {
       int *tsd = malloc(sizeof (int));
       *tsd = 0;
       pthread_setspecific(globalkey, tsd);
       return (tsd);
   } else
       return (globalval);
}

static void
tsd_free(void *v)
{
   free(v);
}

The code above checks whether the key used to hold TSD has already been initialised; if not, it creates it (this happens only once per library invocation). If the current thread doesn't have its own copy of the required data, one is allocated and stored. Finally, the TSD is returned. tsd_free is a utility function that allows the memory used by a thread to be reclaimed when the thread terminates. The globalkey currently holds only an int; however, this would be extended into a structure holding all required global variables as they are identified.
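As a rough sketch of that extension, the fragment below keeps one structure per thread behind a single key and uses pthread_once for the one-time key creation instead of the hand-rolled flag and mutex. This is an assumption about how it could look, not the modified library: globalStateSketch, its placeholderDefault field, getGlobalStateSketch and the sketchDoValidity macro are all hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical container; the second field is only a placeholder. */
typedef struct {
    int doValidityCheckingDefaultValue;
    int placeholderDefault;
} globalStateSketch;

static pthread_key_t stateKey;
static pthread_once_t stateOnce = PTHREAD_ONCE_INIT;

static void
stateFree(void *v)
{
    free(v);
}

/* Runs exactly once, however many threads race to get here. */
static void
stateKeyInit(void)
{
    int rc = pthread_key_create(&stateKey, stateFree);

    assert(rc == 0);
    (void) rc;
}

/* Return the calling thread's copy, allocating it (zeroed) on first use. */
static globalStateSketch *
getGlobalStateSketch(void)
{
    globalStateSketch *gs;

    (void) pthread_once(&stateOnce, stateKeyInit);
    gs = pthread_getspecific(stateKey);
    if (gs == NULL) {
        gs = calloc(1, sizeof (*gs));
        if (gs == NULL || pthread_setspecific(stateKey, gs) != 0) {
            free(gs);
            return (NULL);	/* the macro below would then crash */
        }
    }
    return (gs);
}

#define	sketchDoValidity \
    (getGlobalStateSketch()->doValidityCheckingDefaultValue)

static void *
writerThread(void *arg)
{
    (void) arg;
    sketchDoValidity = 7;	/* touches only this thread's copy */
    return (NULL);
}

/* Set our value, let another thread write its copy, then re-read ours. */
static int
valueAfterOtherThreadWrites(void)
{
    pthread_t t;

    sketchDoValidity = 3;
    if (pthread_create(&t, NULL, writerThread, NULL) != 0)
        return (-1);
    (void) pthread_join(t, NULL);
    return (sketchDoValidity);
}
```

With this layout each additional migrated global would cost one structure field and one macro, while the key, the once-control and the destructor stay shared.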

To test the above I have the following makefile and driver programs:
Makefile
.KEEP_STATE:

CFLAGS=-I /home/garyp/LIB_XML_BUILD/libxml2-2.4.5/include -L /home/garyp/LIB_XML_BUILD/libxml2-2.4.5/.libs -R /home/garyp/LIB_XML_BUILD/libxml2-2.4.5/.libs
#CFLAGS=-I /usr/local/libxml/include -L /usr/local/libxml/sparc/lib -R /usr/local/libxml/sparc/lib

LDFLAGS=-lxml2

all: xml_test xml_test_mt

xml_test:

xml_test_mt:= CFLAGS += -mt

xml_test_mt:

clean:
       rm xml_test xml_test_mt

You can either compile the code with "out of the box libxml2" or with my modified version depending on which CFLAGS you set.
xml_test.c
#include <stdlib.h>
#include <libxml/parser.h>
#include <unistd.h>

#ifndef _REENTRANT
extern int xmlDoValidityCheckingDefaultValue;
#endif

int
main(int argc, char **argv)
{
       printf("xmlDoValidityCheckingDefaultValue=%d\n",
           xmlDoValidityCheckingDefaultValue);
       xmlDoValidityCheckingDefaultValue = 5;
       printf("xmlDoValidityCheckingDefaultValue=%d\n",
           xmlDoValidityCheckingDefaultValue);
       return (0);
}


xml_test_mt.c
#include <stdlib.h>
#include <libxml/parser.h>
#include <pthread.h>
#include <unistd.h>
#include <assert.h>

void *thread_specific_data(void *);
#define MAX_ARGC        20
pthread_t tid[MAX_ARGC];
int num_threads;
extern int xmlDoValidityCheckingDefaultValue;

int
main(int argc, char **argv)
{
       int i;
       num_threads = (argc - 1 > MAX_ARGC) ? MAX_ARGC : argc - 1;

       for (i = 0; i < num_threads; i++)
               pthread_create(&tid[i], 0, thread_specific_data, (void *)(long)i);
       for (i = 0; i < num_threads; i++)
               pthread_join(tid[i], NULL);

       printf("%d: xmlDoValidityCheckingDefaultValue=%d\n", pthread_self(),
           xmlDoValidityCheckingDefaultValue);
       xmlDoValidityCheckingDefaultValue = 5;
       printf("%d: xmlDoValidityCheckingDefaultValue=%d\n", pthread_self(),
           xmlDoValidityCheckingDefaultValue);
       return (0);
}

void *
thread_specific_data(void *arg)
{
       int private_data = (int)(long)arg;      /* thread index from main() */
       printf("%d: xmlDoValidityCheckingDefaultValue=%d\n", pthread_self(),
           xmlDoValidityCheckingDefaultValue);
       xmlDoValidityCheckingDefaultValue = private_data;
       printf("%d: xmlDoValidityCheckingDefaultValue=%d\n", pthread_self(),
           xmlDoValidityCheckingDefaultValue);
       sleep(5);
       printf("%d: xmlDoValidityCheckingDefaultValue=%d\n", pthread_self(),
           xmlDoValidityCheckingDefaultValue);
       assert(xmlDoValidityCheckingDefaultValue == private_data);
       return (NULL);
}

The two programs are provided to show that the value of xmlDoValidityCheckingDefaultValue is preserved correctly in both threaded and non-threaded code. If you compile the threaded code against the existing unmodified library the assertion fires due to the threads interfering with each other.

xml_test_mt a b c with the modified library
$ ./xml_test_mt a b c
4: xmlDoValidityCheckingDefaultValue=0
4: xmlDoValidityCheckingDefaultValue=0
5: xmlDoValidityCheckingDefaultValue=0
5: xmlDoValidityCheckingDefaultValue=1
6: xmlDoValidityCheckingDefaultValue=0
6: xmlDoValidityCheckingDefaultValue=2
4: xmlDoValidityCheckingDefaultValue=0
5: xmlDoValidityCheckingDefaultValue=1
6: xmlDoValidityCheckingDefaultValue=2
1: xmlDoValidityCheckingDefaultValue=0
1: xmlDoValidityCheckingDefaultValue=5
$
xml_test_mt a b c with the existing 2.4.5 library
$ ./xml_test_mt a b c
4: xmlDoValidityCheckingDefaultValue=0
4: xmlDoValidityCheckingDefaultValue=0
5: xmlDoValidityCheckingDefaultValue=0
5: xmlDoValidityCheckingDefaultValue=1
6: xmlDoValidityCheckingDefaultValue=1
6: xmlDoValidityCheckingDefaultValue=2
4: xmlDoValidityCheckingDefaultValue=2
5: xmlDoValidityCheckingDefaultValue=2
Assertion failed: xmlDoValidityCheckingDefaultValue == private_data, file xml_test_mt.c, line 44
Assertion failed: xmlDoValidityCheckingDefaultValue == private_data, file xml_test_mt.c, line 44
6: xmlDoValidityCheckingDefaultValue=2
Abort(coredump)
$
./xml_test works with either library as you would expect
$ ./xml_test
xmlDoValidityCheckingDefaultValue=0
xmlDoValidityCheckingDefaultValue=5
$

Okay, that's enough to be starting with, I think. I look forward to hearing your comments on the above.

Gary

--
Gary Pennington
Solaris Kernel Development,
Sun Microsystems
Gary Pennington sun com





