[xml] libxml2 thread safety
- From: Gary Pennington <Gary Pennington uk sun com>
- To: xml gnome org
- Cc: Daniel Veillard <veillard redhat com>, Stephen Hahn <stephen hahn sun com>
- Subject: [xml] libxml2 thread safety
- Date: Tue, 02 Oct 2001 16:02:17 +0100
Hi,
I've proposed to Daniel that there should be a project to make libxml2
thread safe. I briefly outlined to Daniel a way to do this with as
little disruption as possible and he suggested that I send a fuller
proposal to the list to solicit feedback. So here it is, complete with
some code samples. My experience is primarily on the Solaris platform,
although I have written code for Linux and Win32 platforms (not
recently, though), so any feedback on difficulties likely to be
experienced on other platforms would be particularly appreciated.
Proposal
Re-work libxml2 to be thread-safe and ready for use in a multi-threaded
environment.
Goals
1. To make libxml2 thread safe whilst preserving backwards compatibility
as much as possible.
2. One library, which implements all required threaded and non-threaded
behaviours.
3. A guide to migrating existing code (Hopefully, this will be
"recompile existing multi-threaded apps else do nothing").
Approach
0. Use the same widely implemented threads package on all supported
platforms to minimize porting work. Currently only pthreads is under
consideration.
1. Where possible, redefine global state so that it is accessed via a
macro from multi-threaded applications and is untouched for single
threaded applications.
(The approach currently under consideration for dealing with global
state is modelled on the way errno is supported in threaded code on
many platforms.)
2. Add extra functions to allow explicit locking of parser state by
application clients to ensure synchronisation is safe.
Generally speaking, adding thread support to an existing library is a
complex undertaking, and it can be made much easier by adopting a
"token" passing scheme based around some unit of synchronisation. I'm
planning to offer synchronisation at the parser level, i.e. each
individual parse tree is synchronised, allowing many parses to execute
in parallel. If this is not possible, synchronisation will be
implemented at the level of the library instead. I have no plans to
synchronise at a lower level than a parse tree, as this would be too
intrusive on the existing structures.
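As a rough illustration of what synchronisation at the level of a parse tree could look like, here is a minimal sketch in which each tree carries its own mutex and the application brackets access with explicit lock and unlock calls. Every name in it (demoTree, demoTreeLock, demoTreeUnlock, demoRun) is hypothetical and is not part of libxml2; the node_count field is only a stand-in for real tree state.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical wrapper, NOT a libxml2 type: one mutex per parse tree. */
typedef struct demoTree {
    pthread_mutex_t lock;   /* the "token" guarding this tree */
    int node_count;         /* stand-in for real tree state */
} demoTree;

static demoTree *demoTreeNew(void)
{
    demoTree *t = malloc(sizeof(*t));
    if (t == NULL)
        return NULL;
    if (pthread_mutex_init(&t->lock, NULL) != 0) {
        free(t);
        return NULL;
    }
    t->node_count = 0;
    return t;
}

static void demoTreeFree(demoTree *t)
{
    if (t == NULL)
        return;
    pthread_mutex_destroy(&t->lock);
    free(t);
}

/* Explicit locking calls an application would wrap around tree access. */
static int demoTreeLock(demoTree *t)   { return pthread_mutex_lock(&t->lock); }
static int demoTreeUnlock(demoTree *t) { return pthread_mutex_unlock(&t->lock); }

/* Thread body: modify the shared tree 1000 times, holding its token. */
static void *demoAddNodes(void *arg)
{
    demoTree *t = arg;
    int i;
    for (i = 0; i < 1000; i++) {
        demoTreeLock(t);
        t->node_count++;
        demoTreeUnlock(t);
    }
    return NULL;
}

/* Run up to 8 threads against one tree and return the final node count. */
static int demoRun(int nthreads)
{
    pthread_t tid[8];
    demoTree *t;
    int i, started = 0, total;

    if (nthreads < 0 || nthreads > 8)
        return -1;
    if ((t = demoTreeNew()) == NULL)
        return -1;
    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[i], NULL, demoAddNodes, t) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    total = t->node_count;
    demoTreeFree(t);
    return total;
}
```

Because the lock lives in the tree rather than in the library, threads working on different trees would never contend with each other; only threads sharing one tree serialise.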
Example (Sorry it's long, but I'm trying to provide as many details as
possible)
To test out the above approach (in particular the suggestion for dealing
with global state) I have modified libxml2-2.4.5 on my local machine to
provide a thread specific data version of xmlDoValidityCheckingDefaultValue.
libxml2 exports an int variable called
xmlDoValidityCheckingDefaultValue. Currently it is defined in
parserInternals.c as follows:
int xmlDoValidityCheckingDefaultValue = 0;
This can be reworked as follows:
1. Introduce a new header file to contain declarations appropriate to
the migrated globals. Delete the existing definition of the global.
e.g. globalFunctions.h
#ifndef __XML_THREAD_H
#define __XML_THREAD_H
#if defined(_REENTRANT) || (_POSIX_C_SOURCE - 0 >= 199506L)
/* Threaded build: the name becomes an lvalue backed by per-thread storage. */
extern int *__xmlDoValidityCheckingDefaultValue(void);
#define xmlDoValidityCheckingDefaultValue \
    (*(__xmlDoValidityCheckingDefaultValue()))
#else
/* Non-threaded build: the plain global is used unchanged. */
extern int xmlDoValidityCheckingDefaultValue;
#endif
#endif
The above is checking to see if the code is to be compiled
multi-threaded. If so, redefine xmlDoValidityCheckingDefaultValue to be
a macro, otherwise leave the definition alone. The above header should
be included by parser.h to ensure that the definitions are correct for
all libxml2 invocations.
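To see why existing callers need no source changes under this scheme, the following self-contained sketch shows a macro of the same shape being read, assigned to, and having its address taken exactly like an ordinary variable. The names demo_storage, demo_location and demoValue are hypothetical stand-ins, and the accessor here simply returns the address of a static int rather than per-thread storage.

```c
#include <assert.h>

/* Hypothetical stand-in for a library global; not a libxml2 symbol. */
static int demo_storage = 0;

/*
 * Accessor returning the address of the storage, in the style of an
 * errno implementation. A threaded build would return per-thread storage
 * here instead of a single static int.
 */
static int *demo_location(void)
{
    return &demo_storage;
}

/* Callers keep using a plain identifier; the macro redirects it. */
#define demoValue (*(demo_location()))

/* Uses the macro as an lvalue and an rvalue, as if it were a variable. */
static int demo_set_and_get(int v)
{
    demoValue = v;                          /* (*(demo_location())) = v */
    demoValue += 1;                         /* compound assignment works */
    assert(&demoValue == demo_location());  /* so does taking the address */
    return demoValue;
}
```

One limitation of the pattern is that code which declares its own `extern int` for the name, or which uses the name in a static initialiser, sees the macro expansion rather than a variable and may need attention.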
2. Write new functions to support thread specific data (TSD hence) for
globals.
The function __xmlDoValidityCheckingDefaultValue is implemented here.
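e.g. globalFunctions.c
#include <stdlib.h>
#include <pthread.h>
#include <note.h>   /* Solaris lock-analysis annotations (NOTE macro) */
#include "globalFunctions.h"
/* Static data */
static pthread_mutex_t keylock = PTHREAD_MUTEX_INITIALIZER;
static pthread_key_t globalkey;
static int keyonce = 0;
NOTE(DATA_READABLE_WITHOUT_LOCK(keyonce))
/* Drop the macro so that the real global can still be defined here. */
#undef xmlDoValidityCheckingDefaultValue
int xmlDoValidityCheckingDefaultValue = 0;
static void tsd_free(void *v);
int *
__xmlDoValidityCheckingDefaultValue(void)
{
    int *globalval;
    /* Create the TSD key once; keyonce is set only after the key exists. */
    if (keyonce == 0) {
        (void) pthread_mutex_lock(&keylock);
        if (keyonce == 0) {
            (void) pthread_key_create(&globalkey, tsd_free);
            keyonce++;
        }
        (void) pthread_mutex_unlock(&keylock);
    }
    if ((globalval = (int *)pthread_getspecific(globalkey)) == NULL) {
        /* First use in this thread: allocate its private copy. */
        int *tsd = malloc(sizeof (int));
        if (tsd == NULL) {
            /* Out of memory: fall back to the shared global. */
            return (&xmlDoValidityCheckingDefaultValue);
        }
        *tsd = 0;
        (void) pthread_setspecific(globalkey, tsd);
        return (tsd);
    } else
        return (globalval);
}
/* Key destructor: frees a thread's copy when the thread terminates. */
static void
tsd_free(void *v)
{
    free(v);
}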
The code above checks whether the key used to hold TSD has already been
initialised; if not, it creates it (this is executed only once per
library invocation). If the current thread doesn't have its own copy of
the required data, one is allocated and stored. Finally, the TSD is
returned. tsd_free is a utility function that allows the memory used by
a thread to be reclaimed when the thread terminates. The globalkey
currently holds only an int; however, this would be extended into a
structure holding all required global variables as they are identified.
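The extension to a structure might look something like the sketch below. It is only an illustration under stated assumptions: the struct layout, the field secondSetting and all demo* names are hypothetical, and it uses pthread_once for the one-time key creation, which is a substitute for the keyonce flag and mutex used in the code above rather than the same mechanism.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical bundle of per-thread globals; field names are illustrative. */
typedef struct demoGlobals {
    int doValidityCheckingDefaultValue;
    int secondSetting;      /* placeholder for further migrated globals */
} demoGlobals;

static pthread_key_t demo_key;
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;
static int demo_key_ok = 0;

/* Key destructor: releases a thread's structure when the thread exits. */
static void demo_free(void *v)
{
    free(v);
}

/* Runs exactly once per process, courtesy of pthread_once. */
static void demo_make_key(void)
{
    demo_key_ok = (pthread_key_create(&demo_key, demo_free) == 0);
}

/* Returns the calling thread's structure, creating it on first use. */
static demoGlobals *demoGetGlobals(void)
{
    demoGlobals *g;

    pthread_once(&demo_once, demo_make_key);
    if (!demo_key_ok)
        return NULL;
    g = pthread_getspecific(demo_key);
    if (g == NULL) {
        g = calloc(1, sizeof(*g));      /* every field starts at 0 */
        if (g == NULL)
            return NULL;
        if (pthread_setspecific(demo_key, g) != 0) {
            free(g);
            return NULL;
        }
    }
    return g;
}

/* One macro per migrated global. NOTE: dereferences NULL if setup failed. */
#define demoDoValidityChecking \
    (demoGetGlobals()->doValidityCheckingDefaultValue)

/* Worker thread writes its own copy only. */
static void *demo_worker(void *arg)
{
    (void) arg;
    demoDoValidityChecking = 7;
    return NULL;
}

/* Returns the main thread's value after a worker has written 7 to its own. */
static int demoIsolationCheck(void)
{
    pthread_t tid;

    demoDoValidityChecking = 3;
    if (pthread_create(&tid, NULL, demo_worker, NULL) == 0)
        pthread_join(tid, NULL);
    return demoDoValidityChecking;
}
```

A single key for the whole structure keeps the number of keys constant however many globals are migrated, which matters because the number of keys a process may create is limited.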
To test the above I have the following makefile and driver programs:
Makefile
.KEEP_STATE:
CFLAGS=-I /home/garyp/LIB_XML_BUILD/libxml2-2.4.5/include -L /home/garyp/LIB_XML_BUILD/libxml2-2.4.5/.libs -R /home/garyp/LIB_XML_BUILD/libxml2-2.4.5/.libs
#CFLAGS=-I /usr/local/libxml/include -L /usr/local/libxml/sparc/lib -R /usr/local/libxml/sparc/lib
LDFLAGS=-lxml2
all: xml_test xml_test_mt
xml_test:
xml_test_mt:= CFLAGS += -mt
xml_test_mt:
clean:
rm xml_test xml_test_mt
You can either compile the code with "out of the box libxml2" or with my
modified version depending on which CFLAGS you set.
xml_test.c
#include <stdio.h>
#include <stdlib.h>
#include <libxml/parser.h>
#include <unistd.h>
#ifndef _REENTRANT
extern int xmlDoValidityCheckingDefaultValue;
#endif
int
main(int argc, char **argv)
{
printf("xmlDoValidityCheckingDefaultValue=%d\n",
xmlDoValidityCheckingDefaultValue);
xmlDoValidityCheckingDefaultValue = 5;
printf("xmlDoValidityCheckingDefaultValue=%d\n",
xmlDoValidityCheckingDefaultValue);
return (0);
}
xml_test_mt.c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <libxml/parser.h>
#include <pthread.h>
#include <unistd.h>
#include <assert.h>
void *thread_specific_data(void *arg);
#define MAX_ARGC 20
pthread_t tid[MAX_ARGC];
int num_threads;
extern int xmlDoValidityCheckingDefaultValue;
/*
 * pthread_t is an opaque type; the casts to unsigned long below are for
 * display only and assume an integral pthread_t (as on Solaris and Linux).
 */
int
main(int argc, char **argv)
{
    int i;
    /* One thread per command-line argument, capped at MAX_ARGC. */
    num_threads = argc - 1;
    if (num_threads > MAX_ARGC)
        num_threads = MAX_ARGC;
    for (i = 0; i < num_threads; i++)
        pthread_create(&tid[i], NULL, thread_specific_data,
            (void *)(intptr_t)i);
    for (i = 0; i < num_threads; i++)
        pthread_join(tid[i], NULL);
    printf("%lu: xmlDoValidityCheckingDefaultValue=%d\n",
        (unsigned long)pthread_self(), xmlDoValidityCheckingDefaultValue);
    xmlDoValidityCheckingDefaultValue = 5;
    printf("%lu: xmlDoValidityCheckingDefaultValue=%d\n",
        (unsigned long)pthread_self(), xmlDoValidityCheckingDefaultValue);
    return (0);
}
void *
thread_specific_data(void *arg)
{
    /* The thread's index was passed by value in the pointer argument. */
    int private_data = (int)(intptr_t)arg;
    printf("%lu: xmlDoValidityCheckingDefaultValue=%d\n",
        (unsigned long)pthread_self(), xmlDoValidityCheckingDefaultValue);
    xmlDoValidityCheckingDefaultValue = private_data;
    printf("%lu: xmlDoValidityCheckingDefaultValue=%d\n",
        (unsigned long)pthread_self(), xmlDoValidityCheckingDefaultValue);
    /* Give the other threads time to overwrite a shared global. */
    sleep(5);
    printf("%lu: xmlDoValidityCheckingDefaultValue=%d\n",
        (unsigned long)pthread_self(), xmlDoValidityCheckingDefaultValue);
    assert(xmlDoValidityCheckingDefaultValue == private_data);
    return (NULL);
}
The two programs are provided to show that the value of
xmlDoValidityCheckingDefaultValue is preserved correctly in both
threaded and non-threaded code. If you compile the threaded code against
the existing unmodified library the assertion fires due to the threads
interfering with each other.
xml_test_mt a b c with the modified library
$ ./xml_test_mt a b c
4: xmlDoValidityCheckingDefaultValue=0
4: xmlDoValidityCheckingDefaultValue=0
5: xmlDoValidityCheckingDefaultValue=0
5: xmlDoValidityCheckingDefaultValue=1
6: xmlDoValidityCheckingDefaultValue=0
6: xmlDoValidityCheckingDefaultValue=2
4: xmlDoValidityCheckingDefaultValue=0
5: xmlDoValidityCheckingDefaultValue=1
6: xmlDoValidityCheckingDefaultValue=2
1: xmlDoValidityCheckingDefaultValue=0
1: xmlDoValidityCheckingDefaultValue=5
$
xml_test_mt a b c with the existing 2.4.5 library
$ ./xml_test_mt a b c
4: xmlDoValidityCheckingDefaultValue=0
4: xmlDoValidityCheckingDefaultValue=0
5: xmlDoValidityCheckingDefaultValue=0
5: xmlDoValidityCheckingDefaultValue=1
6: xmlDoValidityCheckingDefaultValue=1
6: xmlDoValidityCheckingDefaultValue=2
4: xmlDoValidityCheckingDefaultValue=2
5: xmlDoValidityCheckingDefaultValue=2
Assertion failed: xmlDoValidityCheckingDefaultValue == private_data,
file xml_test_mt.c, line 44
Assertion failed: xmlDoValidityCheckingDefaultValue == private_data,
file xml_test_mt.c, line 44
6: xmlDoValidityCheckingDefaultValue=2
Abort(coredump)
$
./xml_test works with either library as you would expect
$ ./xml_test
xmlDoValidityCheckingDefaultValue=0
xmlDoValidityCheckingDefaultValue=5
$
Okay, that's enough to be starting with I think. I look forward to
hearing your comments on the above.
Gary
--
Gary Pennington
Solaris Kernel Development,
Sun Microsystems
Gary Pennington sun com