Re: [xml] libxml2 thread safety (SUMMARY)



Daniel Veillard wrote:

On Thu, Oct 04, 2001 at 10:13:38AM +0100, Gary Pennington wrote:

Here's the appropriate snippet (relating to a single variable xmlDoValidityCheckingDefaultValue for puposes of illustration, there would be one of these for each global that was modified) from the header file to show how it works:

#if defined(_REENTRANT) || (_POSIX_C_SOURCE - 0 >= 199506L)


What I dislike in this solution is that we end-up with 2 different libraries
one when one compiles with _REENTRANT and one without this. From a packaging
and maintainance point of view this is no fun at all.

Agreed, but that's the cost of backwards compatibility. I'll try to keep the intrusion down to a minimum and re-locate the global variables into a separate file where they can be easily identified.



extern int *__xmlDoValidityCheckingDefaultValue();
#define xmlDoValidityCheckingDefaultValue \
(*(__xmlDoValidityCheckingDefaultValue()))
#else
extern int xmlDoValidityCheckingDefaultValue;
#endif

Any new code compiled would either be threaded or non-threaded. It would pick up the macro definition if threaded, the standard extern symbol if not. Existing code would still use the standard extern symbol and would thus behave as non-threaded code.

The implementation of __xmlDoValidityCheckingDefaultValue() shows how the value of the TSD is manipulated

#undef  xmlDoValidityCheckingDefaultValue

int xmlDoValidityCheckingDefaultValue = 0;

int *
__xmlDoValidityCheckingDefaultValue()
{
       struct glob_struct *globalval;

       if (keyonce == 0) {
               (void) pthread_mutex_lock(&keylock);
               if (keyonce == 0) {
                       keyonce++;
                       (void) pthread_key_create(&globalkey, tsd_free);
               }
               (void) pthread_mutex_unlock(&keylock);
       }
if ((globalval = (struct glob_struct *)pthread_getspecific(globalkey))
           == NULL) {
               struct glob_struct *tsd = alloc_glob_struct();
               pthread_setspecific(globalkey, tsd);
               return (&tsd->xmlDoValidityCheckingDefaultValue);
       } else
               return (&globalval->xmlDoValidityCheckingDefaultValue);
}


 Okay, I would not call this beautiful but it is acceptable if contained
to a limited area. There is also 2 places where I would need to take a lock
in other places in the code. I suggest the following:
   - sepearate the global state definition in a state.h header
   - move all this code and definitions of the global variables in
     a state.c module
   - export the locking as a libxml function and data to help portability,
     this is not a full encapsulation of a thread system, just the
     create/lock/unlock/destroy of a lock since it will be used in other
     places in libxml (and possibly libxslt too).

I would propose adding two functions to support locking/unlocking of the parser context. I don't think that I will export the lock create/destroy since that is internal to the library and should have nothing to do with any consumers. I'll think more about the utility of internal functions/macros to do this.



I really think that the above is only a temporary measure until a "proper" re-working of libxml (i.e. libxml3) can be performed. It's a solution that promotes backwards-compatibility, however it's not as clean as doing the reworking. Some of the work required to remove global state would be performed which would help in the migration to libxml3 and that would mean phased implementation of the threading support, which is either a good or a bad thing depending on your point of view.

 This is fine by me at least.

We could decide that this is not worth doing and that we should just wait for libxml3 and do a non-backwards compatible reworking. I have some (limited) time to work on making this happen, but I'd be happy to defer and wait to do a proper reworking for libxml2 if I knew what the time frame for that project was.


 Go ahead. I will integrate it, I probably won't make it the default
(i.e. it will require a configure flag) but once the capacity will be there
we will get more feedback.

Ok.



Anyway, I have a series of test programs using the above partial implementation which work fine single threaded - I still need to write a more comprehensive mt-driver (I've run through the testall target) and the main (technically) tricky bits remaining are : 1. Deciding exactly which global symbols go into the global symbol structure. All global variables or only those we expect to be modified?


I would go for the latter. For example the predefined entities list is global and should be shared by all threads, I would not put it in
the structure. Other variables are harder, like the Catalog ones, it
could make sense to maintain them per thread but they are costly to build,
this is not simple to decide.

I will need some help here. I have a candidate list (which is all the exported clearly non-static variables in 2.4.5) could you go through this list and indicate to me which globals should be thread specific.

Here's the list (ignore the undef):

#undef  docbDefaultSAXHandler
#undef  htmlDefaultSAXHandler
#undef  oldXMLWDcompatibility
#undef  xmlBufferAllocScheme
#undef  xmlDefaultBufferSize
#undef  xmlDefaultSAXHandler
#undef  xmlDefaultSAXLocator
#undef  xmlDoValidityCheckingDefaultValue
#undef  xmlFree
#undef  xmlGenericError
#undef  xmlGenericErrorContext
#undef  xmlGetWarningsDefaultValue
#undef  xmlIndentTreeOutput
#undef  xmlKeepBlanksDefaultValue
#undef  xmlLineNumbersDefaultValue
#undef  xmlLoadExtDtdDefaultValue
#undef  xmlMalloc
#undef  xmlMemStrdup
#undef  xmlParserDebugEntities
#undef  xmlParserVersion
#undef  xmlPedanticParserDefaultValue
#undef  xmlRealloc
#undef  xmlSaveNoEmptyTags
#undef  xmlStringComment
#undef  xmlStringText
#undef  xmlStringTextNoenc
#undef  xmlSubstituteEntitiesDefaultValue

I know that some of them are clearly not meant to be re-defined on a per-thread basis, e.g. xmlMalloc but I've left them in for completeness sake. Could you indicate which variables should go in the structure in your opinion? I will assume that any variables which aren't intended to be changed by consumers are the ones left over and they will be clearly documented as such when this work is done.

A consequence of this work is that no more global state should be introduced into libxml2, at least not unless it's completely unavoidable. We want to minimise the amount of global state to make movement to libxml3 as painless as possible.



2. Working out the appropriate configure incantation to make thread support optional even if detected on the platform. (If we decide we want this to be an option)


  definitely.

Non-technically, writing the docs will be extremely important too.

Daniel

Gary

--
Gary Pennington
Solaris Kernel Development,
Sun Microsystems
Gary Pennington sun com







[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]