Re: [xml] libxml2 thread safety (SUMMARY)

Daniel Veillard wrote:

On Wed, Oct 03, 2001 at 04:59:38PM +0100, Gary Pennington wrote:

 - that structure is accessed using an overridable function returning
   a pointer to that structure

I'm not clear as to exactly what this means. Why would the function be overridable? Is this because we want to make it possible for users to maintain state in any fashion they choose? If so, I think that is not a

 Yes. The point is that it allows the same code, once compiled, to run either
in a threaded or unthreaded fashion. I don't want to see duplicate libxml2-thread
and libxml2 packages exporting the same symbols but incompatible
from a run-time point of view. That is a major mistake; I've seen Sun do that in the past
(some ugly problems with threadified X11 libraries versus the normal ones), and I certainly
don't want to enter that maintenance nightmare. The second benefit is that the thread
dependency can be isolated from libxml itself, which is IMHO a huge gain maintenance-wise.

good idea. What happens if multiple libraries all link to libxml2 and choose to use different methods to maintain global state? This solution

 Those libraries are incompatible. It would not be the first time one sees
this. If an application uses libraries A and B, B uses A too, B is threaded
and the application is threaded, then yes, you might have trouble.
Is this worth putting thread dependencies directly into the library?

is appropriate for single applications, but breaks when applied to libraries which use libxml2 and have no control over how libxml2 may be used by other consumers inside a process address space.

 Agreed, you may have trouble.
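To make the objection concrete, here is a minimal sketch, assuming the accessor is one process-wide function pointer as proposed. Every name in it (`xmlGlobalState`, `xmlGlobalStateAccessor`, `libA_init`, `libB_init`, and so on) is hypothetical and not part of libxml2. Whichever library installs its accessor last wins, and the other one silently reads and writes storage it did not arrange for.

```c
#include <stddef.h>

/* Hypothetical: the library's former globals gathered into one struct. */
typedef struct {
    int doValidityCheckingDefaultValue;
} xmlGlobalState;

/* The single, process-wide overridable hook (hypothetical name). */
static xmlGlobalState *(*xmlGlobalStateAccessor)(void) = NULL;

/* The old global name, now a macro over the hook. */
#define xmlDoValidityCheckingDefaultValue \
    (xmlGlobalStateAccessor()->doValidityCheckingDefaultValue)

/* Library A installs its own storage and sets an option it relies on... */
static xmlGlobalState libAState;
static xmlGlobalState *libA_accessor(void) { return &libAState; }
void libA_init(void) { xmlGlobalStateAccessor = libA_accessor; }
void libA_enable_validation(void) { xmlDoValidityCheckingDefaultValue = 1; }

/* ...then library B, in the same process, installs different storage. */
static xmlGlobalState libBState;
static xmlGlobalState *libB_accessor(void) { return &libBState; }
void libB_init(void) { xmlGlobalStateAccessor = libB_accessor; }
```

After `libA_init()`, `libA_enable_validation()` and then `libB_init()`, library A's setting still sits in `libAState`, but every read of `xmlDoValidityCheckingDefaultValue`, including reads made on library A's behalf, now goes to `libBState` and sees 0.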

 - the old name for the global variable is turned into a macro
   dereferencing the pointer returned by that function (I think
   this will keep the read/write capability of that property).

It does. You can see how this works in the sample I sent out yesterday.
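The read/write point rests on a property of C. If the macro expands to `(*(f()))`, where `f()` returns a pointer, the expansion is an lvalue, so assignment, compound assignment and taking the address all still compile. A minimal sketch with placeholder names (`myGlobal`, `myGlobalPtr`, `update_global` are illustrative only). Note that this only covers source recompiled against the new header; binaries compiled earlier still refer to the exported symbol directly.

```c
/* Placeholder storage; a threaded build would return per-thread storage. */
static int myGlobalStorage = 0;

/* Accessor returning a pointer (placeholder name). */
static int *myGlobalPtr(void)
{
    return &myGlobalStorage;
}

/* The old variable name becomes a macro; its expansion is an lvalue. */
#define myGlobal (*(myGlobalPtr()))

/* Source written against the old global still compiles unchanged. */
void update_global(void)
{
    int *p;

    myGlobal = 5;    /* plain assignment */
    myGlobal += 2;   /* compound assignment */
    p = &myGlobal;   /* taking the address */
    (*p)++;
}
```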

 Even for writes? I.e. an application compiled now and writing to a
global var would still work? That's not what I understood.

The reason it works is that you leave the definition of the global variable in the new library. Any old code will still manipulate the global variable state, which is fine for existing code since it will preserve the expected semantics. Any new code will be compiled to use the TSD (thread specific data, see later) state.

There may be issues where old code is used together with new code. In these cases you should recompile the code to get correct thread-safe behaviour; if you don't, there will be problems. This is one of the costs of maintaining backwards compatibility.
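A sketch of why mixing old and new objects goes wrong under this scheme: the library still exports the plain variable, while threaded code compiled against the new header reads a per-thread copy. The three sections stand for separate objects squeezed into one file. `oldModuleEnableValidation()`, `newModuleValidationEnabled()` and `xmlDoValidityCheckingDefaultValuePtr()` are hypothetical names, and C11 `_Thread_local` stands in for the pthread TSD machinery.

```c
/* ---- Library: still exports the plain variable for old binaries ---- */
int xmlDoValidityCheckingDefaultValue = 0;

/* Per-thread copy for code compiled against the new header.
 * C11 _Thread_local stands in for pthread TSD in this sketch. */
static _Thread_local int perThreadCopy = 0;

int *xmlDoValidityCheckingDefaultValuePtr(void)
{
    return &perThreadCopy;
}

/* ---- Object compiled before the change: uses the extern symbol ---- */
void oldModuleEnableValidation(void)
{
    xmlDoValidityCheckingDefaultValue = 1;
}

/* ---- Object compiled threaded: the new header supplies this macro ---- */
#define xmlDoValidityCheckingDefaultValue \
    (*(xmlDoValidityCheckingDefaultValuePtr()))

int newModuleValidationEnabled(void)
{
    return xmlDoValidityCheckingDefaultValue;
}
```

The old object's write lands in the exported variable, so the recompiled object never sees it. That is the breakage that recompiling everything against the new header avoids.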

Do you foresee any significant problem with this approach? Any recompiled
library would work in the same mode as the application. There is a single
atomic step to switch to a threaded mode which is to set the global variable
containing the pointer to the accessor function.
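Under the accessor proposal, the "single atomic step" could look like the sketch below. It assumes the library exports a hook named `xmlGlobalStateAccessor` that defaults to unthreaded storage; that hook, `xmlGlobalState` and `appEnableThreadedXml()` are hypothetical. The application installs its own accessor, built on POSIX thread-specific data, once before starting any threads.

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct {
    int doValidityCheckingDefaultValue;
} xmlGlobalState;

/* ---- Library side (hypothetical hook, unthreaded by default) ---- */
static xmlGlobalState xmlDefaultState;

static xmlGlobalState *xmlDefaultAccessor(void)
{
    return &xmlDefaultState;
}

xmlGlobalState *(*xmlGlobalStateAccessor)(void) = xmlDefaultAccessor;

/* ---- Application side: per-thread storage via POSIX TSD ---- */
static pthread_key_t appStateKey;

static xmlGlobalState *appThreadAccessor(void)
{
    xmlGlobalState *s = pthread_getspecific(appStateKey);

    if (s == NULL) {
        s = calloc(1, sizeof *s);   /* each thread starts from zeroed state */
        if (s == NULL || pthread_setspecific(appStateKey, s) != 0)
            abort();
    }
    return s;
}

/* Call once from main() before creating any thread: the single step
 * that switches the unchanged library into threaded mode. */
int appEnableThreadedXml(void)
{
    if (pthread_key_create(&appStateKey, free) != 0)
        return -1;
    xmlGlobalStateAccessor = appThreadAccessor;
    return 0;
}
```

Passing `free` as the key destructor releases each thread's copy when that thread exits.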

The main problem is that it requires source code to explicitly be compiled to be thread-safe. In other words, the user will have to know that he is planning to use libxml2 in a threaded fashion. This would be

 If you fork threads in your application, you had better know it!!!
I don't see anything pernicious there, rather the opposite. An application
has to be conceived as thread-based versus loop-based nearly from the
start; saying the app designer need not know it sounds really strange.
There are just so many things which need careful attention in that case
that explicitly adding the threading initialization for libxml seems
acceptable to me in this context.

extremely pernicious when people start to develop and distribute libraries which use this facility. How would a potential consumer of a library know what assumptions the distributed library had made about thread-safeness? Also, this would still be broken with respect to any existing libraries and would not support binary compatibility. Finally, it would be very broken if multiple libraries had to be linked against and they all required different behaviour from libxml2.

 That is true.

The solution I'm proposing is, I think, very similar to what you describe above, but there are some key differences. I think that we can make libxml2 thread-safe and backwards compatible, and avoid recompiling any existing code, by careful use of TSD and placement of

 TSD == Task State Descriptors?

(see above)

 I'm really wary of adding thread knowledge at the library level itself;
it *is* hard to make portable and to maintain. Currently libxml is easy to port
and relatively easy to maintain, and I have to resist changes which decrease those
key properties. It may all be a matter of careful coding, but I need
to be cautious there. I may have to maintain this code for the next 10 years!

global state in internal data structures. I don't think that what you propose, user configurable global state storage, will work because of the reasons I outlined.

 Okay, what do you think your approach would break?

Well, hopefully as little as possible. There would be issues for code which was compiled multi-threaded and which links with old libraries that had been compiled against versions of libxml2 prior to this work. This should be a fairly small category of breakage, and the fix is easy: just recompile your code against the new thread-safe libxml2, with no code changes required.

The reason this works is because:
1. The existing global symbols are still exported by the library.
2. The new header file checks for threaded compilation and, if threaded, substitutes the appropriate function rather than using the global symbol. If not threaded, the global symbol is used. Here's the relevant snippet from the header file to show how it works. For purposes of illustration it covers a single variable, xmlDoValidityCheckingDefaultValue; there would be one of these for each global that was modified:

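#if defined(_REENTRANT) || (_POSIX_C_SOURCE - 0 >= 199506L)
/* Threaded build: the old name becomes an lvalue over the accessor. */
extern int *__xmlDoValidityCheckingDefaultValue(void);
#define xmlDoValidityCheckingDefaultValue \
        (*(__xmlDoValidityCheckingDefaultValue()))
#else
/* Unthreaded build: the plain exported global, as before. */
extern int xmlDoValidityCheckingDefaultValue;
#endif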

Any new code compiled would either be threaded or non-threaded. It would pick up the macro definition if threaded, the standard extern symbol if not. Existing code would still use the standard extern symbol and would thus behave as non-threaded code.

The implementation of __xmlDoValidityCheckingDefaultValue() shows how the TSD value is manipulated:

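#include <pthread.h>

#undef  xmlDoValidityCheckingDefaultValue

/* The exported variable stays, for binaries built before this change. */
int xmlDoValidityCheckingDefaultValue = 0;

/* File-scope key state (declarations assumed; struct glob_struct,
 * alloc_glob_struct() and tsd_free() are defined elsewhere). */
static pthread_mutex_t keylock = PTHREAD_MUTEX_INITIALIZER;
static pthread_key_t globalkey;
static int keyonce = 0;

int *
__xmlDoValidityCheckingDefaultValue(void)
{
        struct glob_struct *globalval;

        /* Create the TSD key once, under the lock. */
        if (keyonce == 0) {
                (void) pthread_mutex_lock(&keylock);
                if (keyonce == 0) {
                        (void) pthread_key_create(&globalkey, tsd_free);
                        keyonce = 1;
                }
                (void) pthread_mutex_unlock(&keylock);
        }
        /* First call on this thread: allocate its private copy. */
        if ((globalval = (struct glob_struct *)pthread_getspecific(globalkey))
            == NULL) {
                struct glob_struct *tsd = alloc_glob_struct();
                pthread_setspecific(globalkey, tsd);
                return (&tsd->xmlDoValidityCheckingDefaultValue);
        } else
                return (&globalval->xmlDoValidityCheckingDefaultValue);
}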

Points to note:
1. The #undef of the macro, which prevents a clash with the variable definition.
2. The check to instantiate a single key for the TSD retrieval.
3. The pthread_getspecific call, which gets a single structure per thread holding a thread-specific copy of the global state. Each thread manipulates its own global state.
4. tsd_free frees the global state when the thread terminates.
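For comparison, here is a self-contained sketch of the same per-thread accessor written with `pthread_once()`. I have swapped that in for the hand-rolled `keyonce`/`keylock` double check. Strictly speaking, the unlocked first read of `keyonce` is not covered by POSIX's memory-synchronization rules, whereas `pthread_once()` guarantees that the key is created exactly once and is visible to every caller. The one-member `struct glob_struct` and the `calloc()`-based allocation are assumptions standing in for the real structure and `alloc_glob_struct()`; this is not libxml2 code.

```c
#include <pthread.h>
#include <stdlib.h>

/* Per-thread copy of the former globals (one member shown; assumed layout). */
struct glob_struct {
    int xmlDoValidityCheckingDefaultValue;
};

/* Still exported so binaries built before the change keep linking. */
int xmlDoValidityCheckingDefaultValue = 0;

static pthread_key_t globalkey;
static pthread_once_t globalkey_once = PTHREAD_ONCE_INIT;

/* Key destructor: runs at thread exit for each thread's non-NULL value. */
static void tsd_free(void *p)
{
    free(p);
}

/* Runs exactly once, however many threads race to the first call. */
static void create_globalkey(void)
{
    if (pthread_key_create(&globalkey, tsd_free) != 0)
        abort();
}

int *
__xmlDoValidityCheckingDefaultValue(void)
{
    struct glob_struct *g;

    (void) pthread_once(&globalkey_once, create_globalkey);
    g = pthread_getspecific(globalkey);
    if (g == NULL) {
        /* First use on this thread: allocate zeroed state (assumed defaults). */
        g = calloc(1, sizeof *g);
        if (g == NULL || pthread_setspecific(globalkey, g) != 0)
            abort();
    }
    return &g->xmlDoValidityCheckingDefaultValue;
}
```

The fast path is unchanged: one `pthread_once()` call, which is cheap after the first time, followed by `pthread_getspecific()`.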

I really think that the above is only a temporary measure until a "proper" re-working of libxml (i.e. libxml3) can be performed. It's a solution that promotes backwards compatibility; however, it's not as clean as doing the reworking. Some of the work required to remove global state would be done now, which would help in the migration to libxml3; it would also mean a phased implementation of the threading support, which is either a good or a bad thing depending on your point of view. We could decide that this is not worth doing and that we should just wait for libxml3 and do a non-backwards-compatible reworking. I have some (limited) time to work on making this happen, but I'd be happy to defer and wait to do a proper reworking for libxml2 if I knew what the time frame for that project was.

Anyway, I have a series of test programs using the above partial implementation which work fine single-threaded (I've run through the testall target); I still need to write a more comprehensive mt-driver. The main (technically) tricky bits remaining are:
1. Deciding exactly which global symbols go into the global symbol structure. All global variables, or only those we expect to be modified?
2. Working out the appropriate configure incantation to make thread support optional even if detected on the platform (if we decide we want this to be an option).
3. Writing an mt-driver to test the changes.
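As a possible starting point for item 3, a minimal mt-driver could have each thread write a distinct value through the accessor, wait until every thread has written, and then check that it still reads back its own value. In this sketch the accessor is a stand-in built on C11 `_Thread_local`; for a real run it would be replaced by a call to `__xmlDoValidityCheckingDefaultValue()`. `NTHREADS`, `worker()` and `run_mt_driver()` are hypothetical names. The barrier is built from a mutex and a condition variable because `pthread_barrier_t` is an optional POSIX feature.

```c
#include <pthread.h>
#include <stdint.h>

#define NTHREADS 8

/* Stand-in accessor; call the library's accessor here when testing it. */
static _Thread_local int tlsValue;
static int *accessor(void) { return &tlsValue; }

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t allWritten = PTHREAD_COND_INITIALIZER;
static int writers;     /* threads that have written so far */
static int aborted;     /* set if not all threads could be started */
static int failMarker;  /* its address is returned by a failing worker */

static void *worker(void *arg)
{
    int id = (int)(intptr_t)arg;

    *accessor() = id;   /* each thread writes a distinct value */

    /* Barrier: wait until every thread has written (or setup failed). */
    pthread_mutex_lock(&lock);
    if (++writers == NTHREADS)
        pthread_cond_broadcast(&allWritten);
    while (writers < NTHREADS && !aborted)
        pthread_cond_wait(&allWritten, &lock);
    pthread_mutex_unlock(&lock);

    /* Fail if another thread's write is visible through our accessor. */
    return (*accessor() == id) ? NULL : &failMarker;
}

/* Returns -1 on setup error, else the number of failed threads (0 = pass). */
int run_mt_driver(void)
{
    pthread_t tid[NTHREADS];
    int created, failures = 0;

    writers = 0;
    aborted = 0;
    for (created = 0; created < NTHREADS; created++)
        if (pthread_create(&tid[created], NULL, worker,
                           (void *)(intptr_t)created) != 0)
            break;

    if (created < NTHREADS) {   /* release threads waiting at the barrier */
        pthread_mutex_lock(&lock);
        aborted = 1;
        pthread_cond_broadcast(&allWritten);
        pthread_mutex_unlock(&lock);
    }
    for (int i = 0; i < created; i++) {
        void *res;
        if (pthread_join(tid[i], &res) != 0 || res != NULL)
            failures++;
    }
    return (created < NTHREADS) ? -1 : failures;
}
```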

We basically agree on what needs to be done but diverge on the way to do it,
though the core part is similar in both approaches.



Gary Pennington
Solaris Kernel Development,
Sun Microsystems
Gary Pennington sun com
