Re: [xml] libxml2 thread safety (SUMMARY)
- From: Gary Pennington <Gary Pennington uk sun com>
- To: veillard redhat com
- Cc: xml gnome org
- Subject: Re: [xml] libxml2 thread safety (SUMMARY)
- Date: Thu, 04 Oct 2001 10:13:38 +0100
Daniel Veillard wrote:
On Wed, Oct 03, 2001 at 04:59:38PM +0100, Gary Pennington wrote:
- that structure is accessed using an overridable function returning
a pointer to that structure
I'm not clear as to exactly what this means. Why would the function be
overridable? Is this because we want to make it possible for users to
maintain state in any fashion they choose? If so, I think that is not a
Yes. The point is that it allows the same code once compiled to run either
in a threaded or unthreaded fashion. I don't want to see duplicate libxml2-thread
and libxml2 packages exporting the same symbols but incompatible
from a run-time point of view. That is a major mistake; I have seen Sun do it in the past
(some ugly problems with threaded X11 libraries versus the normal ones) and I certainly
don't want to enter that maintenance nightmare. The second benefit is that
the thread dependency can be isolated from libxml itself, which is IMHO
a huge gain maintenance-wise.
good idea. What happens if multiple libraries all link to libxml2 and
choose to use different methods to maintain global state? This solution
Those libraries are incompatible. It would not be the first time one sees
this. If an application uses libraries A and B, and B uses A too, B is threaded
and the application is threaded, then yes, you might have trouble.
Is this worth putting thread dependencies directly in the library?
is appropriate for single applications, but breaks when applied to
libraries which use libxml2 and have no control over how libxml2 may be
used from other consumers inside a process address space.
Agreed, you may have trouble.
- the old name for the global variable is turned into a macro
dereferencing the pointer returned by that function (I think
this will keep the read/write capability of that property).
It does. You can see how this works in the sample I sent out yesterday.
Even for write, i.e. would an application compiled now and writing to a
global var still work? That is not what I understood.
The reason it works is that you leave the definition of the global
variable in the new library. Any old code will still manipulate the
global variable state, which is fine for existing code since it will
preserve the expected semantics. Any new code will be compiled to use
the TSD (thread specific data, see later) state.
There may be issues where old code is going to be used with new code. In
these cases you should re-compile the code to get correct thread-safe
behaviour, if you don't there will be problems. This is one of the costs
of maintaining backwards compatibility.
Do you foresee any significant problem with this approach ? Any recompiled
library would work in the same mode as the application. There is a single
atomic step to switch to a threaded mode which is to set the global variable
containing the pointer to the accessor function.
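The overridable accessor described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from libxml2: the names xml_global_state, xml_get_state, xml_set_state_accessor and alternate_get_state are hypothetical, and a real thread-aware accessor would return a per-thread structure rather than a second static one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the library's global state. */
struct xml_global_state {
    int doValidityCheckingDefaultValue;
};

/* Default accessor: one process-wide instance (unthreaded mode). */
static struct xml_global_state default_state;

static struct xml_global_state *default_get_state(void)
{
    return &default_state;
}

/* The overridable hook: a global pointer to the accessor function. */
typedef struct xml_global_state *(*xml_get_state_fn)(void);
static xml_get_state_fn xml_get_state = default_get_state;

/* Hypothetical registration call. An application would invoke it once,
 * before starting any thread, to switch the library to threaded mode. */
static void xml_set_state_accessor(xml_get_state_fn fn)
{
    xml_get_state = (fn != NULL) ? fn : default_get_state;
}

/* The old global name becomes a macro over the accessor's result,
 * so existing source still reads and writes it like a variable. */
#define xmlDoValidityCheckingDefaultValue \
    (xml_get_state()->doValidityCheckingDefaultValue)

/* Stand-in for a thread-aware accessor supplied by the application. */
static struct xml_global_state alternate_state;

static struct xml_global_state *alternate_get_state(void)
{
    return &alternate_state;
}
```

With this shape the library itself never calls a threading API; whatever the application registers decides where the state lives.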
The main problem is that it requires source code to explicitly be
compiled to be thread-safe. In other words, the user will have to know
that he is planning to use libxml2 in a threaded fashion. This would be
If you use a thread fork in your application you had better know it!
I don't see anything pernicious there, rather the opposite: an application
has to be conceived as thread based versus loop based nearly from the
start. Saying the app designer need not know it sounds really strange;
there are just so many things which need careful attention in that case
that adding the initialization of threading explicitly for libxml seems
acceptable to me in this context.
extremely pernicious when people start to develop and distribute
libraries which use this facility. How would a potential consumer of a
library know what assumptions the distributed library had made about
thread-safeness? Also, this would still be broken with respect to any
existing libraries and would not support binary compatibility. Finally,
it would be very broken if multiple libraries had to be linked against
and they all required different behaviour from libxml2.
That is true
The solution I'm proposing is, I think, very similar to what you
describe above - but there are some key differences. I think that we can
get libxml2 to be thread-safe, backwards compatible and avoid
re-compiling any existing code by careful use of TSD and placement of
TSD == Task State Descriptors ?
(see above)
I'm really wary of adding thread knowledge at the library level itself;
it *is* hard to make portable and maintain. Currently libxml is easy to port
and relatively easy to maintain. I have to resist changes which decrease
those key properties. It may all be a matter of careful coding, but I need
to be cautious there. I may have to maintain this code for the next 10 years!
global state in internal data structures. I don't think that what you
propose, user configurable global state storage, will work because of
the reasons I outlined.
Okay, what do you think your approach would break ?
Well, hopefully as little as possible. There would be issues for code
which was compiled multi-threaded and which linked with old libraries
which had been compiled with versions of libxml2 prior to this work
being performed. This should be a fairly small category of breakage and
the fix is easy: just recompile your code against the new thread-safe
libxml2 with no code changes required.
The reason this works is because:
1. The existing global symbols are still exported by the library.
2. The new header file checks for threaded compilation and if threaded
substitutes the appropriate function rather than using the global symbol.
If not threaded, the global symbol is used.
Here's the appropriate snippet (relating to a single variable
xmlDoValidityCheckingDefaultValue for purposes of illustration; there
would be one of these for each global that was modified) from the header
file to show how it works:
#if defined(_REENTRANT) || (_POSIX_C_SOURCE - 0 >= 199506L)
extern int *__xmlDoValidityCheckingDefaultValue();
#define xmlDoValidityCheckingDefaultValue \
    (*(__xmlDoValidityCheckingDefaultValue()))
#else
extern int xmlDoValidityCheckingDefaultValue;
#endif
Any new code compiled would either be threaded or non-threaded. It would
pick up the macro definition if threaded, the standard extern symbol if
not. Existing code would still use the standard extern symbol and would
thus behave as non-threaded code.
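To illustrate why calling code needs no source changes in either mode, here is a small sketch. It is not the libxml2 header: DEMO_THREADED is a hypothetical switch standing in for the _REENTRANT / _POSIX_C_SOURCE test, and demoDoValidityChecking, demo_validity_accessor and toggle_validity are made-up names. The point is only that the name stays a readable, writable lvalue whether it is a plain variable or a macro dereferencing an accessor.

```c
#include <assert.h>

#ifdef DEMO_THREADED
/* Threaded build: the name is a macro that dereferences the pointer
 * returned by an accessor. A static int stands in for the per-thread copy. */
static int demo_storage;

static int *demo_validity_accessor(void)
{
    return &demo_storage;
}

#define demoDoValidityChecking (*(demo_validity_accessor()))
#else
/* Non-threaded build: the name is an ordinary global variable. */
static int demoDoValidityChecking;
#endif

/* Caller code is identical in both builds: it reads and writes the
 * name as if it were a variable and returns the previous value. */
static int toggle_validity(void)
{
    int previous = demoDoValidityChecking;   /* read */
    demoDoValidityChecking = !previous;      /* write */
    return previous;
}
```

Compiling the same file with and without -DDEMO_THREADED should give the same observable behaviour for toggle_validity(), which is the property the header trick relies on.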
The implementation of __xmlDoValidityCheckingDefaultValue() shows how
the value of the TSD is manipulated:
#undef xmlDoValidityCheckingDefaultValue
int xmlDoValidityCheckingDefaultValue = 0;
int *
__xmlDoValidityCheckingDefaultValue()
{
    struct glob_struct *globalval;
    if (keyonce == 0) {
        (void) pthread_mutex_lock(&keylock);
        if (keyonce == 0) {
            /* create the key before publishing keyonce, so no other
               thread can skip the lock and use an uncreated key */
            (void) pthread_key_create(&globalkey, tsd_free);
            keyonce++;
        }
        (void) pthread_mutex_unlock(&keylock);
    }
    if ((globalval = (struct glob_struct *)pthread_getspecific(globalkey))
        == NULL) {
        struct glob_struct *tsd = alloc_glob_struct();
        pthread_setspecific(globalkey, tsd);
        return (&tsd->xmlDoValidityCheckingDefaultValue);
    } else
        return (&globalval->xmlDoValidityCheckingDefaultValue);
}
Notes:
1. The undefine of the macro to prevent the clash with the variable.
2. The check to instantiate a single key for the TSD retrieval.
3. Note the pthread_getspecific call which gets a single structure per
thread which holds a thread-specific copy of the global state. Each
thread manipulates its own global state.
4. tsd_free frees the global state when the thread terminates.
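As an alternative to the hand-rolled check in note 2, the one-time key creation could be delegated to pthread_once. The following is a minimal sketch under that assumption, not the code from the patch: glob_state, thread_validity, worker and run_demo are hypothetical names, and the structure holds a single field for brevity.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread copy of the global state. */
struct glob_state {
    int validity;
};

static pthread_key_t state_key;
static pthread_once_t state_once = PTHREAD_ONCE_INIT;

/* Destructor run by the threads library when a thread terminates. */
static void state_free(void *p)
{
    free(p);
}

static void state_key_init(void)
{
    (void) pthread_key_create(&state_key, state_free);
}

/* Returns a pointer to the calling thread's copy, or NULL on failure. */
static int *thread_validity(void)
{
    struct glob_state *st;

    (void) pthread_once(&state_once, state_key_init);
    st = pthread_getspecific(state_key);
    if (st == NULL) {
        st = calloc(1, sizeof(*st));
        if (st == NULL)
            return NULL;
        if (pthread_setspecific(state_key, st) != 0) {
            free(st);
            return NULL;
        }
    }
    return &st->validity;
}

struct job {
    int value;   /* value this thread writes */
    int seen;    /* value this thread reads back */
};

static void *worker(void *arg)
{
    struct job *job = arg;
    int *slot = thread_validity();

    if (slot != NULL) {
        *slot = job->value;
        job->seen = *thread_validity();
    }
    return NULL;
}

/* Returns 0 if every thread saw only its own value, 1 if values leaked
 * between threads, -1 on setup failure. */
static int run_demo(void)
{
    struct job a = { 1, -1 }, b = { 2, -1 };
    pthread_t ta, tb;
    int *mine = thread_validity();

    if (mine == NULL)
        return -1;
    *mine = 7;
    if (pthread_create(&ta, NULL, worker, &a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, worker, &b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return (a.seen == 1 && b.seen == 2 && *thread_validity() == 7) ? 0 : 1;
}
```

On most platforms this needs to be built with the compiler's thread option (for example -pthread). The trade-off is the same as in the original: the library gains a direct dependency on the threads API.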
I really think that the above is only a temporary measure until a
"proper" re-working of libxml (i.e. libxml3) can be performed. It's a
solution that promotes backwards-compatibility, however it's not as
clean as doing the reworking. Some of the work required to remove global
state would be performed which would help in the migration to libxml3
and that would mean phased implementation of the threading support,
which is either a good or a bad thing depending on your point of view.
We could decide that this is not worth doing and that we should just
wait for libxml3 and do a non-backwards compatible reworking. I have
some (limited) time to work on making this happen, but I'd be happy to
defer and wait to do a proper reworking for libxml2 if I knew what the
time frame for that project was.
Anyway, I have a series of test programs using the above partial
implementation which work fine single threaded - I still need to write a
more comprehensive mt-driver (I've run through the testall target) and
the main (technically) tricky bits remaining are :
1. Deciding exactly which global symbols go into the global symbol
structure. All global variables or only those we expect to be modified?
2. Working out the appropriate configure incantation to make thread
support optional even if detected on the platform. (If we decide we want
this to be an option)
3. Writing a mt-driver to test the changes.
We basically agree on what needs to be done but diverge on the way to do it,
though the core part is similar in both approaches.
Daniel
Gary
--
Gary Pennington
Solaris Kernel Development,
Sun Microsystems
Gary Pennington sun com