Re: [xml] libxml2 thread safety (SUMMARY)
- From: Gary Pennington <Gary Pennington uk sun com>
- To: veillard redhat com
- Cc: xml gnome org
- Subject: Re: [xml] libxml2 thread safety (SUMMARY)
- Date: Wed, 03 Oct 2001 16:59:38 +0100
Daniel Veillard wrote:
I'm not sure what the quoted statement refers to... I don't want anyone
to recompile any code. It should be possible to supply a single libxml2
library which is either threaded or not depending on the environment it
was compiled under.
On Wed, Oct 03, 2001 at 09:56:20AM +0100, Gary Pennington wrote:
Secondly, I believe that it is possible to make libxml2 thread-safe and
backwards compatible. Key to this belief is the use of TSD (as I
described in my code samples yesterday). All global state flags will be
moved into a TSD structure which means that each thread will have its
own view of the libxml2 configuration. The big plus of this approach is
that multiple threads get private behaviour out of libxml2 without
requiring any re-writing of user code. I'll repeat that, since its
importance is near the top of my list and no-one else seemed to be
bothered about this. NO re-writing of code for existing single-threaded
libxml2 consumers. The big negative of this approach is that you need to
be very careful about behaviour when passing documents around between
threads, since each thread will have its own view of the libxml2
configuration flags. I believe this is a small price to pay for
backwards compatibility, but it should be considered.
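
The TSD idea described above can be sketched with POSIX threads roughly as follows. This is a minimal illustration, not the code sample referred to in the thread: the structure, its fields and every `demo_` name are hypothetical, and a platform with pthreads is assumed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread configuration; the field names are illustrative. */
typedef struct {
    int keep_blanks;
    int validate;
} demo_state;

static pthread_key_t demo_key;
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;

/* Destructor run by the thread library when a thread holding state exits. */
static void demo_free_state(void *p) { free(p); }

static void demo_make_key(void) {
    int rc = pthread_key_create(&demo_key, demo_free_state);
    assert(rc == 0);
    (void)rc;
}

/* Return the calling thread's private state, creating it on first use. */
static demo_state *demo_get_state(void) {
    demo_state *st;
    pthread_once(&demo_once, demo_make_key);
    st = pthread_getspecific(demo_key);
    if (st == NULL) {
        st = calloc(1, sizeof(*st));
        if (st == NULL)
            return NULL;
        st->keep_blanks = 1;
        if (pthread_setspecific(demo_key, st) != 0) {
            free(st);
            return NULL;
        }
    }
    return st;
}

/* Worker thread: changes its own copy of the flag and reports what it sees. */
static void *demo_worker(void *arg) {
    int *seen = arg;
    demo_state *st = demo_get_state();
    if (st == NULL)
        return NULL;
    st->validate = 1;
    *seen = st->validate;
    return NULL;
}

/* Returns the first thread's flag after a worker changed its own copy,
   or -1 if something failed along the way. */
int demo_run(void) {
    pthread_t t;
    int worker_seen = -1;
    demo_state *st = demo_get_state();
    if (st == NULL)
        return -1;
    st->validate = 0;
    if (pthread_create(&t, NULL, demo_worker, &worker_seen) != 0)
        return -1;
    pthread_join(t, NULL);
    return (worker_seen == 1) ? st->validate : -1;
}
```

In this sketch the worker's change never reaches the first thread's copy, which is the private behaviour described above. It also shows the caveat: a document handed from one thread to another would be processed under the receiving thread's flags.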
I have a small concern around the "recompilation for threaded usage"
point. I think it is possible to make the changes in a backward compatible
way, and allow the same shared libraries to serve both kinds of users.
Thirdly, if we do decide that backwards compatibility is not an issue.
Well clearly it is an issue. I can't start forking code now, I'm
not ready to work on libxml3 yet, libxml2 is expected to stabilize
for the Gnome-2.0 release. And if (as I believe) we can provide a binary
compatible version with thread support then this should be pursued.
I'm not sure what single mode means. Can you clarify for me? Does it
mean a single library which works on all platforms? If so, I agree but
there will be conditional compilation of code depending on the
availability of the appropriate thread library support.
Then there are many measures that can be taken to improve the library.
New re-entrant APIs for setting state using user supplied storage,
remove all global state, etc... However, I don't believe this is a
realistic solution as it will cause major compatibility issues for the
existing user base. Let's keep discussing this, but I believe that
backwards compatibility is right at the top of the design goals for any
change to the library.
Fourthly, you are near the top of my list of people we should help
(well, to be more precise - not hinder). I'm not going to provide any
thread support to help you, but I certainly don't want to break any of
your existing code. When the library is built for your platform, the
thread specific code will be excluded from the compile and you will get
a single threaded libxml2 that you can use as you choose. Key to this
approach working is the adoption of TSD and the limited intrusiveness on
libxml2 which I advocate.
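
One way to picture the build-time exclusion described above is a single accessor whose body depends on a configuration macro. This is only a sketch under assumptions: `DEMO_HAVE_PTHREADS`, `demoConfig` and `demoGetConfig` are hypothetical names, not anything defined by libxml2.

```c
#include <assert.h>
#include <stddef.h>

#ifdef DEMO_HAVE_PTHREADS
#include <pthread.h>
#include <stdlib.h>
#endif

/* Hypothetical settings block; the field is illustrative. */
typedef struct { int validate; } demoConfig;

#ifdef DEMO_HAVE_PTHREADS
/* Threaded build: one settings block per thread, kept in TSD. */
static pthread_key_t demoConfigKey;
static pthread_once_t demoConfigOnce = PTHREAD_ONCE_INIT;

static void demoConfigInit(void) {
    (void)pthread_key_create(&demoConfigKey, free);
}

demoConfig *demoGetConfig(void) {
    demoConfig *cfg;
    pthread_once(&demoConfigOnce, demoConfigInit);
    cfg = pthread_getspecific(demoConfigKey);
    if (cfg == NULL) {
        cfg = calloc(1, sizeof(*cfg));
        if (cfg != NULL && pthread_setspecific(demoConfigKey, cfg) != 0) {
            free(cfg);
            cfg = NULL;
        }
    }
    return cfg;
}
#else
/* Single-threaded build: the thread code is compiled out entirely and
   one static block serves the whole process. */
static demoConfig demoSingleConfig;

demoConfig *demoGetConfig(void) { return &demoSingleConfig; }
#endif
```

Callers use `demoGetConfig()` the same way in both builds, so code written against the single-threaded variant does not change.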
I still think we can come up with a single mode solution.
You can. There is no requirement to recompile any code, simply use a
thread safe libxml2.
Finally, this is an interesting question since it reveals the difference
between making a library thread-safe and having a thread-safe convention
for using the library. I believe that this approach will result in a
thread-safe convention for using libxml2; whereas I would like to make
the library thread-safe. Taking this approach means that you must trust
your clients to behave properly: they mustn't try to manipulate global
state whilst a parse is in progress, for instance. I don't think that
this will result in a library that can be used effectively by
multi-threaded clients, in fact I have already considered this option at
some length and discarded it before making this proposal. One very good
reason for discarding this approach becomes evident when you consider
what would happen if you tried to link your application with another
library which also used libxml2; but directly - not following your
synchronisation conventions or using your synchronisation wrapper layer.
I think the answer to the question I posed is a clear no.
If the extra library hasn't been recompiled you can't guarantee any thread
safety anyway, right? If it had been recompiled and there is only one
mode of access (i.e. removing the threaded vs. non-threaded way of getting
the information) then I don't see where the problem would come from.
That is what I am proposing. It's a single structure and is currently
aligned per-thread, but a better alignment may be to partition the data
and have some global data aligned with parser context and some per
thread (if required by global state that can't find a home on a
parser context). That would better enable threads to co-operate and may
even be a completely seamless, backwards compatible solution.
My personal inclination as I explained yesterday would be:
- to keep all global settings in a new structure
I'm not clear as to exactly what this means. Why would the function be
overridable? Is this because we want to make it possible for users to
maintain state in any fashion they choose? If so, I think that is not a
good idea. What happens if multiple libraries all link to libxml2 and
choose to use different methods to maintain global state? This solution
is appropriate for single applications, but breaks when applied to
libraries which use libxml2 and have no control over how libxml2 may be
used from other consumers inside a process address space.
- that structure is accessed using an overridable function returning
a pointer to that structure
- the old name for the global variable is turned into a macro
dereferencing the pointer returned by that function (I think
this will keep the read/write capability of that property).
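
The three points in that list can be sketched as below. All names are hypothetical stand-ins chosen for illustration; they are not taken from the libxml2 headers or from the sample mentioned in the thread.

```c
#include <assert.h>

/* Hypothetical structure gathering the former global settings. */
typedef struct { int keepBlanks; } demoGlobalState;

static demoGlobalState demoDefaultState = { 1 };

/* Default accessor: one structure shared by the whole process. */
static demoGlobalState *demoDefaultGetState(void) {
    return &demoDefaultState;
}

/* Overridable accessor: a function pointer returning the structure. */
demoGlobalState *(*demoGetState)(void) = demoDefaultGetState;

/* The old global variable name becomes a macro. Because it expands to
   a member reached through a pointer, it is still an lvalue, so both
   reads and writes keep working in existing source code. */
#define demoKeepBlanks (demoGetState()->keepBlanks)

/* Code written against the old global compiles unchanged. */
int demoLegacyCaller(void) {
    demoKeepBlanks = 0;      /* write through the macro */
    return demoKeepBlanks;   /* read through the macro  */
}
```

Note that this keeps source compatibility only: a caller picks up the macro when it is recompiled against the new header.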
It does. You can see how this works in the sample I sent out yesterday.
The main problem is that it requires source code to explicitly be
compiled to be thread-safe. In other words, the user will have to know
that he is planning to use libxml2 in a threaded fashion. This would be
extremely pernicious when people start to develop and distribute
libraries which use this facility. How would a potential consumer of a
library know what assumptions the distributed library had made about
thread-safeness? Also, this would still be broken with respect to any
existing libraries and would not support binary compatibility. Finally,
it would be very broken if multiple libraries had to be linked against
and they all required different behaviour from libxml2.
Do you foresee any significant problem with this approach? Any recompiled
library would work in the same mode as the application. There is a single
atomic step to switch to a threaded mode which is to set the global variable
containing the pointer to the accessor function.
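
A rough sketch of that switch follows, assuming a C11 compiler so that `_Thread_local` can stand in for the per-thread storage. Every name here is hypothetical.

```c
#include <assert.h>

/* Hypothetical settings structure. */
typedef struct { int validate; } demoSettings;

static demoSettings demoProcessWide = { 0 };
/* C11 thread-local storage, used here only to keep the sketch short. */
static _Thread_local demoSettings demoPerThread = { 0 };

static demoSettings *demoGlobalAccessor(void) { return &demoProcessWide; }
static demoSettings *demoThreadAccessor(void) { return &demoPerThread; }

/* The global variable holding the pointer to the accessor function. */
static demoSettings *(*demoAccessor)(void) = demoGlobalAccessor;

#define demoValidate (demoAccessor()->validate)

/* The single step that switches the process to threaded mode. */
void demoEnableThreadedMode(void) {
    demoAccessor = demoThreadAccessor;
}

/* Sets the flag, switches mode, then reads the flag again. */
int demoSwitchExample(void) {
    demoValidate = 1;            /* stored in the process-wide block  */
    demoEnableThreadedMode();
    return demoValidate;         /* now reads the per-thread block    */
}
```

Two things stand out in this sketch. A plain pointer store is not guaranteed to be atomic in C, so the switch would have to happen before any other thread is started. And a setting made before the switch is not carried over: `demoSwitchExample()` sees the per-thread default afterwards, not the value it wrote.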
The solution I'm proposing is, I think, very similar to what you
describe above - but there are some key differences. I think that we can
get libxml2 to be thread-safe, backwards compatible and avoid
re-compiling any existing code by careful use of TSD and placement of
global state in internal data structures. I don't think that what you
propose, user configurable global state storage, will work because of
the reasons I outlined.
Solaris Kernel Development,
Gary Pennington sun com