Re: [xml] libxml2 thread safety (SUMMARY)
- From: Gary Pennington <Gary Pennington uk sun com>
- To: xml gnome org
- Cc: Daniel Veillard <veillard redhat com>
- Subject: Re: [xml] libxml2 thread safety (SUMMARY)
- Date: Wed, 03 Oct 2001 09:56:20 +0100
Hi,
There has been a lot of debate since yesterday. I'd like to address the
points that were raised in this one mail rather than fragmenting my
thoughts across many posts. Many of the points raised are good ones, and
I'll try to summarise them here:
1. Fine-grained locking is bad.
2. It's possible to extend libxml2 in a backwards-compatible fashion so
that it is thread-safe.
3. It's possible to extend libxml2 in an incompatible fashion so that it
is thread-safe.
4. I don't use pthreads, I use <other threads>, or my platform has no
threads. What about me?
5. Can we make libxml2 thread-safe without modifying the library itself?
Before I move on to discuss what I think about the above, let me state
clearly which kinds of libxml2 consumers I'm trying to help with this
proposal.
a) Thread-safe libraries/Multi-threaded applications
You have written a thread-safe library or a multi-threaded application
that manipulates many documents, and you would like to support
simultaneous parsing in multiple threads with different parse options.
For instance, some threads want to have validation whilst parsing and
other threads don't. It's not possible to support this now in libxml2,
since a lot of the configuration flags are represented as global state.
You don't want a "better" libxml2 which will make your
application/library scale, you just want it to support your
application/library in a thread-safe fashion so that your code can scale.
b) Thread-unsafe libraries/Single-threaded applications
You have single threaded code that works fine with libxml2, you don't
want to change your code and you don't want to be left behind when
libxml2 changes to become multi-threaded. Maybe you don't have access to
your source, so you can't change your code and backwards compatibility
is a big deal.
In addition to the above, I'm also aiming to help more sophisticated
applications that use lower level APIs to manipulate the parser context
directly, but that's not the main goal of this proposal.
There is a category of applications that I'm NOT trying to help,
specifically:
You have a multi-threaded application and you want to be able to parse
documents faster. You wish that libxml2 was actively threaded and would
parse in parallel using all the processors available on your machine.
To be completely clear, I don't want to make libxml2 "active". It won't
have its own threads and it won't try to parse in parallel on your
behalf or automatically adjust its behaviour to match its execution
environment. That kind of thing is hard to achieve from scratch and
would be very hard to achieve on the existing libxml2 code base in a
cross-platform fashion; it's way outside the scope of my proposal.
Okay, now I'll try to discuss the points above.
Firstly, I agree that fine-grained locking can be a real performance hog
and I should state up front that I have no plans to introduce
fine-grained locking into libxml2. I want to introduce parallelism at
the "right" level, i.e. the level which I think makes most sense for
most applications. I propose to introduce at least one and maybe two
locking levels: the parser context and the library. I believe that
locking the parser context will open up opportunities for applications
to co-operate on document parsing without adding too much complexity.
All methods for document parsing require an xmlParserCtxt and it's a
"natural" place to synchronize. If necessary, I will also provide a lock
at the library level, although I hope to avoid requiring this by using
TSD (thread-specific data).
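As a rough illustration of what locking at the parser-context level could
look like, here is a minimal sketch using pthreads. The ctx_parser struct
and every ctx_* function are hypothetical stand-ins invented for this
example; they are not the real xmlParserCtxt or any libxml2 API, and the
counter increment merely stands in for real parsing work.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a parser context that carries its own lock. */
typedef struct {
    pthread_mutex_t lock;      /* guards the fields below */
    int validate;              /* per-context option */
    unsigned long chunks_fed;  /* stands in for parser progress */
} ctx_parser;

static int ctx_init(ctx_parser *ctxt, int validate)
{
    assert(ctxt != NULL);
    ctxt->validate = validate;
    ctxt->chunks_fed = 0;
    return pthread_mutex_init(&ctxt->lock, NULL);
}

/* Every operation on the context takes the context's lock first. */
static void ctx_feed_chunk(ctx_parser *ctxt)
{
    pthread_mutex_lock(&ctxt->lock);
    ctxt->chunks_fed++;
    pthread_mutex_unlock(&ctxt->lock);
}

static void *ctx_feeder(void *arg)
{
    int i;
    for (i = 0; i < 1000; i++)
        ctx_feed_chunk((ctx_parser *)arg);
    return NULL;
}

/* Two threads co-operate on one context; returns the chunks recorded,
 * or 0 if a thread or the mutex could not be created. */
static unsigned long ctx_shared_parse(void)
{
    ctx_parser ctxt;
    pthread_t a, b;
    unsigned long total;

    if (ctx_init(&ctxt, 1) != 0)
        return 0;
    if (pthread_create(&a, NULL, ctx_feeder, &ctxt) != 0) {
        pthread_mutex_destroy(&ctxt.lock);
        return 0;
    }
    if (pthread_create(&b, NULL, ctx_feeder, &ctxt) != 0) {
        pthread_join(a, NULL);
        pthread_mutex_destroy(&ctxt.lock);
        return 0;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    total = ctxt.chunks_fed;
    pthread_mutex_destroy(&ctxt.lock);
    return total;
}
```

The lock is taken once per context operation rather than around individual
fields, which is the coarse granularity argued for above.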
Secondly, I believe that it is possible to make libxml2 thread-safe and
backwards compatible. Key to this belief is the use of TSD (as I
described in my code samples yesterday). All global state flags will be
moved into a TSD structure, which means that each thread will have its
own view of the libxml2 configuration. The big plus of this approach is
that multiple threads get private behaviour out of libxml2 without
requiring any re-writing of user code. I'll repeat that, since its
importance is near the top of my list and no-one else seemed to be
bothered about this: NO re-writing of code for existing single-threaded
libxml2 consumers. The big negative of this approach is that you need to
be very careful about behaviour when passing documents around between
threads, since each thread will have its own view of the libxml2
configuration flags. I believe this is a small price to pay for
backwards compatibility, but it should be considered.
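To make the TSD idea concrete, here is a minimal pthreads sketch. It is
not the code sample referred to above, only an illustration under stated
assumptions: the tsd_state struct, the tsd_* functions and the
tsdDoValidityChecking macro are hypothetical names invented here, not
libxml2 identifiers. The macro shows how existing code could keep
assigning to what looks like a global while actually touching per-thread
storage.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread copy of flags that are global today. */
typedef struct {
    int do_validity_checking;
    int keep_blanks;
} tsd_state;

static pthread_key_t tsd_key;
static pthread_once_t tsd_once = PTHREAD_ONCE_INIT;

static void tsd_make_key(void)
{
    /* free() is called on a thread's stored pointer when it exits. */
    int rc = pthread_key_create(&tsd_key, free);
    assert(rc == 0);
    (void)rc;
}

/* Return the calling thread's state, creating it on first use.
 * Returns NULL only if memory allocation fails. */
static tsd_state *tsd_get_state(void)
{
    tsd_state *st;

    pthread_once(&tsd_once, tsd_make_key);
    st = pthread_getspecific(tsd_key);
    if (st == NULL) {
        st = calloc(1, sizeof *st);
        if (st == NULL)
            return NULL;
        st->keep_blanks = 1;   /* assumed default, for illustration */
        if (pthread_setspecific(tsd_key, st) != 0) {
            free(st);
            return NULL;
        }
    }
    return st;
}

/* Old code keeps using the "global" name; allocation failure is not
 * handled by this macro, which a real implementation would need. */
#define tsdDoValidityChecking (tsd_get_state()->do_validity_checking)

static void *tsd_worker(void *arg)
{
    (void)arg;
    tsdDoValidityChecking = 1;   /* visible in this thread only */
    return (void *)(intptr_t)tsdDoValidityChecking;
}

/* Start a thread that turns validation on; return the value it saw,
 * or -1 if the thread could not be created. */
static int tsd_flag_in_other_thread(void)
{
    pthread_t t;
    void *res = NULL;

    if (pthread_create(&t, NULL, tsd_worker, NULL) != 0)
        return -1;
    pthread_join(t, &res);
    return (int)(intptr_t)res;
}
```

If the calling thread sets tsdDoValidityChecking to 0 and then calls
tsd_flag_in_other_thread(), the worker sees its own value of 1 while the
caller's flag stays 0. That is also the caveat mentioned above: a document
handed from one thread to another is then processed under the receiving
thread's flags.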
Thirdly, if we do decide that backwards compatibility is not an issue,
then there are many measures that can be taken to improve the library:
new re-entrant APIs for setting state using user-supplied storage,
removal of all global state, and so on. However, I don't believe this is a
realistic solution as it will cause major compatibility issues for the
existing user base. Let's keep discussing this, but I believe that
backwards compatibility is right at the top of the design goals for any
change to the library.
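For comparison, a re-entrant API with user-supplied storage might look
roughly like the sketch below. All names (rx_options, rx_result,
rx_parse_r) are hypothetical and invented for this example; nothing here
is an existing or proposed libxml2 interface, and the "parse" only records
which options it was given.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical options block owned by the caller, not by the library. */
typedef struct {
    int validate;
    int keep_blanks;
} rx_options;

/* Hypothetical result block, also caller-owned. */
typedef struct {
    int validated;      /* whether validation was requested for this call */
    size_t bytes_seen;  /* stands in for real parse output */
} rx_result;

static void rx_options_init(rx_options *opts)
{
    assert(opts != NULL);
    opts->validate = 0;
    opts->keep_blanks = 1;
}

/* Re-entrant entry point: reads no global state, writes only to *out.
 * Returns 0 on success, -1 on a NULL argument. */
static int rx_parse_r(const rx_options *opts, const char *text,
                      rx_result *out)
{
    if (opts == NULL || text == NULL || out == NULL)
        return -1;
    out->validated = opts->validate;
    out->bytes_seen = strlen(text);
    return 0;
}
```

Every caller would have to be rewritten to create and pass an options
block, which is exactly the compatibility cost described above.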
Fourthly, if you don't use pthreads, or your platform has no threads at
all, you are near the top of my list of people we should help (well, to
be more precise, not hinder). I'm not going to provide any thread support
to help you, but I certainly don't want to break any of your existing
code. When the library is built for your platform, the thread-specific
code will be excluded from the compile and you will get a
single-threaded libxml2 that you can use as you choose. Key to this
approach working is the adoption of TSD and the limited intrusiveness on
libxml2 which I advocate.
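One way the thread-specific code could be excluded at compile time is
sketched below. The PORT_HAVE_PTHREADS macro and the port_* names are
assumptions made for this example, not real libxml2 build flags or
functions. Callers only ever use the accessor, so the same source builds
with or without thread support.

```c
#include <assert.h>
#include <stddef.h>

#ifdef PORT_HAVE_PTHREADS   /* hypothetical configure-time macro */
#include <pthread.h>
#include <stdlib.h>
#endif

typedef struct {
    int do_validity_checking;
} port_state;

#ifdef PORT_HAVE_PTHREADS
/* Threaded build: one state block per thread, held in TSD. */
static pthread_key_t port_key;
static pthread_once_t port_once = PTHREAD_ONCE_INIT;

static void port_make_key(void)
{
    int rc = pthread_key_create(&port_key, free);
    assert(rc == 0);
    (void)rc;
}

static port_state *port_get_state(void)
{
    port_state *st;

    pthread_once(&port_once, port_make_key);
    st = pthread_getspecific(port_key);
    if (st == NULL) {
        st = calloc(1, sizeof *st);
        if (st != NULL && pthread_setspecific(port_key, st) != 0) {
            free(st);
            st = NULL;
        }
    }
    return st;
}
#else
/* Single-threaded build: one plain static block, no thread code at all. */
static port_state port_single_state;

static port_state *port_get_state(void)
{
    return &port_single_state;
}
#endif

/* Callers never see which variant was compiled in. */
static void port_set_validation(int on)
{
    port_state *st = port_get_state();
    assert(st != NULL);
    st->do_validity_checking = on;
}

static int port_get_validation(void)
{
    port_state *st = port_get_state();
    assert(st != NULL);
    return st->do_validity_checking;
}
```

Built without PORT_HAVE_PTHREADS, this needs no thread library and
behaves like a plain global.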
Finally, the fifth question is an interesting one since it reveals the
difference between making a library thread-safe and having a thread-safe
convention for using the library. I believe that leaving the library
unmodified can only result in a thread-safe convention for using
libxml2, whereas I would like to make the library itself thread-safe.
Relying on a convention means that you must trust your clients to behave
properly; they mustn't try to manipulate global state whilst a parse is
in progress, for instance. I don't think that
this will result in a library that can be used effectively by
multi-threaded clients, in fact I have already considered this option at
some length and discarded it before making this proposal. One very good
reason for discarding this approach becomes evident when you consider
what would happen if you tried to link your application with another
library which also used libxml2 directly, without following your
synchronisation conventions or using your synchronisation wrapper layer.
I think the answer to the question I posed is a clear no.
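The weakness of a wrapper convention can be shown with a small
single-threaded simulation. Everything here is hypothetical:
wrap_validate_flag stands in for a library-global flag, wrap_lock for an
application's synchronisation wrapper, and other_lib_disable_validation
for a second library that uses the global directly. The callback models
code that happens to run while the wrapped parse is in progress.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a global configuration flag. */
static int wrap_validate_flag;

/* The application's own lock; only code that follows the convention
 * knows about it. */
static pthread_mutex_t wrap_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stands in for the part of a parse that consults the global flag. */
static int wrap_parse_finish(void)
{
    return wrap_validate_flag;
}

/* Another library linked into the same process. It touches the global
 * directly and has no idea that wrap_lock exists. */
static void other_lib_disable_validation(void)
{
    wrap_validate_flag = 0;
}

/* Application wrapper that follows the convention. 'meanwhile' models
 * whatever else runs in the process during the parse. Returns the flag
 * value the parse actually observed. */
static int wrap_parse_with_validation(void (*meanwhile)(void))
{
    int seen;
    int rc = pthread_mutex_lock(&wrap_lock);

    assert(rc == 0);
    (void)rc;
    wrap_validate_flag = 1;      /* we asked for validation */
    if (meanwhile != NULL)
        meanwhile();
    seen = wrap_parse_finish();
    pthread_mutex_unlock(&wrap_lock);
    return seen;
}
```

Calling wrap_parse_with_validation(NULL) observes 1, but passing
other_lib_disable_validation as the callback observes 0: holding the
wrapper's lock does nothing to stop code that never takes it.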
Okay, I hope that has answered many people's questions and I look
forward to hearing more feedback.
Gary
--
Gary Pennington
Solaris Kernel Development,
Sun Microsystems
Gary Pennington sun com