Re: [xml] libxml2 thread safety (SUMMARY)

From: Gary Pennington <Gary Pennington uk sun com>
To: xml gnome org
Cc: Daniel Veillard <veillard redhat com>
Subject: Re: [xml] libxml2 thread safety (SUMMARY)
Date: Wed, 03 Oct 2001 09:56:20 +0100

Hi,

There has been a lot of debate since yesterday. I'd like to address the points that were raised in this one mail rather than fragmenting my thoughts across many posts. Many of the points raised are good ones, and I'll try to summarise them here:


1. Fine-grained locking is bad.

2. It's possible to extend libxml2 in a backwards-compatible fashion so that it is thread-safe.

3. It's possible to extend libxml2 in an incompatible fashion so that it is thread-safe.

4. I don't use pthreads, I use <other threads> or my platform has no threads. What about me?

5. Can we make libxml2 thread-safe without modifying the library itself?

Before I move on to discuss what I think about the above, let me state clearly which kinds of libxml2 consumers I'm trying to help with this proposal.


a) Thread-safe libraries/Multi-threaded applications

The library/application you have written is thread-safe/multi-threaded, is capable of manipulating many documents, and would like to support simultaneous parsing in multiple threads with different parse options. For instance, some threads want to have validation whilst parsing and other threads don't. It's not possible to support this now in libxml2, since a lot of the configuration flags are represented as global state. You don't want a "better" libxml2 which will make your application/library scale; you just want it to support your application/library in a thread-safe fashion so that your code can scale.


b) Thread-unsafe libraries/Single-threaded applications

You have single-threaded code that works fine with libxml2. You don't want to change your code, and you don't want to be left behind when libxml2 changes to become multi-threaded. Maybe you don't have access to your source, so you can't change your code and backwards compatibility is a big deal.

In addition to the above, I'm also aiming to help more sophisticated applications that use lower level APIs to manipulate the parser context directly, but that's not the main goal of this proposal.

There is a category of applications that I'm NOT trying to help, specifically:

You have a multi-threaded application and you want to be able to parse documents faster. You wish that libxml2 was actively threaded and would parse in parallel using all the processors available on your machine.

To be completely clear, I don't want to make libxml2 "active". It won't have its own threads and it won't try to parse in parallel on your behalf or automatically adjust its behaviour to match its execution environment. That kind of stuff is hard to achieve from scratch and would be very hard to achieve on the existing libxml2 code base in a cross-platform fashion; it's way outside the scope of my proposal.


Okay, now I'll try to discuss the points above.

Firstly, I agree that fine-grained locking can be a real performance hog, and I should state up front that I have no plans to introduce fine-grained locking into libxml2. I want to introduce parallelism at the "right" level, i.e. the level which I think makes most sense for most applications. I propose to introduce at least one and maybe two locking levels: the parser context and the library. I believe that locking the parser context will open up opportunities for applications to co-operate on document parsing without adding too much complexity. All methods for document parsing require an xmlParserCtxt, so it's a "natural" place to synchronize. If necessary, I will also provide a lock at the library level, although I hope to avoid requiring this by using thread-specific data (TSD).
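
To make the idea of a context-level lock concrete, here is a minimal sketch using pthreads. It is an illustration only, not code from libxml2 or from this proposal: the type ctx_parser and every ctx_* function are hypothetical stand-ins, and the "parsing" is reduced to counting bytes so that the locking is the only thing on show.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a parser context that carries its own lock. */
typedef struct {
    pthread_mutex_t lock;      /* one lock per context, not per field */
    size_t bytes_consumed;     /* placeholder for real parser state */
} ctx_parser;

static int ctx_init(ctx_parser *c) {
    assert(c != NULL);
    c->bytes_consumed = 0;
    return pthread_mutex_init(&c->lock, NULL);
}

/* Every operation on the context takes the context lock for its duration. */
static void ctx_feed(ctx_parser *c, size_t len) {
    pthread_mutex_lock(&c->lock);
    c->bytes_consumed += len;
    pthread_mutex_unlock(&c->lock);
}

static void *ctx_feeder(void *arg) {
    ctx_parser *c = arg;
    for (int i = 0; i < 10000; i++)
        ctx_feed(c, 1);
    return NULL;
}

/* Two threads co-operate on one context; returns the bytes it recorded,
 * or 0 if a thread could not be started. */
size_t ctx_shared_total(void) {
    ctx_parser c;
    pthread_t a, b;
    size_t total;

    if (ctx_init(&c) != 0)
        return 0;
    if (pthread_create(&a, NULL, ctx_feeder, &c) != 0) {
        pthread_mutex_destroy(&c.lock);
        return 0;
    }
    if (pthread_create(&b, NULL, ctx_feeder, &c) != 0) {
        pthread_join(a, NULL);
        pthread_mutex_destroy(&c.lock);
        return 0;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    total = c.bytes_consumed;
    pthread_mutex_destroy(&c.lock);
    return total;
}
```

The lock is coarse on purpose: one mutex guards the whole context, so two threads sharing a context serialise on it, while threads using separate contexts never contend.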

Secondly, I believe that it is possible to make libxml2 both thread-safe and backwards compatible. Key to this belief is the use of TSD (as I described in my code samples yesterday). All global state flags will be moved into a TSD structure, which means that each thread will have its own view of the libxml2 configuration. The big plus of this approach is that multiple threads get private behaviour out of libxml2 without requiring any re-writing of user code. I'll repeat that, since its importance is near the top of my list and no-one else seemed to be bothered about it: NO re-writing of code for existing single-threaded libxml2 consumers. The big negative of this approach is that you need to be very careful about behaviour when passing documents around between threads, since each thread will have its own view of the libxml2 configuration flags. I believe this is a small price to pay for backwards compatibility, but it should be considered.
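
For readers who have not used TSD, the following is a minimal pthreads sketch of the general technique. It is not the code sample referred to above and it does not use real libxml2 names: tsd_state, tsd_get_state and the tsd_validate macro are all hypothetical. The macro is one assumed way of letting old code keep assigning to what looks like a global variable.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread copy of flags that used to be global. */
typedef struct {
    int validate;       /* stand-in for a "validate while parsing" flag */
    int keep_blanks;    /* stand-in for a second configuration flag */
} tsd_state;

static pthread_key_t tsd_key;
static pthread_once_t tsd_once = PTHREAD_ONCE_INIT;

/* Create the key once; free() releases a thread's state when it exits. */
static void tsd_make_key(void) {
    pthread_key_create(&tsd_key, free);
}

/* Return the calling thread's state, creating it with defaults on first use. */
static tsd_state *tsd_get_state(void) {
    tsd_state *s;
    pthread_once(&tsd_once, tsd_make_key);
    s = pthread_getspecific(tsd_key);
    if (s == NULL) {
        s = calloc(1, sizeof(*s));
        if (s == NULL)
            abort();
        s->keep_blanks = 1;
        pthread_setspecific(tsd_key, s);
    }
    return s;
}

/* Existing code can keep writing "tsd_validate = 1;" without changes. */
#define tsd_validate (tsd_get_state()->validate)

static void *tsd_worker(void *arg) {
    assert(arg != NULL);
    tsd_validate = 1;               /* affects only this thread's copy */
    *(int *)arg = tsd_validate;
    return NULL;
}

/* Returns 1 if the worker saw its own setting and the caller's stayed 0. */
int tsd_flags_are_private(void) {
    pthread_t t;
    int seen_by_worker = 0;

    tsd_validate = 0;
    if (pthread_create(&t, NULL, tsd_worker, &seen_by_worker) != 0)
        return 0;
    pthread_join(t, NULL);
    return seen_by_worker == 1 && tsd_validate == 0;
}
```

The sketch also shows the downside mentioned above: a setting made in one thread is invisible to any other thread that later handles the same document.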

Thirdly, if we do decide that backwards compatibility is not an issue, then there are many measures that can be taken to improve the library: new re-entrant APIs for setting state using user-supplied storage, removal of all global state, etc. However, I don't believe this is a realistic solution, as it will cause major compatibility issues for the existing user base. Let's keep discussing this, but I believe that backwards compatibility is right at the top of the design goals for any change to the library.

Fourthly, you are near the top of my list of people we should help (well, to be more precise, not hinder). I'm not going to provide any thread support to help you, but I certainly don't want to break any of your existing code. When the library is built for your platform, the thread-specific code will be excluded from the compile and you will get a single-threaded libxml2 that you can use as you choose. Key to this approach working is the adoption of TSD and the limited intrusiveness on libxml2 which I advocate.
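
One way such a build-time exclusion could look is sketched below. The switch CFG_HAVE_THREADS and the cfg_* names are assumptions made up for this illustration; nothing here is taken from the libxml2 build system. Callers only ever use the two accessor functions, so they compile unchanged in either configuration.

```c
#include <assert.h>

/* Hypothetical stand-in for the library's configuration flags. */
typedef struct {
    int validate;
} cfg_state;

#ifdef CFG_HAVE_THREADS   /* hypothetical switch set by the build */
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t cfg_key;
static pthread_once_t cfg_once = PTHREAD_ONCE_INIT;

static void cfg_make_key(void) {
    pthread_key_create(&cfg_key, free);
}

/* Threaded build: each thread gets its own zero-initialised state. */
static cfg_state *cfg_get_state(void) {
    cfg_state *s;
    pthread_once(&cfg_once, cfg_make_key);
    s = pthread_getspecific(cfg_key);
    if (s == NULL) {
        s = calloc(1, sizeof(*s));
        if (s == NULL)
            abort();
        pthread_setspecific(cfg_key, s);
    }
    return s;
}
#else
/* Non-threaded build: one plain static structure, no thread code at all. */
static cfg_state cfg_single_state;

static cfg_state *cfg_get_state(void) {
    return &cfg_single_state;
}
#endif

/* The public accessors are identical in both builds. */
void cfg_set_validate(int on) {
    assert(on == 0 || on == 1);
    cfg_get_state()->validate = on;
}

int cfg_get_validate(void) {
    return cfg_get_state()->validate;
}
```

Compiled without the switch, the file does not reference pthreads at all, which is the property a platform with no thread support needs.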

Finally, this is an interesting question, since it reveals the difference between making a library thread-safe and having a thread-safe convention for using the library. I believe that leaving the library unmodified will result in a thread-safe convention for using libxml2, whereas I would like to make the library itself thread-safe. Taking this approach means that you must trust your clients to behave properly; they mustn't try to manipulate global state whilst a parse is in progress, for instance. I don't think that this will result in a library that can be used effectively by multi-threaded clients; in fact, I have already considered this option at some length and discarded it before making this proposal. One very good reason for discarding this approach becomes evident when you consider what would happen if you tried to link your application with another library which also used libxml2, but directly, without following your synchronisation conventions or using your synchronisation wrapper layer. I think the answer to the question I posed is a clear no.

Okay, I hope that has answered many people's questions and I look forward to hearing more feedback.


Gary

--
Gary Pennington
Solaris Kernel Development,
Sun Microsystems
Gary Pennington sun com

Follow-Ups:
- Re: [xml] libxml2 thread safety (SUMMARY)
  - From: Leo Davidson
- Re: [xml] libxml2 thread safety (SUMMARY)
  - From: Daniel Veillard

References:
- [xml] libxml2 thread safety
  - From: Gary Pennington
