Re: [xml] libxml2 thread safety (SUMMARY)


There has been a lot of debate since yesterday. Rather than fragmenting my thoughts across many posts, I'd like to address the points that were raised in this one mail. Many of them are good points, and I'll try to summarise them here:

1. Fine-grained locking is bad.
2. It's possible to extend libxml2 in a backwards-compatible fashion so that it is thread-safe.
3. It's possible to extend libxml2 in an incompatible fashion so that it is thread-safe.
4. I don't use pthreads, I use <other threads> or my platform has no threads. What about me?
5. Can we make libxml2 thread-safe without modifying the library itself?

Before I move on to discuss what I think about the above, let me state clearly which kinds of libxml2 consumers I'm trying to help with this proposal.

a) Thread-safe libraries/ Multi-threaded applications
The library/application you have written is thread-safe/multi-threaded and manipulates many documents, and you would like to support simultaneous parsing in multiple threads with different parse options. For instance, some threads want validation whilst parsing and other threads don't. This isn't possible in libxml2 today, since a lot of the configuration flags are represented as global state. You don't want a "better" libxml2 which will make your application/library scale; you just want it to support your application/library in a thread-safe fashion so that your code can scale.

b) Thread-unsafe libraries/Single-threaded applications
You have single-threaded code that works fine with libxml2. You don't want to change your code, and you don't want to be left behind when libxml2 changes to become multi-threaded. Maybe you don't have access to your source, so you can't change your code, and backwards compatibility is a big deal.

In addition to the above, I'm also aiming to help more sophisticated applications that use lower level APIs to manipulate the parser context directly, but that's not the main goal of this proposal.

There is a category of applications that I'm NOT trying to help, specifically:

You have a multi-threaded application and you want to be able to parse documents faster. You wish that libxml2 was actively threaded and would parse in parallel using all the processors available on your machine.

To be completely clear, I don't want to make libxml2 "active". It won't have its own threads and it won't try to parse in parallel on your behalf or automatically adjust its behaviour to match its execution environment. That kind of stuff is hard to achieve from scratch and would be very hard to achieve on the existing libxml2 code base in a cross-platform fashion. It's way outside the scope of my proposal.

Okay, now I'll try to discuss the points above.

Firstly, I agree that fine-grained locking can be a real performance hog, and I should state up front that I have no plans to introduce fine-grained locking into libxml2. I want to introduce parallelism at the "right" level, i.e. the level which I think makes most sense for most applications. I propose to introduce at least one and maybe two locking levels: the parser context and the library. I believe that locking the parser context will open up opportunities for applications to co-operate on document parsing without adding too much complexity. All methods for document parsing require an xmlParserCtxt, and it's a "natural" place to synchronize. If necessary, I will also provide a lock at the library level, although I hope to avoid requiring this by using TSD (thread-specific data).
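A minimal sketch of what locking at the parser-context level could look like with pthreads. The locked_ctxt type and every function name below are hypothetical stand-ins, not libxml2 API; the real xmlParserCtxt carries no lock, and the counter merely stands in for parser state.

```c
#include <pthread.h>

/* Hypothetical stand-in for a parser context that carries its own lock.
 * The real xmlParserCtxt has no such field; this only illustrates the
 * level at which the lock would sit. */
typedef struct {
    pthread_mutex_t lock;
    long chunks_fed;            /* stand-in for the parser's internal state */
} locked_ctxt;

static void ctxt_init(locked_ctxt *c)
{
    pthread_mutex_init(&c->lock, NULL);
    c->chunks_fed = 0;
}

/* One coarse lock per call into the context, not one per node or buffer. */
static void ctxt_feed(locked_ctxt *c)
{
    pthread_mutex_lock(&c->lock);
    c->chunks_fed++;
    pthread_mutex_unlock(&c->lock);
}

static void *feeder(void *arg)
{
    int i;
    for (i = 0; i < 10000; i++)
        ctxt_feed((locked_ctxt *)arg);
    return NULL;
}

/* Four threads share one context; returns the number of chunks it saw. */
static long demo_ctxt_lock(void)
{
    locked_ctxt c;
    pthread_t t[4];
    int i, started = 0;
    long total;

    ctxt_init(&c);
    for (i = 0; i < 4; i++)
        if (pthread_create(&t[started], NULL, feeder, &c) == 0)
            started++;
    for (i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    total = c.chunks_fed;
    pthread_mutex_destroy(&c.lock);
    return total;
}
```

With all four threads started, demo_ctxt_lock() returns 40000: no update is lost, and the only lock taken is the one attached to the context.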

Secondly, I believe that it is possible to make libxml2 thread-safe and backwards compatible. Key to this belief is the use of TSD (as I described in my code samples yesterday). All global state flags will be moved into a TSD structure, which means that each thread will have its own view of the libxml2 configuration. The big plus of this approach is that multiple threads get private behaviour out of libxml2 without requiring any re-writing of user code. I'll repeat that, since its importance is near the top of my list and no-one else seemed to be bothered about this: NO re-writing of code for existing single-threaded libxml2 consumers. The big negative of this approach is that you need to be very careful about behaviour when passing documents around between threads, since each thread will have its own view of the libxml2 configuration flags. I believe this is a small price to pay for backwards compatibility, but it should be considered.
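A rough sketch of the TSD idea using POSIX thread-specific data. This is not the code from yesterday's samples; the structure, the function names and the doValidityChecking macro are all hypothetical, and the two fields only stand in for global flags such as xmlDoValidityCheckingDefaultValue.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread copy of flags that are currently global. */
typedef struct {
    int do_validity_checking;
    int keep_blanks;
} xml_thread_state;

static pthread_key_t  state_key;
static pthread_once_t state_once = PTHREAD_ONCE_INIT;

/* Create the key once; free() releases a thread's copy when it exits. */
static void state_key_init(void)
{
    pthread_key_create(&state_key, free);
}

/* Return the calling thread's state, creating it with defaults on first use. */
static xml_thread_state *get_thread_state(void)
{
    xml_thread_state *s;

    pthread_once(&state_once, state_key_init);
    s = pthread_getspecific(state_key);
    if (s == NULL) {
        s = calloc(1, sizeof(*s));
        if (s == NULL)
            abort();            /* keeps the sketch short */
        s->keep_blanks = 1;     /* assumed default, for illustration only */
        pthread_setspecific(state_key, s);
    }
    return s;
}

/* Existing user code keeps writing "flag = value"; the macro redirects
 * the access to the calling thread's private copy. */
#define doValidityChecking (get_thread_state()->do_validity_checking)

static void *worker(void *arg)
{
    (void)arg;
    doValidityChecking = 1;     /* changes this thread's view only */
    return NULL;
}

/* Returns the caller's flag after another thread has set its own to 1. */
static int demo_tsd(void)
{
    pthread_t t;

    doValidityChecking = 0;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    pthread_join(t, NULL);
    return doValidityChecking;
}
```

demo_tsd() returns 0: the worker's assignment never reaches the calling thread's copy. That is also exactly the hazard when a document is handed from one thread to another, because the receiving thread sees its own flags rather than the sender's.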

Thirdly, if we do decide that backwards compatibility is not an issue, then there are many measures that can be taken to improve the library: new re-entrant APIs for setting state using user-supplied storage, removal of all global state, etc. However, I don't believe this is a realistic solution, as it will cause major compatibility issues for the existing user base. Let's keep discussing this, but I believe that backwards compatibility is right at the top of the design goals for any change to the library.

Fourthly, if you don't use pthreads or your platform has no threads, you are near the top of my list of people we should help (well, to be more precise, not hinder). I'm not going to provide any thread support to help you, but I certainly don't want to break any of your existing code. When the library is built for your platform, the thread-specific code will be excluded from the compile and you will get a single-threaded libxml2 that you can use as you choose. Key to this approach working is the adoption of TSD and the limited intrusiveness on libxml2 which I advocate.
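One way the thread-specific code might be excluded at build time, as a sketch only. The SKETCH_WITH_THREADS macro, the xml_state type and the accessor names are all hypothetical; nothing here is taken from the libxml2 build.

```c
#include <stddef.h>

/* Hypothetical holder for what are global flags today. */
typedef struct {
    int do_validity_checking;
} xml_state;

#ifdef SKETCH_WITH_THREADS      /* hypothetical build-time switch */
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t  key;
static pthread_once_t once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    pthread_key_create(&key, free);
}

/* Threaded build: each thread gets its own copy, created on first use. */
static xml_state *get_state(void)
{
    xml_state *s;

    pthread_once(&once, make_key);
    s = pthread_getspecific(key);
    if (s == NULL) {
        s = calloc(1, sizeof(*s));
        if (s == NULL)
            abort();
        pthread_setspecific(key, s);
    }
    return s;
}
#else
/* Non-threaded build: one plain static, no thread library needed at all. */
static xml_state single_state;

static xml_state *get_state(void)
{
    return &single_state;
}
#endif

/* The rest of the library only ever goes through get_state(). */
static void set_validation(int on)
{
    get_state()->do_validity_checking = on;
}

static int get_validation(void)
{
    return get_state()->do_validity_checking;
}
```

Built without the switch, the code has no dependency on any thread library and behaves like the global-variable version; the callers of set_validation() and get_validation() are identical in both builds.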

Finally, the fifth point is an interesting question, since it reveals the difference between making a library thread-safe and having a thread-safe convention for using the library. I believe that leaving the library unmodified can only result in a thread-safe convention for using libxml2, whereas I would like to make the library itself thread-safe. Relying on a convention means that you must trust your clients to behave properly; for instance, they mustn't try to manipulate global state whilst a parse is in progress. I don't think that this will result in a library that can be used effectively by multi-threaded clients. In fact, I have already considered this option at some length and discarded it before making this proposal. One very good reason for discarding this approach becomes evident when you consider what would happen if you tried to link your application with another library which also used libxml2 directly, without following your synchronisation conventions or using your synchronisation wrapper layer. I think the answer to the question is a clear no.
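The weakness of a wrapper convention can be sketched as follows. All names are hypothetical, the global merely stands in for a libxml2 configuration flag, and the interleaving with the other library is simulated by a callback rather than a real second thread so that the outcome is deterministic.

```c
#include <pthread.h>
#include <stddef.h>

static int validate_flag;       /* stand-in for a libxml2 global flag */
static pthread_mutex_t wrapper_lock = PTHREAD_MUTEX_INITIALIZER;

/* Another library in the same process. It uses the global directly and
 * knows nothing about wrapper_lock. */
static void other_library_parse(void)
{
    validate_flag = 0;
}

/* Your wrapper: take the lock, ask for validation, then "parse".
 * The callback stands in for whatever else runs at that moment.
 * Returns the flag value the parse would actually have seen. */
static int wrapped_parse_with_validation(void (*meanwhile)(void))
{
    int seen;

    pthread_mutex_lock(&wrapper_lock);
    validate_flag = 1;
    if (meanwhile != NULL)
        meanwhile();            /* simulated concurrent activity */
    seen = validate_flag;
    pthread_mutex_unlock(&wrapper_lock);
    return seen;
}
```

wrapped_parse_with_validation(NULL) returns 1, but wrapped_parse_with_validation(other_library_parse) returns 0: holding the wrapper's lock does nothing to stop code that never takes it.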

Okay, I hope that has answered many people's questions, and I look forward to hearing more feedback.


Gary Pennington
Solaris Kernel Development,
Sun Microsystems
Gary Pennington sun com
