Re: [xml] libxml2 thread safety (SUMMARY)

Daniel Veillard wrote:

On Wed, Oct 03, 2001 at 09:56:20AM +0100, Gary Pennington wrote:

Secondly, I believe that it is possible to make libxml2 thread-safe and backwards compatible. Key to this belief is the use of TSD (as I described in my code samples yesterday). All global state flags will be moved into a TSD structure, which means that each thread will have its own view of the libxml2 configuration. The big plus of this approach is that multiple threads get private behaviour out of libxml2 without requiring any re-writing of user code. I'll repeat that, since its importance is near the top of my list and no-one else seemed to be bothered about this. NO re-writing of code for existing single-threaded libxml2 consumers. The big negative of this approach is that you need to be very careful about behaviour when passing documents around between threads, since each thread will have its own view of the libxml2 configuration flags. I believe this is a small price to pay for backwards compatibility, but it should be considered.
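
For readers who have not seen the earlier samples, here is a minimal sketch of the TSD idea using POSIX thread-specific data. It is not the code posted to the list: the structure, its two flags, the default value and every demo_* name are hypothetical, chosen only to illustrate each thread getting its own copy of the settings.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical container for settings that are plain globals today. */
typedef struct {
    int keep_blanks;   /* illustrative flag, not a real libxml2 field */
    int validate;      /* illustrative flag, not a real libxml2 field */
} demo_state;

static pthread_key_t  state_key;
static pthread_once_t state_once = PTHREAD_ONCE_INIT;

/* Destructor: runs at thread exit for threads that created a state. */
static void state_free(void *p) { free(p); }

static void state_key_init(void)
{
    int rc = pthread_key_create(&state_key, state_free);
    assert(rc == 0);
    (void)rc;
}

/* Return the calling thread's private state, creating it on first use. */
demo_state *demo_get_state(void)
{
    demo_state *st;

    pthread_once(&state_once, state_key_init);
    st = pthread_getspecific(state_key);
    if (st == NULL) {
        st = calloc(1, sizeof(*st));
        if (st == NULL)
            return NULL;
        st->keep_blanks = 1;   /* assumed default, for illustration */
        if (pthread_setspecific(state_key, st) != 0) {
            free(st);
            return NULL;
        }
    }
    return st;
}

/* Thread body: change a flag in this thread's copy only. */
static void *demo_worker(void *arg)
{
    demo_state *st = demo_get_state();
    (void)arg;
    if (st != NULL)
        st->keep_blanks = 0;
    return NULL;
}

/* Returns 1 if a change made in another thread is not visible here. */
int demo_threads_are_isolated(void)
{
    pthread_t t;
    demo_state *st = demo_get_state();

    if (st == NULL || pthread_create(&t, NULL, demo_worker, NULL) != 0)
        return 0;
    pthread_join(t, NULL);
    return st->keep_blanks == 1;
}
```

The same isolation is what makes passing documents between threads delicate: the receiving thread consults its own copy of the flags, not the copy that was in force when the document was built.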

 I have a small concern around the "recompilation for threaded usage"
point. I think it is possible to make the changes in a backward compatible
way, and allow the same shared libraries to serve both kinds of users.

I'm not sure what the quoted statement refers to... I don't want anyone to recompile any code. It should be possible to supply a single libxml2 library which is either threaded or not depending on the environment it was compiled under.

Thirdly, If we do decide that backwards compatibility is not an issue.

 Well, clearly it is an issue. I can't start forking code now, I'm
not ready to work on libxml3 yet, libxml2 is expected to stabilize for the Gnome-2.0 release. And if (as I believe) we can provide a binary
compatible version with thread support then this should be pursued.


Then there are many measures that can be taken to improve the library. New re-entrant APIs for setting state using user supplied storage, remove all global state, etc... However, I don't believe this is a realistic solution as it will cause major compatibility issues for the existing user base. Let's keep discussing this, but I believe that backwards compatibility is right at the top of the design goals for any change to the library.

 yes, definitely.

Fourthly, you are near the top of my list of people we should help (well, to be more precise - not hinder). I'm not going to provide any thread support to help you, but I certainly don't want to break any of your existing code. When the library is built for your platform, the thread specific code will be excluded from the compile and you will get a single threaded libxml2 that you can use as you choose. Key to this approach working is the adoption of TSD and the limited intrusiveness on libxml2 which I advocate.

 I still think we can come up with a single mode solution.

I'm not sure what single mode means. Can you clarify for me? Does it mean a single library which works on all platforms? If so, I agree but there will be conditional compilation of code depending on the availability of the appropriate thread library support.

Finally, this is an interesting question since it reveals the difference between making a library thread-safe and having a thread-safe convention for using the library. I believe that this approach will result in a thread-safe convention for using libxml2; whereas I would like to make the library thread-safe. Taking this approach means that you must trust your clients to behave properly, they mustn't try to manipulate global state whilst a parse is in progress for instance. I don't think that this will result in a library that can be used effectively by multi-threaded clients, in fact I have already considered this option at some length and discarded it before making this proposal. One very good reason for discarding this approach becomes evident when you consider what would happen if you tried to link your application with another library which also used libxml2; but directly - not following your synchronisation conventions or using your synchronisation wrapper layer. I think the answer to the question I posed is a clear no.

 One minute...
If the extra library hasn't been recompiled you can't guarantee any thread
safety anyway, right ? If it had been recompiled and there is only one
mode of access (i.e. removing the threaded vs. non-threaded way of getting
the information) then I don't see where the problem would come from.

You can. There is no requirement to recompile any code, simply use a thread safe libxml2.

My personal inclination as I explained yesterday would be:
  - to keep all global settings in a new structure

That is what I am proposing. It's a single structure and is currently aligned per-thread, but a better alignment may be to partition the data, with some global data aligned with the parser context and some kept per thread (if required by global state that can't find a home on a parser context). That would better enable threads to co-operate and may even be a completely seamless, backwards compatible solution.
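
One way to picture that partitioning, as a sketch only: the parser context takes a snapshot of the creating thread's defaults, so the settings travel with the context rather than with whichever thread touches it later. All type and function names here are hypothetical, and _Thread_local is used for brevity in place of a TSD key.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical option set; not real libxml2 fields. */
typedef struct {
    int validate;
    int keep_blanks;
} demo_options;

/* Per-thread defaults: the part that stays with the thread. */
static _Thread_local demo_options demo_thread_defaults = { 0, 1 };

/* Hypothetical parser context carrying its own copy of the options. */
typedef struct {
    demo_options options;
} demo_parser_ctxt;

demo_options *demo_thread_options(void) { return &demo_thread_defaults; }

/* Snapshot the creating thread's defaults into the new context. */
demo_parser_ctxt *demo_new_ctxt(void)
{
    demo_parser_ctxt *ctxt = malloc(sizeof(*ctxt));
    if (ctxt == NULL)
        return NULL;
    ctxt->options = *demo_thread_options();
    return ctxt;
}

void demo_free_ctxt(demo_parser_ctxt *ctxt) { free(ctxt); }

/* Returns 1 if later changes to the thread defaults leave an existing
 * context untouched. */
int demo_ctxt_keeps_snapshot(void)
{
    demo_parser_ctxt *ctxt;
    int ok;

    demo_thread_options()->validate = 1;
    ctxt = demo_new_ctxt();
    if (ctxt == NULL)
        return 0;
    demo_thread_options()->validate = 0;   /* change after creation */
    assert(ctxt->options.keep_blanks == 1);
    ok = (ctxt->options.validate == 1);
    demo_free_ctxt(ctxt);
    return ok;
}
```

Under this arrangement a context handed to another thread keeps the options it was created with, which is the co-operation benefit described above.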

  - that structure is accessed using an overridable function returning
    a pointer to that structure

I'm not clear as to exactly what this means. Why would the function be overridable? Is this because we want to make it possible for users to maintain state in any fashion they choose? If so, I think that is not a good idea. What happens if multiple libraries all link to libxml2 and choose to use different methods to maintain global state? This solution is appropriate for single applications, but breaks when applied to libraries which use libxml2 and have no control over how libxml2 may be used from other consumers inside a process address space.

  - the old name for the global variable is turned into a macro
    dereferencing the pointer returned by that function (I think
    this will keep the read/write capability of that property).

It does. You can see how this works in the sample I sent out yesterday.
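
For anyone without yesterday's sample to hand, the macro trick can be sketched as follows. This is an illustration under assumptions, not the posted sample: demoKeepBlanks, demo_globals and demo_get_globals are invented names, and the accessor here returns a single static structure just to keep the sketch short.

```c
#include <assert.h>

/* Hypothetical structure holding what used to be separate globals. */
typedef struct {
    int keep_blanks;
    int validate;
} demo_globals;

static demo_globals demo_storage = { 1, 0 };

/* Accessor; a threaded build would return per-thread storage here. */
demo_globals *demo_get_globals(void) { return &demo_storage; }

/* The old variable name becomes a macro. Because it expands to a
 * member access through a pointer, it is still an lvalue, so existing
 * source can read it, assign to it and take its address. */
#define demoKeepBlanks (demo_get_globals()->keep_blanks)

/* Unmodified "legacy" caller: saves, changes and restores the flag. */
int demo_legacy_code(void)
{
    int old = demoKeepBlanks;      /* read  */

    demoKeepBlanks = 0;            /* write */
    assert(demoKeepBlanks == 0);
    demoKeepBlanks = old;          /* restore */
    return old;
}
```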

Do you foresee any significant problem with this approach ? Any recompiled
library would work in the same mode as the application. There is a single
atomic step to switch to a threaded mode which is to set the global variable
containing the pointer to the accessor function.
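
A sketch of the switch described in the paragraph above, with hypothetical names throughout: a global function pointer selects the accessor, and assigning it is the one step that changes mode. Plain pointer assignment is not formally atomic in C, so this sketch assumes the switch happens before any other thread is started.

```c
#include <assert.h>

/* Hypothetical settings structure. */
typedef struct {
    int validate;
} demo_globals;

typedef demo_globals *(*demo_accessor)(void);

/* Default mode: one structure shared by the whole process. */
static demo_globals demo_shared = { 0 };
static demo_globals *demo_shared_accessor(void) { return &demo_shared; }

/* Threaded mode: one structure per thread (C11 _Thread_local is used
 * for brevity; a TSD key would serve the same purpose). */
static _Thread_local demo_globals demo_private = { 0 };
static demo_globals *demo_thread_accessor(void) { return &demo_private; }

/* The overridable global pointer to the accessor function. */
demo_accessor demo_get_globals = demo_shared_accessor;

/* Old variable name, now routed through whichever accessor is set. */
#define demoValidate ((*demo_get_globals)()->validate)

/* The single step that switches the library to threaded mode. */
void demo_enable_threaded_mode(void)
{
    demo_get_globals = demo_thread_accessor;
}

/* Sets the flag in shared mode, switches, and returns what the same
 * macro reads afterwards (the untouched per-thread default). */
int demo_switch_example(void)
{
    demoValidate = 1;              /* lands in the shared structure */
    assert(demoValidate == 1);
    demo_enable_threaded_mode();
    return demoValidate;           /* now reads per-thread storage */
}
```

Note that a value set before the switch is not carried over, which hints at how different consumers in one process could end up looking at different state.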

The main problem is that it requires source code to explicitly be compiled to be thread-safe. In other words, the user will have to know that he is planning to use libxml2 in a threaded fashion. This would be extremely pernicious when people start to develop and distribute libraries which use this facility. How would a potential consumer of a library know what assumptions the distributed library had made about thread-safeness? Also, this would still be broken with respect to any existing libraries and would not support binary compatibility. Finally, it would be very broken if multiple libraries had to be linked against and they all required different behaviour from libxml2.

The solution I'm proposing is, I think, very similar to what you describe above - but there are some key differences. I think that we can get libxml2 to be thread-safe, backwards compatible and avoid re-compiling any existing code by careful use of TSD and placement of global state in internal data structures. I don't think that what you propose, user configurable global state storage, will work because of the reasons I outlined.



Gary Pennington
Solaris Kernel Development,
Sun Microsystems
Gary Pennington sun com
