Hello to all, I’m a new user of libxml and new to XML in
general. I’ve been asked to evaluate XML libraries, preferably Open
Source projects, for some things we want to do with XML in our products.
We provide an archival/retrieval system for medical records and images and we
use XML for attaching metadata to the files we store. We have some
front-end UI components that make some use of XML but currently most of the
work is done in the transport layer and the backend database components. Due
to the volume of data involved, efficiency and execution speed are prime
concerns, though not necessarily overriding ones. Most of the XML work
being done now is with roll-your-own string processing. Going forward we
will need to be more sophisticated and standards-compliant.

Of the packages that turned up when I did a search, Xerces
and libxml are the leading candidates. I’ve downloaded, installed,
built, and written test code for both and based on my findings, I’m
leaning very heavily toward recommending libxml. The person I report to
has a very strong bias toward Xerces in general, and the W3C DOM standard in
particular, as the hammer with which to pound all nails, even if the problem
isn’t a nail. I’ve also received feedback from some of the
users in the Xerces group and they make some points that I should at least
consider. What I’d like to do is present my reasons for
recommending libxml, given the job we need to do as described above, include some
of the Xerces users’ comments, and hopefully get your thoughts as well.
I like libxml because:

I’ve likened the use of big
packages like Xerces, for some of the things we need to do, to “using a blowtorch
to light a cigarette”. Here is one response from a Xerces user:

“Libxml
is a great library with somewhat different goals than Xerces. I don't think it's explicitly
stated on the Web site, but Xerces and other projects that build on it
tend to implement W3C standards (DOM, XML Schema), while libxml
implements what its maintainer prefers (a unique API, RelaxNG), with a focus
on efficiency. Both approaches are reasonable, and which is
appropriate depends on your needs. In your shoes, if I were
certain that lighting a cigarette is all I would ever need to do, I'd
probably use libxml. In my experience, though, XML is useful for so
many things that I'd probably want to be prepared to bake, boil,
weld, and power fighter jets as well - in a variety of local
languages. I'm a nut for portability, and a DOM interface has the advantage
of being similar or identical in a wide range of environments (C++,
C#, JavaScript, etc).”

What about this? Is Xerces
that much more powerful, as the writer suggests? Is portability the only
advantage to W3C-compliant interfaces like DOM? And then this: “In
cases where performance is critical, I think you'd be best off avoiding XPath altogether. (snip)
An optimal Xerces SAX parser might well be more efficient than libxml parsing + XPath
evaluation.” Finally: “One big difference
between Xerces-C++ and Libxml2 is that the latter does not have a functional
XML Schema validator. I don't know if it is important to you or not.
Also note that much of the speed-up of Libxml2 compared to
Xerces-C++ comes from the fact that Xerces-C++ uses 2-byte characters
(UTF-16) while Libxml2 uses 1-byte characters (UTF-8). Since most
performance tests that I am aware of are done on XML files that are either
ASCII or UTF-8, Libxml2 has a natural advantage here. This is also
something to consider depending on the type of applications you are
planning to build.” I’m unsure of the importance
of an XML Schema validator so I can’t comment on this. I don’t
think I agree with the comment about speed vis-à-vis UTF-8/16. Encoding
conversions using UTF-8 are more computationally intensive than UTF-16, so what
you lose by moving around double the number of bytes would, I think, be offset by
the greater CPU requirement for translating the data. Does Xerces’
use of UTF-16 provide support for a wider range of encodings and local
languages?

I know this is rather long, and I
apologize in advance if it is too long, but obviously there’s a lot to
be considered, this is a hefty decision, and I want to provide anybody who
might be inclined to help with as much to go on as possible. Thanks in
advance for any responses, -will
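
A footnote on the UTF-8/UTF-16 point above: the "double the number of bytes" part of the argument can be checked with a minimal sketch like the one below. It uses Python's standard string encoders purely for illustration (it is not libxml or Xerces code), the sample record and its element names are made up, and it only counts bytes. It says nothing about parse speed or the CPU cost of converting between encodings.

```python
# Hypothetical metadata record; the element names are invented for illustration.
SAMPLE = (
    '<record><patient id="12345">Doe, Jane</patient>'
    '<modality>MR</modality></record>'
)


def encoded_sizes(text):
    """Return the byte length of text in UTF-8 and in UTF-16 (without a BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


if __name__ == "__main__":
    # Pure ASCII markup: every character is 1 byte in UTF-8, 2 bytes in UTF-16.
    print("ASCII record:", encoded_sizes(SAMPLE))
    # CJK text from the Basic Multilingual Plane: 3 bytes per character in
    # UTF-8, 2 bytes per character in UTF-16, so the ratio flips.
    print("CJK text:    ", encoded_sizes("患者記録"))
```

For ASCII-only documents the UTF-16 form is exactly twice the size of the UTF-8 form, which matches the quoted comment. For text dominated by East Asian characters the UTF-8 form is the larger one, so which encoding moves fewer bytes depends on the data being stored.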