Re: [xml] libxml2 review in windows::developer
- From: Sean McGuire <stm particulate net>
- To: xml gnome org
- Subject: Re: [xml] libxml2 review in windows::developer
- Date: Fri, 11 Apr 2003 04:24:13 -0700
This is a surprising result... and I think it's, um... what's the
word... incorrect. :) I've been speed-testing Xerces and libxml for a
few days so I've been doing a lot of benchmarking. I've done tests on
parsing speed, speed of walking every node in the doc, speed of
duplicating the doc by hand (including only those parts I'd care about),
speed of cloning the doc via built-in copy methods, and speed of
serializing to disk... I've found libxml to be twice as fast as Xerces
for parsing, about the same for walking, between two and three times as
fast for copying, and slightly faster for writing for a highly
structured doc, and three times as fast at writing for a flatter doc. I
should do tests on building a doc by hand, from scratch, but I've been
too lazy and have just let the hand-copy serve as a proxy for that.
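
For anyone who wants to try the same four measurements (parse, walk, copy, serialize), here is a minimal sketch of such a harness. It is not the code behind the numbers above: it uses Python's standard-library xml.dom.minidom as a stand-in rather than libxml or Xerces, serializes to a string instead of to disk, and the function names walk and benchmark are made up for illustration.

```python
import time
import xml.dom.minidom


def walk(node):
    """Visit every node reachable from `node` and return how many were seen."""
    count = 0
    stack = [node]
    while stack:
        current = stack.pop()
        count += 1
        stack.extend(current.childNodes)  # attributes are not child nodes
    return count


def benchmark(xml_text):
    """Time parse, walk, deep copy and serialize for one document.

    Returns (timings in seconds, node count, serialized copy).
    """
    timings = {}

    start = time.perf_counter()
    doc = xml.dom.minidom.parseString(xml_text)
    timings["parse"] = time.perf_counter() - start

    start = time.perf_counter()
    nodes = walk(doc)
    timings["walk"] = time.perf_counter() - start

    start = time.perf_counter()
    copy = doc.cloneNode(True)  # built-in deep copy of the whole document
    timings["clone"] = time.perf_counter() - start

    start = time.perf_counter()
    serialized = copy.toxml()  # to a string here, not to disk
    timings["serialize"] = time.perf_counter() - start

    return timings, nodes, serialized


if __name__ == "__main__":
    sample = "<entries>" + '<entry year="2002" month="12" day="1" />' * 10000 + "</entries>"
    results, node_count, _ = benchmark(sample)
    for step, seconds in results.items():
        print("%-9s %.4f s (%d nodes)" % (step, seconds, node_count))
```

A single run like this is noisy; repeating each step several times and keeping the best time gives steadier numbers.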
Unfortunately I can't give many details about the testing environment
since I'm not allowed to say who I work for, lest my statements be
taken as "official" when they're just the ramblings of some developer,
or much about the test files. But I'll talk to my boss and see if I can
get them to loosen up a bit in the interest of spreading some truth around.
Of course, it's entirely possible that my use of Xerces is bone-brained
and that accounts for some of the difference... OTOH, my use of libxml
is based on a bit of time reading the docs, and so is my use of Xerces,
so at least, these numbers are a good example of what a skilled if
clueless developer will get. And frankly, even if my use of Xerces is
bone-brained, it's hard for me to imagine it is *so* bone-brained as to
account for as much difference as I've seen.
Anyway, I know that that's not too useful without actual code and sample
files to look at, but at least I thought it'd be nice to hear that some
people are seeing radically different numbers (which are more favorable
Peter Jacobi wrote:
In the April issue of the windows::developer magazine there was
a comparative review of five (?) XML parsers, including libxml2. Author is
An online version is at
but requires registration at the web site.
I think Matthew is a rather competent C++ guru, but this doesn't help much
with XML issues, so there is not much beef in the article. But for benefit or
amusement, I'll try to summarize the main points:
All testing was done using C++, with libxml2 used via libxml++. The other
parsers were MSXML, Xerces and XMLBooster. XMLBooster (www.xmlbooster.com)
is in fact a parser generator - given the class definition in a proprietary
XML format, it generates serializers and deserializers for this class.
The test data was rather 'flat', being 1000, 10000 and 100000 entries of
the same structure:
<entry year = "2003" month = "4" day = "1" who="Windows Developer
<entry year = "2002" month = "12" day = "1" who="C/C++ User's Journal" />
<entry year = "2001" month = "5" day = "1" who="Windows Developer
. . .
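
A file of this shape is easy to regenerate for re-running such a test. The sketch below is only an approximation of the article's data: the root element name, the random dates, the second "who" value and the function name make_flat_document are all assumptions, since the review's actual files are not shown here.

```python
import random

# Values for the `who` attribute. The first comes from the sample above;
# the second is a made-up placeholder.
PUBLICATIONS = ["C/C++ User's Journal", "Example Publication"]


def make_flat_document(n_entries, seed=0):
    """Return an XML string holding n_entries sibling <entry/> elements."""
    rng = random.Random(seed)  # fixed seed so repeated runs parse identical data
    lines = ["<entries>"]  # root element name is an assumption
    for _ in range(n_entries):
        lines.append(
            '<entry year="%d" month="%d" day="%d" who="%s" />'
            % (
                rng.randint(2000, 2003),
                rng.randint(1, 12),
                rng.randint(1, 28),
                rng.choice(PUBLICATIONS),
            )
        )
    lines.append("</entries>")
    return "\n".join(lines)


if __name__ == "__main__":
    # The three sizes mentioned in the summary above.
    for size in (1000, 10000, 100000):
        with open("entries_%d.xml" % size, "w") as handle:
            handle.write(make_flat_document(size))
```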
So parsing these files gives only two performance classes (with little intra-
'slow' (needs 3 times the 'fast' time)
So what are these numbers telling us:
1. Allocating the (DOM) tree needs time, and doing SAX or a specialized
parser is faster. libxml-SAX wasn't benchmarked.
2. Xerces being faster than libxml is a bit of a mystery, but given the XML above,
it may be the 'attribute cost'.
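
Point 1 can be checked with a small experiment: count the entries once by building a full tree and once with SAX callbacks, and time both. The sketch below uses Python's standard-library parsers (xml.dom.minidom and xml.sax) purely as a stand-in, so it says nothing about libxml2 or Xerces themselves; the helper names are made up for illustration.

```python
import time
import xml.dom.minidom
import xml.sax


class EntryCounter(xml.sax.ContentHandler):
    """SAX handler that counts <entry> elements without building a tree."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "entry":
            self.count += 1


def count_with_dom(xml_bytes):
    """Build the whole DOM tree, then count the <entry> elements in it."""
    dom = xml.dom.minidom.parseString(xml_bytes)
    try:
        return len(dom.getElementsByTagName("entry"))
    finally:
        dom.unlink()  # release the tree once the count has been taken


def count_with_sax(xml_bytes):
    """Stream through the document, counting <entry> start tags."""
    handler = EntryCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.count


def time_call(func, arg):
    """Return (result, elapsed seconds) for a single call of func(arg)."""
    start = time.perf_counter()
    result = func(arg)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    entry = b'<entry year="2002" month="12" day="1" who="C/C++ User\'s Journal" />'
    data = b"<entries>" + entry * 10000 + b"</entries>"
    for label, func in (("DOM", count_with_dom), ("SAX", count_with_sax)):
        n, seconds = time_call(func, data)
        print("%s: %d entries in %.3f s" % (label, n, seconds))
```

Both functions return the same count, so any difference in the timings comes from how the document is processed rather than from the work done with the result.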
Matthew's comments on ease of use favor XMLBooster, but I find this rather
pointless, as XMLBooster offers a layer above pure XML parsing, which could
similarly be added on top of any of the parsers.