Re: [xml] libxml2 review in windows::developer



This is a surprising result... and I think it's, um... what's the word... incorrect. :) I've been speed-testing Xerces and libxml for a few days, so I've done a lot of benchmarking: parsing speed, speed of walking every node in the doc, speed of duplicating the doc by hand (including only those parts I'd care about), speed of cloning the doc via the built-in copy methods, and speed of serializing to disk. I've found libxml to be twice as fast as Xerces for parsing, about the same for walking, between two and three times as fast for copying, slightly faster for writing a highly structured doc, and three times as fast for writing a flatter doc. I should also test building a doc by hand, from scratch, but I've been too lazy and have just let the hand-copy serve as a proxy for that.
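For anyone who wants to run the same kind of comparison, here is a minimal sketch of the four timed operations (parse, walk, clone, serialize). It uses Python's standard-library minidom purely as a stand-in; it is not libxml or Xerces, and the helper names (time_it, count_nodes, run_benchmarks) are made up for illustration. Only the shape of the harness carries over to other parsers.

```python
import io
import time
import xml.dom.minidom as minidom


def time_it(label, fn, repeat=5):
    """Run fn() `repeat` times and return (label, best wall-clock seconds)."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return label, best


def count_nodes(node):
    """Walk every node in the subtree rooted at `node` and count them.

    Attributes are not child nodes in the DOM, so they are not counted.
    """
    total = 1
    for child in node.childNodes:
        total += count_nodes(child)
    return total


def run_benchmarks(xml_text):
    """Time parse, walk, deep clone and serialize for one document."""
    doc = minidom.parseString(xml_text)
    return [
        time_it("parse", lambda: minidom.parseString(xml_text)),
        time_it("walk", lambda: count_nodes(doc)),
        time_it("clone", lambda: doc.cloneNode(True)),
        time_it("serialize", lambda: doc.writexml(io.StringIO())),
    ]
```

Taking the best of several runs rather than the mean is a design choice meant to reduce noise from other processes; serializing to an in-memory buffer keeps disk speed out of the picture, which differs from the to-disk test described above.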

Unfortunately I can't give many details about the testing environment or the test files, since I'm not allowed to say who I work for, lest my statements be taken as "official" when they're just the ramblings of some developer. But I'll talk to my boss and see if I can get them to loosen up a bit in the interest of spreading some truth around.

Of course, it's entirely possible that my use of Xerces is bone-brained and that accounts for some of the difference... OTOH, my use of libxml is based on a bit of time reading the docs, and so is my use of Xerces, so at least, these numbers are a good example of what a skilled if clueless developer will get. And frankly, even if my use of Xerces is bone-brained, it's hard for me to imagine it is *so* bone-brained as to account for as much difference as I've seen.

Anyway, I know that's not too useful without actual code and sample files to look at, but I thought it'd at least be nice to hear that some people are seeing radically different numbers (which are more favorable to libxml2).

Peter Jacobi wrote:

Hi!

In the April issue of the windows::developer magazine there was
a comparative review of five (?) XML parsers, including libxml2. Author is Matthew Wilson. An online version is at http://www.windevnet.com/documents/s=7868/win0304a/0304a.htm,
but requires registration at the web site.

I think Matthew is a rather competent C++ guru, but this doesn't help much with XML issues, so there is not much beef in the article. But for your benefit or amusement, I'll try to summarize the main points:

All testing was done using C++, with libxml2 used via libxml++. The other parsers were MSXML, Xerces and XMLBooster. XMLBooster (www.xmlbooster.com) is in fact a parser generator: given the class definition in a proprietary XML format, it generates serializers and deserializers for this class.

The test data was rather 'flat', being 1000, 10000 and 100000 entries of the same structure:
<agenda>
<entry year = "2003" month = "4" day = "1" who="Windows Developer Magazine" />
<entry year = "2002" month = "12" day = "1" who="C/C++ User's Journal" />
<entry year = "2001" month = "5" day = "1" who="Windows Developer Magazine" />
. . .
</agenda>
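To reproduce files of this shape, something like the following sketch would do. The function name make_agenda, the year range and the random choice of publication are my assumptions; the article's actual data generator is not known.

```python
import random
from xml.sax.saxutils import quoteattr

# The two publication names that appear in the sample above.
PUBLICATIONS = ["Windows Developer Magazine", "C/C++ User's Journal"]


def make_agenda(n_entries, seed=0):
    """Build a flat <agenda> document with n_entries <entry/> elements."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same file
    lines = ["<agenda>"]
    for _ in range(n_entries):
        # quoteattr adds the surrounding quotes and escapes as needed
        who = quoteattr(rng.choice(PUBLICATIONS))
        lines.append(
            f'<entry year="{rng.randint(2000, 2003)}" '
            f'month="{rng.randint(1, 12)}" day="1" who={who} />'
        )
    lines.append("</agenda>")
    return "\n".join(lines)
```

Calling make_agenda(1000), make_agenda(10000) and make_agenda(100000) gives the three sizes mentioned above.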

So parsing these files gives only two performance classes (with little
intra-class differentiation):

'fast'
MSXML-SAX
Xerces
XMLBooster

'slow' (needs 3 times the 'fast' time)
MSXML-DOM
libxml2-tree

So what do these numbers tell us?
1. Allocating the (DOM) tree takes time, and doing SAX or using a specialized parser is faster. libxml-SAX wasn't benchmarked.
2. Xerces being faster than libxml is a bit of a mystery, but given the XML above, it may be the 'attribute cost'.
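The tree-versus-SAX point can be illustrated with a small sketch. It uses Python's standard-library xml.sax and minidom as stand-ins, not any of the benchmarked parsers, and the names EntryCounter, count_with_sax and count_with_tree are hypothetical. Both functions extract the same counts, but only the second one allocates a node object for every element and attribute.

```python
import xml.sax
import xml.dom.minidom as minidom


class EntryCounter(xml.sax.ContentHandler):
    """SAX handler that counts <entry> elements and their attributes."""

    def __init__(self):
        super().__init__()
        self.entries = 0
        self.attributes = 0

    def startElement(self, name, attrs):
        if name == "entry":
            self.entries += 1
            self.attributes += len(attrs)


def count_with_sax(xml_text):
    """Stream through the document; no tree is kept in memory."""
    handler = EntryCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.entries, handler.attributes


def count_with_tree(xml_text):
    """Build the full DOM tree first, then query it."""
    doc = minidom.parseString(xml_text)
    entries = doc.getElementsByTagName("entry")
    return len(entries), sum(len(e.attributes) for e in entries)
```

Timing the two functions on the same flat file (for example with time.perf_counter) shows how much of the cost is tree allocation, and varying the number of attributes per entry is one way to probe the 'attribute cost' guess.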

Matthew's comments on ease of use favor XMLBooster, but I find this rather pointless, as XMLBooster offers a layer above pure XML parsing, and a similar layer can be added on top of any parser.

Regards,
Peter Jacobi


_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml gnome org
http://mail.gnome.org/mailman/listinfo/xml






