[xml] Start of changes in the XML (and HTML) parsers



  I just committed my first set of changes to the parsing code. Those
aren't yet SAX evolution code but an attempt at minimizing the number
of memory allocations done by the parser internals (even before the
SAX layer is called). In a nutshell, I added a dictionary module whose
only goal is to keep a single instance of all tag strings manipulated
by the parser.
Pros:
   - reduces the number of calls to malloc() and free(); this is
     especially dramatic if you use just the SAX layer
   - reduces the working set of data, improving cache performance
   - should improve threaded execution, since we replace global
     allocation routines with per-parser private data
Cons:
   - cost of the hash function
   - cost of the lookup in the dictionary hash table
   - tiny change in some APIs, but those are entry points that only
     a parser can use, like
     const xmlChar *
     xmlParseName(xmlParserCtxtPtr ctxt);
     i.e. functions working directly on the parser input data.
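
  To make the trade-off above concrete, here is a minimal sketch of a
string-interning dictionary. It is not the code that was committed: the
names (dict, dict_create, dict_lookup, dict_free), the fixed bucket
count and the hash function are all assumptions chosen for illustration.
It only shows the idea of keeping a single stored copy per distinct
name, so that malloc() runs once per distinct string instead of once
per occurrence.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DICT_BUCKETS 256   /* hypothetical fixed size; the sketch never resizes */

typedef struct dict_entry {
    struct dict_entry *next;   /* chain of entries sharing one bucket */
    size_t len;
    char *name;                /* the single stored copy of the string */
} dict_entry;

typedef struct {
    dict_entry *buckets[DICT_BUCKETS];
} dict;

/* Simple multiplicative hash (an assumption, not the committed one).
 * This is the "cost of the hash function" from the Cons list. */
static unsigned long dict_hash(const char *s, size_t len) {
    unsigned long h = 5381;
    for (size_t i = 0; i < len; i++)
        h = h * 33 + (unsigned char)s[i];
    return h;
}

dict *dict_create(void) {
    return calloc(1, sizeof(dict));
}

/* Return the unique stored copy of the first len bytes of s.
 * Memory is allocated only the first time a given string is seen. */
const char *dict_lookup(dict *d, const char *s, size_t len) {
    assert(d != NULL && s != NULL);
    dict_entry **slot = &d->buckets[dict_hash(s, len) % DICT_BUCKETS];
    for (dict_entry *e = *slot; e != NULL; e = e->next)
        if (e->len == len && memcmp(e->name, s, len) == 0)
            return e->name;            /* already known: no malloc() */
    dict_entry *e = malloc(sizeof(*e));
    if (e == NULL)
        return NULL;
    e->name = malloc(len + 1);
    if (e->name == NULL) {
        free(e);
        return NULL;
    }
    memcpy(e->name, s, len);
    e->name[len] = '\0';
    e->len = len;
    e->next = *slot;                   /* push on the front of the chain */
    *slot = e;
    return e->name;
}

/* Release every stored string along with the dictionary itself. */
void dict_free(dict *d) {
    if (d == NULL)
        return;
    for (size_t i = 0; i < DICT_BUCKETS; i++) {
        dict_entry *e = d->buckets[i];
        while (e != NULL) {
            dict_entry *next = e->next;
            free(e->name);
            free(e);
            e = next;
        }
    }
    free(d);
}
```

  In this sketch, looking up the same name twice returns the same
pointer, so two interned names can be compared by pointer equality, and
everything is released in one pass when the dictionary is freed. With
one such dictionary per parser, no state is shared between threads.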

  After playing a bit with the hash, the code I committed reduces the
total number of instructions and total number of memory accesses in
3-4 very distinct tests. The performance improvement is negligible
for single-threaded programs on recent NPTL-based Linux systems, but
when doing pure SAX parsing on a sequential, data-like instance on older
systems where malloc() and free() are more expensive, I have seen 20%
parsing speed improvements:

localhost:~/tmp/libxml2-2.5.10 -> time ./testSAX --quiet ~/XML/db1000000.xml
32000006 callbacks generated
 
 real    0m30.011s
 user    0m21.030s
 sys     0m2.360s
 localhost:~/tmp/libxml2-2.5.10 ->

localhost:~/XML ->  time ./testSAX --quiet ~/XML/db1000000.xml
32000006 callbacks generated
 
 real    0m24.462s
 user    0m15.720s
 sys     0m2.590s
localhost:~/XML -> ls -l ~/XML/db1000000.xml
-rw-r--r--    1 veillard www      202810040 Jul 18 13:36 /u/veillard/XML/db1000000.xml
localhost:~/XML ->

  That was on a Celeron 700 running Red Hat Linux Advanced Server 2.1AS.
With a smaller instance on a Duron 1.2GHz running Red Hat Linux 9:

paphio:~/tmp/libxml2-2.5.10 -> time ./testSAX --quiet ~/XML/db100000.xml
3200006 callbacks generated
 
 real    0m1.101s
 user    0m1.000s
 sys     0m0.070s
paphio:~/tmp/libxml2-2.5.10 ->
paphio:~/XML -> time ./testSAX --quiet ~/XML/db100000.xml
3200006 callbacks generated
 
 real    0m1.062s
 user    0m0.980s
 sys     0m0.080s
paphio:~/XML -> 
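
  As a quick check of the timings above, the relative gain can be
computed with a small helper. The function name percent_reduction is
hypothetical, and the reading assumes that in each pair the first run
(from the libxml2-2.5.10 directory) is the released parser and the
second run is the modified one. Under that assumption the Celeron
figures come out at roughly 18.5% less real time and about 25% less
user time, which brackets the 20% quoted, while the Duron run gains
only about 3.5% in real time.

```c
#include <assert.h>

/* Percentage by which `after` is smaller than `before`.
 * Hypothetical helper, used only to check the timings quoted above. */
double percent_reduction(double before, double after) {
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}
```

  For example, percent_reduction(30.011, 24.462) gives the real-time
gain on the Celeron, and percent_reduction(1.101, 1.062) the one on the
Duron.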

  On that second machine the difference is hardly noticeable. But this
opens the door to a number of other speedups, such as for the xmlReader
interface, while keeping

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


