[xml] performance of parsing docbook with xincludes
- From: Stefan Sauer <ensonic hora-obscura de>
- To: xml gnome org
- Subject: [xml] performance of parsing docbook with xincludes
- Date: Sun, 13 May 2018 20:54:53 +0200
hi,
I am the maintainer of gtk-doc. One of the biggest complaints I get is
about performance. gtk-doc scans sources and combines the extracted
comments with handwritten docbook into a single docbook document. The
docbook document uses xinclude for its parts. As a next step we have
been using the docbook-stylesheets to generate reference docs as html
(and dblatex for pdfs).
So far I blamed the xslt processing for the low performance, and for
about a quarter of a year I have been working on a (python) tool in
gtk-doc that reads the docbook with lxml (xml module that uses libxml2),
then walks the tree a few times and produces chunked html similar to the
docbook stylesheets. The tool is getting feature complete and is up to
10 times faster (despite python). One reason I believed xslt is slow is
that it is single threaded, but when I added multi-threading/processing
to my python tool I was puzzled that it did not get much faster. At this
point I added some benchmarking and found out that the biggest chunk of
time is spent on loading the xml.
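To illustrate the kind of benchmarking meant here, a minimal sketch that
times the plain parse and the xinclude expansion separately. This is not
the code from gtkdoc-mkhtml2: the helper name load_with_timing is made up
for illustration, and it uses Python's standard library (xml.etree,
Python 3.9 or newer for base_url) instead of lxml so that it runs without
extra packages. With lxml the corresponding calls would be
lxml.etree.parse() followed by tree.xinclude(). Note that the standard
library parser is expat-based, so its numbers say nothing about libxml2.

```python
import os
import time
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude


def load_with_timing(path):
    """Parse `path`, expand xi:include elements, and time both phases.

    Hypothetical helper for illustration. Returns (root, timings), where
    timings maps the phase name to elapsed seconds.
    """
    path = os.path.abspath(path)

    t0 = time.perf_counter()
    tree = ET.parse(path)  # parse only, xi:include elements stay as-is
    t1 = time.perf_counter()

    # Expand the includes in place. Passing base_url makes relative hrefs
    # resolve against the including file (needs Python 3.9 or newer).
    root = tree.getroot()
    ElementInclude.include(root, base_url=path)
    t2 = time.perf_counter()

    return root, {"parse": t1 - t0, "xinclude": t2 - t1}
```

Calling load_with_timing("glib-docs.xml") and printing the returned
dictionary would give a per-phase split comparable to what xmllint
--timing prints. Real gtk-doc input that relies on DTD entities may not
load with the standard library parser, in which case the lxml calls named
above are the ones to time.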
Let's look at some numbers using glib (https://gitlab.gnome.org/GNOME/glib):
cd glib/docs/reference/glib
xmllint --timing --xinclude --noout glib-docs.xml
Parsing took 0 ms
Xinclude processing took 4560 ms
Freeing took 91 ms
Any idea how I can get more of a breakdown of what's happening in
'Xinclude processing'?
Running with "perf record -g -- xmllint --timing --xinclude --noout
glib-docs.xml" gets me a report like this:
+  17.15%  16.69%  xmllint  libc-2.24.so      [.] _int_malloc
+  11.93%  11.87%  xmllint  libc-2.24.so      [.] malloc_consolidate
+   9.01%   8.97%  xmllint  libxml2.so.2.9.4  [.] xmlDictLookup
+   7.15%   0.00%  xmllint  ld-2.24.so        [.] 0xffff8021a0022010
+   6.25%   6.21%  xmllint  libxml2.so.2.9.4  [.] xmlHashAddEntry3
+   6.22%   0.00%  xmllint  libxml2.so.2.9.4  [.] xmlSAX2IsStandalone
+   6.22%   0.00%  xmllint  [unknown]         [.] 0x56413c74c0854810
+   3.95%   3.94%  xmllint  libxml2.so.2.9.4  [.] xmlHashLookup2
    3.72%   3.70%  xmllint  libc-2.24.so      [.] _int_free
+   3.28%   0.00%  xmllint  [unknown]         [.] 0000000000000000
+   3.06%   3.04%  xmllint  libxml2.so.2.9.4  [.] xmlFreeDocElementContent
+   2.96%   2.91%  xmllint  libc-2.24.so      [.] free
Trying a different allocator seems to help quite a bit too (xtime is an
alias for /usr/bin/time -f '%Uu %Ss %er %MkB %C' "$@"):
rm html-build.stamp; ~/bin/xtime make docs
53.28u 0.99s 54.70r 202372kB make docs
rm html-build.stamp; LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4.3.0 ~/bin/xtime make docs
42.48u 1.54s 44.48r 185404kB make docs
-> saves ~11 sec of user time when using the original toolchain (libxml2 +
libxslt with docbook-stylesheets)
~/bin/xtime python3 ~/projects/gnome/gtk-doc/gtkdoc-mkhtml2 glib glib-docs.xml
7.01u 0.25s 7.27r 146068kB python3 /home/ensonic/projects/gnome/gtk-doc/gtkdoc-mkhtml2 glib glib-docs.xml
LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4.3.0 ~/bin/xtime python3 ~/projects/gnome/gtk-doc/gtkdoc-mkhtml2 glib glib-docs.xml
5.69u 0.39s 6.10r 137340kB python3 /home/ensonic/projects/gnome/gtk-doc/gtkdoc-mkhtml2 glib glib-docs.xml
-> saves ~1.3 sec of user time with my new toolchain (mostly on the xml
loading side).
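For anyone who wants to repeat such a comparison from a script, here is a
rough Python sketch of what the xtime wrapper measures (Unix only, Python
3.9 or newer). The function name run_timed and the dictionary keys are
made up for illustration. The LD_PRELOAD path is the one used above and
will differ on other systems.

```python
import os
import subprocess
import time


def run_timed(cmd, extra_env=None):
    """Run `cmd` and report user/sys/real seconds plus peak RSS.

    Hypothetical stand-in for the xtime wrapper: `cmd` is an argument
    list, `extra_env` holds extra environment variables such as
    LD_PRELOAD.
    """
    env = dict(os.environ)
    env.update(extra_env or {})

    start = time.perf_counter()
    proc = subprocess.Popen(cmd, env=env)
    # wait4() reaps the child and returns the resource usage of exactly
    # that process.
    _, status, usage = os.wait4(proc.pid, 0)
    real = time.perf_counter() - start

    # The child was reaped behind Popen's back, so record the exit code
    # on the Popen object ourselves.
    proc.returncode = os.waitstatus_to_exitcode(status)

    return {
        "user": usage.ru_utime,
        "sys": usage.ru_stime,
        "real": real,
        "max_rss": usage.ru_maxrss,  # kB on Linux, bytes on macOS
        "exit_code": proc.returncode,
    }
```

A comparison would then call run_timed(["make", "docs"]) once as is and
once with extra_env={"LD_PRELOAD":
"/usr/lib/libtcmalloc_minimal.so.4.3.0"}, removing html-build.stamp
before each run as in the commands above.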
Any ideas? Is there a known issue with using xincludes here?
Stefan