Re: [xml] performance of parsing docbook with xincludes
- From: Stefan Sauer <ensonic hora-obscura de>
- To: Nick Wellnhofer <wellnhofer aevum de>, xml gnome org
- Subject: Re: [xml] performance of parsing docbook with xincludes
- Date: Mon, 14 May 2018 21:44:10 +0200
Seems that the list does not like attachements, so resensing with links
On 05/14/2018 12:19 PM, Nick Wellnhofer wrote:
On 13/05/2018 20:54, Stefan Sauer wrote:
Lets look at some numbers using glib
(https://gitlab.gnome.org/GNOME/glib)
cd glib/docs/reference/glib
xmllint --timing --xinclude --noout glib-docs.xml
Parsing took 0 ms
Xinclude processing took 4560 ms
Freeing took 91 ms
Any idea how I can get more breakdown of whats happening in 'Xinclude
processing'?
It seems that "XInclude processing" also contains the time needed to
parse the included documents, so maybe the XIncludes aren't the issue
at all (glib-docs.xml is a small document including several larger
ones). Can you save glib-docs.xml after processing XIncludes and check
whether parsing the consolidated document is considerably faster?
$ xmllint --timing --xinclude glib-docs.xml >glib-full.xml
Parsing took 0 ms
Xinclude processing took 4726 ms
Saving took 85 ms
Freeing took 104 ms
$ xmllint --timing --noout glib-full.xml
Parsing took 151 ms
Freeing took 14 ms
$ ls -alh glib-full.xml
-rw-r--r-- 1 ensonic users 6.8M May 14 12:42 glib-full.xml
Parsing the consolidated doc is a magnitude faster. Thanks for
suggesting this test.
$ xtime sh -c 'find . -name "*.xml" | grep -v version.xml | xargs xmllint --noout'
0.40u 0.04s 0.44r 70296kB sh -c find . -name "*.xml" | grep -v version.xml | xargs xmllint --noout
Even parsing all files like this is >10 times faster.
Running with "perf record -g -- xmllint --timing --xinclude --noout
glib-docs.xml" gets me such a report.
+ 17.15% 16.69% xmllint libc-2.24.so [.] _int_malloc
+ 11.93% 11.87% xmllint libc-2.24.so [.]
malloc_consolidate
+ 9.01% 8.97% xmllint libxml2.so.2.9.4 [.] xmlDictLookup
+ 7.15% 0.00% xmllint ld-2.24.so [.]
0xffff8021a0022010
+ 6.25% 6.21% xmllint libxml2.so.2.9.4 [.] xmlHashAddEntry3
+ 6.22% 0.00% xmllint libxml2.so.2.9.4 [.]
xmlSAX2IsStandalone
+ 6.22% 0.00% xmllint [unknown] [.]
0x56413c74c0854810
+ 3.95% 3.94% xmllint libxml2.so.2.9.4 [.] xmlHashLookup2
3.72% 3.70% xmllint libc-2.24.so [.] _int_free
+ 3.28% 0.00% xmllint [unknown] [.] 0000000000000000
+ 3.06% 3.04% xmllint libxml2.so.2.9.4 [.]
xmlFreeDocElementContent
+ 2.96% 2.91% xmllint libc-2.24.so [.] free
The callgraph based reports (perf report -g or -G) are usually more
helpful.
here is a full perf.log with callgraph.
https://gist.github.com/ensonic/a73608edd60a995374e0961d3840eb4f
Any ideas. Is there a know issues with using xincludes here?
It might be quadratic behavior in the XInclude engine or something
else entirely. How large is glib-docs.xml after processing XIncludes?
See above. Btw. only the toplevel doc is using xincludes, the included
docs don't include other docs:
$grep "xi:inc" glib-docs.xml | wc -l
116
Nick
[
Date Prev][
Date Next] [
Thread Prev][
Thread Next]
[
Thread Index]
[
Date Index]
[
Author Index]