Re: [xml] performance of parsing docbook with xincludes



Seems that the list does not like attachements, so resensing with links

On 05/14/2018 12:19 PM, Nick Wellnhofer wrote:
On 13/05/2018 20:54, Stefan Sauer wrote:
Lets look at some numbers using glib
(https://gitlab.gnome.org/GNOME/glib)

cd glib/docs/reference/glib
xmllint --timing --xinclude --noout glib-docs.xml
Parsing took 0 ms
Xinclude processing took 4560 ms
Freeing took 91 ms

Any idea how I can get more breakdown of whats happening in  'Xinclude
processing'?

It seems that "XInclude processing" also contains the time needed to
parse the included documents, so maybe the XIncludes aren't the issue
at all (glib-docs.xml is a small document including several larger
ones). Can you save glib-docs.xml after processing XIncludes and check
whether parsing the consolidated document is considerably faster?

$ xmllint --timing --xinclude glib-docs.xml >glib-full.xml

Parsing took 0 ms
Xinclude processing took 4726 ms
Saving took 85 ms
Freeing took 104 ms

$ xmllint --timing --noout glib-full.xml

Parsing took 151 ms
Freeing took 14 ms

$ ls -alh glib-full.xml

-rw-r--r-- 1 ensonic users 6.8M May 14 12:42 glib-full.xml

Parsing the consolidated doc is a magnitude faster. Thanks for
suggesting this test.

$ xtime sh -c 'find . -name "*.xml" | grep -v version.xml | xargs xmllint --noout'
0.40u 0.04s 0.44r 70296kB sh -c find . -name "*.xml" | grep -v version.xml | xargs xmllint --noout

Even parsing all files like this is >10 times faster.


Running with "perf record -g -- xmllint --timing --xinclude --noout
glib-docs.xml" gets me such a report.

+   17.15%    16.69%  xmllint  libc-2.24.so        [.] _int_malloc
+   11.93%    11.87%  xmllint  libc-2.24.so        [.]
malloc_consolidate
+    9.01%     8.97%  xmllint  libxml2.so.2.9.4    [.] xmlDictLookup
+    7.15%     0.00%  xmllint  ld-2.24.so          [.]
0xffff8021a0022010
+    6.25%     6.21%  xmllint  libxml2.so.2.9.4    [.] xmlHashAddEntry3
+    6.22%     0.00%  xmllint  libxml2.so.2.9.4    [.]
xmlSAX2IsStandalone
+    6.22%     0.00%  xmllint  [unknown]           [.]
0x56413c74c0854810
+    3.95%     3.94%  xmllint  libxml2.so.2.9.4    [.] xmlHashLookup2
      3.72%     3.70%  xmllint  libc-2.24.so        [.] _int_free
+    3.28%     0.00%  xmllint  [unknown]           [.] 0000000000000000
+    3.06%     3.04%  xmllint  libxml2.so.2.9.4    [.]
xmlFreeDocElementContent
+    2.96%     2.91%  xmllint  libc-2.24.so        [.] free

The callgraph based reports (perf report -g or -G) are usually more
helpful.
here is a full perf.log with callgraph.
https://gist.github.com/ensonic/a73608edd60a995374e0961d3840eb4f

Any ideas. Is there a know issues with using xincludes here?

It might be quadratic behavior in the XInclude engine or something
else entirely. How large is glib-docs.xml after processing XIncludes?

See above. Btw. only the toplevel doc is using xincludes, the included
docs don't include other docs:

$grep "xi:inc" glib-docs.xml | wc -l

116




Nick




[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]