Re: [xml] performance of parsing docbook with xincludes



On 05/14/2018 09:48 PM, Stefan Sauer wrote:
On 05/14/2018 12:19 PM, Nick Wellnhofer wrote:
On 13/05/2018 20:54, Stefan Sauer wrote:
Lets look at some numbers using glib (https://gitlab.gnome.org/GNOME/glib)

cd glib/docs/reference/glib
xmllint --timing --xinclude --noout glib-docs.xml
Parsing took 0 ms
Xinclude processing took 4560 ms
Freeing took 91 ms

Any idea how I can get more breakdown of whats happening in  'Xinclude
processing'?

It seems that "XInclude processing" also contains the time needed to parse the included documents, so maybe the XIncludes aren't the issue at all (glib-docs.xml is a small document including several larger ones). Can you save glib-docs.xml after processing XIncludes and check whether parsing the consolidated document is considerably faster?

Running with "perf record -g -- xmllint --timing --xinclude --noout
glib-docs.xml" gets me such a report.

+   17.15%    16.69%  xmllint  libc-2.24.so        [.] _int_malloc
+   11.93%    11.87%  xmllint  libc-2.24.so        [.] malloc_consolidate
+    9.01%     8.97%  xmllint  libxml2.so.2.9.4    [.] xmlDictLookup
+    7.15%     0.00%  xmllint  ld-2.24.so          [.] 0xffff8021a0022010
+    6.25%     6.21%  xmllint  libxml2.so.2.9.4    [.] xmlHashAddEntry3
+    6.22%     0.00%  xmllint  libxml2.so.2.9.4    [.] xmlSAX2IsStandalone
+    6.22%     0.00%  xmllint  [unknown]           [.] 0x56413c74c0854810
+    3.95%     3.94%  xmllint  libxml2.so.2.9.4    [.] xmlHashLookup2
      3.72%     3.70%  xmllint  libc-2.24.so        [.] _int_free
+    3.28%     0.00%  xmllint  [unknown]           [.] 0000000000000000
+    3.06%     3.04%  xmllint  libxml2.so.2.9.4    [.]
xmlFreeDocElementContent
+    2.96%     2.91%  xmllint  libc-2.24.so        [.] free

The callgraph based reports (perf report -g or -G) are usually more helpful.

This part looks suspicious:
               |--22.98%--0xc2160
               |          xmlFreeDoc
               |          |          
               |           --22.42%--xmlFreeDtd
               |                     |          
               |                     |--19.62%--xmlHashFree
               |                     |          |          
               |                     |          |--10.03%--_int_free
               |                     |          |          |          
               |                     |          |           --9.56%--malloc_consolidate
               |                     |          |          
               |                     |          |--3.69%--0x7e560
               |                     |          |          xmlFreeDocElementContent
               |                     |          |          |          
               |                     |          |           --2.19%--xmlFreeDocElementContent
               |                     |          |          
               |                     |          |--0.71%--0x7face
               |                     |          |          
               |                     |          |--0.66%--0x30498
               |                     |          |          
               |                     |           --0.61%--0x7fae3
               |                     |                     xmlUnlinkNode
               |                     |          
               |                      --0.89%--xmlFreeNode


Can I tell it to not load dtds in the first place? Is it loading the dtd for each an every xinclude?

Stefan
All my xincluded files have doctype headers like:
<?xml version="1.0"?>
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
               "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>

If I remove them it seems to become faster. I'll see if I can programmatically strip them all to be really sure though.

Stefan



Any ideas. Is there a know issues with using xincludes here?

It might be quadratic behavior in the XInclude engine or something else entirely. How large is glib-docs.xml after processing XIncludes?

Nick




_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml gnome org
https://mail.gnome.org/mailman/listinfo/xml




[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]