Re: [xml] performance of parsing docbook with xincludes



On 05/15/2018 08:40 PM, Stefan Sauer wrote:
On 05/15/2018 12:42 PM, Nick Wellnhofer wrote:
On 14/05/2018 21:48, Stefan Sauer wrote:
This part looks suspicious:

                |--22.98%--0xc2160
                |          xmlFreeDoc
                |          |
                |           --22.42%--xmlFreeDtd
Can I tell it not to load DTDs in the first place? Is it loading the
DTD for each and every xinclude?
Good catch. It seems that the XInclude engine always parses included
docs with XML_PARSE_DTDLOAD:

    https://git.gnome.org/browse/libxml2/tree/xinclude.c#n450

If you're not using XML catalogs, this will probably cause the DTD to
be loaded over the network multiple times which could explain the
slowdown.

Can you try to change the line to

    xmlCtxtUseOptions(pctxt, ctxt->parseFlags);

and see if it helps?

Nick
It does not help. I'll experiment further. Thanks for the recommendations.
And FYI, here is a call graph plot:
https://imgur.com/a/d27xxor

As an experiment I dropped the doctype headers from the (generated)
xincluded files. So now it is 20 files with doctype headers + 105
(generated) files without doctype headers. And voila!

xmllint --timing --xinclude  --noout glib-docs.xml
Parsing took 0 ms
Xinclude processing took 447 ms
Freeing took 19 ms

The docbook header looks like this:

<?xml version="1.0"?>
<!DOCTYPE book PUBLIC '-//OASIS//DTD DocBook XML V4.5//EN'
                      'http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd' [
<!ENTITY % local.common.attrib "xmlns:xi  CDATA  #FIXED 'http://www.w3.org/2003/XInclude'">
<!ENTITY version SYSTEM "version.xml">
]>

gtk-doc replicates this header in each fragment (replacing 'book' with
e.g. 'refentry'). This way one can, for example, inject a version string
through the &version; entity.

I do have /usr/share/xml/docbook/schema/dtd/4.5/docbookx.dtd available
locally. I guess there is no way to avoid loading the DTD then. Is
libxml2 doing that for each file over and over? Wouldn't it make sense
to load each DTD only once? And where exactly is it loaded? (I can only
see xmlFreeDtd, but can't find an xmlLoadDtd or the like.)

Sorry for all the questions, but it looks like there is low-hanging
fruit here to save a lot of CPU time.

Stefan



_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml gnome org
https://mail.gnome.org/mailman/listinfo/xml




