Re: [xml] performance of parsing docbook with xincludes
- From: Stefan Sauer <ensonic hora-obscura de>
- To: xml gnome org
- Subject: Re: [xml] performance of parsing docbook with xincludes
- Date: Tue, 15 May 2018 21:56:35 +0200
On 05/15/2018 08:40 PM, Stefan Sauer wrote:
On 05/15/2018 12:42 PM, Nick Wellnhofer wrote:
On 14/05/2018 21:48, Stefan Sauer wrote:
This part looks suspicious:
|--22.98%--0xc2160
| xmlFreeDoc
| |
| --22.42%--xmlFreeDtd
Can I tell it not to load DTDs in the first place? Is it loading the
DTD for each and every xinclude?
Good catch. It seems that the XInclude engine always parses included
docs with XML_PARSE_DTDLOAD:
https://git.gnome.org/browse/libxml2/tree/xinclude.c#n450
If you're not using XML catalogs, this will probably cause the DTD to
be loaded over the network multiple times which could explain the
slowdown.
Can you try to change the line to
xmlCtxtUseOptions(pctxt, ctxt->parseFlags);
and see if it helps?
Nick
It does not help. I'll experiment further. Thanks for the recommendations.
And FYI, a call graph plot:
https://imgur.com/a/d27xxor
As an experiment I dropped the doctype headers from the (generated)
xincluded files. So now it is 20 files with doctype headers + 105
(generated) files without doctype headers. And voila!
xmllint --timing --xinclude --noout glib-docs.xml
Parsing took 0 ms
Xinclude processing took 447 ms
Freeing took 19 ms
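For anyone who wants to repeat the experiment, here is a minimal sketch of how the doctype header could be dropped from a generated fragment before it is written out. It is only an illustration: strip_doctype is a hypothetical helper, not part of libxml2 or gtk-doc, and it assumes the internal subset contains no comments or processing instructions (an apostrophe inside a comment would confuse the quote tracking). Note that dropping the header also drops entity declarations such as &version;, so fragments that rely on them would no longer resolve.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a libxml2 or gtk-doc function): returns a newly
 * allocated copy of `xml` with the first <!DOCTYPE ...> declaration removed,
 * including an optional [ ... ] internal subset.
 * Assumption: the internal subset has no comments or processing instructions.
 * The caller frees the result; NULL is returned if allocation fails. */
static char *strip_doctype(const char *xml)
{
    assert(xml != NULL);

    size_t len = strlen(xml);
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    const char *start = strstr(xml, "<!DOCTYPE");
    if (start == NULL) {            /* nothing to strip, return a plain copy */
        memcpy(out, xml, len + 1);
        return out;
    }

    const char *p = start;
    int depth = 0;                  /* > 0 while inside the [ ... ] subset */
    char quote = 0;                 /* active quote character, or 0 */
    for (; *p != '\0'; p++) {
        if (quote) {
            if (*p == quote)
                quote = 0;          /* closing quote */
        } else if (*p == '"' || *p == '\'') {
            quote = *p;             /* opening quote */
        } else if (*p == '[') {
            depth++;
        } else if (*p == ']') {
            if (depth > 0)
                depth--;
        } else if (*p == '>' && depth == 0) {
            p++;                    /* step past the closing '>' */
            break;
        }
    }
    /* Drop the line break(s) that followed the declaration. */
    while (*p == '\n' || *p == '\r')
        p++;

    size_t head = (size_t)(start - xml);
    memcpy(out, xml, head);         /* everything before the declaration */
    strcpy(out + head, p);          /* everything after it */
    return out;
}
```

Tracking the bracket depth matters because the '>' that closes each <!ENTITY ...> declaration inside the internal subset must not be mistaken for the end of the doctype.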
The docbook header looks like this:
<?xml version="1.0"?>
<!DOCTYPE book PUBLIC '-//OASIS//DTD DocBook XML V4.5//EN'
'http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd' [
<!ENTITY % local.common.attrib "xmlns:xi CDATA #FIXED
'http://www.w3.org/2003/XInclude'">
<!ENTITY version SYSTEM "version.xml">
]>
and gtk-doc will replicate this for the fragments (replacing 'book' with
e.g. 'refentry'). This way one can e.g. inject things like a version.
I do have /usr/share/xml/docbook/schema/dtd/4.5/docbookx.dtd available
locally. I guess there is no way to avoid loading the DTD then. Is
libxml2 doing that for each file over and over? Wouldn't it make sense
to load each DTD only once? And where exactly is it loaded? (I can only
see xmlFreeDtd, but can't find an xmlLoadDtd or the like.)
Sorry for all the questions, but it looks like there is low hanging
fruit to save a lot of cpu time.
Stefan