Re: [xml] performance of parsing docbook with xincludes



On 05/16/2018 12:41 AM, Nick Wellnhofer wrote:
On May 15, 2018, at 21:56 , Stefan Sauer <ensonic hora-obscura de> wrote:
On 05/15/2018 08:40 PM, Stefan Sauer wrote:
On 05/15/2018 12:42 PM, Nick Wellnhofer wrote:
Can you try to change the line to

    xmlCtxtUseOptions(pctxt, ctxt->parseFlags);

and see if it helps?

It does not help. I'll experiment further. Thanks for the recommendations.
I think you also have to remove the line at https://git.gnome.org/browse/libxml2/tree/xinclude.c#n463

    pctxt->loadsubset |= XML_DETECT_IDS;

Looks like the idea is to make sure that ID attributes are detected for XIncludes with XPointers. IMO, it 
should be the application's responsibility to set the XML_PARSE_DTDLOAD flag in this case. But changing the 
behavior might break code that relies on this feature.
This helps!

LD_LIBRARY_PATH=~/debug/lib ~/debug/bin/xmllint --timing --xinclude
--nonet --noent --noout glib-docs.xml
Parsing took 0 ms
Xinclude processing took 179 ms
Freeing took 17 ms

So one solution could be another flag to enable this?
Is libxml2 doing that for each file over and over?
Yes.
Actually easy to confirm using --load-trace:
https://gist.github.com/ensonic/e1c4c7f80a0c072d119a649722de1e20
Wouldn't it make sense to load each DTD only once?
This would make sense.

And where exactly is it loaded? (I can only
see xmlFreeDtd, but can't find an xmlLoadDtd or the like.)
Via xmlParseDocument -> xmlSAX2ExternalSubset -> xmlParseExternalSubset.
Thanks, reading the code. Need to figure out where we could cache external
subsets and what a suitable key is (ExternalID?).
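
A cache keyed by the external ID, as floated above, might look roughly like the
sketch below. This is not libxml2 code: dtd_cache, dtd_cache_get, dtd_cache_put
and dtd_cache_clear are hypothetical names, and the void * payload only stands
in for whatever parsed subset (for example an xmlDtdPtr) would really be stored.
Whether the ExternalID alone is a sufficient key, and who owns a subset shared
between documents, are open questions that the sketch does not answer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: none of these names exist in libxml2. */
#define DTD_CACHE_MAX 16

typedef struct {
    char *external_id;  /* key: private copy of the DTD's external ID */
    void *subset;       /* payload: stand-in for a parsed external subset */
} dtd_cache_entry;

typedef struct {
    dtd_cache_entry entries[DTD_CACHE_MAX];
    size_t count;       /* number of used entries */
} dtd_cache;

/* Return the cached subset for external_id, or NULL on a miss. */
void *dtd_cache_get(const dtd_cache *cache, const char *external_id)
{
    assert(cache != NULL);
    if (external_id == NULL)
        return NULL;
    for (size_t i = 0; i < cache->count; i++) {
        if (strcmp(cache->entries[i].external_id, external_id) == 0)
            return cache->entries[i].subset;
    }
    return NULL;
}

/*
 * Store subset under external_id.
 * Returns 0 on success, -1 if an argument is NULL, the key is already
 * cached, the table is full, or memory allocation fails.
 */
int dtd_cache_put(dtd_cache *cache, const char *external_id, void *subset)
{
    assert(cache != NULL);
    if (external_id == NULL || subset == NULL)
        return -1;
    if (cache->count >= DTD_CACHE_MAX ||
        dtd_cache_get(cache, external_id) != NULL)
        return -1;

    /* Copy the key so the cache does not depend on the caller's buffer. */
    size_t len = strlen(external_id) + 1;
    char *key = malloc(len);
    if (key == NULL)
        return -1;
    memcpy(key, external_id, len);

    cache->entries[cache->count].external_id = key;
    cache->entries[cache->count].subset = subset;
    cache->count++;
    return 0;
}

/* Free the copied keys. The payloads are not freed; the caller owns them. */
void dtd_cache_clear(dtd_cache *cache)
{
    assert(cache != NULL);
    for (size_t i = 0; i < cache->count; i++)
        free(cache->entries[i].external_id);
    cache->count = 0;
}
```

The idea would be to call dtd_cache_get before loading an external subset and
dtd_cache_put after a successful load, so that repeated XIncludes referencing
the same DTD parse it only once. The fixed-size table with a linear scan is just
to keep the sketch short; a hash table would be the more likely choice.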

Stefan


Nick




