Re: [xml] performance of parsing docbook with xincludes



On 07/06/2018 00:00, Stefan Sauer wrote:
Another idea is to stop loading external DTDs for XIncludes without an
XPointer expression. This would still change the behavior for some
users but it's much less likely to cause problems.
Change the behaviour, as in we would not catch validation errors?

No, nothing related to validation. If you validate a document, the DTDs will always be loaded. But parsing with or without XML_PARSE_DTDLOAD will obviously produce different results. It's hard to tell whether this will cause problems for users. But maybe I'm overly cautious. If someone parses a document without DTD flags, why would they assume that XIncluded documents are parsed with XML_PARSE_DTDLOAD?

Too bad that xmlXIncludeParseFile() does not get the parent parserCtx;
if it did, we could apply the same flags.

I think the original flags are already passed via xmlXIncludeSetFlags.

It seems that xmlDict only handles strings as keys and values,
right? So we'll even need our own cache data structure. I'd say it
would need to live at the _xmlXIncludeCtxt level. A global is easier, but
then we can't ever free it :/

xmlHash should work fine:

    http://xmlsoft.org/html/libxml-hash.html

But building a DTD cache would be the least of your problems. The hard part is applying a cached DTD to a document. There are some interactions between internal and external subsets (see xmlAddElementDecl and xmlAddAttributeDecl in valid.c for example), so it looks like you can't simply set doc->extSubset to the cached DTD. You'd probably have to replay the calls to xmlAddElementDecl etc., maybe even in the original order, which might be lost. That's why I wouldn't want to go down this route.

Nick

