Re: [xml] performance of parsing docbook with xincludes



On 05/17/2018 04:18 PM, Nick Wellnhofer wrote:
On 16/05/2018 21:51, Stefan Sauer wrote:
So one solution could be another flag to enable this?

Yes, but it would be rather ugly.
In which sense? I guess because it is something that no one should need
to know about or have to care about?

Thanks, I'm reading the code. I need to figure out where we could cache
external subsets and what a suitable key is (ExternalID?).

Note that I'm currently not planning to review and integrate larger
patches from other developers. I only took over some libxml2
maintenance duties because no one else did. So even if you write a
high-quality patch, it might never get merged.
Thanks for making this clear upfront. This is how I ended up becoming
the gtkdoc maintainer :)


Caching external subsets for XIncludes certainly sounds like a nice
feature but I would prefer to find a simpler solution. For example,
can't you just omit the external DTD from included documents?
Yeah, right now the benefit of having the DTD is that one can validate
fragments. I'll do some research (aka grepping over existing projects)
to see what the doctype headers in use today look like. If all that
people do is use an entity to inject the version, I'll write a
migration tool.

We have a test that validates the doc, but I think I can change this to
just resolve all XIncludes and validate against the top-level doctype.


You wrote:

and gtk-doc will replicate this for the fragments (replacing 'book' with
e.g. 'refentry'). This way one can e.g. inject things like a version.

What do you mean by "inject things like a version"? Why exactly do
your included documents have to reference an external DTD?

The documentation consists of a handwritten master doc (type book) that
includes more handwritten parts (e.g. tutorials, guides) as well as
generated reference docs. When gtkdoc generates the reference docs, it
takes the doctype header of the master doc as a template and uses that
for the generated reference docs. If the master doc has entities
declared, those can be expanded in the reference fragments. That's the
part where I will check how widely it is actually used.
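
To make the templating step concrete, here is a rough Python sketch of
the idea. It is not gtk-doc's actual code: the function name
fragment_doctype and the "version" entity are made up for illustration.
Only the root element name changes; the internal subset with the entity
declarations is carried over untouched.

```python
import re


def fragment_doctype(master_doctype: str, fragment_root: str) -> str:
    """Return the master doctype header with its root element name swapped.

    Hypothetical helper: only the first name after '<!DOCTYPE' is replaced,
    so the identifiers and the internal subset (entities) stay as they are.
    """
    return re.sub(
        r"(<!DOCTYPE\s+)[^\s\[>]+",
        lambda m: m.group(1) + fragment_root,
        master_doctype,
        count=1,
    )


# Example master header; the 'version' entity is an assumed example.
MASTER = '''<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
  "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
  <!ENTITY version "1.2.3">
]>'''

REFENTRY = fragment_doctype(MASTER, "refentry")
print(REFENTRY)
```

Every generated fragment then references the same external DTD again,
which is why each XInclude ends up loading it.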

Stefan


Another idea is to stop loading external DTDs for XIncludes without an
XPointer expression. This would still change the behavior for some
users but it's much less likely to cause problems.

Nick

I definitely don't know enough about the implications here. I was mostly
thinking of checking whether we can stick a dictionary of
<dtd-identifier, xmlDtdPtr> into the parser context and, before actually
loading a DTD, check whether we already loaded it and reuse it. Somehow
the dict needs to be stored in the top-level doc once parsing is done
(do we need the DTDs once the doc has been parsed?). We would only free
the DTDs together with the top-level doc. But I agree, it is not going
to be a two-liner.
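
A rough sketch of that lookup-before-load shape, in Python rather than
libxml2's C. Nothing here is libxml2 API: DtdCache, the loader callback
and the (ExternalID, SystemID) key are assumptions for illustration
only.

```python
from typing import Callable, Dict, Optional, Tuple

# Assumed cache key: (ExternalID, SystemID) of the external subset.
DtdKey = Tuple[Optional[str], Optional[str]]


class DtdCache:
    """Hypothetical per-document cache of already loaded DTDs."""

    def __init__(self, loader: Callable[[Optional[str], Optional[str]], object]) -> None:
        self._loader = loader            # stands in for the real DTD loading
        self._dtds: Dict[DtdKey, object] = {}
        self.loads = 0                   # how often the loader actually ran

    def get(self, external_id: Optional[str], system_id: Optional[str]) -> object:
        key = (external_id, system_id)
        if key not in self._dtds:        # load only on the first request
            self._dtds[key] = self._loader(external_id, system_id)
            self.loads += 1
        return self._dtds[key]

    def free_all(self) -> None:
        """Drop all cached DTDs, e.g. when the top-level doc is freed."""
        self._dtds.clear()

    def __len__(self) -> int:
        return len(self._dtds)


if __name__ == "__main__":
    cache = DtdCache(lambda ext, sysid: {"external_id": ext, "system_id": sysid})
    pub = "-//OASIS//DTD DocBook XML V4.3//EN"
    uri = "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd"
    for _ in range(100):                 # e.g. 100 included fragments
        cache.get(pub, uri)
    print(cache.loads)
```

The open question from above stays open in the sketch too: free_all()
assumes the cache lives and dies with the top-level doc.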

Stefan



