[xml] [bug] external subset ignored by 2.9.0 when parsing in incremental mode



Hi,

there is an unfortunate interaction between the "progressive" parsing mode
and the loading of an external DTD, e.g. to inject defaulted attribute
values. I see this in lxml's iterparse() implementation, which uses
incremental push parsing and stopped injecting them with libxml2 2.9.0.

The problem results from the fact that xmlSAX2ExternalSubset() in SAX2.c
reuses the existing parser context, which, in this case, is in progressive
mode. When it calls into xmlParseExternalSubset(), that starts by running
the "GROW" macro, which is a no-op in progressive mode. Thus, no data is
available and xmlParseExternalSubset() terminates without doing anything.

I'm not currently sure why it worked in older releases. I suspect that one
of the many additional places that now set the ctxt->progressive field to 1
might have triggered it.

I'm not entirely sure about the right way to fix this. Maybe
xmlSAX2ExternalSubset() should also back up and restore the "progressive"
field of the context and then set it to 0 before calling
xmlParseExternalSubset()? I attached a patch that does that and that fixes
the problem for me.
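
To make the idea concrete, here is a minimal, self-contained sketch of the
save/clear/restore pattern. It is a toy model, not libxml2 code and not the
attached patch: toy_ctxt, toy_grow(), toy_parse_external_subset() and
toy_external_subset_fixed() are hypothetical names that only mimic the
behaviour described above (a GROW-like step that does nothing while the
"progressive" flag is set).

```c
#include <stddef.h>

/* Toy stand-in for the parser context; NOT libxml2's xmlParserCtxt. */
typedef struct {
    int progressive;   /* nonzero: push mode, the caller feeds the data */
    size_t buffered;   /* bytes currently available to the parser */
    size_t pending;    /* bytes the input source could still deliver */
} toy_ctxt;

/* Models a GROW-like step: pulls in more input, but is a no-op
 * while the context is in progressive mode. */
static void toy_grow(toy_ctxt *ctxt) {
    if (ctxt->progressive)
        return;
    ctxt->buffered += ctxt->pending;
    ctxt->pending = 0;
}

/* Models parsing the external subset: grows the buffer first, then
 * consumes whatever is available. Returns the number of bytes consumed. */
static size_t toy_parse_external_subset(toy_ctxt *ctxt) {
    size_t consumed;

    toy_grow(ctxt);
    consumed = ctxt->buffered;
    ctxt->buffered = 0;
    return consumed;
}

/* The proposed pattern: back up the flag, clear it for the duration of
 * the external subset parse, then restore it for the main document. */
static size_t toy_external_subset_fixed(toy_ctxt *ctxt) {
    int saved_progressive = ctxt->progressive;
    size_t consumed;

    ctxt->progressive = 0;
    consumed = toy_parse_external_subset(ctxt);
    ctxt->progressive = saved_progressive;
    return consumed;
}
```

In this model, calling toy_parse_external_subset() directly on a context
with progressive set consumes nothing, while the wrapper consumes the
pending input and leaves the flag as it found it.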

BTW, is it correct that "ctxt->progressive" is sometimes set to "1" and
sometimes to things like "XML_PARSER_COMMENT" or "XML_PARSER_PI" in
parser.c? Those values are more commonly assigned to the "instate" field.

Stefan

Attachment: sax-dtd-progressive.patch
Description: Text Data


