Re: [xml] [bug] external subset ignored by 2.9.0 when parsing in incremental mode

On Fri, Sep 28, 2012 at 7:44 AM, Stefan Behnel <stefan_ml behnel de> wrote:
there is an unfortunate interaction between the "progressive" parsing mode
and the loading of an external DTD, e.g. to inject defaulted attribute
values. I see this in lxml's iterparse() implementation that started
failing to inject them in libxml2 2.9.0. It uses incremental push parsing.

I ran into this problem too. I have a testcase showing the problem
using xmllint:

% cat x.xml
<!DOCTYPE x SYSTEM "x.dtd">
% cat x.dtd
% xmllint --valid --noout --stream x.xml
x.xml:2: element x: validity error : No declaration for element x
x.xml:3: element x: validity error : No declaration for element x

Document x.xml does not validate

The problem results from the fact that xmlSAX2ExternalSubset() in SAX2.c
reuses the existing parser context, which, in this case, is in progressive
mode. When it calls into xmlParseExternalSubset(), that starts by running
the "GROW" macro, which is a no-op in progressive mode. Thus, no data is
available and xmlParseExternalSubset() terminates without doing anything.

I'm not currently sure why it worked in older releases. I suspect that one
of the many additional places that now set the ctxt->progressive field to 1
might have triggered it.

A git bisect session points to this commit as the problem:
[5353bbf7dda0a01462109337c5fa34859d3e6d0b] More fixups on the push
parser behaviour

I'm not entirely sure about the right way to fix this. Maybe
xmlSAX2ExternalSubset() should also back up and restore the "progressive"
field of the context and then set it to 0 before calling
xmlParseExternalSubset()? I attached a patch that does that and that fixes
the problem for me.

This patch fixes my test case as well.
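
For anyone reading without the attachment, here is a rough toy model of the
save/restore idea. It is not the attached patch and does not use libxml2:
toy_ctxt, toy_grow(), toy_parse_external_subset() and
toy_external_subset_fixed() are all made-up names, and the only thing the
sketch borrows from the description above is that the buffer refill does
nothing while the "progressive" flag is set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a parser context. This is NOT libxml2's xmlParserCtxt;
 * every field here is a simplifying assumption. */
typedef struct {
    int progressive;      /* 1 = push mode, the caller feeds data in chunks */
    const char *pending;  /* external-subset data not yet pulled into the buffer */
    size_t avail;         /* bytes currently visible to the parser */
} toy_ctxt;

/* Models the refill step as described in the thread: a no-op while the
 * context is in progressive mode, otherwise it pulls in pending data. */
void toy_grow(toy_ctxt *ctxt)
{
    assert(ctxt != NULL);
    if (ctxt->progressive)
        return;
    if (ctxt->pending != NULL) {
        ctxt->avail += strlen(ctxt->pending);
        ctxt->pending = NULL;
    }
}

/* Models the nested external-subset parse: it refills first and then
 * reports how many bytes it was able to see. */
size_t toy_parse_external_subset(toy_ctxt *ctxt)
{
    toy_grow(ctxt);
    return ctxt->avail;
}

/* The workaround under discussion: save the flag, clear it for the nested
 * parse so the refill actually runs, then restore the caller's mode. */
size_t toy_external_subset_fixed(toy_ctxt *ctxt)
{
    int saved = ctxt->progressive;
    size_t seen;

    ctxt->progressive = 0;
    seen = toy_parse_external_subset(ctxt);
    ctxt->progressive = saved;
    return seen;
}
```

In this model, calling toy_parse_external_subset() on a context with
progressive set to 1 sees zero bytes, which mirrors the "terminates without
doing anything" behaviour, while toy_external_subset_fixed() sees the pending
data and leaves the flag as it found it.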
