Re: [xml] runtest mystery bug: name2.xml error case regression test



On Wed, Sep 12, 2012 at 10:13:23PM -0400, Daniel Richard G. wrote:
On Wed, 12 Sep 2012, Daniel Veillard wrote:

The value of INPUT_CHUNK in include/libxml/parserInternals.h: Is
it valid/legal to crank this value up? Like, say, from 250 to
250000?

No :-) You would require the parsers to always have 250KB of
read-ahead data in the buffer (ahead of the current parsing point).
This is not the I/O block read size (which is MINLEN, 4000, in
xmlIO.c). It would also lead the parser to not shrink the read
buffer on a regular basis.
Too much read-ahead does not help, just the opposite I'm afraid.
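
As a rough illustration of why a huge INPUT_CHUNK changes both growing and
shrinking, here is a minimal sketch. The InputPos struct, the function
names, and the 2 * INPUT_CHUNK shrink threshold are assumptions made for
the example, not libxml2's actual code; only the value 250 comes from the
thread.

```c
#include <stddef.h>

#define INPUT_CHUNK 250   /* value quoted in the thread */

/* Hypothetical view of the input as offsets into one buffer. */
typedef struct {
    size_t cur;  /* current parsing point */
    size_t end;  /* end of the valid data */
} InputPos;

/* Nonzero when fewer than INPUT_CHUNK bytes of read-ahead remain,
 * i.e. the parser would ask the I/O layer for more data. */
static int needs_grow(const InputPos *in) {
    return in->end - in->cur < INPUT_CHUNK;
}

/* Nonzero when enough consumed data sits behind the cursor to be
 * discarded. The 2 * INPUT_CHUNK threshold is an assumption. */
static int may_shrink(const InputPos *in) {
    return in->cur > 2 * INPUT_CHUNK &&
           in->end - in->cur < 2 * INPUT_CHUNK;
}
```

With INPUT_CHUNK at 250000, needs_grow() would be true for almost any
small document, and may_shrink() would almost never be true, which matches
the two effects described above.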

Right; the point was not to make this change officially, but to
change the buffer-growing behavior in a way that teases out the bug.
I saw INPUT_CHUNK in the code, and figured frobbing that would do
something.

What I was asking is, should everything still work correctly with
that larger value? Not break anything, still give the same results?
(Because if not, I'll need to find some other way of making the bug
reproducible on FC17.)

  It may change some of the output as you reported, but not the document
content, so if that's the way you reproduced it on F17, okay.

So the issue, as far as I can tell, appears to be realloc()
shenanigans (or something a lot like it).

Hum, I can try to explain what this does there: we are parsing a
start tag and we

....<name attr1="...>

cur counts the number of characters from the beginning of the
input buffer until the 'n', base is a pointer to the beginning of
the input buffer. We want all the start tag to be in the input
buffer to provide a SAX callback without copying strings out, only
pointers to the buffer. So if while parsing name we notice that
the buffer had to be expanded (and we have good tests to check
that) we may need to restart the parsing phase of that start tag
from scratch. That is one of the trickiest parts of the 'new' parser
:-)

And if the buffer expands, the address might change, hence the
pointer check...
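
To make the base/cur idea concrete, here is a minimal sketch of a growable
buffer where the tag position is kept as an offset and the base address is
compared before and after growing. InputBuf, buf_append, read_more and
tag_start are hypothetical names for this example; this is not the
libxml2 implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the parser input buffer. */
typedef struct {
    char  *base;  /* start of the buffer; may move when it grows */
    size_t used;  /* bytes currently stored */
    size_t cap;   /* bytes allocated */
} InputBuf;

/* Append data, growing with realloc() when needed.
 * Returns 0 on success, -1 on allocation failure. */
static int buf_append(InputBuf *b, const char *data, size_t len) {
    if (b->used + len > b->cap) {
        size_t ncap = (b->used + len) * 2;
        char *nbase = realloc(b->base, ncap);
        if (nbase == NULL)
            return -1;
        b->base = nbase;   /* the address may or may not have changed */
        b->cap = ncap;
    }
    memcpy(b->base + b->used, data, len);
    b->used += len;
    return 0;
}

/* Read more input and report whether the buffer was relocated, in
 * which case pointers taken earlier are stale and the start tag has
 * to be parsed again from its saved offset. */
static int read_more(InputBuf *b, const char *more, size_t len,
                     int *base_changed) {
    /* Save the old address as an integer so the comparison stays
     * well defined even after realloc() frees the old block. */
    uintptr_t old_base = (uintptr_t)b->base;
    if (buf_append(b, more, len) != 0)
        return -1;
    *base_changed = ((uintptr_t)b->base != old_base);
    return 0;
}

/* Recompute the start of the tag name from the offset cur. */
static const char *tag_start(const InputBuf *b, size_t cur) {
    assert(cur <= b->used);
    return b->base + cur;
}
```

Because cur is an offset rather than a pointer, tag_start() stays valid
whether or not realloc() moved the block; only raw pointers saved before
the grow need to be thrown away.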

Now somehow you hit a problem there. It might be useful to understand
what the parser does at that point: does it fail parsing (if yes, with
which error), or does it succeed parsing but with incorrect data?

Well, you know what's going on much better than I do ^_^  Can you
reproduce this?

  No, I don't have a precise idea of what you actually changed, nor which
file exposes the problem, nor how you reproduce it except by running the
modified runtest.c. Does that show up in xmllint?

Interesting, the only scenario which could break there would be if
xmlParseQName() were shrinking the buffer, making it impossible to
get back to the start of the name, and most likely leading to a
parsing failure.

The printf()s I alluded to earlier went into that conditional,
showing which branch was followed: "base has changed" or "base is
equal". With the cut-down runtest, when it succeeds, there are eight
"equal" lines and one "changed". When it fails, there are nine
"equal"s. So it seems like a *lack* of growing is what leads to the
bug....

  And the big buffer might be one of the ways to force that behaviour.
Can you git diff your setup and tell me how you reproduce it?

 thanks,

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
