Re: [xml] runtest mystery bug: name2.xml error case regression test



On Wed, 12 Sep 2012, Daniel Veillard wrote:

The value of INPUT_CHUNK in include/libxml/parserInternals.h: Is it valid/legal to crank this value up? Like, say, from 250 to 250000?

No :-) You would require the parser to always have 250KB of read-ahead data in the buffer (ahead of the current parsing point). This is not the I/O block read size (which is MINLEN, 4000, in xmlIO.c). It would also lead the parser to not shrink the read buffer on a regular basis.
 Too much read-ahead does not help, just the opposite I'm afraid.
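
To make the read-ahead point concrete, here is a rough sketch in C. It assumes the parser tops up the buffer whenever fewer than INPUT_CHUNK bytes remain ahead of the parsing point, and discards consumed data only once a multiple of INPUT_CHUNK has been consumed. The function names and the factor of 2 are assumptions for illustration; the actual conditions in parser.c may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed rule (illustrative): ask for more input when fewer than
 * `chunk` bytes lie ahead of the current parsing point. */
static int demo_needs_grow(size_t ahead, size_t chunk)
{
    return ahead < chunk;
}

/* Assumed rule (illustrative): drop already-consumed data only when
 * more than 2*chunk bytes lie behind the parsing point and fewer than
 * 2*chunk bytes lie ahead. */
static int demo_may_shrink(size_t consumed, size_t ahead, size_t chunk)
{
    return consumed > 2 * chunk && ahead < 2 * chunk;
}
```

Under these assumptions, one 4000-byte read satisfies a chunk of 250 but not a chunk of 250000, and a buffer with 1000 consumed bytes is eligible for shrinking with the small value but not with the large one.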

Right; the point was not to make this change officially, but to change the buffer-growing behavior in a way that teases out the bug. I saw INPUT_CHUNK in the code, and figured frobbing that would do something.

What I was asking is, should everything still work correctly with that larger value? Not break anything, still give the same results? (Because if not, I'll need to find some other way of making the bug reproducible on FC17.)

So the issue, as far as I can tell, appears to be realloc()
shenanigans (or something a lot like it).

 Hum, I can try to explain what this does there: we are parsing a
 start tag and we

 ....<name attr1="...>

cur counts the number of characters from the beginning of the input buffer up to the 'n'; base is a pointer to the beginning of the input buffer. We want the whole start tag to be in the input buffer so that we can provide a SAX callback without copying strings out, only pointers into the buffer. So if while parsing the name we notice that the buffer had to be expanded (and we have good tests to check that) we may need to restart the parsing phase of that start tag from scratch. That's one of the trickiest parts of the 'new' parser :-)

And if the buffer expands, the address might change, hence the pointer check...
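
For anyone following along, the base/cur idea can be sketched in isolation. This is not libxml2 code: demo_input, demo_append() and demo_run() are made-up names, and the growth routine deliberately allocates a fresh block every time so that the base always moves, whereas a real realloc() may or may not move it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable input buffer (not libxml2's real structure). */
typedef struct {
    char  *base;  /* start of the buffer; may move when it grows */
    size_t len;   /* bytes currently stored */
    size_t cap;   /* bytes allocated */
} demo_input;

/* Append a string. Growth allocates a fresh block before freeing the
 * old one, so the base is guaranteed to move; plain realloc() may or
 * may not move it. Returns 1 if base moved, 0 if not, -1 on failure. */
static int demo_append(demo_input *in, const char *s)
{
    size_t n = strlen(s);
    uintptr_t old = (uintptr_t)in->base;  /* keep the old address as an integer */

    if (in->len + n > in->cap) {
        size_t ncap = 2 * (in->len + n);
        char *p = malloc(ncap);
        if (p == NULL)
            return -1;
        if (in->len > 0)
            memcpy(p, in->base, in->len);
        free(in->base);
        in->base = p;
        in->cap = ncap;
    }
    memcpy(in->base + in->len, s, n);
    in->len += n;
    return (uintptr_t)in->base != old;
}

/* Remember the name as an offset, let the buffer grow, then re-derive
 * the pointer. Returns 1 if the base moved and the name is still
 * readable through the offset, 0 otherwise, -1 on failure. */
static int demo_run(void)
{
    demo_input in = { NULL, 0, 0 };
    size_t cur;
    int moved, ok;

    if (demo_append(&in, "....<name") < 0)
        return -1;
    cur = 5;  /* offset of the 'n' from the start of the buffer */
    assert(cur < in.len);

    moved = demo_append(&in, " attr1=\"...\">");
    if (moved < 0) {
        free(in.base);
        return -1;
    }
    /* "base has changed": pointers saved earlier are stale, the offset is not. */
    ok = moved && memcmp(in.base + cur, "name", 4) == 0;
    free(in.base);
    return ok;
}
```

The sketch keeps an offset across the growth and only turns it back into a pointer afterwards, which is why a moved base is harmless here. With realloc() in place of the fresh allocation, whether the "changed" branch is taken depends on the allocator.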

 Now somehow you hit a problem there, it might be useful to understand
what the parser does at that point, does it fail parsing (if yes which
error) does it succeed parsing but with incorrect data ?

Well, you know what's going on much better than I do ^_^ Can you reproduce this?

Interesting, the only scenario which could break there would be if xmlParseQName() were shrinking the buffer, making it impossible to get back to the start of the name, and most likely leading to a parsing failure.

The printf()s I alluded to earlier went into that conditional, showing which branch was followed: "base has changed" or "base is equal". With the cut-down runtest, when it succeeds, there are eight "equal" lines and one "changed". When it fails, there are nine "equal"s. So it seems like a *lack* of growing is what leads to the bug....


--Daniel


--
Daniel Richard G. || danielg teragram com || Software Developer
Teragram Linguistic Technologies (a division of SAS)
http://www.teragram.com/



