Re: [xml] runtest mystery bug: name2.xml error case regression test
- From: Daniel Richard G. <oss teragram com>
- To: Daniel Veillard <veillard redhat com>
- Cc: xml gnome org
- Subject: Re: [xml] runtest mystery bug: name2.xml error case regression test
- Date: Wed, 12 Sep 2012 22:13:23 -0400
On Wed, 12 Sep 2012, Daniel Veillard wrote:
The value of INPUT_CHUNK in include/libxml/parserInternals.h: Is it
valid/legal to crank this value up? Like, say, from 250 to 250000?
No :-) You would require the parsers to always have 250KB of readahead
data in the buffer (ahead of the current parsing point). This is not the
I/O block read value (which is MINLEN 4000 in xmlIO.c). It would also
lead the parser to not shrink the read buffer on a regular basis.
Too much read-ahead does not help, just the opposite I'm afraid.
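To put rough numbers on the read-ahead point, here is a small arithmetic sketch in C. It is a simplified model and not libxml2 code: reads_needed() is a hypothetical helper, and it assumes each I/O read delivers exactly one block (the 4000-byte MINLEN figure mentioned above) until the required read-ahead is available.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of libxml2.
 * avail: bytes already buffered ahead of the parsing point
 * chunk: read-ahead the parser insists on (the role INPUT_CHUNK plays)
 * block: bytes delivered per I/O read (assumed fixed for this model)
 * Returns how many reads are needed before the parser may continue.
 */
static size_t reads_needed(size_t avail, size_t chunk, size_t block)
{
    assert(block > 0);
    if (avail >= chunk)
        return 0;                          /* enough read-ahead already */
    size_t missing = chunk - avail;
    return (missing + block - 1) / block;  /* round up to whole reads */
}
```

Under this model, reads_needed(0, 250, 4000) is 1 while reads_needed(0, 250000, 4000) is 63, so the larger value forces many reads before parsing can continue.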
Right; the point was not to make this change officially, but to change the
buffer-growing behavior in a way that teases out the bug. I saw
INPUT_CHUNK in the code, and figured frobbing that would do something.
What I was asking is, should everything still work correctly with that
larger value? Not break anything, still give the same results? (Because if
not, I'll need to find some other way of making the bug reproducible on
FC17.)
So the issue, as far as I can tell, appears to be realloc()
shenanigans (or something a lot like it).
Hum, I can try to explain what this does there: we are parsing a
start tag and we
....<name attr1="...>
cur counts the number of characters from the beginning of the input
buffer until the 'n', base is a pointer to the beginning of the input
buffer. We want all the start tag to be in the input buffer to provide a
SAX callback without copying strings out, only pointers to the buffer.
So if, while parsing the name, we notice that the buffer had to be expanded
(and we have good tests to check that), we may need to restart the
parsing phase of that start tag from scratch. That is one of the
trickiest parts of the 'new' parser :-)
And if the buffer expands, the address might change, hence the pointer
check...
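For readers following along, the pointer check can be illustrated with a minimal standalone sketch. This is not the libxml2 code: InputBuf, input_append() and name_after_growth() are hypothetical names, and the buffer simply grows with realloc(). The idea is to keep the offset of the name (the role "cur" plays above) and rebuild any pointer from it if the base address moved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable input buffer; not libxml2's own structure. */
typedef struct {
    char  *base;  /* start of the buffer; may move when it grows */
    size_t len;   /* bytes currently held */
    size_t cap;   /* bytes allocated */
} InputBuf;

/* Append n bytes, growing with realloc(). Returns 0 on success, -1 on failure. */
static int input_append(InputBuf *in, const char *data, size_t n)
{
    if (in->len + n > in->cap) {
        size_t ncap = in->cap ? in->cap : 16;
        while (ncap < in->len + n)
            ncap *= 2;
        char *p = realloc(in->base, ncap);
        if (p == NULL)
            return -1;
        in->base = p;   /* the address may or may not differ from before */
        in->cap = ncap;
    }
    memcpy(in->base + in->len, data, n);
    in->len += n;
    return 0;
}

/*
 * Take a pointer to the name at offset cur, pull in more data, and
 * return a pointer that is valid afterwards. The old base is kept as
 * an integer so the comparison never reads a freed pointer value.
 */
static const char *name_after_growth(InputBuf *in, size_t cur,
                                     const char *more, size_t n)
{
    uintptr_t old_base = (uintptr_t)in->base;
    const char *name = in->base + cur;   /* pointer into the buffer */

    if (input_append(in, more, n) != 0)
        return NULL;
    if ((uintptr_t)in->base != old_base)
        name = in->base + cur;           /* base has changed: redo from the offset */
    return name;                         /* base is equal: old pointer still valid */
}
```

Whether realloc() actually moves the block depends on the allocator and the heap state, so the same input can take either branch from one run or platform to the next.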
Now somehow you hit a problem there, it might be useful to understand
what the parser does at that point, does it fail parsing (if yes which
error), or does it succeed parsing but with incorrect data?
Well, you know what's going on much better than I do ^_^ Can you
reproduce this?
Interesting, the only scenario which could break there would be if
xmlParseQName() were shrinking the buffer, making it impossible to get
back to the start of the name, and most likely leading to a parsing
failure.
The printf()s I alluded to earlier went into that conditional, showing
which branch was followed: "base has changed" or "base is equal". With the
cut-down runtest, when it succeeds, there are eight "equal" lines and one
"changed". When it fails, there are nine "equal"s. So it seems like a
*lack* of growing is what leads to the bug....
--Daniel
--
Daniel Richard G. || danielg teragram com || Software Developer
Teragram Linguistic Technologies (a division of SAS)
http://www.teragram.com/