Re: [xml] runtest mystery bug: name2.xml error case regression test

On Wed, Sep 12, 2012 at 05:12:43PM -0400, Daniel Richard G. wrote:
On Wed, 12 Sep 2012, Daniel Veillard wrote:

I could try to put Ubuntu on a VM too and see what is going on.
Did you manage to isolate which specific test is failing? Doing the
same through an xmllint command-line test might be easier to debug.

I did some more digging on this, this time using GCC's
-finstrument-functions in conjunction with Michal Ludvig's
handy-dandy CygProfiler suite, and have obtained
some interesting results. But first, a question...

The value of INPUT_CHUNK in include/libxml/parserInternals.h: Is it
valid/legal to crank this value up? Like, say, from 250 to 250000?

  No :-) You would require the parser to always have 250KB of read-ahead
data in the buffer (ahead of the current parsing point). This is not the
I/O block read size (which is MINLEN = 4000 in xmlIO.c). It would also
keep the parser from shrinking the read buffer on a regular basis.
  Too much read-ahead does not help; just the opposite, I'm afraid.
And there are some
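A minimal sketch of the read-ahead trade-off described above, under a simplified model: a GROW-like step keeps at least chunk bytes buffered past the parse position, and a SHRINK-like step discards consumed bytes only once more than 2 * chunk of them have piled up. The struct, the function names, and that 2 * chunk threshold are illustrative assumptions, not libxml2's actual GROW/SHRINK code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser input (not libxml2's): the parse position is kept
 * as an offset so that realloc() may move the buffer freely. */
typedef struct {
    const char *src;          /* whole document, standing in for the I/O layer */
    size_t src_len, src_pos;  /* size of src and how much has been buffered */
    char *base;               /* buffered data */
    size_t len;               /* bytes in the buffer */
    size_t cur;               /* parse position inside the buffer */
    size_t chunk;             /* required read-ahead, INPUT_CHUNK's role */
} ra_input_t;

/* GROW-like step: make at least 'chunk' bytes available after cur,
 * unless the input is exhausted. */
static void ra_grow(ra_input_t *in) {
    size_t want = in->cur + in->chunk;
    if (in->len >= want || in->src_pos >= in->src_len)
        return;
    size_t n = want - in->len;
    if (n > in->src_len - in->src_pos)
        n = in->src_len - in->src_pos;
    char *p = realloc(in->base, in->len + n);
    assert(p != NULL);
    memcpy(p + in->len, in->src + in->src_pos, n);
    in->base = p;
    in->len += n;
    in->src_pos += n;
}

/* SHRINK-like step: discard consumed bytes, but only once more than
 * 2 * chunk of them sit behind cur (threshold assumed for illustration). */
static void ra_shrink(ra_input_t *in) {
    if (in->cur <= 2 * in->chunk)
        return;
    memmove(in->base, in->base + in->cur, in->len - in->cur);
    in->len -= in->cur;
    in->cur = 0;
}
```

Feeding a 1000-byte input through this in 100-byte steps, chunk = 250 keeps the buffer at a few hundred bytes, while chunk = 250000 pulls in the whole input at once and the shrink step never fires. Reading exactly the missing bytes is a simplification; the real I/O layer reads in larger blocks.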

That change causes the (unmodified) runtest program to do this on FC17:

      $ ./runtest
      ## XML regression tests
      ## XML regression tests on memory
      ## XML entity subst regression tests
      ## XML Namespaces regression tests
      ## Error cases regression tests
      Error for ./test/errors/attr1.xml failed
      File ./test/errors/attr1.xml generated an error
      Error for ./test/errors/attr2.xml failed
      File ./test/errors/attr2.xml generated an error
      Error for ./test/errors/name2.xml failed
      File ./test/errors/name2.xml generated an error

  I would assume this changes the output of the error messages; why
and how, I don't know.
      ## Error cases stream regression tests
      ## Reader regression tests
      ## Reader entities substitution regression tests
      ## Reader on memory regression tests
      ## Walker regression tests
      ## SAX1 callbacks regression tests
      Got a difference for ./test/rdf2
      File ./test/rdf2 generated an error
      ## SAX2 callbacks regression tests
      Got a difference for ./test/rdf2
      File ./test/rdf2 generated an error

  This could be due to two consecutive characters() callbacks not being
split at the same point because of the change in buffering.
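To illustrate that explanation: a parser may deliver one run of text as several consecutive characters() calls, and where it splits depends on buffering. A consumer that concatenates consecutive calls sees the same text either way, whereas a test that records each callback separately would report a difference. The callback type, emitter, and accumulator below are hypothetical stand-ins, not libxml2's SAX API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical callback type shaped like a SAX characters() handler. */
typedef void (*chars_cb)(void *ctx, const char *ch, int len);

/* Hypothetical emitter: delivers 'text' in pieces of at most 'bufsize'
 * bytes, as a parser may split one text run across buffer refills.
 * Returns the number of callbacks made. */
static int emit_text(const char *text, int bufsize, chars_cb cb, void *ctx) {
    int total = (int)strlen(text);
    int calls = 0;
    for (int off = 0; off < total; off += bufsize) {
        int n = total - off < bufsize ? total - off : bufsize;
        cb(ctx, text + off, n);
        calls++;
    }
    return calls;
}

/* Consumer that concatenates consecutive characters() calls, so its
 * result does not depend on where the splits fell. */
typedef struct {
    char out[256];
    int len;
} accum_t;

static void accumulate(void *ctx, const char *ch, int len) {
    accum_t *a = ctx;
    assert(a->len + len < (int)sizeof a->out);
    memcpy(a->out + a->len, ch, (size_t)len);
    a->len += len;
    a->out[a->len] = '\0';
}
```

With an 8-byte "buffer" the same text arrives as several calls, with a 4096-byte one as a single call, yet the accumulated strings are identical.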

      ## XML push regression tests

(All the other "make check" tests pass.)

I was finally able to do a proper execution-trace diff with the
CygProfiler output, which showed that the good versus bad runs
diverged in xmlParseStartTag2(). Further GDB and printf() action
seemed to point to line 9213:

      if (ctxt->input->base != base) goto base_changed;

So the issue, as far as I can tell, appears to be realloc()
shenanigans (or something a lot like it).

  Hum, I can try to explain what this does there: we are parsing a
  start tag and we

  ....<name attr1="...>

cur counts the number of characters from the beginning of the input
buffer up to the 'n', and base is a pointer to the beginning of the input
buffer. We want the whole start tag to be in the input buffer so we can
provide a SAX callback without copying strings out, passing only pointers
into the buffer. So if, while parsing the name, we notice that the buffer
had to be expanded (and we have good tests to check that), we may need to
restart the parsing of that start tag from scratch. That is one of the
trickiest parts of the 'new' parser :-)
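A minimal sketch of that restart strategy, under simplifying assumptions: only the tag name is parsed, the buffer is a plain realloc()ed heap block, and all type and function names are hypothetical rather than libxml2's. The saved position is an offset, which survives a realloc(); if a refill moved the buffer, every pointer into it is stale, so the scan restarts from that offset, in the spirit of the "if (ctxt->input->base != base) goto base_changed;" check quoted earlier.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser input, not libxml2's xmlParserInput. */
typedef struct {
    char *base;          /* start of the buffer; realloc() may move it */
    size_t len;          /* bytes buffered, kept NUL-terminated */
    size_t cur;          /* parse position, stored as an offset */
    const char **more;   /* pending chunks, standing in for the I/O layer */
} tag_input_t;

/* Append the next pending chunk; returns 0 when no input is left. */
static int tag_input_grow(tag_input_t *in) {
    if (*in->more == NULL)
        return 0;
    size_t n = strlen(*in->more);
    char *p = realloc(in->base, in->len + n + 1);
    assert(p != NULL);
    memcpy(p + in->len, *in->more, n + 1);   /* overwrites the old NUL */
    in->base = p;
    in->len += n;
    in->more++;
    return 1;
}

/* Scan the name of the start tag whose '<' is at in->cur and return a
 * pointer into the buffer (no copy). If a refill moved the buffer, any
 * pointer taken so far is stale, so rescan from the saved offset. */
static const char *parse_tag_name(tag_input_t *in, size_t *name_len,
                                  int *restarts) {
    size_t start = in->cur;
    *restarts = 0;
    for (;;) {
        /* Record the address as an integer: reusing a pointer value after
         * realloc() released it is not strictly portable C. */
        uintptr_t base_id = (uintptr_t)in->base;
        const char *name = in->base + start + 1;   /* skip '<' */
        size_t n = 0;
        int base_changed = 0;
        while (name[n] != ' ' && name[n] != '>') {
            if (name[n] == '\0') {                 /* need more data */
                if (!tag_input_grow(in))
                    return NULL;
                if ((uintptr_t)in->base != base_id) {
                    base_changed = 1;              /* cf. "goto base_changed" */
                    break;
                }
                name = in->base + start + 1;       /* same address, fresh pointer */
                continue;
            }
            n++;
        }
        if (!base_changed) {
            *name_len = n;
            in->cur = start + 1 + n;
            return name;
        }
        (*restarts)++;                             /* restart from scratch */
    }
}
```

Whether the restart path runs depends on whether realloc() happened to move the block, so a given input may take either path; the returned name is the same in both cases.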

  Now, somehow you hit a problem there. It might be useful to understand
what the parser does at that point: does it fail to parse (and if so,
with which error), or does it succeed but with incorrect data?

  Interesting. The only scenario which could break there would be if
xmlParseQName() were shrinking the buffer, making it impossible to get
back to the start of the name, and most likely leading to a parsing
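A sketch of that failure mode, assuming a SHRINK-like step that memmove()s already-consumed bytes out of the front of the buffer (the struct, functions, and fixed-size layout are hypothetical): a restart needs the saved offset of the tag's '<', and that offset can only be corrected if the shrink did not discard it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fixed-size input buffer (not libxml2's). */
typedef struct {
    char buf[64];
    size_t len;   /* bytes in buf, excluding the NUL */
    size_t cur;   /* parse position */
} shr_input_t;

/* SHRINK-like step: drop everything before cur; returns bytes dropped. */
static size_t shr_shrink(shr_input_t *in) {
    assert(in->cur <= in->len);
    size_t dropped = in->cur;
    memmove(in->buf, in->buf + dropped, in->len - dropped + 1); /* keep NUL */
    in->len -= dropped;
    in->cur = 0;
    return dropped;
}

/* A restart needs the offset of the tag's '<'. After a shrink it can be
 * corrected only if the '<' was not discarded; -1 means it is gone. */
static long restart_offset(size_t start, size_t dropped) {
    return start >= dropped ? (long)(start - dropped) : -1L;
}
```

If a shrink fires while the name is being scanned, with the parse position already past "<name", the '<' is discarded and the parser has nothing left to restart from.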


Daniel Veillard      | libxml Gnome XML XSLT toolkit
daniel veillard com  | Rpmfind RPM search engine | virtualization library
