Re: [xml] bug 362989

On Tue, Oct 31, 2006 at 02:25:49PM -0800, Yong Chen (yongche) wrote:

Ok. I actually have been trying to understand/debug this issue. This
problem doesn't show up when the file is not big enough.

It's due to xmlFAReduceEpsilonTransitions (in xmlregexp.c) calling
itself (recursively) infinitely. In this function, at line 1623 (or so),
you have code:

      if ( to->trans[transnr].counter>=0 )
          xmlFAReduceEpsilonTransitions(ctxt, fromnr,
          xmlFAReduceEpsilonTransitions(ctxt, fromnr,

When the problem happens, normally "to->trans[transnr].counter" is -1,
so the 2nd part of the code is executed. But actually the "counter" also
has a value of -1, so there's no difference between 1st or 2nd part of
the code. Is it a problem? (during my debug, I see the loop happens here
with both to->trans[transnr].counter and counter being -1).

  That means non counted transitions IIRC, i.e. the most common normal basic

I tried to change the code so when it executes the 2nd part, it also
checks "if counter>=0" on "else" line, as following:

      if ( to->trans[transnr].counter>=0 )
          xmlFAReduceEpsilonTransitions(ctxt, fromnr,
      else if ( counter>=0 )
          xmlFAReduceEpsilonTransitions(ctxt, fromnr,

But it seems the program terminates too early.

  if you make any change make sure you still pass all the XSD, RelaxNG and
DTD validation from the suite which rely on that code to spot any regression.
At the bare minimum, runtest and runsuite should be tested.

Do you have any document regarding the purpose/design of this
file/function that can help me understand it better?

  The dragon book, i.e. the most common compiler design reference [1].
This is the epsilon transition elimination in the graph of transition.
Each time someone tried to point out that the implementation of this function
was broken it was something else, it's just where the problem usually show
up, the error being the construction of the graph automata. Of course the
construct in the automata are a bit more complex than in the book.
  It is not trivial, it takes serious concentration and time to really
understand where the problem comes from. The bigger the example, the bigger
the automatas, the more time it takes. You may start to understand why I insist
on getting a reduced problem. Yes it takes active work on your part to
isolate what is the part of the schemas which generate the problem. Not
doing it means more work to me trying to understand what is going on.
Currently your schemas is still 10 pages long, have you tried to drop
each logical parts as separate schemas and tried to reproduce on every
combination ? You're also not providing an example test which is supposed
to be accepted by said schemas this may help understand the structure
which need to be generated and how to slit it out into a smaller debuggable
chunk. It's your time against my time, and I'm busy working on completely
different code for my employer ATM, if you're in hurry simplify the test,
it will probably be faster than trying to understand that part of the
code (though getting more people learning that code is a good idea).
  I would also like to raise the point of the 'niceness' of that schemas.
It seems to me trying to understand the semantic of that is near impossible,
  <xs:group ref="optional_group_4" minOccurs="0" maxOccurs="1"/>
looks like automatically generated schemas with names without semantic
constant references to labels, making even harder to get the full picture.
It's a bit like asking to debug code which was automatically generated
i.e. even more painful than when you can extract the semantic of the



Red Hat Virtualization group
Daniel Veillard      | virtualization library
veillard redhat com  | libxml GNOME XML XSLT toolkit | Rpmfind RPM search engine

[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]