Re: [xml] FW: Regexp Failure



On Wed, Aug 22, 2007 at 09:59:28PM +0530, Ashwin wrote:

   On Fri, Aug 03, 2007 at 08:22:37PM +0530, Ashwin wrote:

   > Hi,
   >
   >    While testing the xmlFARegExec function, I gave the following input:
   >
   >    (0|1|2|3|4|5|6|7|8|9){0,10}   (the rule to be used for matching the
   >    input expression)
   >
   >    Expression to be matched: 1234567891 (length 10)
   >
   >    This returns a failure; if I shorten the expression by one character,
   >    it works fine. Is this because of incorrect usage?


     Hum, no, looks like a bug, strange ...

   paphio:~/XML -> ./testRegexp '(0|1|2|3|4|5|6|7|8|9){0,10}' '1234567891'
   Testing (0|1|2|3|4|5|6|7|8|9){0,10}:
   1234567891: Fail
   paphio:~/XML -> ./testRegexp '(0|1|2|3|4|5|6|7|8|9){0,10}' '123456789'
   Testing (0|1|2|3|4|5|6|7|8|9){0,10}:
   123456789: Ok


     Could be worth bugzilla'ing; that's something I should be able to fix!


   Hi,

       Regarding the above problem, I made some modifications to the code
   that generates the epsilon transitions, which seem to have solved the
   problem when a range with a minimum of 0 is specified. However, now if
   the input is longer than the specified range allows, the regexp exec
   function returns success instead of failure. The code change is as
   follows:

   In file xmlregexp.c (around line 1522):

       if (atom->min == 0) {
           xmlFAGenerateEpsilonTransition(ctxt, atom->start, atom->stop);
           newstate = xmlRegNewState(ctxt);
           xmlRegStatePush(ctxt, newstate);
           ctxt->state = newstate;
           xmlFAGenerateEpsilonTransition(ctxt, atom->start, newstate);

           /* These three lines were not part of the earlier if condition */
           counter = xmlRegGetCounter(ctxt);
           ctxt->counters[counter].min = atom->min - 1;
           ctxt->counters[counter].max = atom->max - 1;

           /* Pass -1 (no counter) to the counted-transition calls that follow */
           counter = -1;
       } else {
           counter = xmlRegGetCounter(ctxt);
           ctxt->counters[counter].min = atom->min - 1;
           ctxt->counters[counter].max = atom->max - 1;
       }


   I have to admit that this is nothing but a shabby workaround, if one
   can even call it that. Passing -1 as the counter to the functions
   xmlFAGenerateCountedTransition and xmlFAGenerateCountedEpsilonTransition,
   which are called immediately after this bit of code, seems to solve the
   problem. The change is based on a hunch that the problem lies in counted
   transitions being generated where they are not required. Consider the
   following cases (I used xmlRegexpPrint to print them on the console):

   On the left-hand side is the case where the range minimum was 0 and the
   maximum-length string was returning failure (after the code change it
   now returns success); the right-hand side uses a minimum of 1 for
   comparison.


   Left-hand side, the input string is 012:

   Testing (0|1|2){0,3}:
    regexp: '(0|1|2){0,3}'
   4 atoms:
    00 atom: charval once char 0
    01 atom: charval once char 1
    02 atom: charval once char 2
    03 atom: subexpr once start -572662307 end 2
   5 states:
    state: START 0, 8 transitions:
     trans: removed
     trans: removed
     trans: removed
     trans: removed
     trans: count based 0, epsilon to 4
     trans: counted 0, char 0 atom 0, to 2
     trans: counted 0, char 1 atom 1, to 2
     trans: counted 0, char 2 atom 2, to 2
    state: NULL
    state: 2, 5 transitions:
     trans: count based 0, epsilon to 4
     trans: removed
     trans: counted 0, char 0 atom 0, to 2
     trans: counted 0, char 1 atom 1, to 2
     trans: counted 0, char 2 atom 2, to 2
    state: NULL
    state: FINAL 4, 0 transitions:
   1 counters:
    0: min -1 max 2
   122: Fail

   Right-hand side, the input string is 122:

   Testing (0|1|2){1,3}:
    regexp: '(0|1|2){1,3}'
   4 atoms:
    00 atom: charval once char 0
    01 atom: charval once char 1
    02 atom: charval once char 2
    03 atom: subexpr once start 4 end 2
   4 states:
    state: START 0, 4 transitions:
     trans: removed
     trans: char 0 atom 0, to 2
     trans: char 1 atom 1, to 2
     trans: char 2 atom 2, to 2
    state: NULL
    state: 2, 5 transitions:
     trans: count based 0, epsilon to 3
     trans: removed
     trans: counted 0, char 0 atom 0, to 2
     trans: counted 0, char 1 atom 1, to 2
     trans: counted 0, char 2 atom 2, to 2
    state: FINAL 3, 0 transitions:
   1 counters:
    0: min 0 max 2
   122: Ok


   I made the change on the assumption that the part highlighted on the LHS
   should be similar to the RHS (I might, of course, be completely wrong).
   However, now for cases like (0|1|2|3){0,3}, the input 1231 returns
   success.


   Regards

  I'm sorry, but after going through the various mail agents and processing,
your mail is nearly unreadable :-\
  If you have a change to submit, please add a patch (preferably made with
diff -p, as an attachment). This makes sure that at least the part related
to functional changes is not scrambled, and it has the advantage of being
precise enough that a machine can then apply it. So please,
    1/ drop HTML mail formatting
    2/ attach the patch based on diff

  BTW, using the version in SVN after my fix, I see:

paphio:~/XML -> ./testRegexp   '(0|1|2|3|4|5|6|7|8|9){0,10}' '123456789'
Testing (0|1|2|3|4|5|6|7|8|9){0,10}:
123456789: Ok
paphio:~/XML -> ./testRegexp   '(0|1|2|3|4|5|6|7|8|9){0,10}' '1234567890'
Testing (0|1|2|3|4|5|6|7|8|9){0,10}:
1234567890: Ok
paphio:~/XML -> ./testRegexp   '(0|1|2|3|4|5|6|7|8|9){0,10}' '12345678901'
Testing (0|1|2|3|4|5|6|7|8|9){0,10}:
12345678901: Fail
paphio:~/XML -> 

  So this might be a duplicate of the previous issue.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/


