[xml] FW: Regexp Failure



On Fri, Aug 03, 2007 at 08:22:37PM +0530, Ashwin wrote:

>

>    Hi,

>

>    While testing the xmlFARegExec function if I give the following input

>

>    (0|1|2|3|4|5|6|7|8|9)   (0,10)   (The rule to be used for matching the

>    input expression)

>

>    expression to be matched 1234567891 (length 10)

>

>    This is returning a failure; if I reduce the expression by 1 it works

>    fine. Is this because of incorrect usage?

 

>  Hum, no, looks like a bug, strange ...

> paphio:~/XML -> ./testRegexp '(0|1|2|3|4|5|6|7|8|9){0,10}' '1234567891'

> Testing (0|1|2|3|4|5|6|7|8|9){0,10}:

> 1234567891: Fail

> paphio:~/XML -> ./testRegexp '(0|1|2|3|4|5|6|7|8|9){0,10}' '123456789'

> Testing (0|1|2|3|4|5|6|7|8|9){0,10}:

> 123456789: Ok

 

  Could be worth bugzilla'ing, that's something I should be able to fix!

 

Hi,

   With regard to the above problem, I made some modifications to the code that generates the epsilon transitions. This seems to have solved the problem when a range with a minimum of 0 is specified. However, it now so happens that if the input is longer than the specified range, the regexp exec function returns success instead of failure. The code change is as follows:

In file xmlregexp.c:

 

    /* xmlregexp.c, around line 1522 */
    if (atom->min == 0) {
        xmlFAGenerateEpsilonTransition(ctxt, atom->start, atom->stop);
        newstate = xmlRegNewState(ctxt);
        xmlRegStatePush(ctxt, newstate);
        ctxt->state = newstate;
        xmlFAGenerateEpsilonTransition(ctxt, atom->start, newstate);

        counter = xmlRegGetCounter(ctxt);
        ctxt->counters[counter].min = atom->min - 1;
        ctxt->counters[counter].max = atom->max - 1; /* These three lines were not part of the earlier if condition */
        counter = -1;
    } else {
        counter = xmlRegGetCounter(ctxt);
        ctxt->counters[counter].min = atom->min - 1;
        ctxt->counters[counter].max = atom->max - 1;
    }

 

I have to admit that this is nothing but a shabby workaround, if one can even call it that. Passing the value of counter as -1 to the functions xmlFAGenerateCountedTransition and xmlFAGenerateCountedEpsilonTransition, which immediately follow this bit of code, seems to solve the problem. The basis for the change is a hunch that the problem lies in counted transitions being generated where they are not required. Consider the following cases (I used xmlRegexpPrint to print the output below on the console):

The left-hand side shows the case where the minimum of the range was 0 and a string of the maximum length returned failure (after the code change this now returns success).

 

The input string is 012                           The input String is 122                 

Testing (0|1|2){0,3}:                             Testing (0|1|2){1,3}:                   

 regexp: '(0|1|2){0,3}'                            regexp: '(0|1|2){1,3}'                 

4 atoms:                                          4 atoms:                                

 00  atom: charval once char 0                     00  atom: charval once char 0          

 01  atom: charval once char 1                     01  atom: charval once char 1          

 02  atom: charval once char 2                     02  atom: charval once char 2          

 03  atom: subexpr once start -572662307 end 2     03  atom: subexpr once start 4 end 2   

5 states:                                         4 states:                               

 state: START 0, 8 transitions:                    state: START 0, 4 transitions:         

  trans: removed                                    trans: removed                        

  trans: removed                                    trans: char 0 atom 0, to 2            

  trans: removed                                    trans: char 1 atom 1, to 2            

  trans: removed                                    trans: char 2 atom 2, to 2            

  trans: count based 0, epsilon to 4               state: NULL                            

  trans: counted 0, char 0 atom 0, to 2            state: 2, 5 transitions:                

  trans: counted 0, char 1 atom 1, to 2             trans: count based 0, epsilon to 3    

  trans: counted 0, char 2 atom 2, to 2             trans: removed                        

 state: NULL                                        trans: counted 0, char 0 atom 0, to 2 

 state: 2, 5 transitions:                           trans: counted 0, char 1 atom 1, to 2 

  trans: count based 0, epsilon to 4                trans: counted 0, char 2 atom 2, to 2 

  trans: removed                                   state: FINAL 3, 0 transitions:         

  trans: counted 0, char 0 atom 0, to 2           1 counters:                             

  trans: counted 0, char 1 atom 1, to 2            0: min 0 max 2                         

  trans: counted 0, char 2 atom 2, to 2           122: Ok                                 

 state: NULL                                                                              

 state: FINAL 4, 0 transitions:                                                            

1 counters:                                                                               

 0: min -1 max 2                                                                          

 122:Fail

 

I made the change on the assumption that the part highlighted on the LHS should be similar to the RHS (I might of course be completely wrong). However, for cases like (0|1|2|3){0,3}, the input 1231 now returns success.

 

Regards

Ashwin                                                                                 


