Re: [xml] FW: Regexp Failure
- From: Daniel Veillard <veillard redhat com>
- To: Ashwin <ashwins huawei com>
- Cc: xml gnome org, ranjit huawei com
- Subject: Re: [xml] FW: Regexp Failure
- Date: Wed, 22 Aug 2007 12:41:41 -0400
On Wed, Aug 22, 2007 at 09:59:28PM +0530, Ashwin wrote:
On Fri, Aug 03, 2007 at 08:22:37PM +0530, Ashwin wrote:
>
> Hi,
>
> While testing the xmlFARegExec function if I give the following input
>
> (0|1|2|3|4|5|6|7|8|9) (0,10) (The rule to be used for matching the
> input expression)
>
> Expression to be matched 1234567891 (length 10)
>
> This is returning a failure, if I reduce the expression by 1 it works
> fine. Is this because of incorrect usage?
> Hum, no, looks like a bug, strange ...
> paphio:~/XML -> ./testRegexp '(0|1|2|3|4|5|6|7|8|9){0,10}' '1234567891'
> Testing (0|1|2|3|4|5|6|7|8|9){0,10}:
> 1234567891: Fail
> paphio:~/XML -> ./testRegexp '(0|1|2|3|4|5|6|7|8|9){0,10}' '123456789'
> Testing (0|1|2|3|4|5|6|7|8|9){0,10}:
> 123456789: Ok
Could be worth bugzilla'ing, that's something I should be able to fix!
Hi,
With regard to the above problem, I made some modifications to the
code that generates epsilon transitions, which seems to have solved the
problem when a range with a minimum of 0 is specified. However, it now
happens that if the input is longer than the specified range, the regexp
exec function returns success instead of failure. The code change is as
follows, in file xmlregexp.c:
    if (atom->min == 0) {
        xmlFAGenerateEpsilonTransition(ctxt, atom->start, atom->stop);
        newstate = xmlRegNewState(ctxt);
        xmlRegStatePush(ctxt, newstate);
        ctxt->state = newstate;
        xmlFAGenerateEpsilonTransition(ctxt, atom->start, newstate);
        /* The next three lines were not part of the earlier if condition */
        counter = xmlRegGetCounter(ctxt);
        ctxt->counters[counter].min = atom->min - 1;
        ctxt->counters[counter].max = atom->max - 1;
        counter = -1;
    } else {
        counter = xmlRegGetCounter(ctxt);
        ctxt->counters[counter].min = atom->min - 1;
        ctxt->counters[counter].max = atom->max - 1;
    }
I have to admit that this is nothing but a shabby workaround, if one
can even call it that. Passing the value of counter as -1 to the
functions xmlFAGenerateCountedTransition and
xmlFAGenerateCountedEpsilonTransition immediately following this bit
of code seems to solve the problem. The basis for the above change is
a hunch that the problem lies in counted transitions getting generated
where they are not required. Consider the following cases (I used
xmlRegexpPrint to print the output below on the console):
The first dump (LHS) is the case where the minimum of the range was 0,
and a string of the maximum length was returning failure (after the
code change this is now returning success). The second dump (RHS) is
the {1,3} case.

LHS:

The input string is 012
Testing (0|1|2){0,3}:
regexp: '(0|1|2){0,3}'
4 atoms:
 00 atom: charval once char 0
 01 atom: charval once char 1
 02 atom: charval once char 2
 03 atom: subexpr once start -572662307 end 2
5 states:
 state: START 0, 8 transitions:
  trans: removed
  trans: removed
  trans: removed
  trans: removed
  trans: count based 0, epsilon to 4
  trans: counted 0, char 0 atom 0, to 2
  trans: counted 0, char 1 atom 1, to 2
  trans: counted 0, char 2 atom 2, to 2
 state: NULL
 state: 2, 5 transitions:
  trans: count based 0, epsilon to 4
  trans: removed
  trans: counted 0, char 0 atom 0, to 2
  trans: counted 0, char 1 atom 1, to 2
  trans: counted 0, char 2 atom 2, to 2
 state: NULL
 state: FINAL 4, 0 transitions:
1 counters:
 0: min -1 max 2
122:Fail

RHS:

The input String is 122
Testing (0|1|2){1,3}:
regexp: '(0|1|2){1,3}'
4 atoms:
 00 atom: charval once char 0
 01 atom: charval once char 1
 02 atom: charval once char 2
 03 atom: subexpr once start 4 end 2
4 states:
 state: START 0, 4 transitions:
  trans: removed
  trans: char 0 atom 0, to 2
  trans: char 1 atom 1, to 2
  trans: char 2 atom 2, to 2
 state: NULL
 state: 2, 5 transitions:
  trans: count based 0, epsilon to 3
  trans: removed
  trans: counted 0, char 0 atom 0, to 2
  trans: counted 0, char 1 atom 1, to 2
  trans: counted 0, char 2 atom 2, to 2
 state: FINAL 3, 0 transitions:
1 counters:
 0: min 0 max 2
122: Ok
I made the change on the assumption that the part highlighted on the
LHS should be similar to the RHS (I might of course be completely
wrong). However, now for cases like (0|1|2|3){0,3}, if I give the input
1231 it returns success.
Regards
I'm sorry, but once it went through the various mail agents and
processing, your mail is near unreadable :-\
If you have a change to submit, please add a patch (preferably using
diff -p, as an attachment). This will make sure at least the part related
to functional changes is not scrambled, and it has the good point of being
precise to the point that a machine can then apply it. So please,
1/ drop HTML mail formatting
2/ attach the patch based on diff
BTW using the version in SVN after my fix I see:
paphio:~/XML -> ./testRegexp '(0|1|2|3|4|5|6|7|8|9){0,10}' '123456789'
Testing (0|1|2|3|4|5|6|7|8|9){0,10}:
123456789: Ok
paphio:~/XML -> ./testRegexp '(0|1|2|3|4|5|6|7|8|9){0,10}' '1234567890'
Testing (0|1|2|3|4|5|6|7|8|9){0,10}:
1234567890: Ok
paphio:~/XML -> ./testRegexp '(0|1|2|3|4|5|6|7|8|9){0,10}' '12345678901'
Testing (0|1|2|3|4|5|6|7|8|9){0,10}:
12345678901: Fail
paphio:~/XML ->
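For comparison, the results expected from a counted range like the ones
above can be modelled without any automaton. The following is only a
minimal sketch under stated assumptions: match_counted is a hypothetical
helper written for this illustration, it is not part of libxml2, and it
handles only the special case discussed in this thread, a single-character
alternation repeated {min,max} times.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical reference model, NOT libxml2 code: returns 1 when `input`
 * matches (c1|c2|...){min,max}, where the alternatives are the single
 * characters listed in `alphabet`, and 0 otherwise.
 */
int match_counted(const char *alphabet, size_t min, size_t max,
                  const char *input)
{
    size_t count = 0;

    assert(alphabet != NULL && input != NULL);

    for (; *input != '\0'; input++) {
        /* Each input character must be one of the alternatives. */
        if (strchr(alphabet, *input) == NULL)
            return 0;
        /* One more repetition consumed; exceeding max is a failure. */
        if (++count > max)
            return 0;
    }
    /* At end of input the repetition count must have reached min. */
    return count >= min;
}
```

With the alphabet "0123456789" and the range {0,10}, this model accepts
123456789 and 1234567890 and rejects 12345678901, in line with the
testRegexp output above; with "0123" and {0,3} it rejects 1231.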
So this might be a duplicate of the previous issue,
Daniel
--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard | virtualization library http://libvirt.org/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/