[xslt] bug in str:tokenize



I think I've found a number of bugs in the str:tokenize implementation
in libxslt.  It's hard to tell exactly which ones are bugs, because the
EXSLT spec isn't very specific.

If you call str:tokenize('/foo//bar/', '/'), you'll get:

<token>/foo</token>
<token>/bar</token>
<token></token>

For the first one: Leading instances of the delimiter always seem to
make it into the first token.  I don't think that's right.

For the second one: The delimiter made it into the token because of the
double slash.

For the third one: Trailing instances of the delimiter always produce an
empty token element.

To play around even more, str:tokenize('//foo', '/') produces:

<token>/</token>
<token>foo</token>

And str:tokenize('foo///bar', '/') produces:

<token>foo</token>
<token>/</token>
<token>bar</token>

I'm pretty certain that delimiter characters should never appear in
tokens, and certainly never appear alone as tokens.  And I *think* that
empty tokens should be stripped (which the implementation in fact tries
to do, but it misses the case where the string ends with a delimiter).
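
As a rough sketch of the behavior I'd expect (in Python rather than C,
purely for illustration; the function name tokenize is made up, this is
not the libxslt code, and it assumes each character of the second
argument counts as a delimiter):

```python
def tokenize(string, delimiters):
    """Split string on any character in delimiters, dropping empty tokens.

    Illustrative only: models the expected result, not libxslt's code.
    """
    tokens = []
    current = []
    for ch in string:
        if ch in delimiters:
            # A delimiter ends the current token and is never copied into one.
            # Runs of delimiters leave `current` empty, so nothing is emitted.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    # Flush the last token; a trailing delimiter leaves nothing to flush,
    # so no empty token is produced at the end.
    if current:
        tokens.append("".join(current))
    return tokens


for s in ("/foo//bar/", "//foo", "foo///bar"):
    print(s, tokenize(s, "/"))
```

Under those assumptions, the three calls above would give foo and bar,
just foo, and foo and bar respectively, with no empty or slash-only
tokens.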

If nobody disagrees, I'll try to fix up str:tokenize tonight and send a
patch.  Also, somebody should speak with the EXSLT people to get them to
be more specific.

--
Shaun




