Re: [xml] xmlUTF8Strpos()'s use of xmlUTF8Strlen()

Bill Moseley said:
On Mon, Dec 20, 2004 at 02:37:16AM +0800, William M. Brack wrote:
Bill Moseley said:
I'm wondering why xmlUTF8Strpos() needs to call xmlUTF8Strlen().

    const xmlChar *
    xmlUTF8Strpos(const xmlChar *utf, int pos) {
        xmlChar ch;

        if (utf == NULL) return(NULL);
        if ( (pos < 0) || (pos >= xmlUTF8Strlen(utf)) )
            return(NULL);
        while (pos--) {
            if ((ch=*utf++) == 0) return(NULL);
            [...]

xmlUTF8Strpos() already checks for a \0 byte, so it doesn't seem to do
any good to go over the string twice: once in xmlUTF8Strlen() and
again in xmlUTF8Strpos().

The problem which this is meant to guard against is a caller passing in a
large value for pos which potentially causes a crash (analogous to the check
for a NULL pointer).  Since the 'while' loop is starting from the end of the
string and working backward, the check within the loop for termination

Hi Bill,

We had a related discussion back in 2001[1]

It's not starting from the end of the string.  xmlUTF8Strpos() only
calls xmlUTF8Strlen() to see if pos (the number of chars to skip) is
more than the utf-8 length of the string.

xmlUTF8Strlen() walks the utf-8 string counting characters
until a null is found and returns the count or -1 if an invalid utf-8
char is found.
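
As a rough illustration of the counting walk just described, here is a
minimal sketch. It is not the libxml2 source: the name utf8_strlen_sketch
and the use of plain unsigned char instead of xmlChar are assumptions made
so the example stands alone, and it only checks the lead-byte and
continuation-byte bit patterns.

```c
#include <stddef.h>

/* Hypothetical stand-in for xmlUTF8Strlen(); not the libxml2 source.
 * Returns the number of UTF-8 characters, or -1 on a malformed sequence. */
static int
utf8_strlen_sketch(const unsigned char *utf) {
    int count = 0;

    if (utf == NULL)
        return -1;
    while (*utf != 0) {
        unsigned char ch = *utf++;
        if (ch & 0x80) {
            /* a multi-byte character must start with a 11xxxxxx lead byte */
            if ((ch & 0xc0) != 0xc0)
                return -1;
            /* each further high bit in the lead byte announces one
             * continuation byte of the form 10xxxxxx; a \0 here fails
             * the test, so the walk never runs past the terminator */
            while ((ch <<= 1) & 0x80) {
                if ((*utf++ & 0xc0) != 0x80)
                    return -1;
            }
        }
        count++;
    }
    return count;
}
```

The whole string is visited once, which is the cost that gets paid a
second time when the result is only used as a bounds check.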

xmlUTF8Strpos() also walks the utf-8 string.  It returns the ending
position after stepping "pos" chars, or NULL if it hits \0 before "pos"
chars have been found or if an invalid utf-8 sequence is found.  So it's
making the same checks as xmlUTF8Strlen() does.

At least that's my reading of the code.

So, I don't think calling xmlUTF8Strlen() does anything that isn't
done in xmlUTF8Strpos().
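
To make the argument concrete, here is a minimal single-pass sketch of the
position walk with no separate length check. It is not the libxml2 source:
the name utf8_strpos_sketch and the unsigned char type are assumptions for
a standalone example, and the validation is limited to lead-byte and
continuation-byte bit patterns.

```c
#include <stddef.h>

/* Hypothetical stand-in for xmlUTF8Strpos(); not the libxml2 source.
 * Returns a pointer just past the first `pos` characters, or NULL if the
 * string ends first or a malformed sequence is met. */
static const unsigned char *
utf8_strpos_sketch(const unsigned char *utf, int pos) {
    unsigned char ch;

    if (utf == NULL || pos < 0)
        return NULL;
    while (pos--) {
        ch = *utf++;
        if (ch == 0)
            return NULL;        /* hit \0 before pos characters were seen */
        if (ch & 0x80) {
            if ((ch & 0xc0) != 0xc0)
                return NULL;    /* continuation byte where a lead byte belongs */
            /* skip the continuation bytes announced by the lead byte;
             * a \0 fails the 10xxxxxx test, so an oversized pos stops here */
            while ((ch <<= 1) & 0x80) {
                if ((*utf++ & 0xc0) != 0x80)
                    return NULL;
            }
        }
    }
    return utf;
}
```

One difference is worth noting: when pos equals the number of characters,
this sketch returns a pointer to the terminating \0, whereas a version
guarded by pos >= length returns NULL for that case.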

I *love* it when other people understand my code better than I do :-).  Yes,
you are completely correct, and I have no valid excuse for my previous post
(maybe it's the onset of senility?).  I have modified the CVS code - thanks
for pointing this out!

Makes me wonder if it would be useful to have a different type for
null-terminated strings.  xmlSAX2Characters() is handed a *non*
null-terminated string of type xmlChar *, and xmlUTF8Strlen(), which
requires a null-terminated string, takes that same type.  If they were
different types then the compiler would have caught this.

This might be a nice feature, but I don't see how we could do it without
breaking source compatibility with the existing API.
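
For what it's worth, the distinct-types idea could look roughly like this
in new code. Everything here is hypothetical and not part of libxml2: the
zstr and span types and their helper functions are invented for the
illustration, wrapping the pointer in a struct so the two kinds of string
cannot be mixed up silently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper types; neither exists in libxml2. */
typedef struct { const unsigned char *z; } zstr;           /* null-terminated */
typedef struct { const unsigned char *p; int len; } span;  /* counted, may lack \0 */

/* Only accepts data that is known to be null-terminated. */
static size_t
zstr_bytes(zstr s) {
    return strlen((const char *)s.z);
}

/* A counted buffer carries its own length, so no terminator is needed. */
static size_t
span_bytes(span s) {
    return (size_t)s.len;
}

/* Going from terminated to counted is always safe; the reverse
 * direction would need a copy that appends the terminator. */
static span
span_from_zstr(zstr s) {
    span out;
    out.p = s.z;
    out.len = (int)strlen((const char *)s.z);
    return out;
}
```

With types like these, passing a span to zstr_bytes() is a compile error
rather than a read past the end of the buffer.  The existing API takes
bare xmlChar pointers in both roles, which is why retrofitting this would
break source compatibility.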

Thanks for the help,


Bill Moseley
moseley hank org
