Re: [xml] xmlUTF8Strpos()'s use of xmlUTF8Strlen()

Bill Moseley said:
I'm wondering why xmlUTF8Strpos() needs to call xmlUTF8Strlen().

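    const xmlChar *
    xmlUTF8Strpos(const xmlChar *utf, int pos) {
        xmlChar ch;

        if (utf == NULL) return(NULL);
        if ( (pos < 0) || (pos >= xmlUTF8Strlen(utf)) )
            return(NULL);
        while (pos--) {
            if ((ch=*utf++) == 0) return(NULL);
            /* ... skip the continuation bytes of a multi-byte char ... */
        }
        return((xmlChar *)utf);
    }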

xmlUTF8Strpos() already checks for a \0 byte inside its loop, so it doesn't
seem to do any good to walk the string twice: once in xmlUTF8Strlen() and
then again in xmlUTF8Strpos().
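One way to get both a single pass and no reads past the end of a buffer that
isn't \0-terminated would be a variant that also takes the buffer length. The
sketch below is hypothetical: `utf8_strpos_n()` and its `len` parameter are
not part of libxml2. It keeps the shape of the loop quoted above, but checks
every byte read against `len`, so a large `pos` simply yields NULL.

```c
#include <stddef.h>

/*
 * Hypothetical length-bounded variant of xmlUTF8Strpos() (not a libxml2
 * API).  Returns a pointer to the character at index pos in the first len
 * bytes of utf, or NULL if pos is out of range, the bytes are not
 * well-formed UTF-8, or a 0 byte is reached first.  Every read is checked
 * against len, so utf need not be \0-terminated, and the data is scanned
 * only once.
 */
const unsigned char *
utf8_strpos_n(const unsigned char *utf, int len, int pos)
{
    const unsigned char *end;

    if (utf == NULL || len < 0 || pos < 0)
        return NULL;
    end = utf + len;

    while (pos-- > 0) {
        unsigned char ch;

        if (utf >= end)
            return NULL;            /* pos lies beyond the buffer */
        ch = *utf++;
        if (ch == 0)
            return NULL;            /* hit a terminator before pos */
        if (ch & 0x80) {
            /* a multi-byte lead byte must look like 11xxxxxx */
            if ((ch & 0xc0) != 0xc0)
                return NULL;
            /* each further leading 1 bit means one continuation byte */
            while ((ch <<= 1) & 0x80) {
                if (utf >= end || (*utf & 0xc0) != 0x80)
                    return NULL;
                utf++;
            }
        }
    }
    /* like xmlUTF8Strpos(), reject a position equal to the length */
    if (utf >= end || *utf == 0)
        return NULL;
    return utf;
}
```

A SAX characters() handler, which receives a pointer and a byte count rather
than a terminated string, could pass that count straight through as `len`.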

The problem which this is meant to guard against is a caller passing in a
large value for pos which potentially causes a crash (analogous to the check
for a NULL pointer).  Since the 'while' loop is starting from the end of the
string and working backward, the check within the loop for termination doesn't

And the problem I'm having is that I'm *not* passing in a \0-terminated UTF-8
string, so xmlUTF8Strlen() can end up running off the end of the
buffer looking for the end of the string.

The reason I'm not passing in a \0-terminated string is that I'm using
UTF8Toisolat1() to convert the UTF-8 string passed to my SAX character
handler (which is not null-terminated), and then calling
xmlUTF8Strpos() to skip over any characters that could not be converted to
ISO Latin-1.
Bill Moseley
moseley hank org

I understand your point, but I'm reluctant to change the routine (for the
reason given above).  I also understand that xmlUTF8Strlen() doesn't completely
guard against the problem (but I think it helps).  Further input or discussion
from the list would be welcome.

