Re: [xml] xmlSetProp reports error - "error : string is not in UTF-8" for a URL !



Hi,
Prashant R wrote:
Hi,

This is using C++/gcc with libxml2 2.7.2.

I am trying to add an attribute to a node, and that raises the error
"error : string is not in UTF-8"

I am using the API
xmlSetProp(currentNode, (const xmlChar *) kAttribName,
           (const xmlChar *) "http://www.w3.org/2000/09/xmldsig#")


Looking at the stack trace, the error originates from
xmlNewPropInternal(..), where xmlCheckUTF8(value) returns 0.

I am baffled as to why xmlCheckUTF8 would fail when passed this
string: "http://www.w3.org/2000/09/xmldsig#"
Basically, inside the for loop the first if statement is taken
(if ((c & 0x80) == 0x00)).

There isn't a check for NUL termination, so the loop runs past the
terminating NUL character at the end of the string, then grabs garbage
and ultimately returns 0.

I am baffled as to why you think there is no check for the
terminating NUL character.



int
xmlCheckUTF8(const unsigned char *utf)
{
    int ix;
    unsigned char c;

    if (utf == NULL)
        return(0);
    /*
     * utf is a string of 1, 2, 3 or 4 bytes.  The valid strings
     * are as follows (in "bit format"):
     *    0xxxxxxx                                      valid 1-byte
     *    110xxxxx 10xxxxxx                             valid 2-byte
     *    1110xxxx 10xxxxxx 10xxxxxx                    valid 3-byte
     *    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx           valid 4-byte
     */
    for (ix = 0;;) {      /* string is 0-terminated */
        c = utf[ix];

No, that line (in the released source for at least 5 years) has been

    for (ix = 0; (c = utf[ix]);) {

Why is yours different?


        if ((c & 0x80) == 0x00) { /* 1-byte code, starts with 0 */
            ix++;
        } else if ((c & 0xe0) == 0xc0) {/* 2-byte code, starts with 110 */
            if ((utf[ix+1] & 0xc0 ) != 0x80)
                return 0;
            ix += 2;
        } else if ((c & 0xf0) == 0xe0) {/* 3-byte code, starts with 1110 */
            if (((utf[ix+1] & 0xc0) != 0x80) ||
                ((utf[ix+2] & 0xc0) != 0x80))
                return 0;
            ix += 3;
        } else if ((c & 0xf8) == 0xf0) {/* 4-byte code, starts with 11110 */
            if (((utf[ix+1] & 0xc0) != 0x80) ||
                ((utf[ix+2] & 0xc0) != 0x80) ||
                ((utf[ix+3] & 0xc0) != 0x80))
                return 0;
            ix += 4;
        } else /* unknown encoding */
            return 0;
    }
    return(1);
}
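
For anyone who wants to try the loop condition outside of libxml2, here is a minimal standalone sketch. It is not the library source: the name check_utf8_sketch is made up for illustration, and it only mirrors the structure shown above, with the byte assignment placed in the loop condition so that the scan stops at the terminating NUL byte. With that condition in place, a plain ASCII URL such as the one in the original report should be accepted.

```c
#include <stddef.h>

/*
 * Hypothetical helper, NOT the libxml2 implementation.
 * Returns 1 if s is a NUL-terminated string made only of well-formed
 * 1- to 4-byte sequences (by lead/continuation bit pattern), else 0.
 */
int check_utf8_sketch(const unsigned char *s)
{
    size_t ix = 0;
    unsigned char c;

    if (s == NULL)
        return 0;

    /* Assigning in the condition ends the loop at the NUL terminator. */
    while ((c = s[ix]) != 0) {
        if ((c & 0x80) == 0x00) {            /* 0xxxxxxx: 1-byte */
            ix += 1;
        } else if ((c & 0xe0) == 0xc0) {     /* 110xxxxx: 2-byte */
            if ((s[ix + 1] & 0xc0) != 0x80)
                return 0;
            ix += 2;
        } else if ((c & 0xf0) == 0xe0) {     /* 1110xxxx: 3-byte */
            /* || short-circuits, so a NUL in the middle of a sequence
             * is rejected before any byte beyond it is read. */
            if ((s[ix + 1] & 0xc0) != 0x80 ||
                (s[ix + 2] & 0xc0) != 0x80)
                return 0;
            ix += 3;
        } else if ((c & 0xf8) == 0xf0) {     /* 11110xxx: 4-byte */
            if ((s[ix + 1] & 0xc0) != 0x80 ||
                (s[ix + 2] & 0xc0) != 0x80 ||
                (s[ix + 3] & 0xc0) != 0x80)
                return 0;
            ix += 4;
        } else {                             /* stray continuation or invalid lead byte */
            return 0;
        }
    }
    return 1;
}
```

Without the check in the loop condition, as in the `for (ix = 0;;)` variant quoted earlier, nothing stops the scan at the end of the string, which would match the symptom described in the report.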

Am I missing something very fundamental here?

Thanks


Bill




