Re: [xml] xmlSetProp reports error - "error : string is not in UTF-8" for a URL !
- From: "William M. Brack" <wbrack mmm com hk>
- To: "Prashant R" <ramapra gmail com>
- Cc: xml gnome org
- Subject: Re: [xml] xmlSetProp reports error - "error : string is not in UTF-8" for a URL !
- Date: Wed, 11 Mar 2009 23:01:17 -0700 (PDT)
Hi,
Prashant R wrote:
Hi,
This is using C++/gcc with libxml2 2.7.2.
I am trying to add an attribute to a node, and it raises the error
"error : string is not in UTF-8"
The call I am using is:
xmlSetProp(currentNode, (const xmlChar *) kAttribName, (const xmlChar *)"http://www.w3.org/2000/09/xmldsig#");
Looking at the stack trace, the error originates in xmlNewPropInternal(..), where xmlCheckUTF8(value) returns 0.
I am baffled as to why xmlCheckUTF8 would fail when passed this string: "http://www.w3.org/2000/09/xmldsig#"
Basically, inside the for loop the first if statement, if ((c & 0x80) == 0x00), is the one taken. There isn't a check for NULL termination, so the loop runs past the NULL character at the end of the string, reads garbage, and ultimately returns 0.
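Every byte of the URL in question is plain ASCII, and ASCII is a subset of UTF-8, so each byte falls into the 1-byte branch (c & 0x80) == 0x00 and any correct checker must accept the string. The following is a minimal standalone sketch that confirms the high bit is clear in every byte before the terminator; all_ascii is a hypothetical helper written for illustration, not a libxml2 function.

```c
#include <stddef.h>

/* Hypothetical helper (not part of libxml2).
 * Returns 1 if every byte before the terminating NUL has its high bit
 * clear, i.e. the string is pure ASCII and therefore valid UTF-8.
 * Returns 0 for NULL, mirroring xmlCheckUTF8's NULL handling. */
int all_ascii(const unsigned char *s)
{
    if (s == NULL)
        return 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        /* A set high bit means a lead or continuation byte of a
         * multi-byte UTF-8 sequence, so the string is not pure ASCII. */
        if ((s[i] & 0x80) != 0x00)
            return 0;
    }
    return 1;
}
```

For the URL above, all_ascii returns 1, so in a correctly built xmlCheckUTF8 the loop should only ever execute ix++ until it reaches the terminator.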
I am baffled as to why you think there is no check for a NULL
character termination.
int
xmlCheckUTF8(const unsigned char *utf)
{
    int ix;
    unsigned char c;

    if (utf == NULL)
        return(0);
    /*
     * utf is a string of 1, 2, 3 or 4 bytes. The valid strings
     * are as follows (in "bit format"):
     *    0xxxxxxx                             valid 1-byte
     *    110xxxxx 10xxxxxx                    valid 2-byte
     *    1110xxxx 10xxxxxx 10xxxxxx           valid 3-byte
     *    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx  valid 4-byte
     */
    for (ix = 0;;) {    /* string is 0-terminated */
        c = utf[ix];
No. In the released source, that line has been (for at least 5 years):
    for (ix = 0; (c = utf[ix]);) {
Why is yours different?
        if ((c & 0x80) == 0x00) {        /* 1-byte code, starts with 0 */
            ix++;
        } else if ((c & 0xe0) == 0xc0) { /* 2-byte code, starts with 110 */
            if ((utf[ix+1] & 0xc0) != 0x80)
                return 0;
            ix += 2;
        } else if ((c & 0xf0) == 0xe0) { /* 3-byte code, starts with 1110 */
            if (((utf[ix+1] & 0xc0) != 0x80) ||
                ((utf[ix+2] & 0xc0) != 0x80))
                return 0;
            ix += 3;
        } else if ((c & 0xf8) == 0xf0) { /* 4-byte code, starts with 11110 */
            if (((utf[ix+1] & 0xc0) != 0x80) ||
                ((utf[ix+2] & 0xc0) != 0x80) ||
                ((utf[ix+3] & 0xc0) != 0x80))
                return 0;
            ix += 4;
        } else                           /* unknown encoding */
            return 0;
    }
    return(1);
}
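For comparison, here is a standalone sketch of the function using the loop header from the released source, where the condition (c = utf[ix]) ends the loop at the terminating NUL. It is reconstructed from the code above rather than copied from libxml2, and it is renamed check_utf8_sketch (a hypothetical name) so it cannot clash with the library's own xmlCheckUTF8. A NUL byte reached in the middle of a multi-byte sequence also fails the continuation test (byte & 0xc0) == 0x80, and the || short-circuits at that point, so the function never reads past the terminator.

```c
#include <stddef.h>

/* Hypothetical reconstruction (not libxml2 source verbatim) with the
 * released loop header: the loop stops at the terminating NUL.
 * Like the quoted code, it checks byte patterns only; it does not
 * reject overlong forms (e.g. C0 80) or encoded surrogates. */
int check_utf8_sketch(const unsigned char *utf)
{
    size_t ix;
    unsigned char c;

    if (utf == NULL)
        return 0;
    for (ix = 0; (c = utf[ix]) != 0;) {      /* exits at the NUL terminator */
        if ((c & 0x80) == 0x00) {            /* 1-byte code: 0xxxxxxx */
            ix++;
        } else if ((c & 0xe0) == 0xc0) {     /* 2-byte code: 110xxxxx */
            if ((utf[ix + 1] & 0xc0) != 0x80)
                return 0;
            ix += 2;
        } else if ((c & 0xf0) == 0xe0) {     /* 3-byte code: 1110xxxx */
            if ((utf[ix + 1] & 0xc0) != 0x80 ||
                (utf[ix + 2] & 0xc0) != 0x80)
                return 0;
            ix += 3;
        } else if ((c & 0xf8) == 0xf0) {     /* 4-byte code: 11110xxx */
            if ((utf[ix + 1] & 0xc0) != 0x80 ||
                (utf[ix + 2] & 0xc0) != 0x80 ||
                (utf[ix + 3] & 0xc0) != 0x80)
                return 0;
            ix += 4;
        } else {                             /* stray continuation or invalid lead byte */
            return 0;
        }
    }
    return 1;
}
```

With this loop header the URL is accepted, while a sequence truncated by the terminator (for example "abc" followed by a lone C3 byte) is rejected without reading beyond the NUL.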
Am I missing something very fundamental here?
Thanks
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
xml gnome org
http://mail.gnome.org/mailman/listinfo/xml
Bill