Re: [xml] xmlCheckUTF8-problem (bugfix) [signed]

Hi again,

I tried to trace the problem a bit.

A valid 2-byte UTF-8 character must look like this:

110xxxxx 10xxxxxx

I would suggest changing this line:

     if ((c & 0xc0) != 0x80 || (utf[ix + 1] & 0xc0) != 0x80)

to:

     if ((c & 0xe0) != 0xc0 || (utf[ix + 1] & 0xc0) != 0x80)

It ANDs "c" with 11100000 = 0xe0 to isolate the first 3 bits.
If the result is exactly 11000000 = 0xc0, you can be sure that the byte starts with "110".
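
To illustrate the two masks in isolation, here is a minimal sketch of the check. The function name is_two_byte_utf8 is made up for this mail and is not part of libxml2. It only tests the bit patterns described above, so it does not reject overlong encodings (lead bytes 0xc0 and 0xc1).

```c
/* Hypothetical helper, not a libxml2 function: returns 1 if p[0] and p[1]
 * have the bit layout of a 2-byte UTF-8 sequence (110xxxxx 10xxxxxx). */
int is_two_byte_utf8(const unsigned char *p)
{
    /* Lead byte: mask with 11100000, the result must be exactly 11000000. */
    if ((p[0] & 0xe0) != 0xc0)
        return 0;
    /* Continuation byte: mask with 11000000, the result must be 10000000. */
    if ((p[1] & 0xc0) != 0x80)
        return 0;
    return 1;
}
```

For the "ö" in "Köchin", which is encoded as the bytes 0xc3 0xb6, both tests pass. A plain ASCII "K" (0x4b) fails the first test, and the lead byte of a 3-byte sequence such as 0xe2 fails it as well.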


On 27.08.2004, at 15:53, Julius Mittenzwei [c] wrote:

I just updated to libxml 2.6.12 and ran into problems with the function xmlCheckUTF8(). This function returns false even if the string is a valid UTF-8 string that can easily be converted to ISO Latin-1 with the function UTF8Toisolat1. I'm not quite sure whether this has something to do with:
Any suggestions?

Thank you
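#include <stdio.h>
#include <stdlib.h>
#include <libxml/parser.h>
#include <libxml/encoding.h>

int main (int i,char** s) {
  const xmlChar* utf = (const xmlChar*) "Köchin";
  int utflen         = xmlStrlen(utf);
  /* Latin-1 output is never longer than the UTF-8 input; +1 for the NUL */
  unsigned char* lat = (unsigned char*) malloc(utflen + 1);
  int latlen         = utflen;  /* in: size of lat, out: bytes written */

  if(lat == NULL)
    return 1;

  if(xmlCheckUTF8 (utf))
    printf("valid utf8\n");
  else
    printf("no valid utf8\n");

  if(UTF8Toisolat1(lat,&latlen,utf,&utflen) < 0)
    latlen = 0;                 /* conversion failed, print an empty string */
  lat[latlen] = '\0';

  printf("%s: %s -> %s\n",LIBXML_DOTTED_VERSION,utf,lat);

  free(lat);
  return 0;
}
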
[chef bruce test]$ ./test
no valid utf8
2.6.12: Köchin -> Köchin


