Re: [xml] xmlCheckUTF8-problem (bugfix) [signed]



Hi again,

I tried to trace the problem a bit.

A valid 2-byte UTF-8 character must have the form:

110xxxxx 10xxxxxx (http://de.wikipedia.org/wiki/UTF8)

I would suggest changing this line in xmlstring.c:

     if ((c & 0xc0) != 0x80 || (utf[ix + 1] & 0xc0) != 0x80)

to:

     if ((c & 0xe0) != 0xc0 || (utf[ix + 1] & 0xc0) != 0x80)

This ANDs "c" with 11100000 = 0xe0 to extract its top three bits.
If the result is exactly 11000000 = 0xc0, the byte is guaranteed to start with "110".
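
To make the difference concrete for the "ö" in the test string (bytes 0xC3 0xB6 in UTF-8), here is a minimal sketch that isolates the two conditions. The helper names old_check_rejects and new_check_rejects are made up for illustration and are not libxml2 functions; each returns nonzero when the condition would make the validator bail out.

```c
/* Hypothetical helpers, not part of libxml2: they apply the two
 * conditions discussed above to a single pair of bytes. */

/* Condition as quoted from xmlstring.c: nonzero means "reject". */
static int old_check_rejects(unsigned char c, unsigned char next)
{
    return (c & 0xc0) != 0x80 || (next & 0xc0) != 0x80;
}

/* Suggested condition: lead byte 110xxxxx, continuation byte 10xxxxxx. */
static int new_check_rejects(unsigned char c, unsigned char next)
{
    return (c & 0xe0) != 0xc0 || (next & 0xc0) != 0x80;
}
```

For the pair 0xC3, 0xB6 the old condition rejects, because 0xC3 & 0xc0 is 0xc0 and not 0x80, while the suggested one accepts. Note that the suggested condition only describes 2-byte sequences: a 3-byte lead byte such as 0xE2 (1110xxxx) would still be rejected by it, so longer sequences presumably need their own checks.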

Regards
/Julius


On 27.08.2004, at 15:53, Julius Mittenzwei [c] wrote:


>
Hi,
I just updated to libxml 2.6.12 and ran into problems with the function xmlCheckUTF8(). It returns false even if the string is valid UTF-8 that can easily be converted to ISO Latin-1 with the function UTF8Toisolat1. I'm not quite sure whether this has something to do with: http://bugzilla.gnome.org/show_bug.cgi?id=148115
Any suggestions?

Thank you
/Julius
-----------------------------------------------------
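#include <stdio.h>
#include <stdlib.h>
#include <libxml/parser.h>
#include <libxml/encoding.h>   /* UTF8Toisolat1 */

int main (void) {
  /* the source file must be saved as UTF-8 so that ö is 0xC3 0xB6 */
  const xmlChar* utf = BAD_CAST "Köchin";
  int utflen         = xmlStrlen(utf);
  /* one extra byte for the terminating 0x00 written below */
  unsigned char* lat = (unsigned char*) malloc(utflen + 1);
  /* in: size of the output buffer, out: number of bytes written */
  int latlen         = utflen;

  if (lat == NULL)
    return 1;

  if(xmlCheckUTF8 (utf))
    printf("valid utf8\n");
  else
    printf("no valid utf8\n");

  UTF8Toisolat1(lat,&latlen,(const unsigned char*)utf,&utflen);
  lat[latlen]=0x00;

  printf("%s: %s -> %s\n",LIBXML_DOTTED_VERSION,utf,lat);

  free(lat);
  return 0;
}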
---------------------------------------------------
[chef bruce test]$ ./test
no valid utf8
2.6.12: Köchin -> Köchin
---------------------------------------------------



>


_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml gnome org
http://mail.gnome.org/mailman/listinfo/xml



--
---------------------[ Ciphire Signature ]----------------------
From: julius muenchen-sued de signed email body (1527 characters)
Date: on 27 August 2004 at 16:20:36 GMT
To:   xml gnome org
------------------[ End Ciphire Signed Message ]----------------



