RE: [xml] Perl module XML::LibXML not encoding UTF-8 properly [SOLUTION]

From: "Loren Osborn" <lsosborn dis-sol-inc com>
To: <veillard redhat com>
Cc: xml gnome org
Subject: RE: [xml] Perl module XML::LibXML not encoding UTF-8 properly [SOLUTION]
Date: Mon, 26 Sep 2005 14:18:28 -0700

Daniel Veillard wrote:

UTF-8 makes certain assertions about how multi-byte characters are
represented. While this code change doesn't check all of those
assertions, it does ensure that all the non-first bytes have their
high bits set correctly. This is likely to catch similar errors, at
least regarding Latin characters. If you are feeling ambitious, feel
free to also check the assertion that code points are encoded in the
fewest number of bytes possible. This patch is untested, and I would
prefer that a developer more familiar with the libxml2 library give it
a more thorough once-over.


  The problem is that you add this check in only one API. I am not sure
it makes sense to do this at one entry point and not at all the others.
I am also not sure it makes sense to add the checking to all tree APIs,
as this could be extremely costly at runtime.


Yes, I was expecting such a reaction, but I felt justified putting the
check where I did because there was already a correctness check there; I
simply refined it a bit. Whether this type of correctness check should
be enforced on all entry points is certainly an efficiency concern that
libxml2's architects should consider. I simply wanted to submit a code
sample to demonstrate how this could be done.

Thanks,

-Loren

Follow-Ups:
- Re: [xml] Perl module XML::LibXML not encoding UTF-8 properly [SOLUTION]
  - From: Daniel Veillard
