Re: [xml] Perl module XML::LibXML not encoding UTF-8 properly [SOLUTION]

On Mon, Sep 26, 2005 at 02:18:28PM -0700, Loren Osborn wrote:
Daniel Veillard wrote:
UTF-8 makes certain assertions about how multi-byte characters are
represented.  While this code change doesn't check all of those
assertions, it does ensure that all the non-first bytes have their
high bits set correctly.  This is likely to catch similar errors, at
least regarding Latin characters.  If you are feeling ambitious, feel
free to check for the assertion that code points are encoded in the
fewest number of bytes possible.  This patch is untested, but I ask
that a developer more familiar with the libxml2 library give it a
thorough once-over.
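
For readers following the thread, the two checks described above (continuation bytes must carry the bit pattern 10xxxxxx, and each code point must use the fewest bytes possible) can be sketched roughly as follows. This is not the patch under discussion and not libxml2 code: the function name utf8_is_well_formed is hypothetical, and the surrogate and upper-range checks go beyond what the message mentions.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of libxml2.
 * Returns 1 if buf[0..len) is well-formed UTF-8, 0 otherwise.
 */
int utf8_is_well_formed(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char lead = buf[i];
        unsigned long cp;      /* code point decoded so far */
        unsigned long min_cp;  /* smallest value that needs this length */
        size_t extra;          /* continuation bytes expected */
        size_t k;

        if (lead < 0x80) {                    /* 0xxxxxxx: plain ASCII */
            i++;
            continue;
        } else if ((lead & 0xE0) == 0xC0) {   /* 110xxxxx: 2-byte form */
            extra = 1; cp = lead & 0x1F; min_cp = 0x80;
        } else if ((lead & 0xF0) == 0xE0) {   /* 1110xxxx: 3-byte form */
            extra = 2; cp = lead & 0x0F; min_cp = 0x800;
        } else if ((lead & 0xF8) == 0xF0) {   /* 11110xxx: 4-byte form */
            extra = 3; cp = lead & 0x07; min_cp = 0x10000;
        } else {
            return 0;   /* stray continuation byte or invalid lead byte */
        }

        /* The sequence must not run past the end of the buffer. */
        if (len - i - 1 < extra)
            return 0;

        /* Every non-first byte must look like 10xxxxxx. */
        for (k = 1; k <= extra; k++) {
            unsigned char c = buf[i + k];
            if ((c & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (unsigned long)(c & 0x3F);
        }

        /* Shortest-form rule: reject overlong encodings. */
        if (cp < min_cp)
            return 0;

        /* Extra checks beyond the message: surrogates and range limit. */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;
        if (cp > 0x10FFFF)
            return 0;

        i += extra + 1;
    }
    return 1;
}
```

A check like this walks every byte of the string, so calling it on every entry point would add a linear pass per call.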

  The problem is that you add this check in only one API. I am not sure
it makes sense to do this on one entry point and not all the others.
I am also not sure it makes sense to add the checking to all tree APIs;
this could be extremely costly at runtime.

Yes, I was expecting such a reaction, but I felt justified putting the
check where I did because there was already a correctness check there. I
simply refined it a bit.  Whether this type of correctness check should
be enforced on all entry points is certainly an efficiency concern
that should be considered by libxml2's architects, but I simply wanted
to submit a code sample to demonstrate how this could be done.

  Yes, I appreciate that. There is something half-baked in that function,
so it makes sense to fix it, but on the other hand it's asymmetric :-)
I'm still uncertain about how best to do this.


Daniel Veillard      | Red Hat Desktop team
veillard redhat com  | libxml GNOME XML XSLT toolkit | Rpmfind RPM search engine
