Re: [xml] Perl module XML::LibXML not encoding UTF-8 properly In[SOLUTION]
- From: Bruce Miller <bruce miller nist gov>
- To: xml gnome org
- Subject: Re: [xml] Perl module XML::LibXML not encoding UTF-8 properly In[SOLUTION]
- Date: Tue, 27 Sep 2005 10:37:07 -0400
Loren Osborn wrote:
Yes, I must admit that I only had 5.8 at my disposal, and made reference
to 5.6 only based on what I read online, and I was only concerned with
the string once it already existed in Perl.
Maybe there's something libxml2 should do differently,
but the problem is starting at the Perl layer
(and we shouldn't go too far with Perl on this list...)
Petr's message hits the core point: the string is still
bytes, rather than Unicode chars. The particular problem
here is that the construct \x{...} only forces conversion
to Unicode if the arg is above 0xFF, i.e. 0x100 or greater
(for compatibility w/ old scripts).
To force the conversion for smaller codes, you need to use
either the \N{name} construct or pack('U',$code).
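
To make the difference visible, here is a minimal sketch that builds the
same character several ways and reports Perl's internal utf8 flag for each.
The code point 0xE9 and the variable names are only illustrative choices;
utf8::upgrade is one more way to force the conversion, not something
discussed earlier in this thread.

```perl
use strict;
use warnings;
use charnames ':full';    # required for \N{name} on Perl 5.8

# Illustrative code point in the 0x80--0xFF range (e with acute accent).
my $code = 0xE9;

my $bytes  = "\x{E9}";                               # below 0x100: stays a byte string
my $wide   = "\x{100}";                              # 0x100 or above: becomes a character string
my $named  = "\N{LATIN SMALL LETTER E WITH ACUTE}";  # forced to a character string
my $packed = pack('U', $code);                       # forced to a character string

# utf8::upgrade converts a copy of the byte string in place.
my $upgraded = $bytes;
utf8::upgrade($upgraded);

for my $pair ([bytes => $bytes], [wide => $wide], [named => $named],
              [packed => $packed], [upgraded => $upgraded]) {
    my ($label, $str) = @$pair;
    printf "%-9s utf8 flag %s\n", $label, utf8::is_utf8($str) ? 'on' : 'off';
}

# Perl itself treats the two representations as the same one-character string.
print "bytes eq packed\n" if $bytes eq $packed;
```

Only the first string should report the flag as off, even though Perl
compares it as equal to the others.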
In perldoc perluniintro, it answers a FAQ:
Q: How do I know whether my string is in unicode?
A: You shouldn't care....
For that to apply here, I guess that every point within
XML::LibXML that passes a string through to libxml2
would need to check for the utf8 flag and convert if
necessary.... (Ugh!).
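
Roughly, such a check-and-convert step could look like the sketch below.
The helper name to_utf8_octets is hypothetical and this is not
XML::LibXML's actual code; it only assumes that unflagged strings hold
Latin-1 characters and that libxml2 wants UTF-8 octets.

```perl
use strict;
use warnings;

# Hypothetical helper: return the UTF-8 octets for a Perl string,
# whether or not its internal utf8 flag is set.
sub to_utf8_octets {
    my ($str) = @_;
    my $copy = $str;                                   # leave the caller's string untouched
    utf8::upgrade($copy) unless utf8::is_utf8($copy);  # Latin-1 bytes -> characters
    utf8::encode($copy);                               # characters -> UTF-8 octets
    return $copy;
}

my $from_bytes = to_utf8_octets("\x{E9}");          # unflagged input
my $from_chars = to_utf8_octets(pack('U', 0xE9));   # flagged input

# Both inputs yield the same two octets, 0xC3 0xA9.
printf "%s\n", join ' ', map { sprintf '%02X', ord } split //, $from_bytes;
printf "%s\n", join ' ', map { sprintf '%02X', ord } split //, $from_chars;
```

With a step like this at each boundary, the caller would not need to know
which internal representation a string happens to have.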
In the meantime, we've got to watch for those `ambiguous'
codes in the 0x80--0xFF range....
--
bruce miller nist gov
http://math.nist.gov/~BMiller/