Re: [xml] Perl module XML::LibXML not encoding UTF-8 properly In[SOLUTION]



Loren Osborn wrote:
Yes, I must admit that I only had 5.8 at my disposal, and made reference to 5.6 only based on what I read online, and I was only concerned with the string once it already existed in Perl.

Maybe there's something libxml2 should do differently,
but the problem is starting at the Perl layer
(and we shouldn't go too far with Perl on this list...)

Petr's message hits the core point: the string is still
bytes rather than Unicode characters.  The particular problem
here is that the construct \x{...} only forces conversion
to Unicode if the argument is > 0xFF (for compatibility with old scripts).
To force the conversion, you need to use either
the \N{name} construct or pack('U',$code).
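
A minimal sketch of that difference, for anyone who wants to see it
directly.  It assumes Perl 5.8.1 or later, where utf8::is_utf8() is
available; the helper name is_chars is made up for this illustration.

```perl
use strict;
use warnings;
use charnames ':full';    # enables \N{name}; implicit on newer perls

# Hypothetical helper: report whether Perl has set the utf8 flag,
# i.e. whether it treats the string as characters rather than bytes.
sub is_chars { return utf8::is_utf8($_[0]) ? 1 : 0 }

my $code = 0xE9;          # e-acute, inside the 0x80--0xFF range

my $escaped = "\x{e9}";   # code <= 0xFF: stays a byte string
my $wide    = "\x{263a}"; # code >  0xFF: becomes a character string
my $packed  = pack('U', $code);
my $named   = "\N{LATIN SMALL LETTER E WITH ACUTE}";

printf "%-8s utf8 flag: %d\n", $_->[0], is_chars($_->[1])
    for [escaped => $escaped], [wide => $wide],
        [packed  => $packed],  [named => $named];
```

The first line of output is the troublesome case: the same code point,
written as \x{e9}, is left unflagged.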

perldoc perluniintro answers a FAQ:
 Q: How do I know whether my string is in unicode?
 A: You shouldn't care....

For that to apply here, I guess that every point within
XML::LibXML that passes a string through to libxml2,
would need to check for the utf8 flag and convert if
necessary.... (Ugh!).
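
At the Perl level, such a check might look roughly like the sketch
below.  This only illustrates the idea and is not what XML::LibXML
actually does; ensure_chars is a hypothetical helper, and it assumes
that unflagged bytes in 0x80--0xFF are meant as Latin-1 characters.

```perl
use strict;
use warnings;
use bytes ();   # load bytes::length without enabling byte semantics

# Hypothetical helper: return a copy guaranteed to carry the utf8 flag.
# Unflagged bytes in 0x80--0xFF are read as Latin-1 characters.
sub ensure_chars {
    my ($str) = @_;
    utf8::upgrade($str) unless utf8::is_utf8($str);
    return $str;
}

my $raw   = "caf\x{e9}";          # byte string, 4 bytes
my $fixed = ensure_chars($raw);   # same 4 characters, now flagged

printf "flag before: %d, after: %d\n",
    (utf8::is_utf8($raw) ? 1 : 0), (utf8::is_utf8($fixed) ? 1 : 0);
printf "chars: %d, internal bytes: %d\n",
    length($fixed), bytes::length($fixed);
```

After the upgrade the internal buffer is UTF-8, which is the form
libxml2 expects to receive.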

In the meantime, we've got to watch for those `ambiguous'
codes in the 0x80--0xFF range....


--
bruce miller nist gov
http://math.nist.gov/~BMiller/


