Re: [xml] Perl module XML::LibXML not encoding UTF-8 properly [SOLUTION]



Loren Osborn wrote:
> I appreciate your feedback, but unfortunately it didn't give me any
> additional warnings or errors.

First off, thanks for taking my ill-tempered "feedback" with good humor...

> Fortunately I *DID* figure out both the
> cause and a solution.  Additionally I'd like to propose a code change to
> catch most instances of this problem in the future.

I wonder if you found a fix or a workaround :>
Both Perl and libxml2 _should_ be dealing with Unicode,
but there are peculiar differences in how Perl 5.6 and 5.8 handle it.

Running under 5.6, your original code produced an
"output error : invalid character value"
message.  I seem to recall problems in 5.6 with \x in the 2-digit case,
although code points higher than FF work.
In particular, \xED and \x{00ED} fail, but \N{LATIN SMALL LETTER I WITH ACUTE} works
(with "use charnames qw(:full);").
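To compare the three spellings on your own Perl, a minimal sketch along these lines may help. It uses only core modules, and describe_char is a made-up helper name, not part of any library. On a modern Perl I would expect all three to yield the same code point and the same UTF-8 bytes, differing at most in the internal UTF-8 flag; it does not reproduce the 5.6 failure described above.

```perl
use strict;
use warnings;
use charnames qw(:full);   # needed for \N{...} on Perls older than 5.16
use Encode qw(encode);

# The three spellings of U+00ED discussed in this message.
my %spelling = (
    'two-digit \x' => "\xED",
    'braced \x{}'  => "\x{00ED}",
    'named \N{}'   => "\N{LATIN SMALL LETTER I WITH ACUTE}",
);

# Hypothetical helper: report the first code point of a string, its
# internal UTF-8 flag, and the bytes it encodes to as UTF-8.
sub describe_char {
    my ($str) = @_;
    my $bytes = encode('UTF-8', $str);
    return sprintf 'U+%04X utf8_flag=%d bytes=%s',
        ord($str),
        (utf8::is_utf8($str) ? 1 : 0),
        join(' ', map { sprintf '%02X', ord } split //, $bytes);
}

for my $name (sort keys %spelling) {
    printf "%-14s %s\n", $name, describe_char($spelling{$name});
}
```

If the flag differs between spellings on your system, that is a hint about which strings Perl is storing internally as UTF-8 before they ever reach libxml2.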

OTOH, under 5.8, your original encoding as \xED is apparently
read correctly.  However, the output simply contains the literal Unicode
character í, rather than the character entity you were expecting.

(Of course, some of these differences may also be due to differences
in the terminals' Unicode support, etc., between the two systems.)
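If what you want is ASCII-only output with numeric character references in place of the literal character, one possible approach, sketched here with the core Encode module rather than libxml2's own serializer, is to encode to US-ASCII with the FB_XMLCREF fallback. The helper name to_ascii_with_charrefs is made up for illustration, and I am assuming a hexadecimal reference is acceptable in place of whatever entity form you expected.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: make a string ASCII-safe by replacing every
# non-ASCII character with an XML numeric character reference.
# It does NOT escape markup characters such as & or <.
sub to_ascii_with_charrefs {
    my ($text) = @_;
    my $copy = $text;   # work on a copy, since encode() with a CHECK
                        # argument is allowed to modify its input
    return encode('US-ASCII', $copy, Encode::FB_XMLCREF());
}

print to_ascii_with_charrefs("d\x{ED}a"), "\n";
```

This only post-processes a string; it says nothing about why the serializer chose a literal character in the first place, which depends on the output encoding in effect for the document.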

hope that helps
--
bruce miller nist gov
http://math.nist.gov/~BMiller/


