Re: [xml] Perl module XML::LibXML not encoding UTF-8 properly [SOLUTION]

Loren Osborn wrote:
> I appreciate your feedback, but unfortunately it didn't give me any
> additional warnings or errors.

First off, thanks for taking my ill-tempered "feedback" with good humor...

> Fortunately I *DID* figure out both the cause and a solution.
> Additionally, I'd like to propose a code change to catch most
> instances of this problem in the future.

I wonder if you found a fix or a workaround :>
Both Perl and libxml2 _should_ be dealing with Unicode,
but there are peculiar differences in how 5.6 and 5.8 handle it.

Running under 5.6, your original code produced an
"output error : invalid character value"
message.  I seem to recall problems in 5.6 with \x in the two-digit case,
although code points higher than FF work.
In particular, \xED and \x{00ED} fail, but
\N{LATIN SMALL LETTER I WITH ACUTE} works
(with use charnames qw(:full); ).
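
For anyone who wants to compare the three spellings on their own perl, here is a minimal sketch. It uses only core Perl and assumes 5.8 or later (utf8::is_utf8 is not available in 5.6); the variable names are just illustrative, and it is not the original poster's code. It prints the length, code point, and internal UTF-8 flag of each string, which is where differences between perl versions tend to show up.

```perl
use strict;
use warnings;
use charnames qw(:full);

# Three ways of writing U+00ED (i with acute), as discussed above.
my $two_digit = "\xED";                                 # two-digit hex escape
my $braced    = "\x{00ED}";                             # braced hex escape
my $named     = "\N{LATIN SMALL LETTER I WITH ACUTE}";  # needs charnames

for my $pair ( [ '\xED'     => $two_digit ],
               [ '\x{00ED}' => $braced ],
               [ '\N{...}'  => $named ] ) {
    my ($label, $str) = @$pair;

    # Only ASCII is printed, so the result does not depend on the terminal.
    printf "%-10s length=%d ord=U+%04X utf8_flag=%s\n",
        $label, length($str), ord($str),
        utf8::is_utf8($str) ? 'on' : 'off';
}
```

If the three lines report different code points or lengths, the string was already wrong before it ever reached libxml2.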

OTOH, under 5.8, your original encoding as \xED is apparently
read correctly.  However, the output simply contains the Unicode
character itself, rather than the character entity for í that you
were expecting.

(Of course, some of these differences may also be due to differences
in the terminals' Unicode support, etc., between the two systems.)

hope that helps
bruce miller nist gov
