Re: [xml] Perl module XML::LibXML not encoding UTF-8 properly [SOLUTION]
- From: Bruce R Miller <bruce miller nist gov>
- To: Loren Osborn <lsosborn dis-sol-inc com>
- Cc: xml gnome org
- Subject: Re: [xml] Perl module XML::LibXML not encoding UTF-8 properly [SOLUTION]
- Date: Mon, 26 Sep 2005 15:57:39 -0400
Loren Osborn wrote:
> I appreciate your feedback, but unfortunately it didn't give me any
> additional warnings or errors.

First off, thanks for taking my ill-tempered "feedback" with good humor...

> Fortunately I *DID* figure out both the
> cause and a solution. Additionally I'd like to propose a code change to
> catch most instances of this problem in the future.

I wonder if you found a fix or a workaround :>
Both Perl and libxml2 _should_ be dealing with Unicode,
but there are peculiar differences in how Perl 5.6 and 5.8 handle it.
Running under 5.6, your original code produced an
"output error : invalid character value"
message. I seem to recall problems in 5.6 with \x in the 2-digit case,
although codepoints higher than FF work.
In particular: \xED and \x{00ED} fail,
but \N{LATIN SMALL LETTER I WITH ACUTE} works
(with use charnames qw(:full); ).
OTOH, under 5.8, your original encoding as \xED is apparently
read correctly. However, the output simply contains the literal Unicode
character í, rather than the character entity you were expecting.
(Of course, some of these differences may also be due to differences
in the terminal's Unicode support, etc., between the two systems.)
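
For what it's worth, here is a rough sketch of the three spellings
mentioned above, together with one way to force a numeric character
reference for í into the output. It assumes Perl 5.8 or later and
deliberately avoids XML::LibXML, so it says nothing about what the
module itself does. The helper to_char_refs is a made-up name for
illustration, not part of any library.

```perl
use strict;
use warnings;
use charnames qw(:full);

# Three spellings of U+00ED (LATIN SMALL LETTER I WITH ACUTE).
my $two_digit = "\xED";
my $braced    = "\x{00ED}";
my $named     = "\N{LATIN SMALL LETTER I WITH ACUTE}";

# Hypothetical helper: replace every non-ASCII character with a
# hexadecimal numeric character reference such as &#xED;.
sub to_char_refs {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/sprintf('&#x%X;', ord($1))/ge;
    return $text;
}

# Show the escaped form of each spelling inside a sample word.
for my $s ($two_digit, $braced, $named) {
    print to_char_refs("M${s}xico"), "\n";
}
```

The result is pure ASCII, so it no longer depends on the terminal's
Unicode support; whether pre-escaping like this is appropriate before
handing text to an XML library is a separate question, since the
library may escape the ampersand again.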
hope that helps
--
bruce miller nist gov
http://math.nist.gov/~BMiller/