Bruce R Miller wrote:
>
> > I appreciate your feedback, but unfortunately it didn't give me any
> > additional warnings or errors.
>
> First off, thanks for taking my ill-tempered "feedback" with good humor...

I always try to have my "grain of salt" handy, but did want to acknowledge
your taking the time to reply. It is so often easier to ignore than reply,
and I was grateful to get *A* response, even if it wasn't helpful.

> Running under 5.6, your original code produced a
> "output error : invalid character value"
> message.
> I seem to recall problems in 5.6 with \x for the 2 digit case,
> although codepoints higher than FF work.
> In particular: \xED and \x{00ED} fail,
> but \N{LATIN SMALL LETTER I WITH ACUTE} works
> (with use charnames qw(:full);).
>
> OTOH, under 5.8, your original encoding as \xED is apparently
> read correctly. However the output simply outputs the unicode
> character, rather than the character entity í you were expecting.

Yes, I must admit that I only had 5.8 at my disposal, and made reference
to 5.6 only based on what I read online, and I was only concerned with the
string once it already existed in Perl.

In my specific situation though what I
ended up with was a bogus Unicode character:

I started with the 3 character string:
    "ía "
In Unicode this is:
    0xED 0x61 0x20
So, as UTF-8, within Perl this should have been stored:
    0xC3 0xAD 0x61 0x20
So what I expected as output was:
    "ía "
except that the Unicode code-points:
    0xED 0x61 0x20
were interpreted as the UTF-8 bytes:
    0xED 0xA1 0xA0
Which produced the illegal Unicode character:
    "�" (absorbing the "a" and the space)

Now the solution was to encode the UTF-8 bytes as if they were code-points:
    0xED 0x61 0x20
becomes:
    0xC3 0x83 0xC2 0xAD 0x61 0x20
internally, within Perl.
Which libxml2 now reads correctly as the byte stream:
    0xC3 0xAD 0x61 0x20
which it interpreted as the Unicode string:
    0xED 0x61 0x20
and produces the correct output:
    "ía "

I hope that is now less confusing.

Thanks again for your comments and feedback,
-Loren
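
The byte sequences in the walk-through above can be reproduced with a minimal
sketch using Perl's core Encode module. It only replays the encoding
arithmetic in plain Perl; it does not call libxml2 and is not the code from
the original script. The helper hex_dump and all variable names are
hypothetical, chosen for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: show each character (or byte) of a string as 0xNN.
sub hex_dump {
    my ($string) = @_;
    return join ' ', map { sprintf '0x%02X', ord($_) } split //, $string;
}

# The three character string from the message, as Unicode code points.
my $chars = "\x{ED}a ";

# Proper UTF-8 encoding of those code points (a four byte string).
my $utf8_bytes = encode('UTF-8', $chars);

# The workaround as described: treat each UTF-8 byte as if it were a
# code point and encode a second time.
my $double_encoded = encode('UTF-8', $utf8_bytes);

# Removing one layer of encoding gives back the proper UTF-8 byte values.
my $decoded_once = decode('UTF-8', $double_encoded);

print 'code points:    ', hex_dump($chars),          "\n";
print 'UTF-8 bytes:    ', hex_dump($utf8_bytes),     "\n";
print 'double encoded: ', hex_dump($double_encoded), "\n";
print 'decoded once:   ', hex_dump($decoded_once),   "\n";
```

Under these assumptions the "double encoded" line should match the six values
given above for the workaround, and the "decoded once" line should match the
four byte stream that libxml2 is described as reading.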