Bruce R Miller wrote:
>
> > I appreciate your feedback, but unfortunately it didn't give me any
> > additional warnings or errors.
>
> First off, thanks for taking my ill-tempered "feedback" with good humor...

I always try to have my "grain of salt" handy, but did want to
acknowledge your taking the time to reply.  It is so often easier to
ignore than reply,
and I was grateful to get *A* response, even if it wasn't helpful.

> Running under 5.6, your original code produced a
> "output error : invalid character value"
> message.
> I seem to recall problems in 5.6 with \x for the 2 digit case,
> although codepoints higher than FF work.
> In particular: \xED and \x{00ED} fail,
> but \N{LATIN SMALL LETTER I WITH ACUTE} works
> (with use charnames qw(:full); ).
>
> OTOH, under 5.8, your original encoding as \xED is apparently
> read correctly.  However the output simply outputs the unicode
> character, rather than the character entity for í you were
> expecting.

Yes, I must admit that I only had 5.8 at
my disposal.  I mentioned 5.6 only because of what I had read
online, and I was concerned only with the string once it already
existed in Perl.  In my specific situation, though, what I ended up
with was a bogus Unicode character.

I started with the 3-character string:

      "ía "

As Unicode code points, this is:

      0xED  0x61  0x20

So, as UTF-8 within Perl, this should have been stored as:

      0xC3  0xAD  0x61  0x20

and what I expected as output was:

      "ía "

Instead, the Unicode code points:

      0xED  0x61  0x20

were interpreted as the UTF-8 bytes:

      0xED  0xA1  0xA0

which produced the illegal Unicode character:

      "�"  (absorbing the "a" and the space)

The solution was to encode the UTF-8 bytes as if they were code
points, so that:

      0xED  0x61  0x20

becomes:

      0xC3  0x83  0xC2  0xAD  0x61  0x20

internally, within Perl.
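The byte arithmetic above can be checked with a short Python sketch.  It is only an analogy: Python's str and bytes types stand in for Perl's internal strings, and libxml2 is not involved.  The helper hex_list is hypothetical and exists only for printing.  The sketch shows four things:

- the correct UTF-8 encoding of "ía ";
- that a strict decoder rejects 0xED 0x61 0x20;
- that the forced bytes 0xED 0xA1 0xA0 encode U+D860, a UTF-16 surrogate and therefore not a legal character;
- the double-encoding step, done here by decoding the UTF-8 bytes as Latin-1 so that each byte becomes one code point.

```python
# Python analogy of the byte arithmetic above; not Perl or libxml2 code.
text = "\u00eda "  # "ía ": code points 0xED 0x61 0x20


def hex_list(values):
    """Hypothetical helper: format ints as '0xED 0x61 0x20'."""
    return " ".join(f"0x{v:02X}" for v in values)


# Code points of the original string.
code_points = [ord(c) for c in text]
print("code points:", hex_list(code_points))

# Correct UTF-8 encoding of the same string.
utf8 = text.encode("utf-8")
print("UTF-8:      ", hex_list(utf8))

# The code points misread as UTF-8 bytes.  0xED starts a 3-byte
# sequence, so a strict decoder rejects 0x61 as a continuation byte.
try:
    bytes(code_points).decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decode fails:", exc.reason)

# Forcing 0x61 and 0x20 into continuation bytes (top bits 10, low six
# bits kept) gives 0xED 0xA1 0xA0, which encodes the surrogate U+D860.
forced = bytes([0xED, 0x80 | (0x61 & 0x3F), 0x80 | (0x20 & 0x3F)])
print("forced bytes:", hex_list(forced))
surrogate = forced.decode("utf-8", "surrogatepass")
print(f"decodes to:   U+{ord(surrogate):04X}")

# Double encoding: Latin-1 maps bytes 0x00-0xFF one-to-one onto
# U+0000-U+00FF, so each UTF-8 byte becomes a code point, which is
# then encoded as UTF-8 again.
double = utf8.decode("latin-1").encode("utf-8")
print("double:     ", hex_list(double))
```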
libxml2 now reads the resulting string correctly as the byte stream:

      0xC3  0xAD  0x61  0x20

which it interprets as the Unicode string:

      0xED  0x61  0x20

and produces the correct output:

      "ía "

I hope that is now less confusing.  Thanks again for your comments
and feedback,

-Loren
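The read-back step described above can be sketched in Python as well.  Again, this is only an analogy and does not reproduce libxml2 itself.

1. Decoding the double-encoded bytes once yields the code points 0xC3 0xAD 0x61 0x20.
2. Taken as bytes, those code points are the UTF-8 stream for "ía ".

The last lines relate to the earlier remark that 5.8 emitted the raw character rather than a character entity.  They use Python's "xmlcharrefreplace" codec error handler to show what an entity-escaped form looks like.  That handler is a Python feature, not a libxml2 or Perl option.

```python
# Python analogy of the read-back step; not libxml2 code.
double = bytes([0xC3, 0x83, 0xC2, 0xAD, 0x61, 0x20])  # double-encoded form

# First decode: the resulting "code points" are really UTF-8 bytes.
inner = double.decode("utf-8")
inner_bytes = inner.encode("latin-1")  # one byte per code point
print(" ".join(f"0x{b:02X}" for b in inner_bytes))

# Second decode: those bytes are proper UTF-8 for the original string.
restored = inner_bytes.decode("utf-8")
print(" ".join(f"0x{ord(c):02X}" for c in restored))

# ASCII output with non-ASCII characters written as numeric
# character references instead of raw characters.
escaped = restored.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(escaped)
```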