RE: [xml] Perl module XML::LibXML not encoding UTF-8 properly In [SOLUTION]

From: "Loren Osborn" <lsosborn dis-sol-inc com>
To: <xml gnome org>
Subject: RE: [xml] Perl module XML::LibXML not encoding UTF-8 properly In [SOLUTION]
Date: Tue, 27 Sep 2005 09:30:43 -0700

I gotta admit, though, what is throwing me for a loop is that when I dump the variable directly to standard output, I get the two bytes corresponding to the correct UTF-8 encoding (c3 ad for the "í" in "día"). It is possible that the print function converts non-UTF-8-encoded strings on the fly, but that does make the problem rather hard to characterize:

[lsosborn devbox dsi]$ ~/bin/LibXML+UTF8_test | grep '<'
Una manzana al día mantiene al doctor ausente. & blah <blah>
[lsosborn devbox dsi]$ ~/bin/LibXML+UTF8_test | grep '<' | od -t x1
0000000 09 55 6e 61 20 6d 61 6e 7a 61 6e 61 20 61 6c 20
0000020 64 c3 ad 61 20 6d 61 6e 74 69 65 6e 65 20 61 6c
0000040 20 64 6f 63 74 6f 72 20 61 75 73 65 6e 74 65 2e
0000060 20 26 20 62 6c 61 68 20 3c 62 6c 61 68 3e 20 0a
0000100
[lsosborn devbox dsi]$
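For comparison, here is a minimal Perl sketch, assuming only the core Encode module. It shows what the dump above should contain for "día". As a character string, "d\x{ed}a" is 3 characters long, with U+00ED held as the single value 0xED. Encoded to UTF-8, it becomes the four bytes 64 c3 ad 61, which is the run at offset 0000020 in the od output. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "d\x{ed}a" is "día": \x{ed} is U+00ED, LATIN SMALL LETTER I WITH ACUTE.
my $chars = "d\x{ed}a";

# Before encoding, the string has 3 characters; the accented i is one 0xED value.
printf "length before encoding: %d\n", length $chars;

# Encode to UTF-8 bytes; U+00ED becomes the two-byte sequence c3 ad.
my $bytes = encode('UTF-8', $chars);

# Hex-dump the encoded bytes in the style of `od -t x1`.
print join(' ', map { sprintf '%02x', ord } split //, $bytes), "\n";
```

Running it prints the length 3 followed by `64 c3 ad 61`, the same bytes the test program emitted.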

Bruce Miller wrote:

Petr's message hits the core point: the string is still bytes, rather than Unicode chars. The particular problem here is that the construct \x{...} only forces conversion to Unicode if the code point is above 0xFF, i.e. 0x100 or higher (for compatibility with old scripts).

To force the conversion, you need to use either the \N{name} construct or pack('U',$code).
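One way to see the bytes-versus-characters distinction is a minimal Perl sketch using the core function utf8::is_utf8. This function reports whether Perl currently stores a string internally as UTF-8 characters. The sketch compares \x{ed}, \x{100}, pack('U', 0xED) and the \N{name} form. The helper `describe` and the variable names are hypothetical. `use charnames ':full'` is included on the assumption that the script may run on a Perl older than 5.16, where \N{name} requires it.

```perl
use strict;
use warnings;
use charnames ':full';   # needed for \N{name} on Perls older than 5.16

# Hypothetical helper: report whether Perl holds the string as UTF-8 characters.
sub describe {
    my ($label, $s) = @_;
    printf "%-28s %s\n", $label,
        utf8::is_utf8($s) ? 'characters (UTF-8 flag on)' : 'bytes (UTF-8 flag off)';
}

my $x_low  = "\x{ed}";                               # code point below 0x100: stays a byte string
my $x_high = "\x{100}";                              # code point 0x100 or above: upgraded to characters
my $packed = pack('U', 0xED);                        # a pack template starting with U yields characters
my $named  = "\N{LATIN SMALL LETTER I WITH ACUTE}";  # the \N{name} construct suggested above

describe('"\\x{ed}"',              $x_low);
describe('"\\x{100}"',             $x_high);
describe("pack('U', 0xED)",        $packed);
describe('"\\N{LATIN SMALL ...}"', $named);
```

Note that utf8::is_utf8 reflects Perl's internal representation. That representation determines whether the string is treated as Unicode characters or as raw bytes when it is handed on to other code.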
