Re: [xml] encoding practice?



On Fri, 2002-12-13 at 12:52, Daniel Veillard wrote:
On Fri, Dec 13, 2002 at 07:29:17AM -0500, Derek Holden wrote:
I am wondering what common practice is for reading and writing XML 
internally. Specifically, I have xmlDumpMemory writing UTF-8 and xmlParseMemory 
reading ISO-8859. Is it better to handle the decoding on the dump end or the 
encoding on the parsing end? If I'm approaching this incorrectly, or if there 
is an equivalent UTF-8 xmlParse routine, I'd appreciate hearing about it. Thanks.

  I would say keep everything in UTF-8. UTF-8 and UTF-16 are the only encodings
that any XML parser MUST support. And I would advise against UTF-16 in general,
because it forces extra conversion in most processing tools and in general
wastes space with useless zeroes...
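
One way to read "keep everything UTF-8" is to convert legacy input once, at the boundary, so that everything handed to the parser or kept in memory is already UTF-8. The following is a minimal sketch of that conversion for ISO-8859-1 only, in plain C without libxml2. The function name latin1_to_utf8 and its buffer contract are assumptions made for illustration, not part of any library API, and other ISO-8859 parts would need a lookup table instead of this direct mapping.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert n ISO-8859-1 bytes to UTF-8.
 * ISO-8859-1 byte values equal the Unicode code points U+0000..U+00FF,
 * so each byte becomes either one or two UTF-8 bytes.
 * The caller must provide an output buffer of at least 2*n + 1 bytes.
 * Returns the number of bytes written, not counting the trailing NUL.
 */
size_t latin1_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t j = 0;

    assert(in != NULL && out != NULL);

    for (size_t i = 0; i < n; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            /* ASCII range: identical in both encodings. */
            out[j++] = c;
        } else {
            /* U+0080..U+00FF: two-byte sequence 110xxxxx 10xxxxxx. */
            out[j++] = (unsigned char)(0xC0 | (c >> 6));
            out[j++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    out[j] = '\0';
    return j;
}
```

With this approach the ISO-8859-1 handling lives in one place on the input side, and the dump side can emit UTF-8 unchanged.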

AFAIK this is only true for Latin-based scripts. For most East Asian
scripts UTF-16 actually saves space by spending only 2 bytes per char
instead of the 3 that UTF-8 needs.
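
The size trade-off can be checked from the encoding rules themselves. Below is a small sketch in plain C that returns the number of bytes one Unicode code point occupies in each encoding. The function names utf8_len and utf16_len are made up for this illustration and are not libxml2 functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to encode one Unicode code point in UTF-8. */
size_t utf8_len(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80)    return 1;  /* ASCII */
    if (cp < 0x800)   return 2;  /* e.g. accented Latin, Greek, Cyrillic */
    if (cp < 0x10000) return 3;  /* rest of the BMP, including most CJK */
    return 4;                    /* supplementary planes */
}

/* Bytes needed to encode one Unicode code point in UTF-16. */
size_t utf16_len(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    return cp < 0x10000 ? 2 : 4; /* one code unit, or a surrogate pair */
}
```

For 'A' (U+0041) this gives 1 byte in UTF-8 against 2 in UTF-16, while for a CJK ideograph such as U+65E5 it gives 3 against 2. Note that XML markup itself (element names, angle brackets, quotes) is usually ASCII, so the overall balance for a real document depends on the ratio of markup to text.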

BTW, is there any support for using UTF-16 internally in libxml2/libxslt?

Daniel
-- 
Hannu Krosing <hannu tm ee>


