Re: [xml] encoding practice?

On Fri, Dec 13, 2002 at 05:07:46PM +0000, Hannu Krosing wrote:
> > that any XML parser MUST support. And I would advise against UTF-16 in general
> > because it forces extra conversion in most processing tools and in general
> > wastes space with useless zeroes...
>
> AFAIK this is only true for Latin-based scripts. For most far-east

   "in general"

> scripts UTF-16 actually saves space by spending only 2 bytes per char
> instead of 3.

  Even people using Japanese or Chinese usually don't go for
UTF-16 anyway. And most of the non-CDATA content (the markup itself)
will still waste every other byte in UTF-16. Non-ASCII characters are
infrequent in markup; that and indentation still make UTF-8 win in most
documents. I maintain UTF-8 is a better choice in most cases.
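
A rough way to check the size argument above, sketched in Python. The sample
document, the `encoded_sizes` helper and the element names are made up for
illustration; sizes are measured without a byte order mark, and a real
document may mix markup and text in different proportions.

```python
# Hypothetical sample: ASCII markup and indentation around Japanese text.
doc = (
    '<?xml version="1.0"?>\n'
    '<library>\n'
    '  <book lang="ja">\n'
    '    <title>吾輩は猫である</title>\n'
    '    <author>夏目漱石</author>\n'
    '  </book>\n'
    '</library>\n'
)

def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, measured without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

# Pure CJK text: 11 characters, 3 bytes each in UTF-8, 2 bytes each in UTF-16.
cjk_only = "吾輩は猫である夏目漱石"
print(encoded_sizes(cjk_only))  # (33, 22)

# The same text wrapped in ASCII markup and indentation.
utf8_size, utf16_size = encoded_sizes(doc)
print(utf8_size, utf16_size)
```

On the bare text UTF-16 is smaller, as the quoted message says. Once the same
text is wrapped in markup, every ASCII character costs two bytes in UTF-16
and one in UTF-8, so for this sample the UTF-8 form comes out smaller overall.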

> BTW, is there any support for using UTF-16 internally in libxml2/libxslt?

  It's an XML parser so yes, as I said conformance REQUIRES it. If a parser
can't grasp UTF-16 it's not a conformant XML parser, period.
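
The XML 1.0 specification requires every processor to read both UTF-8 and
UTF-16 input. As a small illustration of that requirement, the sketch below
feeds a UTF-16 document to a parser. It is an assumption-laden example: it
uses Python's standard-library `xml.etree.ElementTree` rather than libxml2,
and the `greeting` element is invented for the test.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document that declares UTF-16.
text = '<?xml version="1.0" encoding="UTF-16"?><greeting>こんにちは</greeting>'

# Python's "utf-16" codec prepends a byte order mark, which the parser
# uses to detect the byte order of the input.
data = text.encode("utf-16")

root = ET.fromstring(data)
print(root.tag, root.text)
```

This only shows that UTF-16 input is accepted; it says nothing about which
encoding a parser uses for its internal strings.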


Daniel Veillard      | Red Hat Network
veillard redhat com  | libxml GNOME XML XSLT toolkit | Rpmfind RPM search engine
