Re: [xml] encoding practice?



On Fri, Dec 13, 2002 at 05:07:46PM +0000, Hannu Krosing wrote:
> > that any XML parser MUST support. And I would advise against UTF16 in general
> > because it forces extra conversion in most processing tools and in general
> > wastes spaces with useless zeroes...
>
> AFAIK this is only true for latin-based scripts. For most far-east

   "in general"

> scripts UTF-16 actually saves space by spending only 2 bytes per char
> instead of 3.

  Even people using Japanese or Chinese usually don't go for
UTF-16 anyway. And most of the non-CDATA content (the markup itself)
will still waste every other byte in UTF-16. Non-ASCII characters are
infrequent in markup; that and indentation still make UTF-8 win in most
documents. I maintain UTF-8 is a better choice in most cases.
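
  To make the size argument concrete, here is a rough sketch in Python
(not libxml2 code) that compares encoded sizes. The sample document, its
element names and the helper `encoded_sizes` are made up for illustration;
real documents will differ in their markup-to-text ratio.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for a string, without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Pure Japanese text: every character takes 3 bytes in UTF-8, 2 in UTF-16.
cjk_only = "日本語のテキスト"

# The same text wrapped in ordinary ASCII markup and indentation
# (hypothetical document, for illustration only).
doc = (
    '<?xml version="1.0"?>\n'
    '<catalog>\n'
    '    <item id="1">\n'
    '        <title>日本語のテキスト</title>\n'
    '    </item>\n'
    '</catalog>\n'
)

if __name__ == "__main__":
    for label, text in (("text only", cjk_only), ("full document", doc)):
        utf8, utf16 = encoded_sizes(text)
        print(f"{label}: UTF-8 {utf8} bytes, UTF-16 {utf16} bytes")
```

On the bare text UTF-16 is smaller, but once the ASCII tags and
whitespace are counted UTF-8 comes out ahead for this sample.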

> BTW, is there any support for using UTF-16 internally in libxml2/libxslt?

  It's an XML parser so yes, as I said conformance REQUIRES it. If a parser
can't grasp UTF-16 it's not a conformant XML parser, period.
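
  As a quick way to check that a given parser accepts UTF-16 input, a
minimal sketch follows. It uses Python's standard `xml.etree.ElementTree`
(expat-based) as a stand-in, not libxml2 itself, and the document and the
helper name `parse_utf16` are invented for the example.

```python
import xml.etree.ElementTree as ET


def parse_utf16(xml_text):
    """Encode a document as UTF-16 (with BOM) and parse it from bytes."""
    data = xml_text.encode("utf-16")  # the "utf-16" codec prepends a BOM
    return ET.fromstring(data)


# Hypothetical sample document declaring its encoding as UTF-16.
sample = '<?xml version="1.0" encoding="UTF-16"?><doc>日本語</doc>'

if __name__ == "__main__":
    root = parse_utf16(sample)
    print(root.tag, root.text)
```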

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


