Re: [xml] Character Sets supported



On Fri, Jul 11, 2003 at 01:11:56PM +0200, Peter Jacobi wrote:
Hi Daniel,

I assume you have only run with iconv for some years. After reading
your response I ran some quick tests with input files
in encodings 8859-1, 8859-2, 8859-15 and 8859-16 (attached),
without iconv (in fact, other than in an early test, I always run
without iconv).

The tests and their results:
xmllint --debug 8859-1.xml
Works as expected.

  That is normal: without iconv, libxml2 handles UTF-8 and UTF-16 (required
by the spec) and ISO-8859-1 (widely deployed).

xmllint --debug 8859-2.xml
Gives no warnings or errors, but all the strings
in the tree are in ISO-8859-2 instead of UTF-8.
This is asking for big trouble.

  Yes, that should just fail, like 8859-15 and 8859-16.

xmllint --debug 8859-15.xml
Rejected, unknown encoding.

xmllint --debug 8859-16.xml
Rejected, unknown encoding.

  Hmm, BTW, how much work is it to go from 8859-1 to 8859-15 and 8859-16?

Daniel, what should be done about iconv-less execution?

- Forbid it, make a hard dependency on iconv

  No, I don't want *any* hard dependency on an external lib.

- Restrict to UTF-8 (and perhaps ISO-8859-1)

  The wisest choice in general when generating XML documents is UTF-8,
because all parsers must support it and it allows fast processing (no
conversion needed).

- Correct behaviour

  One way to support a given encoding that you know will be needed but is
not available is to write your own conversion routines and register them
at startup using xmlNewCharEncodingHandler() and
xmlRegisterCharEncodingHandler() from encoding.h.
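
As a rough illustration of such a conversion routine, here is a minimal
sketch of an ISO-8859-15 to UTF-8 input converter. It assumes the input
callback shape declared in encoding.h (out, outlen, in, inlen, with the
lengths updated to the bytes produced and consumed). The names
latin9ToUTF8 and latin9_to_ucs are hypothetical, and the registration line
in the trailing comment is an assumption about typical usage, not tested
against any particular libxml2 release.

```c
#include <stddef.h>

/* Hypothetical helper: ISO-8859-15 differs from ISO-8859-1 in only
 * eight byte values; every other byte maps to the same code point. */
static unsigned int latin9_to_ucs(unsigned char c)
{
    switch (c) {
    case 0xA4: return 0x20AC; /* EURO SIGN */
    case 0xA6: return 0x0160; /* LATIN CAPITAL LETTER S WITH CARON */
    case 0xA8: return 0x0161; /* LATIN SMALL LETTER S WITH CARON */
    case 0xB4: return 0x017D; /* LATIN CAPITAL LETTER Z WITH CARON */
    case 0xB8: return 0x017E; /* LATIN SMALL LETTER Z WITH CARON */
    case 0xBC: return 0x0152; /* LATIN CAPITAL LIGATURE OE */
    case 0xBD: return 0x0153; /* LATIN SMALL LIGATURE OE */
    case 0xBE: return 0x0178; /* LATIN CAPITAL LETTER Y WITH DIAERESIS */
    default:   return c;
    }
}

/* Hypothetical converter. On entry *inlen and *outlen hold the buffer
 * sizes; on return they hold the bytes consumed and bytes written.
 * Returns the number of bytes written, or -1 on bad arguments. */
int latin9ToUTF8(unsigned char *out, int *outlen,
                 const unsigned char *in, int *inlen)
{
    const unsigned char *instart = in, *inend;
    unsigned char *outstart = out, *outend;

    if (out == NULL || outlen == NULL || in == NULL || inlen == NULL)
        return -1;
    inend = in + *inlen;
    outend = out + *outlen;

    while (in < inend) {
        unsigned int c = latin9_to_ucs(*in);
        int need = (c < 0x80) ? 1 : (c < 0x800) ? 2 : 3;

        if (outend - out < need)
            break; /* output full: stop and report partial progress */
        if (need == 1) {
            *out++ = (unsigned char) c;
        } else if (need == 2) {
            *out++ = (unsigned char) (0xC0 | (c >> 6));
            *out++ = (unsigned char) (0x80 | (c & 0x3F));
        } else {
            *out++ = (unsigned char) (0xE0 | (c >> 12));
            *out++ = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
            *out++ = (unsigned char) (0x80 | (c & 0x3F));
        }
        in++;
    }
    *outlen = (int) (out - outstart);
    *inlen = (int) (in - instart);
    return *outlen;
}

/* Assumed registration at startup (requires <libxml/encoding.h>); a NULL
 * output function means documents could be read but not saved in this
 * encoding:
 *
 *   xmlRegisterCharEncodingHandler(
 *       xmlNewCharEncodingHandler("ISO-8859-15", latin9ToUTF8, NULL));
 */
```

A matching UTF-8 to ISO-8859-15 output routine would be needed as well if
documents have to be serialized back in that encoding.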

I assume nobody still relies on the legacy behaviour
of storing non-UTF-8 strings in the tree? Hello, anybody out
there? HELLO? H-E-L-L-O ??

  Well, if anybody does that, they are not supported and it's nearly guaranteed
to break when reserializing. We simply cannot support it in any way.

O.K. we can delete this.

If you want more complete functionality without
iconv, I can contribute character encoding handlers for
ISO-8859-*

  Contributing is one way, but I would avoid growing the built-in set too
much; there is still the possibility of registering handlers at runtime.
But if the code is small, as I expect for 8859-15, then patches are welcome :-)

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


