libxml - utf8 / 8bit charsets.
- From: Michael Meeks <michael ximian com>
- To: Daniel Veillard <veillard redhat com>
- Cc: gnome-hackers gnome org
- Subject: libxml - utf8 / 8bit charsets.
- Date: Mon, 26 Mar 2001 05:09:13 -0500 (EST)
Hi Daniel,
When I looked into this issue, it seemed to me that libxml was
being too clever for its own good :-) but first - let me assume that the
only significant user of libxml1 is now the GNOME project - is that fair?
So - there are a lot of possible char-sets that we could support;
however, looking at parser.c (xmlSwitchEncoding), it seems that we flag
errors on all encodings except ENCODING_NONE and ENCODING_UTF8.
So - given that mixed-charset XML files exist, why can we not get
libxml to simply return an exact representation of what was in the input
string, regardless of encoding? And similarly on write, we would just
assume the application is going to get it correct.
I think what screwed me up using 8 bit was code that started
examining a byte stream as chars, assuming that it was UTF-8, and trying
to validate it to ensure that no chars in a certain range were
present. If this is the breakage, that validation is of very limited use
to us.
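To illustrate the kind of breakage described above, here is a minimal sketch in C. It is not libxml's code: looks_like_utf8 is a hypothetical helper that only checks the lead-byte / continuation-byte structure of UTF-8 (it does not reject overlong forms or surrogates). It shows why 8-bit data such as ISO-8859-1 tends to fail when a byte stream is examined on the assumption that it is UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libxml.
 * Returns 1 if buf[0..len) is structurally valid UTF-8, i.e. every lead
 * byte is followed by the right number of 10xxxxxx continuation bytes.
 * Returns 0 otherwise. Overlong encodings and surrogates are NOT checked;
 * this is only a sketch. */
static int looks_like_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;   /* number of continuation bytes expected */
        size_t k;

        if (c < 0x80)
            extra = 0;                  /* 0xxxxxxx: plain ASCII */
        else if ((c & 0xE0) == 0xC0)
            extra = 1;                  /* 110xxxxx */
        else if ((c & 0xF0) == 0xE0)
            extra = 2;                  /* 1110xxxx */
        else if ((c & 0xF8) == 0xF0)
            extra = 3;                  /* 11110xxx */
        else
            return 0;                   /* stray continuation or bad lead */

        if (extra > len - i - 1)
            return 0;                   /* sequence truncated at end */

        for (k = 1; k <= extra; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;               /* not a continuation byte */

        i += extra + 1;
    }
    return 1;
}
```

With this check, the Latin-1 bytes for "café" (0x63 0x61 0x66 0xE9) are rejected, because 0xE9 looks like the start of a three-byte UTF-8 sequence that never arrives, while the UTF-8 bytes for the same word (0x63 0x61 0x66 0xC3 0xA9) pass.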
Of course, it's entirely possible that I just mis-remembered
everything.
Regards,
Michael.
--
mmeeks gnu org <><, Pseudo Engineer, itinerant idiot