Re: 'Re: [xml] "Control over encoding declaration (prolog and meta)'
- From: Daniel Veillard <veillard redhat com>
- To: Kasimier Buchcik <kbuchcik 4commerce de>
- Cc: xml gnome org
- Subject: Re: 'Re: [xml] "Control over encoding declaration (prolog and meta)'
- Date: Thu, 15 Jan 2004 08:54:35 -0500
On Thu, Jan 15, 2004 at 01:19:13PM +0100, Kasimier Buchcik wrote:
OK, this issue is DOM 3 related. As you might remember, I'm still
struggling with "to DOMString serialization" and "from DOMString
parsing", which always has to be UTF-16 encoded, regardless of the
content; so if I have e.g. an ISO-8859-1 document I still need it to be
serialized to UTF-16, but it still *has to* contain an encoding
declaration of ISO-8859-1.
No, I'm not sure I understand.
DOM decided to use UTF-16 for internal representation and interface,
libxml2 decided to use UTF-8. I don't see the relationship w.r.t.
serialization. If the DOM3 APIs allow serializing but don't allow
controlling the effective encoding, they are buggy, and you should send
a comment to the working group asking for clarification.
XHTML is XML; the tools MUST parse it following the XML rules, which are
crystal clear. If your instance says "ISO-8859-1" and is encoded in
As stated above, XML spec on the one side, DOM spec on the other.
Sorry, I have a hard time with this.
Maybe DOM3 is really broken. There is a workaround: save with libxml2
and then convert back to UTF-16 with a string conversion API.
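To make the workaround concrete, here is a minimal sketch of the two steps: serialize in the declared encoding, then transcode the resulting bytes to UTF-16. It uses Python's standard xml.etree.ElementTree rather than libxml2, purely to illustrate the idea; the function name serialize_then_reencode and the sample document are hypothetical, not part of any library discussed in this thread.

```python
import xml.etree.ElementTree as ET


def serialize_then_reencode(root, declared_encoding="ISO-8859-1"):
    """Serialize in declared_encoding, then transcode the bytes to UTF-16LE.

    Hypothetical helper for illustration only; it is not a libxml2 API.
    """
    # Step 1: the serializer emits an XML declaration naming
    # declared_encoding and encodes the content in that encoding.
    raw = ET.tostring(root, encoding=declared_encoding)
    # Step 2: a plain string conversion. The declaration text is left
    # untouched, so it no longer matches the real encoding of the bytes.
    return raw.decode(declared_encoding).encode("utf-16-le")


# Sample document (an assumption made up for this sketch).
root = ET.Element("greeting")
root.text = "caf\u00e9"

dom_string = serialize_then_reencode(root)
print(dom_string.decode("utf-16-le"))
```

The result is a UTF-16LE byte string that still declares ISO-8859-1, which is the mismatch being argued about here. Note that the second step copies the whole serialized document once more, which is where the cost shows up for large documents.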
Daniel, you wrote in some of your mails on the list that there are too many
entry points to the library already; I understand your concern, and
things like the xmlReadxxx API with all the nice options are really
compact and concise. So I wonder if it would be good to have an
xmlSerializexxx API; a serialization context sounds a bit heavy, but
more flexible - allowing extensible options for the future. And I would
be happy about a field "declaredEncoding" taking a custom encoding to be
declared. I really think the serialization will become far more complex,
and should be more customizable, if (hopefully) libxml2 will try to help
out more with DOM stuff in the future.
If DOM is broken w.r.t. XML, well DOM must be fixed, not XML or
the zillion libraries and tools using it.
Finally, I must admit that there would be a workaround for me: I could
serialize with the existing API, then encode to UTF-16LE. But since we
are using quite huge documents, I guess it will not be acceptable in
terms of performance, and it seems rather stupid.
Where is the stupidity coming from? I think forcing the encoding
of a string containing a serialized document to be different from the
real encoding of the document, because of a braindead interface decision,
is where the stupidity lies. That's what must be fixed.
If DOM3 is stupid, get it fixed or don't use it. What else can I say?
Daniel
--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/