Re: [xml] Control over encoding declaration (prolog and meta)
- From: Kasimier Buchcik <kbuchcik 4commerce de>
- To: <xml gnome org>
- Subject: Re: [xml] Control over encoding declaration (prolog and meta)
- Date: Thu, 15 Jan 2004 17:26:43 +0100
Hi,
on 1/15/2004 2:54 PM Daniel Veillard wrote:
On Thu, Jan 15, 2004 at 01:19:13PM +0100, Kasimier Buchcik wrote:
Ok, this issue is DOM 3 related. As you might remember I'm still
struggling with "to DOMString serialization" and "from DOMString
parsing", which always has to be UTF-16 encoded, regardless of the
content; so if I have e.g. an ISO-8859-1 document I still need it to be
serialized to UTF-16, but it still *has to* contain an encoding
declaration of ISO-8859-1.
No I'm not sure I understand.
DOM decided to use UTF16 for internal representation and interface,
libxml2 decided to use UTF8. I don't see the relationship w.r.t.
serialization.
There is no relationship.
If the DOM3 APIs allow you to serialize but don't allow you to
control the effective encoding, they are buggy, and you should provide
a comment to the working group for clarification.
Hmm, ok, I guess I did not explain it clearly enough; so here are the specs:
----------
DOMString:
http://www.w3.org/TR/2003/CR-DOM-Level-3-Core-20031107/core.html#ID-C74D1578
The DOMString type is used to store [Unicode] characters as a code unit
string as defined in section 3.4 of [CharModel]. Applications must
encode the characters using UTF-16 as defined in [Unicode] and Amendment
1 of [ISO/IEC 10646].
----------
DOM 3 LS - LSSerializer.writeToString
http://www.w3.org/TR/2003/CR-DOM-Level-3-LS-20031107/load-save.html#LS-LSSerializer-writeToString
The output is written to a DOMString that is returned to the caller
(this method completely ignores all the encoding information available).
----------
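To make the two quoted passages concrete, here is a minimal sketch in Python (not libxml2 or any DOM implementation) of what such a string looks like: the characters are held as UTF-16 code units while the prolog is carried over unchanged. The sample document and all variable names are made up for illustration.

```python
# Hypothetical sample document, stored on disk as ISO-8859-1 bytes.
source_bytes = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<name>M\u00fcller</name>"
).encode("iso-8859-1")

# Decode with the real on-disk encoding to get Unicode text.
text = source_bytes.decode("iso-8859-1")

# DOMString-style result: the same characters as UTF-16 code units.
dom_string = text.encode("utf-16-le")

# The prolog is carried over untouched, so it still names ISO-8859-1
# even though the bytes in dom_string are UTF-16.
declared = text.split('encoding="', 1)[1].split('"', 1)[0]

print(declared)
print(len(source_bytes), len(dom_string))
```

The declared encoding and the actual byte encoding now disagree, which is exactly the point of contention in this thread.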
XHTML is XML, the tools MUST parse it following the XML rules which are
crystal clear, if your instance says "ISO-8859-1" and is encoded in
As stated above, XML spec on the one side, DOM spec on the other.
Sorry, I have a hard time with this.
You are not alone here.
Maybe DOM3 is really broken. There is a workaround: save with libxml2
and then convert back to UTF-16 with a string conversion API.
Hmm
Daniel, you wrote in some of your mails on the list that there are too many
entry points to the library already; I understand your concern, and
things like the xmlReadxxx API with all the nice options are really
compact and concise. So I wonder if it would be good to have an
xmlSerializexxx API; a serialization context sounds a bit heavy, but
more flexible - allowing extensible options for the future. And I would
be happy about a field "declaredEncoding" taking a custom encoding to be
declared. I really think the serialization will become far more complex,
and should be more customizable, if (hopefully) libxml2 will try to help
out more with DOM stuff in the future.
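As a rough illustration of the idea, the following Python sketch separates the encoding that is declared from the encoding that is actually written. It is not a libxml2 API: `SerializeOptions`, `serialize`, `output_encoding` and `declared_encoding` are hypothetical names, and the standard-library `xml.etree.ElementTree` stands in for the tree.

```python
from dataclasses import dataclass
from typing import Optional
import xml.etree.ElementTree as ET


@dataclass
class SerializeOptions:
    """Hypothetical option set, loosely modelled on the proposal above."""
    output_encoding: str = "utf-8"           # encoding of the bytes produced
    declared_encoding: Optional[str] = None  # encoding named in the prolog


def serialize(root: ET.Element, options: SerializeOptions) -> bytes:
    # Serialize to Unicode text; ElementTree emits no declaration here.
    body = ET.tostring(root, encoding="unicode")
    # Fall back to the real output encoding when nothing custom is requested.
    declared = options.declared_encoding or options.output_encoding
    prolog = '<?xml version="1.0" encoding="%s"?>\n' % declared
    return (prolog + body).encode(options.output_encoding)


root = ET.fromstring("<name>M\u00fcller</name>")
opts = SerializeOptions(output_encoding="utf-16-le",
                        declared_encoding="ISO-8859-1")
out = serialize(root, opts)
print(out.decode("utf-16-le"))
```

Keeping the options in one structure means further settings could be added later without new entry points, which is the flexibility argued for above.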
If DOM is broken w.r.t. XML, well DOM must be fixed, not XML or
the zillion libraries and tools using it.
IMHO, I think the DOM people had a good reason to do it this way. Think
of an XML editor that is not able to display all the zillion encodings
out there; with the specification to serialize any node to a string with
a specific encoding, all components just have to understand Unicode to
work with the data *without* changing the encoding information.
Finally I must admit that there would be a workaround for me: I could
serialize with the existing API, then encode to UTF-16LE. But since we
are using quite huge documents, I guess it will not be acceptable in
terms of performance and seems rather stupid.
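The second step of that workaround (re-encoding already serialized bytes) can be sketched in Python as a chunked transcode, so that a large document is not held in memory twice. This only illustrates the technique, not libxml2 code; `transcode`, its parameters, and the sample document are assumptions.

```python
import codecs
import io


def transcode(src, dst, from_encoding, to_encoding="utf-16-le",
              chunk_size=64 * 1024):
    """Copy src to dst, re-encoding the bytes chunk by chunk."""
    decoder = codecs.getincrementaldecoder(from_encoding)()
    encoder = codecs.getincrementalencoder(to_encoding)()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # The incremental decoder buffers a multi-byte sequence that is
        # split across two chunks until the rest arrives.
        dst.write(encoder.encode(decoder.decode(chunk)))
    # Flush whatever the decoder and encoder still hold.
    dst.write(encoder.encode(decoder.decode(b"", final=True), final=True))


# Hypothetical serialized document, as an existing API might have saved it.
serialized = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<name>M\u00fcller</name>"
).encode("iso-8859-1")

result = io.BytesIO()
transcode(io.BytesIO(serialized), result, "iso-8859-1", chunk_size=7)
print(len(serialized), len(result.getvalue()))
```

The extra pass over the data remains, which is the performance cost mentioned above; chunking only bounds the additional memory.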
Where is the stupidity coming from? I think forcing the encoding
With "stupid" I meant that the effort to get the desired encoding
declaration seems a bit oversized.
of a string containing a serialized document to be different from the
real encoding of the document because of a braindead interface decision is
where the stupidity lies. That's what must be fixed.
Hmm, Daniel, I guess you wouldn't like people to call your
implementation "braindead", and I guess the DOM people don't like it either.
If DOM3 is stupid, get it fixed or don't use it, what else can I say?
:-) you know it's not *that* easy...
Regards,
Kasimier