Re: Re: [xml] "Control over encoding declaration (prolog and meta)"



On Thu, Jan 15, 2004 at 01:19:13PM +0100, Kasimier Buchcik wrote:
Ok, this issue is DOM 3 related. As you might remember, I'm still 
struggling with "to DOMString serialization" and "from DOMString 
parsing", which always has to be UTF-16 encoded, regardless of the 
content; so if I have e.g. an ISO-8859-1 document, I still need it to be 
serialized to UTF-16, but it still *has to* contain an encoding 
declaration of ISO-8859-1.

  No, I'm not sure I understand. 
DOM decided to use UTF-16 for its internal representation and interface;
libxml2 decided to use UTF-8. I don't see the relationship w.r.t. 
serialization. If the DOM3 APIs allow serialization but don't allow you to
control the effective encoding, they are buggy, and you should provide
a comment to the working group for clarification.

XHTML is XML; the tools MUST parse it following the XML rules, which are
crystal clear: if your instance says "ISO-8859-1" but is actually encoded in
something else, it is broken.

As stated above, XML spec on the one side, DOM spec on the other.

  Sorry, I have a hard time with this.
Maybe DOM3 is really broken. There is a workaround: save with libxml2
and then convert back to UTF-16 with a string conversion API.
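The workaround above can be sketched roughly as follows. This is only an illustration of the idea, using Python's stdlib ElementTree as a stand-in for libxml2 (the element name and text are made up): serialize the document declaring ISO-8859-1, then re-encode the resulting byte string to UTF-16LE without touching the declaration, which is exactly the shape a UTF-16 DOMString with an original-encoding declaration would have.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with one Latin-1 character.
root = ET.Element("doc")
root.text = "caf\xe9"

# Serialize with ISO-8859-1 as both the actual and the declared encoding;
# ElementTree emits an XML declaration for non-UTF-8 encodings.
iso_bytes = ET.tostring(root, encoding="ISO-8859-1", xml_declaration=True)

# Convert the whole serialized string to UTF-16LE. The bytes are now
# UTF-16 on the wire, but the declaration still reads ISO-8859-1.
utf16_bytes = iso_bytes.decode("iso-8859-1").encode("utf-16-le")
```

The cost Kasimier worries about is visible here: the document is walked once for serialization and then the entire buffer is decoded and re-encoded a second time.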

Daniel, you wrote in some of your mails on the list that there are too many 
entrypoints to the library already; I understand your concern, and 
things like the xmlReadxxx API with all the nice options are really 
compact and concise. So I wonder if it would be good to have an 
xmlSerializexxx API; a serialization context sounds a bit heavy, but 
more flexible, allowing extensible options in the future. And I would 
be happy about a field "declaredEncoding" taking a custom encoding to be 
declared. I really think serialization will become far more complex, 
and should be more customizable, if (hopefully) libxml2 will try to help 
out more with DOM stuff in the future.
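The proposed serialization context might look something like the sketch below. Everything here is hypothetical: neither `XmlSerializeCtxt`, `xml_serialize`, nor the field names exist in libxml2; this only models the idea of separating the actual output encoding from the declared one, again with stdlib ElementTree standing in.

```python
from dataclasses import dataclass
from typing import Optional
import xml.etree.ElementTree as ET

@dataclass
class XmlSerializeCtxt:
    """Hypothetical serialization context; names are invented."""
    actual_encoding: str = "UTF-8"
    # The proposed "declaredEncoding": what the XML declaration claims,
    # independent of the bytes actually produced.
    declared_encoding: Optional[str] = None

def xml_serialize(root: ET.Element, ctxt: XmlSerializeCtxt) -> bytes:
    declared = ctxt.declared_encoding or ctxt.actual_encoding
    decl = '<?xml version="1.0" encoding="{}"?>\n'.format(declared)
    # encoding="unicode" returns a str without any declaration,
    # so we can prepend our own and encode once at the end.
    body = ET.tostring(root, encoding="unicode")
    return (decl + body).encode(ctxt.actual_encoding)
```

With such a context, the DOM3 case becomes a single call, e.g. `xml_serialize(root, XmlSerializeCtxt("utf-16-le", "ISO-8859-1"))`, which yields UTF-16LE bytes whose declaration reads ISO-8859-1, without a second re-encoding pass over the buffer.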

  If DOM is broken w.r.t. XML, well DOM must be fixed, not XML or
the zillion libraries and tools using it.

Finally, I must admit that there would be a workaround for me: I could 
serialize with the existing API, then encode to UTF-16LE. But since we 
are using quite huge documents, I guess it will not be acceptable 
performance-wise, and it seems rather stupid.

  Where is the stupidity coming from? I think forcing the encoding
of a string containing a serialized document to be different from the
real encoding of the document, because of a braindead interface decision, is
where the stupidity lies. That's what must be fixed.
  If DOM3 is stupid, get it fixed or don't use it; what else can I say?

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
