'Re: [xml] "Control over encoding declaration (prolog and meta)"'


on 1/15/2004 6:06 PM Daniel Veillard wrote:

On Thu, Jan 15, 2004 at 05:26:43PM +0100, Kasimier Buchcik wrote:

DOM 3 LS - LSSerializer.writeToString

The output is written to a DOMString that is returned to the caller 
(this method completely ignores all the encoding information available).

   This is at best incomplete. Ignored in what sense ?

After quite a long time I received a final answer from the DOM group.
The declaration has to be "UTF-16" regardless of what 
Document.xmlEncoding is set to - so Igor's statement was correct.

The DOM group made the proposal to add the following to the DOM 3 LS 
specification (this is an excerpt from the proposal):

"When the encoding is UTF-16, whether or not the output is big-endian or
little-endian is implementation dependent, but a Byte Order Mark must be
generated for non-character outputs, such as LSOutput.byteStream or
LSOutput.systemId. If the Byte Order Mark is not generated, a
"byte-order-mark-needed" warning is reported. When the encoding is
UTF-16LE or UTF-16BE, the output is big-endian (UTF-16BE) or
little-endian (UTF-16LE) and the Byte Order Mark is not generated. In
all cases, the encoding declaration, if generated, will correspond to the
encoding used during the serialization (e.g. encoding="UTF-16" will
appear if UTF-16 was requested)."
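
The BOM rule in the excerpt can be sketched roughly as follows. This is only an illustration in Python, not anything from the DOM group or libxml2; the helper name serialize_bytes is hypothetical, and the choice of little-endian for plain "UTF-16" is an assumption (the excerpt leaves byte order to the implementation).

```python
import codecs

# Explicit-endianness labels map to Python codecs that never emit a BOM.
_EXPLICIT = {"UTF-16LE": "utf-16-le", "UTF-16BE": "utf-16-be"}

def serialize_bytes(text: str, encoding: str) -> bytes:
    """Encode text for a byte output, following the BOM rule quoted above.

    Hypothetical helper for illustration only.
    """
    label = encoding.upper()
    if label == "UTF-16":
        # Byte order is implementation dependent; this sketch assumes
        # little-endian and prepends the matching Byte Order Mark.
        return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
    if label in _EXPLICIT:
        # Byte order is stated by the label, so no BOM is written.
        return text.encode(_EXPLICIT[label])
    return text.encode(encoding)
```

In both UTF-16 cases the encoding declaration would carry the requested label unchanged; only the presence of the BOM differs.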

of a string containing a serialized document to be different of the
real encoding of the document for braindead interface decision is
where the stupidity lies. That's what must be fixed.

Hmm, Daniel, I guess you don't like people calling your 
implementation "braindead", and I guess the DOM people don't like it either.

 If DOM3 is stupid, get it fixed or don't use it, what else can I say ?

:-) you know it's not *that* easy...

  It is !
  DOM 3 is not a REC, they are in the Candidate Recommendation phase,
asking for implementor feedback. 
  Stating that writeToString breaks normal XML serialization and makes it
a serious implementation problem is correct feedback and they will have to
reply to it before making progress !

 Except the deadline is over :-(
   "implementation feedbacks are welcome until 30 November 2003"

So in the end it was me who was braindead in thinking that the 
declaration has to be different from UTF-16 ;-)
All I have to do is to serialize to UTF-16 and strip off the BOM, since 
the DOMString has no BOM.
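
A minimal sketch of that last step, in Python rather than against the libxml2 API: encode to UTF-16, then drop a leading BOM if one is present. The function name to_dom_string_units is hypothetical; the only library behaviour relied on is that Python's "utf-16" codec prepends a BOM in the platform's native byte order.

```python
import codecs

def to_dom_string_units(xml_text: str) -> bytes:
    """Return the UTF-16 code units of xml_text without a leading BOM.

    Hypothetical helper for illustration only.
    """
    # Python's "utf-16" codec writes a BOM first, in native byte order.
    data = xml_text.encode("utf-16")
    for bom in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
        if data.startswith(bom):
            # Strip exactly one BOM; the rest is the DOMString payload.
            return data[len(bom):]
    return data

doc = '<?xml version="1.0" encoding="UTF-16"?><a/>'
units = to_dom_string_units(doc)
```

The declaration inside the string still says UTF-16, which matches the answer from the DOM group; only the BOM is gone.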


