"Re: [xml] serialization to UTF-16LE"


Daniel Veillard wrote:
On Mon, Nov 03, 2003 at 01:47:48PM +0100, Kasimier Buchcik wrote:

Kasimier Buchcik wrote:

I get different serialized results for text nodes and attribute values 
when using "xmlSaveFormatFileTo" (note that I'm using values like "öäü€").

  Well, right, this looks strange, but text nodes and attribute values must
have different serialization routines anyway (line feeds need to be escaped
as character references in attribute values, for example).
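
To illustrate the line-feed case: a minimal sketch using Python's standard
xml.etree.ElementTree parser (not libxml2, so this only demonstrates the
general XML rule, not libxml2's own code path). A literal line feed in an
attribute value is normalized to a space on parsing, while one written as
a character reference survives.

```python
import xml.etree.ElementTree as ET

# A literal line feed inside an attribute value is normalized to a space
# by the parser; inside element content it is preserved as-is.
literal = ET.fromstring('<foo bar="a\nb">a\nb</foo>')

# Written as a character reference, the line feed survives in the attribute.
escaped = ET.fromstring('<foo bar="a&#10;b">a\nb</foo>')

print(repr(literal.get("bar")))  # 'a b'
print(repr(literal.text))        # 'a\nb'
print(repr(escaped.get("bar")))  # 'a\nb'
```

This is why a serializer cannot simply reuse the text-node routine for
attribute values if the document is supposed to round-trip.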

Yes, I'm aware of those special cases. I just wonder what to do with 
letters that could be serialized in a more human-readable form.

The resulting xml:

<?xml version="1.0" encoding="UTF-16LE"?>
<foo bar="&#xE4;&#xF6;&#xFC;">öäü€</foo>

Sorry, it has to be:

<?xml version="1.0" encoding="UTF-16LE"?>
<foo bar="&#xE4;&#xF6;&#xFC;&#x20AC;">öäü€</foo>

The attribute value is escaped, the text node's value is not.
Since I did not find anything in the specs stating that attribute 
values need to be escaped, even though they *could* be serialized in the 
stated encoding (UTF-16LE), I'm asking for information: did I overlook 
something in the specs, or is this unintended behaviour?
(Maybe some context is useful: I'm trying to implement (in Delphi) the 
W3C's saveToString method of the DOMSerializer interface.)
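
As a quick check that the two forms are equivalent to a parser, here is a
small sketch in Python using the standard xml.etree.ElementTree module
(again, not libxml2). The attribute name "baz" is made up for the
comparison and does not appear in the output above.

```python
import xml.etree.ElementTree as ET

# The same four characters (a-umlaut, o-umlaut, u-umlaut, euro sign),
# once as character references and once written literally.
doc = (
    '<foo bar="&#xE4;&#xF6;&#xFC;&#x20AC;" '
    'baz="\u00e4\u00f6\u00fc\u20ac"/>'
)
root = ET.fromstring(doc)

# Both attributes parse to the identical string.
same_value = root.get("bar") == root.get("baz")

# Every character here is directly representable in UTF-16LE
# (two bytes each), so no character reference is required.
encoded = root.get("baz").encode("utf-16-le")

print(same_value, len(encoded))  # True 8
```

So the escaped form is not wrong, just less readable than it needs to be
for this encoding.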

  Well, how is that really a problem? The serializations are different, 
but they have to be different anyway. I don't see why this would be
non-compliant or a problem for a DOM implementation.

Hmm, maybe this is not an issue of compliance, but of readability. 
Why use XML with special encodings if attribute values are escaped and 
consequently made non-human-readable? Ironically, one could say that the 
values of text nodes could be escaped as well, but at that point the 
encoding would begin to make no sense.
However, do you see a workaround for this inconvenience? Which 
characters *need* to be escaped and which do not? Can I hook in somewhere 
in the lib to adjust the serialization of attribute values?
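
For reference, a hand-rolled sketch of the minimal escaping I have in mind,
written in Python for brevity. This is not a libxml2 hook or API; the
function names (escape_text, escape_attr, serialize_foo) are hypothetical.
It assumes the XML 1.0 rules: "&" and "<" always need escaping, the
delimiting quote needs escaping inside attribute values, and tab, line feed
and carriage return in attribute values are written as character references
so they survive attribute-value normalization. Everything else is left as-is
because UTF-16LE can represent it directly.

```python
def escape_text(value: str) -> str:
    """Escape character data for element content."""
    # "&" must be replaced first so the other entities are not re-escaped.
    # ">" is escaped too, so the sequence "]]>" can never appear.
    # "\r" becomes a reference so it survives line-end normalization.
    return (value.replace("&", "&amp;")
                 .replace("<", "&lt;")
                 .replace(">", "&gt;")
                 .replace("\r", "&#13;"))


def escape_attr(value: str) -> str:
    """Escape a value for use inside a double-quoted attribute."""
    out = (value.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace('"', "&quot;"))
    # Literal whitespace characters would be normalized to spaces on parsing.
    for ch, ref in (("\t", "&#9;"), ("\n", "&#10;"), ("\r", "&#13;")):
        out = out.replace(ch, ref)
    return out


def serialize_foo(bar: str, text: str) -> bytes:
    """Serialize <foo bar="...">...</foo> as UTF-16LE bytes (no BOM)."""
    xml = ('<?xml version="1.0" encoding="UTF-16LE"?>\n'
           f'<foo bar="{escape_attr(bar)}">{escape_text(text)}</foo>')
    return xml.encode("utf-16-le")


# The attribute and the text node both keep their literal characters.
data = serialize_foo("\u00e4\u00f6\u00fc\u20ac", "\u00f6\u00e4\u00fc\u20ac")
print(data.decode("utf-16-le"))
```

With this, the attribute value comes out as readable as the text node,
and only the characters that really need it are escaped.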


