Re: [xml] When will you support xml version 1.1?



Daniel Veillard wrote:
On Wed, Jun 06, 2007 at 11:35:09AM +0200, Oliver Meyer wrote:
Hi everybody,

In XML 1.1 you are allowed to have, e.g., &#7; as an attribute value. My xmllint does not support that version.
Are you planning to support XML 1.1?

Kind Regards,
Oliver

foo.xml=

   <?xml version = "1.1" encoding = "UTF-8"?>
   <foo a= '&#7;'/>

And what is the meaning of that &#7;?

BEL? I don't care :-)

And what is the _meaning_ of &#65;?

It's ASCII.
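To make the &#7; versus &#65; exchange concrete, here is a minimal check using Python's standard-library ElementTree, which is backed by Expat, an XML 1.0 parser (so the 1.1 declaration is left out). The helper name `parses` is hypothetical and exists only for illustration.

```python
import xml.etree.ElementTree as ET

def parses(doc: str) -> bool:
    """Return True if the XML 1.0 parser accepts doc as well-formed."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

print(parses("<foo a='&#65;'/>"))  # True: &#65; is 'A', a legal XML 1.0 Char
print(parses("<foo a='&#7;'/>"))   # False: BEL is not a legal XML 1.0 Char
```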

My point on the subject is the following:
  - 1.1 allows dumping invalid content unchecked from a database without
    worrying about semantics. Does this help interoperability? No,
    clean up your databases.
  - Also note that 1.1 rejects documents which are well-formed from
    a 1.0 perspective: see the production RestrictedChar. The code points
    [#x7F-#x84] | [#x86-#x9F], which used to be allowed as-is in 1.0,
    will now raise a well-formedness error unless they are written as
    character references.
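The difference between the two versions can be sketched by transcribing the Char productions of XML 1.0 and XML 1.1, plus the 1.1 RestrictedChar production, into range tables. This is a minimal sketch, not libxml2 code; the names `allowed` and `in_ranges` are hypothetical. It shows that BEL (#x7) is never legal in 1.0, is legal in 1.1 only as a character reference, and that #x80 flips from legal-as-is in 1.0 to reference-only in 1.1.

```python
# Ranges transcribed from the XML 1.0 and XML 1.1 Char productions
# and the XML 1.1 RestrictedChar production (inclusive bounds).
XML10_CHAR = [(0x9, 0x9), (0xA, 0xA), (0xD, 0xD), (0x20, 0xD7FF),
              (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]
XML11_CHAR = [(0x1, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]
XML11_RESTRICTED = [(0x1, 0x8), (0xB, 0xC), (0xE, 0x1F),
                    (0x7F, 0x84), (0x86, 0x9F)]

def in_ranges(cp: int, ranges) -> bool:
    return any(lo <= cp <= hi for lo, hi in ranges)

def allowed(cp: int, version: str, literal: bool) -> bool:
    """Return True if code point cp may appear in a document of the given
    XML version, either literally (literal=True) or as a &#N; reference."""
    if version == "1.0":
        # XML 1.0 applies the same Char production to both forms.
        return in_ranges(cp, XML10_CHAR)
    if version == "1.1":
        if not in_ranges(cp, XML11_CHAR):
            return False
        # RestrictedChar is only permitted as a character reference.
        return not (literal and in_ranges(cp, XML11_RESTRICTED))
    raise ValueError(f"unknown XML version: {version}")

for cp in (0x7, 0x41, 0x80):
    print(hex(cp),
          "1.0:", allowed(cp, "1.0", True),
          "1.1 literal:", allowed(cp, "1.1", True),
          "1.1 ref:", allowed(cp, "1.1", False))
```

Code point 0 fails every check, matching the fact that even 1.1 does not allow it.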

I am part of the Working Group which created XML 1.1. There were good intentions
behind it, like cleanup with respect to Unicode, but some big vendors also pushed for
allowing characters which were, IMHO, rightfully blocked in 1.0. And it's
unfortunately not backward compatible. While I would be receptive to requests driven by the good intentions, yours
is, from my perspective, due to the fact that you don't have well-defined data
and you would like to make it 'portable'. Please clean up your data instead of passing the problem to the next person in the food chain.

  I don't see how '&#7;' could make any sense if I received it in a
text document (yes, XML is fundamentally text); maybe I need to be enlightened!
I still don't see _why_ an XML parser has to know about &#7; or &#65;.

But this was debated to death in the Working Group before; my opinion is
well set, and I prefer to protect my user base from the real use of 1.1
(and thanks to the Web gods, the request to allow code point 0 was blocked!)

  In a nutshell: no. Clean up your data, or use something else. If
you really want to send raw data, why not use binary directly? That's
just fine, but don't pretend it's a text format.
We use XML for the structure, so going somewhere else (another format, e.g. binary) would be a step back.

Our problem area has been ISO 2709 records, which are converted to MARCXML (from network sources beyond our control). Right now problematic characters, say &#7;, are just thrown away. Another option to avoid data loss would be for us to define _private_ semantics, e.g. <char num="7"/>.
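The private-element workaround could look roughly like the following sketch. It assumes the placeholder element is literally named `char` with a decimal `num` attribute, as in the example above; `encode_text`, `decode_element`, and the `<field>` wrapper are hypothetical names, not part of MARCXML or any library. Note that elements cannot appear inside attribute values, so this only covers element content.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def is_xml10_char(cp: int) -> bool:
    # The XML 1.0 Char production.
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)

def encode_text(text: str) -> str:
    """Escape text for element content, replacing every code point that
    XML 1.0 forbids with a hypothetical <char num="N"/> placeholder."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if is_xml10_char(cp):
            parts.append(escape(ch))  # escapes &, < and >
        else:
            parts.append(f'<char num="{cp}"/>')
    return "".join(parts)

def decode_element(elem) -> str:
    """Rebuild the original string from an element whose only children
    are <char> placeholders. A literal CR in the input is still subject
    to XML end-of-line normalization and will come back as LF."""
    out = [elem.text or ""]
    for child in elem:
        if child.tag == "char":
            out.append(chr(int(child.get("num"))))
        out.append(child.tail or "")
    return "".join(out)

raw = "Title\x07with BEL & <more>"
doc = "<field>" + encode_text(raw) + "</field>"
print(doc)
print(decode_element(ET.fromstring(doc)) == raw)
```

The round trip keeps the data well-formed XML 1.0, at the cost of a private convention that every consumer must know about.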

/ Adam


Daniel




