Re: [xml] Content normalization

Daniel Veillard wrote:
On Thu, Jul 07, 2005 at 10:57:05PM +0200, Armin Bauer wrote:

The problem where this appears is in SyncML handling. SyncML is often
used to send VCards, VEvents etc., which are required to have \r\n as a
line ending. The XML output is then converted into WBXML and sent to some
device like a mobile phone or PDA. The problem is that these devices
expect the VCards to have \r\n as line ending, and they neither replace
the entity reference nor normalize \r\n to \n.


  Well then it's not expected to be processed as XML, so why does this
concern libxml2?


The question is what to do.


  No, the question is "are those XML applications you're using or communicating
with?" If they neither do normalization nor replace the entity references,
then the answer is no. You're not manipulating XML but something which looks
like XML but isn't expected to be processed as such. I'm not fond of adding
yet another set of APIs or switches to cope with yet another pseudo-XML
parsing framework. That's why we went to errors being fatal in XML and
being stringent on spec violations. Those tools don't let you reap the
benefits of standardization; complain about them, not about libxml2
being compliant.


The option of filtering the output again seems awkward to me...


  they process XML in broken ways, that's why.


And if I understand the XML spec correctly, sending a \r\n as _output_
is considered valid (http://www.w3.org/TR/REC-xml/#NT-S) since they are
to be removed on input anyway. So it should be possible to choose
whether to escape \r or not.


  No, because *any* compliant XML parser must do the line-ending normalization
on input and replace numeric character references with the character at the
associated code point when passing data to the application. They have problems
processing XML input generated by libxml2 only because they are not compliant.
They are not XML-compliant applications, period. Sorry, you will have to cope
with this, as this has no place in an XML toolkit IMHO; or use the HTML
serializer, or serialize the trees yourself, or post-process. The added work
on your side is only due to non-compliance on theirs.
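
To illustrate the two input rules described above, here is a minimal sketch
using Python's standard xml.etree.ElementTree module, which is backed by expat
rather than libxml2. The element name "data" and the vCard-like payload are
made up for the example and do not come from this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical payloads: the element name and vCard content are illustrative only.
# Case 1: the CR is written as a numeric character reference, followed by a literal LF.
escaped = "<data>BEGIN:VCARD&#13;\nEND:VCARD</data>"
# Case 2: the CR LF pair is written literally in the document.
literal = "<data>BEGIN:VCARD\r\nEND:VCARD</data>"


def parsed_text(document: str) -> str:
    """Return the character data the parser hands to the application."""
    return ET.fromstring(document).text


escaped_text = parsed_text(escaped)
literal_text = parsed_text(literal)

# The character reference should survive as a real CR, while the literal
# CR LF pair should be normalized to a single LF.
print(repr(escaped_text))
print(repr(literal_text))
```

With a conforming parser, the first payload should reach the application with
\r\n intact and the second with only \n, which is why a serializer that wants
the \r to survive a round trip has to escape it.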


OK, sounds reasonable to me. I understand that it's better not to add
another option just to generate invalid XML.

I think the problem with mobile devices is that they parse the WBXML
directly without converting it to XML (and therefore the entity
references are never replaced). Guess I have to post-process the XML
output then :)
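
A rough sketch of what such a post-processing step might look like, written in
Python for brevity. It assumes the serializer writes the carriage return as a
character reference such as "&#13;", and that the output contains no CDATA
sections or comments where that sequence is meant literally. The function name
and sample payload are hypothetical.

```python
import re

# Matches the decimal or hexadecimal character reference for U+000D
# (carriage return), e.g. "&#13;", "&#xD;" or "&#x0d;".
CR_REFERENCE = re.compile(rb"&#(?:0*13|x0*[dD]);")


def unescape_carriage_returns(serialized: bytes) -> bytes:
    """Replace character references to CR with a raw CR byte.

    Assumption: the input is serialized XML in an ASCII-compatible encoding
    such as UTF-8, with no CDATA sections or comments containing the
    reference as literal text.
    """
    return CR_REFERENCE.sub(b"\r", serialized)


# Hypothetical serializer output carrying a vCard with escaped CRs.
sample = b"<Data>BEGIN:VCARD&#13;\nVERSION:2.1&#13;\nEND:VCARD&#13;\n</Data>"
result = unescape_carriage_returns(sample)
print(result)
```

The result is only meant for the non-compliant consumer: a compliant parser
reading it back would normalize the raw \r\n to \n, so this step would have to
run last, right before the data is handed to the WBXML conversion.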

Daniel



