Re: [xslt] HTML vs. XHTML: different output when including a file with \r\n [WAS: xmllint vs. xsltproc: different output when including a file with \r\n]

On Mon, Jan 25, 2010 at 12:59:56PM +0100, Boris Schaeling wrote:
> On Sun, 24 Jan 2010 23:20:42 +0100, Nick Wellnhofer <wellnhofer aevum de>
> wrote:
> >[...]It seems that the default behavior of libxml is to encode
> >"\r" as "&#13;". But there is an exception for HTML in
> >xmlEncodeEntitiesReentrant in entities.c. I haven't checked, but
> >looking
> This would confirm our assumption that it's libxml which treats \r
> differently depending on the output format.
> >at the source the XHTML serialization code seems to call
> >xmlEscapeContent in xmlIO.c. There's also xmlEscapeEntities in
> >xmlsave.c but that uses hex char refs. Those two functions don't
> >make an exception for XHTML content.
> >
> >Personally, I think libxml shouldn't escape "\r" at all.
> As one function distinguishes between HTML and XHTML and the others
> escape \r I wonder what the use cases looked like. So far it would
> also make more sense to me if \r is not escaped for XHTML (at least
> one popular reading system for ePub files - which contain XHTML
> files - shows a question mark for &#13; entities).

\r is turned into \n (or, when it is followed by \n, simply discarded)
by any XML parser. As a result, if you put \r in an XML tree text node,
libxml2 *must* escape it to avoid this treatment.
This is not needed for HTML because no such rule exists for HTML parsers.
If you serialize XML this is expected to be consumed by an XML parser.
If you serialize HTML this is expected to be consumed by an HTML parser.
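
A minimal sketch of that line-ending rule, for anyone who wants to see
it in action. It uses Python's standard-library ElementTree (an
expat-based parser) as a stand-in for "any XML parser"; it is not
libxml2, and the helper name parsed_text is made up for illustration.
A literal \r in the input does not survive parsing, while the
character reference &#13; does:

```python
import xml.etree.ElementTree as ET


def parsed_text(xml_source: str) -> str:
    """Parse a tiny document and return the root element's text."""
    return ET.fromstring(xml_source).text


# Literal carriage return: the parser normalizes it to a line feed.
raw_cr = parsed_text("<p>a\rb</p>")

# Literal CR LF pair: collapsed to a single line feed, so the \r is gone.
crlf = parsed_text("<p>a\r\nb</p>")

# Escaped carriage return: the character reference is resolved after
# line-ending normalization, so the \r reaches the application intact.
escaped_cr = parsed_text("<p>a&#13;b</p>")

print(repr(raw_cr))      # 'a\nb'
print(repr(crlf))        # 'a\nb'
print(repr(escaped_cr))  # 'a\rb'
```

Only the escaped form round-trips, which is why a serializer that wants
the \r back after a reparse has to write it as a character reference.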

If you ask for something to be in an XML tree, then libxml2 will do
whatever is needed so that you get it back when you reparse the
serialized tree. If a parser chokes on &#13; in an XML document, it's
not a compliant XML parser; the problem is on the receiving end.

  Not a bug.


Daniel Veillard      | libxml Gnome XML XSLT toolkit
daniel veillard com  | Rpmfind RPM search engine | virtualization library
