Re: [xml] [xslt] HTML vs. XHTML: different output when including a file with \r\n [WAS: xmllint vs. xsltproc: different output when including a file with \r\n]



On 24/01/10 21:14, Boris Schaeling wrote:
On Sat, 23 Jan 2010 20:51:46 +0100, Boris Schaeling <boris highscore de>
wrote:

When I use "xmllint --xinclude" to include a text file with \r\n line
endings, xmllint inserts &#13; characters while "xsltproc
--xinclude" doesn't. I'm currently trying to find out (on the DocBook
mailing list; see
http://lists.oasis-open.org/archives/docbook/201001/msg00052.html) why
the output is different and what to do to make xmllint not generate
&#13; characters. Maybe someone here can tell me if this is a bug or
if there is a trick to change xmllint's output (I couldn't find a
command line option so far)?

As it turns out the problem is different. After some discussions on the
DocBook mailing list it is now clear that generating XHTML leads to
&#13; characters being inserted while no &#13; characters are inserted
when generating HTML. Thus whether &#13; characters are inserted depends
on the xsl:output setting of the stylesheet used with xsltproc
(please forget what I wrote about xmllint; xmllint can be ignored). Bob
Stayton explained this in his message to the DocBook mailing list:
http://lists.oasis-open.org/archives/docbook/201001/msg00065.html

The question now is why \r becomes &#13; when generating XHTML but not
when generating HTML. Do any specifications treat XHTML and HTML
differently when it comes to xincluding plain text files with
\r\n line endings?

(Cross-posting to xml gnome org)

It seems that the default behavior of libxml is to encode "\r" as "&#13;", but there is an exception for HTML in xmlEncodeEntitiesReentrant in entities.c. I haven't verified this, but from reading the source, the XHTML serialization code seems to call xmlEscapeContent in xmlIO.c. There's also xmlEscapeEntities in xmlsave.c, but that uses hex char refs. Neither of those two functions makes an exception for XHTML content.
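
To make the difference concrete, here is a minimal Python sketch of the escaping behavior described above. It is not libxml2's code: the function name escape_text and its html flag are hypothetical, and the only thing it models is the "\r" exception for HTML.

```python
# Illustrative sketch only; this is NOT libxml2's implementation.
# escape_text and its "html" flag are hypothetical names.

def escape_text(text: str, html: bool = False) -> str:
    """Escape character data for serialization.

    By default a carriage return is written as a character reference;
    when html is True the carriage return is passed through unchanged.
    """
    out = []
    for ch in text:
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        elif ch == "\r" and not html:
            # XML/XHTML path: CR becomes a numeric character reference.
            out.append("&#13;")
        else:
            out.append(ch)
    return "".join(out)


# Stand-in for a text file with \r\n line endings pulled in via XInclude.
included = "line one\r\nline two\r\n"

xhtml_out = escape_text(included)            # XML/XHTML-style output
html_out = escape_text(included, html=True)  # HTML-style output

print(repr(xhtml_out))
print(repr(html_out))
```

With this sketch, the XHTML-style result carries "&#13;" before every newline, while the HTML-style result is the included text unchanged, which matches the symptom reported in this thread.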

Personally, I think libxml shouldn't escape "\r" at all.
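
For context on what the escaping changes for a consumer of the output, a small sketch using Python's standard xml.etree.ElementTree (expat-based, not libxml2, so this is an assumption about parsers in general rather than a statement about libxml). XML 1.0 end-of-line handling has the parser normalize a literal "\r\n" to "\n", whereas a "&#13;" character reference is delivered as a real carriage return:

```python
import xml.etree.ElementTree as ET

# A literal CR LF pair in the document source is normalized by the parser.
literal_cr = ET.fromstring("<p>one\r\ntwo</p>").text

# A CR written as a character reference is not subject to that normalization.
escaped_cr = ET.fromstring("<p>one&#13;\ntwo</p>").text

print(repr(literal_cr))
print(repr(escaped_cr))
```

So an unescaped "\r" should not survive a reparse of the output, while the escaped form does; whether preserving it is desirable for XHTML is the open question here.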

Nick


