Re: [xml] Quoting of carriage return characters in libxml2/libxslt output



On 8 June 2011 11:47, Daniel Veillard <veillard redhat com> wrote:
On Tue, Jun 07, 2011 at 11:35:10PM +0100, Laurence Rowe wrote:
I've found that libxml2/libxslt will quote carriage return characters
in output as &#13;. When outputting (X)HTML, this causes rendering
errors in at least Firefox, as the HTML
"<pre>Line1&#13;\nLine2</pre>" is rendered differently from the HTML
"<pre>Line1\r\nLine2</pre>".

This is a problem because some of our page content originates from
browser text areas, and as such the text is submitted to the server
with CRLF line endings. We use libxslt in front of several systems,
not all of which we control. It's not really practical to change the
application behaviour here.

Is it possible to switch off this quoting behaviour on serialization?

For XML, no. The reason is explained here:

http://www.w3.org/TR/REC-xml/#sec-line-ends

If an XML parser finds \r\n in the input, it automatically removes the
first character (the \r). XHTML, being an XML language, should behave the same.
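The end-of-line handling in that section of the spec can be observed with any conforming XML parser. The sketch below uses Python's standard-library `xml.etree.ElementTree` purely as a stand-in (it is backed by expat, not libxml2): a literal CR LF in the input comes back as a bare LF, while a CR written as the character reference `&#13;` is preserved.

```python
import xml.etree.ElementTree as ET

# A literal CR LF pair inside the element's text content.
literal = "<pre>Line1\r\nLine2</pre>"
# The same content, with the CR written as a character reference.
escaped = "<pre>Line1&#13;\nLine2</pre>"

# The parser folds the literal CR LF into a single LF (XML 1.0, section 2.11)...
print(repr(ET.fromstring(literal).text))   # 'Line1\nLine2'
# ...but a CR produced by a character reference is not normalized.
print(repr(ET.fromstring(escaped).text))   # 'Line1\r\nLine2'
```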

If libxml2 sees a \r\n sequence in an XML text node, it assumes the
user wants the data back intact when the output is parsed as XML. That
is why it outputs &#13;\n: to keep the \r from being stripped when the
consuming XML parser(s) encounter the sequence.
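To see why writing `&#13;` is needed for a faithful round trip, here is a minimal sketch with two hypothetical text escapers (`escape_text` and `naive_escape` are illustrative names, not libxml2 functions). Each escapes the usual markup characters, but only `escape_text` also writes CR as `&#13;`; the output is then re-parsed, again with `xml.etree.ElementTree` standing in for the consuming XML parser.

```python
import xml.etree.ElementTree as ET

def naive_escape(s: str) -> str:
    """Hypothetical escaper: markup characters only, CR left as a literal."""
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

def escape_text(s: str) -> str:
    """Hypothetical escaper that also writes CR as &#13;, so a downstream
    parser's end-of-line normalization cannot strip it."""
    return naive_escape(s).replace("\r", "&#13;")

original = "Line1\r\nLine2"
for escaper in (naive_escape, escape_text):
    doc = "<pre>" + escaper(original) + "</pre>"
    roundtrip = ET.fromstring(doc).text
    print(escaper.__name__, roundtrip == original)
# naive_escape False
# escape_text True
```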

Maybe your data didn't come from XML parsing, but really we can't avoid
this a priori in the libxml2 serializer (otherwise we would have to extend
the xmlsave APIs to allow this specifically). In any case, XSLT output
should not go through the libxml2 serialization directly but through the
libxslt serialization functions, unless you really know what you are doing.

Thanks, that helped me understand what was going on. What I'm seeing is
the interaction between the HTMLParser and XSLT method="xml": the HTMLParser
does not perform the same '\r\n' -> '\n' substitution as the
XMLParser. I can reproduce it using xsltproc (see below).

I can perform the string replacement myself before feeding data to the
HTMLParser, though I guess it would be more efficient to make this an
HTMLParser option.
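The string replacement mentioned above can be very small. This is a sketch under the assumption that the raw text is available as a Python `str` before parsing; `normalize_newlines` is a hypothetical helper name. It applies the same rule XML 1.0 section 2.11 prescribes (CR LF becomes LF, then any remaining lone CR also becomes LF). How the result is handed to the HTMLParser depends on the binding in use and is not shown.

```python
def normalize_newlines(text: str) -> str:
    """Apply XML-style end-of-line handling to a string.

    CR LF pairs become LF first; any CR left over (a lone CR) is then
    also turned into LF, matching section 2.11 of the XML 1.0 spec.
    """
    return text.replace("\r\n", "\n").replace("\r", "\n")

# Text as submitted from a browser <textarea>, with CRLF line endings.
submitted = "Line1\r\nLine2\r\nLine3"
print(repr(normalize_newlines(submitted)))  # 'Line1\nLine2\nLine3'
```

Doing the CR LF replacement before the lone-CR replacement matters: reversing the order would turn each CR LF into two LFs.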

$ cat identity.xsl
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" omit-xml-declaration="yes"
        doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
        doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

$ cat in.html  # <pre> text is "Line1\r\nLine2"
<html>
<body>
<pre>Line1
Line2</pre>
</body>
</html>

$ xsltproc --html identity.xsl in.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><body>
<pre>Line1&#13;
Line2</pre>
</body></html>


Laurence


