Re: [xml] Quoting of carriage return characters in libxml2/libxslt output



On Tue, Jun 07, 2011 at 11:35:10PM +0100, Laurence Rowe wrote:
I've found that libxml2/libxslt will quote carriage return characters
in output as &#13;. When outputting (X)HTML this causes rendering
errors on at least Firefox, as the html:
"<pre>Line1&#13;\nLine2</pre>" is rendered differently to the html
"<pre>Line1\r\nLine2</pre>".

This is a problem because some of our page content originates from
browser text areas, and as such the text is submitted to the server
with CRLF line endings. We use libxslt in front of several systems,
not all of which we control. It's not really practical to change the
application behaviour here.

Is it possible to switch off this quoting behaviour on serialization?

  For XML, no. The reason is here:

http://www.w3.org/TR/REC-xml/#sec-line-ends

If an XML parser finds \r\n in the input, it automatically removes the
first character. XHTML, being an XML language, should behave the same.

If libxml2 sees a \r\n sequence in an XML text node, it assumes the
user wants the data back intact after the output is parsed as XML. That
is why it outputs &#13;\n: to keep the \r from being stripped when a
consuming XML parser encounters the sequence.
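
To illustrate the line-end rule above, here is a minimal sketch of the
parsing side only. It uses Python's standard xml.etree.ElementTree, which
is backed by expat rather than libxml2; that substitution is an assumption
made so the example runs without extra packages, and the helper name
parsed_text is purely illustrative. Any conforming XML parser should
normalize line ends the same way.

```python
import xml.etree.ElementTree as ET


def parsed_text(markup: str) -> str:
    """Return the text content of the root element after XML parsing."""
    return ET.fromstring(markup).text


# A literal CR LF pair in the input is normalized to a single LF
# by the parser, so the carriage return is lost.
literal = parsed_text("<pre>Line1\r\nLine2</pre>")

# A CR written as a character reference is not subject to line-end
# normalization, so the original CR LF pair comes back intact.
escaped = parsed_text("<pre>Line1&#13;\nLine2</pre>")

print(repr(literal))
print(repr(escaped))
```

The first value should come back without the carriage return and the
second with it, which is the round-trip guarantee the &#13; escaping is
meant to protect.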

Maybe your data didn't come from XML parsing, but we really can't avoid
this a priori in the libxml2 serializer (or we would have to extend the
xmlsave APIs to allow this specifically; in any case, XSLT output should
not use the libxml2 serialization directly but the libxslt one, unless
you really know what you are doing).

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/


