Re: [xml] xmlEncodeSpecialChars and carriage return / CRLF / 0x0D 0x0A / \r\n / 13, 10



On Thu, Aug 09, 2007 at 11:54:21AM +0400, SABROG wrote:
Converting string contains "\n" with isolat1ToUTF8(...) don't help, i still not see "&#10;",
just LF. How i can write RAW string without checks and etc ?

  I don't see the relationship with isolat1ToUTF8, which is an encoding
converter.

  From an XML perspective, escaping code point 10 is needed only in attribute
values, because of the rules I pointed to in the XML spec. You need it to avoid
the attribute-value normalization that the XML parser applies there. For
values in element content there is no need for that escaping, so libxml2
does not do it:

paphio:~/XML -> cat test.xml
<foo attr="a&#10;b">a&#10;b</foo>
paphio:~/XML -> xmllint test.xml
<?xml version="1.0"?>
<foo attr="a&#10;b">a
b</foo>
paphio:~/XML -> 

  Any application using a compliant XML parser *MUST* see exactly the same
input if it receives:

<foo>a&#10;b</foo>

and

<foo>a
b</foo>

  The two content strings must be indistinguishable after having gone through
an XML parser. If your application behaves differently, that means it's not
really XML compliant, and I'm afraid there is nothing libxml2 should do to
cope with this.
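
  The equivalence can be checked with any conforming parser. The sketch
below is only an illustration and uses Python's standard-library
xml.etree.ElementTree (an assumption made for convenience, not libxml2 and
not something from this thread) to parse both forms and compare the
resulting text:

```python
import xml.etree.ElementTree as ET

# The same element content, written once with a character reference
# and once with a literal line feed.
escaped = ET.fromstring("<foo>a&#10;b</foo>")
literal = ET.fromstring("<foo>a\nb</foo>")

# After parsing, both elements carry the same three-character string.
print(repr(escaped.text))
print(repr(literal.text))
print(escaped.text == literal.text)
```

  Since the application only ever sees the parsed string, it has no way to
tell which of the two spellings was in the file.
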
  If the character with code point 10 occurs in an attribute value, then that's
a completely different story, because text strings are processed differently
there and preserving &#10; is needed when serializing. This is tested
in the libxml2 regression test test/att3 as part of the test suite. See

  http://www.w3.org/TR/REC-xml/#AVNormalize
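
  As a rough illustration of that normalization rule, the sketch below again
uses Python's standard-library xml.etree.ElementTree as a stand-in (an
assumption, not libxml2 itself). It parses an attribute containing a literal
line feed and one containing the character reference, then serializes the
second one back out:

```python
import xml.etree.ElementTree as ET

# A literal line feed inside an attribute value is normalized to a space
# by the parser.
literal = ET.fromstring('<foo attr="a\nb"/>')
# A character reference is not normalized and yields a real line feed.
escaped = ET.fromstring('<foo attr="a&#10;b"/>')

print(repr(literal.get("attr")))
print(repr(escaped.get("attr")))

# When serializing, the line feed must be written as a reference again,
# otherwise the next parser would turn it into a space.
round_trip = ET.tostring(escaped, encoding="unicode")
print(round_trip)
reparsed = ET.fromstring(round_trip)
print(reparsed.get("attr") == escaped.get("attr"))
```

  This is why a serializer has to keep &#10; in attribute values while
leaving a plain line feed in element content alone.
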

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/


