Re: [xml] xmlSaveFormatFileEnc() creating invalid XML



On Fri, Sep 09, 2011 at 04:30:45PM +0200, Murray Cumming wrote:
On Fri, 2011-09-09 at 10:21 -0400, Jason Viers wrote:
On 9/9/2011 05:37, Murray Cumming wrote:
Here is a simple test case that takes the text from an apparently-valid
UTF-8 file

Not all valid UTF-8 is valid in XML.  Only a subset, as defined in
http://www.w3.org/TR/2008/REC-xml-20081126/#charsets

Note that Form Feed (0xC) is not allowed.  Your original input document 
contains a formfeed character, and this is what ends up being invalid.  
It's not a matter of escaping; form feed as a literal byte, numeric 
reference, etc., is not allowed.
Stripping the form feed from the input allows it to serialize properly.
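
For illustration, stripping such characters before handing text to the
serializer could look like the sketch below. The helper name
strip_xml10_c0_controls() is hypothetical and not part of libxml2. It only
handles the C0 control bytes (anything below 0x20 other than TAB, LF and CR),
which covers the form feed case; it does not catch U+FFFE or U+FFFF and
assumes the input is otherwise well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a libxml2 API: remove, in place, the C0 control
 * bytes that the XML 1.0 Char production forbids (everything below 0x20
 * except TAB 0x09, LF 0x0A and CR 0x0D).
 * Bytes >= 0x80 are left untouched, so multi-byte UTF-8 sequences survive.
 * Returns the new length of the string. */
size_t strip_xml10_c0_controls(char *s)
{
    size_t in, out = 0;

    assert(s != NULL);
    for (in = 0; s[in] != '\0'; in++) {
        unsigned char c = (unsigned char)s[in];
        if (c < 0x20 && c != 0x09 && c != 0x0A && c != 0x0D)
            continue;               /* drop form feed (0x0C) and similar */
        s[out++] = s[in];
    }
    s[out] = '\0';
    return out;
}
```

Silently dropping bytes changes the data, so depending on the application it
may be better to reject the input or replace the character with a space.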

Ah, I didn't know that it couldn't be there even if escaped. Thanks.

Shouldn't libxml warn about that at the same time that it would escape
characters such as & and < rather than writing invalid XML?

  It's a choice: either you make all APIs validate all input strings
or you rely on the client to do it. In libxml2 I took the second path,
and that was decided 10+ years ago. The parser, on the other hand, is
strict, but that's mandatory to follow the spec.
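
A client-side check along those lines might look like the following sketch.
Both function names are hypothetical, not libxml2 APIs. It decodes a
NUL-terminated UTF-8 string and tests every code point against the Char
production of XML 1.0 (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
[#x10000-#x10FFFF]), so the caller can refuse the string before building
nodes from it.

```c
#include <stddef.h>

/* Hypothetical helper: 1 if cp matches the XML 1.0 Char production. */
int is_xml10_char(unsigned long cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

/* Hypothetical helper: return the byte offset of the first sequence that is
 * malformed UTF-8 or decodes to a code point outside Char, or -1 if the
 * whole NUL-terminated string is acceptable. */
long first_invalid_xml10_offset(const char *s)
{
    /* Smallest code point allowed for each sequence length; anything
     * below it is an overlong encoding. */
    static const unsigned long min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0;

    while (p[i] != '\0') {
        unsigned long cp;
        size_t len, k;

        if (p[i] < 0x80)                { cp = p[i];        len = 1; }
        else if ((p[i] & 0xE0) == 0xC0) { cp = p[i] & 0x1F; len = 2; }
        else if ((p[i] & 0xF0) == 0xE0) { cp = p[i] & 0x0F; len = 3; }
        else if ((p[i] & 0xF8) == 0xF0) { cp = p[i] & 0x07; len = 4; }
        else return (long)i;    /* stray continuation or bad lead byte */

        for (k = 1; k < len; k++) {
            /* A terminating NUL fails this test too, so we never read
             * past the end of the string. */
            if ((p[i + k] & 0xC0) != 0x80)
                return (long)i;
            cp = (cp << 6) | (p[i + k] & 0x3F);
        }
        if (cp < min_cp[len] || !is_xml10_char(cp))
            return (long)i;
        i += len;
    }
    return -1;
}
```

A caller would run this over each text or attribute value and only pass the
string on to the tree-building functions when the result is -1.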

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/


