Re: [xml] Bug#500015: Cannot parse feed containing SOH character
- From: Daniel Veillard <veillard redhat com>
- To: xml gnome org
- Subject: Re: [xml] Bug#500015: Cannot parse feed containing SOH character
- Date: Fri, 3 Oct 2008 14:40:54 +0200
On Thu, Sep 25, 2008 at 10:43:41PM +0200, Mike Hommey wrote:
> Hi,
>
> I got this forwarded as a wishlist bug for libxml2, but that doesn't
> sound right to me. I always thought control characters are not allowed
> in XML, though looking in the XML spec, I can't find anything
> definitive...
>
> Daniel, what do you think?
Your mail was lost among the 150+ bounce mails that accumulated on the
list within a few days. Make 100% sure your posting address is the one
you're subscribed with; at such a bounce rate I can miss valid posts in
the mass of spam and errors.
> > > As a matter of fact, the XML spec says (http://www.w3.org/TR/REC-xml/#dt-character)
> > > that
> > >
> > > Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
> > >
> > > so #x1 (SOH) is not a valid char for an XML document.
That's correct
> > I don't think this is a correct inference. In
> > http://www.w3.org/TR/REC-xml/#charsets, it says
> >
> > Consequently, XML processors MUST accept any character in the range
> > specified for Char.
> >
> > Character Range
> >
> > [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
> >     /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
> >
> > but it doesn't specify that it must accept *only* characters in that
> > range. In fact, the next paragraph states
> >
> > All XML processors MUST accept the UTF-8 and UTF-16 encodings of
> > Unicode 3.1 ...
> >
> > In http://www.unicode.org/Public/3.1-Update/UnicodeData-3.1.0.txt, the
> > list of Unicode 3.1 characters, the SOH character is the second entry.
That's bull...t
The allowed set of characters is enumerated in the Char production, it's
that simple. Put a character outside that range in the document (whatever
the encoding used) and the processor MUST consider this a fatal error,
raise it to the application and stop passing data to the application from
that point in the document.
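
For illustration only, the Char production quoted above can be written
as a plain range check on Unicode code points. This is a minimal sketch,
not libxml2's actual code: the names is_xml_char and
first_invalid_xml_char are hypothetical, and the input is assumed to be
already decoded into code points.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if cp matches the XML 1.0 Char production
 * #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] */
bool is_xml_char(uint32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

/* Hypothetical helper: index of the first code point outside Char,
 * or n if every code point is allowed. A parser would report a fatal
 * error at that position and stop delivering data to the application. */
size_t first_invalid_xml_char(const uint32_t *cps, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!is_xml_char(cps[i]))
            return i;
    }
    return n;
}
```

With this check, SOH (#x1) is rejected regardless of the encoding the
document was stored in, since the test applies to the decoded code point.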
Daniel
--
Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
daniel veillard com | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library http://libvirt.org/