Re: [xml] Problem with CDATA entities



On Wed, May 18, 2005 at 10:16:44PM +0100, Nic Ferrier wrote:
I'm having a problem with CDATA entities. You can see the same problem
by doing this:

  xmllint 'http://www.oreillynet.com/meerkat/?_fl=rss10&t=ALL&c=5136'

In other words download the O'Reilly ONJAVA RSS feed. This feed uses
an HTML DTD include like this:

  <!DOCTYPE rdf:RDF [
  <!ENTITY % HTMLlat1 PUBLIC
     "-//W3C//ENTITIES Latin1//EN//HTML"
     "http://www.w3.org/TR/PR-html40/HTMLlat1.ent">
  %HTMLlat1;
  ]>

  Seems they seriously lack a QA department there.

The w3 dtd has this in it:

  <!ENTITY nbsp   CDATA "&#160;" -- no-break space = non-breaking space, 
                                    U+00A0 ISOnum -->
[...]
And the error from xmllint one gets is related directly to this:

  http://www.w3.org/TR/html40/HTMLlat1.ent:12: parser error : Entity value required

  Your XML file references an SGML DTD fragment, which has a different syntax.
As a result your XML is not an XML file: it is not well formed, but only
a validating XML parser fetching the external subset can detect it.
  As far as I can tell xmllint is right and the error message is quite accurate.
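
For readers who want to reproduce the well-formedness point without fetching the feed, here is a minimal sketch. It makes several assumptions: it uses Python's standard-library expat-based parser as a stand-in rather than libxml2, it inlines the declaration into the internal subset instead of loading it from the external .ent file, and the root element name `r` and the helper `try_parse` are made up for the illustration. The error text will therefore differ from the xmllint message quoted above.

```python
import xml.etree.ElementTree as ET

# SGML-style declaration, as found in the HTML 4 entity set.
# Inlined into the internal subset here (an assumption for this sketch).
SGML_STYLE = '<!DOCTYPE r [<!ENTITY nbsp CDATA "&#160;">]><r>a&nbsp;b</r>'

# The same entity declared with XML syntax: name followed directly by the value.
XML_STYLE = '<!DOCTYPE r [<!ENTITY nbsp "&#160;">]><r>a&nbsp;b</r>'


def try_parse(doc):
    """Return the root element's text, or None if the document is rejected."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        # The parser refuses the document as not well formed.
        return None


if __name__ == "__main__":
    print(repr(try_parse(XML_STYLE)))   # entity expands to U+00A0
    print(repr(try_parse(SGML_STYLE)))  # rejected by the parser
```

The only difference between the two documents is the `CDATA` keyword, which is enough for an XML parser to give up on the declaration.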


Interestingly, this:

  http://www.flightlab.com/~joe/sgml/cdata.html

suggests that there is common confusion about CDATA entities.

  In SGML! You are using an XML parser. Show me how you generate

   <!ENTITY nbsp   CDATA "&#160;" 

from the production [70] of
  http://www.w3.org/TR/REC-xml/#NT-EntityDecl
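
To make the grammar argument concrete, the sketch below approximates the general-entity branch of that production (GEDecl, with EntityDef being either a quoted EntityValue or an ExternalID with an optional NDATA part) as a regular expression. It is a rough illustration only: names are restricted to ASCII, the contents of quoted literals are not checked, parameter-entity declarations are ignored, and `looks_like_gedecl` is a hypothetical helper, not part of any library.

```python
import re

# ASCII-only approximation of the XML Name production.
NAME = r"[A-Za-z_:][A-Za-z0-9._:-]*"
# A quoted literal; its contents are deliberately not validated here.
LITERAL = r"""(?:"[^"]*"|'[^']*')"""
# ExternalID: SYSTEM with one literal, or PUBLIC with two.
EXTERNAL_ID = rf"(?:SYSTEM\s+{LITERAL}|PUBLIC\s+{LITERAL}\s+{LITERAL})"
# Optional NDATA declaration after an ExternalID.
NDATA = rf"(?:\s+NDATA\s+{NAME})"

# '<!ENTITY' S Name S EntityDef S? '>'
GEDECL = re.compile(
    rf"<!ENTITY\s+{NAME}\s+(?:{LITERAL}|{EXTERNAL_ID}{NDATA}?)\s*>"
)


def looks_like_gedecl(decl):
    """True if decl fits the simplified general-entity grammar above."""
    return GEDECL.fullmatch(decl) is not None


if __name__ == "__main__":
    print(looks_like_gedecl('<!ENTITY nbsp "&#160;">'))        # XML form
    print(looks_like_gedecl('<!ENTITY nbsp CDATA "&#160;">'))  # SGML form
```

After the entity name the grammar only allows a quoted value or a `SYSTEM`/`PUBLIC` identifier, so there is no slot where a `CDATA` keyword could go.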

Seems people are so used to digesting any crap in RSS that they didn't even
manage to find this monstrosity, which any validating XML parser should show.
Blame them, not libxml2, thanks.

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


