Re: [xml] Parser error: html entities not defined
- From: "Bruno Dilly" <bruno dilly gmail com>
- To: veillard redhat com
- Cc: xml <xml gnome org>, Liam R E Quin <liam holoweb net>
- Subject: Re: [xml] Parser error: html entities not defined
- Date: Thu, 6 Sep 2007 19:22:21 -0300
Indeed, the RSS is not well-formed. Is it possible to load an external
DTD that is not included in the RSS?
For example, can I load
http://my.netscape.com/publish/formats/rss-0.91.dtd before parsing the
file? And is it possible to load it from a local file? How could I do it?
Thanks a lot
On 9/4/07, Daniel Veillard <veillard redhat com> wrote:
On Tue, Sep 04, 2007 at 10:19:18AM -0400, Liam R E Quin wrote:
On Tue, 2007-09-04 at 07:01 -0400, Daniel Veillard wrote:
On Tue, Sep 04, 2007 at 06:39:01AM -0300, Bruno Dilly wrote:
Hi people,
I'm trying to parse RSS with HTML entities, but I'm getting the
following errors when it tries to parse the RSS file:
Entity 'ntilde' not defined;
Entity 'iacute' not defined;
[...]
Are the HTML entities defined in the RSS DTD? If yes, then you
need to ask the parser to load the DTD. If no, then using them there is an error.
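A minimal sketch of the two cases, for illustration only. It uses Python's standard xml.etree.ElementTree (which is built on expat, not libxml2), and the feed fragments and the helper name title_or_error are made up for this example. It only shows that an entity reference is rejected when nothing declares it and accepted once a DTD subset declares it; here the declaration sits in an internal subset, whereas in libxml2 an external DTD is only read if loading is requested (the XML_PARSE_DTDLOAD parser option).

```python
import xml.etree.ElementTree as ET

# Hypothetical feed fragment that uses an HTML entity without declaring it.
UNDECLARED = "<rss><channel><title>Espa&ntilde;a</title></channel></rss>"

# Same fragment, with the entity declared in an internal DTD subset.
DECLARED = (
    '<!DOCTYPE rss [<!ENTITY ntilde "&#241;">]>'
    "<rss><channel><title>Espa&ntilde;a</title></channel></rss>"
)


def title_or_error(xml_text):
    """Return the channel title, or the parser's error message on failure."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return "error: " + str(exc)
    return root.findtext("channel/title")


# ascii() keeps the output printable on any terminal encoding.
print(ascii(title_or_error(UNDECLARED)))
print(ascii(title_or_error(DECLARED)))
```

The first call reports a parse error about the undefined entity; the second returns the title with the entity expanded.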
It's worse than that :-)
RSS requires HTML markup to be escaped in descriptions, so you have
to write things like
&amp;ntilde;
and the same for elements, &lt;i&gt;...&lt;/i&gt; to get <i>...</i> into
an RSS feed.
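As a rough sketch of that escaping step, assuming the feed is generated with Python's standard library: xml.sax.saxutils.escape turns & into &amp;, < into &lt; and > into &gt;, so an HTML fragment can be placed inside an element as plain text. The helper name description_element and the sample fragment are invented for this example.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def description_element(html_fragment):
    """Wrap an HTML fragment in <description>, escaping &, < and >."""
    return "<description>" + escape(html_fragment) + "</description>"


html_fragment = "Espa&ntilde;a in <i>italics</i>"
xml_text = description_element(html_fragment)
print(xml_text)

# An XML parser hands back the original HTML text unchanged,
# ready to be passed on to an HTML renderer.
assert ET.fromstring(xml_text).text == html_fragment
```

With the markup escaped this way, the XML parser never sees the HTML entity reference, so no DTD is needed to resolve it.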
A lot of RSS feeds are invalid.
By default the libxml2 XML parser does not fetch the external subset of DTDs.
So the undeclared entities are an error but not a fatal error (since of
course they never set standalone="yes"). The fact that the feeds are invalid
doesn't bother me that much, but it seems in practice they are often not
well-formed, and a lot of RSS readers work around this by not using XML
parsers at all. One thing is sure, I won't encourage this trend...
Daniel
--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard | virtualization library http://libvirt.org/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/