Re: [xml] disabling entity replacement

From: "Alex Khesin" <alexk google com>
To: "Liam R E Quin" <liam holoweb net>, xml gnome org
Subject: Re: [xml] disabling entity replacement
Date: Wed, 19 Apr 2006 00:58:04 -0400

On 4/18/06, Liam R E Quin <liam holoweb net> wrote:

> On Tue, 2006-04-18 at 19:43 -0400, Alex Khesin wrote:
> > I am building Atom/RSS SAX2 parser using libxml, and in order to
> > implement http://www.atomenabled.org/developers/syndication/#text for
> > type="xhtml", I need to be able to completely disable entity
> > replacement.
>
> You don't need to turn entity replacement off to read RSS or Atom.
> If you are writing RSS, you need to escape the embedded markup.
>
> Or, for the special one-time fee of only half a million dollars I'll
> come to Google and explain how XML works :-)

I knew I should not have sent this from my work email :)

OK, I really do hope I am being dumb, as I thought I too knew how XML
worked.  Only I think the Atom spec is breaking that - please take a
look at the spec I referenced in my email,
http://www.atomenabled.org/developers/syndication/#text

"If type="html", then this element contains entity escaped html.

<title type="html">
 AT&amp;amp;T bought &lt;b&gt;by SBC&lt;/b&gt;!
</title>

If type="xhtml", then this element contains inline xhtml, wrapped in a
div element.

<title type="xhtml">
 <div xmlns="http://www.w3.org/1999/xhtml">
   AT&amp;T bought <b>by SBC</b>!
 </div>
</title>
"

which means that in the xhtml case, the spec calls for the parser not
to do entity replacement, but to return the child nodes verbatim.  Else
what would be the difference between type="html" and type="xhtml"?
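
For comparison, here is a minimal sketch of what a stock XML parser
reports for the two spec examples with entity replacement left on.  It
uses Python's xml.etree.ElementTree rather than libxml, and the
variable names are made up for illustration:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

html_title = ET.fromstring(
    '<title type="html">AT&amp;amp;T bought &lt;b&gt;by SBC&lt;/b&gt;!</title>'
)
xhtml_title = ET.fromstring(
    '<title type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">'
    'AT&amp;T bought <b>by SBC</b>!</div></title>'
)

# type="html": a single text node holding HTML source, no child elements.
html_text = html_title.text
html_children = list(html_title)

# type="xhtml": real child elements, with &amp; already replaced by &.
div = xhtml_title.find(XHTML_NS + "div")
div_text = div.text
bold = div.find(XHTML_NS + "b")

print(repr(html_text))
print(repr(div_text), repr(bold.text), repr(bold.tail))
```

In the html case the parser hands back a string that still has to be
parsed as HTML; in the xhtml case it hands back elements and text in
which the ampersand is no longer escaped.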

The spec might be broken, from an XML perspective, but it is already in
the wild.  Here is a snippet from a valid Atom 1.0 feed,
http://www.intertwingly.net/blog/index.atom:

<content type="xhtml">
  ...
 <pre class="code">&lt;script src="pager.js" type="text/javascript" /&gt;</pre>

But I now know how to fix this, taking inspiration from
http://feedparser.org/ - I will put the entity escapes back in when
type="xhtml".  Suboptimal, but it works.
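
A rough sketch of that idea, written against Python's xml.sax instead
of libxml's SAX2 interface.  The class name XhtmlTextHandler is
hypothetical, and this is not feedparser's actual code: it only
rebuilds the markup inside a type="xhtml" element from SAX events and
escapes the character data again on the way out.

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class XhtmlTextHandler(xml.sax.ContentHandler):
    """Rebuild the markup inside an element with type="xhtml"."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # 0 means we are outside the xhtml construct
        self.parts = []  # serialized fragments, joined at the end

    def startElement(self, name, attrs):
        if self.depth > 0:
            attr_text = "".join(
                " %s=%s" % (key, quoteattr(value))
                for key, value in attrs.items()
            )
            self.parts.append("<%s%s>" % (name, attr_text))
            self.depth += 1
        elif attrs.get("type") == "xhtml":
            self.depth = 1

    def endElement(self, name):
        if self.depth > 1:
            self.parts.append("</%s>" % name)
        if self.depth > 0:
            self.depth -= 1

    def characters(self, content):
        if self.depth > 0:
            # The parser already turned &amp; into &, so escape it again.
            self.parts.append(escape(content))


FEED = (
    '<title type="xhtml">'
    '<div xmlns="http://www.w3.org/1999/xhtml">'
    'AT&amp;T bought <b>by SBC</b>!'
    '</div></title>'
)

handler = XhtmlTextHandler()
xml.sax.parseString(FEED.encode("utf-8"), handler)
markup = "".join(handler.parts)
print(markup)
```

The rebuilt string is not guaranteed to match the source byte for byte
(attribute quoting and empty-element syntax may differ), which is part
of why this is suboptimal.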

- Alex
