Re: [xml] disabling entity replacement



On 4/18/06, Liam R E Quin <liam holoweb net> wrote:
> On Tue, 2006-04-18 at 19:43 -0400, Alex Khesin wrote:
> > I am building Atom/RSS SAX2 parser using libxml, and in order to
> > implement http://www.atomenabled.org/developers/syndication/#text for
> > type="xhtml", I need to be able to completely disable entity
> > replacement.
>
> You don't need to turn entity replacement off to read RSS or Atom.
> If you are writing RSS, you need to escape the embedded markup.
>
> Or, for the special one-time fee of only half a million dollars I'll
> come to Google and explain how XML works :-)

I knew I should not have sent this from my work email :)

OK, I really do hope I am being dumb, as I thought I too knew how XML
worked.  Only I think the Atom spec is breaking that.  Please take a
look at the spec I referenced in my email,
http://www.atomenabled.org/developers/syndication/#text

"If type="html", then this element contains entity escaped html.

<title type="html">
 AT&amp;amp;T bought &lt;b&gt;by SBC&lt;/b&gt;!
</title>

If type="xhtml", then this element contains inline xhtml, wrapped in a
div element.

<title type="xhtml">
 <div xmlns="http://www.w3.org/1999/xhtml">
   AT&amp;T bought <b>by SBC</b>!
 </div>
</title>
"

which means that in the xhtml case, the spec calls for the parser not
to do entity replacement, but to return the child nodes verbatim.  Else
what would be the difference between type="html" and type="xhtml"?
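
For comparison, here is a minimal sketch of what an ordinary XML parser
reports for the two examples quoted above.  It uses Python's standard
xml.etree.ElementTree rather than libxml, purely as an illustration; the
variable names are made up for this example and are not from the spec.

```python
import xml.etree.ElementTree as ET

# The two spec examples, with surrounding whitespace trimmed for brevity.
html_title = ET.fromstring(
    '<title type="html">AT&amp;amp;T bought &lt;b&gt;by SBC&lt;/b&gt;!</title>'
)
xhtml_title = ET.fromstring(
    '<title type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">'
    'AT&amp;T bought <b>by SBC</b>!</div></title>'
)

# type="html": the parser reports a single text node and no child elements.
# That text is itself a string of HTML markup.
html_text = html_title.text
html_children = len(html_title)

# type="xhtml": the parser reports real elements (div, b) with text around
# them, and the text has already had &amp; replaced by "&".
div = xhtml_title[0]
div_text = div.text
bold = div[0]

print(repr(html_text))
print(repr(div_text), bold.tag, repr(bold.text), repr(bold.tail))
```

So with entity replacement on, the html case yields one string of markup,
while the xhtml case yields a node tree whose text is plain "AT&T bought ".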

The spec might be broken from an XML perspective, but it is already in
the wild.  Here is a snippet from a valid Atom 1.0 feed,
http://www.intertwingly.net/blog/index.atom:

<content type="xhtml">
  ...
 <pre class="code">&lt;script src="pager.js" type="text/javascript" /&gt;</pre>

But I now know how to fix this, taking inspiration from
http://feedparser.org/ - I will re-escape the entities in the text
content when type="xhtml".  Suboptimal, but it works.
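
A rough sketch of that workaround, assuming a SAX-style handler: let the
parser replace entities as usual, then escape the character data again
while rebuilding the markup inside a type="xhtml" element.  This is
written against Python's standard xml.sax module rather than libxml, and
the XhtmlReserializer class and reserialize_xhtml function are
hypothetical names for illustration, not part of any library.

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class XhtmlReserializer(xml.sax.ContentHandler):
    """Rebuild the markup found inside elements that carry type="xhtml"."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # 0 = outside; 1 = directly inside the xhtml element
        self.parts = []   # pieces of the re-serialized markup

    def startElement(self, name, attrs):
        if self.depth:
            # Inside xhtml content: write the start tag back out.
            attr_text = "".join(
                " %s=%s" % (key, quoteattr(value))
                for key, value in attrs.items()
            )
            self.parts.append("<%s%s>" % (name, attr_text))
            self.depth += 1
        elif attrs.get("type") == "xhtml":
            self.depth = 1

    def endElement(self, name):
        if self.depth:
            self.depth -= 1
            if self.depth:
                # Still inside the xhtml element, so this was a child tag.
                self.parts.append("</%s>" % name)

    def characters(self, content):
        if self.depth:
            # The parser already turned &amp; and &lt; into & and <;
            # escape them again so the output is valid markup.
            self.parts.append(escape(content))


def reserialize_xhtml(document):
    """Return the re-escaped markup inside type="xhtml" elements."""
    handler = XhtmlReserializer()
    xml.sax.parseString(document.encode("utf-8"), handler)
    return "".join(handler.parts)


sample = (
    '<title type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">'
    'AT&amp;T bought <b>by SBC</b>!</div></title>'
)
print(reserialize_xhtml(sample))
```

The handler does not reproduce empty-element syntax or comments, so the
result is equivalent markup rather than a byte-for-byte copy of the feed.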

- Alex


