Re: [xml] Proposal to move mutation of HTML boolean attribute values out of the parser



On Wed, Mar 03, 2010 at 09:22:52AM -0500, Joshua Marantz wrote:
In https://bugzilla.gnome.org/show_bug.cgi?id=611655 I reported what looked
like a bug where a tag like:

  <option selected>

would be transformed in the parser to

  <option selected="selected">

This is consistent with
http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.3.4.2 but that spec also
warns:

   Authors should be aware that many user agents *only* recognize the
minimized form of boolean attributes and not the full form.

By making this transformation in the parser, it is not possible to use
libxml2 to process HTML without potentially breaking behavior in some
browsers.


Currently this transformation is implemented in HTMLparser.c in the static
function htmlParseAttribute, based on htmlIsBooleanAttr(name).  Daniel
Veillard explains that some downstream tools expect this transformation to
be done. However, I would like to propose that this transformation be moved
out of the parser and done in a later phase.  Daniel suggested "the SAX2.c
module building the tree". This would be OK for my purposes, as I am using
my own SAX bindings and not relying on the tree-building code.
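
To make the idea of a "later phase" concrete, here is a minimal sketch in C
of how a consumer of attribute events might apply the expansion itself. It
does not call libxml2. The names boolean_attrs, name_equals_nocase,
is_boolean_attr and expand_boolean_attr are hypothetical, the attribute list
is the HTML 4 boolean set as I understand it rather than libxml2's own table,
and it assumes a minimized attribute is reported with a NULL value.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Assumed list of HTML 4 boolean attributes (illustration only). */
static const char *const boolean_attrs[] = {
    "checked", "compact", "declare", "defer", "disabled", "ismap",
    "multiple", "nohref", "noresize", "noshade", "nowrap", "readonly",
    "selected"
};

/* Case-insensitive ASCII string comparison; returns 1 when equal. */
static int name_equals_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}

/* Returns 1 if name is in the assumed boolean attribute list, else 0. */
static int is_boolean_attr(const char *name)
{
    size_t i;
    size_t count = sizeof(boolean_attrs) / sizeof(boolean_attrs[0]);

    for (i = 0; i < count; i++) {
        if (name_equals_nocase(name, boolean_attrs[i]))
            return 1;
    }
    return 0;
}

/*
 * Post-parse step: if a boolean attribute arrived minimized (NULL value),
 * return the name so it serializes as name="name". Any other value,
 * including NULL for a non-boolean attribute, is passed through unchanged.
 */
static const char *expand_boolean_attr(const char *name, const char *value)
{
    assert(name != NULL);
    if (value == NULL && is_boolean_attr(name))
        return name;
    return value;
}
```

A handler that wants the full form would call expand_boolean_attr on each
name/value pair it receives. One that needs to preserve the minimized form
for the sake of older browsers would skip the call and keep the NULL.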

So I'm proposing this change to see if there are objections.

  Okay, I just did this as I think this makes sense overall. Change is
in git head.

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/


