[xml] Proposal to move mutation of HTML boolean attribute values out of the parser



In https://bugzilla.gnome.org/show_bug.cgi?id=611655 I reported what looked like a bug where a tag like:

  <option selected>

would be transformed in the parser to

  <option selected="selected">

This is consistent with http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.3.4.2 but that spec also warns:

   Authors should be aware that many user agents only recognize the minimized form of boolean attributes and not the full form.

Because this transformation is made in the parser, it is not possible to use libxml2 to process HTML without potentially breaking behavior in those browsers.


Currently this transformation is implemented in HTMLparser.c, in the static function htmlParseAttribute, based on htmlIsBooleanAttr(name). Daniel Veillard explains that some downstream tools expect this transformation to be done. However, I would like to propose that it be moved out of the parser and done in a later phase. Daniel suggested "the SAX2.c module building the tree". This would be OK for my purposes, as I am using my own SAX bindings and not relying on the tree-building code.
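To make the idea concrete, here is a rough sketch of what such a later phase could look like. It is not a patch against libxml2: expand_boolean_attrs, is_boolean_attr and the attribute table are hypothetical names of my own, the table is simply the HTML 4 boolean attribute list (which I assume is what htmlIsBooleanAttr checks), and I am assuming the parser would hand over a minimized attribute as a name with a NULL value in the usual NULL-terminated name/value array.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* HTML 4 boolean attributes. Assumption: this mirrors the set that
 * htmlIsBooleanAttr() recognizes; the table itself is illustrative. */
static const char *const boolean_attrs[] = {
    "checked", "compact", "declare", "defer", "disabled", "ismap",
    "multiple", "nohref", "noresize", "noshade", "nowrap", "readonly",
    "selected", NULL
};

/* Case-insensitive string equality (HTML attribute names are not
 * case-sensitive). */
static int equal_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Hypothetical stand-in for htmlIsBooleanAttr(name). */
static int is_boolean_attr(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; boolean_attrs[i] != NULL; i++) {
        if (equal_ignore_case(name, boolean_attrs[i]))
            return 1;
    }
    return 0;
}

/* Hypothetical post-parse step: atts is a NULL-terminated array of
 * name/value pairs; a NULL value means the attribute was minimized.
 * Boolean attributes without a value get their own name as the value.
 * Returns the number of attributes expanded. */
static int expand_boolean_attrs(const char **atts)
{
    int expanded = 0;

    if (atts == NULL)
        return 0;
    for (size_t i = 0; atts[i] != NULL; i += 2) {
        if (atts[i + 1] == NULL && is_boolean_attr(atts[i])) {
            atts[i + 1] = atts[i];
            expanded++;
        }
    }
    return expanded;
}
```

The tree-building code could call something like this before creating attribute nodes, so downstream tools would still see selected="selected", while a custom SAX handler that never calls it would see the minimized form exactly as written in the source.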

So I'm proposing this change to see if there are objections.

Thanks!
-Josh


