Re: [xml] xmllint and HTML



On Fri, Oct 31, 2003 at 01:00:41AM +0000, Nick Kew wrote:
On Thu, 30 Oct 2003, Daniel Veillard wrote:
  That's a BAD idea !!!  Suppose the libxml2 HTML parser sees

  <p>
  <foo>

Does <foo> close <p>? Is <foo> itself expected to be closed?

That would be defined by DTD.  

  SGML DTDs are too complex to be supported by libxml2, so this won't work.

Or it could be built in to the processor,
as HTML4 is in libxml's htmlParser.

  I don't think augmenting the HTML parser with random vocabularies
makes any sense either. XML was defined to do this precisely because
it was too hard within the HTML SGML framework.

It's entirely possible to use <foo/> with htmlParser and get meaningful
behaviour.  The parser will, by default, generate the SAX events as-if
it were XML.

  No, sorry, there is still the uncertainty of the autoclosure. Do you get
    startElement(p)
    endElement(p)
    startElement(foo)
or
    startElement(p)
    startElement(foo)
 
  It is not meaningful to act on a SAX stream without processing endElement().
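
The two event streams above can be reproduced with a toy model. This is only a sketch and not libxml2's htmlParser: the BLOCK and INLINE tables, the sax_events() function and the unknown_is_block flag are all hypothetical names invented for illustration. It shows that for a tag outside the built-in vocabulary, the output depends entirely on a guess.

```python
# Toy model of HTML-style autoclosure. NOT libxml2 code; every name here
# is an assumption made for illustration.
BLOCK = {"p", "div", "ul", "table"}   # start tag implicitly ends an open <p>
INLINE = {"b", "i", "em", "span"}     # start tag nests inside an open <p>


def sax_events(start_tags, unknown_is_block):
    """Return SAX-style events for a sequence of start tags.

    unknown_is_block is the guess applied to tags found in neither table.
    """
    events = []
    open_elems = []
    for tag in start_tags:
        if tag in BLOCK:
            is_block = True
        elif tag in INLINE:
            is_block = False
        else:
            is_block = unknown_is_block  # the parser has no rule, so it guesses
        if is_block and open_elems and open_elems[-1] == "p":
            events.append("endElement(p)")
            open_elems.pop()
        events.append(f"startElement({tag})")
        open_elems.append(tag)
    return events


if __name__ == "__main__":
    for guess in (True, False):
        print(guess, sax_events(["p", "foo"], guess))
```

With the same input <p><foo>, one guess yields the first stream and the other guess yields the second, so a consumer cannot rely on either.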

If you want to extend the base syntax *use XML* ! It was designed
precisely to overcome the limitation that an SGML HTML parser has.
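
For contrast, here is a minimal sketch of the same fragment written as XML and fed to a SAX parser. It uses Python's standard xml.sax module rather than libxml2; the Recorder class and the record() helper are hypothetical names for this example. Every element is closed explicitly, so there is only one possible event stream, and an unclosed element is rejected instead of being guessed at.

```python
import xml.sax


class Recorder(xml.sax.ContentHandler):
    """Collect start/end element events as strings (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(f"startElement({name})")

    def endElement(self, name):
        self.events.append(f"endElement({name})")


def record(doc):
    """Parse a bytes document and return the recorded events."""
    handler = Recorder()
    xml.sax.parseString(doc, handler)
    return handler.events


if __name__ == "__main__":
    print(record(b"<p><foo/></p>"))
    try:
        record(b"<p><foo>")
    except xml.sax.SAXParseException as exc:
        print("not well-formed:", exc.getMessage())
```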

On the contrary, an SGML parser has fewer limitations, due to the
far greater flexibility and expressiveness of the language.  For

  And there are 2 conformant parsers in the world, the DoD one and
James Clark's. Use SGML if you want, but it's not a libxml2 topic!

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


