Re: [xml] XML/HTML Mixed mode parsing

From: Daniel Veillard <veillard redhat com>
To: GPN <gpn libxml gmail com>
Cc: xml gnome org
Subject: Re: [xml] XML/HTML Mixed mode parsing
Date: Mon, 26 Sep 2005 11:15:48 -0400

On Mon, Sep 26, 2005 at 08:30:13PM +0530, GPN wrote:

Daniel Veillard wrote:

and hence most browsers do not complain about a page even
if it has errors. (This can be turned on though, but the
page display does not stop if there was an error).



  Right, that's how browsers interpret HTML 4.x, which is based on SGML,
when it is served with a text/html MIME type. If the page has an XML MIME
type, they must use a real XML parser and fail on fatal errors.
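
To illustrate the rule above, here is a minimal sketch that picks a parser based on the MIME type. It uses Python's standard library only as a stand-in, not libxml2; the helper names (parse_by_mime, TagCollector) and the list of XML MIME types are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Assumed set of XML MIME types for this sketch; anything ending in "+xml"
# is also treated as XML below.
XML_MIME_TYPES = {"text/xml", "application/xml", "application/xhtml+xml"}


class TagCollector(HTMLParser):
    """Lenient HTML parsing: record start tags, never raise on bad markup."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_by_mime(content, mime_type):
    """Return (status, detail); status is "ok" or "fatal"."""
    if mime_type in XML_MIME_TYPES or mime_type.endswith("+xml"):
        # XML MIME type: a well-formedness error is fatal.
        try:
            ET.fromstring(content)
        except ET.ParseError as err:
            return ("fatal", str(err))
        return ("ok", None)
    # text/html and anything else: keep going despite errors.
    collector = TagCollector()
    collector.feed(content)
    collector.close()
    return ("ok", collector.tags)
```

With the same broken input, for example "<p><b>unclosed</p>", the text/html path reports success and the XML path reports a fatal error.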

I am looking for a viable solution to this. I need to parse
HTML pages that contain XML content.
a) If I use an XML parser, parsing will stop as soon as there
is an error in the HTML tags.
b) If I use an HTML parser, the tag and attribute names will be converted
to lower case (breaking XML rules).
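
Both cases can be reproduced with a small sketch. It uses Python's standard library as a stand-in for the two kinds of parser, not libxml2 itself; the sample document, the my:Data element and the helper names are made up for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical page: HTML with an unclosed <br> and an embedded XML island
# whose element and attribute names are mixed case.
MIXED = (
    '<html><body><br>'
    '<my:Data xmlns:my="urn:example" ItemId="7">x</my:Data>'
    '</body></html>'
)


class NameRecorder(HTMLParser):
    """Record each start tag with its attribute names as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, [name for name, _ in attrs]))


def xml_parse_error(text):
    """Case a): return the error message if the XML parser gives up, else None."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


def html_names(text):
    """Case b): return the names as seen by the HTML parser."""
    recorder = NameRecorder()
    recorder.feed(text)
    recorder.close()
    return recorder.seen
```

Calling xml_parse_error(MIXED) returns an error message because of the unclosed <br>, while html_names(MIXED) gets through the whole page but reports my:Data as my:data and ItemId as itemid.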


  b) is a bit extreme and should probably be fixed, *but* any XML
passed through an HTML parser loses all the guarantees of portability
that drove the use of XML in the first place; this is broken. Islands
of foreign vocabularies make sense in XHTML, but not in SGML HTML.
  File a request for enhancement in bugzilla about not converting the names
to lower case; that could be added as an HTML parsing option
and is probably not too hard to implement.

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
