
Re: [xml] XML/HTML Mixed mode parsing



On Mon, Sep 26, 2005 at 08:30:13PM +0530, GPN wrote:
> Daniel Veillard wrote:
> >>and hence most browsers do not complain about a page even
> >>if it has errors. (This can be turned on though, but the
> >>page display does not stop if there was an error).
> >
> >
> >  Right, that's how browsers interpret HTML 4.x, which is based on SGML, with
> >a text/html MIME type. If there is an XML MIME type they must
> >use a real XML parser and fail on fatal errors.
> >
> I am trying to see whether there is a viable solution for this. I need to parse
> HTML pages, which will have XML content.
> a) If I use an XML parser, then the parsing process will stop
> even if there was an error only in the HTML tags.
> b) If I use an HTML parser, then the tags/attributes will be converted
> to lower case (breaking XML rules).

  b) is a bit extreme and should probably be fixed, *but* any XML
passed through an HTML parser loses all the guarantees of portability
that led to using XML in the first place; this is broken. Islands
of foreign vocabularies in XHTML make sense, but not in SGML HTML.
  File a request for enhancement in bugzilla about not converting the names
to lower case; that could be added as an HTML parsing option
and is probably not too hard to implement.
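
  To make the a)/b) trade-off concrete, here is a minimal sketch. It uses
Python's standard-library parsers rather than libxml2, purely as an assumption
so that the example is self-contained; it only illustrates the general
behaviour (an XML parser rejecting sloppy HTML, an HTML parser folding names
to lower case) and says nothing about libxml2's own API. The names
NameCollector, html_names, xml_parses and the SAMPLE page are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class NameCollector(HTMLParser):
    """Record tag and attribute names exactly as the HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        # html.parser has already lower-cased both tag and attribute names here.
        self.tags.append(tag)
        self.attrs.extend(name for name, _ in attrs)


def html_names(markup):
    """Parse markup leniently as HTML and return (tag names, attribute names)."""
    collector = NameCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags, collector.attrs


def xml_parses(markup):
    """Return True if markup is well-formed XML, False on a fatal parse error."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical page: sloppy HTML (unclosed <p> and <br>) with an XML island
# whose element and attribute names are mixed case.
SAMPLE = (
    '<html><body><p>text<br>'
    '<my:Item xmlns:my="urn:example" itemId="1"/>'
    '</body></html>'
)

if __name__ == "__main__":
    tags, attrs = html_names(SAMPLE)
    print("HTML parser saw tags:", tags)
    print("HTML parser saw attributes:", attrs)
    print("Well-formed XML?", xml_parses(SAMPLE))
```

  The HTML pass gets through the whole page but reports the island's names in
lower case, while the XML pass preserves case only if the entire page is
well-formed, which is the dilemma described above.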

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


