Re: [xml] XML/HTML Mixed mode parsing



Daniel Veillard wrote:
On Mon, Sep 26, 2005 at 07:42:40PM +0530, GPN wrote:

Hello,
Today's web page designs include both XML and HTML content,
most commonly XML content embedded into HTML pages.

My questions are -
1. In order to handle both html and xml, it is better to use
the xml...() APIs rather than the html...() APIs.
Is this correct?


Either it is a complete XML document or it is not. If it is, then use an XML parser. There is no intermediate or compound
level at the spec level.


2. Handling errors encountered while parsing the content.
I think (or am assuming) that HTML errors are to be ignored,


  no.


and hence most browsers do not complain about a page even
if it has errors. (This can be turned on though, but the
page display does not stop if there was an error).


  Right, that's how browsers interpret HTML 4.x, which is based on SGML, when it is served with
a text/html MIME type. If there is an XML MIME type, they must use a real XML parser and fail on fatal errors.
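
A rough illustration of that dispatch rule, written with Python's standard library rather than libxml2 (so the parsers here are only stand-ins, and details of error recovery will differ). The names `XML_MIME_TYPES`, `TagCollector` and `parse_by_mime` are hypothetical, made up for this sketch:

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Assumed set of MIME types that should trigger strict XML parsing.
XML_MIME_TYPES = {"text/xml", "application/xml", "application/xhtml+xml"}


class TagCollector(HTMLParser):
    """Lenient HTML parser that just records the start tags it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_by_mime(content, mime_type):
    """Return element names, choosing the parser from the MIME type."""
    if mime_type in XML_MIME_TYPES:
        # Real XML parser: raises ET.ParseError on a well-formedness error.
        root = ET.fromstring(content)
        return [el.tag for el in root.iter()]
    # text/html: tolerant parsing, markup errors do not stop processing.
    collector = TagCollector()
    collector.feed(content)
    collector.close()
    return collector.tags


broken = "<html><body><p>unclosed<br></body></html>"
print(parse_by_mime(broken, "text/html"))
try:
    parse_by_mime(broken, "application/xhtml+xml")
except ET.ParseError as err:
    print("fatal XML error:", err)
```

The same broken markup is accepted on the text/html path and rejected on the XML path, which is the split described above: the MIME type decides which rules apply, and there is no mixed mode in between.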

I am trying to find out whether there is a viable solution for this. I need to parse
HTML pages, which will have XML content.
a) If I use an XML parser, then the parsing process will stop
even if there is an error in the HTML tags.
b) If I use an HTML parser, then the tags/attributes will be converted
to lower case (breaking XML rules).

Best Regards,
GPN
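
The case-folding concern in point b) can be seen in a small sketch. It uses Python's standard-library parsers as an analogy, not the libxml2 html...() and xml...() functions themselves, and the helper names `NameRecorder`, `html_names` and `xml_names` are hypothetical:

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class NameRecorder(HTMLParser):
    """Records (tag name, attribute names) for every start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, [name for name, _ in attrs]))


def html_names(markup):
    # HTML is case-insensitive, so this parser lowercases tag and attribute names.
    recorder = NameRecorder()
    recorder.feed(markup)
    recorder.close()
    return recorder.seen


def xml_names(markup):
    # XML is case-sensitive, so names come back exactly as written.
    root = ET.fromstring(markup)
    return [(el.tag, list(el.attrib)) for el in root.iter()]


sample = '<Order ID="7"><lineItem SKU="a1"/></Order>'
print(html_names(sample))
print(xml_names(sample))
```

With this sample the HTML path reports `order`/`id` and `lineitem`/`sku`, while the XML path keeps `Order`/`ID` and `lineItem`/`SKU`, so an embedded XML vocabulary that relies on case would not survive the HTML path unchanged.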


