[xml] Does libxml parse html like browsers?



I'm currently using TidyLib to convert the HTML from
web pages into proper XHTML for eventual parsing
into a tree, but I'm thinking about using libxml to do
this instead because a) TidyLib seems to choke on some
pages with errors and b) it might be redundant code,
since libxml already has an HTML parser.

But I need to know whether libxml will parse 'incorrect'
HTML (you know, the usual kind on the net) and build a
tree like browsers do regardless of those errors (i.e.
throwing away whatever's not valid), or whether it will
stop on errors, leaving me with no tree.

Also, is there an example specific to parsing HTML? Or
is the procedure the same as for XML, with the
xmlXXXXXXX functions replaced by their htmlXXXXXXX equivalents?


                


