Re: [xml] Problem parsing MSWord HTML

On Fri, Feb 19, 2010 at 04:24:38PM +0100, Joachim Zobel wrote:

I am trying to parse HTML generated by MS Word. Although this starts
with a 

<html ... xmlns:o="urn:schemas-microsoft-com:office:office"

The parser complains about 

Tag o:p invalid

when it encounters such a tag.

Why is this?

  Because you are using an HTML parser to parse what looks like XHTML,
i.e. the XML version of HTML, with what look like MS extensions. You could
try to use the XML parser instead.
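
  As a rough illustration of the difference, here is a minimal sketch
using Python's standard xml.etree.ElementTree rather than libxml2 itself.
It assumes the input is well-formed XML, which Word output often is not.
The sample markup and the helper name office_paragraphs are hypothetical;
only the xmlns:o declaration comes from the message above. A
namespace-aware XML parser resolves the o: prefix and accepts o:p as an
ordinary element:

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the xmlns:o declaration quoted in the question.
OFFICE_NS = "urn:schemas-microsoft-com:office:office"

# Hypothetical stand-in for a Word export; real files are much larger
# and may not be well-formed XML at all.
DOC = (
    '<html xmlns:o="urn:schemas-microsoft-com:office:office">'
    "<body><p>Hello<o:p>extra</o:p></p></body>"
    "</html>"
)


def office_paragraphs(markup):
    """Return the text of every o:p element found by an XML parser."""
    root = ET.fromstring(markup)
    # ElementTree spells namespaced tags as "{namespace-uri}localname".
    return [el.text for el in root.iter("{%s}p" % OFFICE_NS)]


print(office_paragraphs(DOC))  # prints ['extra']
```

With libxml2 itself, the corresponding change would be to call the XML
entry points (for example xmlReadFile) instead of the HTML ones (for
example htmlReadFile).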


Daniel Veillard      | libxml Gnome XML XSLT toolkit
daniel veillard com  | Rpmfind RPM search engine | virtualization library

