Re: [xml] Problem parsing MSWord HTML



On Fri, Feb 19, 2010 at 04:24:38PM +0100, Joachim Zobel wrote:
Hi.

I am trying to parse HTML generated by MS Word. Although this starts
with a 

<html ... xmlns:o="urn:schemas-microsoft-com:office:office"

The parser complains about 

Tag o:p invalid

when it encounters such a tag.

Why is this?

  Because you are using an HTML parser to parse what looks like XHTML,
i.e. the XML version of HTML, with what look like MS extensions. You could
try to use the XML parser instead.
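
  To illustrate the difference, here is a minimal sketch. It uses Python's
standard-library xml.etree.ElementTree as a stand-in for a namespace-aware
XML parser; it is not libxml2's own API. The SAMPLE string and the helper
find_office_paragraphs are hypothetical, and the sketch assumes the Word
output is well-formed XML, which real Word HTML often is not.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the xmlns:o declaration quoted above.
OFFICE_NS = "urn:schemas-microsoft-com:office:office"

# Hypothetical, well-formed stand-in for Word-generated markup.
SAMPLE = (
    '<html xmlns:o="urn:schemas-microsoft-com:office:office">'
    "<body><p>Hello<o:p>extra</o:p></p></body>"
    "</html>"
)


def find_office_paragraphs(markup):
    """Parse markup as XML and return every element named p in the office namespace."""
    root = ET.fromstring(markup)
    # An XML parser resolves the o: prefix through the xmlns:o declaration,
    # so o:p is addressed as {namespace-uri}p instead of being rejected.
    return root.findall(".//{%s}p" % OFFICE_NS)


if __name__ == "__main__":
    for elem in find_office_paragraphs(SAMPLE):
        print(elem.tag, repr(elem.text))
```

  If the xmlns:o declaration were missing, this parser would raise a
ParseError for the unbound prefix, so the declaration on the html element
is what makes o:p acceptable to an XML parser.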

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/


