Re: [xml] Problem parsing MSWord HTML
- From: Daniel Veillard <veillard redhat com>
- To: Joachim Zobel <jz-2006 heute-morgen de>
- Cc: libxml gnome <xml gnome org>
- Subject: Re: [xml] Problem parsing MSWord HTML
- Date: Mon, 22 Feb 2010 09:01:56 +0100
On Fri, Feb 19, 2010 at 04:24:38PM +0100, Joachim Zobel wrote:
Hi.
I am trying to parse HTML generated by MS Word. Although the document
starts with
<html ... xmlns:o="urn:schemas-microsoft-com:office:office"
the parser complains about
Tag o:p invalid
when it encounters such a tag.
Why is this?
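Because you are using an HTML parser to parse what looks like XHTML,
i.e. the XML version of HTML, with what look like MS extensions. The HTML
parser only knows the HTML element names, so a prefixed name such as o:p
is reported as invalid. You could try using the XML parser instead.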
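
To illustrate the suggestion, here is a minimal sketch that feeds a small
Word-style fragment to an XML parser. It uses Python's standard-library
xml.etree.ElementTree as a stand-in for the libxml2 XML parser, which is an
assumption made for the sake of a self-contained example. The sample markup
and the helper name find_office_paragraphs are hypothetical. It also assumes
the input is well-formed XML; real Word output often is not (unquoted
attributes, &nbsp; entities), in which case an XML parser will reject it.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the xmlns:o declaration quoted in the question.
OFFICE_NS = "urn:schemas-microsoft-com:office:office"

# Hypothetical, trimmed-down stand-in for Word's HTML output.
# &#160; is used instead of &nbsp; because &nbsp; is not defined in plain XML.
WORD_HTML = (
    '<html xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:o="urn:schemas-microsoft-com:office:office">'
    '<body><p>Hello<o:p>&#160;</o:p></p></body></html>'
)


def find_office_paragraphs(markup):
    """Parse markup as XML and return every element named p in the
    Office namespace (written o:p in the source)."""
    root = ET.fromstring(markup)
    # ElementTree addresses namespaced elements as {namespace-uri}localname.
    return root.findall(".//{%s}p" % OFFICE_NS)


matches = find_office_paragraphs(WORD_HTML)
print(len(matches), matches[0].tag)
```

Because the xmlns:o declaration binds the o prefix, an XML parser accepts
o:p as an ordinary namespaced element instead of flagging it as an unknown
HTML tag.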
Daniel
--
Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
daniel veillard com | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library http://libvirt.org/