Re: [xml] Applying XSLT to HTML
- From: Stefan Behnel <stefan_ml behnel de>
- To: Dmitry Dzhus <mail sphinx net ru>
- Cc: xml gnome org
- Subject: Re: [xml] Applying XSLT to HTML
- Date: Mon, 02 Jul 2007 16:11:13 +0200
Dmitry Dzhus wrote:
> My aim is to apply XSLT to some HTML document (which may be broken
> just a little).
>
> I'm using standard Python libxml2/libxslt bindings.
>
> My code is:
>
> mf_extract = libxslt.parseStylesheetFile("mf-extract.xsl")
>
> doc = libxml2.readHtmlFile(url, None, libxml2.HTML_PARSE_RECOVER)
>
> mf_extract.applyStylesheet(doc, None)
>
> Applying XSLT results as if there were no content in `doc` tree at
> all. Using `readFile` instead of `readHtmlFile` works fine as
> expected.
>
> I tried to `print doc` after using both `readHtmlFile` and `readFile`
> and noticed that, given the input document is well-formed, the output
> differs only in XML declaration at the very beginning.
>
> As I understand (and `document.type` indicates), using `readFile` and
> `readHtmlFile` results in different kinds of documents --
> `document_xml` and `document_html` -- while applying XSLT is only
> possible with `document_xml` one. Is there any way to convert
> `document_html` to `document_xml`?
Consider using lxml.
http://codespeak.net/lxml/
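Untested:

import lxml.etree as et

parser = et.HTMLParser()
doc = et.parse(url, parser)

# If the stylesheet matches elements in the XHTML namespace, move the
# namespace-less elements produced by the HTML parser into it first.
for el in doc.iter("*"):
    if '{' not in el.tag:
        el.tag = "{http://www.w3.org/1999/xhtml}" + el.tag

# The transform returns a new result tree, so keep a reference to it.
result = doc.xslt(et.parse("mf-extract.xsl"))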
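
For reference, here is a self-contained sketch of the same idea that runs without a URL or a stylesheet file. The sample HTML, the inline stylesheet, and the helper names `to_xhtml_namespace` and `transform_html` are all made up for illustration, and it assumes the stylesheet addresses elements in the XHTML namespace, which may not be true of mf-extract.xsl.

```python
from io import StringIO

import lxml.etree as et

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical, slightly broken HTML input (unclosed <p> and <li> tags).
BROKEN_HTML = """<html><body>
<p class="fn">Name
<ul><li>one<li>two</ul>
</body></html>"""

# Hypothetical stylesheet that only matches XHTML-namespaced elements.
STYLESHEET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//h:li">
      <xsl:value-of select="."/>
      <xsl:text>;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>"""


def to_xhtml_namespace(tree):
    """Move every namespace-less element of tree into the XHTML namespace."""
    for el in tree.iter("*"):  # "*" yields elements only, no comments or PIs
        if "{" not in el.tag:
            el.tag = "{%s}%s" % (XHTML_NS, el.tag)
    return tree


def transform_html(html_text, stylesheet_text):
    """Parse lenient HTML, namespace it, apply the stylesheet, return text."""
    doc = et.parse(StringIO(html_text), et.HTMLParser())
    to_xhtml_namespace(doc)
    transform = et.XSLT(et.XML(stylesheet_text))
    return str(transform(doc))


output = transform_html(BROKEN_HTML, STYLESHEET)
print(output)
```

The renaming step has to happen before the transform is applied. If the stylesheet matches plain, namespace-less element names instead, the `to_xhtml_namespace` call can simply be dropped.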
Stefan