[xml] html parsing
- From: JASON JESSO <jesso1607 rogers com>
- To: xml gnome org
- Subject: [xml] html parsing
- Date: Wed, 1 Dec 2004 10:30:32 -0500 (EST)
I have a Python program that parses a simple HTML
document. When I use htmlReadFile, I have no problem.
When I use htmlReadMemory to parse the same document
from a buffer, it does not work as I would expect.
The document is
<html>
<body>
Hello
</body>
</html>
and my output is
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
Am I using htmlReadMemory the right way here?
#!/usr/bin/env python
import libxml2, sys

def main( argv ):
    # Read the whole document into a buffer.
    html = open( argv[1], 'r' ).read()
    # Option value 32 is HTML_PARSE_NOERROR (suppress error reports).
    # THIS WORKS.
    #doc = libxml2.htmlReadFile( argv[1], None, 32 )
    # THIS DOES NOT WORK.
    doc = libxml2.htmlReadMemory( html, len(html),
                                  None, None, 32 )
    doc.htmlDocDump( sys.stdout )
    doc.freeDoc()

if __name__ == "__main__":
    main( sys.argv )