[xml] html parsing



I have a Python program that parses a simple HTML
document. When I use htmlReadFile, it works fine.
When I use htmlReadMemory to parse the same document
from a buffer, the output contains only the DOCTYPE
declaration and none of the document content.

The document is
<html>
   <body>
       Hello
   </body>
</html>

and my output is:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">

Am I using htmlReadMemory correctly here?

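#!/usr/bin/env python

import sys

import libxml2

def main(argv):
    # Read the whole file into a string buffer.
    with open(argv[1], 'r') as f:
        html = f.read()

    # The last argument, 32, is HTML_PARSE_NOERROR (suppress error reports).

    # THIS WORKS.
    #doc = libxml2.htmlReadFile(argv[1], None, 32)

    # THIS DOES NOT WORK.
    doc = libxml2.htmlReadMemory(html, len(html), None, None, 32)

    # Serialize the parsed tree back out as HTML.
    doc.htmlDocDump(sys.stdout)

    doc.freeDoc()

if __name__ == "__main__":
    main(sys.argv)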
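
The 32 passed as the last argument to both htmlReadFile and
htmlReadMemory is a bitmask of libxml2's htmlParserOption
values, and 32 is HTML_PARSE_NOERROR. That flag suppresses error
reports, so if the parser hits a problem while reading from memory
it will not say so; passing 0 instead may be a useful first
diagnostic step. Below is a minimal stdlib sketch for decoding such
a bitmask. The flag values are a hand-copied subset of the enum in
libxml2's HTMLparser.h (not the full list), and the class and
function names are hypothetical, not part of the libxml2 bindings.

```python
from enum import IntFlag
from typing import List


# Hypothetical helper: a hand-copied SUBSET of libxml2's htmlParserOption
# values (HTMLparser.h). Not the full enum, not a libxml2 API.
class HtmlParseOption(IntFlag):
    HTML_PARSE_RECOVER = 1 << 0    # relaxed parsing
    HTML_PARSE_NOERROR = 1 << 5    # suppress error reports
    HTML_PARSE_NOWARNING = 1 << 6  # suppress warning reports
    HTML_PARSE_PEDANTIC = 1 << 7   # report pedantic errors
    HTML_PARSE_NOBLANKS = 1 << 8   # remove blank nodes
    HTML_PARSE_NONET = 1 << 11     # forbid network access


def decode_options(options: int) -> List[str]:
    """Return the names of the known flags that are set in `options`."""
    return [flag.name for flag in HtmlParseOption if flag & options]


if __name__ == "__main__":
    # The value used in the script above.
    print(decode_options(32))
```

Running it prints ['HTML_PARSE_NOERROR']. Bits outside the listed
subset are silently ignored, so an empty result does not prove the
options value is 0.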
