[xml] html parsing

From: JASON JESSO <jesso1607 rogers com>
To: xml gnome org
Subject: [xml] html parsing
Date: Wed, 1 Dec 2004 10:30:32 -0500 (EST)

I have a Python program that parses a simple HTML
document.  When I use htmlReadFile, I have no
problem.  When I use htmlReadMemory to parse the same
document from a buffer, the output is not what I expect.

The document is
<html>
   <body>
       Hello
   </body>
</html>
 
and my output is
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">

Am I using htmlReadMemory correctly here?

#!/usr/bin/env python

import libxml2, sys

def main( argv ):
     html = open( argv[1], 'r' ).read()

     # THIS WORKS.
     #doc = libxml2.htmlReadFile( argv[1], None, 32 )

     # THIS DOES NOT WORK.
     doc = libxml2.htmlReadMemory( html, len(html), None, None, 32 )

     doc.htmlDocDump( sys.stdout )

     doc.freeDoc()

if __name__ == "__main__":
     main( sys.argv )

Follow-Ups:
- Re: [xml] html parsing
  - From: Danilo Šegan
