Re: [xml] html parsing



John Fleck said:
I get the same result as Danilo:

[jfleck localhost htmlread]$ ./htmlread.py html.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>
      Hello
  </p></body></html>
[jfleck localhost htmlread]$ xmllint --version
xmllint: using libxml version 20615-CVS2255
   compiled with: DTDValid FTP HTTP HTML C14N Catalog XPath XPointer XInclude Iconv Unicode Regexps Automata Schemas



--
John Fleck
http://www.inkstain.net/fleck/

"In a world gone mad, what right do we have to yuk it up?"
 - Griffy

I don't know which libxml2 version Jason Jesso is using, but I do know that
on my system, using the very latest CVS, there is no problem:

bill billsuper work $ cat jesso.html
<html>
   <body>
       Hello
   </body>
</html>

bill billsuper work $ cat jesso.py
#!/usr/bin/env python
import libxml2, sys
def main( argv ):
     html = open( argv[1], 'r' ).read()
     doc = libxml2.htmlReadMemory( html, len(html), None, None, 32 )
     doc.htmlDocDump( sys.stdout )
     doc.freeDoc()
if __name__ == "__main__":
     main( sys.argv )

bill billsuper work $ ./jesso.py jesso.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>
       Hello
   </p></body></html>

bill billsuper work $ xmllint --version
xmllint: using libxml version 20616-CVS2257
   compiled with: DTDValid FTP HTTP HTML C14N Catalog XPath XPointer XInclude Iconv MemDebug Unicode Regexps Automata Schemas
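The `32` passed as the last argument to `htmlReadMemory` in jesso.py is an options bitmask. A minimal sketch in plain Python (no libxml2 needed) decodes it, assuming the bit values from the `htmlParserOption` enum in libxml2's HTMLparser.h as found in current releases; older builds such as the 2.6.x CVS versions above may define fewer flags, and only a subset is listed here. The helpers `decode_html_options` and `encode_html_options` are hypothetical, not part of libxml2.

```python
# ASSUMPTION: bit values copied from libxml2's htmlParserOption enum
# (HTMLparser.h). Only a subset of the flags is listed; this is not the
# full enum, and older libxml2 releases may lack some of these.
HTML_PARSE_OPTIONS = {
    "HTML_PARSE_RECOVER": 1 << 0,    # relaxed parsing
    "HTML_PARSE_NOERROR": 1 << 5,    # suppress error reports
    "HTML_PARSE_NOWARNING": 1 << 6,  # suppress warning reports
    "HTML_PARSE_PEDANTIC": 1 << 7,   # pedantic error reporting
    "HTML_PARSE_NOBLANKS": 1 << 8,   # remove blank nodes
    "HTML_PARSE_NONET": 1 << 11,     # forbid network access
}


def decode_html_options(mask):
    """Hypothetical helper: names of the listed flags that are set in mask."""
    return [name for name, bit in HTML_PARSE_OPTIONS.items() if mask & bit]


def encode_html_options(*names):
    """Hypothetical helper: OR the named flags together into one bitmask."""
    mask = 0
    for name in names:
        mask |= HTML_PARSE_OPTIONS[name]
    return mask


if __name__ == "__main__":
    print(decode_html_options(32))
    print(encode_html_options("HTML_PARSE_NOERROR", "HTML_PARSE_NOWARNING"))
```

Under these assumed values, 32 is just HTML_PARSE_NOERROR, which only suppresses error reports; spelling the flag out by name in one's own script makes the intent of the call easier to read than a bare number.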