Re: [xml] html parsing
- From: "William M. Brack" <wbrack mmm com hk>
- To: xml gnome org
- Subject: Re: [xml] html parsing
- Date: Thu, 2 Dec 2004 11:30:10 +0800 (HKT)
John Fleck said:
> I also get the same as Danilo:
>
> [jfleck localhost htmlread]$ ./htmlread.py html.html
> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
> "http://www.w3.org/TR/REC-html40/loose.dtd">
> <html><body><p>
> Hello
> </p></body></html>
> [jfleck localhost htmlread]$ xmllint --version
> xmllint: using libxml version 20615-CVS2255
> compiled with: DTDValid FTP HTTP HTML C14N Catalog XPath XPointer
> XInclude Iconv Unicode Regexps Automata Schemas
>
> --
> John Fleck
> http://www.inkstain.net/fleck/
> "In a world gone mad, what right do we have to yok it up?"
> - Griffy
I have no idea which versions Jason Jesso is using. I do know that on my
system, using the very latest CVS, there is no problem:
bill billsuper work $ cat jesso.html
<html>
<body>
Hello
</body>
</html>
bill billsuper work $ cat jesso.py
#!/usr/bin/env python
import libxml2, sys

def main( argv ):
    html = open( argv[1], 'r' ).read()
    doc = libxml2.htmlReadMemory( html, len(html), None, None, 32 )
    doc.htmlDocDump( sys.stdout )
    doc.freeDoc()

if __name__ == "__main__":
    main( sys.argv )
bill billsuper work $ ./jesso.py jesso.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>
Hello
</p></body></html>
bill billsuper work $ xmllint --version
xmllint: using libxml version 20616-CVS2257
compiled with: DTDValid FTP HTTP HTML C14N Catalog XPath XPointer XInclude
Iconv MemDebug Unicode Regexps Automata Schemas
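
When comparing setups like the two above, it can help to turn the number printed by xmllint --version into a dotted release number. The following is only a rough sketch: it assumes the usual libxml2 encoding of major * 10000 + minor * 100 + micro, and the helper name decode_libxml_version is made up for illustration, not part of libxml2 or its Python bindings.

```python
import re

def decode_libxml_version(banner):
    """Return (major, minor, micro) parsed from an 'xmllint --version' line.

    Hypothetical helper for illustration only.
    """
    match = re.search(r"libxml version (\d+)", banner)
    if match is None:
        raise ValueError("no libxml version number found")
    number = int(match.group(1))
    # Assumed encoding: major * 10000 + minor * 100 + micro
    return number // 10000, (number // 100) % 100, number % 100

if __name__ == "__main__":
    for line in ("xmllint: using libxml version 20615-CVS2255",
                 "xmllint: using libxml version 20616-CVS2257"):
        print("%d.%d.%d" % decode_libxml_version(line))
```

Under that assumption the two banners quoted in this thread correspond to 2.6.15 and 2.6.16; the -CVS suffix is ignored by the sketch.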