Re: [xml] html parsing
- From: JASON JESSO <jesso1607 rogers com>
- To: xml gnome org
- Subject: Re: [xml] html parsing
- Date: Thu, 2 Dec 2004 08:27:50 -0500 (EST)
Very interesting. On my system I just get the DOCTYPE and
no html.
This is what I have on my system:
[jason localhost jason]$ rpm -qa | grep libxml2
libxml2-2.6.6-1mdk
libxml2-python-2.6.6-1mdk
libxml2-utils-2.6.6-1mdk
libxml2-devel-2.6.6-1mdk
[jason localhost jason]$ python
Python 2.3.4 (#2, Aug 19 2004, 15:49:40)
[GCC 3.4.1 (Mandrakelinux (Alpha 3.4.1-3mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
--- "William M. Brack" <wbrack mmm com hk> wrote:
John Fleck said:
I also get the same as Danilo:
[jfleck localhost htmlread]$ ./htmlread.py
html.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0
Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>
Hello
</p></body></html>
[jfleck localhost htmlread]$ xmllint --version
xmllint: using libxml version 20615-CVS2255
compiled with: DTDValid FTP HTTP HTML C14N
Catalog XPath XPointer
XInclude Iconv Unicode Regexps Automata Schemas
--
John Fleck
http://www.inkstain.net/fleck/
"In a world gone mad, what right do we have to yok
it up?"
- Griffy
I have no idea what version numbers Jason Jesso is using. I do know
that on my system, using the absolutely latest CVS, there is no problem:
bill billsuper work $ cat jesso.html
<html>
<body>
Hello
</body>
</html>
bill billsuper work $ cat jesso.py
#!/usr/bin/env python
import libxml2, sys

def main( argv ):
    html = open( argv[1], 'r' ).read()
    doc = libxml2.htmlReadMemory( html, len(html),
                                  None, None, 32 )
    doc.htmlDocDump( sys.stdout )
    doc.freeDoc()

if __name__ == "__main__":
    main( sys.argv )
bill billsuper work $ ./jesso.py jesso.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0
Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>
Hello
</p></body></html>
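
For anyone wondering about the bare 32 passed as the last argument to
htmlReadMemory in jesso.py: it is a bitmask of parser options, and 32
is the value of HTML_PARSE_NOERROR in libxml2's htmlParserOption enum.
The sketch below decodes such a mask without needing libxml2 installed.
The helper name decode_html_options and the table are illustrative
only, and the table lists just a handful of option bits; later libxml2
releases define more.

```python
# Option bits from libxml2's htmlParserOption enum (HTMLparser.h).
# Partial table, for illustration only; newer releases add further bits.
HTML_PARSE_OPTIONS = {
    1 << 5: "HTML_PARSE_NOERROR",     # 32: suppress error reports
    1 << 6: "HTML_PARSE_NOWARNING",   # 64: suppress warning reports
    1 << 7: "HTML_PARSE_PEDANTIC",    # 128: pedantic error reporting
    1 << 8: "HTML_PARSE_NOBLANKS",    # 256: remove blank nodes
    1 << 11: "HTML_PARSE_NONET",      # 2048: forbid network access
}


def decode_html_options(value):
    """Return the names of the known option bits set in value.

    Hypothetical helper, not part of the libxml2 bindings.
    """
    names = []
    for bit in sorted(HTML_PARSE_OPTIONS):
        if value & bit:
            names.append(HTML_PARSE_OPTIONS[bit])
    return names


if __name__ == "__main__":
    # The value used in jesso.py
    print(decode_html_options(32))
```

If the Python bindings on a given system export these constants, using
the name instead of the literal would make the call easier to read.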
bill billsuper work $ xmllint --version
xmllint: using libxml version 20616-CVS2257
compiled with: DTDValid FTP HTTP HTML C14N
Catalog XPath XPointer XInclude
Iconv MemDebug Unicode Regexps Automata Schemas
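
To line up the numbers reported in this thread: xmllint prints the
version as a single integer (optionally followed by a CVS tag), which
libxml2 builds as major*10000 + minor*100 + micro. A minimal sketch for
turning that back into a dotted version, so it can be compared with the
2.6.6 packages shown in the rpm output, might look like this. The
helper name decode_libxml_version is hypothetical.

```python
import re


def decode_libxml_version(text):
    """Turn '20616-CVS2257' or '20616' into the tuple (2, 6, 16).

    Hypothetical helper; assumes the major*10000 + minor*100 + micro
    encoding used by libxml2's version number.
    """
    match = re.match(r"(\d+)", text.strip())
    if match is None:
        raise ValueError("no leading version number in %r" % (text,))
    number = int(match.group(1))
    major, rest = divmod(number, 10000)
    minor, micro = divmod(rest, 100)
    return (major, minor, micro)


if __name__ == "__main__":
    rpm_version = (2, 6, 6)  # from libxml2-2.6.6-1mdk
    for reported in ("20615-CVS2255", "20616-CVS2257"):
        version = decode_libxml_version(reported)
        print(reported, "->", "%d.%d.%d" % version,
              "(newer than the rpm)" if version > rpm_version else "")
```

Tuples compare element by element, so (2, 6, 6) sorts before
(2, 6, 15), which a plain string comparison would get wrong.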