Re: [xml] html parsing



Very interesting. On my system I get just the DOCTYPE and
no html.

This is what I have on my system:

[jason localhost jason]$ rpm -qa | grep libxml2
libxml2-2.6.6-1mdk
libxml2-python-2.6.6-1mdk
libxml2-utils-2.6.6-1mdk
libxml2-devel-2.6.6-1mdk
[jason localhost jason]$ python
Python 2.3.4 (#2, Aug 19 2004, 15:49:40)
[GCC 3.4.1 (Mandrakelinux (Alpha 3.4.1-3mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
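Reports like this are easier to compare if the same check is run over each system's output. A rough sketch, assuming the script's output has been captured as a string. `dump_summary` is a hypothetical helper (not part of libxml2) that only regex-matches a DOCTYPE declaration and an opening `<html>` tag, so it is a sanity check rather than an HTML parser:

```python
import re


# Hypothetical helper, not part of libxml2: a rough regex check, not a parser.
def dump_summary(dump):
    """Say whether serialized output has a DOCTYPE and an opening <html> tag."""
    return {
        # Matches e.g. <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 ...">
        "doctype": re.search(r"<!DOCTYPE\s+html\b", dump, re.IGNORECASE) is not None,
        # "<!DOCTYPE html" does not match here, because it never contains "<html".
        "html_element": re.search(r"<html[\s>]", dump, re.IGNORECASE) is not None,
    }
```

For output that contains only the DOCTYPE, as described above, it returns `{'doctype': True, 'html_element': False}`; for the output John and Bill quote in the replies, both values are True.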




 --- "William M. Brack" <wbrack mmm com hk> wrote: 
John Fleck said:
I also get the same as Danilo:

[jfleck localhost htmlread]$ ./htmlread.py html.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>
      Hello
  </p></body></html>
[jfleck localhost htmlread]$ xmllint --version
xmllint: using libxml version 20615-CVS2255
   compiled with: DTDValid FTP HTTP HTML C14N Catalog XPath XPointer XInclude Iconv Unicode Regexps Automata Schemas



--
John Fleck
http://www.inkstain.net/fleck/

"In a world gone mad, what right do we have to yok
it up?"
 - Griffy
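The `xmllint --version` lines report libxml2's numeric version, which encodes major*10000 + minor*100 + micro. So 20615 is 2.6.15 and 20616 is 2.6.16, while Jason's rpm packages are 2.6.6. A minimal sketch for decoding and comparing them, assuming the exact output formats shown in this thread (both helper names are hypothetical):

```python
import re


def parse_libxml_version(line):
    """Decode e.g. 'xmllint: using libxml version 20615-CVS2255' to (2, 6, 15).

    libxml2 packs its version as major*10000 + minor*100 + micro;
    any '-CVS...' suffix after the number is ignored.
    """
    m = re.search(r"libxml version (\d+)", line)
    if m is None:
        raise ValueError("no libxml version found in %r" % line)
    n = int(m.group(1))
    return (n // 10000, (n // 100) % 100, n % 100)


def parse_rpm_version(pkg):
    """Extract (major, minor, micro) from a package name like 'libxml2-2.6.6-1mdk'."""
    m = re.match(r"libxml2-(\d+)\.(\d+)\.(\d+)-", pkg)
    if m is None:
        raise ValueError("unrecognised package name %r" % pkg)
    return tuple(int(g) for g in m.groups())


if __name__ == "__main__":
    rpm = parse_rpm_version("libxml2-2.6.6-1mdk")
    cvs = parse_libxml_version("xmllint: using libxml version 20615-CVS2255")
    print("%s < %s: %s" % (rpm, cvs, rpm < cvs))  # (2, 6, 6) < (2, 6, 15): True
```

Comparing integer tuples matters here: as plain strings, "2.6.6" sorts after "2.6.15".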

I have no idea what version numbers Jason Jesso is using. I do
know that on my system, using the very latest CVS, there is no
problem:

bill billsuper work $ cat jesso.html
<html>
   <body>
       Hello
   </body>
</html>

bill billsuper work $ cat jesso.py
#!/usr/bin/env python
import libxml2, sys
def main( argv ):
     html = open( argv[1], 'r' ).read()
     doc = libxml2.htmlReadMemory( html, len(html), None, None, 32 )
     doc.htmlDocDump( sys.stdout )
     doc.freeDoc()
if __name__ == "__main__":
     main( sys.argv )

bill billsuper work $ ./jesso.py jesso.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>
       Hello
   </p></body></html>

bill billsuper work $ xmllint --version
xmllint: using libxml version 20616-CVS2257
   compiled with: DTDValid FTP HTTP HTML C14N Catalog XPath XPointer XInclude Iconv MemDebug Unicode Regexps Automata Schemas
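The last argument to `htmlReadMemory` in jesso.py is a bitmask of parser options, and the bare 32 is easy to misread. A minimal sketch for decoding it, assuming the `htmlParserOption` values from libxml2's HTMLparser.h (the table and `decode_html_options` are hypothetical helpers, not part of the bindings). By those values, 32 is HTML_PARSE_NOERROR, which only suppresses error reports. Note also that jesso.html has no DOCTYPE, so the one in the output was supplied by the parser as its default; newer releases offer HTML_PARSE_NODEFDTD to turn that off.

```python
# htmlParserOption values as defined in libxml2's HTMLparser.h (current
# headers). Some of these flags were added in later releases, so an older
# library may not recognise all of them.
HTML_PARSE_FLAGS = [
    ("HTML_PARSE_RECOVER", 1 << 0),     # relaxed parsing
    ("HTML_PARSE_NODEFDTD", 1 << 2),    # do not default a doctype if not found
    ("HTML_PARSE_NOERROR", 1 << 5),     # suppress error reports
    ("HTML_PARSE_NOWARNING", 1 << 6),   # suppress warning reports
    ("HTML_PARSE_PEDANTIC", 1 << 7),    # pedantic error reporting
    ("HTML_PARSE_NOBLANKS", 1 << 8),    # remove blank nodes
    ("HTML_PARSE_NONET", 1 << 11),      # forbid network access
    ("HTML_PARSE_NOIMPLIED", 1 << 13),  # do not add implied html/body elements
    ("HTML_PARSE_COMPACT", 1 << 16),    # compact small text nodes
]


def decode_html_options(options):
    """Split an options integer into known flag names plus any leftover bits."""
    names = [name for name, bit in HTML_PARSE_FLAGS if options & bit]
    known = 0
    for _, bit in HTML_PARSE_FLAGS:
        known |= bit
    return names, options & ~known


if __name__ == "__main__":
    print(decode_html_options(32))  # (['HTML_PARSE_NOERROR'], 0)
```

A non-zero second value means the mask contains bits this table does not know about.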


_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml gnome org
http://mail.gnome.org/mailman/listinfo/xml
 


