[xml] Reliability of htmlElementStatusHere and line number
- From: Gabriele Bartolini <gabriele bartolini gmail com>
- To: xml gnome org
- Subject: [xml] Reliability of htmlElementStatusHere and line number
- Date: Thu, 05 Jun 2008 12:31:07 +1000
Hello,
I am evaluating the HTML parsing features of the libxml2 library, with
a view to embedding it in a C++ open-source project which I maintain.
The aim is to replace my previous parser and use libxml2 to analyse
both HTML and XHTML documents (and possibly validate them).
I have created an HTML parser context using htmlNewParserCtxt(), then
read from memory the contents of a web page, using the
htmlCtxtReadMemory() function as follows:
    htmlDocPtr doc (htmlCtxtReadMemory (ctxt,
        document.getContents().c_str(),
        document.getContents().length(),
        document.getURI().c_str(),
        document.getEncoding().c_str(),
        HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING) );
Here 'document' is an object of a class I have developed that holds the
web page.
I then take the document tree from the 'doc' variable and iterate over
the children of the document (similarly to the tree1.c example at
http://xmlsoft.org/examples/tree1.c).
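The shape of that traversal can be sketched as follows. This is only a simplified illustration, not the code from the project: Node, Visit and walk are hypothetical stand-ins, with Node carrying just the few fields of a parsed node that the walk needs (element flag, name, line, first child, next sibling), so that the sketch compiles without the libxml2 headers. It records each element together with the name of its enclosing element and its line number, which are the two pieces of information the problems below depend on.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed tree node (not the libxml2 type):
// only the fields the traversal needs.
struct Node {
    bool is_element;       // true for element nodes, false for text etc.
    std::string name;      // tag name, e.g. "p" or "a"
    long line;             // source line, 0 if the parser recorded none
    const Node* children;  // first child, or nullptr
    const Node* next;      // next sibling, or nullptr
};

// One record per element visited (hypothetical helper type).
struct Visit {
    std::string parent;    // name of the enclosing element, "" at the top
    std::string name;
    long line;
};

// Depth-first walk over siblings and children, in document order.
void walk(const Node* node, const std::string& parent,
          std::vector<Visit>& out) {
    for (const Node* cur = node; cur != nullptr; cur = cur->next) {
        if (cur->is_element) {
            out.push_back({parent, cur->name, cur->line});
        }
        // Children of an element see that element as their parent;
        // children of other node kinds inherit the current parent.
        walk(cur->children, cur->is_element ? cur->name : parent, out);
    }
}
```

In the real code the per-element step is where the line number would be read and the status check would be made for the (parent, element) pair.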
However, I have run into two problems:
1) I cannot get the line number of an element (I have tried both the
node's 'line' field and the XML_GET_LINE macro).
2) I seem to receive false INVALID results from the
htmlElementStatusHere function, which I call on each element. For
instance, I get an 'HTML_INVALID' result for an 'a' element within a 'p'
element.
I am new to the libxml2 library from a development point of view. I
have tried to read the documentation, the examples and the code as well,
but unfortunately I cannot find much information regarding the
flexible HTML parser (which is the one that concerns me most). I hope I
am just missing something stupid.
Thank you very much for your help.
Cheers,
Gabriele
--
Gabriele Bartolini: Open source programmer and data architect
Current Location: Prato, Tuscany, Italy
Associazione Italian PostgreSQL Users Group: www.itpug.org
gabriele bartolini gmail com | www.gabrielebartolini.it
"If I had been born ugly, you would never have heard of Pelé", George Best
http://www.linkedin.com/in/gbartolini