[xml] Reliability of htmlElementStatusHere and line number


   I am evaluating the HTML parsing features of the libxml2 library,
with a view to embedding it in a C++ open-source project that I
maintain. The goal is to retire my current parser and use libxml2 to
analyse both HTML and XHTML documents (and possibly validate them).

   I have created an HTML parser context using htmlNewParserCtxt(), then
read the contents of a web page from memory using the
htmlCtxtReadMemory() function, as follows:

        htmlDocPtr doc (htmlCtxtReadMemory (ctxt,

   document is an object of a class I have developed that holds the web

   I then take the document tree from the 'doc' variable and iterate
over the children of the document (similarly to the tree1.c example at

   However, I have run into two problems:

1) I cannot get the line number of an element (I have tried both the
node's 'line' field and the XML_GET_LINE macro).
2) I seem to receive false HTML_INVALID results from the
htmlElementStatusHere() function, which I call on each element. For
instance, I get an 'HTML_INVALID' result for an 'a' element within a 'p'

   I am new to the libxml2 library from a development point of view. I
have read the documentation, the examples and the code, but
unfortunately I cannot find much information about the flexible HTML
parser (which is the one that concerns me most). I hope I am just
missing something stupid.

   Thank you very much for your help.


Gabriele Bartolini: Open source programmer and data architect
Current Location: Prato, Tuscany, Italy
Associazione Italian PostgreSQL Users Group: www.itpug.org
gabriele bartolini gmail com | www.gabrielebartolini.it
"If I had been born ugly, you would never have heard of Pelé", George Best
