The HTML parser does not keep track of line numbers.  If you parse an
HTML file, find any non-text node, and ask that node for its line
number, the result is always 0.

I've created a file to reproduce the problem, along with a patch to
libxml2 to fix it:


Also, I've filed a bug in bugzilla here:


Here is the patch (again):

diff --git a/HTMLparser.c b/HTMLparser.c
index 24b0fc0..de497eb 100644
--- a/HTMLparser.c
+++ b/HTMLparser.c
@@ -5652,6 +5652,7 @@ htmlSAXParseFile(const char *filename, const char *encodin

     ctxt = htmlCreateFileParserCtxt(filename, encoding);
     if (ctxt == NULL) return(NULL);
+    ctxt->linenumbers = 1;
     if (sax != NULL) {
        oldsax = ctxt->sax;

Aaron Patterson

