Re: [xml] wrong line number with XML_PARSE_DTDVALID

On Fri, Sep 16, 2011 at 10:47:16AM +0200, François Delyon wrote:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root[
<!ELEMENT root (a*)>
      <a xml:id="ID0" parent="ID3"/>
      <a xml:id="ID1"/>

This document is well-formed, but the value of the attribute parent
(line 9) is not an existing ID.

xmlReadFile followed by xmlValidateDocument
--> line 9, IDREF attribute parent references an unknown ID "ID3"

xmlReadFile with options XML_PARSE_DTDVALID
--> line 11, IDREF attribute parent references an unknown ID "ID3"
Wrong, line 11 is the end of the document.

The error occurs at the end of the parsing in
xmlValidateDocumentFinal (valid.c).
xmlValidateDocumentFinal raises an error through xmlErrValidNode.
In the first case ctxt->finishDtd is 0, and in the second case
ctxt->finishDtd is XML_CTXT_FINISH_DTD_0 (or XML_CTXT_FINISH_DTD_1).
Thus, while parsing, libxml reports any error raised in
xmlValidateDocumentFinal as occurring at the end of the document.
I suspect that the flag ctxt->finishDtd is not correct at this step.

I suggest adding:
in xmlValidateDocumentFinal.
This fixes my case, but it may have unexpected side
effects (the variable ctxt->finishDtd is rather cryptic).

    Bonjour Francois,

  Yeah, it's a bit cryptic. I think that comes from one of the mistakes I
made earlier on: I included the structure xmlValidCtxt in the _xmlParserCtxt
one instead of using a pointer. As a result, due to ABI compatibility,
xmlValidCtxt is stuck; I really can't grow it. A pointer to such a
structure is also passed when doing validity checking, and IIRC I added
finishDtd to try to document where I was in the parsing phase when
validating. The validation handler may try to find out if the parsing
context is actually in effect and where we are. For example, if
validating a preparsed tree, there is no parsing context and the error
routine can't use it to point out where the error is. When setting it
to 0 in xmlValidateDocumentFinal, the validity error handler will take
this as "we're not validating while parsing, so I should not use the
current input buffer to indicate where the error is located, but I
can use the node pointer provided".
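
To illustrate the ABI remark, here is a minimal sketch of why a structure embedded by value cannot grow while one held through a pointer can. All type and function names below are hypothetical miniatures made up for this example; they are not the libxml2 definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature validation contexts: V2 is V1 "grown" by a field. */
typedef struct { void *userData; unsigned int finishDtd; } ValidV1;
typedef struct { void *userData; unsigned int finishDtd; int extra[4]; } ValidV2;

/* Parser context embedding the validation context by value. */
typedef struct { int before; ValidV1 vctxt; int after; } ParserEmbedV1;
typedef struct { int before; ValidV2 vctxt; int after; } ParserEmbedV2;

/* Parser context holding the validation context through a pointer. */
typedef struct { int before; ValidV1 *vctxt; int after; } ParserPtrV1;
typedef struct { int before; ValidV2 *vctxt; int after; } ParserPtrV2;

/* Returns 1 if growing the embedded struct moved the field that follows it,
 * which would break binaries compiled against the old layout. */
static int embedded_layout_changed(void)
{
    return offsetof(ParserEmbedV1, after) != offsetof(ParserEmbedV2, after);
}

/* Returns 1 if growing the pointed-to struct moved the field that follows
 * the pointer; pointers keep their size, so the layout should not move. */
static int pointer_layout_changed(void)
{
    return offsetof(ParserPtrV1, after) != offsetof(ParserPtrV2, after);
}
```

In this sketch only the by-value variant shifts the offset of the following field, which is the kind of change existing compiled callers would not survive.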
  Well, in the case of xmlValidateDocumentFinal, we actually have a
tree, and the current input buffer is not relevant, so doing what you
suggest leads to better error reporting.
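
As a rough model of the behaviour described above, the following sketch shows how an error routine might pick the reported line from the finishDtd marker, and how saving, clearing and restoring it around the final check changes the result. Everything here is a simplified assumption for illustration: the Model* types, the model_* functions and the MODEL_FINISH_DTD_* values are invented stand-ins, not the code in valid.c.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder markers standing in for XML_CTXT_FINISH_DTD_0/1;
 * the numeric values are arbitrary, not the ones libxml2 uses. */
#define MODEL_FINISH_DTD_0 1u
#define MODEL_FINISH_DTD_1 2u

typedef struct { int line; } ModelNode;   /* node in the already-built tree */
typedef struct { int line; } ModelInput;  /* current position of the parser input */

typedef struct {
    unsigned int finishDtd;   /* 0 means "not validating while parsing" */
    const ModelInput *input;  /* NULL when validating a preparsed tree */
} ModelValidCtxt;

/* Choose the line to report: the input position while the marker says we
 * are parsing, otherwise the line recorded on the node. */
static int model_error_line(const ModelValidCtxt *ctxt, const ModelNode *node)
{
    int parsing;

    assert(ctxt != NULL);
    parsing = (ctxt->finishDtd == MODEL_FINISH_DTD_0) ||
              (ctxt->finishDtd == MODEL_FINISH_DTD_1);
    if (parsing && ctxt->input != NULL)
        return ctxt->input->line;
    return (node != NULL) ? node->line : 0;
}

/* Final check with the save/clear/restore trick: the marker is cleared for
 * the duration of the check so the node line is used, then put back. */
static int model_validate_final(ModelValidCtxt *ctxt, const ModelNode *bad_ref)
{
    unsigned int save;
    int line;

    assert(ctxt != NULL);
    save = ctxt->finishDtd;
    ctxt->finishDtd = 0;
    line = model_error_line(ctxt, bad_ref);
    ctxt->finishDtd = save;
    return line;
}
```

With an input position at line 11 and a node recorded at line 9 (the numbers from the report above), model_error_line gives 11 while the marker is set, whereas model_validate_final gives 9 and leaves the marker as it found it.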

  I hope the mystery is solved :-)

I will apply the following patch; it does change the output of one of the
regression tests, as the error is improved for it too!

diff --git a/valid.c b/valid.c
index 2cb32f3..5de491d 100644
--- a/valid.c
+++ b/valid.c
@@ -6559,6 +6559,7 @@ xmlValidateCheckRefCallback(xmlListPtr ref_list, xmlValidCtxtPtr ctxt,
 xmlValidateDocumentFinal(xmlValidCtxtPtr ctxt, xmlDocPtr doc) {
     xmlRefTablePtr table;
+    unsigned int save;
     if (ctxt == NULL)
@@ -6568,6 +6569,10 @@ xmlValidateDocumentFinal(xmlValidCtxtPtr ctxt, xmlDocPtr doc) {
+    /* trick to get correct line id report */
+    save = ctxt->finishDtd;
+    ctxt->finishDtd = 0;
      * Check all the NOTATION/NOTATIONS attributes
@@ -6581,6 +6586,8 @@ xmlValidateDocumentFinal(xmlValidCtxtPtr ctxt, xmlDocPtr doc) {
     ctxt->doc = doc;
     ctxt->valid = 1;
     xmlHashScan(table, (xmlHashScanner) xmlValidateCheckRefCallback, ctxt);
+    ctxt->finishDtd = save;
diff --git a/result/valid/xlink.xml.err b/result/valid/xlink.xml.err
index 08c84bd..c0eea7c 100644
--- a/result/valid/xlink.xml.err
+++ b/result/valid/xlink.xml.err
@@ -1,6 +1,4 @@
 ./test/valid/xlink.xml:450: element termdef: validity error : ID dt-arc already defined
        <p><termdef id="dt-arc" term="Arc">An <term>arc</term> is contained within an e
-./test/valid/xlink.xml:530: element termref: validity error : IDREF attribute def references an unknown ID 
+./test/valid/xlink.xml:199: element termref: validity error : IDREF attribute def references an unknown ID 

  I will commit this, thanks a lot for the suggestion :-)


Daniel Veillard      | libxml Gnome XML XSLT toolkit
daniel veillard com  | Rpmfind RPM search engine | virtualization library
