
[xml] bug in pushparser implementation



hello daniel,
hello all,

after being active this weekend, i found a 'funny' bug in the push parser
implementation. i won't dwell on the error messages not being too useful,
but the push parser is not able to detect some trailing junk at the
end of a document. in more detail: the push parser will not report an error
if there is only a single character after the root element was closed.

the following lines will show the problem with maximum simplicity:

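  /* ctxt: a push parser context, e.g. from xmlCreatePushParserCtxt() */
  xmlParseChunk(ctxt, "<A/>", 4, 0);  /* the complete root element */
  xmlParseChunk(ctxt, "X", 1, 0);     /* one character of trailing junk */
  xmlParseChunk(ctxt, "", 0, 1);      /* finish the parse */

  if ( ctxt->errNo == 0 ) {
      printf( "ouch!\n" );            /* reached: no error was recorded */
  }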

of course '<A/>X' is *not* a well-formed XML document, and so the document
returned by the parser is only '<A/>'. while this is ok if one wants to
repair a document, not reporting an error at all is definitely wrong.

because of this i would like to provide a tiny patch that fixes this
problem. i am not sure if this is the correct way, but at least it fixes the
misbehaviour.

christian glahn

*** push_parser_patch.diff
*** this patch fixes a bug in the push parser implementation:
*** if the push parser receives a single character in the epilog
*** at the end of the document, it will not report an error. this happens
*** because the parser does not look at input in the misc section when
*** fewer than 2 characters are available.
*** 
*** i don't think this patch is the most beautiful solution, but it handles
*** this case correctly.
*** 
*** start diff 
*** parser.c	Sun Oct 13 13:15:44 2002
--- parser.c.patched	Sun Oct 13 13:15:30 2002
***************
*** 8894,8899 ****
--- 8894,8905 ----
  	/*
  	 * Check for termination
  	 */
+ 	    int avail = 0;
+ 	    if (ctxt->input->buf == NULL)
+                 avail = ctxt->input->length - (ctxt->input->cur - ctxt->input->base);
+             else
+                 avail = ctxt->input->buf->buffer->use - (ctxt->input->cur - ctxt->input->base);
+ 			    
  	if ((ctxt->instate != XML_PARSER_EOF) &&
  	    (ctxt->instate != XML_PARSER_EPILOG)) {
  	    ctxt->errNo = XML_ERR_DOCUMENT_END;
***************
*** 8903,8908 ****
--- 8909,8923 ----
  	    ctxt->wellFormed = 0;
  	    ctxt->disableSAX = 1;
  	} 
+ 	if ( ctxt->instate == XML_PARSER_EPILOG && avail > 0 ) {
+ 	    ctxt->errNo = XML_ERR_DOCUMENT_END;
+ 	    if ((ctxt->sax != NULL) && (ctxt->sax->error != NULL))
+ 		ctxt->sax->error(ctxt->userData,
+ 		    "Extra content at the end of the document\n");
+ 	    ctxt->wellFormed = 0;
+ 	    ctxt->disableSAX = 1;
+ 
+ 	}
  	if (ctxt->instate != XML_PARSER_EOF) {
  	    if ((ctxt->sax) && (ctxt->sax->endDocument != NULL))
  		ctxt->sax->endDocument(ctxt->userData);

