
[xml] recovering of well balanced chunks



Hello,

while checking the XML recovery code I realized that recovery is only
possible when complete documents are parsed. Although libxml2 provides
an interface to parse well-balanced chunks from memory, I found that it
is not possible to recover such a chunk the way it is possible with
documents.

I agree that this is definitely not a must-have for libxml2. But since
only a tiny patch is needed to make such a feature available, I wrote
this extension and attached it to this mail.

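The patch I provide is based on libxml2 2.4.23. It should not affect any
existing implementation: xmlParseBalancedChunkMemory keeps its signature
and simply calls the new function with recover set to 0.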

I would like to make more general XML recovery available in one of the
next releases of XML::LibXML (1), so it would be nice if this patch
could make it into the distributed code.

Thanks in advance.
Christian Glahn

(1) XML::LibXML is a Perl interface to libxml2
diff -r -c libxml2-2.4.23/include/libxml/parser.h libxml2-2.4.23_a/include/libxml/parser.h
*** libxml2-2.4.23/include/libxml/parser.h	Wed Mar 13 04:35:14 2002
--- libxml2-2.4.23_a/include/libxml/parser.h	Tue Jul 16 19:47:38 2002
***************
*** 767,772 ****
--- 767,779 ----
  					 int depth,
  					 const xmlChar *string,
  					 xmlNodePtr *lst);
+ int		xmlParseBalancedChunkMemoryRecover(xmlDocPtr doc,
+ 					 xmlSAXHandlerPtr sax,
+ 					 void *user_data,
+ 					 int depth,
+ 					 const xmlChar *string,
+ 					 xmlNodePtr *lst,
+ 					 int recover);
  int		xmlParseExternalEntity	(xmlDocPtr doc,
  					 xmlSAXHandlerPtr sax,
  					 void *user_data,
diff -r -c libxml2-2.4.23/parser.c libxml2-2.4.23_a/parser.c
*** libxml2-2.4.23/parser.c	Sat Jul  6 21:58:35 2002
--- libxml2-2.4.23_a/parser.c	Tue Jul 16 19:44:58 2002
***************
*** 9712,9717 ****
--- 9712,9748 ----
  int
  xmlParseBalancedChunkMemory(xmlDocPtr doc, xmlSAXHandlerPtr sax,
       void *user_data, int depth, const xmlChar *string, xmlNodePtr *lst) {
+     return xmlParseBalancedChunkMemoryRecover( doc, sax, user_data,
+                                                depth, string, lst, 0 );
+ }
+ 
+ /**
+  * xmlParseBalancedChunkMemoryRecover:
+  * @doc:  the document the chunk pertains to
+  * @sax:  the SAX handler bloc (possibly NULL)
+  * @user_data:  The user data returned on SAX callbacks (possibly NULL)
+  * @depth:  Used for loop detection, use 0
+  * @string:  the input string in UTF8 or ISO-Latin (zero terminated)
+  * @lst:  the return value for the set of parsed nodes
+  * @recover: return nodes even if the data is broken (use 0)
+  *
+  * Parse a well-balanced chunk of an XML document
+  * called by the parser
+  * The allowed sequence for the Well Balanced Chunk is the one defined by
+  * the content production in the XML grammar:
+  *
+  * [43] content ::= (element | CharData | Reference | CDSect | PI | Comment)*
+  *
+  * Returns 0 if the chunk is well balanced, -1 in case of args problem and
+  *    the parser error code otherwise
+  *    
+  * In case recover is set to 1, the nodes parsed so far are returned in
+  * @lst even if the chunk is not well balanced.
+  */
+ int
+ xmlParseBalancedChunkMemoryRecover(xmlDocPtr doc, xmlSAXHandlerPtr sax,
+      void *user_data, int depth, const xmlChar *string, xmlNodePtr *lst, 
+      int recover) {
      xmlParserCtxtPtr ctxt;
      xmlDocPtr newDoc;
      xmlSAXHandlerPtr oldsax = NULL;
***************
*** 9827,9848 ****
  	else
  	    ret = ctxt->errNo;
      } else {
! 	if (lst != NULL) {
! 	    xmlNodePtr cur;
  
! 	    /*
! 	     * Return the newly created nodeset after unlinking it from
! 	     * they pseudo parent.
! 	     */
! 	    cur = newDoc->children->children;
! 	    *lst = cur;
! 	    while (cur != NULL) {
! 		cur->parent = NULL;
! 		cur = cur->next;
! 	    }
!             newDoc->children->children = NULL;
  	}
! 	ret = 0;
      }
      if (sax != NULL) 
  	ctxt->sax = oldsax;
--- 9837,9859 ----
  	else
  	    ret = ctxt->errNo;
      } else {
! 	ret = 0;
!     }
!    
!     if (lst != NULL && (ret == 0 || recover == 1)) {
! 	xmlNodePtr cur;
  
! 	/*
! 	 * Return the newly created nodeset after unlinking it from
! 	 * their pseudo parent.
! 	 */
! 	cur = newDoc->children->children;
! 	*lst = cur;
! 	while (cur != NULL) {
! 	    cur->parent = NULL;
! 	    cur = cur->next;
  	}
!         newDoc->children->children = NULL;
      }
      if (sax != NULL) 
  	ctxt->sax = oldsax;
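
The hand-off logic changed by the second hunk can be modeled without
libxml2. The following is only a sketch under assumptions: toy_node,
toy_return_nodes and toy_count are hypothetical stand-ins invented for
illustration and are not part of libxml2 or of the patch. It shows that
the node list is detached from the pseudo parent either when parsing
succeeded (ret == 0) or when recover is 1, while the error code is
passed through unchanged.

```c
#include <stddef.h>

/* Hypothetical stand-in for xmlNode: only the links the hand-off touches. */
typedef struct toy_node {
    struct toy_node *parent;
    struct toy_node *children;
    struct toy_node *next;
} toy_node;

/*
 * Models the patched tail of the function (an assumption-based sketch,
 * not libxml2 code): hand the children of the pseudo parent to the
 * caller if parsing succeeded or recovery was requested.
 * Returns ret unchanged, so the caller still sees the parser error.
 */
static int
toy_return_nodes(toy_node *pseudo_root, int ret, int recover, toy_node **lst)
{
    if (lst != NULL && (ret == 0 || recover == 1)) {
        toy_node *cur = pseudo_root->children;

        *lst = cur;
        /* Unlink every sibling from the pseudo parent. */
        while (cur != NULL) {
            cur->parent = NULL;
            cur = cur->next;
        }
        pseudo_root->children = NULL;
    }
    return ret;
}

/* Counts the nodes in a sibling list. */
static size_t
toy_count(const toy_node *node)
{
    size_t n = 0;

    while (node != NULL) {
        n++;
        node = node->next;
    }
    return n;
}
```

With recover set to 0 and a non-zero ret, *lst is left untouched and the
children stay attached to the pseudo parent, which matches the old
behaviour. With recover set to 1 the caller gets whatever was built
before the error and has to check the return value to know whether the
list is complete.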

