[xml] recovering of well balanced chunks
- From: Christian Glahn <christian glahn uibk ac at>
- To: xml gnome org
- Subject: [xml] recovering of well balanced chunks
- Date: Tue, 16 Jul 2002 20:16:33 +0200
Hello,
while checking the XML data recovery code I realized that recovery
is only possible when whole documents are parsed. Although libxml2
provides an interface to parse well balanced chunks from memory, I found
that it is not possible to recover such a chunk the way it is possible
with documents.
I agree that this is definitely not a must-have for libxml2. But since only
a tiny patch is needed to make such a feature available, I wrote this
extension and attached it to this mail.
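The patch I provide is based on libxml2 2.4.23. It should not affect any
existing implementation: xmlParseBalancedChunkMemory() keeps its signature
and simply calls the new function with recover set to 0.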
I would like to make more general XML recovery available in one of the
next releases of XML::LibXML (1), so it would be nice if this patch could
make it into the distributed code.
Thanks in advance.
Christian Glahn
(1) XML::LibXML is a Perl interface to libxml2.
diff -r -c libxml2-2.4.23/include/libxml/parser.h libxml2-2.4.23_a/include/libxml/parser.h
*** libxml2-2.4.23/include/libxml/parser.h Wed Mar 13 04:35:14 2002
--- libxml2-2.4.23_a/include/libxml/parser.h Tue Jul 16 19:47:38 2002
***************
*** 767,772 ****
--- 767,779 ----
int depth,
const xmlChar *string,
xmlNodePtr *lst);
+ int xmlParseBalancedChunkMemoryRecover(xmlDocPtr doc,
+ xmlSAXHandlerPtr sax,
+ void *user_data,
+ int depth,
+ const xmlChar *string,
+ xmlNodePtr *lst,
+ int recover);
int xmlParseExternalEntity (xmlDocPtr doc,
xmlSAXHandlerPtr sax,
void *user_data,
diff -r -c libxml2-2.4.23/parser.c libxml2-2.4.23_a/parser.c
*** libxml2-2.4.23/parser.c Sat Jul 6 21:58:35 2002
--- libxml2-2.4.23_a/parser.c Tue Jul 16 19:44:58 2002
***************
*** 9712,9717 ****
--- 9712,9748 ----
int
xmlParseBalancedChunkMemory(xmlDocPtr doc, xmlSAXHandlerPtr sax,
void *user_data, int depth, const xmlChar *string, xmlNodePtr *lst) {
+ return xmlParseBalancedChunkMemoryRecover( doc, sax, user_data,
+ depth, string, lst, 0 );
+ }
+
+ /**
+ * xmlParseBalancedChunkMemoryRecover:
+ * @doc: the document the chunk pertains to
+ * @sax: the SAX handler bloc (possibly NULL)
+ * @user_data: The user data returned on SAX callbacks (possibly NULL)
+ * @depth: Used for loop detection, use 0
+ * @string: the input string in UTF8 or ISO-Latin (zero terminated)
+ * @lst: the return value for the set of parsed nodes
+ * @recover: return nodes even if the data is broken (use 0)
+ *
+ * Parse a well-balanced chunk of an XML document
+ * called by the parser
+ * The allowed sequence for the Well Balanced Chunk is the one defined by
+ * the content production in the XML grammar:
+ *
+ * [43] content ::= (element | CharData | Reference | CDSect | PI | Comment)*
+ *
+ * Returns 0 if the chunk is well balanced, -1 in case of args problem and
+ * the parser error code otherwise
+ *
+ * In case recover is set to 1, the nodelist will not be empty even if
+ * the parsed chunk is not well balanced.
+ */
+ int
+ xmlParseBalancedChunkMemoryRecover(xmlDocPtr doc, xmlSAXHandlerPtr sax,
+ void *user_data, int depth, const xmlChar *string, xmlNodePtr *lst,
+ int recover) {
xmlParserCtxtPtr ctxt;
xmlDocPtr newDoc;
xmlSAXHandlerPtr oldsax = NULL;
***************
9827
else
ret = ctxt->errNo;
} else {
! if (lst != NULL) {
! xmlNodePtr cur;
! /*
! * Return the newly created nodeset after unlinking it from
! * they pseudo parent.
! */
! cur = newDoc->children->children;
! *lst = cur;
! while (cur != NULL) {
! cur->parent = NULL;
! cur = cur->next;
! }
! newDoc->children->children = NULL;
}
! ret = 0;
}
if (sax != NULL)
ctxt->sax = oldsax;
--- 9837,9859 ----
else
ret = ctxt->errNo;
} else {
! ret = 0;
! }
!
! if (lst != NULL && (ret == 0 || recover == 1)) {
! xmlNodePtr cur;
! /*
! * Return the newly created nodeset after unlinking it from
! * they pseudo parent.
! */
! cur = newDoc->children->children;
! *lst = cur;
! while (cur != NULL) {
! cur->parent = NULL;
! cur = cur->next;
}
! newDoc->children->children = NULL;
}
if (sax != NULL)
ctxt->sax = oldsax;
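
The effect of the second hunk can be tried without libxml2 at all. What follows is only a minimal sketch of the changed condition, not the library code: Node is a hypothetical stand-in for xmlNode that carries just the links the patch touches, detach_children() and demo() are names made up for this illustration, and the error code 5 used below is an arbitrary non-zero value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for xmlNode: only the links the patch touches. */
typedef struct Node {
    struct Node *parent;
    struct Node *next;
    struct Node *children;
} Node;

/*
 * Mirrors the patched tail of the function: the children of the pseudo
 * parent are handed to the caller when parsing succeeded (ret == 0) or
 * when recovery was requested (recover == 1). ret is returned unchanged,
 * so the caller still sees the parser error code.
 */
static int detach_children(Node *pseudo_parent, int ret, int recover,
                           Node **lst)
{
    if (lst != NULL && (ret == 0 || recover == 1)) {
        Node *cur = pseudo_parent->children;

        *lst = cur;
        while (cur != NULL) {
            cur->parent = NULL;   /* unlink from the pseudo parent */
            cur = cur->next;
        }
        pseudo_parent->children = NULL;
    }
    return ret;
}

/*
 * Illustration helper: builds a pseudo parent with two children, runs
 * detach_children() and returns how many nodes were handed back.
 */
static int demo(int ret, int recover)
{
    Node parent = { NULL, NULL, NULL };
    Node a = { &parent, NULL, NULL };
    Node b = { &parent, NULL, NULL };
    Node *lst = NULL;
    Node *cur;
    int count = 0;

    a.next = &b;
    parent.children = &a;

    detach_children(&parent, ret, recover, &lst);
    for (cur = lst; cur != NULL; cur = cur->next) {
        assert(cur->parent == NULL);
        count++;
    }
    return count;
}
```

Calling demo(0, 0) models a well balanced chunk, demo(5, 0) a broken chunk parsed the old way, and demo(5, 1) a broken chunk with recovery requested. Only the middle case leaves the list empty, which is the behaviour xmlParseBalancedChunkMemory() keeps after the patch.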