Re: [xml] libxml2 issue with corrupted data

Just a theory, but string functions tend to depend on a terminating NUL. If there is non-ASCII data and there is no NUL (you did not say whether there was or not), the parser could be spending all that time reading through a lot of memory until a NUL turns up by chance somewhere else. And then who knows what happens! I always check incoming XML for non-ASCII, and if there is any I reject it out of hand, because I got weird problems when some people sent me garbage XML.  E
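A minimal C sketch of the non-ASCII pre-check described above, not Eric's actual code: has_non_ascii is a hypothetical helper. It takes an explicit length, so it never relies on a terminating NUL. Note that rejecting every byte above 0x7F also rejects perfectly valid UTF-8, such as accented letters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if any byte in buf[0..len) is outside
   7-bit ASCII (greater than 0x7F), 0 otherwise. The loop is bounded by
   len, so the buffer does not need to be NUL-terminated. */
static int has_non_ascii(const char *buf, size_t len)
{
    /* Read through unsigned char so bytes >= 0x80 compare correctly even
       on platforms where plain char is signed. */
    const unsigned char *p = (const unsigned char *)buf;
    for (size_t i = 0; i < len; i++) {
        if (p[i] > 0x7F)
            return 1;
    }
    return 0;
}
```

A receiver following this approach would run the check on the whole message before handing it to the parser and drop the message if the function returns 1.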

On 8/2/2012 7:43 AM, Gidon Sidis wrote:



Our product uses libxml2.dll release 2.7.8, and we have encountered the following issue:

We receive an XML message with a tag “<Tag>” whose value is not a valid UTF-8 string. The string is actually garbled and has no meaning in any encoding I know. Its hex value is 0xe941e6.
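Why 0xe941e6 is not UTF-8: 0xE9 is the lead byte of a three-byte sequence, so it must be followed by two continuation bytes of the form 10xxxxxx (0x80 to 0xBF), but the next byte is 0x41 (ASCII 'A'). The sketch below is a hypothetical validator, is_valid_utf8, following the well-formedness rules of RFC 3629. It is shown only to illustrate that check, not as what libxml2 does internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical validator: returns 1 if buf[0..len) is well-formed UTF-8,
   0 otherwise. Checks lead bytes, continuation bytes, overlong forms,
   UTF-16 surrogates and the U+10FFFF upper bound. */
static int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t n;          /* continuation bytes that must follow c */
        unsigned long cp;  /* code point being decoded */

        if (c < 0x80) {              /* ASCII: a single byte */
            i++;
            continue;
        }
        if (c >= 0xC2 && c <= 0xDF) {
            n = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            n = 2; cp = c & 0x0F;    /* 0xE9 takes this branch */
        } else if (c >= 0xF0 && c <= 0xF4) {
            n = 3; cp = c & 0x07;
        } else {
            return 0;                /* 0x80-0xC1 and 0xF5-0xFF never start a character */
        }
        if (len - i <= n)
            return 0;                /* sequence cut off by the end of the buffer */
        for (size_t k = 1; k <= n; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;            /* not 10xxxxxx: 0xE9 followed by 0x41 fails here */
            cp = (cp << 6) | (cc & 0x3F);
        }
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000))
            return 0;                /* overlong encoding */
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return 0;                /* surrogate or beyond U+10FFFF */
        i += n + 1;
    }
    return 1;
}
```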

We pass the entire XML to the parser using this function:


XMLPUBFUN int XMLCALL xmlParseChunk (xmlParserCtxtPtr ctxt,
                                     const char *chunk,
                                     int size,
                                     int terminate);


where ctxt is a member of a wrapper class and is initialized with an xmlSAXHandler member whose startElement (startElementSAXFunc) and endElement (endElementSAXFunc) callbacks are set.


When the parser reaches the <Tag> section, startElement is called, but endElement is not called for at least 200 ms. After those 200 ms a new XML message arrives, and our application calls xmlFreeParserCtxt on the ctxt member and calls xmlParseChunk again, thus losing the information from the previous XML (not good behavior on our side).


However, if there is a 1000 ms delay between the messages, the parser successfully parses the XML and endElement is called.


Also, if the string value in <Tag> is not corrupted, the parser calls startElement and endElement within the same millisecond.


Can you please tell me why it takes so long for the parser to handle corrupted data? The parser doesn’t fail to parse, but rather takes its time… Is that expected behavior?







Noa Gross Scopia Desktop – SW Engineer |

Video Business Unit 972.3.767.9498 | gidons avaya com

Delivering the Visual Experience™




_______________________________________________ xml mailing list, project page xml gnome org

Eric S. Eberhard
PO Box 3661
Camp Verde, AZ  86322

928-567-3727 work    928-301-7537 cell
