please use an attachment, not the mail body, mailers break body content.
<...>
provide the test example as an attachment too, I will plug it into test/HTML
The attached tar.gz includes the contextual patch of HTMLparser.c of libxml2-2.6.24 (now with htmlParseLookupSequence) and the test HTML file "chunk-boundary-cdata.html".

The test HTML file triggers the error in libxml2 because it has the closing "</script>" tag exactly on the 4096 boundary. To reproduce the test, the number of chars in the test HTML file and the number of bytes read by testHTML must not be changed(!). The character alignment needs to match exactly to trigger the error.

Before the patch, libxml2-2.6.24 will fail the following test with the simple test HTML file:

./testHTML --push --sax --debug chunk-boundary-cdata.html
SAX.setDocumentLocator()
SAX.startDocument()
SAX.startElement(html)
SAX.startElement(body)
SAX.characters(.............................., 1000)
SAX.characters(........................... .., 1000)
SAX.characters(.............................., 1000)
SAX.characters(........................... .., 1000)
SAX.characters(.............................., 74)
SAX.startElement(script)
SAX.error: Invalid char in CDATA 0x0
SAX.cdata(</, 2)
SAX.error: htmlParseEndTag: '</' not found
SAX.cdata(cript> <a href="test", 26)
SAX.error: Unexpected end tag : a
SAX.cdata( , 1)
SAX.endElement(script)
SAX.endElement(body)
SAX.ignorableWhitespace( , 1)
SAX.endElement(html)
SAX.ignorableWhitespace( , 1)
SAX.endDocument()

After the patch, the result is correct.

Cyrill
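
For readers without the attachment, the alignment requirement described above can be illustrated with a small sketch. This is not the attached "chunk-boundary-cdata.html" and not part of the patch. The helper names build_boundary_html and split_chunks are hypothetical, the filler layout is invented, and the assumption that chunks start at offset 0 of the file may not match how testHTML actually reads its input, so the boundary is left as a parameter.

```python
CLOSE_TAG = b"</script>"


def build_boundary_html(boundary=4096, split_at=2):
    """Return an HTML document whose closing </script> tag straddles `boundary`.

    `split_at` is the number of bytes of the closing tag that fall before
    the boundary (2 means "</" ends one chunk and "script>" starts the next).
    Hypothetical helper for illustration only.
    """
    head = b"<html><body>\n"
    open_tag = b"<script>"
    tail = b'\n<a href="test">x</a>\n</body></html>\n'

    if not 0 < split_at < len(CLOSE_TAG):
        raise ValueError("split_at must fall strictly inside the closing tag")

    # Offset at which the closing tag has to start.
    tag_start = boundary - split_at
    filler_len = tag_start - len(head) - len(open_tag)
    if filler_len < 0:
        raise ValueError("boundary is too small for the fixed markup")

    # Pad the body with dots so the tag lands exactly where we want it.
    filler = b"." * filler_len
    return head + filler + open_tag + CLOSE_TAG + tail


def split_chunks(data, size):
    """Split `data` into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


if __name__ == "__main__":
    doc = build_boundary_html()
    first, second = split_chunks(doc, 4096)[:2]
    print(first[-10:], second[:10])
```

With the default arguments, neither chunk contains the complete closing tag on its own, which is the situation a push parser has to handle by carrying state across chunks. Adding or removing a single byte before the tag moves it off the boundary, which is why the test file's length must stay untouched.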
Attachment:
libxml2-HTMLparser-cdata-fix.tar.gz