Re: [xml] Crash while SAX parsing HTML



Hi Daniel,
    thanks for the response. I did some additional debugging and verified that the crash does not occur on the main thread, but is 100% reproducible on a secondary thread. That would explain why you cannot reproduce it with xmllint.

Here's the plain C code:

---- XMLParserBug.c ----

#include <libxml/HTMLparser.h>
#include <libxml/parser.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <pthread.h>
#include <stdio.h>

xmlChar buffer[1024 * 128];

/* Parse the global buffer on a secondary thread with an all-NULL SAX handler. */
void *_thread(void *threadid)
{
    (void)threadid; /* unused */
    printf("thread started\n");
    fflush(stdout);
    xmlSAXHandler _SAXParserFunctionHandlers;
    memset(&_SAXParserFunctionHandlers, 0, sizeof(xmlSAXHandler));

    htmlDocPtr saxDoc = htmlSAXParseDoc(buffer,
                                        "utf-8",
                                        (htmlSAXHandlerPtr)&_SAXParserFunctionHandlers,
                                        NULL);
    if (saxDoc)
        xmlFreeDoc(saxDoc);
    return NULL;
}

int main(int argc, const char *argv[])
{
    if (argc < 2) {
        printf("Usage: %s file.html\n", argv[0]);
        return 1;
    }
    memset(buffer, 0, sizeof(buffer));
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        printf("Failed to open file\n");
        return 1;
    }
    /* Read at most sizeof(buffer) - 1 bytes so the buffer stays NUL-terminated. */
    if (read(fd, buffer, sizeof(buffer) - 1) <= 0) {
        printf("Failed to read file\n");
        return 1;
    }
    close(fd);
    pthread_t thread;
    if (pthread_create(&thread, NULL, _thread, NULL)) {
        printf("pthread_create() failed\n");
        return 1;
    }
    /* Wait for the parse to finish instead of sleeping for a fixed time. */
    pthread_join(thread, NULL);
    return 0;
}

---- ---- ---- ---- ---- ---- ---- 

Steps to reproduce (on Mac OS X):
1. gcc XMLParserBug.c  -I/usr/include/libxml2 -I/usr/include/libxml2/libxml /usr/lib/libxml2.dylib -o parserbug
2. curl http://www.dlc.fi/~hurmari/index96.html > bugpage.html
3. ./parserbug bugpage.html 
thread started
Bus error


Any idea what could possibly go wrong here? Should I file a bug?
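
One thing I have not ruled out: secondary threads usually get a much smaller default stack than the main thread (on Mac OS X, 512 KB versus 8 MB), so a deeply recursive parse might overflow it. This is only a guess, not a confirmed cause. Below is a minimal sketch for testing it. The helper run_with_stack and the stand-in worker probe are hypothetical names of mine, not part of libxml2 or pthreads; in XMLParserBug.c the worker would be _thread.

```c
#include <pthread.h>
#include <stddef.h>

/* Set by the stand-in worker so a caller can see that it ran. */
static int probe_ran = 0;

/* Stand-in worker (hypothetical); the real test would pass _thread instead. */
static void *probe(void *arg)
{
    (void)arg;
    probe_ran = 1;
    return NULL;
}

/* Hypothetical helper: run fn(arg) on a new thread with an explicit stack
 * size and wait for it to finish. Returns 0 on success, otherwise the
 * pthread error code from the call that failed. */
static int run_with_stack(void *(*fn)(void *), void *arg, size_t stack_bytes)
{
    pthread_attr_t attr;
    pthread_t thread;

    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* Request the larger stack before the thread is created. */
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(&thread, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return rc;

    return pthread_join(thread, NULL);
}
```

Replacing the pthread_create() call in main with run_with_stack(_thread, NULL, 8u * 1024 * 1024) would show whether the bus error goes away with an 8 MB stack. If it does, stack depth is a likely factor; if not, the cause is elsewhere.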

Thanks!
Giovanni


I can't debug your code, but
  xmllint --sax --html http://www.dlc.fi/~hurmari/index96.html
seems to have no problem with the document, so I guess the problem
is on your side.

Daniel

--
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/


