[xml] htmlParseChunk loop
- From: Bill Moseley <moseley hank org>
- To: xml gnome org
- Subject: [xml] htmlParseChunk loop
- Date: Sat, 29 Sep 2001 16:11:21 -0700
I'm sorry for not being better at debugging.
I'm parsing HTML with the SAX interface, and one thing I need to do is abort
processing when I find <meta name="robots" content="noindex">. When I find
that I set a flag in my user data structure.
It seems that the parser hangs if I use 4096 for my chunk size. 4095
doesn't hang, and neither does 4097. Bad luck picking an input buffer size!
In fact, I haven't been able to find any other size that makes it hang...
So,
if ( !(res = read_next_chunk( fprop, chars, 4 )) )
    return 0;

ctxt = htmlCreatePushParserCtxt(
    SAXHandler, parse_data, chars, res, fprop->real_path, 0 );

/* now read in READ_CHUNK_SIZE (4096) byte chunks */
while ( !parse_data->abort &&
        (res = read_next_chunk( fprop, chars, READ_CHUNK_SIZE )) )
{
    htmlParseChunk( ctxt, chars, res, 0 );
}

/* final call with terminate=1; the backtrace below is from this call */
htmlParseChunk( ctxt, chars, 0, 1 );
htmlFreeParserCtxt( ctxt );
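A minimal, self-contained sketch of the read loop above, for experimenting with chunk sizes without libxml2. Everything in it is an assumption made for illustration: Source, ParseData, feed_chunk and feed_all are hypothetical names, read_next_chunk is reimplemented over an in-memory buffer, and feed_chunk merely stands in for htmlParseChunk plus the SAX callback that sets the abort flag. It does not reproduce the hang.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory source standing in for the file behind fprop. */
typedef struct {
    const char *data;
    size_t len;
    size_t pos;
} Source;

/* Hypothetical stand-in for the user data handed to the SAX callbacks. */
typedef struct {
    int abort;          /* set once "noindex" has been seen */
    size_t chunks_fed;  /* number of chunks handed to feed_chunk */
} ParseData;

/* Copy up to max bytes into buf; returns bytes copied, 0 at end of input. */
static size_t read_next_chunk(Source *src, char *buf, size_t max)
{
    size_t left = src->len - src->pos;
    size_t n = left < max ? left : max;
    assert(buf != NULL);
    memcpy(buf, src->data + src->pos, n);
    src->pos += n;
    return n;
}

/* Stand-in for htmlParseChunk: flags abort when the chunk contains
 * "noindex". A match straddling two chunks is deliberately not handled. */
static void feed_chunk(ParseData *pd, const char *buf, size_t n)
{
    pd->chunks_fed++;
    for (size_t i = 0; i + 7 <= n; i++) {
        if (memcmp(buf + i, "noindex", 7) == 0) {
            pd->abort = 1;
            return;
        }
    }
}

/* Feed src in chunk_size pieces until input ends or abort is set. */
static size_t feed_all(Source *src, ParseData *pd, size_t chunk_size)
{
    char buf[4096];
    size_t n;
    assert(chunk_size > 0 && chunk_size <= sizeof buf);
    while (!pd->abort && (n = read_next_chunk(src, buf, chunk_size)) > 0)
        feed_chunk(pd, buf, n);
    return pd->chunks_fed;
}
```

Calling feed_all with different chunk_size values shows how many chunks are consumed before the abort flag stops the loop.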
But, when I do abort, which is likely on the first chunk, the parser hangs
in a relatively tight loop.
If I change READ_CHUNK_SIZE to anything other than 4096 it works. (Well,
then quit using 4096!)
0x40088e7a in htmlParseTryOrFinish (ctxt=0x82070f0, terminate=1) at HTMLparser.c:4307
4307 if (ctxt->token != 0) {
(gdb) bt
#0 0x40088e7a in htmlParseTryOrFinish (ctxt=0x82070f0, terminate=1) at HTMLparser.c:4307
#1 0x400896c3 in htmlParseChunk (ctxt=0x82070f0,
chunk=0xbfffe388 "CTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0
Transitional//EN\">\n<html><head><meta name=\"robots\"
content=\"noindex,noarchive\"><title>\nQt Toolkit - desktop/desktop.cpp
example file\n</title><style type=\"text/c"..., size=0,
terminate=1) at HTMLparser.c:4620
BTW -- I've asked before, but is there a recommended way to abort the SAX
parsing? This is what I do now:
static void abort_parsing( PARSE_DATA *parse_data, int abort_code )
{
    parse_data->abort = abort_code; /* Flag that we are all done */

    /* Clear the callbacks so no further SAX events reach my code */
    parse_data->SAXHandler->startElement = (startElementSAXFunc)NULL;
    parse_data->SAXHandler->endElement = (endElementSAXFunc)NULL;
    parse_data->SAXHandler->characters = (charactersSAXFunc)NULL;
}
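The effect of clearing the handler pointers in abort_parsing can be sketched without libxml2. The snippet below is only an illustration under stated assumptions: Handlers, State, emit_start, stop_callbacks and on_start are hypothetical names, and emit_start imitates a parser that skips any callback set to NULL. It shows that this approach silences the callbacks while the caller keeps emitting events; it does not stop the parser itself.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Handlers Handlers;

/* Hypothetical user data, loosely modeled on PARSE_DATA. */
typedef struct {
    int abort;
    int elements_seen;
    Handlers *handlers;
} State;

/* Hypothetical, much-reduced stand-in for a SAX handler table. */
struct Handlers {
    void (*start_element)(State *st, const char *name);
};

/* Dispatcher: like a SAX parser, it skips callbacks that are NULL. */
static void emit_start(State *st, const char *name)
{
    if (st->handlers->start_element != NULL)
        st->handlers->start_element(st, name);
}

/* Same idea as abort_parsing: record the code, then clear the callback. */
static void stop_callbacks(State *st, int abort_code)
{
    assert(abort_code != 0);
    st->abort = abort_code;
    st->handlers->start_element = NULL;
}

/* Example callback: counts elements and aborts on a name starting with
 * 'm', which here merely pretends to be the robots meta tag. */
static void on_start(State *st, const char *name)
{
    st->elements_seen++;
    if (name[0] == 'm')
        stop_callbacks(st, 1);
}
```

After the callback fires for the pretend meta element, later calls to emit_start fall through the NULL check and the counter stops increasing.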
Bill Moseley
mailto:moseley hank org