[xml] SAX HTML still stuck.



Sorry for the boring and non-sexy questions, but I need help:

The parser is hanging when I try to abort processing.

I'd like to abort SAX parsing mid-document (in this case I'm aborting
right after </title>).  I set an abort flag in my user data, then check
it before the next chunk read and bail out.
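
To make the pattern concrete without pulling in libxml2, here is a
minimal sketch of the same idea: a callback sets a flag through its
user-data pointer, and the feeding loop checks that flag before handing
over the next chunk.  The names feed_until_abort, on_chunk and
chunk_contains are hypothetical and are not part of libxml2; the
callback only stands in for a SAX handler, and it matches a lowercase
"</title>" that falls entirely inside one chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does the (not NUL-terminated) chunk contain marker? */
static int chunk_contains(const char *chunk, size_t len, const char *marker)
{
    size_t mlen = strlen(marker);
    size_t i;

    if (mlen == 0 || mlen > len)
        return 0;
    for (i = 0; i + mlen <= len; i++)
        if (memcmp(chunk + i, marker, mlen) == 0)
            return 1;
    return 0;
}

/* Stand-in for a SAX callback: sets the abort flag kept in user data. */
static void on_chunk(int *abort_flag, const char *chunk, size_t len)
{
    if (chunk_contains(chunk, len, "</title>"))
        *abort_flag = 1;
}

/* Feed data in chunk_size pieces; stop before the next chunk once the
 * flag is set.  Returns the number of bytes handed to the callback. */
static size_t feed_until_abort(const char *data, size_t len, size_t chunk_size)
{
    int    abort_flag = 0;
    size_t fed = 0;

    assert(chunk_size > 0);
    while (!abort_flag && fed < len) {
        size_t n = (len - fed < chunk_size) ? (len - fed) : chunk_size;
        on_chunk(&abort_flag, data + fed, n);
        fed += n;
    }
    return fed;
}
```

The flag is only looked at between chunks, so everything in the chunk
that contains the marker is still processed before the loop stops.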

Here are the exciting details:

I'm using the Apache core.html documentation page for testing.  
(Note: if I grab a copy from apache.org it doesn't hang, 
so something in this particular copy seems to confuse the parser.)

lwp-download http://hank.org/modules/core.html
Saving to 'core.html'...
119 KB received

gdb ./testlibxml2                                                     
GNU gdb 4.18
(gdb) run core.html
Starting program: /data/_g/lii/swish-e/src/./testlibxml2 core.html

*hangs*

Program received signal SIGINT, Interrupt.
0x4006beed in htmlParseTryOrFinish (ctxt=0x804ac60, terminate=1) at HTMLparser.c:4317
4317                    if ((avail == 1) && (terminate)) {
(gdb) bt
#0  0x4006beed in htmlParseTryOrFinish (ctxt=0x804ac60, terminate=1) at HTMLparser.c:4317
#1  0x4006c6d3 in htmlParseChunk (ctxt=0x804ac60, 
    chunk=0xbfffe7fc "CTYPE HTML PUBLIC \"-//W3C//DTD HTML 3.2 Final//EN\">\n<HTML>\n<HEAD>\n<TITLE>Apache 
Core Features</TI 
    terminate=1) at HTMLparser.c:4620
#2  0x80487f0 in main (argc=2, argv=0xbffff8b4) at testxmllib2.c:35
(gdb) q



cat testxmllib2.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libxml/HTMLparser.h>

static void end_hndl(int *abort, const char *el);

int main(int argc, char **argv) {
    htmlSAXHandler      SAXHandlerStruct;
    htmlSAXHandlerPtr   SAXHandler = &SAXHandlerStruct;
    int                 abort = 0;
    char                buf[4096];
    htmlParserCtxtPtr   ctxt;
    int                 res;
    FILE               *f;

    memset( SAXHandler, 0, sizeof( htmlSAXHandler ) );
    SAXHandler->endElement = (endElementSAXFunc)&end_hndl;


    if ( !(f = fopen( argv[1], "r")))
        {
            printf("Failed to open '%s'\n", argv[1]);
            return -1;
        }
            
    if ( !(res = fread(buf, 1, 4, f)))
        return -1;
        
    ctxt = htmlCreatePushParserCtxt(
        SAXHandler, &abort, buf, res, argv[1], 0);

    while ( !abort && (res = fread(buf, 1, 2048, f)) > 0)
        htmlParseChunk(ctxt, buf, res, 0);

    htmlParseChunk(ctxt, buf, 0, 1);
    htmlFreeParserCtxt(ctxt);

    printf("done!\n");

    return 0;
}

static void end_hndl(int *abort, const char *el)
{
    if ( strcmp( el, "title") == 0 )
        *abort = 1;
}        
    
    
gcc -o testlibxml2 -g -O2 -Wall -pedantic testxmllib2.c -lxml2

libxml2 2.4.5

gcc -v
Reading specs from /usr/local/lib/gcc-lib/i686-pc-linux-gnu/2.95.3/specs
gcc version 2.95.3 20010315 (release)


Bill Moseley
mailto:moseley hank org



