Re: [xml] debugging outputbuffer code in libxml



[clipped]
Either you or I am misunderstanding the encoder code. Can you send me
your test program, along with the XML it was parsing, or tell me what to
do to provoke libxml to write beyond the end of the buffer it
allocated?

Ciao,
Igor

Ok, here is the latest and smallest patch that I put together. I wasn't
sure which was better: pasting the new piece of code or sending a
"diff -u". The fix is a rather simple one and fixes most of the problems
that I stumbled onto. I am still tracing why we hit that condition in
the first place. One of the major reasons is the newly added UTF8toUTF8
encoder function, which triggers the condition because it returns -1 and
doesn't reset the written value. What I did was put a safety check
around the code, and it runs great now.

The function that I fixed is xmlCharEncOutFunc() in encoding.c, around
line 2193.

Current code in CVS:
2195     written = out->size - out->use;
2196 
2197     /*
2198      * First specific handling of in = NULL, i.e. the initialization call
2199      */
2200     if (in == NULL) {
2201         toconv = 0;
2202         if (handler->output != NULL) {
2203             ret = handler->output(&out->content[out->use], &written,
2204                                   NULL, &toconv);
2205             out->use += written;
2206             out->content[out->use] = 0;
2207         }

Here is the new code:

    written = out->size - out->use;

    if (written > 0)
        written--; /* count '\0' */

    /*
     * First specific handling of in = NULL, i.e. the initialization call
     */
    if (in == NULL) {
        toconv = 0;
        if (handler->output != NULL) {
            ret = handler->output(&out->content[out->use], &written,
                                  NULL, &toconv);

            if (ret >= 0) { /* check return value */
                out->use += written;
                out->content[out->use] = 0;
            }
        }
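
To see why the two changes matter, here is a minimal standalone sketch
that does not use libxml at all. The names demo_buffer, failing_output,
ok_output, guarded_init and unguarded_terminator_index are hypothetical
and exist only for this illustration; failing_output merely imitates the
behaviour described above (return -1, leave the written value alone) and
is not the real UTF8toUTF8 code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the output buffer: only the fields the patch touches. */
typedef struct {
    unsigned char *content;
    int use;   /* bytes currently in use */
    int size;  /* bytes allocated */
} demo_buffer;

typedef int (*demo_output_fn)(unsigned char *out, int *outlen,
                              const unsigned char *in, int *inlen);

/* Imitates an encoder that fails with -1 and does NOT reset *outlen. */
static int failing_output(unsigned char *out, int *outlen,
                          const unsigned char *in, int *inlen)
{
    (void)out; (void)outlen; (void)in; (void)inlen;
    return -1;
}

/* Imitates a well-behaved initialization call that writes nothing. */
static int ok_output(unsigned char *out, int *outlen,
                     const unsigned char *in, int *inlen)
{
    (void)out; (void)in;
    *outlen = 0;
    if (inlen != NULL)
        *inlen = 0;
    return 0;
}

/*
 * Same shape as the patched initialization path: reserve one byte for
 * the terminator and commit the result only when the encoder succeeded.
 */
static int guarded_init(demo_buffer *buf, demo_output_fn output)
{
    int written, toconv = 0, ret;

    assert(buf != NULL && buf->content != NULL);

    written = buf->size - buf->use;
    if (written > 0)
        written--;                      /* keep room for '\0' */

    ret = output(&buf->content[buf->use], &written, NULL, &toconv);
    if (ret >= 0) {                     /* check return value */
        buf->use += written;
        buf->content[buf->use] = 0;
    }
    return ret;
}

/*
 * Same arithmetic as the unpatched path, but it only reports the index
 * where the terminator would be stored instead of storing it.
 */
static int unguarded_terminator_index(const demo_buffer *buf,
                                      demo_output_fn output)
{
    int written = buf->size - buf->use;
    int toconv = 0;

    output(&buf->content[buf->use], &written, NULL, &toconv);
    return buf->use + written;
}
```

With a 16-byte buffer that already holds 4 bytes, the unguarded
arithmetic yields index 16 for the terminator when the encoder fails,
which is one past the end of the allocation. The guarded version leaves
the buffer untouched in that case.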

P.S. I am attaching the source of my test program. I am pretty sure that
any XML file bigger than 4kb will reproduce the problem. Actually, in
some cases any XML file will do.
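
If no suitable input is at hand, a file of the required size can be
generated. This is only a sketch: write_test_xml is a hypothetical
helper, the element names are arbitrary, and it assumes "4kb" means
4096 bytes.

```c
#include <stdio.h>

/*
 * Writes a small well-formed XML document to fp, padding it with
 * <item> elements until at least min_bytes characters have been
 * written. Returns the number of characters written, or -1 on error.
 */
static long write_test_xml(FILE *fp, long min_bytes)
{
    long total = 0;
    int n;
    int i = 0;

    if (fp == NULL)
        return -1;

    n = fprintf(fp, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<items>\n");
    if (n < 0)
        return -1;
    total += n;

    /* Keep adding elements until the requested size is reached. */
    while (total < min_bytes) {
        n = fprintf(fp, "  <item id=\"%d\">padding text</item>\n", i++);
        if (n < 0)
            return -1;
        total += n;
    }

    n = fprintf(fp, "</items>\n");
    if (n < 0)
        return -1;
    total += n;

    return total;
}
```

Open a file with fopen(), call write_test_xml(fp, 4097), close it, and
pass the resulting file name to the test program.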

G.F. aka Gena01
http://www.gena01.com
/*
 * Test performance of loading/parsing an XML file.
 *
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <libxml/parser.h>
#include <libxml/HTMLtree.h>

int xsltOutputBufferWrite(void * context, const char * buffer, int len){
    fprintf(stderr, "wrote bucket (len %i bytes)\n", len);
    return len;
}


int main(int argc, char *argv[])
{
    xmlDocPtr doc;
    xmlOutputBufferPtr output;
    xmlCharEncodingHandlerPtr encoder = NULL;

    printf("xml_ob_test v0.1 - testing/debugging outputbuffer callback interface in libxml.\n");

    if (argc != 2) {
        printf("Usage:\n\txml_ob_test <filename.xml>\nPlease use a file that's bigger than 4kb.\n");
        return 1;
    }

    xmlInitMemory();
    /* xmlLineNumbersDefault(XSL_LINENUMBERSDEFAULT); */
    xmlLineNumbersDefault(1);

    /*
     * disable CDATA from being built in the document tree
     */
    xmlDefaultSAXHandlerInit();
    xmlDefaultSAXHandler.cdataBlock = NULL;

    /* xmlLoadExtDtdDefaultValue = 1; */
    xmlLoadExtDtdDefaultValue = 0; /* don't load DTD for performance */

    xmlSubstituteEntitiesDefault(1);

    printf("parsing %s\n", argv[1]);

    doc = xmlParseFile(argv[1]);

    if (doc != NULL) {
        printf("parsed %s, initializing the output buffer\n", argv[1]);

        encoder = xmlFindCharEncodingHandler("UTF-8");
        if (encoder) {
            printf("encoder found -> %s\n", encoder->name);

            /* fall back to no encoder if the handler is incomplete */
            if ((encoder->input == NULL) || (encoder->output == NULL)) {
                printf("resetting encoder to NULL!\n");
                encoder = NULL;
            }

            output = xmlAllocOutputBuffer(encoder);
            if (output) {
                printf("created output-buffer\n");
                output->writecallback = xsltOutputBufferWrite;

                htmlDocContentDumpFormatOutput(output, doc, "UTF-8", 0);
                xmlOutputBufferFlush(output);
                xmlOutputBufferClose(output);
            } else { /* no output */
                printf("could not create outputbuffer\n");
            }
        } else {
            printf("cannot encode %s\n", "UTF-8");
        }

        xmlFreeDoc(doc);
    } else {
        printf("Error parsing %s.\n", argv[1]);
    }

    printf("all done. doing cleanup.\n");
    xmlCleanupParser();
    xmlMemoryDump();

    return 0;
}

