[xml] Push-parsing Unicode with LibXML2


I'm having difficulties with libxml2's push-parsing API when passing it data incrementally.

I've adapted the parser4.c example to mimic what the code in my app is doing, and I'm wondering if one of you can help me spot my error (or help me determine whether there is a bug in libxml2).

I'm reading data off the network, converting it to UTF-16, and then passing it to libxml2. In the adapted parser4 example, I read ASCII from a local file, expand each character to an integer (effectively UTF-16), and then pass it to libxml2:

#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

FILE *desc;

static int
readPacket(char *mem, int size) {
    int res;

    res = fread(mem, 1, size, desc);
    return res;
}

static void
example4Func(const char *filename) {
    xmlParserCtxtPtr ctxt;
    char chars[1];
    xmlDocPtr doc;
    int res;

    /* No initial chunk: the encoding is set explicitly below. */
    ctxt = xmlCreatePushParserCtxt(0, 0, 0, 0, 0);
    ctxt->replaceEntities = 1;

    /* Pick UTF-16LE or UTF-16BE to match this machine's byte order. */
    const unsigned BOM = 0xFEFF;
    const unsigned char BOMHighByte = *(const unsigned char *)&BOM;
    xmlSwitchEncoding(ctxt, BOMHighByte == 0xFF ? XML_CHAR_ENCODING_UTF16LE : XML_CHAR_ENCODING_UTF16BE);

    /* Feed the file one character at a time, widened to an unsigned. */
    while ((res = readPacket(chars, 1)) > 0) {
        unsigned unicode = chars[0];
        xmlParseChunk(ctxt, (const char *)&unicode, sizeof (unsigned), 0);
    }
    /* Signal the end of input. */
    xmlParseChunk(ctxt, chars, 0, 1);

    doc = ctxt->myDoc;
    res = ctxt->wellFormed;
    xmlFreeParserCtxt(ctxt);

    if (res)
        fprintf(stderr, "Success!\n");
    else
        fprintf(stderr, "Failed to parse %s\n", filename);

    xmlFreeDoc(doc);
}


int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "Incorrect number of args\n");
        return 1;
    }

    desc = fopen(argv[1], "rb");
    if (desc != NULL) {
        example4Func(argv[1]);
        fclose(desc);
    } else {
        fprintf(stderr, "Failed to open %s\n", argv[1]);
    }

    return 0;
}
The above code fails with:

Entity: line 1: parser error : Document is empty

on my OS X box (libxml 2.2), and with:

Entity: line 1: parser error : StartTag: invalid element name

on my Linux box (libxml 2.6.2).

Any thoughts?
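
One thing that may be worth checking in the code above: sizeof (unsigned) is typically 4 on these platforms, so each chunk handed to xmlParseChunk carries four bytes per character, while a UTF-16 code unit occupies only two. The two extra zero bytes would likely be decoded as a U+0000 character, which is not allowed in XML and might account for the errors. The sketch below is not from the original post. It only illustrates widening ASCII into native-order 16-bit code units, and the helper names native_utf16_is_le and ascii_to_utf16 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when this machine stores 16-bit
 * values low byte first (its native UTF-16 is UTF-16LE), else 0. */
int
native_utf16_is_le(void) {
    const uint16_t bom = 0xFEFF;
    return *(const unsigned char *)&bom == 0xFF;
}

/* Hypothetical helper: widens len ASCII bytes into 16-bit code units
 * in native byte order. dst must have room for len units.
 * Returns the number of BYTES written, which is the size a caller
 * would hand to a byte-oriented API such as xmlParseChunk. */
size_t
ascii_to_utf16(const char *src, size_t len, uint16_t *dst) {
    size_t i;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < len; i++) {
        /* Go through unsigned char so bytes >= 0x80 are not sign-extended. */
        dst[i] = (uint16_t)(unsigned char)src[i];
    }
    return len * sizeof(uint16_t);
}
```

With a helper along these lines, the loop in example4Func would pass two bytes per character to xmlParseChunk rather than sizeof (unsigned) bytes, and the result of native_utf16_is_le would select between XML_CHAR_ENCODING_UTF16LE and XML_CHAR_ENCODING_UTF16BE.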

