[xml] Push-parsing Unicode with LibXML2
- From: Eric Seidel <eseidel apple com>
- To: xml gnome org
- Subject: [xml] Push-parsing Unicode with LibXML2
- Date: Mon, 13 Feb 2006 14:07:32 -0800
Greetings.
I'm having difficulty passing data incrementally to libxml2's
push-parsing API.
I've adapted the parser4.c example to mimic what the code in my app
is doing, and I'm wondering if one of you can help me spot my error
(or help me determine whether there is a bug in libxml2).
I'm reading data off the network, converting it to UTF-16, and then
passing it to libxml2. In the adapted parser4 example, I'm reading
ASCII from a local file, expanding each character to an integer
(effectively UTF-16), and then passing it to libxml2:
#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

FILE *desc;

static int
readPacket(char *mem, int size) {
    int res;

    res = fread(mem, 1, size, desc);
    return(res);
}

static void
example4Func(const char *filename) {
    xmlParserCtxtPtr ctxt;
    char chars[1];
    xmlDocPtr doc;
    int res;

    ctxt = xmlCreatePushParserCtxt(0, 0, 0, 0, 0);
    ctxt->replaceEntities = 1;

    const unsigned BOM = 0xFEFF;
    const unsigned char BOMHighByte = *(const unsigned char *)&BOM;
    xmlSwitchEncoding(ctxt, BOMHighByte == 0xFF ?
        XML_CHAR_ENCODING_UTF16LE : XML_CHAR_ENCODING_UTF16BE);

    while ((res = readPacket(chars, 1)) > 0) {
        unsigned unicode = chars[0];
        xmlParseChunk(ctxt, (const char *)&unicode, sizeof(unsigned), 0);
    }
    xmlParseChunk(ctxt, chars, 0, 1);

    doc = ctxt->myDoc;
    res = ctxt->wellFormed;
    xmlFreeParserCtxt(ctxt);

    if (res)
        fprintf(stderr, "Success!\n");
    else
        fprintf(stderr, "Failed to parse %s\n", filename);
    xmlFreeDoc(doc);
}

int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "Incorrect number of args\n");
        return(1);
    }

    LIBXML_TEST_VERSION

    desc = fopen(argv[1], "rb");
    if (desc != NULL) {
        example4Func(argv[1]);
        fclose(desc);
    } else
        fprintf(stderr, "Failed to open %s\n", argv[1]);

    xmlCleanupParser();
    xmlMemoryDump();
    return 0;
}
The above code fails with:

    Entity: line 1: parser error : Document is empty
    ^

on my OS X box (libxml 2.2), and with:

    Entity: line 1: parser error : StartTag: invalid element name
    <
    ^

on my Linux box (libxml 2.6.2).
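
One detail that may be relevant, offered as a guess rather than a
diagnosis: unsigned is 32 bits wide on typical OS X and Linux ABIs, so
the expansion step hands the parser four bytes per character, whereas a
UTF-16 code unit is two bytes. Below is a minimal sketch of a 16-bit
version of that step. The helper names widenAsciiToUtf16 and
hostIsLittleEndian are hypothetical (they are not part of libxml2 or of
parser4.c), and whether this change alone makes the errors go away is
untested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from libxml2 or parser4.c): widen ASCII bytes
 * to UTF-16 code units in host byte order. Each input byte becomes
 * exactly one 16-bit unit, so the output occupies 2 * n bytes. */
static size_t
widenAsciiToUtf16(const char *in, size_t n, uint16_t *out) {
    size_t i;

    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        /* Go through unsigned char so a signed char never sign-extends. */
        out[i] = (uint16_t)(unsigned char)in[i];
    }
    /* Byte count that would be passed as the chunk size. */
    return n * sizeof(uint16_t);
}

/* Hypothetical helper: returns 1 if the host stores 0xFEFF low byte
 * first (little-endian), 0 otherwise. Same test as the BOM check in the
 * listing, but on a 16-bit value. */
static int
hostIsLittleEndian(void) {
    const uint16_t bom = 0xFEFF;

    return *(const unsigned char *)&bom == 0xFF;
}
```

With these, the read loop could pass (const char *)out and the returned
byte count to xmlParseChunk, and choose XML_CHAR_ENCODING_UTF16LE or
XML_CHAR_ENCODING_UTF16BE from hostIsLittleEndian().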
Any thoughts?
-eric