[xml] libxml2 fails to parse UCS-4 memory input
- From: Stefan Behnel <behnel_ml gkec informatik tu-darmstadt de>
- To: xml gnome org
- Subject: [xml] libxml2 fails to parse UCS-4 memory input
- Date: Mon, 08 May 2006 13:53:19 +0200
Hi,
I have a problem parsing UCS-4LE encoded text with libxml2 2.6.24. My iconv
supports that encoding, I checked. However, when I do this:
------------------------------------
pctxt = xmlNewParserCtxt();
/* passing a literal "UCS4" here instead makes no difference */
encoding = xmlGetCharEncodingName(xmlDetectCharEncoding(text, buffer_len));
result = xmlCtxtReadMemory(pctxt, text, buffer_len, filename, encoding, options);
------------------------------------
I get a fatal parser error stating "Start tag expected, '<' not found". I
checked that the input really is UCS-4. libxml2 tells me it's UCS-4, iconv
perfectly converts it to whatever I like and "wc -c" tells me that it
correctly uses four bytes per character. I'm pretty convinced by now that the
problem is not on my side of the screen.
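For reference, the check that the input "really is UCS-4" can be sketched
independently of libxml2. The helper below is hypothetical (guess_encoding
and the enc_guess enum are illustration names, not libxml2 API). It only
looks at the first four bytes of a BOM-less document that starts with '<',
following the autodetection table in the XML 1.0 recommendation:

```c
#include <stddef.h>

/* Hypothetical result type for this sketch; not a libxml2 enum. */
typedef enum {
    ENC_UNKNOWN,
    ENC_UCS4BE,
    ENC_UCS4LE,
    ENC_UTF16BE,
    ENC_UTF16LE,
    ENC_8BIT        /* UTF-8, ASCII or another ASCII-compatible encoding */
} enc_guess;

/* Classify a BOM-less buffer by its first four bytes, assuming the
 * document starts with '<' (UCS-4) or "<?" / "<?xm" (the other cases). */
static enc_guess guess_encoding(const unsigned char *p, size_t len)
{
    if (p == NULL || len < 4)
        return ENC_UNKNOWN;
    if (p[0] == 0x00 && p[1] == 0x00 && p[2] == 0x00 && p[3] == 0x3C)
        return ENC_UCS4BE;                 /* 00 00 00 3C */
    if (p[0] == 0x3C && p[1] == 0x00 && p[2] == 0x00 && p[3] == 0x00)
        return ENC_UCS4LE;                 /* 3C 00 00 00 */
    if (p[0] == 0x00 && p[1] == 0x3C && p[2] == 0x00 && p[3] == 0x3F)
        return ENC_UTF16BE;                /* 00 3C 00 3F */
    if (p[0] == 0x3C && p[1] == 0x00 && p[2] == 0x3F && p[3] == 0x00)
        return ENC_UTF16LE;                /* 3C 00 3F 00 */
    if (p[0] == 0x3C && p[1] == 0x3F && p[2] == 0x78 && p[3] == 0x6D)
        return ENC_8BIT;                   /* "<?xm" */
    return ENC_UNKNOWN;
}
```

Calling guess_encoding((const unsigned char *)text, buffer_len) on the
buffer from the snippet above should yield ENC_UCS4LE if the data is what
it is claimed to be. This says nothing about what the parser does with the
buffer afterwards.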
I tried to track down the problem in the libxml2 source, but I'm having a
pretty hard time figuring out which of the three different stages where
encoding conversion could take place (parser, input, buffer) would make a
difference here. Has anyone ever used this part of the libxml2 code and
verified that it works?
One of the problems I found was that xmlFindCharEncodingHandler passes the
"ISO-..." names of the UCS-4 encoding to iconv, and iconv doesn't know those.
From what I read further on, though, libxml2 then checks the alias names,
which would normally yield the name "UCS-4" or "UCS4", which iconv
recognises. So that takes a bit longer but should still work. And as I said,
passing "UCS4" directly as the encoding doesn't work either...
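The alias fallback described above can be tested against the local iconv
without going through libxml2. This is only a sketch of the idea, not
libxml2's actual lookup code: first_iconv_name is a hypothetical helper,
and which names succeed depends on the iconv implementation installed.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: return the first entry of `names` that iconv_open()
 * accepts as a source encoding for conversion to UTF-8, or NULL if the
 * local iconv rejects all of them. */
static const char *first_iconv_name(const char *const *names, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        iconv_t cd = iconv_open("UTF-8", names[i]);
        if (cd != (iconv_t)-1) {
            iconv_close(cd);   /* only probing; no conversion is done */
            return names[i];
        }
    }
    return NULL;
}
```

Passing an array such as { "ISO-10646-UCS-4", "UCS-4", "UCS4" } with a
count of 3 shows which spelling, if any, the local iconv accepts. If one of
them is accepted, the name lookup is unlikely to be the cause of the parse
error.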
Any hints on this one?
Thanks,
Stefan