Re: [xml] parsing UCS4 in chunks fails with 2.7.4/5
- From: Daniel Veillard <veillard redhat com>
- To: Stefan Behnel <stefan_ml behnel de>
- Cc: xml gnome org
- Subject: Re: [xml] parsing UCS4 in chunks fails with 2.7.4/5
- Date: Thu, 1 Oct 2009 00:06:32 +0200
On Mon, Sep 28, 2009 at 08:43:08PM +0200, Stefan Behnel wrote:
> Hi,
>
> there seems to be a change in libxml2 2.7.4 that prevents it from parsing a
> Python unicode string buffer, which is UCS4-LE encoded on my system. The
> first call to xmlCtxtResetPush() works and parses the first chunk as
> expected, but subsequent calls to xmlParseChunk() then fail with an error:
>
> "input conversion failed due to input error, bytes 0x22 0x00 0x00 0x00"
>
> (the latter being '"', which was the first character in the second chunk).
>
> So, when passing '<?xml version=' to xmlCtxtResetPush() and '"1.0"?><ro' to
> xmlParseChunk(), I get the error above. I only noticed this by accident, as
> a few badly written test cases in lxml happened to parse from Unicode
> strings when run under Python 3.
>
> Any ideas where this might originate from?
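
For reference, the byte layout of the two chunks described above can be checked with a short Python sketch. It does not call libxml2 at all. It assumes that Python's 'utf-32-le' codec produces the same bytes as the UCS4-LE unicode buffer mentioned in the report, and the variable names are purely illustrative.

```python
# The two chunks from the report, as plain Python strings.
first_chunk = '<?xml version='
second_chunk = '"1.0"?><ro'

# Assumption: 'utf-32-le' stands in for the UCS4-LE in-memory layout of a
# Python unicode buffer on a little-endian system.
first_bytes = first_chunk.encode('utf-32-le')
second_bytes = second_chunk.encode('utf-32-le')

# Every character here takes exactly four bytes, so the split between the
# chunks falls on a character boundary, not in the middle of a character.
assert len(first_bytes) == 4 * len(first_chunk)
assert len(second_bytes) == 4 * len(second_chunk)

# Format the first four bytes of the second chunk the way the error does.
head = ' '.join('0x%02x' % b for b in bytearray(second_bytes[:4]))
print(head)  # 0x22 0x00 0x00 0x00
```

The printed bytes match the ones in the error message, which is consistent with the conversion failing on the very first character of the second chunk.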
See https://bugzilla.gnome.org/show_bug.cgi?id=566012
and the recent git commit "Fix a parsing problem with little data at startup".
If you can give me a reproducer, preferably in C (or with the default Python
bindings), I can check. It's about guessing the encoding at the beginning
of the document, before the encoding is specified in the
XMLDecl.
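
To illustrate what guessing the encoding at the beginning of a document involves, here is a minimal Python sketch based on a few of the BOM-less byte patterns listed in Appendix F of the XML 1.0 specification. It is not libxml2's implementation, and `guess_encoding` is a hypothetical helper written for this example only.

```python
def guess_encoding(prefix):
    """Guess an encoding from the first four bytes of an XML document.

    Hypothetical helper: covers only a few BOM-less patterns from
    Appendix F of the XML 1.0 specification. Returns None if unknown.
    """
    patterns = [
        (b'\x3c\x00\x00\x00', 'UCS-4LE'),   # '<' as a 4-byte little-endian unit
        (b'\x00\x00\x00\x3c', 'UCS-4BE'),   # '<' as a 4-byte big-endian unit
        (b'\x3c\x00\x3f\x00', 'UTF-16LE'),  # '<?' in 2-byte little-endian units
        (b'\x00\x3c\x00\x3f', 'UTF-16BE'),  # '<?' in 2-byte big-endian units
        (b'\x3c\x3f\x78\x6d', 'UTF-8'),     # '<?xm' in an ASCII-compatible encoding
    ]
    head = bytes(prefix[:4])
    for signature, name in patterns:
        if head == signature:
            return name
    return None


# The chunks from the report, encoded under the assumption that
# 'utf-32-le' matches the UCS4-LE buffer layout.
first_chunk = '<?xml version='.encode('utf-32-le')
second_chunk = '"1.0"?><ro'.encode('utf-32-le')

guess_first = guess_encoding(first_chunk)    # begins with 3c 00 00 00
guess_second = guess_encoding(second_chunk)  # begins with '"', matches nothing
print(guess_first, guess_second)
```

Only the start of the document carries a recognizable signature. A later chunk such as the second one gives no hint by itself, so whatever was guessed from the first bytes has to stay in effect until the XMLDecl has been read.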
Daniel
--
Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
daniel veillard com | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library http://libvirt.org/