[xml] parsing UCS4 in chunks fails with 2.7.4/5



Hi,

there seems to be a change in libxml2 2.7.4 that prevents it from parsing a
Python unicode string buffer, which is UCS4-LE encoded on my system. The
first call to xmlCtxtResetPush() works and parses the first chunk as
expected, but subsequent calls to xmlParseChunk() then fail with an error:
"input conversion failed due to input error, bytes 0x22 0x00 0x00 0x00"
(the latter being '"', which was the first character in the second chunk).

So, when passing '<?xml version=' to xmlCtxtResetPush() and '"1.0"?><ro' to
xmlParseChunk(), I get the error above. I only noticed this by accident, as
a few badly written test cases in lxml happened to parse from Unicode
strings when run under Python 3.
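
For reference, here is a minimal Python sketch of the byte layout involved. It does not call libxml2 at all; it only encodes the two chunks from above as UTF-32-LE (assumed here to match the UCS4-LE buffer layout on my system) and prints the first four bytes of the second chunk. The helper names ucs4_le and hex_bytes are made up for illustration.

```python
# The two chunks exactly as they were passed to the parser.
first_chunk = '<?xml version='
second_chunk = '"1.0"?><ro'


def ucs4_le(text):
    """Encode text as UTF-32-LE (no BOM), assumed to mirror a UCS4-LE buffer."""
    return text.encode('utf-32-le')


def hex_bytes(data):
    """Format bytes in the same style as the libxml2 error message."""
    return ' '.join('0x%02x' % b for b in bytearray(data))


first_bytes = ucs4_le(first_chunk)
second_bytes = ucs4_le(second_chunk)

# Both chunks are whole multiples of 4 bytes, so the split does not
# fall in the middle of a character.
assert len(first_bytes) % 4 == 0
assert len(second_bytes) % 4 == 0

# The first character of the second chunk, as the parser would see it.
leading = hex_bytes(second_bytes[:4])
print(leading)
```

The printed bytes are the same ones reported in the error message, so the second chunk itself appears to be well-formed UCS4-LE at the point where the conversion fails.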

Any ideas where this might come from?

Stefan


