Re: [xml] HTML parser sometimes doesn't close script tags in libxml2 2.9.8
- From: Nick Wellnhofer <wellnhofer aevum de>
- To: Tomi Belan <tomi belan gmail com>
- Cc: xml gnome org
- Subject: Re: [xml] HTML parser sometimes doesn't close script tags in libxml2 2.9.8
- Date: Wed, 23 Jan 2019 20:22:40 +0100
On 23/01/2019 16:14, Tomi Belan wrote:
> I don't know too much
> about Python's C API, but [2] [3] suggests lxml is using a deprecated macro
> and giving libxml2 a multibyte buffer even though the input would fit into
> pure ASCII. This explains why it behaved differently than xmllint.
Right, if Python passes ASCII codes as, say, 16-bit integers, this will be
detected as UTF-16 by libxml2 and encoding conversion will happen behind the
scenes. I'm not sure what would happen with an encoding that isn't
Unicode-compatible. Maybe there's a bug lurking in lxml.
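
To illustrate the point about 16-bit code units, here is a minimal Python sketch. It only shows how pure-ASCII markup looks once it is widened to 16 bits per character, and why a byte-level check would take it for UTF-16. The function `sniff_utf16` is a hypothetical heuristic written for this example; it is not libxml2's detection code and not what lxml does.

```python
from typing import Optional


def sniff_utf16(buf: bytes) -> Optional[str]:
    """Hypothetical heuristic for this example only (not libxml2's code).

    Returns a UTF-16 variant name if the buffer starts with a BOM or with
    ASCII characters stored as 16-bit units, otherwise None.
    """
    # A byte order mark settles the question directly.
    if buf[:2] == b"\xff\xfe":
        return "UTF-16LE"
    if buf[:2] == b"\xfe\xff":
        return "UTF-16BE"
    # Without a BOM, ASCII stored in 16-bit units alternates between a
    # non-zero byte and a zero byte.
    if len(buf) >= 4:
        if buf[0] != 0 and buf[1] == 0 and buf[2] != 0 and buf[3] == 0:
            return "UTF-16LE"
        if buf[0] == 0 and buf[1] != 0 and buf[2] == 0 and buf[3] != 0:
            return "UTF-16BE"
    return None


text = "<script>x</script>"
narrow = text.encode("ascii")      # one byte per character
wide = text.encode("utf-16-le")    # same characters as 16-bit units, no BOM

print(narrow[:4])
print(wide[:8])
print(sniff_utf16(narrow), sniff_utf16(wide))
```

The wide buffer carries exactly the same characters as the narrow one, so a parser that receives it has to convert it before tokenizing, which is a different code path from the one xmllint takes on an ASCII file.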
> It would be good to add some tests to decrease the likelihood that
> this issue or something similar happens again.
Yes, that would be nice. But it was only a short-lived regression that I
personally don't want to spend more time on. A UTF-16 test case derived from
either your bug report or the Chromium one would probably make the most sense.
Nick