Re: [xml] HTML parser sometimes doesn't close script tags in libxml2 2.9.8
- From: Nick Wellnhofer <wellnhofer aevum de>
- To: Tomi Belan <tomi belan gmail com>, xml gnome org
- Subject: Re: [xml] HTML parser sometimes doesn't close script tags in libxml2 2.9.8
- Date: Tue, 22 Jan 2019 17:11:07 +0100
On 22/01/2019 15:43, Tomi Belan via xml wrote:
> After a lot of debugging, I determined the problem is in libxml2 and not the
> other libraries in my stack, and that it only seems to happen on version
> 2.9.8. But I don't see any related changes in news.html for 2.9.9, nor in the
> diff between them, so I am still worried: I don't know if the bug is really
> fixed, or just dormant. I hope you can find the root cause, and maybe add a
> regression test if you do.
I also don't see any directly related changes in either 2.9.8 or 2.9.9.
> This will download
> the manylinux binary build of lxml 4.2.5, which is statically linked to
> libxml2 2.9.8.
Are you sure that a pristine 2.9.8 build was used? Maybe there are additional
patches added by a distro?
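As a first check, something like the following sketch could confirm which libxml2 version the installed lxml was compiled against and which one it loads at run time. It relies on the version tuples exposed by `lxml.etree`; the helper names `format_version` and `report_versions` are made up for illustration. Note that matching version numbers cannot rule out extra distro patches, they only show whether the expected release is in use at all.

```python
# Sketch: report the libxml2 versions seen by the installed lxml.
# format_version and report_versions are hypothetical helper names.


def format_version(version_tuple):
    """Turn a version tuple such as (2, 9, 8) into the string '2.9.8'."""
    return ".".join(str(part) for part in version_tuple)


def report_versions():
    """Return a dict of version strings, or None if lxml is not installed."""
    try:
        from lxml import etree
    except ImportError:
        return None
    return {
        "lxml": format_version(etree.LXML_VERSION),
        # Version of the libxml2 headers lxml was built against.
        "libxml2 compiled": format_version(etree.LIBXML_COMPILED_VERSION),
        # Version of the libxml2 library actually in use at run time.
        "libxml2 runtime": format_version(etree.LIBXML_VERSION),
    }


if __name__ == "__main__":
    versions = report_versions()
    if versions is None:
        print("lxml is not installed")
    else:
        for name, value in versions.items():
            print(f"{name}: {value}")
```

If the compiled and run-time versions differ, the binary is probably not using the libxml2 it was expected to use.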
> I couldn't shorten the file very much, because if I delete even a single
> character, the bug stops triggering. (Could it be some buffer boundary issue?)
Yes, a buffer boundary issue seems likely.
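One way to probe that hypothesis is to shift the content by a varying number of bytes and record at which offsets the problem shows up. The following is only a sketch under assumptions: `padded_variants` and `failing_offsets` are hypothetical helpers, the padding is plain leading whitespace, and `check` stands for whatever test decides whether a given document parses correctly.

```python
# Sketch: shift a document by 0..max_pad bytes to see which offsets fail.
# padded_variants, failing_offsets and the `check` callback are
# hypothetical names, not part of libxml2 or lxml.


def padded_variants(data, max_pad, pad_byte=b" "):
    """Yield (pad, variant) pairs with `pad` padding bytes prepended."""
    for pad in range(max_pad + 1):
        yield pad, pad_byte * pad + data


def failing_offsets(data, check, max_pad):
    """Return the pad sizes for which check(variant) returns False.

    `check` is supplied by the caller and should return True when the
    variant is parsed correctly.
    """
    return [pad for pad, variant in padded_variants(data, max_pad)
            if not check(variant)]
```

If the failing pad sizes turn out to recur at a regular interval, that would be consistent with a problem at a fixed-size buffer boundary, though it would not prove it.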
> I also built my own lxml 4.2.5 with libxml2 2.9.9 and it was not affected. So
> I believe this is a bug in libxml2 2.9.8 specifically, and not in a particular
> version of lxml.
Did you also try your own build with the official libxml2 2.9.8 sources?
> I hope you can solve the mystery. Please let me know if I can be of any help.
It would help if you could reproduce the issue with xmllint and no Python code
involved. git-bisect might also be useful.
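A rough sketch of a check that could be handed to `git bisect run` follows. It runs `xmllint --html` on the test file and compares the number of opening and closing script tags in the serialized output. That comparison is only an assumed stand-in for the real symptom and will misfire if the script content itself contains the text `<script`. The script name `check_scripts.py` and the function names are made up for illustration.

```python
# Sketch of a checker for `git bisect run`; names are hypothetical.
# Exit codes follow the git bisect convention:
# 0 = good, 1 = bad, 125 = this revision cannot be tested (skip).
import subprocess
import sys


def scripts_balanced(html):
    """Return True if opening and closing script tags occur equally often."""
    lowered = html.lower()
    return lowered.count("<script") == lowered.count("</script>")


def main(argv):
    if len(argv) != 3:
        print("usage: check_scripts.py XMLLINT HTML_FILE", file=sys.stderr)
        return 125
    xmllint, html_file = argv[1], argv[2]
    try:
        # --html makes xmllint use the HTML parser; the result is
        # serialized to stdout. Parser warnings on stderr are ignored.
        result = subprocess.run(
            [xmllint, "--html", html_file],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # xmllint is missing or not executable, e.g. the build failed.
        return 125
    output = result.stdout.decode("utf-8", errors="replace")
    return 0 if scripts_balanced(output) else 1


# Only act when arguments are given, so the file can also be imported.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

It would be invoked roughly as `git bisect run python3 check_scripts.py ./xmllint page.html`, where `page.html` is the reproducing file. Since libxml2 has to be rebuilt at every step, a small wrapper that builds first and then calls this check would be needed in practice.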
Nick