Re: [xml] HTML parser sometimes doesn't close script tags in libxml2 2.9.8



On 22/01/2019 15:43, Tomi Belan via xml wrote:
> After a lot of debugging, I determined the problem is in libxml2 and not the other libraries in my stack, and that it only seems to happen on version 2.9.8. But I don't see any related changes in news.html for 2.9.9, nor in the diff between them, so I am still worried: I don't know if the bug is really fixed, or just dormant. I hope you can find the root cause, and maybe add a regression test if you do.

I also don't see any directly related changes in either 2.9.8 or 2.9.9.

> This will download the manylinux binary build of lxml 4.2.5, which is statically linked to libxml2 2.9.8.

Are you sure that a pristine 2.9.8 build was used? Maybe there are additional patches added by a distro?

> I couldn't shorten the file very much, because if I delete even a single character, the bug stops triggering. (Could it be some buffer boundary issue?)

Yes, a buffer boundary issue seems likely.

> I also built my own lxml 4.2.5 with libxml2 2.9.9 and it was not affected. So I believe this is a bug in libxml2 2.9.8 specifically, and not in a particular version of lxml.

Did you also try your own build with the official libxml2 2.9.8 sources?

> I hope you can solve the mystery. Please let me know if I can be of any help.

It would help if you could reproduce the issue with xmllint and no Python code involved. git-bisect might also be useful.
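
A rough sketch of what such a Python-free check might look like, assuming you have two xmllint binaries built from the official 2.9.8 and 2.9.9 sources. The paths, the input file name page.html, and the helper names dump_html and same_output are placeholders I made up, not anything from your setup. It only compares the serialized output of the two builds, so a difference is a hint to look closer, not proof of this particular bug.

```shell
#!/bin/sh
# Sketch only: every path below is a hypothetical placeholder.
XMLLINT_A="${XMLLINT_A:-$HOME/src/libxml2-2.9.8/xmllint}"  # pristine 2.9.8 build
XMLLINT_B="${XMLLINT_B:-$HOME/src/libxml2-2.9.9/xmllint}"  # 2.9.9 build
INPUT="${INPUT:-page.html}"                                # file that triggers the problem

# dump_html XMLLINT FILE OUT
# Parse FILE with the HTML parser of the given xmllint binary and write the
# reserialized document to OUT. Parser messages on stderr are discarded.
dump_html() {
    "$1" --html "$2" > "$3" 2>/dev/null
}

# same_output A B
# Succeed only if the two files are byte-identical.
same_output() {
    cmp -s "$1" "$2"
}

# Only run the comparison when both binaries and the input actually exist.
if [ -x "$XMLLINT_A" ] && [ -x "$XMLLINT_B" ] && [ -f "$INPUT" ]; then
    "$XMLLINT_A" --version   # shows which libxml2 version each binary uses
    "$XMLLINT_B" --version
    dump_html "$XMLLINT_A" "$INPUT" out-a.html
    dump_html "$XMLLINT_B" "$INPUT" out-b.html
    if same_output out-a.html out-b.html; then
        echo "outputs match: not reproduced with xmllint"
    else
        echo "outputs differ: inspect with diff out-a.html out-b.html"
    fi
fi
```

If the outputs do differ, the same comparison against a saved known-good output could be wrapped in a script for `git bisect run`, which treats exit code 0 as good, 125 as "skip this commit", and other codes from 1 to 127 as bad. Since the older release is the broken one here, the `--term-old` and `--term-new` options of `git bisect start` may make the search between the 2.9.8 and 2.9.9 release tags easier to follow.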

Nick

