Re: [xml] HTML parser sometimes doesn't close script tags in libxml2 2.9.8



On 23/01/2019 01:47, Tomi Belan wrote:
But even so I still wasn't able to reproduce it in pure C. Could it be because xmllint reads ctxt->myDoc, and lxml uses SAX2 event handlers (according to parsertarget.pxi)? AFAICT xmllint's --push and --sax options are incompatible.

ctxt->myDoc is also built via internal SAX2 handlers, so I'm not sure what's going on exactly.

I had more luck with git bisect. Using a dynamically linked build of lxml, and pointing LD_LIBRARY_PATH to libxml2/.libs/, I successfully found out that the bug was:
- introduced by https://github.com/GNOME/libxml2/commit/6e6ae5daa6cd9640c9a83c1070896273e9b30d14
- fixed(?) by https://github.com/GNOME/libxml2/commit/7a1bd7f6497ac33a9023d556f6f47a48f01deac0

The first commit was an attempt to fix an (ICU-related?) issue but it turned out to be buggy. It's unfortunate that the commit made it into 2.9.8.

https://mail.gnome.org/archives/xml/2018-January/msg00003.html
https://bugs.chromium.org/p/chromium/issues/detail?id=820163

I hope that's meaningful to you, because I have no idea what those commits are doing or how they could be related to this bug... The commits sound related to character encoding, but bad.html is plain ASCII...

The commit obviously also affected documents that didn't need encoding conversion. I didn't realize that. At least we know that the issue is isolated to 2.9.8. Thanks for your efforts!
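
For anyone who wants to check whether their lxml is running against the affected release, here is a minimal sketch. It assumes lxml is installed and relies on its etree.LIBXML_VERSION tuple, which reports the libxml2 version loaded at runtime. The helper name is_affected is made up for illustration, and the check only encodes the conclusion above that the issue is isolated to 2.9.8.

```python
def is_affected(version):
    """Return True if a (major, minor, patch) tuple is libxml2 2.9.8.

    Hypothetical helper: it encodes only the finding from this thread
    that the script-tag issue is isolated to the 2.9.8 release.
    """
    return tuple(version[:3]) == (2, 9, 8)


if __name__ == "__main__":
    try:
        from lxml import etree
    except ImportError:
        print("lxml is not installed; nothing to check")
    else:
        # LIBXML_VERSION is the libxml2 version lxml loaded at runtime,
        # which is the one that matters for a dynamically linked build.
        runtime = etree.LIBXML_VERSION
        label = ".".join(str(part) for part in runtime)
        if is_affected(runtime):
            print("libxml2 %s: affected release" % label)
        else:
            print("libxml2 %s: not the affected release" % label)
```

With a dynamically linked lxml, the reported version follows whatever libxml2 the loader picks up, so it should change when LD_LIBRARY_PATH points at a different build.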

Nick


