Re: [xml] HTML parser sometimes doesn't close script tags in libxml2 2.9.8
- From: Nick Wellnhofer <wellnhofer aevum de>
- To: Tomi Belan <tomi belan gmail com>
- Cc: xml gnome org
- Subject: Re: [xml] HTML parser sometimes doesn't close script tags in libxml2 2.9.8
- Date: Wed, 23 Jan 2019 12:55:48 +0100
On 23/01/2019 01:47, Tomi Belan wrote:
> But even so I still wasn't able to reproduce it in pure C. Could it be
> because xmllint reads ctxt->myDoc, and lxml uses SAX2 event handlers
> (according to parsertarget.pxi)? AFAICT xmllint's --push and --sax options are
> incompatible.

ctxt->myDoc is also built via internal SAX2 handlers, so I'm not sure what's
going on exactly.

> I had more luck with git bisect. Using a dynamically linked build of lxml, and
> pointing LD_LIBRARY_PATH to libxml2/.libs/, I successfully found out that the
> bug was:
> - introduced by
>   https://github.com/GNOME/libxml2/commit/6e6ae5daa6cd9640c9a83c1070896273e9b30d14
> - fixed(?) by
>   https://github.com/GNOME/libxml2/commit/7a1bd7f6497ac33a9023d556f6f47a48f01deac0

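A bisect of this kind can be automated with `git bisect run`. The following is only a minimal sketch, not the script used in this thread. It assumes that lxml is dynamically linked and picks up the freshly built libxml2 through LD_LIBRARY_PATH, that the reproduction file is passed as the only argument, and that the bug shows up as a `script` start event without a matching end event on the parser target. The names `ScriptBalanceTarget`, `check_file`, `exit_code` and `main` are hypothetical.

```python
import sys


class ScriptBalanceTarget:
    """Parser target that counts <script> start and end events."""

    def __init__(self):
        self.starts = 0
        self.ends = 0

    def start(self, tag, attrib):
        if tag == "script":
            self.starts += 1

    def end(self, tag):
        if tag == "script":
            self.ends += 1

    def data(self, text):
        pass  # text content is irrelevant for this check

    def close(self):
        # True when every opened <script> was also closed.
        return self.starts == self.ends


def check_file(path, chunk_size=4096):
    """Feed the file to lxml's HTML parser in chunks (assumed setup)."""
    from lxml import etree  # imported lazily so the target is usable alone

    parser = etree.HTMLParser(target=ScriptBalanceTarget())
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            parser.feed(chunk)
    return parser.close()  # returns the result of the target's close()


def exit_code(balanced):
    # git bisect run: 0 marks the revision good, 1 marks it bad.
    return 0 if balanced else 1


def main(path):
    try:
        return exit_code(check_file(path))
    except ImportError:
        return 125  # 125 tells git bisect run to skip this revision


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

Saved under a hypothetical name such as check_script.py, it could be run after rebuilding libxml2 at each step, with LD_LIBRARY_PATH pointing to libxml2/.libs/ as described above.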
The first commit was an attempt to fix an (ICU-related?) issue but it turned
out to be buggy. It's unfortunate that the commit made it into 2.9.8.
https://mail.gnome.org/archives/xml/2018-January/msg00003.html
https://bugs.chromium.org/p/chromium/issues/detail?id=820163

> I hope that's meaningful to you, because I have no idea what those commits
> are doing or how they could be related to this bug... The commits sound
> related to character encoding, but bad.html is plain ASCII...

The commit obviously also affected documents that didn't need encoding
conversion. I didn't realize that. At least we know that the issue is isolated
to 2.9.8. Thanks for your efforts!
Nick
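
Since the issue is isolated to 2.9.8, an lxml user can check which libxml2 version is loaded at run time. A minimal sketch, assuming lxml is installed; `etree.LIBXML_VERSION` is lxml's run-time version tuple, while `AFFECTED_LIBXML2`, `is_affected` and `runtime_libxml2_version` are hypothetical names:

```python
AFFECTED_LIBXML2 = (2, 9, 8)  # the only release reported as affected here


def is_affected(version):
    """Return True if the given (major, minor, patch) tuple is 2.9.8."""
    return tuple(version[:3]) == AFFECTED_LIBXML2


def runtime_libxml2_version():
    """Return the libxml2 version lxml is running against, or None."""
    try:
        from lxml import etree
    except ImportError:
        return None  # lxml not available, nothing to check
    return etree.LIBXML_VERSION


if __name__ == "__main__":
    version = runtime_libxml2_version()
    if version is not None and is_affected(version):
        print("libxml2 2.9.8 detected: HTML script tags may not be closed")
```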