[xml] HTML push parser in recovery mode



Hi all,

The HTML push parser in recovery mode misses a closing </script> tag
when the tag straddles a chunk border. That is, only part of the closing
tag is in the pushed chunk, e.g. the chunk ends with "</scri", and the
rest of the tag arrives in the next chunk. When this happens, the push
parser gets lost: it emits no events for any subsequent tags, and it
(incorrectly) appends extra closing tags </body> and </html> after the
end of the document has been reached.
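
For anyone who wants to hit the case on purpose instead of waiting for it,
a small helper along these lines picks the split point. This is only a
sketch and not the reproducer attached to the bug report: the name
split_inside_script_end is made up for illustration, and it only looks
for a lowercase "</script>".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libxml2): return the offset at which
 * to cut `doc` so that the first chunk ends with the first `keep` bytes
 * of the first "</script>" in the document. With keep == 6 the first
 * chunk ends with "</scri" and the second chunk starts with "pt>".
 * Returns 0 if the document contains no "</script>".
 */
size_t split_inside_script_end(const char *doc, size_t keep)
{
    static const char tag[] = "</script>";
    const char *hit;

    assert(doc != NULL);
    /* The cut has to fall strictly inside the tag. */
    assert(keep > 0 && keep < sizeof(tag) - 1);

    hit = strstr(doc, tag);
    if (hit == NULL)
        return 0;

    return (size_t)(hit - doc) + keep;
}
```

With cut = split_inside_script_end(doc, 6), the bytes doc[0..cut) and
doc[cut..) can then be passed as two separate chunks to htmlParseChunk()
on a context created with htmlCreatePushParserCtxt() and switched to
recovery mode with HTML_PARSE_RECOVER.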

I filed a bug report for this, but it has received no comments so far and
may have been overlooked. The report includes C source code that
reproduces the bug; see
https://bugzilla.gnome.org/show_bug.cgi?id=706952

The bug affects mod_proxy_html for Apache 2, which relies entirely on
recovery mode. That is where I ran into the problem. Occasionally, a
document that passed through the filter was only partly filtered: most of
it was left untouched, and there were extra tags at the end. Because the
bug only shows up when a </script> happens to fall right on a chunk
border, it is rarely encountered in practice, and it is even harder to
catch and trace back to libxml2.

Jani


