[libxml2] Make htmlCurrentChar always translate U+0000
- From: Nick Wellnhofer <nwellnhof src gnome org>
- To: commits-list gnome org
- Cc:
- Subject: [libxml2] Make htmlCurrentChar always translate U+0000
- Date: Wed, 15 Jul 2020 14:42:04 +0000 (UTC)
commit e050062ca9f921c6da8e62ff7a42925ba8f08268
Author: Nick Wellnhofer <wellnhofer aevum de>
Date: Wed Jul 15 14:38:55 2020 +0200
Make htmlCurrentChar always translate U+0000
The general assumption is that htmlCurrentChar only returns 0 if the
end of the input buffer is reached. The UTF-8 path already logged an
error if a zero byte U+0000 was found and returned a space character
instead. Make the ASCII code path do the same.
htmlParseTryOrFinish skips zero bytes at the beginning of a buffer, so
even if 0 was returned from htmlCurrentChar, the push parser would make
progress. But rescanning the input could cause performance problems.
The pull parser would abort parsing and now handles zero bytes in ASCII
mode the same way as the push parser or as in UTF-8 mode.
It would be better to return the replacement character U+FFFD instead,
but some of the client code assumes that the UTF-8 length of input and
output matches.
HTMLparser.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
---
diff --git a/HTMLparser.c b/HTMLparser.c
index ec88eed0..06d8c602 100644
--- a/HTMLparser.c
+++ b/HTMLparser.c
@@ -436,6 +436,12 @@ htmlCurrentChar(xmlParserCtxtPtr ctxt, int *len) {
*/
if ((int) *ctxt->input->cur < 0x80) {
*len = 1;
+ if ((*ctxt->input->cur == 0) &&
+ (ctxt->input->cur < ctxt->input->end)) {
+ htmlParseErrInt(ctxt, XML_ERR_INVALID_CHAR,
+ "Char 0x%X out of allowed range\n", 0);
+ return(' ');
+ }
return((int) *ctxt->input->cur);
}
@@ -5437,6 +5443,12 @@ htmlParseTryOrFinish(htmlParserCtxtPtr ctxt, int terminate) {
}
if (avail < 1)
goto done;
+ /*
+ * This is done to make progress and avoid an infinite loop
+ * if a parsing attempt was aborted by hitting a NUL byte. After
+ * changing htmlCurrentChar, this probably isn't necessary anymore.
+ * We should consider removing this check.
+ */
cur = in->cur[0];
if (cur == 0) {
SKIP(1);
[
Date Prev][
Date Next] [
Thread Prev][
Thread Next]
[
Thread Index]
[
Date Index]
[
Author Index]