[epiphany/mcatanzaro/unbreak-reader-mode] reader-handler: unbreak reader mode and add CSP
- From: Marge Bot <marge-bot src gnome org>
- To: commits-list gnome org
- Cc:
- Subject: [epiphany/mcatanzaro/unbreak-reader-mode] reader-handler: unbreak reader mode and add CSP
- Date: Thu, 16 Dec 2021 18:44:58 +0000 (UTC)
commit 7f8580dae1815b52efdd454256a59fd4f46e152f
Author: Michael Catanzaro <mcatanzaro redhat com>
Date: Thu Dec 16 12:31:51 2021 -0600
reader-handler: unbreak reader mode and add CSP
HTML-encoding the content passed to reader mode does not work because it
contains HTML markup generated by Readability.js. Oops. I must have
seriously screwed up when testing this yesterday, because there is no
way this could ever have worked.
Upstream recommends use of a DOM purifier, but in theory, if we
completely block *all* script execution, we can avoid the need for
that. So add a CSP recommended by Patrick.
We'll sneak in a couple bonus fixes: use ' rather than " to improve
readability, and close the </body> tag that we opened rather than abuse
the permissiveness of the parser.
Fixes: #1612
See also: #1661
Part-of: <https://gitlab.gnome.org/GNOME/epiphany/-/merge_requests/1047>
embed/ephy-reader-handler.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
---
diff --git a/embed/ephy-reader-handler.c b/embed/ephy-reader-handler.c
index 0b6daa29f..304a1854d 100644
--- a/embed/ephy-reader-handler.c
+++ b/embed/ephy-reader-handler.c
@@ -160,7 +160,6 @@ readability_js_finish_cb (GObject *object,
g_autofree gchar *byline = NULL;
g_autofree gchar *encoded_byline = NULL;
g_autofree gchar *content = NULL;
- g_autofree gchar *encoded_content = NULL;
g_autofree gchar *encoded_title = NULL;
g_autoptr (GString) html = NULL;
g_autoptr (GBytes) style_css = NULL;
@@ -182,7 +181,6 @@ readability_js_finish_cb (GObject *object,
title = webkit_web_view_get_title (web_view);
encoded_byline = byline ? ephy_encode_for_html_entity (byline) : g_strdup ("");
- encoded_content = ephy_encode_for_html_entity (content);
encoded_title = ephy_encode_for_html_entity (title);
html = g_string_new (NULL);
@@ -203,7 +201,8 @@ readability_js_finish_cb (GObject *object,
g_string_append_printf (html, "<style>%s</style>"
"<title>%s</title>"
- "<meta http-equiv=\"Content-Type\" content=\"text/html;\" charset=\"UTF-8\">" \
+ "<meta http-equiv='Content-Type' content='text/html;' charset='UTF-8'>" \
+ "<meta http-equiv='Content-Security-Policy' content=\"script-src 'none'\">" \
"<body class='%s %s'>"
"<article>"
"<h2>"
@@ -219,8 +218,22 @@ readability_js_finish_cb (GObject *object,
color_scheme,
encoded_title,
encoded_byline);
- g_string_append (html, encoded_content);
+
+ /* We cannot encode the page content because it contains HTML tags inserted by
+ * Readability.js. Upstream recommends that we use an XSS sanitizer like
+ * DOMPurify plus Content-Security-Policy, but I'm not keen on adding more
+ * bundled JS dependencies, and we have an advantage over Firefox in that we
+ * don't need scripts to work at this point. So instead the above CSP
+ * completely blocks all scripts, which should hopefully obviate the need for
+ * a DOM purifier.
+ *
+ * Note the encoding for page title and byline is still required, as they're
+ * not supposed to contain markup, and Readability.js unescapes them before
+ * returning them to us.
+ */
+ g_string_append (html, content);
g_string_append (html, "</article>");
+ g_string_append (html, "</body>");
finish_uri_scheme_request (request, g_strdup (html->str), NULL);
}
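
The split the patch makes between encoded fields and raw content can be sketched outside of Epiphany. The following is a minimal illustration in plain C, not the project's code: `sketch_encode_for_html` is a hypothetical stand-in for `ephy_encode_for_html_entity` (whose exact escaping rules are not shown in this diff), and `sketch_build_reader_page` and `SKETCH_PAGE_FORMAT` are invented names. It assumes the title is plain text that must be escaped, while the article body is already markup and is appended untouched, relying on the `script-src 'none'` policy to block script execution.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical page skeleton: CSP first, then an escaped title, then raw content. */
#define SKETCH_PAGE_FORMAT \
  "<meta http-equiv='Content-Security-Policy' content=\"script-src 'none'\">" \
  "<title>%s</title>" \
  "<body><article>%s</article></body>"

/* Hypothetical stand-in for ephy_encode_for_html_entity(): escapes the
 * characters that are significant in HTML text and attribute values.
 * Returns a newly allocated string that the caller must free(). */
static char *
sketch_encode_for_html (const char *input)
{
  /* Worst case: every byte expands to "&quot;", which is 6 bytes. */
  char *out = malloc (strlen (input) * 6 + 1);
  char *p = out;

  if (out == NULL)
    return NULL;

  for (const char *s = input; *s != '\0'; s++) {
    const char *rep = NULL;

    switch (*s) {
      case '&':  rep = "&amp;";  break;
      case '<':  rep = "&lt;";   break;
      case '>':  rep = "&gt;";   break;
      case '"':  rep = "&quot;"; break;
      case '\'': rep = "&#39;";  break;
      default:   break;
    }

    if (rep != NULL) {
      size_t n = strlen (rep);
      memcpy (p, rep, n);
      p += n;
    } else {
      *p++ = *s;
    }
  }

  *p = '\0';
  return out;
}

/* Builds the page: the title is escaped, the content is inserted as-is
 * because it is already HTML. Returns a newly allocated string or NULL. */
static char *
sketch_build_reader_page (const char *title, const char *content)
{
  char *encoded_title = sketch_encode_for_html (title);
  char *page = NULL;
  int needed;

  if (encoded_title == NULL)
    return NULL;

  /* First pass measures the output, second pass writes it. */
  needed = snprintf (NULL, 0, SKETCH_PAGE_FORMAT, encoded_title, content);
  if (needed >= 0) {
    page = malloc ((size_t) needed + 1);
    if (page != NULL)
      snprintf (page, (size_t) needed + 1, SKETCH_PAGE_FORMAT, encoded_title, content);
  }

  free (encoded_title);
  return page;
}
```

Calling `sketch_build_reader_page ("<script>", "<p>Hi</p>")` would yield a page whose title is rendered as literal text while the paragraph markup survives. Encoding the content as well, as the earlier code did, would turn that paragraph into visible `&lt;p&gt;` text, which is the breakage this commit fixes.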