[epiphany/mcatanzaro/unbreak-reader-mode] reader-handler: unbreak reader mode and add CSP

From: Michael Catanzaro <mcatanzaro src gnome org>
To: commits-list gnome org
Cc:
Subject: [epiphany/mcatanzaro/unbreak-reader-mode] reader-handler: unbreak reader mode and add CSP
Date: Thu, 16 Dec 2021 18:43:50 +0000 (UTC)


commit ac78b919a9c8876733282c93b5403b388b69efe0
Author: Michael Catanzaro <mcatanzaro redhat com>
Date:   Thu Dec 16 12:31:51 2021 -0600

    reader-handler: unbreak reader mode and add CSP
    
    HTML-encoding the content passed to reader mode does not work because it
    contains HTML markup generated by Readability.js. Oops. I must have
    seriously screwed up when testing this yesterday, because there is no
    way this could ever have worked.
    
    Upstream recommends use of a DOM purifier, but in theory, if we
    completely block *all* script execution, we can avoid the need for
    that. So add a CSP recommended by Patrick.
    
    We'll sneak in a couple of bonus fixes: use ' rather than " to improve
    readability, and close the </body> tag that we opened rather than abuse
    the permissiveness of the parser.
    
    Fixes: #1612
    
    See also: #1661

 embed/ephy-reader-handler.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)
---
diff --git a/embed/ephy-reader-handler.c b/embed/ephy-reader-handler.c
index 0b6daa29f..304a1854d 100644
--- a/embed/ephy-reader-handler.c
+++ b/embed/ephy-reader-handler.c
@@ -160,7 +160,6 @@ readability_js_finish_cb (GObject      *object,
   g_autofree gchar *byline = NULL;
   g_autofree gchar *encoded_byline = NULL;
   g_autofree gchar *content = NULL;
-  g_autofree gchar *encoded_content = NULL;
   g_autofree gchar *encoded_title = NULL;
   g_autoptr (GString) html = NULL;
   g_autoptr (GBytes) style_css = NULL;
@@ -182,7 +181,6 @@ readability_js_finish_cb (GObject      *object,
   title = webkit_web_view_get_title (web_view);
 
   encoded_byline = byline ? ephy_encode_for_html_entity (byline) : g_strdup ("");
-  encoded_content = ephy_encode_for_html_entity (content);
   encoded_title = ephy_encode_for_html_entity (title);
 
   html = g_string_new (NULL);
@@ -203,7 +201,8 @@ readability_js_finish_cb (GObject      *object,
 
   g_string_append_printf (html, "<style>%s</style>"
                           "<title>%s</title>"
-                          "<meta http-equiv=\"Content-Type\" content=\"text/html;\" charset=\"UTF-8\">" \
+                          "<meta http-equiv='Content-Type' content='text/html;' charset='UTF-8'>" \
+                          "<meta http-equiv='Content-Security-Policy' content=\"script-src 'none'\">" \
                           "<body class='%s %s'>"
                           "<article>"
                           "<h2>"
@@ -219,8 +218,22 @@ readability_js_finish_cb (GObject      *object,
                           color_scheme,
                           encoded_title,
                           encoded_byline);
-  g_string_append (html, encoded_content);
+
+  /* We cannot encode the page content because it contains HTML tags inserted by
+   * Readability.js. Upstream recommends that we use an XSS sanitizer like
+   * DOMPurify plus Content-Security-Policy, but I'm not keen on adding more
+   * bundled JS dependencies, and we have an advantage over Firefox in that we
+   * don't need scripts to work at this point. So instead the above CSP
+   * completely blocks all scripts, which should hopefully obviate the need for
+   * a DOM purifier.
+   *
+   * Note the encoding for page title and byline is still required, as they're
+   * not supposed to contain markup, and Readability.js unescapes them before
+   * returning them to us.
+   */
+  g_string_append (html, content);
   g_string_append (html, "</article>");
+  g_string_append (html, "</body>");
 
   finish_uri_scheme_request (request, g_strdup (html->str), NULL);
 }
