[meld] misc: Avoid string copies during filtering (bgo#768300)

From: Kai Willadsen <kaiw src gnome org>
To: commits-list gnome org
Cc:
Subject: [meld] misc: Avoid string copies during filtering (bgo#768300)
Date: Sat, 2 Jul 2016 00:37:18 +0000 (UTC)
commit 080a20c18c4e7970b690c4785663fe26333fc313
Author: Kai Willadsen <kai willadsen gmail com>
Date:   Sat Jul 2 10:20:56 2016 +1000

    misc: Avoid string copies during filtering (bgo#768300)
    
    When we switched over to doing better regex filtering and highlighting
    of ignored regions, we changed the way we were applying filters from a
    simple multiple-regex approach to a merged-span based approach. This is
    fine, except that this also changed the way we sliced the existing text
    to produce the filtered version.
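A minimal sketch of the merged-span idea. It assumes that every filter's regex matches are collected as (start, end) spans and that overlapping or touching spans are then merged into a sorted, disjoint list. The body of meld's merge_intervals is not shown in this patch, so collect_spans and merge_spans below are hypothetical helpers. Skipping empty matches and merging touching spans are also assumptions of this sketch.

```python
import re


def collect_spans(txt, patterns):
    """Return (start, end) spans for every non-empty match of every pattern."""
    spans = []
    for pattern in patterns:
        for match in re.finditer(pattern, txt):
            if match.end() > match.start():
                spans.append((match.start(), match.end()))
    return spans


def merge_spans(spans):
    """Merge overlapping or touching spans into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous span: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


txt = "x = 1  # note\ny = 2"
spans = merge_spans(collect_spans(txt, [r"#.*", r"\s+#"]))
print(spans)  # [(5, 13)]
```

Here the two filters overlap on the "#" character, so their matches collapse into a single span that can be cut or highlighted once.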
    
    Prior to this commit, we removed each filtered range by concatenating
    two slices of the whole text. This is extremely slow in Python because
    every cut allocates and copies a new string (among other overheads).
    With this patch, we use a more idiomatic approach: we collect all of
    the text sections that we want to keep and concatenate them in a
    single join operation at the end.
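A minimal sketch contrasting the two strategies. It assumes the ranges are sorted, non-overlapping (start, end) offsets, like the merged spans. The function names are hypothetical and only illustrate the pattern. They are not meld's API.

```python
def cut_by_concatenation(txt, ranges):
    # Old approach: every cut builds a brand-new string from two slices,
    # so each of the k cuts copies roughly the whole text (about O(k * n)).
    for start, end in reversed(ranges):
        txt = txt[:start] + txt[end:]
    return txt


def cut_by_join(txt, ranges):
    # New approach: collect the pieces to keep, then copy them once.
    pieces = []
    offset = 0
    for start, end in ranges:
        pieces.append(txt[offset:start])
        offset = end
    pieces.append(txt[offset:])
    return "".join(pieces)


text = "keep[drop]keep[drop]keep"
ranges = [(4, 10), (14, 20)]
assert cut_by_concatenation(text, ranges) == cut_by_join(text, ranges)
print(cut_by_join(text, ranges))  # keepkeepkeep
```

Both functions return the same string. The difference is how much copying happens on the way there, which matters when a large file produces many filtered ranges.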
    
    The test case in bgo#768300 was previously extremely slow (I gave up
    waiting), but with this change takes a few seconds.
    
    This commit also changes the role of the "cutter" callback (now
    "apply_fn"). It is only used for side effects such as highlighting, and
    is no longer expected to return modified text. Text modification is
    carried out by apply_text_filters itself, since it can do so much more
    efficiently.
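A minimal sketch of the new division of labour, assuming a simplified filter function. filter_text is a hypothetical stand-in for apply_text_filters, and the list-appending lambda stands in for the GTK highlighter in filediff.py. The callback only observes intervals, while the function itself rebuilds the text.

```python
import re


def filter_text(txt, patterns, apply_fn=None):
    """Cut regex matches out of txt and report each cut interval to apply_fn.

    Hypothetical, simplified stand-in for meld's apply_text_filters.
    """
    # Gather non-empty match spans and merge overlapping/touching ones.
    spans = sorted(
        m.span()
        for pattern in patterns
        for m in re.finditer(pattern, txt)
        if m.end() > m.start()
    )
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    # The callback only observes intervals, given as offsets into the
    # ORIGINAL txt; it neither receives nor returns text.
    if apply_fn:
        for start, end in merged:
            apply_fn(start, end)

    # The text itself is rebuilt here, with a single join.
    pieces, offset = [], 0
    for start, end in merged:
        pieces.append(txt[offset:start])
        offset = end
    pieces.append(txt[offset:])
    return "".join(pieces)


# Stands in for highlighter(), which tags the original buffer range as dimmed.
dimmed = []
result = filter_text(
    "a = 1  # comment",
    [r"\s*#.*"],
    apply_fn=lambda start, end: dimmed.append((start, end)),
)
print(result)  # a = 1
print(dimmed)  # [(5, 16)]
```

Because the callback receives offsets into the unfiltered text, a caller like highlighter can map them directly onto buffer iterators. It does not need to track how earlier cuts shifted positions.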

 meld/filediff.py |    7 ++-----
 meld/misc.py     |   22 ++++++++++++++--------
 2 files changed, 16 insertions(+), 13 deletions(-)
---
diff --git a/meld/filediff.py b/meld/filediff.py
index 3d7175d..9144127 100644
--- a/meld/filediff.py
+++ b/meld/filediff.py
@@ -768,19 +768,16 @@ class FileDiff(melddoc.MeldDoc, gnomeglade.Component):
         dimmed_tag = buf.get_tag_table().lookup("dimmed")
         buf.remove_tag(dimmed_tag, txt_start_iter, txt_end_iter)
 
-        def cutter(txt, start, end):
-            assert txt[start:end].count("\n") == 0
-            txt = txt[:start] + txt[end:]
+        def highlighter(start, end):
             start_iter = txt_start_iter.copy()
             start_iter.forward_chars(start)
             end_iter = txt_start_iter.copy()
             end_iter.forward_chars(end)
             buf.apply_tag(dimmed_tag, start_iter, end_iter)
-            return txt
 
         try:
             regexes = [f.filter for f in self.text_filters if f.active]
-            txt = misc.apply_text_filters(txt, regexes, cutter)
+            txt = misc.apply_text_filters(txt, regexes, apply_fn=highlighter)
         except AssertionError:
             if not self.warned_bad_comparison:
                 misc.error_dialog(
diff --git a/meld/misc.py b/meld/misc.py
index 60b5eef..aea2b3e 100644
--- a/meld/misc.py
+++ b/meld/misc.py
@@ -503,15 +503,13 @@ def merge_intervals(interval_list):
     return merged_intervals
 
 
-def apply_text_filters(txt, regexes, cutter=lambda txt, start, end:
-                       txt[:start] + txt[end:]):
+def apply_text_filters(txt, regexes, apply_fn=None):
     """Apply text filters
 
     Text filters "regexes", resolved as regular expressions are applied
     to "txt".
 
-    "cutter" defines the way how to apply them. Default is to just cut
-    out the matches.
+    "apply_fn" is a callable run for each filtered interval
     """
     filter_ranges = []
     for r in regexes:
@@ -533,7 +531,15 @@ def apply_text_filters(txt, regexes, cutter=lambda txt, start, end:
 
     filter_ranges = merge_intervals(filter_ranges)
 
-    for (start, end) in reversed(filter_ranges):
-        txt = cutter(txt, start, end)
-
-    return txt
+    if apply_fn:
+        for (start, end) in reversed(filter_ranges):
+            apply_fn(start, end)
+
+    offset = 0
+    result_txts = []
+    for (start, end) in filter_ranges:
+        assert txt[start:end].count("\n") == 0
+        result_txts.append(txt[offset:start])
+        offset = end
+    result_txts.append(txt[offset:])
+    return "".join(result_txts)
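The join loop in the new apply_text_filters also asserts that no removed range contains a newline. filediff.py catches the resulting AssertionError and shows an error dialog. Presumably this keeps the filtered text's line count equal to the original's, although the patch does not state the reason. Below is a minimal sketch of that loop in isolation. The name join_kept_text and the sample text are hypothetical.

```python
def join_kept_text(txt, ranges):
    """Rebuild txt without the given sorted, disjoint (start, end) ranges.

    Mirrors the join loop in the patch: a removed range may not contain a
    newline, otherwise AssertionError is raised.
    """
    offset = 0
    pieces = []
    for start, end in ranges:
        assert txt[start:end].count("\n") == 0
        pieces.append(txt[offset:start])
        offset = end
    pieces.append(txt[offset:])
    return "".join(pieces)


src = "a = 1  # one\nb = 2  # two\n"
print(repr(join_kept_text(src, [(5, 12), (18, 25)])))  # 'a = 1\nb = 2\n'

try:
    join_kept_text(src, [(5, 20)])  # this range spans the first "\n"
except AssertionError:
    print("filter matched across a line break")
```

Because the check is a plain assert statement, it is skipped entirely when Python runs with the -O flag.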