[meld] matchers: Copy the passed-in text for mutability and speed



commit c57c36f62b05d8fa75a75dd2d76ccc2443a6579f
Author: Kai Willadsen <kai willadsen gmail com>
Date:   Mon Sep 26 06:56:28 2016 +1000

    matchers: Copy the passed-in text for mutability and speed
    
    The mutability argument here is pretty clear: we should take a copy of
    the sequences, because we can't guarantee that they're not going to
    change while we're running our comparison. I'm pretty sure that our
    yield points actually happen to guarantee this anyway, but I'd much
    prefer being explicit here.
    
    The speed argument is much weirder and more annoying. What this differ
    almost always gets passed is a pair of MeldBufferLines instances, which
    expose a Python-list-like interface over the lines of a GtkTextBuffer.
    What this means in practice is that doing things like iterating over
    MeldBufferLines results in half a dozen GTK+ API calls to e.g., get
    the text iterator for a visual line, get the start and end of the line,
    get the text from that line, clean it up... it's a nightmare and it's
    super, super slow.
    
    Doing the whole-buffer copy here does all of this, but only once.
    Obviously we pay the memory penalty of copying the whole file, but
    given the performance improvements I'm willing to take this as a peak
    usage cost.

 meld/matchers.py |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)
---
diff --git a/meld/matchers.py b/meld/matchers.py
index af51fa9..160db12 100644
--- a/meld/matchers.py
+++ b/meld/matchers.py
@@ -82,8 +82,11 @@ class MyersSequenceMatcher(difflib.SequenceMatcher):
     def __init__(self, isjunk=None, a="", b=""):
         if isjunk is not None:
             raise NotImplementedError('isjunk is not supported yet')
-        self.a = a
-        self.b = b
+        # The sequences we're comparing must be considered immutable;
+        # calling e.g., GtkTextBuffer methods to retrieve these line-by-line
+        # isn't really a thing we can or should do.
+        self.a = a[:]
+        self.b = b[:]
         self.matching_blocks = self.opcodes = None
         self.aindex = []
         self.bindex = []
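
To make the two arguments in the commit message concrete, here is a minimal sketch. `CountingLines` is a hypothetical stand-in for a list-like view over a text buffer; it is not Meld's `MeldBufferLines`, and the `fetches` counter only simulates the cost of the underlying toolkit calls. The sketch also assumes that slicing the view returns a plain Python list, which is what makes `a[:]` behave as a one-time copy.

```python
class CountingLines:
    """Hypothetical list-like view over newline-separated text.

    Each access bumps `fetches`, standing in for the expensive
    buffer lookups described in the commit message.
    """

    def __init__(self, text):
        self._text = text
        self.fetches = 0

    def set_text(self, text):
        # Simulates the underlying buffer being edited.
        self._text = text

    def __len__(self):
        return self._text.count("\n") + 1

    def __getitem__(self, key):
        # Both index and slice access hit the backing store once.
        self.fetches += 1
        # Assumption: a slice yields a plain list, detached from the view.
        return self._text.split("\n")[key]


view = CountingLines("a\nb\nc")

# Reading through the view: every line access pays the lookup cost,
# and a differ typically visits lines more than once.
for _ in range(3):
    for i in range(len(view)):
        view[i]
per_line_fetches = view.fetches

# Reading through a copy: one slice up front, then plain list access.
view.fetches = 0
snapshot = view[:]
for _ in range(3):
    for line in snapshot:
        pass
copy_fetches = view.fetches

# The copy is also unaffected by later edits to the buffer.
view.set_text("x\ny")
```

In this sketch the three passes through the view cost nine simulated lookups, while the copy costs one, and `snapshot` still holds the original three lines after the view's text is replaced. The trade-off is the one the commit message names: the copy holds every line in memory for as long as the matcher lives.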

