Hi,I have noticed that raw output of Myers diff algorithm may be a bit confusing in some cases, most notably when highlighting inline changes. The issue may be easily reproduced and I am attaching a simple test case (see e.g. first g in gnomeglade) and a screenshot showing the results. The included patch tries to address the issue by adding a post-processing clean-up step and reducing the number of small matching chunks.
Comments and testing welcome. Cheers, Piotr
>From d6feec0227b3b3ae67c23aba6650edd26a64b0f7 Mon Sep 17 00:00:00 2001 From: Piotr Piastucki <the_leech users berlios de> Date: Fri, 28 Jan 2011 10:57:05 +0100 Subject: [PATCH] Add post-processing cleanup algoritm to MyersSequenceMatcher While there may exist several optimal edit scripts some of them are more suitable for human use. The goal of a post-processing cleanup algorithm is to provide more meaningful results by reducing 'chaff' in diff algorithm output. This commit adds a simple cleanup algorithm that combines small commonalities together in order to yield more human-friendly yet optimal results. --- meld/matchers.py | 27 +++++++++++++++++++++++++++ 1 files changed, 27 insertions(+), 0 deletions(-) diff --git a/meld/matchers.py b/meld/matchers.py index 5393de0..9540b90 100644 --- a/meld/matchers.py +++ b/meld/matchers.py @@ -133,6 +133,32 @@ class MyersSequenceMatcher(difflib.SequenceMatcher): b = b2 return (a, b) + def postprocess(self): + mb = [] + i = len(self.matching_blocks) - 2 + mb.append(self.matching_blocks[i + 1]) + while i >= 0: + cur_block = self.matching_blocks[i] + cur_a = cur_block[0] + cur_b = cur_block[1] + cur_len = cur_block[2] + i -= 1 + while i >= 0: + prev_block = self.matching_blocks[i] + prev_a = prev_block[0] + prev_b = prev_block[1] + prev_len = prev_block[2] + if prev_b + prev_len == cur_b or prev_a + prev_len == cur_a: + if self.a[cur_a - prev_len : cur_a] == self.b[cur_b - prev_len : cur_b]: + cur_b -= prev_len + cur_a -= prev_len + cur_len += prev_len + i -= 1 + continue + break + mb.insert(0, (cur_a,cur_b,cur_len)) + self.matching_blocks = mb + def build_matching_blocks(self, lastsnake, snakes): """ Build list of matching blocks based on snakes taking into consideration all preprocessing @@ -263,4 +289,5 @@ class MyersSequenceMatcher(difflib.SequenceMatcher): lastsnake = node break self.build_matching_blocks(lastsnake, snakes) + self.postprocess() yield 1 -- 1.7.1
Attachment:
diff_chaff.tar.gz
Description: GNU Zip compressed data