Re: One last bizarre meld example



Hi Jeff,

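It looks like Python's difflib module is the culprit here. It has a
heuristic to speed up degenerate cases: once a sequence has 200 or more
items, any item that makes up more than 1% of it is treated as "popular"
and is no longer used to anchor matches. This can lead to unintuitive
diffs like the one you are seeing.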

You can disable the heuristic by copying the standard module
difflib.py into the meld directory and making the following change at
line 316. This will slow meld down for some comparisons (100x for some
generated files):
-                if n >= 200 and len(indices) * 100 > n:
+                if 0 and n >= 200 and len(indices) * 100 > n:
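
For anyone reading this with a newer Python: later releases (2.7.1 and
3.2 onwards) added an autojunk argument to SequenceMatcher that switches
the same heuristic off without patching difflib.py. That argument is not
in the difflib version discussed above, so treat this as a rough sketch
of the effect rather than what meld does. The two strings are made up;
they just stand in for a long line that differs by one comma.

```python
from difflib import SequenceMatcher

# Synthetic "long line": 300 characters built from only two symbols, so each
# symbol is far more frequent than 1% of the sequence and counts as "popular".
old = "ab" * 150
new = "ab" * 75 + "," + "ab" * 75   # same text with a single comma inserted


def matched_chars(a, b, autojunk):
    """Total number of characters difflib reports as matching."""
    matcher = SequenceMatcher(None, a, b, autojunk=autojunk)
    return sum(block.size for block in matcher.get_matching_blocks())


with_heuristic = matched_chars(old, new, autojunk=True)
without_heuristic = matched_chars(old, new, autojunk=False)

# With the heuristic off, the text on both sides of the comma is matched.
print(with_heuristic, without_heuristic)
```

With the heuristic on, difflib should report fewer matching characters,
which is the "identical after the comma but not shown as such" symptom.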

I'm reasonably familiar with string matching algorithms, and I think
difflib could be modified without having to reinvent the wheel. For
instance, it usually makes sense to strip common prefixes and suffixes
from in-line differences before matching, never mind the warning on
line 393.
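
A minimal sketch of the prefix/suffix idea, assuming the in-line
comparison works on two plain strings. The function name and the sample
strings are hypothetical, and this is not code from meld or difflib:

```python
def common_affix_lengths(a, b):
    """Return (prefix_len, suffix_len) of the text shared by a and b.

    The suffix is never allowed to overlap the prefix.
    """
    limit = min(len(a), len(b))

    prefix = 0
    while prefix < limit and a[prefix] == b[prefix]:
        prefix += 1

    suffix = 0
    while suffix < limit - prefix and a[-1 - suffix] == b[-1 - suffix]:
        suffix += 1

    return prefix, suffix


old = "the quick brown fox"
new = "the quick, brown fox"

prefix, suffix = common_affix_lengths(old, new)

# Only the middle parts still need a real diff.
old_middle = old[prefix:len(old) - suffix]
new_middle = new[prefix:len(new) - suffix]
print(repr(old_middle), repr(new_middle))
```

Only the two middle pieces would then be handed to the matcher, so a
one-character change in a very long line leaves almost nothing to
compare.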

Stephen.

On 8/23/06, Jeff Smith <jeff smithicus com> wrote:
My text files are formatted with carriage returns at the end of paragraphs.
This means that a single line of text could have several hundred words. Most
other diff tools tell me that Old Paragraph is different from New Paragraph
and stop right there - they don't tell me where the two lines differ. This
is most vexing when the difference is a single space or a single
punctuation mark. The thing I love about meld is that it highlights the
actual words that are different, and doesn't just tell me that my 235 words
of text differ at some unspecified point in the line.

Of course, in my context, there are still some cases where the word-level
highlighting of meld is not as robust as it could be, but it works
surprisingly well - especially if its design didn't give much thought to
this particular context of use.

I'm including two screen captures to show you what I mean. In meld1.png, the
two highlighted paragraphs differ by a single comma. Meld points it out to
me, but fails to show me that the two lines are identical after the comma.
meld2.png, on the other hand, DOES show me that the lines are identical
after the single word of difference is accounted for. I realize that I'm
likely a complete outlier in your user community, but I'd love to see an
even more robust line differentiation algorithm that was better able to show
such in-line differences.


