Re: Meld roadmap proposals



On 12 July 2010 08:51, Piotr Piastucki <leech miranda gmail com> wrote:
> Hello
> On Sun, Jul 11, 2010 at 12:10 AM, Kai <kai willadsen gmail com> wrote:
>>
>> Ages ago when you first submitted the Myers diff implementation, you
>> said that it was slower than the difflib implementation; do you have
>> any idea how much slower? I'd be interested in finding out whether
>> switching to Myers wholesale would be a reasonable thing to do.
>
>
>>
>> Also, there is an implementation of Myers diff in Review Board (in
>> reviewboard/diffviewer/myersdiff.py). It may be worthwhile looking at
>> that code (and other related bits) to see whether there is anything we
>> can share/appropriate for use in Meld.
>
> Ages ago it was indeed and things have changed :) I am attaching a small
> patch that boosts the performance of Myers matcher significantly. Once
> applied, MyersSequenceMatcher will become comparable with other existing
> sequence matchers.

The patch is certainly a big improvement speed-wise. I think we can
make the filtering code quite a bit faster still, and can probably
remove some of the checks. For example, is it actually worth our while
to only use the cut-down files conditionally? It seems like we're
trading best-case for average-case behaviour there.
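
To make the question concrete, here is a rough sketch of unconditional
filtering. It assumes "filtering" means dropping lines that occur in
only one of the two files before the matcher runs; the name
filter_common and the index lists are made up for illustration and are
not the code in the patch.

```python
def filter_common(a, b):
    """Drop lines that cannot match because they occur in only one input.

    Returns the cut-down sequences plus, for each, the original indices
    of the kept lines so that matches can be mapped back afterwards.
    (Hypothetical helper, not Meld's implementation.)
    """
    set_a, set_b = set(a), set(b)
    keep_a = [i for i, line in enumerate(a) if line in set_b]
    keep_b = [j for j, line in enumerate(b) if line in set_a]
    return [a[i] for i in keep_a], keep_a, [b[j] for j in keep_b], keep_b


a = ["x", "common", "y", "shared"]
b = ["common", "z", "shared", "w"]
fa, ia, fb, ib = filter_common(a, b)
```

Done this way, the filter always costs two set builds and two passes
over the input, so skipping it conditionally only saves time when very
few lines would have been removed anyway.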

> I performed a couple of tests to check the impact of this patch and the
> results (time taken to complete the test) are attached below.
> I also included Myers diff from Review Board, but it looks like their
> implementation cannot offer anything when it comes to performance (the
> result of rewriting GNU diff in Python with no Python-specific
> optimizations). There is one post-processing trick (shift inserts/deletions
> of identical lines) that may be worth adding to meld though. However, more
> testing is definitely needed.

Would you be able to attach or make available the files you've used
for testing? It would be nice to have a sensible sample set, and I'd
also like to do some profiling on it myself.
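
For reference, the kind of profiling run I have in mind looks roughly
like the sketch below. It uses only the standard library (cProfile,
pstats, difflib). The generated input is a stand-in for real test
files, and make_lines and run_matcher are illustrative names, not
anything in Meld.

```python
import cProfile
import difflib
import io
import pstats


def make_lines(n, changed_every):
    """Build two synthetic line lists that differ on every Nth line."""
    a = ["line %d" % i for i in range(n)]
    b = ["line %d%s" % (i, " changed" if i % changed_every == 0 else "")
         for i in range(n)]
    return a, b


def run_matcher(a, b):
    """Run the stock difflib matcher; swap in another matcher to compare."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return matcher.get_opcodes()


a, b = make_lines(2000, 50)

profiler = cProfile.Profile()
profiler.enable()
opcodes = run_matcher(a, b)
profiler.disable()

# Collect the five most expensive entries by cumulative time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
print(report)
```

Replacing the body of run_matcher with a different matcher class should
give comparable numbers, provided the same input is used.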

> I will be off for 2 weeks, but when I am back I will try to test a couple of
> scenarios related to inline changes highlighting as they are slightly
> different due to very low number of unique lines (each line = 1 character).

I seem to recall seeing a comment from the Review Board developers
saying that they stayed with Python's difflib implementation for
inline matching because it tended to produce more natural results.
Having said that, there are several examples of poor highlighting in
Meld's bugzilla, so it's not like it couldn't be better.
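
For anyone wanting to experiment, inline matching with Python's difflib
can be sketched as below: each character is treated as one element, and
the non-equal opcodes are the spans to highlight. inline_changes is an
illustrative helper, not Meld's highlighting code, and autojunk=False
is my assumption about sensible settings for short strings.

```python
import difflib


def inline_changes(old, new):
    """Return (tag, old_text, new_text) for each differing span."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [(tag, old[i1:i2], new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


changes = inline_changes("foo(bar)", "foo(baz)")
print(changes)
```

Running the same pairs of lines through a Myers-based matcher and
comparing the resulting spans would be one way to check the "more
natural results" claim.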

Thanks for continuing to look at this stuff.

cheers,
Kai

