Re: Inconsistent behaviour of BufferLines




On 29/03/12 23:09, Kai Willadsen wrote:
On 28 March 2012 03:25, Piotr Piastucki<leech miranda gmail com>  wrote:
Hi,

I have notices some inconsistencies in the way BufferLines class behaves.
Please see the attached test case for an example of a buggy behaviour. I
expect len(x), iter(x) and iter(x[0:len(x)] to return the same length,
however, due to the current implementation of the __getitem__ method iter(x)
returns 1 element more than expected.
This bug affects comparison results when one of the files is empty for
instance.
There are 2 possible fixes (both marked with FIXME comment and commented out
in the attached file):
1) change>  operator in __getitem__ to>=,>= makes more sense here anyway,
but I guess>  is used for some reason
2) implement __iter__ method
Implementing __iter__ looks like a hack; really __getitem__ should
just Do The Right Thing. I was all ready to make your change, when I
thought I'd do some history digging first. Turns out I added that
check to fix an iter case as well; at the time, I said:

     __getitem__ should raise an IndexError if an item beyond the last index
     is requested. Previously we did not do this, because we expect requests
     beyond the last line; specifically requests for (last line + 1). This
     patch changes our behaviour to correctly raise an IndexError if the
     request is for anything past (last line + 1).

     http://git.gnome.org/browse/meld/commit/?id=0e7cc5202895acbe5c23ca964e2962ea2d3cec2a

So there apparently *is* a reason for that check being obviously
wrong. Problem is, I now can't recall whether the (last_line + 1)
requests actually occurred for a good reason. My best guess is that
it's something to do with gtk.TextBuffer's end-of-file line behaviour.

The testcase you attached clearly shows the bug, but I'm wondering
what the impact on real comparisons is. I'm very wary of changing such
core behaviour now (I was hoping to release a 1.6 RC this weekend)
unless there are real issues in actual comparisons.

I do not think it's a major issue, however, it would be nice to sort it out :) Actually the same issue seems to affect __getslice__ too. Adding the following lines to the test case will reveal it:

    print "len(bl[0:])=" , len(bl[0:])
    print "len(bl[1:])=" , len(bl[1:])

In both cases the result is 1 !

The issues in BufferLines class can be worked around by changing the code in __init__ in MyersSequenceMatcher to:
        self.a = a[0:len(a)]
        self.b = b[0:len(b)]

I am attaching 2 files you can play with. The result of the initial comparison and the results of applying selected chunks are also shown here:

http://piastucki.bdl.pl/meld/buffer_lines_bug1.png
http://piastucki.bdl.pl/meld/buffer_lines_bug2.png
http://piastucki.bdl.pl/meld/buffer_lines_bug3.png

Moreover, I have noticed that for some reason an extra empty line is added at the end of the files (see a2.txt in the screenshots). I thought this feature had been removed long time ago.

Regards,
Piotr

line1

line2

line3


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]