Re: Inconsistent behaviour of BufferLines



On 1 April 2012 18:06, Piotr Piastucki <leech miranda gmail com> wrote:
>
> On 29/03/12 23:09, Kai Willadsen wrote:
>>
>> On 28 March 2012 03:25, Piotr Piastucki<leech miranda gmail com>  wrote:
>>>
>>> Hi,
>>>
>>> I have notices some inconsistencies in the way BufferLines class behaves.
>>> Please see the attached test case for an example of a buggy behaviour. I
>>> expect len(x), iter(x) and iter(x[0:len(x)] to return the same length,
>>> however, due to the current implementation of the __getitem__ method
>>> iter(x)
>>> returns 1 element more than expected.
>>> This bug affects comparison results when one of the files is empty for
>>> instance.
>>> There are 2 possible fixes (both marked with FIXME comment and commented
>>> out
>>> in the attached file):
>>> 1) change>  operator in __getitem__ to>=,>= makes more sense here anyway,
>>> but I guess>  is used for some reason
>>> 2) implement __iter__ method
>>
>> Implementing __iter__ looks like a hack; really __getitem__ should
>> just Do The Right Thing. I was all ready to make your change, when I
>> thought I'd do some history digging first. Turns out I added that
>> check to fix an iter case as well; at the time, I said:
>>
>>     __getitem__ should raise an IndexError if an item beyond the last
>> index
>>     is requested. Previously we did not do this, because we expect
>> requests
>>     beyond the last line; specifically requests for (last line + 1). This
>>     patch changes our behaviour to correctly raise an IndexError if the
>>     request is for anything past (last line + 1).
>>
>>
>> http://git.gnome.org/browse/meld/commit/?id=0e7cc5202895acbe5c23ca964e2962ea2d3cec2a
>>
>> So there apparently *is* a reason for that check being obviously
>> wrong. Problem is, I now can't recall whether the (last_line + 1)
>> requests actually occurred for a good reason. My best guess is that
>> it's something to do with gtk.TextBuffer's end-of-file line behaviour.
>>
>> The testcase you attached clearly shows the bug, but I'm wondering
>> what the impact on real comparisons is. I'm very wary of changing such
>> core behaviour now (I was hoping to release a 1.6 RC this weekend)
>> unless there are real issues in actual comparisons.
>
>
> I do not think it's a major issue, however, it would be nice to sort it out
> :)

Definitely. Just checking what the impact was. Given that no one has
complained about it yet, and it's existed for a *long* time now, I'm
going to leave fixing it until post-1.6. The change(s) can go in at
the start of the cycle and give people plenty of time to bash on them.

> Actually the same issue seems to affect __getslice__ too. Adding the
> following lines to the test case will reveal it:
>
>    print "len(bl[0:])=" , len(bl[0:])
>    print "len(bl[1:])=" , len(bl[1:])
>
> In both cases the result is 1 !

Right, but that's the documented... weirdness... in __getslice__. That
explicitly exists because of GTK+'s EOF behaviour which is why we have
get_iter_at_line_or_eof(). I've played with fixing this, but the
issues involved in matching up Python list expectations with GTK+'s
TextBuffer semantics and our chunk-based requirements gave me
nightmares, so I stopped. Unless it causes a user-visible bug, I'm not
convinced it's worth trying to address, but you are of course free to
try.

> The issues in BufferLines class can be worked around by changing the code in
>  __init__ in MyersSequenceMatcher to:
>        self.a = a[0:len(a)]
>        self.b = b[0:len(b)]

Is the a workaround for the slice behaviour or the getitem behaviour?
I think we should fix the latter; the former I'm not so sure about.

> I am attaching 2 files you can play with. The result of the initial
> comparison and the results of applying selected chunks are also shown here:
>
> http://piastucki.bdl.pl/meld/buffer_lines_bug1.png
> http://piastucki.bdl.pl/meld/buffer_lines_bug2.png
> http://piastucki.bdl.pl/meld/buffer_lines_bug3.png
>
> Moreover, I have noticed that for some reason an extra empty line is added
> at the end of the files (see a2.txt in the screenshots). I thought this
> feature had been removed long time ago.

We're not adding the empty line, that's GTK+. Meld used to have an
option that if the text file didn't end in an EOL character, we'd add
one, but that was removed. However, a2.txt ends in \n, we hand the
text directly off to GTK+, and the last line in a gtk.TextBuffer never
ends in an EOL. This means that if your last line contains an EOL,
it's not the last line. I'm sure there's a sensible way to handle this
(gedit does after all), but I'm too scarred by the many many small
issues Meld has had about changes and chunk actions at EOF to actually
try and fix it.

cheers,
Kai


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]