Re: Profiling Meld?




On 12 May 2015 4:20 am, "Magnus Ihse Bursie" <magnus ihse net> wrote:
>
> Hi,
>
> When viewing large (> 1 MB) files with Meld, it takes a noticeable amount of time for the diff to show up. (My current example is a 2 MB file which, on my machine, takes ~8 seconds.)
>
> I have always naively assumed that it was the underlying diff algorithm that was slow, perhaps due to (my perceived) slowness of Python. This turned out to be completely incorrect. When I created a unit test to just read the files into arrays and feed them directly to MyersSequenceMatcher, it was so blazingly fast that I couldn't even get a reliable measurement of it.
>
> So it's something else that causes the delays.
>
> I tried running 'python -m cProfile -s time bin/meld', but this only gives the following unhelpful result:
>
>          457930 function calls (454080 primitive calls) in 9.639 seconds
>
>    Ordered by: internal time
>
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>         1    7.278    7.278    9.639    9.639 meld:19(<module>)
>       476    0.227    0.000    0.348    0.001 diffgrid.py:232(do_draw)
>       976    0.209    0.000    0.210    0.000 Gtk.py:676(insert)
>      6647    0.174    0.000    0.188    0.000 meldbuffer.py:57(do_apply_tag)
>        42    0.143    0.003    0.293    0.007 meldbuffer.py:212(__getitem__)
>         3    0.139    0.046    0.149    0.050 gnomeglade.py:39(__init__)
>        80    0.119    0.001    0.119    0.001 {method 'splitlines' of 'unicode' objects}
>      3540    0.098    0.000    0.321    0.000 diffgrid.py:182(child_allocate)
>         2    0.085    0.042    0.096    0.048 matchers.py:128(index_matching)
>      8496    0.080    0.000    0.140    0.000 diffgrid.py:168(get_child_prop_int)
>       236    0.078    0.000    0.114    0.000 diffgrid.py:214(_get_min_sizes)
> <cut>
>
> So the complete run took more than 9 seconds, of which 7 were spent in "meld:19". The rest of the calls contribute negligible time and are clearly not the culprit here. (Note that the timing includes a split second for me to close the Meld window after the diff has finished rendering.)

You can add a low priority sys.exit call to the meld task queue to help automate what you're doing.
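
A minimal sketch of that idea, using a plain GLib idle callback rather than Meld's own task queue (which is not shown here). The helper name `schedule_quit` and its parameters are hypothetical; `idle_add` is meant to be `GLib.idle_add`, and `quit_func` whatever shuts the application down. It assumes that a low-priority idle source only runs once higher-priority main loop work has been handled, which says nothing about work still running in other threads or processes.

```python
def schedule_quit(idle_add, quit_func, priority):
    """Queue quit_func as a one-shot idle callback at the given priority."""
    def _quit_once():
        quit_func()
        # Returning False removes the idle source after its first run.
        return False
    return idle_add(_quit_once, priority=priority)

# Intended use inside a PyGObject program (not executed here), where `app`
# is assumed to be the running Gtk.Application instance:
#   from gi.repository import GLib
#   schedule_quit(GLib.idle_add, app.quit, GLib.PRIORITY_LOW)
```

Quitting through the application object rather than raising SystemExit from inside a callback is a choice made for this sketch, not something taken from Meld.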

> I assume that "meld" refers to the bin/meld.py script. Line 19 is obviously bogus (that's the first non-comment line, an import statement). I guess the problem here is
>     status = meld.meldapp.app.run(sys.argv)
> and that cProfile and Gtk.Application do not play well together.

Yes. My guess is that you're seeing stuff dispatched from the GObject main loop, though I have never figured out the rules for which things end up in the Python profiling data.

> I have tried googling on how to profile a Gtk.Application-based python program, but ended up with no usable results.
>
> Has anyone here tried profiling Meld before? If so, how did you do it?

I've done a lot of profiling previously, but mostly pre-GTK3, and while some stuff shows up in the profiles, much does not.

I'm going to suggest two probable sources of slowness from previous experience.

Firstly, inserting text into a textbuffer that's editable validates the UTF-8-ness of the text on every insert, which can end up being slow for large files. You can check the file loading code and make it unbuffered to see whether this is what you're hitting. Also, make the buffer non-editable and see whether that helps.
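
To see whether the number of inserts matters, a small timing harness along these lines may help. It is a sketch: `time_inserts` is a hypothetical helper, and `insert` is any callable that appends a string to the buffer under test.

```python
import time


def time_inserts(insert, text, chunk_size):
    """Feed text to insert() in chunk_size pieces; return (seconds, calls)."""
    start = time.perf_counter()
    calls = 0
    for pos in range(0, len(text), chunk_size):
        insert(text[pos:pos + chunk_size])
        calls += 1
    return time.perf_counter() - start, calls

# Intended use with a real Gtk.TextBuffer `buf` (assumed, not executed here):
#   time_inserts(lambda s: buf.insert(buf.get_end_iter(), s), text, 4096)
#   time_inserts(lambda s: buf.insert(buf.get_end_iter(), s), text, len(text))
# Repeating both runs after view.set_editable(False) on the Gtk.TextView is
# one guess at what "non-editable" would mean in practice.
```

Comparing a small chunk size against a single insert of the whole text should show whether per-insert overhead is a significant part of the load time.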

Secondly, the initial diff is typically fairly fast. The slowness usually comes from the (often very many) inline highlighting comparisons, which are threaded or multiprocessed, depending. You can play with making that single-threaded, for example (something I've toyed with as a comparison), which generally improves the initial responsiveness at the cost of overall performance.
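
The kind of comparison meant here can be sketched outside Meld. In the example below, difflib.SequenceMatcher is only a stand-in for Meld's own matcher, and all function names are hypothetical. It runs the same batch of small comparisons inline and through a thread pool so the two can be timed against each other.

```python
import difflib
import time
from concurrent.futures import ThreadPoolExecutor


def compare_chunk(pair):
    """Similarity ratio for one pair of text chunks (stand-in matcher)."""
    a, b = pair
    return difflib.SequenceMatcher(None, a, b).ratio()


def run_inline(pairs):
    """Do every comparison on the calling thread, in order."""
    return [compare_chunk(pair) for pair in pairs]


def run_threaded(pairs, workers=4):
    """Do the same comparisons on a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compare_chunk, pairs))


def timed(func, pairs):
    """Return (elapsed seconds, results) for one strategy."""
    start = time.perf_counter()
    results = func(pairs)
    return time.perf_counter() - start, results
```

Since these comparisons are CPU-bound pure Python, the thread pool is unlikely to be faster in CPython. What a timing like this cannot show is how each strategy affects UI responsiveness, which has to be judged in the running application.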

As I said, I haven't looked at this much since the GTK3 port, so I'd be interested to know if you find anything.

Cheers,
Kai


