Re:[PATCH] Speedup parsing of svn status on huge repositories



Thanks for review!

I'm still interested in getting patch in, but I'm not quick - 
it takes some time to make fun improving my python skill to get code
fast AND readable AND more selective in tags pulling.

Unfortunately the patch now is two times longer, but otherwise at least one of the 3 goals above was not 
reached.

I fixed your issues and also reformat code to 80-col limit. 
Filed this as https://bugzilla.gnome.org/show_bug.cgi?id=761608

Though this patch is "svn-specific step" to making meld comfortably usable with huge svn repositories - it 
doesn't achieve this goal yet:
in my repo meld is still even more slow (many minutes with high cpu usage) in iterating over all files in 
filesystem in huge repositories while loading "repository status".
I'm not filing bug about this problem now, because it is too general - I'm even not sure if it is 
system-dependent or not.

On 19 December 2015 at 06:52, Vasily Galkin <galkin-vv yandex ru> wrote:

I'm using meld with 100000-file svn repository which has svn status -v --xml generating ~25MB output 
running 18 seconds on ~10-year-old machine. (linux, svn 1.9.2, python 2.7.10 32 bit)
Current version of meld uses quite slow xml.etree.ElementTree which constructs entire xml tree in python,
so svn.py _update_tree_state_cache with ElementTree takes
wall clock 130 seconds, 360 MB resident memory.
Simply replacing xml.etree.ElementTree with xml.etree.cElementTree reduces that to
wall clock 35 seconds, 195 MB resident memory
More optimizing with streaming parser based on xml.parsers.expat (attached patch) gives
wall clock 27 seconds, 84 MB resident memory

Nearly same wall clock improvements with 400000-file svn repo on ~4-year-old machine (windows7 x64, svn 
1.9.2, python 2.7.10 32 bit)

From the user experience reducing for _update_tree_state_cache time reduces the time while meld is 
initially totally unresponsive (UI thread is performing task).

Unfortunately I've not pre-1.9 svn, so only svn 1.9 was tested.

I'm sorry, but I still haven't found the time to test this on older
SVN, so I'm just putting down some very quick notes from looking at
the patch.

First up, this generally looks good and I'd like to be able to merge it.

The patch removes our EAGAIN handling, and I'd rather keep it.
Granted, there are excellent chances that it's never hit, and if it is
then there's not much reason to believe that the failure is actually
temporary... but I value keeping it for consistency with the other
plugins. Maybe we should pull that out in to a helper?

This patch is also less selective when pulling out tags from the XML,
due to the nature of the streaming parser. It would be nice if it
actually only checked under target and changelist tags for the
relevant entry + status lines.

Finally, there's a few bits where the handling has changed subtly, in
ways that I'm not 100% sure are correct. For example, we now handle
empty path and item tags differently than we did, and the
_tree_meta_cache behaves slightly differently.

I hope you're still interested in getting this patch in, and I
apologise for the time it's taken for me to get to it. If you'd like
to file a bug in bugzilla, I think we can probably work through any
issues pretty quickly.

cheers,
Kai


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]