[PATCH] Speedup parsing of svn status on huge repositories



I'm using meld with a 100000-file svn repository, for which svn status -v --xml generates ~25 MB of output and
runs for 18 seconds on a ~10-year-old machine (linux, svn 1.9.2, python 2.7.10 32 bit).
The current version of meld uses the quite slow xml.etree.ElementTree, which constructs the entire XML tree
in Python, so _update_tree_state_cache in svn.py with ElementTree takes:
  wall clock 130 seconds, 360 MB resident memory
Simply replacing xml.etree.ElementTree with xml.etree.cElementTree reduces that to:
  wall clock 35 seconds, 195 MB resident memory
Optimizing further with a streaming parser based on xml.parsers.expat (attached patch) gives:
  wall clock 27 seconds, 84 MB resident memory
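
For readers who have not opened the attachment, here is a minimal sketch of the streaming idea. It is not the
code from the patch. The function name parse_status is hypothetical, and the sketch assumes that the status
XML contains <entry path="..."> elements, each wrapping a <wc-status item="..."> element. Only a small
path-to-state dict is kept in memory; no element tree is built.

```python
import xml.parsers.expat


def parse_status(chunks):
    """Map each entry path to its wc-status item from svn status XML.

    `chunks` is an iterable of byte strings (for example, blocks read
    from the svn process). Hypothetical helper, not meld's actual code.
    """
    states = {}
    current = [None]  # path of the <entry> being parsed (list: py2-friendly closure)

    def start(name, attrs):
        # Assumed layout: <entry path=...><wc-status item=... /></entry>
        if name == "entry":
            current[0] = attrs.get("path")
        elif name == "wc-status" and current[0] is not None:
            states[current[0]] = attrs.get("item")

    def end(name):
        if name == "entry":
            current[0] = None

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    for chunk in chunks:
        parser.Parse(chunk, False)  # feed data incrementally
    parser.Parse(b"", True)         # signal end of input
    return states
```

Because the handlers fire as the data arrives, the output can be fed in blocks while svn is still running,
rather than after all ~25 MB has been collected.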

I see nearly the same wall-clock improvements with a 400000-file svn repo on a ~4-year-old machine (windows7 x64,
svn 1.9.2, python 2.7.10 32 bit).

From the user's point of view, reducing the _update_tree_state_cache time shortens the period during which meld
is initially totally unresponsive (the UI thread is busy performing this task).

Unfortunately I don't have a pre-1.9 svn, so only svn 1.9 was tested.

Attachment: 0001-Speedup-parsing-of-svn-status.patch
Description: Text Data


