Hello everybody,

I have been playing around with trimming the gdk-pixbuf repository, as it carries all the changes from Gtk+ 2.x from before it was split out. I used the attached script: it basically figures out which files have been removed and removes all the content related to them from the history. Commits that do not affect any existing file are gone.

It would be nice if someone with deeper knowledge of git could review it; everything seems fine on my side.

There is one problem with the script: after I run git gc --prune --aggressive, the loose objects won't go away. However, once I push the repository to GitHub and clone it again, everything is stripped from the repo, and afterwards the tags can be restored (see the bottom of the shell script and the Python script for details on this).

I am looking for comments from git experts on any of the operations I'm doing, and on how feasible it would be to rewrite the git.gnome.org repository.

The resulting repository can be found on my GitHub account [0]. I've done a test of cloning it over git and this is the result:

aruiz@watchover:~/src$ time git clone git@github.com:aruiz/gdk-pixbuf-trim.git
Initialized empty Git repository in /home/aruiz/src/gdk-pixbuf-trim/.git/
remote: Counting objects: 27243, done.
remote: Compressing objects: 100% (4500/4500), done.
remote: Total 27243 (delta 22645), reused 27243 (delta 22645)
Receiving objects: 100% (27243/27243), 29.40 MiB | 789 KiB/s, done.
Resolving deltas: 100% (22645/22645), done.

real 0m47.499s
user 0m9.473s
sys 0m0.788s

-------------------

aruiz@watchover:~/src$ time git clone git://git.gnome.org/gdk-pixbuf
Initialized empty Git repository in /home/aruiz/src/gdk-pixbuf/.git/
remote: Counting objects: 229709, done.
remote: Compressing objects: 100% (32966/32966), done.
remote: Total 229709 (delta 197249), reused 228507 (delta 196355)
Receiving objects: 100% (229709/229709), 133.87 MiB | 495 KiB/s, done.
Resolving deltas: 100% (197249/197249), done.

real 5m21.110s
user 0m57.792s
sys 0m4.900s

That's a difference of about 104 MiB of transfer and roughly four and a half minutes on my rather fast ADSL line. I think this would be a major improvement for anybody cloning that repo.

[0] https://github.com/aruiz/gdk-pixbuf-trim

--
Regards,
Alberto Ruiz
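On the loose objects that git gc leaves behind: one possible explanation, assuming the history was rewritten with git filter-branch (the shell script is not reproduced here, so this is only a guess), is that the old commits are still reachable. filter-branch keeps backups of the rewritten refs under refs/original/, and the reflogs also point at the old history; a fresh clone carries neither, which would match what happens after the push. The sketch below builds the usual cleanup sequence. The function names are made up for illustration, and it should only be run on a throwaway copy of the repository.

```python
import subprocess


def cleanup_commands(original_refs):
    """Return the git commands (as argv lists) that drop the old history.

    original_refs: names of the backup refs under refs/original/.
    """
    # Delete every backup ref left behind by the rewrite.
    cmds = [["git", "update-ref", "-d", ref] for ref in original_refs]
    # Expire all reflog entries so they no longer keep old commits alive.
    cmds.append(["git", "reflog", "expire", "--expire=now", "--all"])
    # Repack and prune unreachable objects regardless of their age.
    cmds.append(["git", "gc", "--prune=now", "--aggressive"])
    return cmds


def list_original_refs(repo):
    """List the refs under refs/original/ in the repository at 'repo'."""
    out = subprocess.check_output(
        ["git", "for-each-ref", "--format=%(refname)", "refs/original/"],
        cwd=repo)
    return out.decode().split()


def run_cleanup(repo):
    """Run the cleanup in 'repo'. This is destructive: the backups are gone."""
    for cmd in cleanup_commands(list_original_refs(repo)):
        subprocess.check_call(cmd, cwd=repo)
```

Nothing runs on import; calling run_cleanup() with the path of a scratch clone executes the commands in order and stops at the first one that fails.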
Attachment: script
# Commits of the original repository, one per line
commitlist = [i.strip() for i in open('/tmp/tmp_commitlist', 'r').readlines()]
# Commits that remain after trimming, one per line
new_commitlist = [i.strip() for i in open('/tmp/tmp_new_commitlist', 'r').readlines()]
# Each line is "<commit> <tag name>"
tagmaplist = [i.strip().split(' ', 1) for i in open('/tmp/tmp_tagmap', 'r').readlines()]


def find_candidate(sortedindexes, index):
    # Return the first remaining index at or after the given one
    for i in sortedindexes:
        if i >= index:
            return i
    return None


# We turn the tag/map list into a dictionary (tag name -> commit)
tagmap = {}
for i in tagmaplist:
    tagmap[i[1]] = i[0]

# We create a map of the index of each remaining ref in the original repo
sortedindexes = [commitlist.index(commit) for commit in new_commitlist]

new_tagmap = {}
for tag in tagmap.keys():
    # We get the original index of each tagged commit
    index = commitlist.index(tagmap[tag])
    candidate = find_candidate(sortedindexes, index)
    # Compare against None explicitly: index 0 is a valid candidate
    if candidate is None:
        continue
    # print("%s -> %s : %s" % (tagmap[tag], commitlist[candidate], tag))
    new_tagmap[tag] = commitlist[candidate]

for tag in new_tagmap.keys():
    print("git tag %s %s" % (tag, new_tagmap[tag]))
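The tag remapping done by the script can also be written as a small function that is easy to try on toy data. This is a minimal sketch, not the attached script itself: the name remap_tags and its parameters are made up here. Like the script, it assumes the surviving commits are identified by their original ids and that the first surviving commit at or after the tagged one in list order is an acceptable stand-in. It sorts the surviving positions explicitly and uses bisect instead of a linear scan.

```python
from bisect import bisect_left


def remap_tags(commitlist, surviving, tagmap):
    """Map each tag to the first surviving commit at or after its position.

    commitlist: all original commit ids, in the order the script reads them
    surviving:  the commit ids that still exist after trimming
    tagmap:     dictionary of tag name -> original commit id
    """
    # Position of every commit in the original list (avoids list.index scans).
    position = {commit: i for i, commit in enumerate(commitlist)}
    # Sorted positions of the commits that survived.
    kept = sorted(position[c] for c in surviving)

    result = {}
    for tag, commit in tagmap.items():
        k = bisect_left(kept, position[commit])
        if k < len(kept):  # no surviving commit at or after: tag is dropped
            result[tag] = commitlist[kept[k]]
    return result


commits = ["c0", "c1", "c2", "c3", "c4"]
tags = {"v1": "c1", "v2": "c3", "v3": "c4"}
print(remap_tags(commits, ["c0", "c2", "c3"], tags))
```

In the toy data, v1 pointed at c1, which was trimmed, so it moves to c2; v2 keeps c3; v3 has no surviving commit at or after c4 and is dropped. A tag on the very first commit in the list is kept as well, since position 0 is handled like any other.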