Trimming gdk-pixbuf repository

Hello everybody,

I have been playing around with trimming the gdk-pixbuf repository, as
it carries all the changes from Gtk+ 2.x from before it was split out.
I used the attached script, which figures out which files
have been removed and removes all content related to them from the
history. Commits that do not affect any existing file are dropped.
It would be nice if someone with deeper knowledge of git could
review it; everything seems fine on my side.
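
To illustrate the "which files have been removed" step, here is a minimal Python sketch. It is not the attached script: the helper names (git_lines, removed_paths, find_removed_paths) are made up for this example, and it assumes that comparing every path ever added in history against the paths tracked now is a fair approximation. Renames are not followed, and paths that git quotes in its output are not unquoted.

```python
import subprocess

def git_lines(args, repo="."):
    """Run a git command in `repo` and return its non-empty output lines."""
    out = subprocess.check_output(["git"] + args, cwd=repo)
    return [line for line in out.decode("utf-8", "replace").splitlines() if line]

def removed_paths(historical, current):
    """Paths that appear somewhere in history but not in the current tree."""
    return sorted(set(historical) - set(current))

def find_removed_paths(repo="."):
    # Every path ever added on any ref (rename tracking is ignored here).
    historical = git_lines(
        ["log", "--all", "--pretty=format:", "--name-only", "--diff-filter=A"],
        repo,
    )
    # Paths tracked in the currently checked-out tree.
    current = git_lines(["ls-files"], repo)
    return removed_paths(historical, current)
```

Calling find_removed_paths() inside a clone would return the candidate paths to strip from history; the list should be reviewed by hand before anything is rewritten.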

There's one problem with the script: after I run git gc --prune
--aggressive, the loose objects won't go away. However, once I push the
repository to github and clone it again, everything is stripped from the
repo. Afterwards it's trimmed and the tags can be restored (see the
bottom of the shell script and the python script for details on this).
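
A possible explanation, offered as a guess rather than a diagnosis: git gc keeps any object that is still reachable from a reflog entry or a backup ref, and by default only prunes unreachable objects older than a grace period. The sketch below, with the made-up helper names cleanup_commands and run_cleanup, shows the sequence commonly used to force the cleanup locally. It assumes no backup refs remain (for instance, refs/original/ if git filter-branch was used); those would have to be deleted first.

```python
import subprocess

def cleanup_commands():
    """Commands commonly used to drop unreachable objects after a history rewrite."""
    return [
        # Expire all reflog entries so they no longer keep old commits alive.
        ["git", "reflog", "expire", "--expire=now", "--all"],
        # Prune unreachable objects immediately instead of after the grace period.
        ["git", "gc", "--prune=now", "--aggressive"],
    ]

def run_cleanup(repo="."):
    """Run the cleanup commands in `repo`, stopping at the first failure."""
    for cmd in cleanup_commands():
        subprocess.check_call(cmd, cwd=repo)
```

This is destructive, so it is best tried on a throwaway copy of the rewritten repository.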

I am looking for comments from git experts on any of the operations
I'm doing and on how feasible it would be to rewrite the
repository. The resulting repository can be found in my github

I've done a test of cloning it over git and this is the result:

aruiz@watchover:~/src$ time git clone git@github.com:aruiz/gdk-pixbuf-trim.git
Initialized empty Git repository in /home/aruiz/src/gdk-pixbuf-trim/.git/
remote: Counting objects: 27243, done.
remote: Compressing objects: 100% (4500/4500), done.
remote: Total 27243 (delta 22645), reused 27243 (delta 22645)
Receiving objects: 100% (27243/27243), 29.40 MiB | 789 KiB/s, done.
Resolving deltas: 100% (22645/22645), done.

real	0m47.499s
user	0m9.473s
sys	0m0.788s


aruiz@watchover:~/src$ time git clone git://
Initialized empty Git repository in /home/aruiz/src/gdk-pixbuf/.git/
remote: Counting objects: 229709, done.
remote: Compressing objects: 100% (32966/32966), done.
remote: Total 229709 (delta 197249), reused 228507 (delta 196355)
Receiving objects: 100% (229709/229709), 133.87 MiB | 495 KiB/s, done.
Resolving deltas: 100% (197249/197249), done.

real	5m21.110s
user	0m57.792s
sys	0m4.900s

That's a difference of about 104 MiB of transfer and roughly four and a
half minutes on my rather fast ADSL. I think this would be a major
improvement for anybody cloning that repo.
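
For anyone who wants to check the arithmetic, this short Python snippet recomputes the savings from the two clone transcripts above. The variable names are only illustrative, and the numbers reflect a single run on one connection.

```python
# Figures copied from the two clone transcripts above.
trimmed = {"objects": 27243, "mib": 29.40, "seconds": 47.499}
full = {"objects": 229709, "mib": 133.87, "seconds": 5 * 60 + 21.110}

saved_mib = full["mib"] - trimmed["mib"]
saved_seconds = full["seconds"] - trimmed["seconds"]
object_ratio = trimmed["objects"] / float(full["objects"])

print("transfer saved: %.2f MiB" % saved_mib)
print("time saved: %d min %.1f s" % (saved_seconds // 60, saved_seconds % 60))
print("objects kept: %.1f%%" % (100 * object_ratio))
```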

Regards,
Alberto Ruiz

Attachment: script
Description: Binary data

commitlist     = [i.strip() for i in open ('/tmp/tmp_commitlist', 'r').readlines ()]
new_commitlist = [i.strip() for i in open ('/tmp/tmp_new_commitlist', 'r').readlines ()]
tagmaplist     = [i.strip().split (' ', 1) for i in open ('/tmp/tmp_tagmap', 'r').readlines ()]

def find_candidate (sortedindexes, index):
	#Return the first surviving index at or after the given one, or None
	for i in sortedindexes:
		if i >= index:
			return i

	return None

#We turn the tag/map list into a dictionary
tagmap = {}
for i in tagmaplist:
	tagmap[i[1]] = i[0]

#We create a map of the index of each remaining ref in the original repo
sortedindexes = [commitlist.index(commit) for commit in new_commitlist]

new_tagmap = {}
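for tag in tagmap.keys():
	#We get the original index of each tagged commit
	index = commitlist.index(tagmap[tag])
	candidate = find_candidate (sortedindexes, index)
	#No surviving commit at or after the tagged one, so we skip the tag.
	#We compare against None explicitly because index 0 is a valid candidate.
	if candidate is None:
		continue

#	print "%s -> %s : %s" % (tagmap[tag], commitlist[candidate], tag)
	new_tagmap[tag] = commitlist[candidate]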

#We print one "git tag" command per remapped tag
for tag in new_tagmap.keys():
	print ("git tag %s %s" % (tag, new_tagmap[tag]))
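
The tag remapping above can also be written as a self-contained function, which makes it easier to try on toy data. This is a sketch under assumptions rather than a replacement for the script: the name remap_tags and the sample commit ids are hypothetical, both commit lists are assumed to be in the same order, and it swaps the linear list.index scans for a dictionary lookup plus bisect.

```python
import bisect

def remap_tags(commitlist, new_commitlist, tagmap):
    """Map each tag to the nearest surviving commit at or after its original position.

    commitlist: all original commit ids, in a fixed order.
    new_commitlist: the ids that survived the trim, in the same order.
    tagmap: {tag name: original commit id}.
    """
    position = {commit: i for i, commit in enumerate(commitlist)}
    surviving = sorted(position[c] for c in new_commitlist)
    result = {}
    for tag, commit in tagmap.items():
        # First surviving index that is >= the tagged commit's index.
        k = bisect.bisect_left(surviving, position[commit])
        if k < len(surviving):
            result[tag] = commitlist[surviving[k]]
        # Otherwise no surviving commit follows the tag, so it is skipped.
    return result

# Hypothetical data: c1 and c3 were dropped by the trim.
commits = ["c0", "c1", "c2", "c3"]
kept = ["c0", "c2"]
tags = {"v1": "c0", "v2": "c1", "v3": "c3"}
remapped = remap_tags(commits, kept, tags)
print(remapped)
```

Here v1 stays on c0, v2 moves forward to c2, and v3 is dropped because nothing survives at or after c3.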
