Converting to git



Hi,

I've been looking into converting the GNOME SVN repositories to git over
the last couple of days.  This email sums up the different approaches
I've considered and experimented with, and at the end I describe the
approach I'm recommending for doing the bulk conversion.

My first thought was to just use git svn clone, or maybe even just grab
the repos from git-mirror.gnome.org, drop them in place and call it a
day.  Using the git-mirror repositories was quickly dismissed, since
we'll want to use full names and some saner looking email addresses in
the commit logs.  Currently the git-mirror commits look like this in git
(I'm using pango and Behdad as examples here):

  Author:     behdad <behdad@123ab921-de25-0410-83fa-f409d9e86667>

which is sufficient, but not pretty.  So we'll need to do a re-import
and use a username->fullname map to get something like this:

  Author:     Behdad Esfahbod <behdad@src.gnome.org>

We discussed different ways to generate the email address.  One option
was to dig up a real list of email addresses, but that's problematic
since a lot of contributors have changed email addresses over time, and
attributing old commits to a new email address (i.e. a new employer)
would be unfortunate.
Another option was to try to automatically extract it from the ChangeLog
or commit message but that's going to be messy and fragile.  So in the
end we're recommending generic user@src.gnome.org email addresses for
the conversion.  Whether it should be src, git, scm, vcs, dvcs or
whatever is a wonderful bikeshedding subject.  There's no requirement
that the addresses should be working email addresses.  I have src in my
script now, and that's going to be hard to change... I mean, I'll have
to edit the file and stuff, so I suggest we just go with src unless
there's a really good reason not to.
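
For illustration, here is a minimal sketch of how the username->fullname
map could be generated.  It assumes the `login = Full Name <email>` line
format that git svn reads from an authors file; the function names and
the sample entry are hypothetical, and this is not the actual conversion
script.

```python
def author_line(username, full_name, domain="src.gnome.org"):
    """Format one authors-file entry: 'login = Full Name <login@domain>'."""
    return f"{username} = {full_name} <{username}@{domain}>"


def build_authors_file(names, domain="src.gnome.org"):
    """Return authors-file text for a {username: full name} mapping."""
    lines = [author_line(user, names[user], domain) for user in sorted(names)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample data; the real list would come from the
    # GNOME account records.
    print(build_authors_file({"behdad": "Behdad Esfahbod"}), end="")
```

Changing the domain part (src, git, scm, ...) is then a one-argument
change.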

Ok, so to reconvert I started with git svn clone.  This tool takes an
author map that lets us map the usernames to the fullname as discussed
above, but it has a couple of problems: 1) it creates an empty commit
for branches and tags, and 2) it's very slow, even when you have
everything locally.  The empty commits come from the fact that creating
a branch or tag in SVN requires doing a copy of the branch, which
introduces a new revision with no changes to the source code.  git svn
clone doesn't
filter this out, so we end up with a commit graph that looks something
like this:

  http://people.freedesktop.org/~krh/pango-git-mirror-gitk.png

ie, the PANGO_1_9_1 tag is sitting on a little branch on its own instead
of pointing to a commit that is actually on the 1.9 branch.  Being slow
is less of a problem, but the faster we can convert the repositories the
better.
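
To make the empty-commit problem concrete, here is a small sketch of how
a tag sitting on an empty commit could be moved back to the real commit
it copies.  This is only an illustration of the idea, not how git svn or
any other tool is implemented; the Commit type, the tree identifiers and
the sample history are all hypothetical.

```python
from typing import NamedTuple, Optional


class Commit(NamedTuple):
    parent: Optional[str]  # id of the parent commit, None for the root
    tree: str              # identifier of the source tree snapshot


def resolve_tag(commit_id, commits):
    """Walk back past commits whose tree is identical to their parent's."""
    while True:
        commit = commits[commit_id]
        if commit.parent is None:
            return commit_id  # reached the root, nothing left to skip
        if commits[commit.parent].tree != commit.tree:
            return commit_id  # this commit changes the source, keep it
        commit_id = commit.parent  # empty commit, step to the parent


# Hypothetical history: r102 is the tag copy and changes no files.
history = {
    "r100": Commit(parent=None, tree="tree-a"),
    "r101": Commit(parent="r100", tree="tree-b"),
    "r102": Commit(parent="r101", tree="tree-b"),
}

if __name__ == "__main__":
    print(resolve_tag("r102", history))
```

With the sample history, the tag created at r102 ends up pointing at
r101, the last commit that actually changed the source.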

A little googling finds the svn-all-fast-export tool.  This tool was
written by Thiago Macieira from the KDE project.  It uses the git
fast-import feature and is designed to do a one-shot import of a big SVN
repository (the KDE repository) and optionally split it into multiple
git repositories in the process.  It detects and excludes the empty
commits inherent in how SVN represents branching and tagging, and it's
very fast: it imports evolution in half an hour.

Now, the problem with this tool is that when comparing all tags in the
original SVN repository and the new git repository, some of the tags
differ.  The git-mirror repositories match the SVN repositories tag for
tag, so in this respect git svn clone is better.  However, a little
digging reveals that the tags that svn-all-fast-export doesn't handle
are the tags that were carried over from the CVS to SVN conversion.
That is still a problem with the svn-all-fast-export tool, but those SVN
tags are actually badly broken.  Take a look at

  http://svn.gnome.org/viewvc/gconf?view=revision&revision=1837

which is supposed to be the 2.6.4 tag for GConf.  Notice how
the /tags/GCONF_2_6_4 directory is recorded as being copied from trunk,
not from /branches/gnome-2-6.  ChangeLog and many other files, on the
other hand, come from the 2.6 branch, replacing whatever was in the
directory that was copied from trunk.  Just as a reminder, this is what
a tag is supposed to look like in SVN:

  http://svn.gnome.org/viewvc/pango?view=revision&revision=2736

(copied from trunk because pango doesn't have a 1.22 branch yet, so
that's ok).

So the SVN tags are badly messed up, and branches have a similar
problem:

  http://svn.gnome.org/viewvc/gtk%2B?view=revision&revision=12227

The good news is that we can fix this.  The process I'm using now is a
little complicated but it undoes the SVN import damage and preserves the
history, tags and branches better than git svn clone.  The basic idea is
to redo the import from CVS directly to git and then replay the SVN
activity on top of that.  For the CVS to git import I'm using Keith
Packard's parsecvs tool.  We used Keith's tool for importing all of
X.org (after splitting the monolithic CVS repository into all the
components we have now) and we've used it for mesa, libdrm, hal and many
other repositories on freedesktop.org.  It's a great tool - it's fast
and it handles all kinds of special cases and brokenness usually encountered
in old, hand trimmed CVS repositories (because, well, you should see the
XFree86 CVS repository...).  Then to import the SVN activity after the
CVS to SVN conversion, I'm using the svn-all-fast-export tool.  The only
problem I've seen with that tool is that it got confused by the broken
SVN tags and ended up with different tag contents than the SVN repo, but
since I'm only using it for importing the part of history that
originated in SVN, that shouldn't be a problem.

Once we have a complete import, I'll put the repos up so people can help
verify them.  I have a script to compare contents of all branches
between a git repo and a svn repo, and I'm working on a tool to compare
blame output to the extent that it's possible.  For blame lines that map
to a SVN commit we can map from the git commit to the SVN revno using
the comment in the git commit, but for blame lines that reach into CVS
commits, we can't easily determine if the commit that git gives us
matches what SVN gives us.  I think that we can compare the commit
message to see if the commits are the same, but on the other hand, I
don't know how much I trust the SVN import of CVS history now.  So if we
really want to verify this, we should consult CVS for those lines that
go further back than SVN.  I'll put my scripts in git somewhere once I
get them to a point where they're generally useful and don't need too
much handholding.
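
As a rough sketch of the git commit to SVN revno mapping mentioned
above, the revision number can be pulled out of the commit message with
a regular expression.  The two trailer formats below are assumptions
about what the import tools append to each message (an
`svn path=...; revision=N` line for svn-all-fast-export and a
`git-svn-id:` line for git svn); the function name is hypothetical, and
the patterns would need adjusting if the real trailers differ.

```python
import re

# Assumed trailer formats (one per import tool):
#   svn path=/trunk/; revision=1234
#   git-svn-id: <url>@1234 <repository uuid>
_PATTERNS = [
    re.compile(r"^svn path=[^;\n]*; revision=(\d+)\s*$", re.MULTILINE),
    re.compile(r"^git-svn-id: \S+@(\d+) \S+\s*$", re.MULTILINE),
]


def svn_revision(message):
    """Return the SVN revision recorded in a commit message, or None."""
    for pattern in _PATTERNS:
        match = pattern.search(message)
        if match:
            return int(match.group(1))
    # No trailer found, e.g. a commit imported straight from CVS.
    return None


if __name__ == "__main__":
    print(svn_revision("Fix a leak\n\nsvn path=/trunk/; revision=2736\n"))
```

A commit for which this returns None predates SVN, and is exactly the
case where the comparison has to fall back to commit messages or CVS.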

Alright, this mail is already too long, but it sums up where we are with
converting the repositories.  I'll send out a heads up once we get some
projects online.

cheers,
Kristian



