Git repo history rewritten: re-clone your repositories

Andrew Arnott

Mar 30, 2008, 1:24:12 AM

to dotnet...@googlegroups.com

I've just completed a one-time rewrite of the author history in our Git repo to unscramble some of the authors that Google scrambled during the conversion from SVN to Git (who knows why). As a result, every commit hash has changed, which means a "git pull" to get the latest changes will wreak some havoc on your repository. Sorry about that. As I said, this is a one-time change to make the history accurate.

If you haven't committed any changes of your own into your own repositories, the easiest way to overcome this road bump is to simply delete your DotNetOpenId repository and re-clone it using "git clone" like you did originally (assuming you have one at all). Everything will work just like it did before, but you'll have a repo with all the new commit names.
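
Before deleting anything, it may be worth checking that your clone really has no commits of its own. This is a minimal sketch, assuming your remote-tracking branches still point at the old history (that is, you have not fetched since the rewrite; after a fetch every old commit would show up as local). The function name is made up for illustration:

```shell
# List commits reachable from local branches but from no remote-tracking
# branch. Empty output suggests it is safe to delete and re-clone.
# Run this BEFORE fetching the rewritten history.
list_local_only_commits() {
    git log --oneline --branches --not --remotes
}
```

Run it from inside the repository. If it prints nothing, deleting the directory and cloning again should lose no committed work. Uncommitted changes and stashes are not covered by this check.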

If you have already made commits to your own version of the library, read on. I've copied and pasted some instructions from one of the Git gurus (Peff is his name) on how to replay your commits on top of the rewritten history:

> Thanks, Jeff. That was very helpful. I have published my repo online
> already, but only a couple people (if even that) have cloned it by now
> and I am prepared to email the list of interested parties letting them
> know. About this rebasing thing, is there a better way than for them
> to just wipe their repo and clone again? Would a simple git fetch and
> git rebase do the trick?

Short answer: if they haven't done any work on top of yours, re-cloning
is probably the simplest route.

If they do have work, then they will want to fetch and rebase. The
commands are fairly simple, but what is happening is a little tricky, so
I'll subject you to some ascii art.

# user has commits C..D built on top of your original A..B (in the
# diagram, "..." refers to an arbitrary number of commits)
#
# A--...--B <-- origin/master
#          \
#           C--...--D <-- master
git fetch

# after the fetch, we now have the filtered A'..B' pointed to by
# origin/master, but the reflog for origin/master points to the
# original.
#
# A'--...--B' <-- origin/master
# A--...--B <-- origin/master@{1}
#          \
#           C--...--D <-- master
#
# so now we can rebase. We want all of the commits between the
# _original_ upstream and our current state to be rebased on top
# of the new upstream.
git rebase --onto origin/master origin/master@{1} master

Three things to note here.

1. This works even if C..D is empty, so it is valid even if they didn't
do any work. Though in that case, simply doing "git reset --hard
origin/master" would work just as well.

2. The annoying thing is that you have to do this for every branch. So
depending on how many branches you have and how much work they did, it
may just be simpler to export the work as patches, re-clone, and then
apply:

git checkout master
git format-patch --stdout origin/master >/some/path/outside/repo
cd .. && rm -rf repo
git clone /path/to/repo && cd repo
git am /some/path/outside/repo

3. Since you haven't changed the trees at all, a fetch will just need to
download the new commits. Thus a fetch should be way less
network-intensive than a re-clone. Whether that matters, of course,
depends on your repo size and your users' bandwidth.