Google Code to GitHub migration notes


Kevin Reid

Apr 17, 2015, 5:15:25 PM
to Google Caja Discuss
I am currently working on the complete migration of Caja from Google Code Project hosting to GitHub. (For information on the shutdown of Google Code Project Hosting, see: http://google-opensource.blogspot.com/2015/03/farewell-to-google-code.html ) I will be using this thread to document the steps I am taking in case they need to be redone or audited.

The new repository is located at: https://github.com/google/caja

The process is not complete; this message documents what has been done and what I know needs to be done. If you spot something that needs to be done, please reply on the other thread I started for review: https://groups.google.com/forum/#!topic/google-caja-discuss/dhKQz-02EjQ


--- 1. git-svn migration ---

I chose git-svn because I am familiar with its characteristics and I knew it would preserve the original SVN revision numbers (which are referenced in many places).

git svn fetch --authors-file=../author-map

I created the author-map file by finding all authors in the SVN repository and hand-inserting the names and canonical email addresses of major contributors. I do not include the full text here out of respect for Google Code's anti-spam policy of not showing full addresses to the public, though they will be available through GitHub.

(Note that the wiki also has authors needing entries in the authors file.)
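
For anyone redoing this step, here is a minimal sketch of how the list of SVN authors could be generated. It is not necessarily the command used for this migration: the helper name authors_skeleton is made up for illustration, and the name and email on the right-hand side of each line are placeholders that still have to be filled in by hand.

```shell
# Hypothetical helper: turn `svn log --quiet` output (read from stdin) into
# skeleton author-map lines of the form "svnuser = svnuser <svnuser>".
# The right-hand side is a placeholder to be replaced with the real
# "Full Name <email>" of each contributor.
authors_skeleton() {
  awk -F '|' '/^r[0-9]+ / {
    user = $2
    gsub(/^ +| +$/, "", user)   # trim the padding around the username
    print user " = " user " <" user ">"
  }' | sort -u
}

# Usage, run inside an SVN working copy:
#   svn log --quiet | authors_skeleton > ../author-map
```

The same helper can be pointed at the wiki's SVN history to find the additional authors mentioned above.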


--- 2. Manual revision/branch/tag fixups ---

Create the Git version of the es53 branch:

git branch es53 remotes/origin/es53

The following branches once existed in SVN and were later deleted there. Since Git does not keep a history of the entire repository in the same way, we must either delete them along with their history or keep them as existing branches or tags (in somebody's repository):

branches/ben-review-test
branches/no-namespaces

I reviewed them and concluded that they contained changes not worth keeping. However, you may wish to disagree; please let me know before they are lost.

I manually rebuilt the commit tagged "SecurityReviewJun2008" in order to give it the ancestry reflected in its commit message; the tree is identical to the svn revision. This was done as follows:

git checkout `git svn find-rev r1767`  # r1767 is the origin mentioned in the tag desc
git reset remotes/origin/tags/SecurityReviewJun2008 -- .
git commit --reuse-message=remotes/origin/tags/SecurityReviewJun2008
git tag SecurityReviewJun2008
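
The claim that the rebuilt commit has the same tree as the SVN tag can be checked after the fact. A minimal sketch, assuming both refs are present in the local repository; the helper name same_tree is hypothetical:

```shell
# Hypothetical helper: succeed only if two commits (or tags) point at the
# same tree object, i.e. their file contents are identical regardless of
# ancestry. Returns 2 if either argument cannot be resolved.
same_tree() {
  a=$(git rev-parse --verify --quiet "$1^{tree}") || return 2
  b=$(git rev-parse --verify --quiet "$2^{tree}") || return 2
  [ "$a" = "$b" ]
}

# Usage:
#   same_tree SecurityReviewJun2008 remotes/origin/tags/SecurityReviewJun2008 \
#     && echo "trees match"
```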


--- 3. svn:ignore migration ---

On the master and es53 branches, I ran the following:

git svn create-ignore
git commit

--- 4. Upload repository to GitHub ---

(no comment needed)

--- 5. Issue migration ---

The snapshot of issues was taken at about 16:20 PDT April 14. As far as I know, no changes were made around that time.

Because GitHub does not support private issues, I manually replaced the currently-private issues with placeholders. We will need to find an alternative issue tracker for security issues; I have not yet researched this problem.

I had to make several changes to the issue import tools to get a satisfactory result:
• insert placeholders for deleted issues, to keep numbering identical
• keep the original name on actions taken by users with no corresponding GitHub username, rather than attributing them to me
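
To illustrate the placeholder idea: given the issue numbers that still exist in the export, the gaps are the numbers that need a placeholder issue so that later issues keep their original IDs. This is only a sketch of the principle, not the change made to the import tools; the helper name and the one-number-per-line input format are assumptions.

```shell
# Hypothetical helper: read existing issue numbers (one per line) from stdin
# and print every number between 1 and the highest one that is missing.
# Each printed number would need a placeholder issue created for it.
missing_issue_ids() {
  sort -n | awk '
    NF { for (n = prev + 1; n < $1; n++) print n; prev = $1 }
  '
}

# Usage:
#   missing_issue_ids < existing-issue-numbers.txt
```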

Additionally, in order to avoid spamming people with each test run, I disabled the import of assignee information. I will re-assign every open issue now that the import has settled down.

--- 6. Wiki migration ---

Steps:
1. Use git-svn to clone the wiki.
2. Manually fix up markup the wiki migration tool complained about.
3. Use the wiki migration tools to convert to GitHub Markdown.
4. Touch the GitHub project wiki via the web to create it, then add it as a remote.
5. Force push to overwrite with the migrated content.
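
Steps 4 and 5 could look roughly like the following. This is a sketch under assumptions: the helper name push_wiki and the remote name github-wiki are made up, and it assumes the wiki's branch on GitHub is master.

```shell
# Hypothetical helper: add the GitHub wiki repository as a remote and
# overwrite its history with the locally migrated wiki.
#   $1 = URL of the wiki repository
#   $2 = local ref holding the converted pages
push_wiki() {
  git remote add github-wiki "$1" &&
  git push --force github-wiki "$2:refs/heads/master"
}

# Usage (GitHub serves a project's wiki from <repository>.wiki.git):
#   push_wiki https://github.com/google/caja.wiki.git HEAD
```

The force push is needed because creating the wiki through the web interface produces an initial commit that is unrelated to the migrated history.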

I had to fix some bugs in the wiki migration tools:
• td/th colspan/rowspan were not whitelisted properly
• "bug NNNN" autolinks were mangled
• intra-wiki links would go to raw markdown rather than wiki pages

--- TODO LIST ---

Things I know still need to be done:
• Add a GitHub-style README. Migrate the project home page content (which is not itself a wiki page nor in the repository)
• migrate private issues to some other system; link to caja-discuss-undisclosed
• Fix our build and deployment tooling to work with Git rather than SVN. Establish a version numbering policy.
• Reassign all open issues.
• fix SVN revision links in wiki and issues
• fix all other absolute-URL links referring to code.google.com in e.g. security advisories
• migrate/review the directories in the repository not in std layout: doc/, maven/
• review other elements of Google Code project configuration (front page, links, issue tracker templates, committers, label information, etc.)
• set up default emails to google-caja-discuss groups (after all bulk changes are done)
• Make sure all original project data is archived just in case
• Contribute fixes to migration tools back to http://code.google.com/p/support-tools/
• Set up issue labels and other such project configuration.
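
For the two link-fixing items above, a first pass could be to list every remaining code.google.com URL in a checkout of the repository or the wiki. A minimal sketch; the helper name is hypothetical, and it assumes a grep that supports --exclude-dir (GNU and BSD grep both do):

```shell
# Hypothetical helper: list every code.google.com URL under a directory,
# one "file:line:url" entry per match, skipping Git's own metadata.
find_gcode_links() {
  grep -rnoE --exclude-dir=.git 'https?://code\.google\.com/[^][ )"<>]*' "$1"
}

# Usage, from the root of a checkout:
#   find_gcode_links . | sort
```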