Migrating convoluted TFS collection to GIT


Nati Elgavi

Jun 4, 2018, 3:56:48 PM
to git-tfs-dev
Hi,

I want to migrate over to Git. Doing a one way migration is possible, but a bridge will be preferable, even for a short time. 
I went through all the vast documentation Microsoft released over the past couple of years about migrating to Git. 
Overall they highly recommend migrating without history, but my colleagues and I decided we want to try and migrate the history anyway.

I tried using Git-TFS and immediately encountered several issues (that I managed to resolve after some research). 
Instead of hitting every wall along the way and hoping to survive, I wanted to share my plan with this group and get some feedback.
Basically trying to get your insight on whether migrating with history in my case is even possible. 

To start, I will describe our code and TFS structure:
  • 4 solutions
  • 10 main projects
  • Hundreds of branches
  • 11 years of history
Each of the 4 solutions uses some of the 10 main projects.
So for example:
Solution A is located in team project 1 and uses team projects 2, 3 and 4 (mapping must preserve relative path). 
Solution B is located in team project 5 and uses team projects 2, 3 and 6 (mapping must preserve relative path). 

Similarly, when I want to branch solution A, I must branch all of its projects.
Since many teams worked on the different solutions over the years, there is no single branch for solution A. There are many.
A solution A branch looks like this:
$/some-random-path/some_random_name/Project1 
$/some-random-path/some_random_name/Project2
$/some-random-path/some_random_name/Project3
$/some-random-path/some_random_name/Project4
  
When I used Git-TFS, I noticed that every time a project was branched into a folder in another team project, the history of that entire team project was pulled. 
Even if there were only a couple of changes to that project in that branch. 
So when I looked at the history of a specific file it looked fine, but the history of the newly migrated repo was a mess.
Question number 1 - is there a way to avoid this?

During the migration, I encountered some issues that caused the Git-TFS clone to stop. 
Usually it involved team projects that were deleted and recreated later on or check-ins that had files from multiple team projects and one of them was deleted. 
I managed to resume the clone with another get command, specifying the next changeset number after the one that stopped it. 
Question number 2 - Is it the right way to solve this? If so, is there an option to automate this? 

Our team projects are packed with large files and we must use LFS to properly handle them.
I noticed a few open issues with using LFS, but they describe problems with the Git -> TFS bridge, not the migration itself.
So I assume that if we do a one-way migration, we will be able to store our large files in LFS.
Question number 3 - Has anyone actually tried that? When during the process should we specify the LFS attributes?

When we're finally in Git world, working on a single solution that is using code from multiple repositories is annoying. 
At first we thought we could merge some of the projects, but projects that are shared between solutions had to stay separate (and be mapped using submodules or something like that).
But after trying that for a short while, we decided the better way to go is to have a single repository with all 10 projects and 4 solutions. 
Since Git-TFS clones team projects into repositories, we will have to merge those repositories (with history) after the migration.
There is a nice trick that helps do that in here - merging-two-git-repositories-into-one-repository-without-losing-file-history

The plan:
  1. Clone each of the 10 projects with full history, branches, etc. 
  2. Use LFS configuration somehow.
  3. Manually fix clone issues (like changesets on deleted team projects) 
  4. If #2 failed, try Git BFG to apply LFS configuration retroactively. 
  5. Merge the repositories.
Do you think this is doable? What pitfalls should I be aware of? 

I am at attempt #3 to migrate to Git and I don't know how much longer I can invest in this. 
Any tip you can give will go a long way. 

Thanks in advance,
Nati.


Philippe Miossec

Jun 5, 2018, 3:56:58 PM
to git-tfs-dev
> I went through all the vast documentation Microsoft released over the past couple of years about migrating to Git.
> Overall they highly recommend migrating without history, but my colleagues and I decided we want to try and migrate the history anyway.

Yes, migrating from TFVC is so painful, because it is slow and because of how it stores history, that even Microsoft doesn't try to migrate the history.
I was silly enough to try it... (perhaps because I value my work more)


> I tried using Git-TFS and immediately encountered several issues (that I managed to resolve after some research).

Yes, there are a lot of issues, even though we try to fix them. Be sure to use the latest version (v0.29).
 
> Instead of hitting every wall along the way and hoping to survive, I wanted to share my plan with this group and get some feedback.
> Basically trying to get your insight on whether migrating with history in my case is even possible.
>
> To start, I will describe our code and TFS structure:
>   • 4 solutions
>   • 10 main projects
>   • Hundreds of branches
>   • 11 years of history
First remark: git-tfs (mostly due to the slowness of TFVC) can't handle such a big history. It will take weeks to migrate, and that is if it works on the first try, which I think is nearly impossible.
The bigger your history (long and with a lot of branches), the less likely it is that git-tfs can handle it well :(

I highly advise you to determine the minimum viable history you need and migrate only that (a few months, 1 year, ...).
And use the `--changeset=` option to choose the changeset from which to start fetching.

And keep the TFVC history as an archive!
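
As a minimal sketch of that advice: the collection URL, TFVC path and changeset number are placeholders, and `clone_recent_history` is only a hypothetical helper name. A clone that starts from a chosen changeset could look roughly like this:

```shell
#!/usr/bin/env bash
# Sketch only: clone a TFVC path starting at a given changeset instead of
# fetching the whole history. Every value passed in is a placeholder.

clone_recent_history() {
  local tfs_url="$1"          # collection URL (hypothetical example below)
  local tfvc_path="$2"        # TFVC path of the branch to clone
  local first_changeset="$3"  # oldest changeset you still want in Git
  local target_dir="$4"       # directory of the new Git repository

  # --changeset= tells git-tfs which changeset to start from
  git tfs clone "$tfs_url" "$tfvc_path" "$target_dir" --changeset="$first_changeset"
}

# Example call (not executed here):
# clone_recent_history "http://tfs.example.com:8080/tfs/DefaultCollection" '$/Project1/Main' 123456 project1
```

The changeset number is the one to pick with your team: everything older stays in TFVC only.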
 
> Each of the 4 solutions uses some of the 10 main projects.
> So for example:
> Solution A is located in team project 1 and uses team projects 2, 3 and 4 (mapping must preserve relative path).
> Solution B is located in team project 5 and uses team projects 2, 3 and 6 (mapping must preserve relative path).
>
> Similarly, when I want to branch solution A, I must branch all of its projects.
> Since many teams worked on the different solutions over the years, there is no single branch for solution A. There are many.
> A solution A branch looks like this:
> $/some-random-path/some_random_name/Project1
> $/some-random-path/some_random_name/Project2
> $/some-random-path/some_random_name/Project3
> $/some-random-path/some_random_name/Project4
>
> When I used Git-TFS, I noticed that every time a project was branched into a folder in another team project, the history of that entire team project was pulled.

By default, yes. The latest version introduces a regex parameter, `--ignore-branches-regex`, to ignore branches: https://github.com/git-tfs/git-tfs/pull/1194/files

You could also ignore the other branches and clone only your branch.
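
Before launching a long clone, it can help to check which branch paths a candidate regex would exclude. The sketch below only approximates that with `grep -E` on a hand-written list. It assumes the regex is matched against the TFVC branch path, and git-tfs evaluates it with the .NET regex engine, so complex patterns may behave differently. The paths and the pattern are made up for illustration:

```shell
# Hypothetical branch paths, as they might appear in TFVC.
branches='$/Project1/Main
$/some-random-path/some_random_name/Project1
$/some-random-path/other_name/Project1
$/Project2/Main'

# Candidate value for --ignore-branches-regex (an assumption for this example).
ignore_regex='^\$/some-random-path/'

# Branches that would still be initialized: the lines NOT matching the regex.
kept="$(printf '%s\n' "$branches" | grep -Ev "$ignore_regex")"
printf '%s\n' "$kept"
```

Once the list looks right, the same pattern would be passed to the clone as `--ignore-branches-regex='^\$/some-random-path/'`.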
 
> Even if there were only a couple of changes to that project in that branch.
> So when I looked at the history of a specific file it looked fine, but the history of the newly migrated repo was a mess.

That's often the problem...

 
> Question number 1 - is there a way to avoid this?

Apart from the solution I just described, no, there is no good way to do that.
I think the best solution would have been to let you provide a file containing the paths of the branches you don't want initialized. But that has not been done...
 

> During the migration, I encountered some issues that caused the Git-TFS clone to stop.
> Usually it involved team projects that were deleted and recreated later on or check-ins that had files from multiple team projects and one of them was deleted.
> I managed to resume the clone with another get command, specifying the next changeset number after the one that stopped it.
> Question number 2 - Is it the right way to solve this? If so, is there an option to automate this?

That's the main pain point of git-tfs. All the devs that tried to fix this strange TFVC history failed :(
I don't know how to fix that, sorry.
 

> Our team projects are packed with large files and we must use LFS to properly handle them.
> I noticed a few open issues with using LFS, but they describe problems with the Git -> TFS bridge, not the migration itself.
> So I assume that if we do a one-way migration, we will be able to store our large files in LFS.
> Question number 3 - Has anyone actually tried that? When during the process should we specify the LFS attributes?

No, I haven't heard of anyone doing that. I think it's not possible because of the way git-tfs uses LibGit2Sharp to create commits.
But I may be wrong, so you could try:

1. `git tfs init` instead of `git tfs clone`
2. initialize git lfs and commit the `.gitattributes`
3. `git tfs fetch`

If that doesn't work, you will have to convert the repository to `git-lfs` retroactively.
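
The three steps above could be sketched as follows. The function name, URL, TFVC path and tracked patterns are placeholders, and whether the commits created by git-tfs actually end up as LFS pointers is exactly the untested part:

```shell
# Sketch of the init / LFS / fetch sequence. All arguments are placeholders.
init_with_lfs() {
  local tfs_url="$1" tfvc_path="$2" target_dir="$3"
  shift 3                      # remaining arguments: patterns to track, e.g. '*.dll'

  # Step 1: create the repository without fetching any changeset yet
  git tfs init "$tfs_url" "$tfvc_path" "$target_dir" || return 1
  cd "$target_dir" || return 1

  # Step 2: enable LFS for this repository and commit the tracking rules
  git lfs install --local
  local pattern
  for pattern in "$@"; do
    git lfs track "$pattern"   # records the pattern in .gitattributes
  done
  git add .gitattributes
  git commit -m "Track large files with Git LFS"

  # Step 3: import the TFVC changesets
  git tfs fetch
}

# Example call (not executed here):
# init_with_lfs "http://tfs.example.com:8080/tfs/DefaultCollection" '$/Project1/Main' project1 '*.dll' '*.zip'
```

After the fetch, it is worth checking with `git lfs ls-files` whether any imported file was really stored in LFS before going further.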
 

> When we're finally in Git world, working on a single solution that is using code from multiple repositories is annoying.
> At first we thought we could merge some of the projects, but projects that are shared between solutions had to stay separate (and be mapped using submodules or something like that).

Submodules, or building packages (NuGet packages?), which could be a better solution.
 
> But after trying that for a short while, we decided the better way to go is to have a single repository with all 10 projects and 4 solutions.

That's called a monorepository, and indeed it could be a good solution.

> Since Git-TFS clones team projects into repositories, we will have to merge those repositories (with history) after the migration.
> There is a nice trick that helps do that in here - merging-two-git-repositories-into-one-repository-without-losing-file-history

It seems difficult and I don't know what result you will get.
For me the simplest is:
1. `git fetch` repo2 inside repo1
2. `git subtree add -P nameOfTheSubDirectory Sha1OfTheLastCommitOfRepo2`
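
Put together, and run from inside repo1, those two steps could look like the sketch below. The helper name and its arguments are hypothetical; `FETCH_HEAD` simply stands in for the SHA-1 of the last commit fetched from repo2:

```shell
# Sketch: import another repository into a subdirectory of the current one,
# keeping its history. Names and paths are placeholders.
import_as_subdirectory() {
  local other_repo="$1"   # path or URL of repo2
  local subdir="$2"       # directory inside repo1 that will hold repo2's files
  local branch="$3"       # branch of repo2 to import, e.g. master

  # Step 1: bring repo2's commits into repo1 (the tip is recorded as FETCH_HEAD)
  git fetch "$other_repo" "$branch" || return 1

  # Step 2: add the fetched commit, with its history, under $subdir
  git subtree add -P "$subdir" FETCH_HEAD
}

# Example call (not executed here):
# import_as_subdirectory ../project2 Project2 master
```

`git subtree add` expects a clean working tree in repo1 and a subdirectory that does not exist yet.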
 

> The plan:
>   1. Clone each of the 10 projects with full history, branches, etc.
>   2. Use LFS configuration somehow.
>   3. Manually fix clone issues (like changesets on deleted team projects)
>   4. If #2 failed, try Git BFG to apply LFS configuration retroactively.
>   5. Merge the repositories.
> Do you think this is doable? What pitfalls should I be aware of?

If it works, step 2 will be done at the same time as step 1; otherwise it should be done after step 5.
It will be hard work, and success depends on:

* Your knowledge of git and TFVC
* The (long) time you have and are willing to spend learning git-tfs
* The parts of your TFVC history that git-tfs does not support well, which will be very time-consuming to fix

Note: you could use `git tfs verify` on the tip of each branch to check that at least the last revision has been migrated correctly. Otherwise, you will be in the shit :(
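
A rough way to run that check over several branches is sketched below. The helper name is hypothetical, and it assumes `git tfs verify` exits with a non-zero status when the Git tree and the TFS changeset differ; if it does not, read its output instead:

```shell
# Sketch: check out each given branch and run `git tfs verify` on its tip.
verify_branch_tips() {
  local branch failed=0
  for branch in "$@"; do
    # Skip (and report) branches that cannot be checked out
    git checkout --quiet "$branch" || { failed=1; continue; }
    # Assumption: a non-zero exit status means the verification failed
    if ! git tfs verify; then
      echo "verify failed on $branch" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Example call (not executed here):
# verify_branch_tips master release-1.0
```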

> I am at attempt #3 to migrate to Git and I don't know how much longer I can invest in this.
> Any tip you can give will go a long way.


Good luck (really!!!).
My best advice: if you don't succeed, don't invest too much time; just clone the last commit of each branch.
And keep TFVC as an archive while you build up some useful history again.
I know it's not perfect, but that's the only thing I can tell you through a forum post.


 


Nati Elgavi

Jun 10, 2018, 7:40:24 AM
to git-tfs-dev
Hi Philippe

Thank you for the detailed response. It gave me a lot of reassurance on some things and points to reconsider on others.  
Since posting this message, I tried the following:
  1. Migrate the main branch only. 
    The history is lossy and possibly misleading (changes done on a branch by person X will be listed as changes by person Y who merged them into the main branch).
    Still, many here in the company prefer this approach over no history at all. 
    The good news is that out of the 10 main branch team projects, only one had a migration issue, and just a single one, which I managed to overcome using `git tfs pull --changeset=[the next changeset in that project]`.
    Evidently, people were a lot more cautious when it came to deleting history on the main branch (unlike on other branches). 
  2. I tried to find out why the migration takes so long, and the performance monitors on both my PC and the server hosting TFS showed that neither was under any real load.
    I then tried running all 10 projects in parallel, and to my surprise it finished a lot faster than running them one at a time.
  3. I merged the repositories into one using the technique I mentioned (where each project is a branch that was merged in right after the initial commit). 
    It worked surprisingly well. I wanted to use that technique over submodules or subtrees because we regularly work on all projects and I want the workflow to be smooth.
  4. I defined LFS on the merged repository only.
    Although we have several large files in our projects, my test proved that most of them didn't change much. 
    We are way more likely to add somedll_2.dll in addition to somedll_1.dll instead of keeping multiple versions of the same large file.
    As a result, the size of the migrated repo is only slightly bigger than that of the manually created repo that has no history and is initialised with LFS (3.3 GB with 11 years of history compared to 2.6 GB).
    This means that going forward, the existing large files will be added to the history of the repo no more than twice (which is not that bad).
  5. This method even kept the bridge alive (at least for now).
    I can pull changesets from TFS into the migrated (bridge) repos and pull those changes into the dedicated branch on the merged repo.
    In theory that should work both ways, but so far I only tested TFS --> GIT. It is a temporary solution, until we bring the rest of the R&D over to Git.
    Once we do bring everyone else to Git, we will probably try to rebuild the history with LFS (using BFG Repo-Cleaner).
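
For illustration, the parallel run from item 2 could be sketched roughly as follows. This is not the actual script: the helper name, the collection URL and the `$/<project>/Main` layout are assumptions made for the example:

```shell
# Sketch: clone several team projects at the same time instead of one by one.
clone_projects_in_parallel() {
  local tfs_url="$1"
  shift                        # remaining arguments: team project names
  local project pids="" pid status=0

  for project in "$@"; do
    # One background job and one log file per project
    git tfs clone "$tfs_url" "\$/$project/Main" "$project" > "clone-$project.log" 2>&1 &
    pids="$pids $!"
  done

  # Wait for every clone and remember whether any of them failed
  for pid in $pids; do
    wait "$pid" || status=1
  done
  return "$status"
}

# Example call (not executed here):
# clone_projects_in_parallel "http://tfs.example.com:8080/tfs/DefaultCollection" Project1 Project2 Project3
```

Each clone writes to its own log file, so a project that stops on a problematic changeset can be spotted and resumed on its own.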
I have all of this available in a script which I will upload here in a short while.
I think it is a good solution to a complicated problem with a reasonable time investment. 
Let me know your thoughts.

Thanks again,
Nati. 

Pranav Patel

Sep 24, 2020, 11:37:03 AM
to git-tfs-dev
Nati - I am going through this myself for a project with hundreds of solutions and 11 years of history. I am finding it very slow and would like to touch base with you and learn from your experience. In the meantime, would you please share the script that you mentioned here so I can go over and digest it?

Philippe Miossec

Sep 24, 2020, 12:54:56 PM
to git-t...@googlegroups.com
Abandon the idea of migrating 11 years of TFVC history.
At least with tools like git-tfs or git-tf... because the TFS APIs are slow!

Perhaps you could try this project :
https://github.com/kunom/tfsdb-fast-export


Good luck (because you will need some! ) 

Nati Elgavi

Sep 24, 2020, 3:10:48 PM
to git-t...@googlegroups.com
Hi Pranav,

I'm no longer working for that company so I can't share the scripts with you.

However, I'm glad to say that I did manage to migrate to git, with some drawbacks, but it was a migration tailor made for our TFS collection and needs.

I do think I can give some guidelines. 
Would you like to meet online and chat about it?

If you find it useful, I'll try to document it here for future reference.

Nati.
