Making git-tfs faster

Ivan Danilov

Jul 20, 2011, 11:15:26 AM
to git-t...@googlegroups.com
Here is a suggestion to drive git-tfs towards libgit2. What makes you think it is the git operations (namely the GitSharp implementation) that make git-tfs so slow on large imports? Did you take any measurements?

My initial thought was that it is the TFS communication that is inefficient. At least in my case, TFS is located far, far away and accessed through a VPN, so it is really slow. When getting each file change requires a dedicated round trip to TFS, that becomes a bottleneck. So I was going to check whether a) TFS allows obtaining a changeset in a more coarse-grained way, or b) we can request changes in several independent threads.
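Option (b) might look roughly like the following minimal Python sketch. `download_item` is a hypothetical stand-in for whatever single-item request git-tfs makes; it is not a TFS client API. The only idea shown is that a thread pool overlaps per-file round trips instead of paying them one after another.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def fetch_all(paths: Iterable[str],
              download_item: Callable[[str], bytes],
              workers: int = 8) -> Dict[str, bytes]:
    """Download every path concurrently and return {path: content}.

    `download_item` is a hypothetical single-request function. Network-bound
    calls release the GIL while waiting, so threads overlap the latency.
    """
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so they line up with `paths`.
        contents = pool.map(download_item, paths)
        return dict(zip(paths, contents))
```

With a 100 ms round trip per file and 8 workers, 1,000 files would take on the order of 1000 × 0.1 / 8 ≈ 12.5 s of waiting instead of 100 s, assuming the server tolerates that many parallel requests.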

Any other thoughts?

Matt Burke

Jul 25, 2011, 7:35:14 AM
to git-t...@googlegroups.com
The desire to replace GitSharp with libgit2 is mostly because of the
relative activity of the two projects, at least as of the last time I
had checked on them. The biggest available performance improvement was
getting object storage in-proc (the original version used Process.Start
and a temp file). The next thing that would be nice is to eliminate
OutOfMemoryExceptions on large files.
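On the OutOfMemoryException point: a git blob's id is the SHA-1 of the header `blob <size>\0` followed by the file content, so it can be computed in fixed-size chunks instead of reading a whole large file into memory. A minimal Python sketch of the streaming hash only; it is not the GitSharp or libgit2 API, and writing the zlib-compressed object would follow the same chunked pattern.

```python
import hashlib
import os
from typing import BinaryIO


def blob_id(stream: BinaryIO, size: int, chunk_size: int = 1 << 20) -> str:
    """Return the git blob id for `size` bytes read from `stream`.

    Only `chunk_size` bytes are held in memory at a time, so very large
    files do not need to fit in RAM.
    """
    sha = hashlib.sha1()
    sha.update(b"blob %d\0" % size)  # git object header: type, size, NUL
    remaining = size
    while remaining:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            raise ValueError("stream ended before `size` bytes were read")
        sha.update(chunk)
        remaining -= len(chunk)
    return sha.hexdigest()


def blob_id_of_file(path: str) -> str:
    """Convenience wrapper: hash a file on disk without loading it whole."""
    with open(path, "rb") as f:
        return blob_id(f, os.path.getsize(path))
```

For example, an empty file hashes to `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the same id `git hash-object` reports for an empty blob.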

And yes, if there is a way to improve communication with TFS, that
would be outstanding. It seems to me that the most likely improvement
would come from changing Fetch from walking the changeset in code to
using `tf get` (or equivalent) and taking a snapshot of the result.

If you're working remotely via a VPN, performance is just going to be
bad, especially for the initial clone. (That's a big part of why
quick-clone exists.) I haven't spent much time with it, but
quick-clone seems about as fast as a fresh `tf get`, which is really
about the best I'd expect. (It might not be the best possible, but it
seems like it must be close.)

Another idea would be a command to convert a TFS workspace into a git
repository, maybe `git tfs hijack`. It could do something like `tfpt
treeclean` or `tfpt scorch` and clean up the working directory and use
it as the quick-clone basis and remove the TFS workspace. The thought
here is that you'll probably already have a workspace, though I don't
know that I personally would want to do this.

Yet another idea is to make quick-clone do a `tf get` and then `git
add .` instead of the current API-based walk.
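That quick-clone variant could be sketched in Python as two shell-outs: let `tf get` populate the folder, then snapshot it with git. This assumes the folder is already mapped in a TFS workspace, is already a git working tree, and that `tf` and `git` are on PATH; the function name and the `dry_run` switch are hypothetical, not git-tfs code. `/recursive` and the `/version:C<n>` changeset versionspec are standard `tf get` options.

```python
import subprocess
from typing import List


def quick_clone_snapshot(workdir: str, changeset: int,
                         dry_run: bool = False) -> List[List[str]]:
    """Get `changeset` into `workdir` with tf, then commit the result with git.

    Returns the commands it would run; with dry_run=True nothing is executed.
    """
    commands = [
        # One bulk download handled entirely by the TFS client.
        ["tf", "get", ".", "/recursive", f"/version:C{changeset}"],
        # Snapshot whatever tf left in the working folder.
        ["git", "add", "."],
        ["git", "commit", "-m", f"TFS changeset C{changeset} (quick-clone)"],
    ]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, cwd=workdir, check=True)
    return commands
```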

Ivan Danilov

Aug 6, 2011, 10:10:29 PM
to git-t...@googlegroups.com
I've found something interesting and am going to compare the calls made by VS2010 with the calls made by git-tfs when getting a large number of items (e.g. when cloning). I suspect that VS uses more coarse-grained calls like 'download this entire changeset', while we are getting items one by one.

I'll post results here afterwards.

Ivan Danilov

Aug 6, 2011, 11:52:26 PM
to git-t...@googlegroups.com
VS2010 gets the entire workspace with a single call to this method. In response, the TFS client spawns 13 worker threads which download things asynchronously (these are implementation details, so it's not that important).

The point is that this way ~2GB of source files are received in ~4 minutes (over a local Wi-Fi connection). The same sources take ~10 minutes to clone with git-tfs, and that is on the local network. Over a slow VPN things would be much worse.
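A quick back-of-the-envelope comparison of those two figures, treating ~2GB as 2048 MB; these are only the numbers quoted above, not new measurements:

```python
def throughput_mb_s(size_mb: float, seconds: float) -> float:
    """Average transfer rate in MB/s."""
    return size_mb / seconds


SIZE_MB = 2 * 1024                           # "~2GB" taken as 2048 MB
vs2010 = throughput_mb_s(SIZE_MB, 4 * 60)    # single workspace-level get
git_tfs = throughput_mb_s(SIZE_MB, 10 * 60)  # git-tfs clone
print(f"VS2010 : {vs2010:.1f} MB/s")         # VS2010 : 8.5 MB/s
print(f"git-tfs: {git_tfs:.1f} MB/s")        # git-tfs: 3.4 MB/s
print(f"ratio  : {vs2010 / git_tfs:.1f}x")   # ratio  : 2.5x
```

So on the same network git-tfs moves data at roughly 40% of the rate the VS2010 client achieves.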

I'm pretty sure now that the bottleneck is between git-tfs and TFS, so that is the place that needs improving.

Matt Burke

Aug 7, 2011, 1:13:52 PM
to git-t...@googlegroups.com
I wonder what the difference in web service calls is, because I assume
that's where the significant difference is. From watching a tcpdump or
Ethereal stream during a TFS get, I remember noticing that many of the
requests happen over a single keep-alive connection, that they seem to
be a series of downloadfile requests, and that authentication is only
performed on the first request per connection. I wonder if git-tfs is
doing multiple web service calls per file (which would be slower) or
even opening a new TCP connection for each request, thus requiring an
authentication handshake for each file.
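A rough latency-only cost model makes that suspicion concrete. Everything here is an illustrative assumption, not a measurement: each web service call costs one round trip, a fresh connection adds one round trip for the TCP handshake, and authentication adds `auth_round_trips` more (the default of 2 is a guess). Bandwidth and server time are ignored.

```python
def estimated_seconds(files: int, rtt_s: float, calls_per_file: int = 1,
                      new_connection_per_call: bool = False,
                      auth_round_trips: int = 2) -> float:
    """Estimate waiting time for fetching `files` items one request at a time.

    Counts round trips only; all parameters are hypothetical.
    """
    calls = files * calls_per_file
    per_call = 1  # the request/response itself
    if new_connection_per_call:
        per_call += 1 + auth_round_trips  # handshake + auth on every call
        setup = 0
    else:
        setup = 1 + auth_round_trips      # paid once on the kept-alive connection
    return (setup + calls * per_call) * rtt_s
```

With 10,000 files at a 50 ms VPN round trip this gives about 500 s on a kept-alive connection versus 2,000 s with a new authenticated connection per file. The absolute numbers are invented; the 4x ratio is the point, and it grows further if each file needs more than one call.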

The disparity could depend on lots of factors: if every TFS web
service call was as fast as a local in-memory call, (a) there probably
would be no difference between the performance of git-tfs and tfs and
(b) tfs would be a little bit more reasonable of an alternative to
git. ;) Yeah, so I could see the performance of git-tfs being impacted
by many factors: network speed/latency/etc, server latency (i.e. the
capabilities of the TFS app and db servers). The workspace.get method
is apparently well-tuned, and it would be nice to do the same for
git-tfs.

It would be interesting to try a workspace.get approach... when I
started git-tfs, I had thought about doing that or the current
implementation, and the current implementation was closer to what
git-svn did, so it was easier to make it work.

One thing that would be more possible with a workspace.get-based
approach to fetching is that we could do a very selective clone. Like,
`git tfs clone blah --sample=1w` and just pull basically a summary git
commit for each week's worth of work in tfs. It might make a `git tfs
clone` a bit more sane for a gigantic repo. But, that would be a pain
for branching. Hm. So, maybe not so good.
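The selection step behind such a hypothetical `--sample=1w` option could be as simple as keeping the last changeset of each ISO week and doing one workspace get per kept changeset. A Python sketch over `(id, date)` pairs; the option does not exist, and branching is ignored entirely, which is exactly the pain point above.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def sample_weekly(changesets: Iterable[Tuple[int, datetime]]) -> List[int]:
    """Return the id of the last changeset in each ISO week, oldest week first."""
    last_in_week: Dict[Tuple[int, int], Tuple[datetime, int]] = {}
    for cs_id, when in changesets:
        iso = when.isocalendar()
        week = (iso[0], iso[1])  # (ISO year, ISO week number)
        best = last_in_week.get(week)
        # Later date wins; ties broken by the higher changeset id.
        if best is None or (when, cs_id) > best:
            last_in_week[week] = (when, cs_id)
    return [cs_id for _, (_, cs_id) in sorted(last_in_week.items())]
```

Each returned id would then become one summary commit covering everything that changed since the previous sample.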

Anyway, it also removes some of the fussiness of tracking file names
and such... just set up a workspace, let tfs do its get for each
changeset we're trying to fetch, then (effectively) snapshot into a
commit.
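That per-changeset loop could be sketched in Python as below. Assumptions: a folder mapped in a TFS workspace that is also a git working tree, `tf` and `git` on PATH, and changeset metadata obtained elsewhere (e.g. from TFS history). The `Changeset` record, the function name and the placeholder e-mail are hypothetical; `--allow-empty`, `--author` and `--date` are standard `git commit` options.

```python
import subprocess
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass
class Changeset:
    # Hypothetical record; real values would come from TFS history.
    id: int
    owner: str
    date: datetime
    comment: str


def fetch_changesets(workdir: str, changesets: Iterable[Changeset],
                     dry_run: bool = False) -> List[List[str]]:
    """For each changeset, let tf update the workspace, then snapshot it as a commit."""
    issued: List[List[str]] = []
    for cs in sorted(changesets, key=lambda c: c.id):
        batch = [
            ["tf", "get", ".", "/recursive", f"/version:C{cs.id}"],
            # Stage additions, modifications and deletions alike.
            ["git", "add", "--all", "."],
            ["git", "commit", "--allow-empty",
             f"--author={cs.owner} <{cs.owner}@tfs.invalid>",  # placeholder e-mail
             f"--date={cs.date.isoformat()}",
             "-m", f"{cs.comment} (TFS C{cs.id})"],
        ]
        for cmd in batch:
            if not dry_run:
                subprocess.run(cmd, cwd=workdir, check=True)
            issued.append(cmd)
    return issued
```

File renames and deletions then fall out of the snapshot diff rather than having to be tracked explicitly, which is the "fussiness" this approach avoids.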
