Git blame's performance on chromium


Matheus Tavares Bernardino

Apr 3, 2019, 12:30:01 PM
to Chromium-dev
Hi, everyone

Some years ago, a concern about git blame's performance on the chromium repo was
raised on git's mailing list[1]. Although that was long ago, it seems that the proposal to
make blame multi-threaded was never completed. So I would like to know whether this is
still an issue for chromium, and whether you have problems with any other git commands
that are slower than expected.

I'm planning to work on git performance for my GSoC, so it's important for me to
identify if there's still a real problem nowadays and where it is. Any feedback will
be highly appreciated.

Thanks,
Matheus Tavares

As an addendum:

To take a look myself, I downloaded chromium and measured the following (on a machine
with an i7 and an SSD, running Manjaro Linux):

- 17s on blame for a file with long history[2]
- 2m on blame for a huge file[3]
- 15s on log for both [2] and [3]
- 1s for git status

That seems like quite a lot, especially with an SSD, IMO.

[1] https://public-inbox.org/git/CA+TurHgyUK5sfCKrK+3xY8AeOg0t66vEvFxX=JiA9wXww7eZXQ@mail.gmail.com/
[2] ./chrome/browser/about_flags.cc (same with ./DEPS)
[3] third_party/sqlite/amalgamation/sqlite3.c (7.5M)

Erik Chen

Apr 3, 2019, 12:42:25 PM
to matheus.b...@usp.br, Chromium-dev
Thanks for reaching out, Matheus.

Yes, this is absolutely still a problem for Chrome. I filed some bugs for common operations that are slow for Chrome: git blame, git stash, git status

On Linux, blame is the only operation that is really problematic. On macOS and Windows ... it's hard to find a git operation that isn't slow. :(



--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
---
You received this message because you are subscribed to the Google Groups "Chromium-dev" group.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/chromium-dev/b639c0f0-f250-464a-add0-71f6ff0c72e5%40chromium.org.

Matheus Tavares Bernardino

Apr 3, 2019, 4:46:07 PM
to Erik Chen, Chromium-dev
On Wed, Apr 3, 2019 at 1:41 PM Erik Chen <erik...@chromium.org> wrote:
>
> Thanks for reaching out, Matheus.
>
> Yes, this is absolutely still a problem for Chrome. I filed some bugs for common operations that are slow for Chrome: git blame, git stash, git status
>
> On Linux, blame is the only operation that is really problematic. On macOS and Windows ... it's hard to find a git operation that isn't slow. :(

Thanks for letting me know, Erik. Although my project may not
specifically target macOS or Windows, I think adding threading to git blame
should improve overall performance on all these systems. But
I'll talk with the mentors to see what else can be done here.

Also, have you already checked the following git configurations:
core.commitGraph, useBitmaps and core.untrackedCache? From the list
you gave me, they may speed up git status, at least.
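
As a minimal sketch, these settings could be switched on for a single clone as shown below. This assumes "useBitmaps" refers to `pack.useBitmaps`; the function name `enable_status_speedups` is made up for illustration.

```shell
# Hypothetical helper: enable the three settings mentioned above in the
# repository that contains the current directory.
enable_status_speedups() {
  # Let git read the commit-graph file, if one has been written.
  git config core.commitGraph true
  # Assumption: "useBitmaps" refers to pack.useBitmaps.
  git config pack.useBitmaps true
  # Let git cache information about untracked files between runs.
  git config core.untrackedCache true
}
```

Run it from inside the checkout (`cd chromium/src && enable_status_speedups`). Whether it helps will depend on the Git version and the filesystem, so it is worth timing `git status` before and after.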

Thanks,
Matheus

hcu...@chromium.org

Apr 4, 2019, 1:59:27 PM
to Chromium-dev, erik...@chromium.org
Hi Matheus,

Personally I don't use git blame an awful lot (I tend to use the blame function on cs.chromium.org instead because I find it a bit easier to read), but I do get slowed down by the tab completion for git add (in Z shell). It often takes a few seconds for each path component, meaning that it's not really a speedup versus just typing. I took a brief look at the completion code and it looks like that could be a Git thing, but I'm not totally sure.

Thanks,

Harry Cutts
Chrome OS Touch/Input

Matheus Tavares Bernardino

Apr 7, 2020, 10:17:44 AM
to hcu...@chromium.org, Chromium-dev, Erik Chen
Hi, Erik, Harry and others

This thread started a long time ago (a year!), but I can finally come
back to share the results with you :)

TL;DR: in Git 2.26, git-grep can now use multiple threads when
searching in historical revisions. This allows for better
performance, with speedups of up to 3.3x in my tests with the Chromium
repo.

If you remember, I asked you about slow Git commands in the Chromium
repository. The idea was to select one and try to optimize it as my
Google Summer of Code 2019 project. You gave me great feedback! And
although you didn't specifically mention git-grep, that was a command
that my mentors and I realized could benefit from better
parallelization. So that's what we went for.

I'm happy to say that the latest version of Git (2.26) contains the
code from my GSoC project, with faster git-grep searches. The tests
showed a speedup of up to 3.3x running git-grep with 8 threads on the
Chromium repository (on a quad-core CPU w/ hyper-threading). If you
are interested in knowing more about the project and results,
here is a small blog post I wrote:
https://matheustavares.gitlab.io/posts/git-2.26-faster-git-grep
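
For illustration, a threaded search of a historical revision could look roughly like the sketch below. The wrapper name `grep_history` is hypothetical; `--threads` is git grep's own option, and the best thread count will depend on the machine.

```shell
# Hypothetical wrapper around git grep for searching a historical revision.
# Usage: grep_history <threads> <pattern> <revision>
grep_history() {
  # --threads sets the number of grep worker threads;
  # -n prefixes each match with its line number.
  git grep --threads "$1" -n -e "$2" "$3"
}
```

For example, `grep_history 8 'TODO' HEAD~1000` searches the tree as it was 1000 commits ago using 8 threads. A default thread count can also be stored with `git config grep.threads 8`.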

Thanks,
Matheus

Peter Kasting

Apr 7, 2020, 1:21:46 PM
to matheus.b...@usp.br, hcu...@chromium.org, Chromium-dev, Erik Chen
On Tue, Apr 7, 2020 at 7:15 AM Matheus Tavares Bernardino <matheus.b...@usp.br> wrote:
> TL;DR: in Git 2.26, git-grep can now use multiple threads when
> searching in historical revisions. This allows for better
> performance, with speedups of up to 3.3x in my tests with the Chromium
> repo.

Thank you!  I have noticed this perf improvement personally as I do git grep on many-core machines frequently, and with 2.26 it is significantly faster :).  You have made my development experience more pleasant!

PK 

Matheus Tavares Bernardino

Apr 7, 2020, 1:46:26 PM
to Peter Kasting, hcu...@chromium.org, Chromium-dev, Erik Chen
I'm very glad to hear it! :)

Erik Chen

Apr 7, 2020, 2:07:45 PM
to Matheus Tavares Bernardino, Peter Kasting, hcu...@chromium.org, Chromium-dev
Absolutely great results, nice work!!!

Let me know if you're interested in further work in this space. I was doing some research that indicates that on macOS, there's significant room for improvement for `git status`, likely using a similar approach to the one you took.

Matheus Tavares Bernardino

Apr 8, 2020, 12:58:17 PM
to Erik Chen, Peter Kasting, hcu...@chromium.org, Chromium-dev
On Tue, Apr 7, 2020 at 3:05 PM Erik Chen <erik...@chromium.org> wrote:
>
> Absolutely great results, nice work!!!

Thanks, Erik!

> Let me know if you're interested in further work in this space. I was doing some research that indicates that on macOS, there's significant room for improvement for `git status`, likely using a similar approach to the one you took.

Nice! I probably won't be able to look at that right now, because I'm
working on a few more patches on git-grep (to make it honor sparse
checkouts). Unfortunately, I also don't have a macOS box to test :(
But I'd be interested in hearing about it. Did you share your findings
on the Git mailing list?

Erik Chen

Apr 8, 2020, 1:46:06 PM
to Matheus Tavares Bernardino, Peter Kasting, Harry Cutts, Chromium-dev
The raw numbers I already linked in an earlier reply. I did some prototyping but did not post on a public mailing list. The rough takeaways are:

- readdir() is both CPU-bound and IO-bound. Multi-threading calls to readdir() will improve performance by up to a factor of N, where N is the number of cores.
- It is cheaper to use getattrlistbulk() than readdir() and lstat().

I tried making a prototype where we take a snapshot of the FS using multi-threaded getattrlistbulk(), and then use that data structure for subsequent operations in `git status`, similar to core.fscache on Windows. Unfortunately, my naive prototype did not improve performance, because it is not aware of .gitignore. At least for the chromium repo, not honoring .gitignore ends up doubling the number of files that need to be touched, and this blew out my system's vnode cache.
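
A rough way to gauge how many extra files a .gitignore-unaware scan would touch is to compare the tracked, untracked and ignored file counts in a checkout. The sketch below uses only `git ls-files`; the function name `count_scanned_files` is made up for illustration, and it is not the measurement used for the prototype above.

```shell
# Hypothetical helper: count tracked, untracked (not ignored) and ignored
# files in the current repository.
count_scanned_files() {
  # $(( ... )) strips the padding some wc implementations add.
  tracked=$(( $(git ls-files | wc -l) ))
  untracked=$(( $(git ls-files --others --exclude-standard | wc -l) ))
  ignored=$(( $(git ls-files --others --ignored --exclude-standard | wc -l) ))
  printf 'tracked=%d untracked=%d ignored=%d\n' "$tracked" "$untracked" "$ignored"
}
```

If `ignored` is comparable to `tracked`, a snapshot that does not prune ignored directories walks roughly twice as many files.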

I suspect that options for moving forward are:
1) something like fscache on Windows, but being smart about .gitignore
2) doing something simpler where we simply multi-thread readdir(), and don't worry about lstat() optimization
3) a more complex re-architecture

Benoit Lize

Apr 14, 2020, 1:03:31 PM
to erik...@chromium.org, Matheus Tavares Bernardino, Peter Kasting, Harry Cutts, Chromium-dev
Hi all,

Thanks for the work on git performance!

One thing to note is that, at least for Googlers using Linux, "git status" is actually limited by... reference counting in the kernel. This is due to our use of a Debian-derived distribution, which has AppArmor enabled. Here is what's happening, and how to get the numbers:

$ perf record -g git status
$ perf report

[image.png: screenshot of the perf report output]

This is using the internal Linux distribution, with a 5.2 kernel.

Now, this is not really up to git, but... some things *could* help (not sure). Perhaps limiting parallelism. On my machine (dual-socket Haswell Xeon), using taskset shows that time spent in the kernel varies:
$ time git status > /dev/null
real 0m1.169s
user 0m0.884s
sys 0m2.443s

$ time taskset -c 0-4 git status > /dev/null
real 0m1.176s
user 0m0.787s
sys 0m1.194s


That is, restricting git to a smaller number of cores roughly halves the time spent in the kernel (sys), although the wall-clock time is about the same. Does git use many threads for these operations? Perhaps using fewer threads would help here.
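
To see how the kernel time changes with the number of cores, the taskset comparison above could be repeated over several core counts. This is only a sketch: the helper names `core_range` and `time_status_on_cores` are made up, and it assumes bash (for the `time` keyword) and the Linux `taskset` utility.

```shell
# Hypothetical helper: turn a core count N into the CPU list "0-(N-1)"
# that taskset -c expects.
core_range() {
  printf '0-%d\n' "$(( $1 - 1 ))"
}

# Hypothetical helper: time `git status` pinned to each given core count.
# Assumes bash and taskset; run it from inside the repository.
time_status_on_cores() {
  for n in "$@"; do
    echo "== $n core(s): CPUs $(core_range "$n") =="
    ( time taskset -c "$(core_range "$n")" git status > /dev/null ) 2>&1
  done
}
```

For example, `time_status_on_cores 1 2 5 16` prints real/user/sys for each setting; the value 5 corresponds to the `0-4` range used above.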

Thanks again!

-- 
Benoit




Henrique Ferreiro

Apr 16, 2020, 5:11:47 AM
to Chromium-dev, matheus.b...@usp.br, hcu...@chromium.org, erik...@chromium.org


On Tuesday, April 7, 2020 at 7:21:46 PM UTC+2, Peter Kasting wrote:
> Thank you!  I have noticed this perf improvement personally as I do git grep on many-core machines frequently, and with 2.26 it is significantly faster :).  You have made my development experience more pleasant!

If you frequently search from the terminal, I'd recommend taking a look at ripgrep (https://github.com/BurntSushi/ripgrep) and fd (https://github.com/sharkdp/fd), which are significantly faster than grep/ack/... and find, respectively.
 