Re: [PATCH] update-index/diff-index: use core.preloadindex to improve performance

Erik Faye-Lund

Oct 30, 2012, 6:15:37 AM
to karste...@dcon.de, g...@vger.kernel.org, msy...@googlegroups.com, pro-...@optusnet.com.au
On Tue, Oct 30, 2012 at 10:50 AM, <karste...@dcon.de> wrote:
> 'update-index --refresh' and 'diff-index' (without --cached) don't honor
> the core.preloadindex setting yet. Porcelain commands using these (such as
> git [svn] rebase) suffer from this, especially on Windows.
>
> Use read_cache_preload to improve performance.
>
> Additionally, in builtin/diff.c, don't preload index status if we don't
> access the working copy (--cached).
>
> Results with msysgit on WebKit repo (2GB in 200k files):
>
>                 | update-index | diff-index | rebase
> ----------------+--------------+------------+---------
> msysgit-v1.8.0  | 9.157s       | 10.536s    | 42.791s
> + preloadindex  | 9.157s       | 10.536s    | 28.725s
> + this patch    | 2.329s       | 2.752s     | 15.152s
> + fscache [1]   | 0.731s       | 1.171s     | 8.877s
>

Wow, awesome results :)

This also makes me want to play around with the fscache stuff a bit;
about an order of magnitude improvement is quite noticeable :)

Albert Krawczyk

Oct 30, 2012, 6:50:34 PM
to kusm...@gmail.com, karste...@dcon.de, g...@vger.kernel.org, msy...@googlegroups.com
> Wow, awesome results :)

They get even better when used with scripted commands like git svn rebase:

                |  cold | warm | hot
----------------+-------+------+-----
msysgit-v1.8.0  | 7m29s | 36s  | 33s
this patch      | 28s   | 14s  | 14s


Karsten's code is amazing. Now if only I could figure out a way to spread the love through the rest of the source :)

Albert 

Jeff King

Nov 2, 2012, 11:26:16 AM
to karste...@dcon.de, g...@vger.kernel.org, msy...@googlegroups.com, pro-...@optusnet.com.au
On Tue, Oct 30, 2012 at 10:50:42AM +0100, karste...@dcon.de wrote:

> 'update-index --refresh' and 'diff-index' (without --cached) don't honor
> the core.preloadindex setting yet. Porcelain commands using these (such as
> git [svn] rebase) suffer from this, especially on Windows.
>
> Use read_cache_preload to improve performance.
>
> Additionally, in builtin/diff.c, don't preload index status if we don't
> access the working copy (--cached).
>
> Results with msysgit on WebKit repo (2GB in 200k files):
>
>                 | update-index | diff-index | rebase
> ----------------+--------------+------------+---------
> msysgit-v1.8.0  | 9.157s       | 10.536s    | 42.791s
> + preloadindex  | 9.157s       | 10.536s    | 28.725s
> + this patch    | 2.329s       | 2.752s     | 15.152s
> + fscache [1]   | 0.731s       | 1.171s     | 8.877s

Cool numbers. On my quad-core SSD Linux box, I saw a few speedups, too.
Here are the numbers for "update-index --refresh" on the WebKit repo
(all are wall clock time, best-of-five):

           | before | after
-----------+--------+--------
cold cache | 4.513s | 2.059s
warm cache | 0.252s | 0.164s

Not as dramatic, but still nice. I wonder how a spinning disk would fare
on the cold-cache case, though. I also tried it with all but one CPU
disabled, and the warm cache case was a little bit slower.

Still, I don't think we need to worry about performance regressions,
because people who don't have a setup suitable for it will not turn on
core.preloadindex in the first place. And if they have it on, the more
places we use it, probably the better.

-Peff

Jeff King

Nov 2, 2012, 11:38:00 AM
to karste...@dcon.de, g...@vger.kernel.org, msy...@googlegroups.com, pro-...@optusnet.com.au
On Fri, Nov 02, 2012 at 11:26:16AM -0400, Jeff King wrote:

> Still, I don't think we need to worry about performance regressions,
> because people who don't have a setup suitable for it will not turn on
> core.preloadindex in the first place. And if they have it on, the more
> places we use it, probably the better.

BTW, your patch was badly damaged in transit (wrapped, and tabs
converted to spaces). I was able to fix it up, but please check your
mailer's settings.

-Peff

Junio C Hamano

Nov 13, 2012, 11:46:06 AM
to karste...@dcon.de, Jeff King, g...@vger.kernel.org, msy...@googlegroups.com, pro-...@optusnet.com.au
karste...@dcon.de writes:
> Yes, I feared as much, that's why I included the pull URL (the company MTA
> only accepts outbound mail from Notes clients, sorry).
>
> Is there a policy for people with broken mailers (send patch as
> attachment, add pull URL more prominently, don't include plaintext patch
> at all...)?

If anything, "fix your mailer" probably is the policy you are
looking for, I think.

Karsten Blees

Nov 13, 2012, 4:51:46 PM
to Junio C Hamano, karste...@dcon.de, Jeff King, g...@vger.kernel.org, msy...@googlegroups.com, pro-...@optusnet.com.au
Am 13.11.2012 17:46, schrieb Junio C Hamano:
> karste...@dcon.de writes:
>
> If anything, "fix your mailer" probably is the policy you are
> looking for, I think.

Well then...I've cloned myself @gmail, I hope this is better.

Just some provoking thoughts...(if I may):

RFC 5322 recommends wrapping lines at 78 characters, and mail relays and gateways are allowed to change message content according to the capabilities of the receiver (RFC 5598). In essence, plaintext mail is completely unsuitable for preformatted text such as source code.

On the other hand, git tries to solve the very problem of distributed source code management, and consistency through strong SHA-1 checksums is at the top of the feature list.

It is somehow hard to believe that contributing to git itself should only be possible using the most unreliable of protocols. Don't you trust your own software?


-- >8 --
Subject: [PATCH] update-index/diff-index: use core.preloadindex to improve performance

'update-index --refresh' and 'diff-index' (without --cached) don't honor
the core.preloadindex setting yet. Porcelain commands using these (such as
git [svn] rebase) suffer from this, especially on Windows.

Use read_cache_preload to improve performance.

Additionally, in builtin/diff.c, don't preload index status if we don't
access the working copy (--cached).

Results with msysgit on WebKit repo (2GB in 200k files):

                | update-index | diff-index | rebase
----------------+--------------+------------+---------
msysgit-v1.8.0  | 9.157s       | 10.536s    | 42.791s
+ preloadindex  | 9.157s       | 10.536s    | 28.725s
+ this patch    | 2.329s       | 2.752s     | 15.152s
+ fscache [1]   | 0.731s       | 1.171s     | 8.877s

[1] https://github.com/kblees/git/tree/kb/fscache-v3

Thanks-to: Albert Krawczyk <pro-...@optusnet.com.au>
Signed-off-by: Karsten Blees <bl...@dcon.de>
---

Can also be pulled from: https://github.com/kblees/git/tree/kb/update-diff-index-preload-upstream

More performance figures (for msysgit) can be found in this discussion: https://github.com/pro-logic/git/commit/32c03dd8


 builtin/diff-index.c   |  8 ++++++--
 builtin/diff.c         | 12 ++++++++----
 builtin/update-index.c |  1 +
 3 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/builtin/diff-index.c b/builtin/diff-index.c
index 2eb32bd..1c737f7 100644
--- a/builtin/diff-index.c
+++ b/builtin/diff-index.c
@@ -41,9 +41,13 @@ int cmd_diff_index(int argc, const char **argv, const char *prefix)
 	if (rev.pending.nr != 1 ||
 	    rev.max_count != -1 || rev.min_age != -1 || rev.max_age != -1)
 		usage(diff_cache_usage);
-	if (!cached)
+	if (!cached) {
 		setup_work_tree();
-	if (read_cache() < 0) {
+		if (read_cache_preload(rev.diffopt.pathspec.raw) < 0) {
+			perror("read_cache_preload");
+			return -1;
+		}
+	} else if (read_cache() < 0) {
 		perror("read_cache");
 		return -1;
 	}
diff --git a/builtin/diff.c b/builtin/diff.c
index 9650be2..198b921 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -130,8 +130,6 @@ static int builtin_diff_index(struct rev_info *revs,
 			usage(builtin_diff_usage);
 		argv++; argc--;
 	}
-	if (!cached)
-		setup_work_tree();
 	/*
 	 * Make sure there is one revision (i.e. pending object),
 	 * and there is no revision filtering parameters.
@@ -140,8 +138,14 @@ static int builtin_diff_index(struct rev_info *revs,
 	    revs->max_count != -1 || revs->min_age != -1 ||
 	    revs->max_age != -1)
 		usage(builtin_diff_usage);
-	if (read_cache_preload(revs->diffopt.pathspec.raw) < 0) {
-		perror("read_cache_preload");
+	if (!cached) {
+		setup_work_tree();
+		if (read_cache_preload(revs->diffopt.pathspec.raw) < 0) {
+			perror("read_cache_preload");
+			return -1;
+		}
+	} else if (read_cache() < 0) {
+		perror("read_cache");
 		return -1;
 	}
 	return run_diff_index(revs, cached);
diff --git a/builtin/update-index.c b/builtin/update-index.c
index 74986bf..ada1dff 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -593,6 +593,7 @@ struct refresh_params {
 static int refresh(struct refresh_params *o, unsigned int flag)
 {
 	setup_work_tree();
+	read_cache_preload(NULL);
 	*o->has_errors |= refresh_cache(o->flags | flag);
 	return 0;
 }
--
1.8.0.msysgit.0.3.g7d9d98c