stable-2.14 to stable-2.16


Nguyen Tuan Khang Phan

Feb 4, 2022, 11:07:23 AM
to Repo and Gerrit Discussion
Hi,

We just did a test upgrade from 2.14 to 2.16 on our way to 3.4. However, during the 2.16 upgrade we hit a severe slowdown at the step "Migrating data to schema 154 ...", which took 39499.524 s, while the other steps usually take less than a second. We hadn't even started indexing yet.

Migrating data to schema 154 ...    =====> 18:09

Collecting accounts:    93681       =====> 19:02

Counting objects:       3867448

Finding sources:        100% (3867448/3867448)

Getting sizes:          100% (1873509/1873509)

Compressing objects:    100% (2038704/2038704)

Writing objects:        100% (3867448/3867448)

Prune loose objects also found in pack files: 100% (259/259)

Prune loose, unreferenced objects: 100% (259/259)  =====> 23:55

> Done (39499.524 s)

The All-Users repo is about 8 GB in size.

Matthias Sohn

Feb 4, 2022, 11:57:55 AM
to Nguyen Tuan Khang Phan, Repo and Gerrit Discussion
Did you use the latest release 2.16.28 for the upgrade?

-Matthias

Nguyen Tuan Khang Phan

Feb 4, 2022, 12:54:31 PM
to Repo and Gerrit Discussion
We were using the tip of the current stable-2.16 branch. Our repos are hosted on NFS; GC on All-Users usually takes around 4 hours.

Nasser Grainawi

Feb 4, 2022, 5:32:27 PM
to Nguyen Tuan Khang Phan, Repo and Gerrit Discussion

On Feb 4, 2022, at 10:54 AM, Nguyen Tuan Khang Phan <phan....@gmail.com> wrote:


Please try not to top-post, as it makes it harder to keep the conversation together.


On Friday, February 4, 2022 at 11:57:55 AM UTC-5 Matthias Sohn wrote:
On Fri, Feb 4, 2022 at 5:07 PM Nguyen Tuan Khang Phan <phan....@gmail.com> wrote:
Hi,

We just did a test upgrade from 2.14 to 2.16 on our way to 3.4. However, during the 2.16 upgrade we hit a severe slowdown at the step "Migrating data to schema 154 ...", which took 39499.524 s, while the other steps usually take less than a second. We hadn't even started indexing yet.

Migrating data to schema 154 ...    =====> 18:09

Collecting accounts:    93681       =====> 19:02

Counting objects:       3867448

Finding sources:        100% (3867448/3867448)

Getting sizes:          100% (1873509/1873509)

Compressing objects:    100% (2038704/2038704)

Writing objects:        100% (3867448/3867448)

Prune loose objects also found in pack files: 100% (259/259)

Prune loose, unreferenced objects: 100% (259/259)  =====> 23:55

> Done (39499.524 s)

The All-Users repo is about 8 GB in size.

Did you use the latest release 2.16.28 for the upgrade?

We were using the tip of the current stable-2.16 branch. Our repos are hosted on NFS; GC on All-Users usually takes around 4 hours.


So basically, v2.16.28-6-gdc7bc6f799? That has many optimizations for All-Users and GC, so your results are quite surprising. Are you adding any patches or making any other changes on that branch?

Is 8GB the size of All-Users before or after the test upgrade?

Is that NFS backed with spinning disks or SSDs?

Do you usually run GC for All-Users in Gerrit, with JGit command line, or git command line? If command line, can you share the command(s) you use?

Can you share some git stats from that All-Users repo before and after your test upgrade? If you’d like, you can run this tool to capture and print them.

Thanks,
Nasser



-Matthias



Martin Fick

Feb 4, 2022, 5:49:43 PM
to Nguyen Tuan Khang Phan, Repo and Gerrit Discussion
On 2022-02-04 09:07, Nguyen Tuan Khang Phan wrote:
> We just did a test upgrade from 2.14 to 2.16 on our way to 3.4.
> However, during the 2.16 upgrade we hit a severe slowdown at the
> step "Migrating data to schema 154 ...", which took 39499.524 s,
> while the other steps usually take less than a second. We hadn't
> even started indexing yet.
> ...
> The All-Users repo is about 8 GB in size.

How many users do you have? How long are their histories
on those branches? I believe the histories will be rewritten
all the way back to their initial commits, and the longer you
have been running 2.14, the more history there likely is to
rewrite. :(

-Martin

--
The Qualcomm Innovation Center, Inc. is a member of Code
Aurora Forum, hosted by The Linux Foundation

Nguyen Tuan Khang Phan

Feb 7, 2022, 11:46:38 AM
to Repo and Gerrit Discussion
The NFS is backed by SSDs, so that might not be the problem.
I used a clean v2.16.28-6-gdc7bc6f799, without any extra patches.

We use gc-conductor to run GC. However, before the upgrade, we ran git gc on All-Users (it takes around 20 minutes).

Some stats of All-Users before and after the upgrade:
Before
1.3G All-Users.git
After Upgrade
11G All-Users.git

git-stats output before and after the upgrade:
Before
git-stats.sh repos/All-Users.git
repos/All-Users.git|loose_refs: 5/
repos/All-Users.git|loose_ref_dirs: 240
repos/All-Users.git|all_refs: 169380
warning: garbage found: ./objects/pack/gc_9222827745659794382.pack_tmp
warning: garbage found: ./objects/pack/gc_9222827745659794382.idx_tmp
warning: garbage found: ./objects/pack/gc_3672157320954054823.pack_tmp
warning: garbage found: ./objects/pack/gc_3672157320954054823.idx_tmp
repos/All-Users.git|count: 9
repos/All-Users.git|size: 36
repos/All-Users.git|in-pack: 1510349
repos/All-Users.git|packs: 1
repos/All-Users.git|size-pack: 396272
repos/All-Users.git|prune-packable: 0
repos/All-Users.git|garbage: 4
repos/All-Users.git|size-garbage: 792544

After upgrade
git-stats.sh repos/All-Users.git
repos/All-Users.git|loose_refs: 93771/
repos/All-Users.git|loose_ref_dirs: 363
repos/All-Users.git|all_refs: 233218
warning: garbage found: ./objects/pack/preserved
repos/All-Users.git|count: 2510466
repos/All-Users.git|size: 9448008
repos/All-Users.git|in-pack: 2060672
repos/All-Users.git|packs: 1
repos/All-Users.git|size-pack: 505063
repos/All-Users.git|prune-packable: 0
repos/All-Users.git|garbage: 1
repos/All-Users.git|size-garbage: 4


Matthias Sohn

Feb 7, 2022, 12:02:19 PM
to Nguyen Tuan Khang Phan, Repo and Gerrit Discussion
93k loose refs seems like a lot; try packing them.
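
For example, with stock git (the path is illustrative):

# point --git-dir at the actual All-Users repository
git --git-dir=/path/to/repos/All-Users.git pack-refs --all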
 
repos/All-Users.git|loose_ref_dirs: 363
repos/All-Users.git|all_refs: 233218
warning: garbage found: ./objects/pack/preserved
repos/All-Users.git|count: 2510466
repos/All-Users.git|size: 9448008
repos/All-Users.git|in-pack: 2060672
repos/All-Users.git|packs: 1
repos/All-Users.git|size-pack: 505063
repos/All-Users.git|prune-packable: 0
repos/All-Users.git|garbage: 1
repos/All-Users.git|size-garbage: 4



Nguyen Tuan Khang Phan

Feb 7, 2022, 2:38:52 PM
to Repo and Gerrit Discussion
We haven't started the server yet, since we plan to test the upgrade up to stable-3.4, but I will do a GC run on All-Users and post new stats later.

Nguyen Tuan Khang Phan

Feb 8, 2022, 10:20:53 AM
to Repo and Gerrit Discussion
> 93k loose refs seems like a lot; try packing them.
I did the repack. Current output:
Size: 4.3 GB

git-stats.sh
loose_refs: 0/
loose_ref_dirs: 263

all_refs: 233218
warning: garbage found: ./objects/pack/preserved
count: 698947
size: 2795764
in-pack: 3872191
packs: 1
size-pack: 845303
prune-packable: 0
garbage: 1
size-garbage: 4

As for the number of accounts, we have around 90k users.

Kaushik Lingarkar

Feb 8, 2022, 7:52:13 PM
to Repo and Gerrit Discussion

So, if the latest war was indeed used, the GC output seen here could be from a GC triggered in the background in one of the previous schemas. Do you see GC logs before schema 154 starts as well? It might help if you can paste the full log from schema 146 to schema 154.

Also, please paste the init command that was used and your All-Users git config.

Nguyen Tuan Khang Phan

Feb 9, 2022, 12:02:13 PM
to Repo and Gerrit Discussion
Sure, I will do another attempt soon. This time I will record the information you need.

Nguyen Tuan Khang Phan

Feb 14, 2022, 12:26:42 PM
to Repo and Gerrit Discussion
We did another attempt this weekend; it took around 45 hours to reach stable-2.16. The upgrade log is below.
There are also some timestamps in the log showing that we ran into the second day.

All-Users config:

[core]
repositoryformatversion = 0
filemode = true
bare = true
trustfolderstat = false
trustfilestat = true
logAllRefUpdates = true
[remote "origin"]
url = .../All-Users.git
fetch = +refs/*:refs/*
mirror = true
[gc]
prunePackExpire = 10.minutes.ago
pruneExpire = 1.week.ago
autoDetach = false
[receive]
autogc = false
[repack]
packKeptObjects = true


Init command:
gerrit-war-2.16.28-6-gdc7bc6f799.war init --no-auto-start --install-all-plugins -d /opt/gerrit/review_site

Upgrade log:

Upgrade /opt/gerrit/review_site/bin/gerrit.war [Y/n]?
Copying gerrit-war-2.16.28-6-gdc7bc6f799.war to /opt/gerrit/review_site/bin/gerrit.war

Upgrading schema to 143 ...
Upgrading schema to 144 ...
Upgrading schema to 145 ...
Upgrading schema to 146 ...
Upgrading schema to 147 ...
Upgrading schema to 148 ...
Upgrading schema to 149 ...
Upgrading schema to 150 ...
Upgrading schema to 151 ...
Upgrading schema to 152 ...
Upgrading schema to 153 ...
Upgrading schema to 154 ...
Upgrading schema to 155 ...
Upgrading schema to 156 ...
Upgrading schema to 157 ...
Upgrading schema to 158 ...
Upgrading schema to 159 ...
Upgrading schema to 160 ...
Upgrading schema to 161 ...
Upgrading schema to 162 ...
Upgrading schema to 163 ...
Upgrading schema to 164 ...
Upgrading schema to 165 ...
Upgrading schema to 166 ...
Upgrading schema to 167 ...
Upgrading schema to 168 ...
Upgrading schema to 169 ...
Upgrading schema to 170 ...
Migrating data to schema 143 ...
        > Done (0.017 s)
Migrating data to schema 144 ...
        > Done (82.706 s)
Migrating data to schema 145 ...
        > Done (0.301 s)
Migrating data to schema 146 ...
Migrating accounts
... (83.075 s) scan accounts
... using 56 threads ...
Thu Feb 10 21:59:33 CET 2022
... (0.022 s) gc --prune=now
Pack refs:                0% (     1/187519)
...
Pack refs:              100% (187519/187519)
... (181.636 s) migrated 1% (500/94034) accounts
Fri Feb 11 00:03:06 CET 2022
... (7652.758 s) migrated 61% (57000/94034) accounts
... (16111.009 s) Migrated all 94034 accounts to schema 146
        > Done (16027.958 s) (around 4.5 hours)
Migrating data to schema 147 ...
        > Done (15.275 s)
Migrating data to schema 148 ...
        > Done (12.057 s)
Migrating data to schema 149 ...
        > Done (0.015 s)
Migrating data to schema 150 ...
        > Done (0.013 s)
Migrating data to schema 151 ...
        > Done (2444.773 s)
Migrating data to schema 152 ...
        > Done (0.014 s)
Migrating data to schema 153 ...
Fri Feb 11 03:09:06 CET 2022
        > Done (1412.598 s)

Migrating data to schema 154 ...
Collecting accounts:    1
...
Collecting accounts:    94034
Counting objects:       1
...
Counting objects:       1930549
Finding sources:          0% (  11833/1930549)
...
Finding sources:        100% (1930549/1930549)
Getting sizes:            1% (  7098/709713)
...
Getting sizes:          100% (709713/709713)
Writing objects:          1% (  19306/1930549)
...
Writing objects:        100% (1930549/1930549)
...
... (0.000 s) gc --prune=now
Pack refs:                0% (     1/202575)
...
Pack refs:              100% (202575/202575)
Fri Feb 11 04:06:06 CET 2022
Counting objects:       1
...
Counting objects:       2061440
...

Prune loose, unreferenced objects: 100% (259/259)
Fri Feb 11 14:03:07 CET 2022
        > Done (37946.178 s) (around 10 hours)
Migrating data to schema 155 ...
        > Done (0.770 s)
Migrating data to schema 156 ...
        > Done (0.014 s)
Migrating data to schema 157 ...
        > Done (0.014 s)
Migrating data to schema 158 ...
        > Done (0.012 s)
Migrating data to schema 159 ...
Migrate draft changes to private changes (default is work-in-progress) [y/N]? Replace draft changes with work_in_progress changes ...
Fri Feb 11 14:06:07 CET 2022
done
        > Done (174.241 s)
Migrating data to schema 160 ...
        > Done (12694.689 s) (3.5 hours)

Migrating data to schema 161 ...
        > Done (0.967 s)
Migrating data to schema 162 ...
        > Done (0.191 s)
Migrating data to schema 163 ...
        > Done (0.986 s)
Migrating data to schema 164 ...
        > Done (0.508 s)
Migrating data to schema 165 ...
        > Done (0.179 s)
Migrating data to schema 166 ...
        > Done (0.795 s)
Migrating data to schema 167 ...
        > Done (41804.221 s) (around 11 hours)

Migrating data to schema 168 ...
        > Done (0.015 s)
Migrating data to schema 169 ...
        > Done (0.012 s)
Migrating data to schema 170 ...
        > Done (0.015 s)
Execute the following SQL to drop unused objects:
...
Initialized /opt/gerrit/review_site
Init complete, reindexing projects with: reindex --site-path /opt/gerrit/review_site --threads 1 --index projects
Reindexed 66076 documents in projects index in 392.4s (168.4/s)

Index projects in version 4 is NOT ready
Sat Feb 12 05:33:08 CET 2022

Kaushik Lingarkar

Feb 15, 2022, 2:30:51 PM
to Nguyen Tuan Khang Phan, Repo and Gerrit Discussion

On Feb 14, 2022, at 9:26 AM, Nguyen Tuan Khang Phan <phan....@gmail.com> wrote:

We did another attempt this weekend; it took around 45 hours to reach stable-2.16. The upgrade log is below.
There are also some timestamps in the log showing that we ran into the second day.

All-Users config:

[core]

repositoryformatversion = 0
filemode = true
bare = true
trustfolderstat = false

We have seen performance degradation with trustfolderstat set to false. Can you try a run with it set to true during the upgrade process? You should probably do it for *all* your repositories, as it will also impact the NoteDb migration performance.
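
As a rough sketch, assuming all repositories live under one base path (path illustrative):

find /path/to/repos -type d -name '*.git' -prune | while read -r repo; do
  # key is read case-insensitively; JGit documents it as trustFolderStat
  git --git-dir="$repo" config core.trustfolderstat true
done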

trustfilestat = true
logAllRefUpdates = true
[remote "origin"]
url = .../All-Users.git
fetch = +refs/*:refs/*
mirror = true
[gc]
prunePackExpire = 10.minutes.ago
pruneExpire = 1.week.ago
autoDetach = false

Our repositories are configured with gc.auto=0 and gc.autoDetach=true. You can try these settings as well.
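
For example, per repository (path illustrative):

git --git-dir=/path/to/repo.git config gc.auto 0
git --git-dir=/path/to/repo.git config gc.autoDetach true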

Nasser Grainawi

Feb 17, 2022, 12:05:34 AM
to Kaushik Lingarkar, Nguyen Tuan Khang Phan, Repo and Gerrit Discussion
On Feb 15, 2022, at 12:30 PM, Kaushik Lingarkar <kaus...@codeaurora.org> wrote:



On Feb 14, 2022, at 9:26 AM, Nguyen Tuan Khang Phan <phan....@gmail.com> wrote:

We did another attempt this weekend; it took around 45 hours to reach stable-2.16. The upgrade log is below.
There are also some timestamps in the log showing that we ran into the second day.

All-Users config:

[core]

repositoryformatversion = 0
filemode = true
bare = true
trustfolderstat = false

We have seen performance degradation with trustfolderstat set to false. Can you try a run with it set to true during the upgrade process? You should probably do it for *all* your repositories, as it will also impact the NoteDb migration performance.

To really emphasize this, there is a HUGE performance difference with trustFolderStat = true. There’s an incredible amount of duplicate work that happens with ‘= false’ that you can safely avoid because you have a single client/process running the init upgrade steps.


Nguyen Tuan Khang Phan

Feb 25, 2022, 1:40:06 PM
to Repo and Gerrit Discussion
With all of the improvements, we brought the time down to 7 hours.
I will share logs once I can get to them.

Nguyen Tuan Khang Phan

Mar 2, 2022, 2:02:20 PM
to Repo and Gerrit Discussion
The upgrade from stable-2.14 to stable-2.16 took approximately 5 hours, plus 2 hours of GC on All-Users prior to the upgrade.

The NoteDb migration took 28 hours. The first attempt ended in failure due to heap memory exhaustion.
The command used:
java -jar gerrit.war migrate-to-note-db --shuffle-project-slices --reindex=false
The second attempt, with an explicit heap limit:
java -Xmx256g -jar gerrit.war migrate-to-note-db --shuffle-project-slices --reindex=false

The next step for us is to upgrade to 3.1 and do a reindex on groups, accounts and changes:
java -Xmx256g -jar gerrit.war reindex --threads 56 --index <what_to_index>

Does this look correct so far? Is there a way to speed up the migration?

Nasser Grainawi

Mar 2, 2022, 3:53:30 PM
to Nguyen Tuan Khang Phan, Repo and Gerrit Discussion
On Mar 2, 2022, at 12:02 PM, Nguyen Tuan Khang Phan <phan....@gmail.com> wrote:

The upgrade from stable-2.14 to stable-2.16 took approximately 5 hours, plus 2 hours of GC on All-Users prior to the upgrade.

That’s a great improvement! Did you apply the trustFolderStat = true update to all repositories or only All-Users? It will greatly impact performance for all repositories, especially when you run migrate-to-note-db.


The NoteDb migration took 28 hours. The first attempt ended in failure due to heap memory exhaustion.
The command used:
java -jar gerrit.war migrate-to-note-db --shuffle-project-slices --reindex=false
The second attempt, with an explicit heap limit:
java -Xmx256g -jar gerrit.war migrate-to-note-db --shuffle-project-slices --reindex=false

In addition to -Xmx, you probably want to include any other options you have in gerrit.config container.javaOptions.
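
For example, reusing whatever is already configured for the site (paths illustrative; a sketch, not robust against quoted options):

JAVA_OPTS=$(git config --file /opt/gerrit/review_site/etc/gerrit.config --get-all container.javaOptions | tr '\n' ' ')
java $JAVA_OPTS -Xmx256g -jar gerrit.war migrate-to-note-db --reindex=false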

If you’re using 56 threads for reindex, maybe you want to use --threads 56 for migrate-to-note-db too?

We found that using --shuffle-project-slices degraded performance for us. Maybe try without that?

How many changes and projects total do you have for this instance? If you aren’t sure of the gc status of all the repos, it might be worth running that git-stats.sh script on all the repos to see if you need to GC everything before starting the upgrade (which can happen while the server is still online).
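
A sketch for sweeping all repos (base path illustrative):

find /path/to/repos -type d -name '*.git' -prune | while read -r repo; do
  git-stats.sh "$repo"
done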

Nasser


The next step for us is to upgrade to 3.1 and do a reindex on groups, accounts and changes:
java -Xmx256g -jar gerrit.war reindex --threads 56 --index <what_to_index>

Does this look correct so far? Is there a way to speed up the migration?


Nguyen Tuan Khang Phan

Mar 2, 2022, 4:29:19 PM
to Repo and Gerrit Discussion
> That’s a great improvement! Did you apply the trustFolderStat = true update to all repositories or only All-Users? It will greatly impact performance for all repositories, especially when you run migrate-to-note-db.
We applied trustFolderStat = true to all repositories.

> If you’re using 56 threads for reindex, maybe you want to use --threads 56 for migrate-to-note-db too?

I see that ISSUE_8022_THREAD_LIMIT was used. Are you suggesting we use more than 4? What number did you use?

> We found that using --shuffle-project-slices degraded performance for us. Maybe try without that?

We will try that after we arrive at 3.1.

> How many changes and projects total do you have for this instance? If you aren’t sure of the gc status of all the repos, it might be worth running that git-stats.sh script on all the repos to see if you need to GC everything before starting the upgrade (which can happen while the server is still online).

This instance is a copy of production. So, a lot :)


Nasser Grainawi

Mar 2, 2022, 5:54:30 PM
to Nguyen Tuan Khang Phan, Repo and Gerrit Discussion
On Mar 2, 2022, at 2:29 PM, Nguyen Tuan Khang Phan <phan....@gmail.com> wrote:

That’s a great improvement! Did you apply the trustFolderStat = true update to all repositories or only All-Users? It will greatly impact performance for all repositories, especially when you run migrate-to-note-db.
We applied trustFolderStat = true to all repositories.

> If you’re using 56 threads for reindex, maybe you want to use --threads 56 for migrate-to-note-db too?

I see that ISSUE_8022_THREAD_LIMIT was used. Are you suggesting we use more than 4? What number did you use?

We used 24 threads.


> We found that using --shuffle-project-slices degraded performance for us. Maybe try without that?

We will try that after we arrive at 3.1.

This option is only relevant to migrate-to-note-db when you’re on 2.16. Maybe I’m misunderstanding your plan with 3.1, but I don’t see how that option is relevant.


> How many changes and projects total do you have for this instance? If you aren’t sure of the gc status of all the repos, it might be worth running that git-stats.sh script on all the repos to see if you need to GC everything before starting the upgrade (which can happen while the server is still online).

This instance is a copy of production. So, a lot :)

Sure, but how much is a lot? :-) Some instances have 100k changes and think that’s a lot, others have 4 million.

For our instance with almost 4 million changes and 20k projects, we can complete migrate-to-note-db in just under 2 hours. Since yours took more than 10x that time, I would assume you have 1) *much more* data, 2) a much slower filesystem for your git repos, 3) a much slower database, 4) repositories badly in need of GC, or 5) some combination of those other 4.





Nguyen Tuan Khang Phan

Mar 3, 2022, 9:37:19 AM
to Repo and Gerrit Discussion
> This option is only relevant to migrate-to-note-db when you’re on 2.16. Maybe I’m misunderstanding your plan with 3.1, but I don’t see how that option is relevant.

I meant we will re-upgrade to see if we can improve timings after we reach 3.1.

> Sure, but how much is a lot? :-) Some instances have 100k changes and think that’s a lot, others have 4 million.

We have around 60K projects and 1 million changes.

> We used 24 threads.

Thanks, we will also try this number or something closer to 56.

Nguyen Tuan Khang Phan

Mar 3, 2022, 11:33:26 AM
to Repo and Gerrit Discussion
> We have around 60K projects and 1 million changes.

I lost a 0; it's 10 million.

Nguyen Tuan Khang Phan

Mar 3, 2022, 1:10:16 PM
to Repo and Gerrit Discussion
During the last community meeting, the topic of using a populated disk cache to speed up reindexing was brought up. How much disk cache is needed for a 1 TB Gerrit instance? Is it 2 TB?
We want a rough estimate so we can order hardware for the upcoming upgrade.

Nasser Grainawi

Mar 3, 2022, 1:36:03 PM
to Nguyen Tuan Khang Phan, Repo and Gerrit Discussion
On Mar 3, 2022, at 9:33 AM, Nguyen Tuan Khang Phan <phan....@gmail.com> wrote:

> We have around 60K projects and 1 million changes.

I lost a 0; it's 10 million.

Ok. So hopefully using more threads (and maybe not using --shuffle-project-slices) you can get under 6 hours for migrate-to-note-db. If you’re still far from that, I would suspect a slow database or maybe gc-conductor is not doing enough for you.

What are your gc-conductor packed/loose config values set to? I also noticed gc-conductor doesn’t have an evaluation value for loose refs nor a way of only packing refs as it always does a full aggressive gc. If you run git-stats on your repos and see many loose refs for repos with few loose/packed objects, adding that could be beneficial as it would be much less expensive.

On a related note, we use git-exproll.sh to avoid repacking large packs too often. git.git repack now has a --geometric pack option that does a similar thing, but I don’t believe JGit has that available yet.
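
With a recent enough git, a sketch of the geometric approach would be (path illustrative):

# the --geometric option needs a recent git; it is not in JGit
git -C /path/to/repo.git repack -d --geometric=2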

Another overall strategy I failed to mention earlier but that we found incredibly useful (and shared in our summit talk) was to narrow down performance problems using subsets of production data so that we could have faster iterations. For example, you could pick a set of repos that are about 5% of your total changes, then in your test area, remove data for anything else (including from the database tables). It makes it much less costly to do testing where you only modify one variable at a time and then you get much higher confidence in which modifications are important. Once you see improvement with your modification that you can project would meet your target goal for all data, increase the size of your subset. We started with a subset that only had 2 repos (plus All-Projects and All-Users) and ~150k changes, then 8 repos and ~400k changes, then 50 repos and ~850k changes. Each time we wanted to try a new idea, we evaluated what was the smallest subset where we thought we could see an improvement and used that for initial testing.



Nguyen Tuan Khang Phan

Mar 3, 2022, 1:55:30 PM
to Repo and Gerrit Discussion
>  What are your gc-conductor packed/loose config values set to? I also noticed gc-conductor doesn’t have an evaluation value for loose refs nor a way of only packing refs as it always does a full aggressive gc. If you run git-stats on your repos and see many loose refs for repos with few loose/packed objects, adding that could be beneficial as it would be much less expensive.

I believe it's set to 400. Do you suggest running gc on all of the projects prior to the migration?

> Another overall strategy I failed to mention earlier but that we found incredibly useful (and shared in our summit talk) was to narrow down performance problems using subsets of production data so that we could have faster iterations. For example, you could pick a set of repos that are about 5% of your total changes, then in your test area, remove data for anything else (including from the database tables). It makes it much less costly to do testing where you only modify one variable at a time and then you get much higher confidence in which modifications are important. Once you see improvement with your modification that you can project would meet your target goal for all data, increase the size of your subset. We started with a subset that only had 2 repos (plus All-Projects and All-Users) and ~150k changes, then 8 repos and ~400k changes, then 50 repos and ~850k changes. Each time we wanted to try a new idea, we evaluated what was the smallest subset where we thought we could see an improvement and used that for initial testing.

We are currently experimenting with deleting very old changes and abandoned changes.

Nasser Grainawi

Mar 3, 2022, 1:56:14 PM
to Nguyen Tuan Khang Phan, Repo and Gerrit Discussion
On Mar 3, 2022, at 11:10 AM, Nguyen Tuan Khang Phan <phan....@gmail.com> wrote:

During the last community meeting, the topic of using a populated disk cache to speed up reindexing was brought up. How much disk cache is needed for a 1 TB Gerrit instance? Is it 2 TB?

I don’t think it relates well to overall git repo disk space usage. Gerrit’s disk caches are there for Gerrit data. So you could have huge git repos, but if they have no changes, you’ll have tiny disk caches (slight exaggeration).

I mentioned using the cache stats output from reindex, which was added in 3.2.14. That will give you output the same as the ’show-caches’ SSH/REST commands will, with table headers like:

  Name                          |Entries              |  AvgGet |Hit Ratio|
                                |   Mem   Disk   Space|         |Mem  Disk|
--------------------------------+---------------------+---------+---------+

Your ideal goal should be to have the disk cache sizes in gerrit.config large enough that if you run reindex once, then copy your cache/ directory and re-use it when you run reindex again, your Disk Hit Ratio is 100%.
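
For example, raising the limits in gerrit.config from the site directory (cache names and sizes are illustrative; pick them from your own show-caches output):

# run from the Gerrit site directory; gerrit.config uses git-config syntax
git config --file etc/gerrit.config cache.diff.diskLimit 10g
git config --file etc/gerrit.config cache.diff_summary.diskLimit 10g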

Overall for the best offline reindex performance, you want to be using at least Gerrit v3.2.14, and ideally v3.3.9, because of all the improvements made in these changes. You definitely only want to test reindex with the final version you plan to upgrade to in production. For example, I mentioned in the meeting today that we’ve switched all our testing to using stable-3.5, so we only run reindex once in our end-to-end upgrade testing path and we use stable-3.5 tip. Our steps are roughly 1) schema upgrades (init) to 2.16 with --no-reindex --no-auto-start, 2) migrate-to-note-db --reindex=false, 3) init to 3.5 with --no-auto-start, 4) reindex using 3.5.
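
Roughly, as commands (war file names and the site path are illustrative):

java -jar gerrit-2.16.war init -d /opt/gerrit/review_site --no-reindex --no-auto-start
java -Xmx256g -jar gerrit-2.16.war migrate-to-note-db -d /opt/gerrit/review_site --reindex=false
java -jar gerrit-3.5.war init -d /opt/gerrit/review_site --no-auto-start
java -jar gerrit-3.5.war reindex -d /opt/gerrit/review_site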

We want a rough estimate so we can order hardware for the upcoming upgrade.


Nasser Grainawi

Mar 3, 2022, 2:03:03 PM
to Nguyen Tuan Khang Phan, Repo and Gerrit Discussion

On Mar 3, 2022, at 11:55 AM, Nguyen Tuan Khang Phan <phan....@gmail.com> wrote:

>  What are your gc-conductor packed/loose config values set to? I also noticed gc-conductor doesn’t have an evaluation value for loose refs nor a way of only packing refs as it always does a full aggressive gc. If you run git-stats on your repos and see many loose refs for repos with few loose/packed objects, adding that could be beneficial as it would be much less expensive.

I believe it's set to 400. Do you suggest running gc on all of the projects prior to the migration?

I think the defaults of 40 packs and 400 loose objects are fine.

If you mean running gc as part of migration downtime, no, I think that would add a lot to your overall downtime. I would check that your gc-conductor queue is being emptied. If gc isn't keeping up, it doesn't matter what your config is set to.


> Another overall strategy I failed to mention earlier but that we found incredibly useful (and shared in our summit talk) was to narrow down performance problems using subsets of production data so that we could have faster iterations. For example, you could pick a set of repos that are about 5% of your total changes, then in your test area, remove data for anything else (including from the database tables). It makes it much less costly to do testing where you only modify one variable at a time and then you get much higher confidence in which modifications are important. Once you see improvement with your modification that you can project would meet your target goal for all data, increase the size of your subset. We started with a subset that only had 2 repos (plus All-Projects and All-Users) and ~150k changes, then 8 repos and ~400k changes, then 50 repos and ~850k changes. Each time we wanted to try a new idea, we evaluated what was the smallest subset where we thought we could see an improvement and used that for initial testing.

We are currently experimenting with deleting very old changes and abandoned changes.

Note, I was only talking about doing that for test purposes. The only changes we've removed from production are ones that the migration testing showed to be broken.



Przemyslaw Waliszewski

Mar 3, 2022, 2:43:18 PM
to Repo and Gerrit Discussion
Hi, we are using these GC settings:
  packed = 40
  loose = 400

We are running the upgrade on a staging env with a snapshot from last week. There are no new changes before the upgrade, so the gc queue is empty.
You wrote:
"... best offline reindex performance, you want to be using at least Gerrit v3.2.14, and ideally v3.3.9, because of all the improvements made in these changes." Can we safely cherry-pick those changes onto 3.1 and use them:
a) only during the first reindex after the upgrade?
b) during the upgrade and in production?
c) or is it too risky because of too many changes between 3.1 and 3.2.14?

You also mentioned:
"I also noticed gc-conductor doesn’t have an evaluation value for loose refs nor a way of only packing refs as it always does a full aggressive gc. If you run git-stats on your repos and see many loose refs for repos with few loose/packed objects, adding that could be beneficial as it would be much less expensive"

example for All-Users
../repos/All-Users.git|loose_refs: 76/
../repos/All-Users.git|loose_ref_dirs: 129
../repos/All-Users.git|all_refs: 203646
../repos/All-Users.git|count: 1374
../repos/All-Users.git|size: 5524
../repos/All-Users.git|in-pack: 1828848
../repos/All-Users.git|packs: 1
../repos/All-Users.git|size-pack: 476257
../repos/All-Users.git|prune-packable: 0
../repos/All-Users.git|garbage: 3
../repos/All-Users.git|size-garbage: 476600

random big repo
loose_refs: 22/
loose_ref_dirs: 30
all_refs: 1469996
count: 1558
size: 7868
in-pack: 24588317
packs: 60
size-pack: 7418963
prune-packable: 0
garbage: 3
size-garbage: 814785

Regarding running gc in aggressive mode: we are introducing a change to gc-conductor that will run gc in non-aggressive mode. Does that mean we should also modify the evaluation part? More details here: https://gerrit-review.googlesource.com/c/plugins/gc-conductor/+/329363

Nasser Grainawi

Mar 4, 2022, 12:24:30 AM
to Przemyslaw Waliszewski, Repo and Gerrit Discussion

On Mar 3, 2022, at 12:38 PM, Przemyslaw Waliszewski <pwalis...@gmail.com> wrote:

Hi, we are using these GC settings:
  packed = 40
  loose = 400

We are running the upgrade on a staging env with a snapshot from last week. There are no new changes before the upgrade, so the gc queue is empty.
You wrote:
"... best offline reindex performance, you want to be using at least Gerrit v3.2.14, and ideally v3.3.9, because of all the improvements made in these changes." Can we safely cherry-pick those changes onto 3.1 and use them:
a) only during the first reindex after the upgrade?
b) during the upgrade and in production?
c) or is it too risky because of too many changes between 3.1 and 3.2.14?

I don’t know how well those changes will apply to 3.1. Is there something special about 3.1 you need or something undesirable in 3.2+? If not, I would at least upgrade to 3.4.3, and a 3.5.1 (doesn’t exist yet) would be even better.
Those repos seem to have very few loose refs, so I wouldn’t be concerned about it for them.

Regarding running gc in aggressive mode we are introducing change to gc-conductor that will run gc in non-agressive mode. Does it mean that we also should modify evaluation part ? more details here->  https://gerrit-review.googlesource.com/c/plugins/gc-conductor/+/329363 

I’d have to look into what JGit does differently for (non-)aggressive to say for sure. For any gc that doesn’t do ref-packing, looking at # of loose objects and # of packs is appropriate.

If you were seeing repos that weren’t triggering gc because they had few loose objects and few packs, but were accumulating many loose refs, then adding a ‘pack refs only’ mode to gc-conductor could be valuable.


Nguyen Tuan Khang Phan

Mar 8, 2022, 12:54:03 PM
to Repo and Gerrit Discussion
We finally were able to reach 3.1; it took around 80+ hours. We plan on upgrading the test instance to 3.4 to get the index metrics.

After reaching 3.1 we ran some load tests and had issues with slow performance. Is there anything we should keep in mind when moving to the new UI? Some config to tune, maybe? As far as I understand, the new UI is REST API based.

Martin Fick

Mar 21, 2022, 6:36:27 PM
to Nguyen Tuan Khang Phan, Repo and Gerrit Discussion
On 2022-03-08 10:54, Nguyen Tuan Khang Phan wrote:
> We finally were able to reach 3.1; it took around 80+ hours. We
> plan on upgrading the test instance to 3.4 to get the index
> metrics.
>
> After reaching 3.1 we ran some load tests and had issues with
> slow performance.

What specifically is problematic?

-Martin

--
The Qualcomm Innovation Center, Inc. is a member of Code
Aurora Forum, hosted by The Linux Foundation

Nguyen Tuan Khang Phan

Mar 28, 2022, 12:08:28 PM
to Repo and Gerrit Discussion
> What specifically is problematic?

We reached 3.4 on a test env and ran some load tests using both HTTP and SSH commands/queries. The result: HTTP has overall faster response times, while SSH response times are slower compared to 2.14.
We also ran into some "Group key null" issues, and we don't know where they originate from.

Martin Fick

Mar 28, 2022, 12:25:57 PM
to Nguyen Tuan Khang Phan, Repo and Gerrit Discussion
On 2022-03-28 10:08, Nguyen Tuan Khang Phan wrote:
>> What specifically is problematic?
>
> We reached 3.4 on a test env and ran some load tests using both HTTP
> and SSH commands/queries. The result: HTTP has overall faster
> response times, while SSH response times are slower compared to 2.14.

Specific commands?

Nguyen Tuan Khang Phan

May 5, 2022, 4:19:11 PM
to Repo and Gerrit Discussion
A small update on our side regarding the upgrade. We were able to init to 2.16 in 2 hours this time; we had forgotten the --no-reindex option last time. However, the NoteDb migration took a lot of time: 22 hours. We had more threads available as well (100 threads), so the improvement was around 6 hours from 28. The reindex on 3.4 with the disk cache took 5h. Is it possible to speed up the NoteDb migration even more?

Luca Milanesio

May 5, 2022, 4:37:02 PM
to Repo and Gerrit Discussion, Luca Milanesio, Nguyen Tuan Khang Phan


> On 5 May 2022, at 21:19, Nguyen Tuan Khang Phan <phan....@gmail.com> wrote:
>
> A small update on our side regarding the upgrade. We were able to init to 2.16 in 2 hours this time; we had forgotten the --no-reindex option last time. However, the NoteDb migration took a lot of time: 22 hours. We had more threads available as well (100 threads), so the improvement was around 6 hours from 28. The reindex on 3.4 with the disk cache took 5h. Is it possible to speed up the NoteDb migration even more?

How many refs and how many repositories do you have?

a) Small number of refs per repository (<<100k) but a lot of repositories
b) Large number of refs per repository (>>100k to Millions)

Luca.

Nguyen Tuan Khang Phan

May 5, 2022, 4:48:12 PM
to Repo and Gerrit Discussion
a) Small number of refs per repository (<<100k) but a lot of repositories
b) Large number of refs per repository (>>100k to Millions)
We have around 60K projects and 10 million changes

Only a small number of repos (around 10%) have >>100k refs, so I would say we are in category a).

Luca Milanesio

May 5, 2022, 6:15:31 PM
to Repo and Gerrit Discussion, Luca Milanesio, Nguyen Tuan Khang Phan
I know you guys are running in HA: have you tried running the migration in parallel on multiple boxes by sharding the allocation of repositories?

Luca.

Martin Fick

May 6, 2022, 11:10:40 AM
to Nguyen Tuan Khang Phan, Repo and Gerrit Discussion
On Thu, May 5, 2022 at 2:19 PM, Nguyen Tuan Khang Phan <phan....@gmail.com> wrote:
A small update on our side regarding the upgrade. We were able to init to 2.16 in 2 hours this time; we had forgotten the --no-reindex option last time. However, the NoteDb migration took a lot of time: 22 hours. We had more threads available as well (100 threads), so the improvement was around 6 hours from 28. The reindex on 3.4 with the disk cache took 5h. Is it possible to speed up the NoteDb migration even more?

We have ~4M changes, approx 20K repos, and ~6 TB of repo data, and we can migrate to NoteDb in around 2 hours, so I would expect you to be able to get close to 8 hours with your setup. Since you are still at 22 hours, it might be time to focus on your migration and see where the time is being spent.

Are you bottlenecked by Lucene, by your git data, or by your DB? If I recall correctly, we are mostly bound by our DB for the NoteDb migration, i.e. most of the data is read from the DB, then written to git (and then the DB is updated), so there isn't a lot of git data reading, only git data writing. I would focus on whether you are able to read and write the data from the DB fast enough.

Can you remind me which DB you are using? We are using PostgreSQL, and we already added an optimization for it to the 2.16 codebase. If you are not using PostgreSQL, then I suspect some optimizations can also easily be made for other DBs. How well is your DB configured? Can your DB hold your entire dataset in RAM? I believe there are only 5 main tables used in the NoteDb migration; we were able to get insights into our DB limitations by running full SELECT queries of those 5 tables outside of Gerrit and timing them. Since the NoteDb migration seems to involve a transaction for each change, your DB write performance is also important. Maybe you can perform some DB profiling and report back?

-Martin

Nguyen Tuan Khang Phan

May 6, 2022, 12:38:42 PM
to Repo and Gerrit Discussion
We are using PostgreSQL as well. We have around 24 GB of RAM; however, the cache size for diffs is around 41 GB. Did you host your DB on SSD as well? Is your test instance using NFS for storage as well?

Martin Fick

May 6, 2022, 12:57:40 PM
to Nguyen Tuan Khang Phan, Repo and Gerrit Discussion


On Fri, May 6, 2022 at 10:38 AM, Nguyen Tuan Khang Phan <phan....@gmail.com> wrote:
We are using PostgreSQL as well.

How long does SELECT * from changes take?
How long does SELECT * from patch_sets take?
How long does SELECT * from change_messages take?
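
For example, timed from the shell (the database name is illustrative):

time psql reviewdb -c 'SELECT * FROM changes;' > /dev/null
time psql reviewdb -c 'SELECT * FROM patch_sets;' > /dev/null
time psql reviewdb -c 'SELECT * FROM change_messages;' > /dev/null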


We have around 24 GB of RAM;

I think your DB could benefit from more RAM. I believe we have around 40 GB of data in our DB; you probably have more, 100 GB?
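
One rough way to check whether the DB is mostly serving from RAM is the buffer cache hit ratio (database name illustrative):

psql reviewdb -c "SELECT sum(blks_hit) * 100.0 / sum(blks_hit + blks_read) AS hit_ratio FROM pg_stat_database;"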


however, the cache size for diffs is around 41 GB.

I don't think the diffs are in the DB, so I'm not sure about this one.


Did you host your DB on SSD as well?

I think so, and I would suspect that you do want this for the NoteDb transactions to be faster.


Is your test instance using NFS for storage as well?

Only for the git repos, not for the DB or caches.

-Martin


Nguyen Tuan Khang Phan

May 13, 2022, 4:13:49 PM
to Repo and Gerrit Discussion
> I think your DB could benefit from more RAM. I believe we have around 40 GB of data in our DB; you probably have more, 100 GB?

We have it at 140 GB, I think. We will proceed with provisioning more RAM.

Another issue: once we reached 3.4, we started to see a lot of null-pointer errors whenever we run ls-projects:
"null key in entry: null=Group[ / null]". Those projects also fail to load in the UI. Are you familiar with this problem?

Matthias Sohn

May 15, 2022, 4:13:44 PM
to Nguyen Tuan Khang Phan, Repo and Gerrit Discussion
Can you provide a full stack trace of these NPEs?
Did they start occurring after updating to 3.4, or already after one of the earlier steps?

-Matthias

Nguyen Tuan Khang Phan

May 16, 2022, 10:25:51 AM
to Repo and Gerrit Discussion

Can you provide a full stack trace of these NPEs?
 
[SSH gerrit ls-projects  (<user>)] ERROR com.google.gerrit.sshd.BaseCommand : Internal server error (user catbuilder account <account_number>) during gerrit ls-projects
com.google.common.util.concurrent.UncheckedExecutionException: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.NullPointerException: null key in entry: null=Group[ / null]
    at com.github.benmanes.caffeine.guava.CaffeinatedGuavaLoadingCache.get(CaffeinatedGuavaLoadingCache.java:65)
    at com.google.gerrit.server.project.ProjectCacheImpl.get(ProjectCacheImpl.java:200)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
    at com.google.common.collect.CollectSpliterators$1WithCharacteristics.lambda$tryAdvance$0(CollectSpliterators.java:62)
    at java.base/java.util.stream.Streams$RangeIntSpliterator.tryAdvance(Streams.java:82)
    at com.google.common.collect.CollectSpliterators$1WithCharacteristics.tryAdvance(CollectSpliterators.java:62)
    at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
    at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
    at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:169)
    at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
    at java.base/java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
    at com.google.gerrit.server.restapi.project.ListProjects.display(ListProjects.java:443)
    at com.google.gerrit.server.restapi.project.ListProjects.displayToStream(ListProjects.java:414)
    at com.google.gerrit.sshd.commands.ListProjectsCommand.run(ListProjectsCommand.java:45)
    at com.google.gerrit.sshd.SshCommand.lambda$start$1(SshCommand.java:63)
    at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:493)
    at com.google.gerrit.server.logging.LoggingContextAwareRunnable.run(LoggingContextAwareRunnable.java:113)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
    at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:612)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.NullPointerException: null key in entry: null=Group[ / null]
    at com.github.benmanes.caffeine.guava.CaffeinatedGuavaLoadingCache.get(CaffeinatedGuavaLoadingCache.java:65)
    at com.google.gerrit.server.cache.h2.H2CacheImpl.get(H2CacheImpl.java:130)
    at com.google.gerrit.server.project.ProjectCacheImpl$InMemoryLoader.load(ProjectCacheImpl.java:377)
    at com.google.gerrit.server.project.ProjectCacheImpl$InMemoryLoader.load(ProjectCacheImpl.java:329)
    at com.github.benmanes.caffeine.guava.CaffeinatedGuavaLoadingCache$SingleLoader.load(CaffeinatedGuavaLoadingCache.java:136)
    at com.github.benmanes.caffeine.cache.LocalLoadingCache.lambda$newMappingFunction$2(LocalLoadingCache.java:141)
    at com.github.benmanes.caffeine.cache.LocalCache.lambda$statsAware$0(LocalCache.java:139)
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2344)
    at java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1908)
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2342)
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2325)
    at com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
    at com.github.benmanes.caffeine.cache.LocalLoadingCache.get(LocalLoadingCache.java:54)
    at com.github.benmanes.caffeine.guava.CaffeinatedGuavaLoadingCache.get(CaffeinatedGuavaLoadingCache.java:59)
    ... 23 more
Caused by: java.lang.NullPointerException: null key in entry: null=Group[ / null]
    at com.google.common.collect.CollectPreconditions.checkEntryNotNull(CollectPreconditions.java:30)
    at com.google.common.collect.ImmutableMap.entryOf(ImmutableMap.java:172)
    at com.google.common.collect.ImmutableMap$Builder.put(ImmutableMap.java:282)
    at com.google.gerrit.entities.CachedProjectConfig$Builder.addGroup(CachedProjectConfig.java:148)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
    at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
    at com.google.gerrit.server.cache.serialize.entities.CachedProjectConfigSerializer.deserialize(CachedProjectConfigSerializer.java:72)
    at com.google.gerrit.server.project.ProjectCacheImpl$PersistedProjectConfigSerializer.deserialize(ProjectCacheImpl.java:444)
    at com.google.gerrit.server.project.ProjectCacheImpl$PersistedProjectConfigSerializer.deserialize(ProjectCacheImpl.java:434)
    at com.google.gerrit.server.cache.h2.H2CacheImpl$SqlStore.getIfPresent(H2CacheImpl.java:437)
    at com.google.gerrit.server.cache.h2.H2CacheImpl$Loader.load(H2CacheImpl.java:254)
    at com.google.gerrit.server.cache.h2.H2CacheImpl$Loader.load(H2CacheImpl.java:237)
    at com.github.benmanes.caffeine.guava.CaffeinatedGuavaLoadingCache$SingleLoader.load(CaffeinatedGuavaLoadingCache.java:136)
    at com.github.benmanes.caffeine.cache.LocalLoadingCache.lambda$newMappingFunction$2(LocalLoadingCache.java:141)
    at com.github.benmanes.caffeine.cache.LocalCache.lambda$statsAware$0(LocalCache.java:139)
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2344)
    at java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1908)
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2342)
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2325)
    at com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
    at com.github.benmanes.caffeine.cache.LocalLoadingCache.get(LocalLoadingCache.java:54)
    at com.github.benmanes.caffeine.guava.CaffeinatedGuavaLoadingCache.get(CaffeinatedGuavaLoadingCache.java:59)
    ... 36 more

Did they start occurring after updating to 3.4, or already after one of the earlier steps?

It happens on the first offline reindex after the upgrade. The workaround was to delete the "cache" directory and rerun reindex on accounts, groups, and then projects. But if we start Gerrit (ls-projects) or rerun the project reindex, it occurs again.
 

Przemyslaw Waliszewski

May 16, 2022, 4:24:16 PM
to Repo and Gerrit Discussion
OK, we did an analysis and the problem is in
CachedProjectConfigSerializer.java:72.
The GroupReference passed to the builder can contain null for the UUID field.
Adding a null check like this:

.map(GroupReferenceSerializer::deserialize)
// skip group references whose UUID could not be resolved
.filter(groupReference -> groupReference.getUUID() != null)
.forEach(builder::addGroup);

solves the problem.
We suspect the problem exists because old project configs contain groups that no longer exist.
Do you think we should upstream this fix?
