performance of running offline reindex with more threads


Khai Do

Dec 9, 2015, 5:27:53 PM
to Repo and Gerrit Discussion
Hello. We are using Gerrit 2.8.4 and plan to upgrade to 2.11 soon. We occasionally need to run an offline reindex for project renames, and we want that reindex to run as fast as possible to minimize downtime. I've been testing the reindex '--threads' parameter [1], starting with the default of 1 thread and then increasing to 8 threads, but I'm not seeing any difference in performance. Our Gerrit is hosted on a very beefy VM (60GB RAM, 12 VCPU) and its DB is a local MySQL. I'm wondering whether I don't have something set up correctly or whether this feature is not working. Has anybody else used this feature and seen a performance gain when providing more threads?

offline reindex command:
  java -jar gerrit-v2.11.4.war reindex --threads 8 -d review_site



Ian Kumlien

Dec 9, 2015, 7:28:22 PM
to Khai Do, Repo and Gerrit Discussion

I'd think that reindexing would be memory dependent as well. I don't know what the Java default heap sizes are, but increasing them and limiting the Lucene flush threshold seems like the correct course of action...


Bassem Rabil

Dec 11, 2015, 12:28:44 PM
to Repo and Gerrit Discussion
Do you have a different value of index.batchThreads or index.threads defined in your review_site/etc/gerrit.config?
In particular, index.threads would default to 1 if not specified. In some cases the application might override the value you pass on the command line with the settings defined in your review_site. In an earlier deployment of Gerrit 2.9 we tuned the number of threads used for reindexing and got a significant gain, but we were using a physical machine with 24 CPUs and 128 GB RAM.

Khai Do

Dec 14, 2015, 2:01:46 PM
to Repo and Gerrit Discussion


I don't have those configs set at all in my gerrit.config. I went ahead and ran a few more tests, this time setting index.batchThreads, index.threads, and --threads to 8, but I still do not see any difference in reindexing performance :(

Saša Živkov

Dec 15, 2015, 4:31:07 AM
to Khai Do, Repo and Gerrit Discussion
Reindexing (both offline and online) is not always using CPU optimally. It utilizes multiple threads but only one
thread per project. If you have a lot of projects of similar size (size = number of changes) then it is likely to utilize
all the threads you specify.
However, if you have one huge project and a couple of small ones then it will start reindexing with multiple threads
but as soon as smaller projects are done it will continue with only one thread on that huge project.

This is something we definitely have to improve.
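
To illustrate the effect described above, here is a minimal sketch in Python. It is a toy model, not Gerrit's actual scheduler: it assumes each project is handled by exactly one worker, that indexing cost is one time unit per change, and that an idle worker picks up the next project in list order. The function name and the project sizes are made up for illustration.

```python
import heapq


def simulate_reindex(change_counts, threads):
    """Return the total elapsed time units for a toy reindex run.

    Assumptions (hypothetical model, not Gerrit code): one worker per
    project, one time unit per change, idle workers take the next
    project in list order.
    """
    # Each heap entry is the time at which a worker becomes free.
    workers = [0] * threads
    heapq.heapify(workers)
    for count in change_counts:
        free_at = heapq.heappop(workers)          # earliest idle worker
        heapq.heappush(workers, free_at + count)  # it indexes this project
    return max(workers)


if __name__ == "__main__":
    balanced = [1000] * 8          # eight projects of similar size
    skewed = [7300] + [100] * 7    # one huge project, seven small ones
    for name, projects in (("balanced", balanced), ("skewed", skewed)):
        for threads in (1, 8):
            print(name, threads, simulate_reindex(projects, threads))
```

In this model the balanced case drops from 8000 to 1000 time units when going from 1 to 8 threads, while the skewed case only drops from 8000 to 7300, because the huge project still runs on a single thread.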

fungi-...@yuggoth.org

Jun 6, 2016, 1:37:17 AM
to Repo and Gerrit Discussion, zaro...@gmail.com
Sorting the project list by decreasing order of change count
would also help maximize the available threads for longer. We
have quite a few (over a thousand) projects running the gamut
from almost no changes to many thousands each. It looks like
some of the largest are starting to get processed late in the
sequence right now and so they end up running on a few
(eventually only one) threads well after the others are complete.
If the largest were started at the beginning, the time other
threads spent underutilized could be minimized at least.
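
A rough sketch of why the ordering matters, again as a toy model in Python rather than anything taken from Gerrit: it assumes one worker per project, one time unit per change, and idle workers taking the next project in list order. The project sizes below are hypothetical.

```python
import heapq


def makespan(change_counts, threads):
    """Elapsed time units when projects are handed out in list order.

    Hypothetical model: one worker per project, one time unit per change.
    """
    workers = [0] * threads  # time at which each worker becomes free
    heapq.heapify(workers)
    for count in change_counts:
        heapq.heappush(workers, heapq.heappop(workers) + count)
    return max(workers)


if __name__ == "__main__":
    # Forty small projects followed by two large ones that start late.
    projects = [10] * 40 + [400, 400]
    largest_last = makespan(projects, threads=4)
    largest_first = makespan(sorted(projects, reverse=True), threads=4)
    print("largest last: ", largest_last)   # 500
    print("largest first:", largest_first)  # 400
```

Starting the largest projects first cannot finish faster than the single largest project allows, but in this example it removes the idle tail caused by the large projects starting late.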

David Pursehouse

Jun 7, 2016, 6:01:33 AM
to fungi-...@yuggoth.org, Repo and Gerrit Discussion, zaro...@gmail.com
On Mon, Jun 6, 2016 at 2:37 PM <fungi-...@yuggoth.org> wrote:
Sorting the project list by decreasing order of change count
would also help maximize the available threads for longer.

We have quite a few (over a thousand) projects running the gamut
from almost no changes to many thousands each. It looks like
some of the largest are starting to get processed late in the
sequence right now and so they end up running on a few
(eventually only one) threads well after the others are complete.
If the largest were started at the beginning, the time other
threads spent underutilized could be minimized at least.
