status:merged "stuck" thread in interactive-index queue

Nuno Costa

May 29, 2026, 7:30:55 AM
to Repo and Gerrit Discussion
Hi all,

We are seeing "stuck" threads, shown in JavaMelody as `GET /changes/?O=5000081&S=0&n=25&q=status%3Amerged&allow-incomplete-results=true`, that run indefinitely and keep increasing in number over time.
These threads stay running in the interactive-index queue as `status:merged`.
The time reported for these tasks in the queue keeps changing, which indicates that they are not stuck but running in a loop.
In JavaMelody, we see that the request is always unauthenticated (anonymous).
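For readers less familiar with this REST call, the request seen in JavaMelody is the change query endpoint, and its parameters can be decoded with plain `java.net.URLDecoder`. As far as I know, `q` is the search string, `n` the page size, `S` the start offset and `O` a hex-encoded set of output options. The sketch below (the class name is hypothetical) only decodes the string and does not interpret the `O` bits. Note that `S=0`: the client asks for the very first page of 25 results, which suggests the long-running work is generated on the server side rather than by a user paging deep into the results.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class DecodeChangesQuery {
    // Splits a raw query string into URL-decoded key/value pairs, keeping their order.
    static Map<String, String> parse(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        String raw = "O=5000081&S=0&n=25&q=status%3Amerged&allow-incomplete-results=true";
        Map<String, String> p = parse(raw);
        System.out.println("query   = " + p.get("q"));  // decoded search string
        System.out.println("start   = " + p.get("S"));  // offset of the first result
        System.out.println("limit   = " + p.get("n"));  // page size requested by the UI
        System.out.println("options = " + p.get("O"));  // left undecoded (hex option bits)
    }
}
```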

We are now able to reproduce the problem in our QA environment by opening Changes >> Merged in the UI as an anonymous user.

The thread dumps show that the threads are always stuck in the Lucene index.
Our production index size for closed changes is between 35 and 40GB.
The Gerrit home directory is on fast NVMe drives.

We started noticing this problem in mid-April, after users began locking down their project permissions and removing anonymous access.
Since killing the task is not advised, our only option is to restart Gerrit when the number of "stuck" tasks reaches a threshold and memory-pressure alerts are triggered (we have a 600GB heap).
We use the Prometheus metric for the interactive index queue and send an alert when the average number of running tasks over the last 24h reaches 6.
From past occurrences, when the average rises above 7 tasks, both the rate of stuck threads and the memory pressure increase.
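The alerting rule above boils down to a rolling average over a gauge. A minimal sketch, assuming the running-task count is sampled at a fixed interval so that a 24h window is a fixed number of samples (for example 1440 at one sample per minute); the class name and the ALERT/CRITICAL labels are hypothetical, and in practice this logic would live in a Prometheus query rather than in Java:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class RunningTasksAlert {
    private final int windowSize;                   // samples per 24h window (assumed fixed interval)
    private final Deque<Integer> samples = new ArrayDeque<>();
    private long sum;

    RunningTasksAlert(int windowSize) {
        this.windowSize = windowSize;
    }

    // Records one sample of the running-task gauge and returns the rolling average.
    double add(int runningTasks) {
        samples.addLast(runningTasks);
        sum += runningTasks;
        if (samples.size() > windowSize) {
            sum -= samples.removeFirst();           // drop the sample that fell out of the window
        }
        return (double) sum / samples.size();
    }

    // Thresholds from the message: alert at an average of 6,
    // and things get noticeably worse once the average exceeds 7.
    static String level(double average) {
        if (average > 7) return "CRITICAL";
        if (average >= 6) return "ALERT";
        return "OK";
    }

    public static void main(String[] args) {
        RunningTasksAlert alert = new RunningTasksAlert(1440); // one sample per minute for 24h
        double avg = 0;
        for (int minute = 0; minute < 1440; minute++) {
            avg = alert.add(minute < 720 ? 5 : 8);  // half a day at 5 tasks, half at 8
        }
        System.out.println(avg + " -> " + level(avg));
    }
}
```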

We are planning to deploy the login-redirect plugin to prevent anonymous users from triggering the merged-changes query through the UI, plus a custom plugin that returns a 204 code for the REST API.

At first glance, it seems that Gerrit is not able to work in a fully locked-down mode.

Is this a bug where pagination is not applied to anonymous access, a problem with our specific setup, expected/designed behavior, or something else?

We see this happening in both 3.9.11 and 3.13.6.

Thanks,
Nuno

Luca Milanesio

May 29, 2026, 7:51:14 AM
to Repo and Gerrit Discussion, Luca Milanesio
Hi Nuno,

> On 29 May 2026, at 13:30, Nuno Costa <nunoco...@gmail.com> wrote:
>
> Hi all,
>
> We are seeing "stuck" threads, shown in JavaMelody as `GET /changes/?O=5000081&S=0&n=25&q=status%3Amerged&allow-incomplete-results=true`, that run indefinitely and keep increasing in number over time.
> These threads stay running in the interactive-index queue as `status:merged`.

Do you have any thread dumps to share?
Did you take regular thread dumps to identify hotspots?

> The time reported for these tasks in the queue keeps changing, which indicates that they are not stuck but running in a loop.
> In JavaMelody, we see that the request is always unauthenticated (anonymous).
>
> We are now able to reproduce the problem in our QA environment by opening Changes >> Merged in the UI as an anonymous user.

If anonymous users do not have permission to see anything, then it is expected that their queries appear stuck.
They are scanning through *ALL* the merged changes until they reach an empty result.
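To see why an empty result is so expensive, here is a deliberately simplified model of filtered pagination (not Gerrit's actual query code; the class and method names are hypothetical): the index returns pages of matching changes, changes the caller cannot see are dropped, and paging continues until `limit` visible changes are found or the index is exhausted. It also assumes, as a simplification, that every sorted index search touches all matching documents, which is consistent with the `scoreAll`/`collect` frames in the thread dumps posted in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class FilteredPagination {
    // What the caller got back, and what the server paid to produce it.
    record Stats(List<Integer> results, int indexQueries, long docsVisited) {}

    // Simplified model: page through `totalMatches` change ids in fixed-size pages,
    // keep only the visible ones, stop at `limit` results or when the index runs out.
    static Stats query(int totalMatches, int limit, int pageSize, IntPredicate visible) {
        List<Integer> results = new ArrayList<>();
        int queries = 0;
        long visited = 0;
        for (int offset = 0; offset < totalMatches && results.size() < limit; offset += pageSize) {
            queries++;
            visited += totalMatches;                // assumed cost of one sorted index search
            int end = Math.min(offset + pageSize, totalMatches);
            for (int id = offset; id < end && results.size() < limit; id++) {
                if (visible.test(id)) {             // ACL filter applied after the index search
                    results.add(id);
                }
            }
        }
        return new Stats(results, queries, visited);
    }

    public static void main(String[] args) {
        Stats canSeeAll = query(10_000, 25, 25, id -> true);
        Stats canSeeNothing = query(10_000, 25, 25, id -> false);
        System.out.println("all visible:  " + canSeeAll.indexQueries() + " searches, "
                + canSeeAll.docsVisited() + " docs visited");
        System.out.println("none visible: " + canSeeNothing.indexQueries() + " searches, "
                + canSeeNothing.docsVisited() + " docs visited");
    }
}
```

With 10,000 matching changes and a page size of 25, a caller who can see everything costs one index search, while a caller who can see nothing costs 400 searches over the whole result set (4,000,000 document visits). Under this model the number of searches grows linearly with the number of merged changes and the total work grows quadratically, which is why a multi-million-change index can look like an endless loop.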

> We are planning to deploy the login-redirect plugin to prevent anonymous users from triggering the merged-changes query through the UI, plus a custom plugin that returns a 204 code for the REST API.

Yes, that would help.

HTH

Luca.


Nuno Costa

May 29, 2026, 9:38:07 AM
to Repo and Gerrit Discussion
Hi Luca, thanks for your feedback.

> Do you have any thread dumps to share?
> Did you take regular thread dumps to identify hotspots?

This is what we always see in the production thread dumps.
```
"Index-Interactive-14[status:merged]" prio=5 RUNNABLE
org.apache.lucene.codecs.lucene80.IndexedDISI.advanceExact(IndexedDISI.java:399)
org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$SparseNumericDocValues.advanceExact(Lucene80DocValuesProducer.java:479)
org.apache.lucene.search.comparators.LongComparator$LongLeafComparator.getValueForDoc(LongComparator.java:71)
org.apache.lucene.search.comparators.LongComparator$LongLeafComparator.copy(LongComparator.java:96)
org.apache.lucene.search.MultiLeafFieldComparator.copy(MultiLeafFieldComparator.java:81)
org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector$1.collect(TopFieldCollector.java:180)
org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:282)
org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:238)
org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:659)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:443)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:572)
org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:553)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:474)
com.google.gerrit.lucene.LuceneChangeIndex$QuerySource.doRead(LuceneChangeIndex.java:426)
com.google.gerrit.lucene.LuceneChangeIndex$QuerySource$1.call(LuceneChangeIndex.java:351)
com.google.gerrit.lucene.LuceneChangeIndex$QuerySource$1.call(LuceneChangeIndex.java:348)
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:75)
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
com.google.gerrit.server.logging.LoggingContextAwareRunnable.run(LoggingContextAwareRunnable.java:113)
java...@17.0.19/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
java...@17.0.19/java.util.concurrent.FutureTask.run(FutureTask.java:264)
java...@17.0.19/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:699)
java...@17.0.19/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
java...@17.0.19/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
java...@17.0.19/java.lang.Thread.run(Thread.java:840)
``` 

and

```
"Index-Interactive-29[status:merged]" prio=5 RUNNABLE
org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:275)
org.apache.lucene.util.PriorityQueue.updateTop(PriorityQueue.java:202)
org.apache.lucene.search.TopFieldCollector.updateBottom(TopFieldCollector.java:621)
org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector$1.collect(TopFieldCollector.java:181)
org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:282)
org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:238)
org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:659)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:443)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:572)
org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:553)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:474)
com.google.gerrit.lucene.LuceneChangeIndex$QuerySource.doRead(LuceneChangeIndex.java:426)
com.google.gerrit.lucene.LuceneChangeIndex$QuerySource$1.call(LuceneChangeIndex.java:351)
com.google.gerrit.lucene.LuceneChangeIndex$QuerySource$1.call(LuceneChangeIndex.java:348)
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:75)
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
com.google.gerrit.server.logging.LoggingContextAwareRunnable.run(LoggingContextAwareRunnable.java:113)
java...@17.0.19/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
java...@17.0.19/java.util.concurrent.FutureTask.run(FutureTask.java:264)
java...@17.0.19/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:699)
java...@17.0.19/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
java...@17.0.19/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
java...@17.0.19/java.lang.Thread.run(Thread.java:840)
```
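Both stack traces sit inside Lucene's sorted top-N collection loop (`BulkScorer.scoreAll` calling `TopFieldCollector...collect`): the first is reading a `long` sort value for a document, the second is re-balancing the priority queue that holds the current best hits. The sketch below shows the same idea with `java.util.PriorityQueue` standing in for Lucene's internal queue; it is an analogy, not Lucene's code, and it assumes the sort key is a `long` such as a change's last-updated timestamp. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class TopKByTimestamp {
    // Keeps the k largest (most recent) timestamps seen in a single pass, using a
    // bounded min-heap whose head is the current "bottom" (least competitive) hit.
    // Assumes k >= 1.
    static List<Long> topK(long[] timestamps, int k) {
        PriorityQueue<Long> heap = new PriorityQueue<>(k); // natural order: min-heap
        for (long ts : timestamps) {                       // every matching doc is visited
            if (heap.size() < k) {
                heap.add(ts);
            } else if (ts > heap.peek()) {
                heap.poll();                               // evict the current bottom
                heap.add(ts);                              // sift the new hit into place
            }
        }
        List<Long> out = new ArrayList<>(heap);
        out.sort(Collections.reverseOrder());              // newest first
        return out;
    }

    public static void main(String[] args) {
        long[] updated = {5, 42, 17, 8, 99, 23, 61};
        System.out.println(topK(updated, 3));
    }
}
```

Every input value passes through the loop even though only k are kept, so a thread doing this over millions of merged changes, once per page, would plausibly show up as RUNNABLE in exactly these frames for a long time.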

I tried to replicate this on a 3.13.6 container without any luck, mostly because we have more than 7 million change entries in the Lucene index in production.
Basically, I'm testing by removing/blocking anonymous access to refs/heads/* and refs/meta/version in All-Projects and creating lots of changes, but I probably won't be able to reproduce this at that scale.

Thanks,
Nuno

mfick

May 29, 2026, 10:45:12 AM
to Repo and Gerrit Discussion
On Friday, May 29, 2026 at 7:30:55 AM UTC-4 Nuno Costa wrote:
> We are now able to reproduce the problem in our QA environment by opening Changes >> Merged in the UI as an anonymous user.

Anonymous queries can be a problem for this reason.

> At first glance, it seems that Gerrit is not able to work in a fully locked-down mode.
 
I think you are correct. I believe at QCOM we used to force users to log in to access Gerrit to avoid this issue. Is it possible that Gerrit has since gained a setting or plugin to do this? If the site is truly locked down, doesn't forcing users to log in make sense anyway?

 
> Is this a bug where pagination is not applied to anonymous access, a problem with our specific setup, expected/designed behavior, or something else?

Not really a bug, but an unfortunate expectation.

 -Martin

Nuno Costa

May 29, 2026, 11:26:27 AM
to Repo and Gerrit Discussion
Hi Martin, thanks for your feedback.

>> At first glance, it seems that Gerrit is not able to work in a fully locked-down mode.
>
> I think you are correct. I believe at QCOM we used to force users to log in to access Gerrit to avoid this issue. Is it possible that Gerrit has since gained a setting or plugin to do this? If the site is truly locked down, doesn't forcing users to log in make sense anyway?

Yes, the goal is to have as many repositories as possible without anonymous access, for two reasons: 1) security, obviously :) and 2) accountability, so that we can properly identify usage that degrades the Gerrit service for everyone.
At this point we are basically forced to install the login-redirect plugin to avoid restarting Gerrit every 3-4 days.
Not having a plugin would be preferable :)
  
>> Is this a bug where pagination is not applied to anonymous access, a problem with our specific setup, expected/designed behavior, or something else?
>
> Not really a bug, but an unfortunate expectation.

Yeah, after I sent the email, I realized that you can't paginate results if there aren't any :)
At first glance, some sort of timeout while searching seems like the way to go.

Could server-side deadlines for an account[1] be a solution? Could we apply one to "Anonymous Users"?

[1] https://gerrit-documentation.storage.googleapis.com/Documentation/3.13.6/config-gerrit.html#deadline.id.account
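A timeout only helps if the work actually stops. The sketch below uses plain `java.util.concurrent` and is not how Gerrit's server-side deadlines are implemented: the caller waits on a `Future` with a deadline and cancels it on timeout. `Future.cancel(true)` merely interrupts the worker thread, so the hypothetical `scan` loop has to check the interrupt flag itself; a loop that never checks it keeps running after the caller has given up.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;

public class DeadlineBoundedSearch {
    // Hypothetical stand-in for an expensive index scan. It checks for interruption
    // between documents and returns the number scanned, or -1 if it was interrupted.
    static long scan(long totalDocs, AtomicLong progress) {
        for (long doc = 0; doc < totalDocs; doc++) {
            if (Thread.currentThread().isInterrupted()) {
                return -1;                          // the caller already gave up
            }
            progress.incrementAndGet();
        }
        return totalDocs;
    }

    // Runs scan() on a worker thread and waits at most timeoutMillis.
    // Returns the scan result, or null if the deadline expired.
    static Long searchWithDeadline(long totalDocs, long timeoutMillis, AtomicLong progress) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<Long> task = pool.submit(() -> scan(totalDocs, progress));
        try {
            return task.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // cancel(true) only sets the worker's interrupt flag; the check
            // inside scan() is what actually makes the work stop.
            task.cancel(true);
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            task.cancel(true);
            return null;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("small scan: " + searchWithDeadline(1_000, 5_000, new AtomicLong()));
        AtomicLong progress = new AtomicLong();
        Long result = searchWithDeadline(Long.MAX_VALUE, 100, progress);
        System.out.println("huge scan: " + result + " after ~" + progress.get() + " docs");
    }
}
```

Whether the `deadline.<id>` settings can be scoped to anonymous users is the open question here; the sketch only illustrates why any deadline needs the search loop to cooperate.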

Thanks,
Nuno

mfick

May 29, 2026, 11:53:14 AM
to Repo and Gerrit Discussion
On Friday, May 29, 2026 at 11:26:27 AM UTC-4 Nuno Costa wrote:
> Yes, the goal is to have as many repositories as possible without anonymous access, for two reasons: 1) security, obviously :) and 2) accountability, so that we can properly identify usage that degrades the Gerrit service for everyone.
> At this point we are basically forced to install the login-redirect plugin to avoid restarting Gerrit every 3-4 days.
> Not having a plugin would be preferable :)

Agreed, this seems like something that should be simple to do in core; I would not object to such a submission.
   
> Yeah, after I sent the email, I realized that you can't paginate results if there aren't any :)
> At first glance, some sort of timeout while searching seems like the way to go.
>
> Could server-side deadlines for an account[1] be a solution? Could we apply one to "Anonymous Users"?
>
> [1] https://gerrit-documentation.storage.googleapis.com/Documentation/3.13.6/config-gerrit.html#deadline.id.account

Probably. I would also investigate whether the quota plugin could be applied here somehow, but I don't think it can yet, and you don't want another plugin :(

-Martin 

Nuno Costa

May 29, 2026, 12:15:26 PM
to Repo and Gerrit Discussion
>> Could server-side deadlines for an account[1] be a solution? Could we apply one to "Anonymous Users"?
>
> Probably. I would also investigate whether the quota plugin could be applied here somehow, but I don't think it can yet, and you don't want another plugin :(

We will soon start work on our Rate Limit implementation project, and the quota plugin is on the list, so it is something we can check.

Is this topic worth a bug report?

mfick

May 29, 2026, 1:40:30 PM
to Repo and Gerrit Discussion
I don't think so? Generally, this is a known downside of queries, and we know how to reproduce it (to me, that is the main purpose of a bug tracker in the open-source world).

-Martin

Luca Milanesio

May 29, 2026, 3:54:19 PM
to Repo and Gerrit Discussion, Luca Milanesio

On 29 May 2026, at 19:40, 'mfick' via Repo and Gerrit Discussion <repo-d...@googlegroups.com> wrote:

> On Friday, May 29, 2026 at 12:15:26 PM UTC-4 Nuno Costa wrote:
>> Is this topic worth a bug report?
>
> I don't think so?

You could raise it as an improvement request.
The point is that Gerrit tries to serve “some user experience” to the anonymous user, but if the ACLs are set so that there is really nothing to show, it just wastes CPU cycles and HTTP threads doing so.

Another option is to redirect to a static splash page, installed in $GERRIT_SITE/static, that welcomes the anonymous user and gives directions on how to log in.

> Generally, this is a known downside of queries, and we know how to reproduce it (to me, that is the main purpose of a bug tracker in the open-source world).

The point is: there is really nothing to query for those users, is there?
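Luca's point can be illustrated with a short-circuit guard: if the caller cannot see any project, no change can ever match, so the expensive query does not need to run at all. This is a hypothetical sketch of such an improvement (`queryMerged` and the visible-project set are made-up names, not Gerrit APIs); a real implementation would have to derive visibility from the ACLs exactly as the query filter does.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class NothingToQueryGuard {
    // Hypothetical guard: skip the index entirely when the caller cannot see
    // a single project, since no change could pass the visibility filter anyway.
    static List<Integer> queryMerged(Set<String> visibleProjects,
                                     Supplier<List<Integer>> expensiveQuery) {
        if (visibleProjects.isEmpty()) {
            return List.of();                 // empty answer, zero index work
        }
        return expensiveQuery.get();          // normal path: run the real query
    }

    public static void main(String[] args) {
        AtomicInteger indexSearches = new AtomicInteger();
        Supplier<List<Integer>> query = () -> {
            indexSearches.incrementAndGet();  // count how often the index is hit
            return List.of(101, 102);
        };
        System.out.println(queryMerged(Set.of(), query) + " searches=" + indexSearches.get());
        System.out.println(queryMerged(Set.of("project-a"), query) + " searches=" + indexSearches.get());
    }
}
```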

Luca.