Endless "status:merged" HTTP requests and endless Lucene index queries in the Index-Interactive queue


Chunlin Zhang

Aug 23, 2022, 4:28:54 AM
to Repo and Gerrit Discussion
There have been quite a few threads [links 1-4 below] discussing this kind of problem: endless Lucene query handling consumes more and more CPU and memory, sometimes makes Gerrit hang and stop responding, and eventually forces a restart. We now need to restart Gerrit about once a week.
See the attachments for an example list of HTTP requests being handled (taken from the Statistics page of JavaMelody monitoring) and the Index-Interactive queue items (taken from the output of show-queue -w -q).
We can see that the continuous Lucene query handling for status:merged is caused by HTTP requests like "/changes/?q=status:merged&n=25&O=81 GET". The request already has a limit of 25, but we don't know why Lucene never returns.
On our server, not every request/query of this kind blocks. There are many such requests every day, but only a few per day block; for example, after about one week our Gerrit has about 40 blocked requests of this kind. I think most of them are "status:merged" queries because the number of merged changes in our Gerrit is very large (our Gerrit has 2,360,000+ changes in total).
Many Lucene threads were running in this function: org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java)
In one mail thread [link 2], Luca suggested asking Lucene enterprise support, but the problem is difficult to reproduce, and it still exists in Gerrit V3 [link 3]. So, after reading the related source code, I suggest adding a timeout to the future result of the index query task, so that if the request times out, the endless task is aborted to avoid a resource leak.

The modification may look like the patch below, but it may need improvement: there is a list of tasks, and on timeout all tasks in the list need to be aborted.

diff --git a/java/com/google/gerrit/lucene/LuceneChangeIndex.java b/java/com/google/gerrit/lucene/LuceneChangeIndex.java
index 551f675a32..d4c5f7d00f 100644
--- a/java/com/google/gerrit/lucene/LuceneChangeIndex.java
+++ b/java/com/google/gerrit/lucene/LuceneChangeIndex.java
@@ -441,7 +441,11 @@ public class LuceneChangeIndex implements ChangeIndex {
     @Override
     public ImmutableList<ChangeData> toList() {
       try {
-        List<Document> docs = future.get();
+        List<Document> docs = future.get(30*60, TimeUnit.SECONDS);
+        if (!future.isDone()) {
+          close();
+          throw new ExecutionException("Index query timeout, abort the request");
+        }
         ImmutableList.Builder<ChangeData> result =
             ImmutableList.builderWithExpectedSize(docs.size());
         for (Document doc : docs) {
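
A note on the timeout idea in the patch above: with the standard java.util.concurrent API, Future.get(timeout, unit) reports a timeout by throwing TimeoutException rather than returning with isDone() == false, and the ExecutionException constructor that takes only a message is protected. The following is a minimal, self-contained sketch of the pattern, not Gerrit code; the class name IndexQueryTimeoutSketch and the helper getOrCancel are hypothetical. Note that cancel(true) only interrupts the worker thread, so it stops a task only if that task reacts to interruption, and whether a Lucene search does so is not confirmed here.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class IndexQueryTimeoutSketch {

  /**
   * Waits at most the given time for the task. On timeout the task is cancelled
   * and the timeout is rethrown wrapped in an ExecutionException.
   */
  public static <T> T getOrCancel(Future<T> future, long timeout, TimeUnit unit)
      throws InterruptedException, ExecutionException {
    try {
      return future.get(timeout, unit);
    } catch (TimeoutException e) {
      // Interrupt the worker thread; this helps only if the task checks for interruption.
      future.cancel(true);
      // Use the public (String, Throwable) constructor.
      throw new ExecutionException("Index query timeout, abort the request", e);
    }
  }

  public static void main(String[] args) throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      // Stands in for an index query that does not return in time.
      Future<List<String>> future =
          executor.submit(
              () -> {
                Thread.sleep(10_000);
                return List.of("doc");
              });
      try {
        getOrCancel(future, 100, TimeUnit.MILLISECONDS);
      } catch (ExecutionException e) {
        System.out.println(e.getMessage() + ", cancelled=" + future.isCancelled());
      }
    } finally {
      executor.shutdownNow();
    }
  }
}
```

For a list of tasks, the same helper could be applied to each future in turn, cancelling the remaining ones as soon as one times out.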


queue_Index-Interactive.txt
long_http_thread_in_JavaMelody.csv

xin ma

Sep 13, 2022, 9:59:02 AM
to Repo and Gerrit Discussion
We have also encountered a similar problem. Is there any way to solve it?
Thanks

Matthias Sohn

Sep 13, 2022, 10:17:35 AM
to Chunlin Zhang, Repo and Gerrit Discussion
Which version of Gerrit do you use?

-Matthias 

xin ma

Sep 13, 2022, 9:38:35 PM
to Repo and Gerrit Discussion
Gerrit version 3.6.1

Gerrit config:
[cache]
  h2CacheSize = 256g
  directory = cache
[cache "web_sessions"]
  maxAge = 1 mon
  diskLimit = 2g
[cache "ldap_groups"]
  maxAge = 5d
[cache "ldap_group_existence"]
  memoryLimit = 20
[cache "ldap_usernames"]
  memoryLimit = 1024
[cache "projects"]
  memoryLimit = 5000
[cache "diff"]
  memoryLimit = 1g
[cache "diff_intraline"]
  memoryLimit = 500m
[cache "plugin_resources"]
  memoryLimit = 10m
[cache "change_notes"]
  diskLimit = 512m
[cache "permission_sort"]
  memoryLimit = 8192
  diskLimit = 8192
[change]
  conflictsPredicateEnabled = false
  mergeabilityComputationBehavior = NEVER
  submitWholeTopic = true
[commentlink "changeid"]
  match = (I[0-9a-f]{8,40})
  link = "#/q/$1"
[container]
  user = git
  heapLimit = 800g
  javaOptions = "-Dflogger.backend_factory=com.google.common.flogger.backend.log4j.Log4jBackendFactory#getInstance"
  javaOptions = "-Dflogger.logging_context=com.google.gerrit.server.logging.LoggingContext#getInstance"
  javaHome = /usr/local/jdk/jdk-11.0.12
[core]
  packedGitLimit = 1g
  packedGitWindowSize = 16k
  packedGitOpenFiles = 32768
  streamFileThreshold = 2048m
[download]
  scheme = ssh
  scheme = repo
  scheme = http
  sshAddr = git.mioffice.cn
  checkForHiddenChangeRefs = true
[upload]
  allowGroup = Administrators
[execution]
  defaultThreadPoolSize = 3
[gc]
  startTime = Sat 00:30
  interval = 1 day
[hooks]
  path = /home/work/gerrit-site/hooks
  executorThreads = 10
[httpd]
  listenUrl = proxy-https://*:8088/
  acceptorThreads = 3
  maxThreads = 300
  maxQueued = 1000
  requestLog = true
[index]
  type = lucene
[index "changes_open"]
  ramBufferSize = 100 m
[index "changes_closed"]
  ramBufferSize = 100 m
[pack]
  threads = 1
[plugins]
  allowRemoteAdmin = true
  jsLoadTimeout = 7s
[plugin "javamelody"]
  allowTopMenu = true
[plugin "metrics-reporter-prometheus"]
  prometheusBearerTioken = token
[receive]
  enableSignedPush = false
  timeout = 4m
  maxObjectSizeLimit = 400m
[retry]
  maxWait = 3s
  timeout = 3s
[sshd]
  listenAddress = *:29418
  #commandStartThreads = 2
  maxConnectionsPerUser = 15
  maxAuthTries = 10000
  batchThreads = 64
  threads = 128
[database "h2"]
    autoServer = true

Chunlin Zhang

Sep 15, 2022, 1:43:42 AM
to Matthias Sohn, Repo and Gerrit Discussion
An old version, 2.14.20, but as I said, someone has reported this problem in Gerrit V3 as well.
We found that killing the long-running HTTP threads in JavaMelody fixes the problem temporarily. We now kill them once a week, and CPU and memory usage drop after the kill.

On Tue, Sep 13, 2022 at 22:17, Matthias Sohn <matthi...@gmail.com> wrote:

xin ma

Sep 15, 2022, 3:23:15 AM
to Repo and Gerrit Discussion
Could you please tell me how to do that? I'm not very familiar with JavaMelody.

Chunlin Zhang

Sep 15, 2022, 10:28:20 PM
to xin ma, Repo and Gerrit Discussion
JavaMelody is a Gerrit plugin: https://www.gerritcodereview.com/plugins.html
You can kill the thread on the page shown in the screenshot below; click the red button on the right side to kill it:

image.png

On Thu, Sep 15, 2022 at 15:23, xin ma <zaodi...@gmail.com> wrote:
--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to a topic in the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/repo-discuss/EIIe18hVngo/unsubscribe.
To unsubscribe from this group and all its topics, send an email to repo-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/repo-discuss/72a73fcc-cb95-4723-afd9-5ed5cb027313n%40googlegroups.com.

xin ma

Sep 16, 2022, 3:40:01 AM
to Repo and Gerrit Discussion
Thanks for your reply.

Chunlin Zhang

Mar 3, 2023, 1:42:08 AM
to Repo and Gerrit Discussion
We have now found how to reproduce the problem:
- In a Gerrit instance that has a large number of private changes
- As a user who has no read permission on these private changes (an anonymous user can also reproduce it)
- Query status:merged; the request takes a long time and finally fails with a 50x error, and one more endless HTTP thread appears

But on one of Google's Gerrit instances, such as partner-android-review.googlesource.com, trying to reproduce this problem gives the error "Error 400: cannot exceed 10000 results (after filtering for visibility)". Here [1] Shawn Pearce explained: “To answer Johannes original question, the search index server limits at 10,000 raw results in our indexing system. Gerrit is then filtering changes based on ACLs to only those changes that are visible. A private change for example is going to be hidden from your results view. If there are 100 private changes in the first 10,000 results than you can only get 9,900 results back. Tweaking your query to have fewer results (e.g. scoping per project and issuing a query per project) will get you a larger set of changes, but each query is still limited to that 10,000 raw results.”
And here [2], someone said: “FWIW I think the cannot exceed 10000 results (after filtering for visibility) error is something specific to Google's internal version of Gerrit; that error message does not exist in the open source version.”
So my question is: can anyone from Google provide some instructions on how to limit the results of a Lucene query so that we can avoid the problem?
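
The behaviour Shawn Pearce describes (cap the raw index results first, then filter for visibility) can be illustrated with a small sketch. This is not Gerrit or Google code; the class RawLimitSketch, the method queryVisible, and the rule that every 100th change is private are all hypothetical and only serve to reproduce the 10,000 to 9,900 arithmetic from the quote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class RawLimitSketch {

  /**
   * Reads at most rawLimit results from the index, then keeps only those the
   * caller may see. The scan never goes past rawLimit, however many are hidden.
   */
  public static <T> List<T> queryVisible(
      List<T> indexResults, int rawLimit, Predicate<T> isVisible) {
    List<T> visible = new ArrayList<>();
    int scanned = 0;
    for (T change : indexResults) {
      if (scanned >= rawLimit) {
        break; // hard cap on raw results, applied before visibility filtering
      }
      scanned++;
      if (isVisible.test(change)) {
        visible.add(change);
      }
    }
    return visible;
  }

  public static void main(String[] args) {
    // 20,000 hypothetical change numbers; every 100th one is treated as private.
    List<Integer> changes = new ArrayList<>();
    for (int i = 1; i <= 20_000; i++) {
      changes.add(i);
    }
    List<Integer> visible = queryVisible(changes, 10_000, n -> n % 100 != 0);
    // 100 of the first 10,000 raw results are private, so 9,900 remain.
    System.out.println(visible.size());
  }
}
```

The point of the cap is that the amount of index work is bounded even when most of the matching changes are invisible to the caller, which is the situation in the reproduction steps above.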

Thanks!

Matthias Sohn

Mar 3, 2023, 3:49:09 AM
to Chunlin Zhang, Repo and Gerrit Discussion
On Fri, Mar 3, 2023 at 7:42 AM Chunlin Zhang <zhangc...@gmail.com> wrote:

Chunlin Zhang

Mar 5, 2023, 8:44:12 PM
to Matthias Sohn, Repo and Gerrit Discussion
Yes, I tried it a long time ago, and it didn't work.

On Fri, Mar 3, 2023 at 16:49, Matthias Sohn <matthi...@gmail.com> wrote:

Martin Fick

Mar 6, 2023, 10:01:17 AM
to Chunlin Zhang, Repo and Gerrit Discussion
On Thu, Mar 02, 2023 at 10:42:08PM -0800, Chunlin Zhang wrote:
> Now we found how to reproduce the problem:
> - In a Gerrit which have a large number of private changes
> - With a user, which have no read permission to these private, anonymous
> user also can reproduce
> - try to query the status:merged, the request will take long time and at
> last get 50x error, and there will be a more endless http thread
...

> So my question is, is there anyone from Google can provide *some
> instructions about how to limit the results Lucene query* so we can avoid
> the problem?

The latest versions of Gerrit have been modified to paginate these
queries internally to avoid this sort of issue, so upgrading to any tip
version of Gerrit 3.5.x or later is likely to solve this for you. We are
now able to stream over 4M changes in a query with our Gerrit 3.5.5
Lucene (and ES) setups without issues.

-Martin

Nasser Grainawi

Mar 6, 2023, 11:09:08 AM
to Martin Fick, Chunlin Zhang, Repo and Gerrit Discussion
However, it may not be enough to just upgrade. You might want to switch from the default pagination mode of OFFSET to the newly added SEARCH_AFTER, which is much more optimized for deep pagination. See the New features section of the 3.5.3 release notes for pointers to the changes and configuration settings.
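
To show why SEARCH_AFTER helps with deep pagination, here is a toy sketch over a sorted in-memory list. It is not Gerrit or Lucene code, and the class PaginationSketch and both method names are hypothetical. It assumes that an offset-style query has to collect the first offset + pageSize hits and discard the first offset of them, while a search-after query resumes just past the last key of the previous page. Each method returns how many entries it touched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PaginationSketch {

  /** Pages through everything by offset; each page re-reads all earlier entries. */
  public static long walkWithOffset(List<Integer> sorted, int pageSize, List<Integer> out) {
    long visited = 0;
    for (int offset = 0; offset < sorted.size(); offset += pageSize) {
      int end = Math.min(offset + pageSize, sorted.size());
      for (int i = 0; i < end; i++) {
        visited++; // the first `offset` hits are collected only to be thrown away
        if (i >= offset) {
          out.add(sorted.get(i));
        }
      }
    }
    return visited;
  }

  /** Pages through everything by resuming after the last key of the previous page. */
  public static long walkWithSearchAfter(List<Integer> sorted, int pageSize, List<Integer> out) {
    long visited = 0;
    Integer last = null;
    while (true) {
      int start = 0;
      if (last != null) {
        int pos = Collections.binarySearch(sorted, last);
        start = pos >= 0 ? pos + 1 : -pos - 1; // first entry strictly after `last`
      }
      if (start >= sorted.size()) {
        break;
      }
      int end = Math.min(start + pageSize, sorted.size());
      for (int i = start; i < end; i++) {
        visited++;
        out.add(sorted.get(i));
      }
      last = sorted.get(end - 1);
    }
    return visited;
  }

  public static void main(String[] args) {
    List<Integer> ids = new ArrayList<>();
    for (int i = 1; i <= 1000; i++) {
      ids.add(i);
    }
    long offsetWork = walkWithOffset(ids, 100, new ArrayList<>());
    long searchAfterWork = walkWithSearchAfter(ids, 100, new ArrayList<>());
    System.out.println("offset=" + offsetWork + " searchAfter=" + searchAfterWork);
  }
}
```

In this toy model the offset walk touches 100 + 200 + ... + 1000 = 5500 entries for 1000 results, while the search-after walk touches 1000, and the gap grows quadratically with the number of pages. To my knowledge the corresponding gerrit.config setting is index.paginationType; please verify the exact name and values against the release notes mentioned above.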
 



Chunlin Zhang

Mar 6, 2023, 8:51:24 PM
to Martin Fick, Repo and Gerrit Discussion
Hi Martin,
I can still get the error "Error 400: cannot exceed 10000 results (after filtering for visibility)" on Google's Gerrit (3.7), whose cause Shawn Pearce explained in 2017-10. So I don't think the pagination you mentioned is the same thing as the issue I am talking about; that issue occurs before pagination.

On Mon, Mar 6, 2023 at 23:01, Martin Fick <quic_...@quicinc.com> wrote:

Nasser Grainawi

Mar 7, 2023, 11:08:43 AM
to Chunlin Zhang, Martin Fick, Repo and Gerrit Discussion
On Mon, Mar 6, 2023 at 6:51 PM Chunlin Zhang <zhangc...@gmail.com> wrote:
> Hi, Martin,
> I still can get the error "Error 400: cannot exceed 10000 results (after filtering for visibility)" in Gerrit of google (3.7), which Shawn Pearce explained the cause in 2017-10, so I don't think the pagination you mentioned is same with the issue I am talking about, that issue happen before pagination.

You would need 3.7.1 to have the changes Martin is referring to. Please also try to interleave your responses instead of top-posting.
 


Chunlin Zhang

Mar 7, 2023, 9:10:02 PM
to Repo and Gerrit Discussion, Martin Fick


On Wed, Mar 8, 2023 at 00:08, Nasser Grainawi <nasser....@linaro.org> wrote:
> On Mon, Mar 6, 2023 at 6:51 PM Chunlin Zhang <zhangc...@gmail.com> wrote:
> > Hi, Martin,
> > I still can get the error "Error 400: cannot exceed 10000 results (after filtering for visibility)" in Gerrit of google (3.7), which Shawn Pearce explained the cause in 2017-10, so I don't think the pagination you mentioned is same with the issue I am talking about, that issue happen before pagination.
>
> You would need 3.7.1 to have the changes Martin is referring to. Please also try to interleave your responses instead of top-posting.

I will try version 3.7.1 to see whether it works and report back later. But what I am asking about is the method of limiting a Lucene query to 10000 results, which Google's Gerrit has used since before 2017-10; I think this method would prevent the issue fundamentally.