Too many open files since 2.15


Tech Advantage

May 2, 2018, 2:49:35 PM
to Repo and Gerrit Discussion
Hi,

I've migrated to 2.15 (then 2.15.1) and NoteDB.
Since then, every 2 days or so, I get "Too many open files" about 100 times per second, filling the error log.

[2018-05-02 17:01:21,918] [HTTP-17404] WARN  org.eclipse.jetty.io.ManagedSelector : Accept failed for channel null
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
at org.eclipse.jetty.io.ManagedSelector.processAccept(ManagedSelector.java:393)
at org.eclipse.jetty.io.ManagedSelector.access$1000(ManagedSelector.java:56)
at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.processSelected(ManagedSelector.java:295)
at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:181)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:249)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)


limits.conf is set to 8192 for user gerrit2

gerrit2 soft nofile 8192
gerrit2 hard nofile 8192


lsof displays a lot of objects/pack and objects/pack/pack-XX.pack handles for all projects in gerrit.

java    14695 gerrit2  417r      DIR                8,1     4096  131798 /tech_datas/gerrit/git/project1.git/objects/pack
java    14695 gerrit2  418r      DIR                8,1     4096  131798 /tech_datas/gerrit/git/project1.git/objects/pack
java    14695 gerrit2  420r      REG                8,1  1322857  393423 /tech_datas/gerrit/git/All-Users.git/objects/pack/pack-04b17c435cdce6ca41c8973900c0530ca584bb13.pack
java    14695 gerrit2  422r      REG                8,1 18339148      22 /tech_datas/gerrit/git/project2.git/objects/pack/pack-55cbe44ed8a17afe462db9c12df6b52759a3afdc.pack
java    14695 gerrit2  423r      REG                8,1 30283737     203 /tech_datas/gerrit/git/project2.git/objects/pack/pack-642db5e411c54fc03498098ba14fc91b531fb50e.pack

Garbage collection is scheduled every hour.

What can I do to avoid this?

Thanks,
IG

Matthias Sohn

May 2, 2018, 3:19:15 PM
to Tech Advantage, Repo and Gerrit Discussion
How many of them are held by the Gerrit process, and how many by other processes?

Luca Milanesio

May 2, 2018, 4:15:16 PM
to Matthias Sohn, Tech Advantage, Luca Milanesio, Repo and Gerrit Discussion
Bear in mind as well my feedback on All-Users: that single repo is responsible for a peak in the number of open files.
Changing the GC policy on that repo reduced the problems on GerritHub.io by a factor of 3.

Luca.

--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Martin Fick

May 2, 2018, 4:17:59 PM
to repo-d...@googlegroups.com, Luca Milanesio, Matthias Sohn, Tech Advantage, Luca Milanesio
On Wednesday, May 02, 2018 09:15:07 PM Luca Milanesio wrote:
> Bear in mind as well my feedback on All-Users: that single
> repo is responsible for a peaking in the number of open
> files.

While this may be true, is there something new about this
setup that would make Gerrit hold open more files than the
ulimit currently set by the startup script? If so, then we
should consider improving the startup script to account for
this.

-Martin

--
The Qualcomm Innovation Center, Inc. is a member of Code
Aurora Forum, hosted by The Linux Foundation

Luca Milanesio

May 2, 2018, 4:23:20 PM
to Martin Fick, Luca Milanesio, repo-d...@googlegroups.com, Matthias Sohn, Tech Advantage


> On 2 May 2018, at 21:17, Martin Fick <mf...@codeaurora.org> wrote:
>
> On Wednesday, May 02, 2018 09:15:07 PM Luca Milanesio wrote:
>> Bear in mind as well my feedback on All-Users: that single
>> repo is responsible for a peaking in the number of open
>> files.
>
> While this may be true, is there something new about this
> setup that would make Gerrit hold open more files than the
> ulimit currently set by the startup script?

Oh yes, you need to check if you have any error message when you start Gerrit saying that the ulimits failed.

P.S. A failure to set the ulimits at startup should make the Gerrit startup script fail, IMHO, whereas now it continues, with the risk of running out of file descriptors :-(
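
One way to check which limit the running JVM actually got, independently of limits.conf, is to read `/proc/<pid>/limits` on Linux. The following is a minimal sketch, assuming the standard layout of that file; the helper name `max_open_files` and the sample values are illustrative, not taken from this thread.

```python
import re


def max_open_files(limits_text):
    """Return (soft, hard) 'Max open files' limits parsed from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns are separated by runs of spaces: name, soft limit, hard limit, units.
            soft, hard = re.split(r"\s{2,}", line.strip())[1:3]
            return int(soft), int(hard)
    raise ValueError("no 'Max open files' line found")


# Sample text in the layout Linux uses for /proc/<pid>/limits (values are illustrative).
SAMPLE = """Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max open files            1024                 4096                 files
"""

print(max_open_files(SAMPLE))  # (1024, 4096)
```

On a live server the text would come from reading `/proc/<pid>/limits` for the Gerrit JVM's pid (14695 in the lsof output earlier in the thread).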

Tech Advantage

May 3, 2018, 10:52:28 AM
to Repo and Gerrit Discussion
Actually, after verification, the limits.conf setting was overridden by the startup script, which sets GERRIT_FDS based on core.packedGitOpenFiles, which was not set.
I set core.packedGitOpenFiles to 4096: the script then multiplies it by 2 and sets NOFILES to 8192.

I'll monitor if that fixes my issue.
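
The calculation described above can be sketched as follows. The "multiply by 2" rule comes from this message; the fallback value (128) and the floor (1024) are assumptions about the startup script and should be checked against the installed gerrit.sh. The function name is hypothetical.

```python
def startup_fd_limit(configured_open_files=None, multiplier=2, default=128, floor=1024):
    """Descriptor limit the startup script would request.

    multiplier=2 is the rule reported in the thread; default and floor are
    assumed values, not confirmed here.
    """
    base = default if configured_open_files is None else configured_open_files
    return max(floor, multiplier * base)


print(startup_fd_limit(4096))  # 8192, the value reported above
print(startup_fd_limit())      # 1024 under the assumed default and floor
```

If those assumed values are right, an unset option would leave the process with far fewer descriptors than the 8192 configured in limits.conf.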

Tech Advantage

May 3, 2018, 12:29:03 PM
to Repo and Gerrit Discussion
I've restarted gerrit at 08:16 this morning.
At 18:10, I have around 1250 entries in lsof.

Most of them are objects/pack directories.
Currently there are 20 per project, including the All-Projects and All-Users repositories.
.pack file handles account for only about a dozen handles, each appearing once in lsof.
 
$ lsof -F -u gerrit2 |grep '/base/gerrit/git'| sort | uniq -c
     20 n/base/gerrit/git/projectX/objects/pack
     20 n/base/gerrit/git/projectY.git/objects/pack

With 50 projects, this is 1000 handles currently used.
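
The same per-directory count can be taken from default (non `-F`) lsof output. This is a minimal sketch, assuming the nine-column layout shown in the first message and paths without spaces; the function name is hypothetical.

```python
from collections import Counter


def count_pack_dir_handles(lsof_text):
    """Count open directory handles per objects/pack path in default lsof output."""
    counts = Counter()
    for line in lsof_text.splitlines():
        fields = line.split()
        # Assumed columns: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
        if len(fields) >= 9 and fields[4] == "DIR" and fields[8].endswith("/objects/pack"):
            counts[fields[8]] += 1
    return counts


# Three lines copied from the lsof output in the first message.
SAMPLE = """\
java    14695 gerrit2  417r      DIR                8,1     4096  131798 /tech_datas/gerrit/git/project1.git/objects/pack
java    14695 gerrit2  418r      DIR                8,1     4096  131798 /tech_datas/gerrit/git/project1.git/objects/pack
java    14695 gerrit2  420r      REG                8,1  1322857  393423 /tech_datas/gerrit/git/All-Users.git/objects/pack/pack-04b17c435cdce6ca41c8973900c0530ca584bb13.pack
"""

for path, n in count_pack_dir_handles(SAMPLE).items():
    print(n, path)
```

Feeding it the output of `lsof -p <pid>` keeps the count limited to the Gerrit process.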

I see no correlation with the number of users accessing gerrit. 
Cache is effective and it shows 14 open files.
Flushing all caches does not drop the handle usage.

Gerrit Code Review        2.15.1                    now   18:10:54   CEST
                                                 uptime     9 hrs 53 min

  Name                          |Entries              |  AvgGet |Hit Ratio|
                                |   Mem   Disk   Space|         |Mem  Disk|
--------------------------------+---------------------+---------+---------+
  accounts                      |    57               |   1.3ms | 99%     |
  adv_bases                     |                     |         |100%     |
  change_notes                  |   193               | 810.7us | 93%     |
  changeid_project              |   116               |         | 93%     |
  changes                       |                     |  52.9ms |  0%     |
  groups                        |                     |         |         |
  groups_bymember               |    26               |   5.6ms | 99%     |
  groups_byname                 |                     |         |         |
  groups_bysubgroup             |    16               | 548.8us | 99%     |
  groups_byuuid                 |    45               |   9.0ms | 99%     |
  groups_external               |     1               |  40.9ms | 99%     |
  groups_subgroups              |                     |         |         |
  ldap_group_existence          |     1               |   3.4ms | 50%     |
  ldap_groups                   |    36               |   3.9ms | 99%     |
  ldap_groups_byinclude         |    89               |         | 99%     |
  ldap_usernames                |     7               | 310.3us | 68%     |
  permission_sort               |   258               |         | 99%     |
  plugin_resources              |     6               |         | 98%     |
  project_list                  |     1               |   3.4ms | 99%     |
  projects                      |    50               |   5.6ms | 99%     |
  sshkeys                       |     2               |  36.9ms | 99%     |
  static_content                |    21               |   2.0ms | 84%     |
  lfs-lfs_project_locks         |                     |         |         |
D change_kind                   |   556  19590   8.86m|   4.6ms | 98% 100%|
D conflicts                     |    24    293 269.54k|         | 74% 100%|
D diff                          |   162  10918  39.64m|   4.6ms | 92% 100%|
D diff_intraline                |   166   1818   2.48m|  33.6ms | 22% 100%|
D diff_summary                  |   194  10732  10.69m|   3.8ms | 92% 100%|
D git_tags                      |     1      5 289.41k|         |  0% 100%|
D mergeability                  |   197   4065   2.74m|  55.3ms | 80% 100%|
D oauth_tokens                  |                0.00k|         |         |
D web_sessions                  |    35    555 224.55k|         | 99%   2%|

SSH:      2  users, oldest session started   0 ms ago
Tasks:    3  total =    1 running +      0 ready +    2 sleeping
Mem: 2.25g total = 885.64m used + 1.39g free + 3.34m buffers
     5.33g max
          14 open files

Threads: 2 CPUs available, 84 threads

I don't know if setting NOFILES that high is a real fix or if it's just delaying the issue.

IG.

Luca Milanesio

May 3, 2018, 12:31:32 PM
to Tech Advantage, Luca Milanesio, Repo and Gerrit Discussion
Do you have JavaMelody installed? Can you share the "Open files" graph?

Luca.

Martin Fick

May 3, 2018, 12:39:05 PM
to repo-d...@googlegroups.com, Tech Advantage
Top posting is making it hard to follow this thread...

On Thursday, May 03, 2018 09:29:02 AM Tech Advantage wrote:
> I've restarted gerrit at 08:16 this morning.
> At 18:10, I have around 1250 entries in lsof.

Is this something you consider bad or normal?

> Most of them are objects/pack folders.

As in directories being held open, or do you mean files under
those directories?

Tech Advantage

May 3, 2018, 12:45:21 PM
to Repo and Gerrit Discussion


On Thursday, May 3, 2018 at 18:39:05 UTC+2, MartinFick wrote:
> Top posting is making it hard to follow this thread...

Sorry.

> On Thursday, May 03, 2018 09:29:02 AM Tech Advantage wrote:
> > I've restarted gerrit at 08:16 this morning.
> > At 18:10, I have around 1250 entries in lsof.
>
> Is this something you consider bad or normal?

Bad, as it's a new behavior in 2.15 that I didn't foresee.

> > Most of them are objects/pack folders.
>
> As in directories being held open, or do you mean files under
> those directories?

Yes, directories being held open:
java    31901 gerrit2  294r      DIR                8,1     4096   139740 /base/gerrit/git/All-Projects.git/objects/pack

.pack files are listed as REG:
java    31901 gerrit2 1198r      REG                8,1  1322857   401349 /base/gerrit/git/All-Users.git/objects/pack/pack-04b17c435cdce6ca41c8973900c0530ca584bb13.pack

Tech Advantage

May 3, 2018, 12:55:54 PM
to Repo and Gerrit Discussion


On Thursday, May 3, 2018 at 18:31:32 UTC+2, lucamilanesio wrote:
> Have you JavaMelody installed? Can you share the "Open files" graph?

I don't.
I've just installed it, so I should be able to get back to you with a graph by tomorrow.

Tech Advantage

May 4, 2018, 2:10:12 AM
to Repo and Gerrit Discussion


On Thursday, May 3, 2018 at 18:31:32 UTC+2, lucamilanesio wrote:
> Have you JavaMelody installed? Can you share the "Open files" graph?

It adds about a hundred handles every hour.

luca.mi...@gmail.com

May 4, 2018, 2:28:30 AM
to Tech Advantage, Repo and Gerrit Discussion


On 4 May 2018, at 07:10, Tech Advantage <a...@tech-advantage.com> wrote:
> It adds a hundred of handle every hour.

Oh yes, that graph is very worrying :-O
Do you have any cronjob? The growth seems very regular and scheduled exactly every hour!

Luca

Tech Advantage

May 4, 2018, 2:43:49 AM
to Repo and Gerrit Discussion


On Friday, May 4, 2018 at 08:28:30 UTC+2, lucamilanesio wrote:
> Oh yes, that graph is very worrying :-O
> Do you have any cronjob? The growth seems very regular and exactly scheduled every hour !

No system cronjob.
In the Gerrit settings, the only explicitly scheduled actions are GC every hour, accountDeactivation every day, and cache maxAge set to 1 hour.
On the client side, there is a Jenkins CI running, but it polls every 3 minutes. Otherwise it's basic activity, with almost no use at night.


[accountDeactivation]
    startTime = 01:05
    interval = 1 day

[cache "accounts"]
    maxAge = 1 hour
[cache "accounts_byemail"]
    maxAge = 1 hour
[cache "groups"]
    maxAge = 1 hour
[cache "groups_byinclude"]
    maxAge = 1 hour
[cache "ldap_usernames"]
    maxAge = 1 hour
[cache "ldap_groups"]
    maxAge = 1 hour
[cache "ldap_groups_byinclude"]
    maxAge = 1 hour
[cache "web_sessions"]
    maxAge = 8 hours

[gc]
    startTime = Mon 06:00
    interval = 1 hour


 

Luca Milanesio

May 4, 2018, 2:53:24 AM
to Tech Advantage, Luca Milanesio, Repo and Gerrit Discussion
Can you try to disable the [gc] stanza in Gerrit?
You could have actually just found a bug in Gerrit GC scheduler in 2.15.x ;-)

P.S. Can you confirm the *exact* version you are running?

Sven Selberg

May 4, 2018, 2:53:57 AM
to Repo and Gerrit Discussion
Just stating the obvious:
* 50 repositories
* 1 GC/hour
* 100 new file handles / hour

Which would indicate 2 new file handles for each GC (or other hourly action).
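
The arithmetic above, plus a rough time-to-exhaustion estimate, can be written out as a small sketch. It assumes the growth stays linear and uses figures reported earlier in the thread (limit of 8192, about 1250 handles already open); the function name is hypothetical.

```python
def leak_estimate(new_handles_per_hour, repositories, gc_runs_per_hour, fd_limit, open_now):
    """Return (handles leaked per repository per GC run, hours until the limit is hit)."""
    per_gc_per_repo = new_handles_per_hour / (repositories * gc_runs_per_hour)
    hours_left = (fd_limit - open_now) / new_handles_per_hour
    return per_gc_per_repo, hours_left


per_gc, hours = leak_estimate(100, 50, 1, 8192, 1250)
print(per_gc)           # 2.0
print(round(hours, 1))  # 69.4
```

Under these assumptions the process would hit the limit roughly three days after the measurement.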

Is this a production server with traffic or is it a staging server with hardly any traffic?
Which file system?

Luca Milanesio

May 4, 2018, 2:56:30 AM
to Sven Selberg, Luca Milanesio, Repo and Gerrit Discussion

On 4 May 2018, at 07:53, Sven Selberg <sven.s...@axis.com> wrote:
> Just stating the obvious:
> * 50 repositories
> * 1 GC/hour
> * 100 new file handles / hour
>
> Which would indicate 2 new file handles for each GC (or other hourly action).

Yeah, that sounds like a bug to me :-)

To be honest with you, we don't use Gerrit's GC schedule and never recommend that our users configure it :-(
We should instead write a "GC Plugin" that allows tailoring the GC schedule to each repository's incoming traffic ;-)

... but that's a completely different story, let's focus on the bug now.

Tech Advantage

May 4, 2018, 3:15:39 AM
to Repo and Gerrit Discussion
Done. I had added it after seeing another post describing a somewhat expected increase in usage of the All-Users.git repo.
Also, there is a message in the startup logs about a missing gc schedule.

> P.S. Can you confirm the *exact* version you are running?

Official 2.15.1 release with the following plugins:
[2018-05-04 09:10:30,246] [main] INFO  org.eclipse.jetty.util.log : Logging initialized @7747ms
[2018-05-04 09:10:30,625] [main] INFO  com.google.gerrit.server.git.LocalDiskRepositoryManager : Defaulting core.streamFileThreshold to 1366m
[2018-05-04 09:10:30,662] [main] INFO  com.google.gerrit.server.plugins.PluginLoader : Loading plugins from /tech_datas/gerrit/plugins
[2018-05-04 09:10:30,725] [main] INFO  com.google.gerrit.server.plugins.PluginLoader : Loaded plugin commit-message-length-validator, version v2.15.1
[2018-05-04 09:10:30,791] [main] INFO  com.google.gerrit.server.plugins.PluginLoader : Loaded plugin delete-project, version v2.13-38-gf4c9b76c51
[2018-05-04 09:10:30,876] [main] INFO  com.google.gerrit.server.plugins.PluginLoader : Loaded plugin download-commands, version v2.15.1
[2018-05-04 09:10:30,974] [main] INFO  com.google.gerrit.server.plugins.PluginLoader : Loaded plugin hooks, version v2.15.1
[2018-05-04 09:10:31,028] [main] INFO  com.google.gerrit.server.plugins.PluginLoader : Loaded plugin its-base, version dbe3c05592
[2018-05-04 09:10:31,077] [main] INFO  com.google.gerrit.server.plugins.PluginLoader : Loaded plugin its-jira, version v2.14-30-g237689d483
[2018-05-04 09:10:31,140] [main] INFO  com.google.gerrit.server.plugins.PluginLoader : Loaded plugin javamelody, version v2.14-35-g04b8764
[2018-05-04 09:10:31,225] [main] INFO  com.google.gerrit.server.config.PluginConfigFactory : No /tech_datas/gerrit/etc/lfs.config; assuming defaults
[2018-05-04 09:10:31,245] [main] INFO  com.google.gerrit.server.plugins.PluginLoader : Loaded plugin lfs, version 5be6bcc9e1
[2018-05-04 09:10:31,330] [main] INFO  com.google.gerrit.server.plugins.PluginLoader : Loaded plugin replication, version v2.15.1
[2018-05-04 09:10:31,353] [main] INFO  com.google.gerrit.server.config.PluginConfigFactory : No /tech_datas/gerrit/etc/reviewers.config; assuming defaults
[2018-05-04 09:10:31,372] [main] INFO  com.google.gerrit.server.plugins.PluginLoader : Loaded plugin reviewers, version 905fc9b0de
[2018-05-04 09:10:31,408] [main] INFO  com.google.gerrit.server.plugins.PluginLoader : Loaded plugin reviewnotes, version v2.15.1
[2018-05-04 09:10:31,432] [main] INFO  com.google.gerrit.server.plugins.PluginLoader : Loaded plugin singleusergroup, version v2.15.1

[2018-05-04 09:10:31,799] [main] INFO  com.google.gerrit.server.git.GarbageCollectionRunner : Ignoring missing gc schedule configuration
[2018-05-04 09:10:31,842] [main] INFO  com.google.gerrit.server.change.ChangeCleanupRunner : Ignoring missing changeCleanup schedule configuration
[2018-05-04 09:10:31,873] [main] INFO  com.google.gerrit.sshd.SshDaemon : Started Gerrit SSHD-CORE-1.6.0 on *:29418
[2018-05-04 09:10:31,875] [main] INFO  org.eclipse.jetty.server.Server : jetty-9.3.18.v20170406
[2018-05-04 09:10:32,428] [main] INFO  org.eclipse.jetty.server.handler.ContextHandler : Started o.e.j.s.ServletContextHandler@36c45b54{/,null,AVAILABLE}
[2018-05-04 09:10:32,436] [main] INFO  org.eclipse.jetty.server.AbstractConnector : Started ServerConnector@2140de63{HTTP/1.1,[http/1.1]}{0.0.0.0:8081}
[2018-05-04 09:10:32,437] [main] INFO  org.eclipse.jetty.server.Server : Started @9939ms
[2018-05-04 09:10:32,438] [main] INFO  com.google.gerrit.pgm.Daemon : Gerrit Code Review 2.15.1 ready



 

Tech Advantage

May 4, 2018, 3:25:36 AM
to Repo and Gerrit Discussion


On Friday, May 4, 2018 at 08:56:30 UTC+2, lucamilanesio wrote:
> On 4 May 2018, at 07:53, Sven Selberg <sven.s...@axis.com> wrote:
> > Just stating the obvious:
> > * 50 repositories
> > * 1 GC/hour
> > * 100 new file handles / hour
> >
> > Which would indicate 2 new file handles for each GC (or other hourly action).

More precisely, as I told MartinFick in an answer to a previous message, the file handles are directory handles pointing at objects/pack directories:
gerrit/git/sandbox.git/objects/pack

> Yeah, that sounds like a bug to me :-)
>
> To be honest with you, we don't use the Gerrit's GC schedule and never recommend our users to configure it :-(
> We should instead write a "GC Plugin" that allows to tailor the GC schedule needs to the repository incoming traffic ;-)
>
> ... but that's a completely different story, let's focus on the bug now.

There's an informational message on startup about the gc schedule not being set:

[2018-05-04 09:10:31,799] [main] INFO  com.google.gerrit.server.git.GarbageCollectionRunner : Ignoring missing gc schedule configuration

I took it as an invitation to set a gc schedule.

> > Is this a production server with traffic or is it a staging server with hardly no traffic?
> > Which file system?

This is a production server with low traffic: around 30 users, fewer than a hundred changes a day, mostly connecting over HTTP behind an HTTPS reverse proxy (the reverse proxy is not hosted on the same machine).
The OS is an up-to-date Debian 9 running as a VMware guest, with 2 vCPUs and 8 GB RAM.
The file system is ext4.

Duft Markus

May 4, 2018, 3:39:18 AM
to Tech Advantage, Repo and Gerrit Discussion

Uh oh. Just had a look at our production 2.15.1 instance:

[root@git site]# lsof | grep objects/pack | grep DIR | wc -l
41800

We're not using gerrit.sh but running java -jar gerrit.war directly in a Docker container.

[gc]
    aggressive = true
    startTime = 23:00
    interval = 1 day

The container running Gerrit is "Up 9 days", which is not much. I would love not to have to restart Gerrit every week xD

Cheers,
Markus


SSI Schäfer IT Solutions GmbH | Friesachstrasse 15 | 8114 Friesach | Austria
Registered Office: Friesach | Commercial Register: 49324 K | VAT no. ATU28654300
Commercial Court: Landesgericht für Zivilrechtssachen Graz

Luca Milanesio

May 4, 2018, 4:01:10 AM
to Duft Markus, Luca Milanesio, Tech Advantage, Repo and Gerrit Discussion

On 4 May 2018, at 08:39, Duft Markus <Marku...@ssi-schaefer.com> wrote:
> Uh oh. Just had a look at our production 2.15.1 instance:
>
> [root@git site]# lsof | grep objects/pack | grep DIR | wc -l
> 41800

Yeah, that's NOT nice at all.

We have been running 2.15.1 for almost two weeks now and we have no issues with leaked file descriptors.

Our numbers:
- 14k active users
- 40k repositories
- 1k changes/day

*BUT* we don't do GC using Gerrit's scheduler. Instead we have a custom-made cronjob that uses some metrics to decide what *should be GCed* and what *shouldn't*.
What's different with Gerrit 2.15.1 is that we force All-Users.git to be fully GCed every 30 minutes; otherwise the number of files goes up a lot and the overall latency increases.

See our typical graph below:

Saša Živkov

May 4, 2018, 4:01:48 AM
to Tech Advantage, Repo and Gerrit Discussion
On Fri, May 4, 2018 at 8:10 AM, Tech Advantage <a...@tech-advantage.com> wrote:
> It adds a hundred of handle every hour.

Have you tried collecting lsof output every hour and diffing the snapshots to find the newly opened files?
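
A minimal sketch of that snapshot comparison, assuming each snapshot is the text of default lsof output with the path in the last column (paths containing spaces would need different parsing). The function names are hypothetical; the sample lines are copied from the first message.

```python
from collections import Counter


def open_paths(lsof_text):
    """Multiset of NAME values (last column) from default lsof output."""
    return Counter(line.split()[-1] for line in lsof_text.splitlines() if line.strip())


def new_handles(before_text, after_text):
    """Paths that gained handles between two snapshots, with the number gained."""
    # Counter subtraction keeps only the positive differences.
    return dict(open_paths(after_text) - open_paths(before_text))


BEFORE = """\
java    14695 gerrit2  417r      DIR                8,1     4096  131798 /tech_datas/gerrit/git/project1.git/objects/pack
"""
AFTER = BEFORE + """\
java    14695 gerrit2  418r      DIR                8,1     4096  131798 /tech_datas/gerrit/git/project1.git/objects/pack
java    14695 gerrit2  420r      REG                8,1  1322857  393423 /tech_datas/gerrit/git/All-Users.git/objects/pack/pack-04b17c435cdce6ca41c8973900c0530ca584bb13.pack
"""

for path, gained in new_handles(BEFORE, AFTER).items():
    print(f"+{gained} {path}")
```

Comparing counts rather than sets matters here, because the leak shows up as repeated handles on the same directory.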



Sven Selberg

May 4, 2018, 4:55:59 AM
to Repo and Gerrit Discussion


On Friday, May 4, 2018 at 9:25:36 AM UTC+2, Tech Advantage wrote:
> This is a production server with low trafic : around 30 users, less than a hundred changes a day, mostly connecting over http behind a proxy-https (the reverse proxy is not hosted on the same machine).

Not a solution, but with that kind of load a GC once a week should be enough. That way you should at least mitigate the symptoms, and you wouldn't have to restart so often until it's solved.

Tech Advantage

May 4, 2018, 6:25:00 AM
to Repo and Gerrit Discussion


On Friday, May 4, 2018 at 10:55:59 UTC+2, Sven Selberg wrote:
> Not a solution, but with that kind of load it should be enough with a GC once a week. That way you should at least mitigate the symptoms and you wouldn't have to restart so often until it's solved.

I removed the GC configuration and restarted.
Now the graph is flat.
 



Luca Milanesio

May 4, 2018, 6:30:28 AM
to Tech Advantage, Luca Milanesio, Repo and Gerrit Discussion
*MMUUUUCH BETTER* now :-)

Can you raise a bug?

This looks to me like a P0 and we should FIX IT ASAP before releasing Gerrit v2.15.2 IMHO

Tech Advantage

May 4, 2018, 6:36:50 AM
to Repo and Gerrit Discussion

Makson Lee

May 26, 2018, 5:41:50 AM
to Repo and Gerrit Discussion
I have the same issue after upgrading from 2.14.4 to 2.15.2: running Gerrit GC now causes "Too many open files".

David Hayes

May 30, 2018, 6:27:17 AM
to Repo and Gerrit Discussion
Seeing this also on 2.15.1 after an upgrade from 2.14.3.

While the bug is being worked on, I've disabled GC in the Gerrit config. As an interim workaround, is there any danger in performing GC using git via the command line, rather than via Gerrit?

Marcelo Ávila de Oliveira

Jun 5, 2018, 3:34:23 PM
to davi...@gmail.com, Repo and Gerrit Discussion
On Wed, May 30, 2018 at 07:27, David Hayes <davi...@gmail.com> wrote:
> Whilst the bug is being worked, I've disabled GC in the Gerrit config. For an interim workaround, is there any danger involved in performing GC using git via the command line, rather than via Gerrit?

I did the same. Does anyone have an answer to that question? Is it better to run "git gc" or simply do nothing?

David Ostrovsky

Jun 5, 2018, 5:07:49 PM
to Repo and Gerrit Discussion
Until this issue is resolved, use git gc from a cron job.

Note that even without this issue, some Gerrit site admins
prefer git gc over gerrit gc, to take the whole garbage
collection out of the JVM process.

Matthew Webber

Jun 6, 2018, 4:20:27 AM
to Repo and Gerrit Discussion
On Tuesday, 5 June 2018 22:07:49 UTC+1, David Ostrovsky wrote:
> Until this issue is resolved use git gc from the cron job.
>
> Note, that even without this issue, some gerrit site admins
> prefer git gc over gerrit gc to take the whole garbage
> collection out of the JVM process.

Also see the thread "Is jgit gc safe to use?" from Mar 19 this year: https://groups.google.com/forum/#!topic/repo-discuss/-anZ5hpBJqY

 

Makson Lee

Jun 6, 2018, 9:24:37 PM
to Repo and Gerrit Discussion
So which JGit version would you recommend to use if we want to do a JGit gc outside Gerrit?

Jonathan Nieder

Jun 6, 2018, 9:32:48 PM
to Makson Lee, Repo and Gerrit Discussion
I recommend using either 4.11.0.201803080745-r or 5.0.0.201806050710-rc3. I also recommend filing bugs for any troubles you run into at https://www.eclipse.org/jgit/support/. :)

On Wed, Jun 6, 2018 at 18:24, Makson Lee <cdle...@gmail.com> wrote:
--

Makson Lee

Jun 6, 2018, 9:42:45 PM
to Repo and Gerrit Discussion
Thanks, I will try it.

Doug Robinson

Jun 7, 2018, 5:17:21 PM
to Repo and Gerrit Discussion
Jonathan:


On Wednesday, June 6, 2018 at 9:32:48 PM UTC-4, Jonathan Nieder wrote:
> I recommend using either 4.11.0.201803080745-r or 5.0.0.201806050710-rc3. I also recommend filing bugs for any troubles you run into at https://www.eclipse.org/jgit/support/. :)

Which version(s) of Gerrit would this be compatible with?  2.11?  2.13?  2.?

Cheers!

Doug

Jonathan Nieder

Jun 7, 2018, 5:21:57 PM
to Doug Robinson, Repo and Gerrit Discussion
> Which version(s) of Gerrit would this be compatible with?  2.11?  2.13?  2.?

If you're running jgit gc as a standalone tool, then it's compatible with all versions of Gerrit.

On Thu, Jun 7, 2018 at 14:17, 'Doug Robinson' via Repo and Gerrit Discussion <repo-d...@googlegroups.com> wrote:


Doug Robinson

Jun 7, 2018, 5:31:12 PM
to Repo and Gerrit Discussion
Jonathan:


On Thursday, June 7, 2018 at 5:21:57 PM UTC-4, Jonathan Nieder wrote:
> > Which version(s) of Gerrit would this be compatible with?  2.11?  2.13?  2.?
>
> If you're running jgit gc as a standalone tool, then it's compatible with all versions of Gerrit.

Excellent!  Cheers!

Doug

lucamilanesio

Jun 8, 2018, 5:38:59 PM
to Repo and Gerrit Discussion
This is a JGit issue, found and fixed by Dave Borowitz in v4.11 (see [1]).
I will try to upgrade JGit to v4.11 on Gerrit v2.15.x and let's see if the situation is improved :-)

Luca.

David Pursehouse

Jun 9, 2018, 4:36:56 AM
to lucamilanesio, Repo and Gerrit Discussion
On Sat, Jun 9, 2018 at 6:39 AM lucamilanesio <luca.mi...@gmail.com> wrote:
> This is a JGit issue, found and fixed by Dave Borowitz in v4.11 (see [1]).
> I will try to upgrade JGit to v4.11 on Gerrit v2.15.x and let's see if the situation is improved :-)


I've cherry-picked that commit back to stable-4.9 on jgit:


If this does fix the problem, maybe we can upgrade gerrit stable-2.15 to use a new snapshot built off jgit stable-4.9 to avoid upgrading all the way to 4.11.

 


On Tuesday, June 5, 2018 at 10:07:49 PM UTC+1, David Ostrovsky wrote:

On Tuesday, June 5, 2018 at 9:34:23 PM UTC+2, Marcelo Ávila de Oliveira wrote:
On Wed, May 30, 2018 at 07:27, David Hayes <davi...@gmail.com> wrote:
Whilst the bug is being worked on, I've disabled GC in the Gerrit config. For an interim workaround, is there any danger involved in performing GC using git via the command line, rather than via Gerrit?
 
I did the same... does anyone have an answer to that question? Is it better to execute "git gc" or simply do nothing?

Until this issue is resolved, use git gc from a cron job.

Note that even without this issue, some Gerrit site admins
prefer git gc over Gerrit gc to take the whole garbage
collection out of the JVM process.
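
A minimal sketch of the cron-driven approach described above, assuming every project is a bare repository whose directory name ends in ".git" under one base directory. The function names and the base path are illustrative only and are not part of Gerrit.

```python
import os
import subprocess
import sys


def find_bare_repos(base_dir):
    """Return directories under base_dir whose names end in .git, sorted."""
    repos = []
    for root, dirs, _files in os.walk(base_dir):
        # Record *.git directories, then prune them so the walk does not
        # descend into objects/, refs/ and so on.
        repos.extend(os.path.join(root, d) for d in dirs if d.endswith(".git"))
        dirs[:] = [d for d in dirs if not d.endswith(".git")]
    return sorted(repos)


def gc_command(repo):
    """Build the native git gc invocation for one bare repository."""
    return ["git", f"--git-dir={repo}", "gc"]


def run_gc(base_dir):
    """Run git gc on each repository; return the repositories where it failed."""
    failed = []
    for repo in find_bare_repos(base_dir):
        result = subprocess.run(gc_command(repo), capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(repo)
            print(f"gc failed for {repo}: {result.stderr.strip()}", file=sys.stderr)
    return failed
```

A small wrapper script that calls run_gc("/path/to/gerrit/git") (a placeholder path) could then be scheduled from the crontab of the user that owns the repositories. Because git runs as a separate process here, its file handles never count against the Gerrit JVM.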

luca.mi...@gmail.com

Jun 9, 2018, 5:54:17 AM
to David Pursehouse, Repo and Gerrit Discussion


On 9 Jun 2018, at 09:36, David Pursehouse <david.pu...@gmail.com> wrote:

I've cherry-picked that commit back to stable-4.9 on jgit:


If this does fix the problem, maybe we can upgrade gerrit stable-2.15 to use a new snapshot built off jgit stable-4.9 to avoid upgrading all the way to 4.11.

+1

David Pursehouse

Jun 9, 2018, 6:21:52 AM
to lucamilanesio, Repo and Gerrit Discussion
On Sat, Jun 9, 2018 at 5:36 PM David Pursehouse <david.pu...@gmail.com> wrote:
I've cherry-picked that commit back to stable-4.9 on jgit:


If this does fix the problem, maybe we can upgrade gerrit stable-2.15 to use a new snapshot built off jgit stable-4.9 to avoid upgrading all the way to 4.11.

I've built a new JGit snapshot off that commit and uploaded a change for gerrit stable-2.15:


Can someone check if it fixes the problem?

Natalie Chen

Jun 14, 2018, 2:35:41 AM
to Repo and Gerrit Discussion
Hi, that fixed the problem, thanks.

Luca Milanesio

Jun 14, 2018, 4:03:06 AM
to Repo and Gerrit Discussion, Luca Milanesio, Natalie Chen
Shall we release Gerrit v2.15.3?

This was quite a serious bug: the GC scheduler in Gerrit is widely used, and this bug was leading to production outages.
Glad it is fixed now in v2.15.

Luca.

David Pursehouse

Jun 14, 2018, 5:55:23 AM
to Luca Milanesio, Repo and Gerrit Discussion, Natalie Chen
On Thu, Jun 14, 2018 at 5:03 PM Luca Milanesio <luca.mi...@gmail.com> wrote:
Shall we release Gerrit v2.15.3?


I'd like to wait a bit longer, if possible.  The support for Elasticsearch versions 5 and 6 was merged up from stable-2.14 into stable-2.15, but there are still a few issues open that would be nice to fix before we announce a release that includes the support.

Makson Lee

Jun 19, 2018, 12:47:59 AM
to Repo and Gerrit Discussion
We got a corrupted pack file after an external jgit gc, and cannot find any related error in the OS system log. Our Gerrit version is 2.15.2; the external JGit version is 4.11.0.201803080745-r.

[2018-06-19 11:29:37,569] [SSH git-upload-pack /platform/amss/SDM660.LA.2.1 (username)] WARN  org.eclipse.jgit.internal.storage.file.ObjectDirectory : Pack file /var/cache/git/platform/amss/SDM660.LA.2.1.git/objects/pack/pack-23cb20607a7d56b464bd6ade74db59e158d90397.pack is corrupt, removing it from pack list
[2018-06-19 11:29:37,570] [SSH git-upload-pack /platform/amss/SDM660.LA.2.1 (username)] WARN  com.google.gerrit.server.project.ProjectCacheImpl : Cannot read project platform/amss/SDM660.LA.2.1
java.util.concurrent.ExecutionException: org.eclipse.jgit.errors.MissingObjectException: Missing unknown 98cb547e884c103b51e1c60accb2c34d69fb78cc
        at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:503)
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:462)
        at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:79)
        at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:142)
        at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2453)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2417)
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2299)
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2212)
        at com.google.common.cache.LocalCache.get(LocalCache.java:4147)
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4151)
        at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5140)
        at com.google.gerrit.server.project.ProjectCacheImpl.strictCheckedGet(ProjectCacheImpl.java:160)
        at com.google.gerrit.server.project.ProjectCacheImpl.checkedGet(ProjectCacheImpl.java:142)
        at com.google.gerrit.server.project.ProjectControl$GenericFactory.controlFor(ProjectControl.java:87)
        at com.google.gerrit.server.args4j.ProjectControlHandler.parseArguments(ProjectControlHandler.java:82)
        at org.kohsuke.args4j.CmdLineParser.parseArgument(CmdLineParser.java:508)
        at com.google.gerrit.util.cli.CmdLineParser.parseArgument(CmdLineParser.java:222)
        at com.google.gerrit.sshd.BaseCommand.parseCommandLine(BaseCommand.java:237)
        at com.google.gerrit.sshd.BaseCommand.parseCommandLine(BaseCommand.java:216)
        at com.google.gerrit.sshd.AbstractGitCommand$1.executeParseCommand(AbstractGitCommand.java:62)
        at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:448)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:432)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.eclipse.jgit.errors.MissingObjectException: Missing unknown 98cb547e884c103b51e1c60accb2c34d69fb78cc
        at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:163)
        at org.eclipse.jgit.lib.ObjectReader.open(ObjectReader.java:234)
        at org.eclipse.jgit.revwalk.RevWalk.parseAny(RevWalk.java:859)
        at org.eclipse.jgit.revwalk.RevWalk.parseCommit(RevWalk.java:772)
        at com.google.gerrit.server.git.VersionedMetaData.load(VersionedMetaData.java:160)
        at com.google.gerrit.server.git.VersionedMetaData.load(VersionedMetaData.java:136)
        at com.google.gerrit.server.git.VersionedMetaData.load(VersionedMetaData.java:116)
        at com.google.gerrit.server.project.ProjectCacheImpl$Loader.load(ProjectCacheImpl.java:312)
        at com.google.gerrit.server.project.ProjectCacheImpl$Loader.load(ProjectCacheImpl.java:294)
        at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3708)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2416)
        ... 23 more

luca.mi...@gmail.com

Jun 19, 2018, 2:08:05 AM
to Makson Lee, Repo and Gerrit Discussion
Have you run a git Fick in the repo?
Sometimes JGit “thinks” it is corrupt, but in reality it is only a caching issue.

Luca

Makson Lee

Jun 19, 2018, 2:17:08 AM
to Repo and Gerrit Discussion
It is working now after restarting the Gerrit instance, so it should be a caching issue?

luca.mi...@gmail.com

Jun 19, 2018, 3:10:02 AM
to Makson Lee, Repo and Gerrit Discussion


On 19 Jun 2018, at 07:17, Makson Lee <cdle...@gmail.com> wrote:

It is working now after restarting the Gerrit instance, so it should be a caching issue?

Yes


On Tuesday, June 19, 2018 at 2:08:05 PM UTC+8, lucamilanesio wrote:
Have you run a git Fick in the repo?

Sorry, I meant fsck
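
A rough sketch of the check suggested here, assuming native git is on the PATH and the repository is bare. The helper names are made up for illustration. If git fsck exits with status 0 while Gerrit still reports the pack as corrupt, that points towards stale state inside the running JVM rather than damage on disk.

```python
import subprocess


def fsck_command(git_dir):
    """Build the git fsck invocation for a bare repository."""
    return ["git", f"--git-dir={git_dir}", "fsck", "--full"]


def repo_is_healthy(git_dir):
    """Return (ok, output); ok is True when git fsck exits with status 0."""
    result = subprocess.run(fsck_command(git_dir), capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr
```

For example, repo_is_healthy("/var/cache/git/platform/amss/SDM660.LA.2.1.git") would check the repository named in the log above.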

Makson Lee

Jun 19, 2018, 3:17:03 AM
to Repo and Gerrit Discussion


On Tuesday, June 19, 2018 at 3:10:02 PM UTC+8, lucamilanesio wrote:


On 19 Jun 2018, at 07:17, Makson Lee <cdle...@gmail.com> wrote:

It is working now after restarting the Gerrit instance, so it should be a caching issue?

Yes


So how can we avoid it? We haven't seen this issue in version 2.14.4 using the internal GC.

David Pursehouse

Jun 19, 2018, 3:21:04 AM
to Makson Lee, Repo and Gerrit Discussion
On Tue, Jun 19, 2018 at 4:17 PM Makson Lee <cdle...@gmail.com> wrote:


So how can we avoid it? We haven't seen this issue in version 2.14.4 using the internal GC.

Can you open a new issue for this?  It's not related to the issue in this topic.

Makson Lee

Jun 19, 2018, 3:33:42 AM
to Repo and Gerrit Discussion


On Tuesday, June 19, 2018 at 3:21:04 PM UTC+8, David Pursehouse wrote:
Can you open a new issue for this?  It's not related to the issue in this topic.

sure, issue [1] has been created.

Xin Jia

Aug 10, 2018, 2:11:25 AM
to Repo and Gerrit Discussion
Hi,

I am running gerrit 2.14.10 and seeing similar issues, but it's not related to GC.

Affected Version:
2.14.10

Before starting gerrit, 
$lsof | wc -l
891

After restarting gerrit,
$lsof | wc -l
49495

I looked at all the open files; they are all related to the "tmp" directory.

     29130   15991906 /u01/gerrit/gerrit_site/tmp/gerrit_8626170549382955020_app/asm-tree-5.1.jar
     43309   15991866 /u01/gerrit/gerrit_site/tmp/gerrit_8626170549382955020_app/asm-util-5.1.jar
   1565588   15991874 /u01/gerrit/gerrit_site/tmp/gerrit_8626170549382955020_app/auto-value-1.4.jar
    176285   15991944 /u01/gerrit/gerrit_site/tmp/gerrit_8626170549382955020_app/automaton-1.11-8.jar
    287786   15991921 /u01/gerrit/gerrit_site/tmp/gerrit_8626170549382955020_app/bcpg-jdk15on-1.56.jar
    685403   15991840 /u01/gerrit/gerrit_site/tmp/gerrit_8626170549382955020_app/bcpkix-jdk15on-1.56.jar
      8076   15991875 /u01/gerrit/gerrit_site/tmp/gerrit_8626170549382955020_app/blame-cache-0.2-1.jar

Any insight would help.

Thanks,
Xin
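
One way to see which directories dominate is to group the process's actual descriptors by parent directory. The sketch below assumes Linux and reads /proc/<pid>/fd; the function names are illustrative. A system-wide lsof also prints entries that are not descriptors (cwd, txt, mem), so its line count can be much larger than what the nofile limit measures.

```python
import os
from collections import Counter


def open_paths(pid):
    """Return the targets of the /proc/<pid>/fd symlinks (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    paths = []
    for name in os.listdir(fd_dir):
        try:
            paths.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # The descriptor was closed between listdir and readlink.
            continue
    return paths


def count_by_directory(paths):
    """Count file paths by parent directory, most common first."""
    # Sockets and pipes appear as "socket:[...]" or "pipe:[...]" and are skipped.
    counts = Counter(os.path.dirname(p) for p in paths if p.startswith("/"))
    return counts.most_common()
```

Calling count_by_directory(open_paths(pid)) with the Gerrit JVM's pid, run as the same user or as root, shows whether the tmp directory or the objects/pack directories hold most of the handles.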

Yann COLLETTE

Oct 11, 2018, 10:10:38 AM
to Repo and Gerrit Discussion
I still have this problem with 2.15.5:

# lsof | grep git | wc -l
28037
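
To judge a number like this against the configured limit, it may help to compare the process's descriptor count with its own soft limit. The following is a sketch under the assumption of a Linux /proc filesystem; the helper names are hypothetical.

```python
import os


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # Row layout: Max open files <soft> <hard> files
            # A value of "unlimited" would raise ValueError here.
            return int(fields[3]), int(fields[4])
    raise ValueError("no 'Max open files' row found")


def fd_usage(pid):
    """Return (open_fds, soft_limit) for a process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        soft, _hard = parse_max_open_files(f.read())
    return len(os.listdir(f"/proc/{pid}/fd")), soft
```

Sampling fd_usage for the Gerrit pid every few minutes would show whether the count climbs steadily towards the limit (a leak) or merely spikes under load.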

Luca Milanesio

Oct 11, 2018, 10:13:08 AM
to Yann COLLETTE, Luca Milanesio, Repo and Gerrit Discussion
Yep, we know. 

It is currently being investigated; in the meantime, just roll back to 2.15.4 until it is fixed.

Luca.