Hi all,
We are running Gerrit replication over the git protocol through SSH tunnels. On the slaves, git-daemon is started with the following options:
--verbose --syslog --reuseaddr --export-all --enable=receive-pack --listen=127.0.0.1
This setup worked fine for about two weeks, but now we are seeing the following replication errors on the master:
ERROR com.googlesource.gerrit.plugins.replication.ReplicationQueue (PushOne.java:228): Cannot replicate to git://localhost:31300/repo_name.git; repository not found
On the slaves, syslog shows two different errors from git-daemon:
"Too many children, dropping connection"
"fatal: failed to read object ccb9b07703f5f70dcc32ed8f29d2d24ea3423ca3: Too many open files"
Indeed, git-daemon keeps forking child processes without cleaning them up: ps aux | grep git-daemon | wc -l reports 33.
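One caveat on that count: ps aux | grep git-daemon | wc -l also matches the grep process itself and lumps the listening parent together with its forked children. Below is a rough Python sketch that counts them separately from ps -eo pid,ppid,args output. It assumes Linux with procps ps, and the helper names (is_daemon, summarize, read_ps, near_limit) are hypothetical. It compares the child count with git-daemon's default --max-connections of 32. As far as we know, that limit is what triggers "Too many children, dropping connection".

```python
"""Count git-daemon parent and child processes (rough sketch, Linux/procps assumed)."""
import subprocess

# git-daemon's default --max-connections value (0 means unlimited).
DEFAULT_MAX_CONNECTIONS = 32


def is_daemon(args: str) -> bool:
    """True if the command line is git-daemon itself, not e.g. 'grep git-daemon'."""
    tokens = args.split()
    if not tokens:
        return False
    prog = tokens[0].rsplit("/", 1)[-1]  # drop any leading path
    # Match both the 'git-daemon ...' and 'git daemon ...' forms.
    return prog == "git-daemon" or (prog == "git" and tokens[1:2] == ["daemon"])


def summarize(ps_output: str) -> dict:
    """Return {'parents': n, 'children': n} from 'ps -eo pid,ppid,args' text."""
    procs = {}  # pid -> ppid, for git-daemon processes only
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        # Skip the header line and anything malformed.
        if len(parts) < 3 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue
        pid, ppid, args = int(parts[0]), int(parts[1]), parts[2]
        if is_daemon(args):
            procs[pid] = ppid
    # A child is a git-daemon process whose parent is also a git-daemon process.
    children = sum(1 for ppid in procs.values() if ppid in procs)
    return {"parents": len(procs) - children, "children": children}


def near_limit(counts: dict, max_connections: int = DEFAULT_MAX_CONNECTIONS) -> bool:
    """True if the number of children has reached the connection limit."""
    return counts["children"] >= max_connections


def read_ps() -> str:
    """Return the output of 'ps -eo pid,ppid,args' on this host."""
    return subprocess.run(["ps", "-eo", "pid,ppid,args"],
                          capture_output=True, text=True, check=True).stdout
```

On a slave, summarize(read_ps()) returns the counts, and near_limit(...) on the result tells whether the daemon is at its cap.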
Replication is configured to use up to 3 threads per remote. At the same time, the Gerrit queue does not show anywhere near that many replication tasks, not even in total across all remotes, let alone per remote.
Eventually git-daemon itself dies:
git-daemon status
Checking for service git-daemon dead
If we kill the child git-daemon processes and restart the git-daemon service, the problem comes back after a while.
Here is a sample remote definition:
[remote "name"]
url = git://localhost:30500/${name}.git
adminUrl = gerrit-slave-E:/path/gerrit/data/git/${name}.git
threads = 3
replicationDelay = 0
mirror = true
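To double-check how many concurrent pushes the master may open per remote, the definition above can be read with a small Python sketch. It uses configparser only as an approximation of git-config syntax. It does not follow every git-config rule; for example, repeated keys simply overwrite each other. It also assumes threads defaults to 1 when omitted. remote_threads is a hypothetical helper, not part of Gerrit.

```python
"""Read per-remote thread counts from a replication.config fragment (sketch)."""
import configparser

SAMPLE = """
[remote "name"]
url = git://localhost:30500/${name}.git
adminUrl = gerrit-slave-E:/path/gerrit/data/git/${name}.git
threads = 3
replicationDelay = 0
mirror = true
"""


def remote_threads(text: str) -> dict:
    """Map each remote name to its configured push thread count."""
    # No interpolation (so ${name} stays literal), tolerate repeated keys,
    # and split only on '=' because values such as adminUrl contain ':'.
    cp = configparser.ConfigParser(interpolation=None, strict=False,
                                   delimiters=("=",))
    cp.read_string(text)
    result = {}
    for section in cp.sections():
        kind, _, quoted = section.partition(" ")
        if kind != "remote":
            continue
        # Assumption: threads defaults to 1 when it is not set.
        result[quoted.strip('"')] = cp.getint(section, "threads", fallback=1)
    return result
```

Here remote_threads(SAMPLE) gives {'name': 3}. If each slave's git-daemon serves only this one remote, the master should hold at most 3 connections to that daemon at a time. That is far below 32 children, which points to connections that never finish rather than to replication load.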
Gerrit version: 2.7
git version: 1.7.12.4
Has anyone faced a similar problem? What could be the root cause?
Thanks!