Hi,
An internal Gerrit instance we've been running has, at random intervals, been returning 503 Service Unavailable errors. I'm trying to get to the bottom of this, as I suspect it's down to a lack of understanding and misconfiguration on my part.
I'm using Apache to terminate the SSL connections and act as a reverse proxy to my Gerrit instance.
As far as I can tell, the 503 errors are generated immediately by Gerrit and passed back by Apache, based on examination of the Apache error logs, access logs and Gerrit error logs (no 503s are recorded in the Apache error log, only in the access log).
Gerrit configuration settings:
[sshd]
listenAddress = *:29418
threads = 96
batchThreads = 16 # when it says non interactive, is that just replication & log compressor? or also accounts using stream-events?
streamThreads = 6 # Not really sure this is necessary or beneficial?
[httpd]
listenUrl = proxy-https://localhost:8081/
I previously tried configuring httpd.maxThreads=48 and httpd.maxQueued=96, and then ensuring that Apache's ServerLimit and ClientLimit settings (using the prefork MPM) were set to the sum of these (144).
My assumption was that exceeding the sum of those settings would result in connections being queued in Apache until its timeout was reached or, if a connection had been accepted into the httpd queue in Gerrit, until the httpd.maxWait timeout was reached. This would in turn mean that, for 503 errors to occur, some connections would first have to be queued by Apache for a period of time.
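For illustration, the httpd section with those limits would look roughly like the sketch below. The listenUrl, maxThreads and maxQueued values are the ones mentioned above; maxWait appears only as a commented-out placeholder because no value for it is given here.

```ini
[httpd]
	listenUrl = proxy-https://localhost:8081/
	# Maximum number of threads in the HTTP worker pool.
	maxThreads = 48
	# Maximum number of connections allowed to queue for a free worker thread.
	maxQueued = 96
	# maxWait = <value not given in this post>
```

With these values, 48 + 96 = 144, which is the figure used for the Apache limits.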
Given that there have been subsequent 503 errors and the responses were almost instant (response < 1s after request), I'm wondering whether httpd.maxThreads and httpd.maxQueued are used at all when running behind a reverse proxy?
If those settings are irrelevant, then perhaps what's happening is that httpd.acceptorThreads receives the TCP request and attempts to allocate a thread directly from sshd.threads when accepting requests via the reverse proxy listener. That would result in an immediate 503 response if for some reason all 96 threads are consumed, which would explain the behaviour I'm seeing.
I could just decrease the values of ServerLimit and ClientLimit in Apache until the problem disappears or turns into a different problem, but I'd appreciate some help in understanding how the various settings interact.
--
Darragh