clones over https get temporary 503 errors from gerrit

Darragh Bailey

Dec 2, 2014, 8:17:27 AM

to repo-d...@googlegroups.com, Darragh Bailey

Hi,

An internal Gerrit instance we've been running has, on random occasions, been returning 503 "Service Unavailable" errors. I'm trying to get to the bottom of this, as I suspect it's just a lack of understanding and some misconfiguration on my part.

I'm using apache to terminate the SSL connections and act as a reverse proxy to my gerrit instance.

As far as I can tell from examining the apache error logs, access logs and gerrit error logs, the 503 errors are being generated immediately by gerrit and simply passed back by apache (no 503s are recorded in the apache error log, only in the access log).

gerrit configuration settings:
[sshd]
        listenAddress = *:29418
        threads = 96
        batchThreads = 16    # when it says non interactive, is that just replication & log compressor? or also accounts using stream-events?
        streamThreads = 6    # Not really sure this is necessary or beneficial?
[httpd]
        listenUrl = proxy-https://localhost:8081/

I previously tried configuring httpd.maxThreads=48 and httpd.maxQueued=96, and then ensured that apache's ServerLimit and ClientLimit settings (using the prefork MPM) were set to the sum of these (144).
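
For reference, that earlier attempt corresponds roughly to the following gerrit.config fragment. It is only a sketch: the two values are the ones quoted above, listenUrl is copied from the configuration already shown, and everything else is assumed to be left at its default.

```ini
[httpd]
        listenUrl = proxy-https://localhost:8081/
        # Worker threads that actually service HTTP requests.
        maxThreads = 48
        # Requests allowed to wait for a free worker thread.
        maxQueued = 96
        # 48 + 96 = 144, the figure the apache limits were matched to.
```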

My assumption was that any connections beyond the sum of those settings would be queued in apache until its timeout was reached, and that any connection accepted into gerrit's httpd queue would wait there until the httpd.maxWait timeout was reached. This would in turn mean that, for 503 errors to occur, some connections would first have to be queued by apache for a period of time.

Given that there have been subsequent 503 errors, and that the responses were almost instant (less than 1s after the request), I'm wondering whether httpd.maxThreads and httpd.maxQueued are used at all when running behind a reverse proxy?

Suppose those settings are irrelevant, and what actually happens is that the httpd.acceptorThreads receive the TCP request and try to allocate a thread directly from sshd.threads when accepting requests via the reverse proxy listener. That would result in an immediate 503 response whenever all 96 threads happen to be consumed, which would explain the behaviour I'm seeing.

I could just decrease the values of ServerLimit and ClientLimit in apache until the problem disappears or results in a different problem, but I'd appreciate some help in understanding how the various settings interact.

--
Darragh

Zaro

Dec 2, 2014, 12:48:26 PM

to Darragh Bailey, repo-d...@googlegroups.com, Darragh Bailey

Darragh, What version of gerrit are you using?

Darragh Bailey

Dec 2, 2014, 2:05:50 PM

to repo-d...@googlegroups.com, daragh...@gmail.com, dba...@hp.com

Hey Khai,

On Tuesday, December 2, 2014 5:48:26 PM UTC, Khai Do wrote:

Darragh, What version of gerrit are you using?

One that was clearly too old! It seems the version we have running had a bug whereby the 5m default for httpd.maxWait was treated as 5ms. So it doesn't take much load for such timeouts to occur :p

It also means all of my reasoning around httpd.maxThreads and httpd.maxQueued was incorrect: they do still play a role when using the reverse proxy, which makes far more sense!

I decided to look around for the openstack config ;) and found the following with some interesting comments:
https://github.com/openstack-infra/system-config/blob/1c0b2915f4d8d5cd490b31472ae4cdeb6d3a4a42/modules/openstack_project/manifests/review.pp

It points out the bug in the version of gerrit we were running (2.4.2) and gives a few more suggestions on how to configure some of the various settings. I've configured httpd.maxWait to be "300000m" so that, with the bug in place, it yields the intended timeout of 5 minutes (300000 ms).
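
For anyone stuck on the same release, the workaround amounts to something like the fragment below. This is only a sketch based on the description above: the value is the one quoted, listenUrl is copied from the earlier configuration, and the comment assumes the affected version reads the number as milliseconds regardless of the unit suffix.

```ini
[httpd]
        listenUrl = proxy-https://localhost:8081/
        # Workaround for the affected release (2.4.2 here): the number is
        # read as milliseconds, so 300000 gives 300000 ms = 5 minutes.
        maxWait = 300000m
```

On a version without the bug, the same line would presumably be read as 300000 minutes, so the value should be revisited after upgrading.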

The local apache config has also been tweaked to allow fewer connections, since it's now clear that we probably don't need that many.

Guess that teaches us about not upgrading in a timely manner...

--
Darragh
