Unavailable backend seems to be causing Internal Server Error (500) in Apache

Gavin Grieve

Sep 2, 2013, 11:02:56 PM

to th...@googlegroups.com

Running Thruk v1.76-2 on Ubuntu 12.04.2 LTS

We have a Thruk master with 9 backends configured (1 local, 8 remote). When one of the backends becomes unreachable (the network between us and them disappears), we get a 500 server error instead of an error message or Thruk ignoring that backend while still displaying results from the others. Thruk is then completely unusable until either the backend is reachable again or we remove it from the thruk_local.conf file and restart Thruk or Apache. This happens even when other backends are selected and working normally.

From /var/log/thruk/error.log:
[2013/09/03 14:26:22][master][ERROR][Thruk.Controller.error] No Backend available
[2013/09/03 14:26:22][master][ERROR][Thruk.Controller.error] on page: https://master.server.name/thruk/cgi-bin/status.cgi?host=all&_=1378175165120
[2013/09/03 14:26:22][master][ERROR][Thruk.Controller.error] remotehost: ERROR: failed to connect (remotehost:6557)
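
The "failed to connect" line can be checked independently of Thruk. Below is a minimal sketch in Python, assuming the backends are reached over plain TCP as the remotehost:6557 address in the log suggests. The function names, the backend dictionary and the 3-second timeout are placeholders for illustration, not anything taken from Thruk or from our configuration.

```python
import socket


def backend_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and connects; the context
        # manager closes the socket again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def check_backends(backends, timeout=3.0):
    """Map each backend name to True (reachable) or False (unreachable)."""
    return {
        name: backend_reachable(host, port, timeout)
        for name, (host, port) in backends.items()
    }


if __name__ == "__main__":
    # Hypothetical backend list; replace with the real peers.
    backends = {"remotehost": ("remotehost", 6557)}
    for name, ok in check_backends(backends).items():
        print(name, "reachable" if ok else "UNREACHABLE")
```

Running this from the master while a backend is down should show whether the failure is purely at the network level, which would match the log above.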

From /var/log/apache/error.log:
[Tue Sep 03 14:26:03 2013] [warn] [client 1.2.3.4] mod_fcgid: error reading data, FastCGI server closed connection, referer: https://master.server.name/thruk/side.html
[Tue Sep 03 14:26:03 2013] [error] [client 1.2.3.4] Premature end of script headers: fcgid_env.sh, referer: https://master.server.name/thruk/side.html
[Tue Sep 03 14:26:30 2013] [warn] [client 1.2.3.4] mod_fcgid: error reading data, FastCGI server closed connection
[Tue Sep 03 14:26:30 2013] [error] [client 1.2.3.4] Premature end of script headers: fcgid_env.sh

I hadn't seen this happen before updating Thruk to v1.76-2 this past weekend, so I suspect it could be related to https://github.com/sni/Thruk/commit/0e4b54eba1c22391f4a03dffd909c985a627e9da (show error instead of empty result for a single failed instance).

As we monitor the remote hosts directly from our central master server, would defining check_local_states=1 and then setting the state_host for each backend be a potential workaround for this? Or is this related to the fact that it's now calling die() for all error conditions rather than just fatal ones?

--
Gavin Grieve

Sven Nierlein

Sep 7, 2013, 6:36:20 AM

to th...@googlegroups.com

Hi Gavin,

Could you check whether setting
connection_pool_size = 0
in your thruk_local.conf temporarily fixes the problem?
I think it's related to the connection pool.

Bye,
Sven

Gavin Grieve

Sep 7, 2013, 7:00:08 AM

to th...@googlegroups.com

I've enabled that option on our main server. Will see how it goes.

Thanks.
