Naemon/Thruk server completely stuck

Fabrice Le Dorze

May 27, 2015, 5:00:12 AM
to th...@googlegroups.com
Hello,
This morning we had an issue with our main Naemon/Thruk console, which is connected to 12 backends.
It was completely stuck: we could not log in or change the view.
The top command showed a load of around 34!
I have written a reset script for our NOC operators (see below) so that they can solve the problem themselves, especially outside business hours.
It fixed the issue and the load decreased rapidly.
I'm looking for the root cause, as it is not the first time this has happened.
It seems that Thruk is trying to kill processes but does not succeed.

Any ideas?

I have attached log extracts to help.

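#!/bin/bash
source /etc/profile
# Remove the Thruk token and cache files
rm -rf /var/lib/naemon/thruk/token /var/cache/naemon/thruk/thruk.cache
# Kill any remaining Thruk FastCGI workers
pkill -f /usr/share/naemon/script/thruk_fastcgi.pl
/usr/sbin/service apache2 restart
status=$?   # save the restart status before echo overwrites $?
echo
if [[ $status -eq 0 ]]
then
    echo "Thruk reinitialization succeeded."
else
    echo "Thruk reinitialization failed."
fi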



thruk_log.txt

Fabrice Le Dorze

May 29, 2015, 3:05:37 AM
to th...@googlegroups.com
I forgot to attach the Apache log extract, in which the errors can be seen.

Sven Nierlein

May 29, 2015, 3:13:01 AM
to th...@googlegroups.com
On 29/05/15 9:05, Fabrice Le Dorze wrote:
> I forgot to join apache log extract, in which we can see the errors

This didn't work either. The thruk.log looks ok.

Sven

Fabrice Le Dorze

May 29, 2015, 3:15:51 AM
to th...@googlegroups.com

Fabrice Le Dorze

May 29, 2015, 3:31:47 AM
to th...@googlegroups.com
I sent you the Apache log by mail.

Sven Nierlein

May 29, 2015, 4:32:58 AM
to th...@googlegroups.com
On 29/05/15 9:31, Fabrice Le Dorze wrote:
> I sent you the apache log by mail .

I didn't get anything. Could you pastebin or gist the relevant error?

Fabrice Le Dorze

Jun 11, 2015, 11:00:55 AM
to th...@googlegroups.com
It happened again today.
The server goes haywire with a high load, and
we can see messages like the ones below in the Apache error log.

However, I cannot determine the root cause.


[Thu Jun 11 09:36:28 2015] [warn] [client ::1] mod_fcgid: read data timeout in 120 seconds
Thread creation failed: pthread_create returned 11 at lib/Thruk/Pool/Simple.pm line 25.
Thread creation failed: pthread_create returned 11 at lib/Thruk/Pool/Simple.pm line 25.
Thread creation failed: pthread_create returned 11 at lib/Thruk/Pool/Simple.pm line 25.
Thread creation failed: pthread_create returned 11 at lib/Thruk/Pool/Simple.pm line 25.
Thread creation failed: pthread_create returned 11 at lib/Thruk/Pool/Simple.pm line 25.
[Thu Jun 11 09:36:29 2015] [error] [client ::1] Premature end of script headers: fcgid_env.sh
[Thu Jun 11 09:36:32 2015] [warn] [client 172.27.0.24] mod_fcgid: error reading data, FastCGI server closed connection, referer: http://hypervision0/naemon/cgi-bin/login.cgi?naemon/
[Thu Jun 11 09:36:36 2015] [error] [client 172.27.0.24] Premature end of script headers: fcgid_env.sh, referer: http://hypervision0/naemon/cgi-bin/login.cgi?naemon/
[Thu Jun 11 09:36:41 2015] [warn] mod_fcgid: process 13157 graceful kill fail, sending SIGKILL
[Thu Jun 11 09:37:22 2015] [warn] [client 172.27.0.67] mod_fcgid: read data timeout in 120 seconds, referer: http://hypervision0.inf.rms.loc/naemon/cgi-bin/status.cgi?hst_s0_value=SMH_%7CDSI_%7CCEAC_%7CDESD_%7CLVM_%7CVILM_&hst_s0_value=GOP_(ECT%7CECU%7CERB%7CHON%7CITI%7CJOR%7CLIL%7CLOU%7CMIL%7CNAN%7CNCE%7CORG%7CPEV%7CRUM%7CSBC%7CSEL%7CSHA%7CSHE%7CSJE%7CSPA%7CSPP%7CVER)&svc_s0_servicestatustypes=20&hst_s0_op=!~&hst.....
.....
[Thu Jun 11 09:38:02 2015] [warn] [client 172.27.0.24] mod_fcgid: read data timeout in 120 seconds, referer: http://hypervision0/naemon/cgi-bin/login.cgi?naemon/
[Thu Jun 11 09:38:02 2015] [error] [client 172.27.0.24] Premature end of script headers: fcgid_env.sh, referer: http://hypervision0/naemon/cgi-bin/login.cgi?naemon/
[Thu Jun 11 09:38:07 2015] [warn] mod_fcgid: process 30494 graceful kill fail, sending SIGKILL
[Thu Jun 11 09:38:13 2015] [warn] [client ::1] mod_fcgid: read data timeout in 120 seconds
[Thu Jun 11 09:38:13 2015] [warn] mod_fcgid: process 30551 graceful kill fail, sending SIGKILL
[Thu Jun 11 09:38:14 2015] [warn] [client ::1] mod_fcgid: read data timeout in 120 seconds
[Thu Jun 11 09:38:14 2015] [warn] [client ::1] mod_fcgid: read data timeout in 120 seconds
[Thu Jun 11 09:38:13 2015] [warn] [client ::1] mod_fcgid: read data timeout in 120 seconds
[Thu Jun 11 09:38:14 2015] [error] [client ::1] Premature end of script headers: fcgid_env.sh
[Thu Jun 11 09:38:15 2015] [error] [client ::1] Premature end of script headers: fcgid_env.sh
[Thu Jun 11 09:38:15 2015] [error] [client ::1] Premature end of script headers: fcgid_env.sh
[Thu Jun 11 09:38:17 2015] [error] [client ::1] Premature end of script headers: fcgid_env.sh
[Thu Jun 11 09:38:31 2015] [error] [client 172.27.0.103] File does not exist: /var/www/favicon.ico
[Thu Jun 11 09:38:32 2015] [error] [client 172.27.0.103] File does not exist: /var/www/favicon.ico
[Thu Jun 11 09:39:18 2015] [error] [client 172.27.0.72] File does not exist: /var/www/favicon.ico
[Thu Jun 11 09:39:19 2015] [error] [client 172.27.0.72] File does not exist: /var/www/favicon.ico
[Thu Jun 11 09:39:30 2015] [warn] [client 172.27.0.104] mod_fcgid: error reading data, FastCGI server closed connection, referer: http://hypervision0/naemon/cgi-bin/status.cgi?style=combined&nav=&hidesearch=2&hidetop=1&title=Vue%20Supervision&title=VILM&hst_s0_hoststatustypes=4&hst_s0_servicestatustypes=31&hst_s0_hostprops=4466730&hst_s0_serviceprops=0&hst_s0_type=host&hst_s.....
....
[Thu Jun 11 16:20:17 2015] [warn] [client 172.27.0.81] mod_fcgid: read data timeout in 120 seconds, referer: http://hypervision0.inf.rms.loc/naemon/side.html
[Thu Jun 11 16:20:17 2015] [error] [client 172.27.0.81] Premature end of script headers: fcgid_env.sh, referer: http://hypervision0.inf.rms.loc/naemon/side.html
[Thu Jun 11 16:20:17 2015] [warn] [client 172.27.0.67] mod_fcgid: read data timeout in 120 seconds, referer: http://hypervision0.inf.rms.loc/naemon/side.html
[Thu Jun 11 16:20:18 2015] [error] [client 172.27.0.67] Premature end of script headers: fcgid_env.sh, referer: http://hypervision0.inf.rms.loc/naemon/side.html
[Thu Jun 11 16:20:24 2015] [warn] mod_fcgid: process 2997 graceful kill fail, sending SIGKILL
[Thu Jun 11 16:20:52 2015] [warn] [client 172.27.0.67] mod_fcgid: read data timeout in 120 seconds, referer: http://hypervision0.inf.rms.loc/naemon/side.html
[Thu Jun 11 16:20:52 2015] [error] [client 172.27.0.67] Premature end of script headers: fcgid_env.sh, referer: http://hypervision0.inf.rms.loc/naemon/side.html
[Thu Jun 11 16:21:35 2015] [warn] [client ::1] mod_fcgid: read data timeout in 120 seconds
[Thu Jun 11 16:21:35 2015] [error] [client ::1] Premature end of script headers: fcgid_env.sh
[Thu Jun 11 16:21:35 2015] [warn] [client 172.27.0.103] mod_fcgid: read data timeout in 120 seconds
[Thu Jun 11 16:21:36 2015] [error] [client 172.27.0.103] Premature end of script headers: fcgid_env.sh
[Thu Jun 11 16:21:39 2015] [error] [client 172.27.0.103] File does not exist: /var/www/favicon.ico
[Thu Jun 11 16:21:39 2015] [error] [client 172.27.0.103] File does not exist: /var/www/favicon.ico
[Thu Jun 11 16:21:48 2015] [warn] [client ::1] mod_fcgid: read data timeout in 120 seconds
[Thu Jun 11 16:21:48 2015] [error] [client ::1] Premature end of script headers: fcgid_env.sh
[Thu Jun 11 16:21:51 2015] [warn] [client ::1] mod_fcgid: read data timeout in 120 seconds
[Thu Jun 11 16:21:51 2015] [error] [client ::1] Premature end of script headers: fcgid_env.sh
[Thu Jun 11 16:22:00 2015] [warn] mod_fcgid: process 9520 graceful kill fail, sending SIGKILL
[Thu Jun 11 16:22:05 2015] [warn] mod_fcgid: process 9520 graceful kill fail, sending SIGKILL
[Thu Jun 11 16:22:25 2015] [warn] [client ::1] mod_fcgid: read data timeout in 120 seconds
[Thu Jun 11 16:22:25 2015] [error] [client ::1] Premature end of script headers: fcgid_env.sh
[Thu Jun 11 16:22:50 2015] [warn] [client 172.27.0.67] mod_fcgid: read data timeout in 120 seconds, referer: http://hypervision0.inf.rms.loc/naemon/side.html
[Thu Jun 11 16:22:50 2015] [error] [client 172.27.0.67] Premature end of script headers: fcgid_env.sh, referer: http://hypervision0.inf.rms.loc/naemon/side.html
[Thu Jun 11 16:22:50 2015] [warn] [client 172.27.0.70] mod_fcgid: read data timeout in 120 seconds, referer: http://hypervision0/naemon/cgi-bin/status.cgi?style=combined&nav=&hidesearch=2&hidetop=1&title=Vue%20Supervision&title=Vue%20Supervision&hst_s0_hoststatustypes=4&hst_s0_servicestatustypes=31&hst_s0_hostprops=4466730&hst_s0_serviceprops=0&hst_s0_type=host&hst_s0_type=host&hst_s0_val_pre=&hst_s0_val_pre=&hst_s0_op=!~&hst_s0_op=!~&hst_s0_value=SMH_%7CDSI_%7CCEAC_%7CDESD_%7CLVM_%7CVILM_&hst_s0_value=GOP_(ECT%7CECU%7CERB%7CHON%7CITI%7CJOR%7CLIL%7CLOU%7CMIL%7CNAN%7CNCE%7CORG%7CPEV%7CRUM%7CSBC%7CSEL%7CSHA%7CSHE%7CSJE%7CSPA%7CSPP%7CVER)&hst_s0_value_sel=5&hst_s0_value_sel=5&section=Bookmarks&newname=&bookmarksp=Bookmarks%3A%3AVILM&bookmarksp=Bookmarks%3A%3AVue%20Infra%20All&bookmarksp=Bookmarks%3A%3AVue%20Infra&bookmarksp=Bookmarks%3A%3AVue%20Supervision&bookmarksp=Bookmarks%3A%3ALVM&view_mode=html&all_col=&all_col=&host_columns=1&host_columns=2&host_columns=3&host_columns=4&host_columns=5&host_columns=6&host_columns=7&host_columns=8&host_columns=9&host_columns=10&host_columns=11&host_columns=12&host_columns=13&service_columns=1&service_columns=2&service_columns=3&service_columns=4&service_columns=5&service_columns=6&service_columns=7&service_columns=8&service_columns=9&service_columns=10&service_columns=11&service_columns=12&service_columns=13&service_columns=14&service_columns=15&service_columns=16&service_colu....
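
One way to get an overview of a log like the one above is to tally the recurring message types. The sketch below is only an illustration: the function names and the default log path are assumptions, not something taken from this thread. On Linux, errno 11 is EAGAIN, which for pthread_create usually means a resource or thread-count limit was hit, so the second function prints two limits that may be worth checking.

```shell
#!/bin/bash
# Sketch only: function names and the default log path are assumptions.

# Count how often each recurring message type appears in an Apache error log.
summarize_fcgid_log() {
    local log="${1:-/var/log/apache2/error.log}"   # assumed default path
    local pattern
    for pattern in \
        'read data timeout' \
        'Premature end of script headers' \
        'graceful kill fail' \
        'pthread_create returned 11'
    do
        # grep -c prints the number of matching lines (0 if there are none)
        printf '%6d  %s\n' "$(grep -c -- "$pattern" "$log")" "$pattern"
    done
}

# Print limits that can make pthread_create fail with EAGAIN (errno 11).
show_thread_limits() {
    echo "max user processes (ulimit -u): $(ulimit -u)"
    if [ -r /proc/sys/kernel/threads-max ]; then
        echo "kernel threads-max: $(cat /proc/sys/kernel/threads-max)"
    fi
}
```

Running `summarize_fcgid_log /path/to/error.log` prints one count per message type. A high count for the pthread_create line next to a low `ulimit -u` for the web server user would be a hint, not a proof, that a thread limit is involved.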

Sven Nierlein

Jun 11, 2015, 11:09:37 AM
to th...@googlegroups.com
The timeouts in the logs could be the result of the high load. It's hard to tell the root cause
without any relevant errors.
I once wrote a Thruk plugin that saves some debug data:
https://github.com/sni/thruk-plugin-omd
It's written for OMD, but should work with a standalone Thruk too. Just adapt the installation
instructions.
> --
> You received this message because you are subscribed to the Google Groups "Thruk" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to thruk+un...@googlegroups.com <mailto:thruk+un...@googlegroups.com>.
> For more options, visit https://groups.google.com/d/optout.

Fabrice Le Dorze

Jun 11, 2015, 4:33:03 PM
to th...@googlegroups.com
I tried it on our dev server running Naemon 1.0.3.
The Apache reload failed with:

Bareword "Thruk::ADD_SAFE_DEFAULTS" not allowed while "strict subs" in use at /etc/naemon/plugins//plugins-enabled/omd/lib/Thruk/Controller/omd.pm line 52.
Compilation failed in require at /usr/lib/naemon/perl5/Catalyst/Utils.pm line 307.
Compilation failed in require at /usr/share/naemon/script/thruk_fastcgi.pl line 28
[Thu Jun 11 22:04:08 2015] [warn] [client 10.8.0.6] (104)Connection reset by peer: mod_fcgid: error reading data from FastCGI server, referer: http://s-hypervision0.inf.rms.loc/naemon/startup.html?/naemon/cgi-bin/login.cgi&naemon/
[Thu Jun 11 22:04:08 2015] [error] [client 10.8.0.6] Premature end of script headers: fcgid_env.sh, referer: http://s-hypervision0.inf.rms.loc/naemon/startup.html?/naemon/cgi-bin/login.cgi&naemon/

Sven Nierlein

Jun 11, 2015, 4:37:04 PM
to th...@googlegroups.com
You need to clone the https://github.com/sni/thruk-plugin-omd/tree/maintain-1.x branch.
The master is already prepared for thruk 2.0.




Fabrice Le Dorze

Jun 11, 2015, 4:44:27 PM
to th...@googlegroups.com
Hmm, well. I don't really know git.

git clone https://github.com/sni/thruk-plugin-omd/tree/maintain-1.x omd
Cloning into 'omd'...
error: The requested URL returned error: 403 while accessing https://github.com/sni/thruk-plugin-omd/tree/maintain-1.x/info/refs
fatal: HTTP request failed
Is this the right syntax?

Sven Nierlein

Jun 11, 2015, 4:46:53 PM
to th...@googlegroups.com
Just run a "git checkout maintain-1.x" in the already cloned folder.
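
The clone attempt failed because the URL pointed at a GitHub web page (/tree/maintain-1.x) rather than at the repository itself. Below is a minimal sketch of both ways to end up on the branch, assuming the destination directory name omd from the command above. The function name is hypothetical.

```shell
#!/bin/bash
# Sketch only: fetch_plugin_branch is a hypothetical helper name.
# Defaults are the repository URL, branch and directory mentioned in this thread.

fetch_plugin_branch() {
    local repo="${1:-https://github.com/sni/thruk-plugin-omd}"
    local branch="${2:-maintain-1.x}"
    local dest="${3:-omd}"
    if [ -d "$dest/.git" ]; then
        # Already cloned: just switch to the branch in place
        git -C "$dest" checkout "$branch"
    else
        # Fresh clone that checks out the branch directly
        git clone --branch "$branch" "$repo" "$dest"
    fi
}
```

Calling `fetch_plugin_branch` with no arguments in the plugins directory either switches an existing omd checkout to maintain-1.x or clones that branch into a new omd directory.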


On 11/06/15 22:44, Fabrice Le Dorze wrote:
> Hum well. I don't really know git
>
> git clone https://github.com/sni/thruk-plugin-omd/tree/maintain-1.x omd
> Cloning into 'omd'...
> error: The requested URL returned error: 403 while accessing https://github.com/sni/thruk-plugin-omd/tree/maintain-1.x/info/refs
> fatal: HTTP request failed
> Is it the right syntax ?
>
>
> Le jeudi 11 juin 2015 17:09:37 UTC+2, Sven Nierlein a écrit :
>
> The timeouts in the logs could be the result of the high load. Its hard to tell the root cause
> without any relevant errors.
> I wrote a thruk plugin once, which saves some debug data:
> https://github.com/sni/thruk-plugin-omd <https://github.com/sni/thruk-plugin-omd>
> Its written for omd, but should work with a standalone thruk too. Just adopt the installation
> instructions.
>
>
>
> On 11/06/15 17:00, Fabrice Le Dorze wrote:
> > Again today
> > The server becomes crazy witha high load and
> > we can see such messages in Apache error log below.
> >
> > But I cannot determine the root cause.
> >
> >
> > [Thu Jun 11 09:36:28 2015] [warn] [client ::1] mod_fcgid: read data timeout in 120 seconds
> > Thread creation failed: pthread_create returned 11 at lib/Thruk/Pool/Simple.pm line 25.
> > Thread creation failed: pthread_create returned 11 at lib/Thruk/Pool/Simple.pm line 25.
> > Thread creation failed: pthread_create returned 11 at lib/Thruk/Pool/Simple.pm line 25.
> > Thread creation failed: pthread_create returned 11 at lib/Thruk/Pool/Simple.pm line 25.
> > Thread creation failed: pthread_create returned 11 at lib/Thruk/Pool/Simple.pm line 25.
> > [Thu Jun 11 09:36:29 2015] [error] [client ::1] Premature end of script headers: fcgid_env.sh
> > [Thu Jun 11 09:36:32 2015] [warn] [client 172.27.0.24] mod_fcgid: error reading data, FastCGI server closed connection, referer: http://hypervision0/naemon/cgi-bin/login.cgi?naemon/ <http://hypervision0/naemon/cgi-bin/login.cgi?naemon/>
> > [Thu Jun 11 09:36:36 2015] [error] [client 172.27.0.24] Premature end of script headers: fcgid_env.sh, referer: http://hypervision0/naemon/cgi-bin/login.cgi?naemon/ <http://hypervision0/naemon/cgi-bin/login.cgi?naemon/>
> > [Thu Jun 11 09:36:41 2015] [warn] mod_fcgid: process 13157 graceful kill fail, sending SIGKILL
> > [Thu Jun 11 09:37:22 2015] [warn] [client 172.27.0.67] mod_fcgid: read data timeout in 120 seconds, referer: http://hypervision0.inf.rms.loc/naemon/cgi-bin/status.cgi?hst_s0_value=SMH_%7CDSI_%7CCEAC_%7CDESD_%7CLVM_%7CVILM_&hst_s0_value=GOP_(ECT%7CECU%7CERB%7CHON%7CITI%7CJOR%7CLIL%7CLOU%7CMIL%7CNAN%7CNCE%7CORG%7CPEV%7CRUM%7CSBC%7CSEL%7CSHA%7CSHE%7CSJE%7CSPA%7CSPP%7CVER)&svc_s0_servicestatustypes=20&hst_s0_op=!~&hst... <http://hypervision0.inf.rms.loc/naemon/cgi-bin/status.cgi?hst_s0_value=SMH_%7CDSI_%7CCEAC_%7CDESD_%7CLVM_%7CVILM_&hst_s0_value=GOP_(ECT%7CECU%7CERB%7CHON%7CITI%7CJOR%7CLIL%7CLOU%7CMIL%7CNAN%7CNCE%7CORG%7CPEV%7CRUM%7CSBC%7CSEL%7CSHA%7CSHE%7CSJE%7CSPA%7CSPP%7CVER)&svc_s0_servicestatustypes=20&hst_s0_op=!~&hst...>..
> > .....
> > [Thu Jun 11 09:38:02 2015] [warn] [client 172.27.0.24] mod_fcgid: read data timeout in 120 seconds, referer: http://hypervision0/naemon/cgi-bin/login.cgi?naemon/ <http://hypervision0/naemon/cgi-bin/login.cgi?naemon/>
> > [Thu Jun 11 09:38:02 2015] [error] [client 172.27.0.24] Premature end of script headers: fcgid_env.sh, referer: http://hypervision0/naemon/cgi-bin/login.cgi?naemon/ <http://hypervision0/naemon/cgi-bin/login.cgi?naemon/>
> > [Thu Jun 11 09:38:07 2015] [warn] mod_fcgid: process 30494 graceful kill fail, sending SIGKILL
> > [Thu Jun 11 09:38:13 2015] [warn] [client ::1] mod_fcgid: read data timeout in 120 seconds
> > [Thu Jun 11 09:38:13 2015] [warn] mod_fcgid: process 30551 graceful kill fail, sending SIGKILL
> > [Thu Jun 11 09:38:14 2015] [warn] [client ::1] mod_fcgid: read data timeout in 120 seconds
> > [Thu Jun 11 09:38:14 2015] [warn] [client ::1] mod_fcgid: read data timeout in 120 seconds
> > [Thu Jun 11 09:38:13 2015] [warn] [client ::1] mod_fcgid: read data timeout in 120 seconds
> > [Thu Jun 11 09:38:14 2015] [error] [client ::1] Premature end of script headers: fcgid_env.sh
> > [Thu Jun 11 09:38:15 2015] [error] [client ::1] Premature end of script headers: fcgid_env.sh
> > [Thu Jun 11 09:38:15 2015] [error] [client ::1] Premature end of script headers: fcgid_env.sh
> > [Thu Jun 11 09:38:17 2015] [error] [client ::1] Premature end of script headers: fcgid_env.sh
> > [Thu Jun 11 09:38:31 2015] [error] [client 172.27.0.103] File does not exist: /var/www/favicon.ico
> > [Thu Jun 11 09:38:32 2015] [error] [client 172.27.0.103] File does not exist: /var/www/favicon.ico
> > [Thu Jun 11 09:39:18 2015] [error] [client 172.27.0.72] File does not exist: /var/www/favicon.ico
> > [Thu Jun 11 09:39:19 2015] [error] [client 172.27.0.72] File does not exist: /var/www/favicon.ico
> > [Thu Jun 11 09:39:30 2015] [warn] [client 172.27.0.104] mod_fcgid: error reading data, FastCGI server closed connection, referer: http://hypervision0/naemon/cgi-bin/status.cgi?style=combined&nav=&hidesearch=2&hidetop=1&title=Vue%20Supervision&title=VILM&hst_s0_hoststatustypes=4&hst_s0_servicestatustypes=31&hst_s0_hostprops=4466730&hst_s0_serviceprops=0&hst_s0_type=host&hst_s... <http://hypervision0/naemon/cgi-bin/status.cgi?style=combined&nav=&hidesearch=2&hidetop=1&title=Vue%20Supervision&title=VILM&hst_s0_hoststatustypes=4&hst_s0_servicestatustypes=31&hst_s0_hostprops=4466730&hst_s0_serviceprops=0&hst_s0_type=host&hst_s...>..
> > ....
> > [Thu Jun 11 16:20:17 2015] [warn] [client 172.27.0.81] mod_fcgid: read data timeout in 120 seconds, referer: http://hypervision0.inf.rms.loc/naemon/side.html <http://hypervision0.inf.rms.loc/naemon/side.html>
> > [Thu Jun 11 16:20:17 2015] [error] [client 172.27.0.81] Premature end of script headers: fcgid_env.sh, referer: http://hypervision0.inf.rms.loc/naemon/side.html <http://hypervision0.inf.rms.loc/naemon/side.html>
> > [Thu Jun 11 16:20:17 2015] [warn] [client 172.27.0.67] mod_fcgid: read data timeout in 120 seconds, referer: http://hypervision0.inf.rms.loc/naemon/side.html <http://hypervision0.inf.rms.loc/naemon/side.html>
> > [Thu Jun 11 16:20:18 2015] [error] [client 172.27.0.67] Premature end of script headers: fcgid_env.sh, referer: http://hypervision0.inf.rms.loc/naemon/side.html <http://hypervision0.inf.rms.loc/naemon/side.html>
> > [Thu Jun 11 16:20:24 2015] [warn] mod_fcgid: process 2997 graceful kill fail, sending SIGKILL
> > [Thu Jun 11 16:20:52 2015] [warn] [client 172.27.0.67] mod_fcgid: read data timeout in 120 seconds, referer: http://hypervision0.inf.rms.loc/naemon/side.html <http://hypervision0.inf.rms.loc/naemon/side.html>
> > [Thu Jun 11 16:20:52 2015] [error] [client 172.27.0.67] Premature end of script headers: fcgid_env.sh, referer: http://hypervision0.inf.rms.loc/naemon/side.html <http://hypervision0.inf.rms.loc/naemon/side.html>
> > [Thu Jun 11 16:21:35 2015] [warn] [client ::1] mod_fcgid: read data timeout in 120 seconds
> > [Thu Jun 11 16:21:35 2015] [error] [client ::1] Premature end of script headers: fcgid_env.sh
> > [Thu Jun 11 16:21:35 2015] [warn] [client 172.27.0.103] mod_fcgid: read data timeout in 120 seconds
> > [Thu Jun 11 16:21:36 2015] [error] [client 172.27.0.103] Premature end of script headers: fcgid_env.sh
> > [Thu Jun 11 16:21:39 2015] [error] [client 172.27.0.103] File does not exist: /var/www/favicon.ico
> > [Thu Jun 11 16:21:39 2015] [error] [client 172.27.0.103] File does not exist: /var/www/favicon.ico
> > [Thu Jun 11 16:21:48 2015] [warn] [client ::1] mod_fcgid: read data timeout in 120 seconds
> > [Thu Jun 11 16:21:48 2015] [error] [client ::1] Premature end of script headers: fcgid_env.sh
> > [Thu Jun 11 16:21:51 2015] [warn] [client ::1] mod_fcgid: read data timeout in 120 seconds
> > [Thu Jun 11 16:21:51 2015] [error] [client ::1] Premature end of script headers: fcgid_env.sh
> > [Thu Jun 11 16:22:00 2015] [warn] mod_fcgid: process 9520 graceful kill fail, sending SIGKILL
> > [Thu Jun 11 16:22:05 2015] [warn] mod_fcgid: process 9520 graceful kill fail, sending SIGKILL
> > [Thu Jun 11 16:22:25 2015] [warn] [client ::1] mod_fcgid: read data timeout in 120 seconds
> > [Thu Jun 11 16:22:25 2015] [error] [client ::1] Premature end of script headers: fcgid_env.sh
> > [Thu Jun 11 16:22:50 2015] [warn] [client 172.27.0.67] mod_fcgid: read data timeout in 120 seconds, referer: http://hypervision0.inf.rms.loc/naemon/side.html <http://hypervision0.inf.rms.loc/naemon/side.html>
> > [Thu Jun 11 16:22:50 2015] [error] [client 172.27.0.67] Premature end of script headers: fcgid_env.sh, referer: http://hypervision0.inf.rms.loc/naemon/side.html <http://hypervision0.inf.rms.loc/naemon/side.html>
> > [Thu Jun 11 16:22:50 2015] [warn] [client 172.27.0.70] mod_fcgid: read data timeout in 120 seconds, referer: http://hypervision0/naemon/cgi-bin/status.cgi?style=combined&nav=&hidesearch=2&hidetop=1&title=Vue%20Supervision&title=Vue%20Supervision&hst_s0_hoststatustypes=4&hst_s0_servicestatustypes=31&hst_s0_hostprops=4466730&hst_s0_serviceprops=0&hst_s0_type=host&hst_s0_type=host&hst_s0_val_pre=&hst_s0_val_pre=&hst_s0_op=!~&hst_s0_op=!~&hst_s0_value=SMH_%7CDSI_%7CCEAC_%7CDESD_%7CLVM_%7CVILM_&hst_s0_value=GOP_(ECT%7CECU%7CERB%7CHON%7CITI%7CJOR%7CLIL%7CLOU%7CMIL%7CNAN%7CNCE%7CORG%7CPEV%7CRUM%7CSBC%7CSEL%7CSHA%7CSHE%7CSJE%7CSPA%7CSPP%7CVER)&hst_s0_value_sel=5&hst_s0_value_sel=5&section=Bookmarks&newname=&bookmarksp=Bookmarks%3A%3AVILM&bookmarksp=Bookmarks%3A%3AVue%20Infra%20All&bookmarksp=Bookmarks%3A%3AVue%20Infra&bookmarksp=Bookmarks%3A%3AVue%20Supervision&bookmarksp=Bookmarks%3A%3ALVM&view_mode=html&all_col=&all_col=&host_columns=1&host_columns=2&host_columns=3&host_columns=4&host_columns=5&host_columns=6&host_columns=7&host_columns=8&host_columns=9&host_columns=10&host_columns=11&host_columns=12&host_columns=13&service_columns=1&service_columns=2&service_columns=3&service_columns=4&service_columns=5&service_columns=6&service_columns=7&service_columns=8&service_columns=9&service_columns=10&service_columns=11&service_columns=12&service_columns=13&service_columns=14&service_columns=15&service_columns=16&service_colu....
> >
> >
> >
> > On Friday, May 29, 2015 at 10:32:58 UTC+2, Sven Nierlein wrote:
> >
> > On 29/05/15 9:31, Fabrice Le Dorze wrote:
> > > I sent you the apache log by mail .
> >
> > I didn't get anything. Could you pastebin or gist the relevant error?
> >

Fabrice Le Dorze

Jun 11, 2015, 5:09:03 PM
to th...@googlegroups.com

OK, the menu is in Thruk.
But what should $OMD_ROOT be set to in a Naemon context, and where should it be set?

Fabrice Le Dorze

Jun 12, 2015, 2:10:36 AM
to th...@googlegroups.com
I found other errors in the Apache log:
broken pipe at lib/Thruk/Backend/Pool.pm line 33
        Thruk::Backend::Pool::__ANON__('PIPE') called at /usr/lib/naemon/perl5/Plack/Handler/FCGI.pm line 159
        eval {...} called at /usr/lib/naemon/perl5/Plack/Handler/FCGI.pm line 159
        Plack::Handler::FCGI::run('Plack::Handler::FCGI=HASH(0x6bac508)', 'CODE(0x8fd4b18)', 'HASH(0x6bc20b0)') called at /usr/lib/naemon/perl5/Catalyst/Engine.pm line 787
        Catalyst::Engine::run('Catalyst::Engine=HASH(0x6e5c278)', 'Thruk', 'CODE(0x8fd4b18)', undef, 'HASH(0x6bc20b0)', 'Plack::Handler::FCGI=HASH(0x6bac508)') called at /usr/lib/naemon/perl5/Catalyst.pm line 2701
        Catalyst::run('Thruk', undef, 'HASH(0x6bc20b0)', 'Plack::Handler::FCGI=HASH(0x6bac508)') called at /usr/lib/naemon/perl5/Catalyst/ScriptRole.pm line 99
        Catalyst::ScriptRole::_run_application('Catalyst::Script::FastCGI=HASH(0x561c798)') called at /usr/lib/naemon/perl5/Catalyst/ScriptRole.pm line 69
        Catalyst::ScriptRole::run('Catalyst::Script::FastCGI=HASH(0x561c798)') called at /usr/lib/naemon/perl5/Catalyst/ScriptRunner.pm line 50
        Catalyst::ScriptRunner::run('Catalyst::ScriptRunner', 'Thruk', 'FastCGI') called at /usr/share/naemon/script/thruk_fastcgi.pl line 28
[Fri Jun 12 07:39:47 2015] [warn] mod_fcgid: cleanup zombie process 6738
[Fri Jun 12 07:39:47 2015] [warn] mod_fcgid: cleanup zombie process 6867
broken pipe at lib/Thruk/Backend/Pool.pm line 33
        Thruk::Backend::Pool::__ANON__('PIPE') called at /usr/lib/naemon/perl5/Plack/Handler/FCGI.pm line 159
        eval {...} called at /usr/lib/naemon/perl5/Plack/Handler/FCGI.pm line 159
        Plack::Handler::FCGI::run('Plack::Handler::FCGI=HASH(0x7f7eb1abcbb8)', 'CODE(0x70095c8)', 'HASH(0x7f7eb1ad2780)') called at /usr/lib/naemon/perl5/Catalyst/Engine.pm line 787
        Catalyst::Engine::run('Catalyst::Engine=HASH(0x7f7eb1d25118)', 'Thruk', 'CODE(0x70095c8)', undef, 'HASH(0x7f7eb1ad2780)', 'Plack::Handler::FCGI=HASH(0x7f7eb1abcbb8)') called at /usr/lib/naemon/perl5/Catalyst.pm line 2701
        Catalyst::run('Thruk', undef, 'HASH(0x7f7eb1ad2780)', 'Plack::Handler::FCGI=HASH(0x7f7eb1abcbb8)') called at /usr/lib/naemon/perl5/Catalyst/ScriptRole.pm line 99
        Catalyst::ScriptRole::_run_application('Catalyst::Script::FastCGI=HASH(0x7f7eb0a56988)') called at /usr/lib/naemon/perl5/Catalyst/ScriptRole.pm line 69
        Catalyst::ScriptRole::run('Catalyst::Script::FastCGI=HASH(0x7f7eb0a56988)') called at /usr/lib/naemon/perl5/Catalyst/ScriptRunner.pm line 50
        Catalyst::ScriptRunner::run('Catalyst::ScriptRunner', 'Thruk', 'FastCGI') called at /usr/share/naemon/script/thruk_fastcgi.pl line 28

I have no idea whether this is the root cause of my problem. Does it indicate a lost connection to a backend?
I found this kind of message even on the Dev Naemon server, on which shadownaemon is enabled.
The dilemma is the following:
- without shadownaemon: occasional CPU crises that make the interface unusable for our NOC operators, who complain
- with shadownaemon: apparently such crises never or rarely occur, but the in_check_period and in_notification_period values are not meaningful.

It is difficult to choose for the moment.

Fabrice Le Dorze

Jun 16, 2015, 3:52:59 AM
to th...@googlegroups.com
Again this morning: the server load is 30.

I found some advice on the Internet about FastCGI parameters in Apache:
<IfModule mod_fcgid.c>
IdleTimeout 3600
ProcessLifeTime 7200
MaxProcessCount 64
DefaultMaxClassProcessCount 8
IPCConnectTimeout 300
IPCCommTimeout 7200
BusyTimeout 300
</IfModule>

Any advice on FastCGI tuning?
Thanks


Fabrice Le Dorze

Jun 16, 2015, 8:26:47 AM
to th...@googlegroups.com

Another symptom: when the server goes crazy, we see many perl processes:

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
15835 www-data  20   0  384m  47m 2240 R   8,5  1,2   0:48.01 perl
18244 www-data  20   0  435m  61m 1912 R   8,5  1,5   0:36.67 perl
17007 www-data  20   0  451m  69m 2280 R   8,3  1,7   0:41.59 perl
17430 www-data  20   0  447m  70m 2300 R   8,0  1,8   0:38.69 perl
19082 www-data  20   0  152m  57m  832 R   8,0  1,5   0:13.62 perl
19475 www-data  20   0  101m  31m 1852 R   7,8  0,8   0:09.01 perl
19588 www-data  20   0  335m  83m 2460 R   7,2  2,1   0:20.96 perl
19643 www-data  20   0  377m  86m 2432 R   7,2  2,2   0:20.77 perl
19919 www-data  20   0 90184  26m 1784 R   7,2  0,7   0:06.07 perl
20336 www-data  20   0  254m  73m 1896 R   7,2  1,9   0:14.10 perl
20663 www-data  20   0  230m  49m 2372 R   7,2  1,3   0:06.40 perl
21021 www-data  20   0 97004  26m 2372 R   7,2  0,7   0:04.02 perl
21301 www-data  20   0 41152 5872 2068 R   7,2  0,1   0:00.83 perl
20689 www-data  20   0  208m  43m 2372 R   7,0  1,1   0:05.64 perl
20717 www-data  20   0  119m  35m 2372 R   7,0  0,9   0:05.03 perl
20996 www-data  20   0  104m  27m 2368 R   7,0  0,7   0:04.31 perl
21063 www-data  20   0 71896  17m 2356 R   7,0  0,4   0:02.84 perl
21398 naemon    20   0 17264 2644 1580 R   7,0  0,1   0:00.27 snmpget
19770 www-data  20   0  337m  91m 2480 R   6,7  2,3   0:21.66 perl
20341 www-data  20   0  221m  57m  764 R   6,7  1,5   0:09.67 perl


ps -ef gives:
www-data 15835 15834  5 14:09 ?        00:00:54 perl -x /usr/bin/thruk -a bpd
www-data 17007 17005  5 14:12 ?        00:00:47 perl -x /usr/bin/thruk -a bpd
www-data 17430 17429  6 14:13 ?        00:00:48 perl -x /usr/bin/thruk -a bpd
www-data 18244 18243  6 14:15 ?        00:00:43 perl -x /usr/bin/thruk -a bpd
www-data 19082 19076  3 14:17 ?        00:00:21 perl -x /usr/bin/thruk -a bpd
root     19133 19097  0 14:17 ?        00:00:03 perl -x /usr/share/naemon/thruk_auth
www-data 19475 19440  3 14:18 ?        00:00:15 perl -x /usr/bin/thruk -a bpd
www-data 19919 19910  2 14:19 ?        00:00:11 perl -x /usr/bin/thruk -a bpd
www-data 20336 20335  6 14:20 ?        00:00:22 perl -x /usr/bin/thruk -a bpd
www-data 20557 20543  2 14:21 ?        00:00:06 perl -x /usr/bin/thruk -a bpd
www-data 20663 19107  5 14:22 ?        00:00:12 perl /usr/share/naemon/script/thruk_fastcgi.pl
www-data 20665 20664  1 14:22 ?        00:00:04 perl -x /usr/bin/thruk -a bpd
www-data 20689 19107  5 14:22 ?        00:00:11 perl /usr/share/naemon/script/thruk_fastcgi.pl
www-data 20717 19107  5 14:22 ?        00:00:11 perl /usr/share/naemon/script/thruk_fastcgi.pl
www-data 20996 19107  4 14:22 ?        00:00:10 perl /usr/share/naemon/script/thruk_fastcgi.pl
www-data 21021 19107  4 14:22 ?        00:00:09 perl /usr/share/naemon/script/thruk_fastcgi.pl
www-data 21057 21052  1 14:23 ?        00:00:03 perl -x /usr/bin/thruk -a bpd
www-data 21063 19107  4 14:23 ?        00:00:08 perl /usr/share/naemon/script/thruk_fastcgi.pl
www-data 21301 19107  6 14:23 ?        00:00:09 perl /usr/share/naemon/script/thruk_fastcgi.pl
www-data 21466 21465  3 14:24 ?        00:00:04 perl -x /usr/bin/thruk -a bpd
www-data 21468 19107  5 14:24 ?        00:00:07 perl /usr/share/naemon/script/thruk_fastcgi.pl
www-data 21715 21702  4 14:25 ?        00:00:02 perl -x /usr/bin/thruk -a bpd
...
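
To see at a glance which kind of perl process dominates, a listing like the one above can be tallied by command line. The following is a minimal Python sketch, assuming the eight-column `ps -ef` layout shown above; the function name `tally_commands` and the three-line sample are only illustrative.

```python
from collections import Counter

def tally_commands(ps_output: str) -> Counter:
    """Count processes per command line in `ps -ef` style output."""
    counts = Counter()
    for line in ps_output.splitlines():
        # Columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) < 8 or fields[1] == "PID":
            continue  # skip blank, truncated ("...") or header lines
        counts[fields[7].strip()] += 1
    return counts

# Three lines copied from the listing above, used as sample input.
sample = """\
www-data 15835 15834  5 14:09 ?        00:00:54 perl -x /usr/bin/thruk -a bpd
www-data 20663 19107  5 14:22 ?        00:00:12 perl /usr/share/naemon/script/thruk_fastcgi.pl
www-data 20665 20664  1 14:22 ?        00:00:04 perl -x /usr/bin/thruk -a bpd
"""

for command, count in tally_commands(sample).most_common():
    print(count, command)
```

Counted by hand, the 22 process lines quoted above (the listing is truncated) contain 13 `thruk -a bpd` processes, 8 `thruk_fastcgi.pl` processes and 1 `thruk_auth` process.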

Sven Nierlein

Jun 16, 2015, 8:33:58 AM
to th...@googlegroups.com
On 16/06/15 14:26, Fabrice Le Dorze wrote:
> Another symptom: when the server goes crazy, we see many perl processes:

The key is to find out whether that number of perl processes is the result or
the source of the problem. If everything is already slow, requests take
longer and Apache tries to compensate by starting more perl processes,
which makes the problem even worse.

Fabrice Le Dorze

Jun 16, 2015, 9:01:32 AM
to th...@googlegroups.com

I agree, but I have no way to figure that out for the moment.

I tried increasing IPCCommTimeout from the default to 7200, as I saw in a post about FastCGI.
But according to what you say, that may make things worse, so I went back to the default.

In normal times it is reasonably slow, nothing abnormal compared to the Dev server with shadownaemon.

What about the thruk -a bpd processes? Are they just a consequence? We killed them and the load dropped drastically.

Under what conditions is a new /usr/share/naemon/script/thruk_fastcgi.pl process created?

Could it be the following scenario:
- one or several backends respond slowly, for whatever reason
- requests thus take longer
- Apache creates new processes to compensate
- but this increases the load, making requests slower
- and so on
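
The scenario above can be illustrated with a toy model. This is only a sketch under strong assumptions: one process per in-flight request, all processes sharing the CPUs equally, and a fixed CPU cost per request. It is not how Apache or mod_fcgid actually schedule work, and `simulate` and its parameters are hypothetical names.

```python
def simulate(arrivals_per_sec, cpu_sec_per_request, cores, seconds):
    """Return the number of in-flight requests at the end of each second.

    Toy model: every in-flight request has its own process and all
    processes share the available cores equally (processor sharing).
    """
    remaining = []   # CPU seconds still needed by each in-flight request
    history = []
    carry = 0.0      # fractional arrivals carried over to the next second
    for _ in range(seconds):
        carry += arrivals_per_sec
        new = int(carry)
        carry -= new
        remaining.extend([cpu_sec_per_request] * new)
        if remaining:
            # Each process gets at most one full core for this second.
            share = min(1.0, cores / len(remaining))
            remaining = [r - share for r in remaining if r - share > 1e-9]
        history.append(len(remaining))
    return history

# 4 cores, 1 CPU-second per request: 2 requests/s fit, 6 requests/s do not.
print("light load, in flight after 60 s:", simulate(2, 1.0, 4, 60)[-1])
print("overload, in flight after 60 s:", simulate(6, 1.0, 4, 60)[-1])
```

In this model the backlog stays empty while arrivals fit within the CPU capacity, and keeps growing as soon as they exceed it, because each extra process slows down all the others.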

Fabrice Le Dorze

Jun 24, 2015, 10:06:04 AM
to th...@googlegroups.com
Another storm today.
The usual messages in the error log are:
Premature end of script headers: fcgid_env.sh, referer: http://hypervision0.inf.rms.loc/naemon/cgi-bin/login.cgi?
Connection reset by peer: mod_fcgid: error reading data from FastCGI server, referer ....
mod_fcgid: cleanup zombie process 18283

It seems to happen when one or more people run searches in Thruk. The PNP4Nagios CPU graphs show that it happens during working hours.
It does not seem to happen when only one session is open as the main view consolidating all backend items.
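
One way to check the working-hours correlation directly from the Apache error log is to count the mod_fcgid timeouts per hour. The following is a minimal Python sketch, assuming the error-log line format quoted earlier in this thread; `timeouts_per_hour` is a hypothetical helper name.

```python
import re
from collections import Counter
from datetime import datetime

# [timestamp] [level] optional [client ...] message
LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] (?:\[client [^\]]+\] )?(?P<msg>.*)$"
)

def timeouts_per_hour(lines):
    """Count 'mod_fcgid: read data timeout' messages per clock hour."""
    counts = Counter()
    for line in lines:
        m = LINE.match(line.strip())
        if not m or "mod_fcgid: read data timeout" not in m["msg"]:
            continue
        ts = datetime.strptime(m["ts"], "%a %b %d %H:%M:%S %Y")
        counts[ts.strftime("%Y-%m-%d %H:00")] += 1
    return counts

# Lines copied from the log extract quoted earlier in the thread.
sample = [
    "[Thu Jun 11 16:21:35 2015] [warn] [client ::1] mod_fcgid: read data timeout in 120 seconds",
    "[Thu Jun 11 16:21:35 2015] [error] [client ::1] Premature end of script headers: fcgid_env.sh",
    "[Thu Jun 11 16:22:00 2015] [warn] mod_fcgid: process 9520 graceful kill fail, sending SIGKILL",
    "[Thu Jun 11 16:22:25 2015] [warn] [client ::1] mod_fcgid: read data timeout in 120 seconds",
]

for hour, count in sorted(timeouts_per_hour(sample).items()):
    print(hour, count)
```

Run over a full day of log lines, the per-hour counts can be compared with the CPU graphs.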

I've seen some posts about tuning FastCGI parameters in Apache, such as:
FcgidIdleTimeout 40->3600
FcgidProcessLifeTime 30->7200
FcgidMaxProcesses 40->64
FcgidMaxProcessesPerClass 8
FcgidMinProcessesPerClass 0
FcgidConnectTimeout 30->300
FcgidIOTimeout 70->7200
FcgidInitialEnv RAILS_ENV production
FcgidIdleScanInterval 10
IPCCommTimeout  7200
IPCConnectTimeout 180

Any advice ?
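
One detail worth checking in a list like the one above: mod_fcgid renamed its directives with an `Fcgid` prefix, and the older names (`IPCCommTimeout`, `IPCConnectTimeout` and so on) are deprecated aliases of the new ones. If both spellings appear with different values, the same timeout is set twice. The sketch below (Python) flags such duplicates; the rename table reflects the mod_fcgid documentation as far as I know it, while `find_conflicts` and the `proposed` list are only illustrative.

```python
# Older mod_fcgid directive names and their current equivalents.
RENAMED = {
    "IdleTimeout": "FcgidIdleTimeout",
    "ProcessLifeTime": "FcgidProcessLifeTime",
    "MaxProcessCount": "FcgidMaxProcesses",
    "DefaultMaxClassProcessCount": "FcgidMaxProcessesPerClass",
    "IPCConnectTimeout": "FcgidConnectTimeout",
    "IPCCommTimeout": "FcgidIOTimeout",
    "BusyTimeout": "FcgidBusyTimeout",
}

def find_conflicts(directives):
    """Return canonical directive names that are given more than one distinct value."""
    seen = {}
    for name, value in directives:
        seen.setdefault(RENAMED.get(name, name), []).append(value)
    return {name: values for name, values in seen.items() if len(set(values)) > 1}

# Target values taken from the list above (the part after "->").
proposed = [
    ("FcgidConnectTimeout", "300"),
    ("FcgidIOTimeout", "7200"),
    ("IPCCommTimeout", "7200"),
    ("IPCConnectTimeout", "180"),
]

print(find_conflicts(proposed))
```

For the values above, this reports that the connect timeout is given as both 300 and 180, while the I/O timeout is consistent at 7200.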

Fabrice Le Dorze

Sep 7, 2015, 2:53:39 PM
to Thruk
Hi Sven.
I have installed Thruk 2 from the Consol Labs Jessie repository and this kind of problem seems to have disappeared. Furthermore, performance is much better, as good as the former version with shadownaemon.
Am I dreaming, or could the code rewrite in Thruk 2 be responsible for these improvements?
If confirmed, it completely solves our problems.

Sven Nierlein

Sep 8, 2015, 4:30:58 AM
to th...@googlegroups.com
On 07.09.2015 20:53, Fabrice Le Dorze wrote:
> Hi Sven.
> I have installed Thruk 2 from the Consol Labs Jessie repository and this kind of problem seems to have disappeared. Furthermore, performance is much better, as good as the former version with shadownaemon.
> Am I dreaming, or could the code rewrite in Thruk 2 be responsible for these improvements?
> If confirmed, it completely solves our problems.

A lot has happened for Thruk 2.x, so yes, it is quite possible that it already solves your problems.
But Thruk 2 should be even faster with shadownaemon, so give that a try too.

Cheers,
Sven

Fabrice Le Dorze

Sep 8, 2015, 5:05:17 AM
to Thruk
Well, as I explained in https://github.com/naemon/naemon-core/issues/113#issuecomment-118803375, shadownaemon is not usable for us because of the in_notification_period and in_check_period problem, unless we make some code modifications.
So if the performance improvement is confirmed, it will be unnecessary for the moment, as our NOC operators are completely satisfied with the stability and performance of this new Thruk version. We will see later if it is required.
Thanks a lot again for this great work!

Fabrice Le Dorze

Jan 4, 2016, 6:45:27 AM
to Thruk
Hi Sven, the high load problem is back.
The load is sometimes above 20 or 30.
But it may be for other reasons.
I found the following sequence in the Thruk log:
[2016/01/04 06:01:45][hypervision0][ERROR][Thruk] unknown filter: plugin%20output at /usr/share/thruk/lib/Thruk/Utils/Status.pm line 947.
        Thruk::Utils::Status::single_search(Thruk::Context=HASH(0xb9facc8), HASH(0xb500f98)) called at /usr/share/thruk/lib/Thruk/Utils/Status.pm line 412
        Thruk::Utils::Status::do_search(Thruk::Context=HASH(0xb9facc8), ARRAY(0xc3019c0), "svc_") called at /usr/share/thruk/lib/Thruk/Utils/Status.pm line 234
        Thruk::Utils::Status::do_filter(Thruk::Context=HASH(0xb9facc8), "svc_") called at /usr/share/thruk/lib/Thruk/Controller/status.pm line 910
        Thruk::Controller::status::_process_combined_page(Thruk::Context=HASH(0xb9facc8)) called at /usr/share/thruk/lib/Thruk/Controller/status.pm line 103
        Thruk::Controller::status::index(Thruk::Context=HASH(0xb9facc8)) called at /usr/share/thruk/lib/Thruk.pm line 261
        eval {...} called at /usr/share/thruk/lib/Thruk.pm line 249
        Thruk::_dispatcher(HASH(0xc310d78)) called at /usr/lib/thruk/perl5/Plack/Util.pm line 142
        eval {...} called at /usr/lib/thruk/perl5/Plack/Util.pm line 142
        Plack::Util::run_app(CODE(0x8af32d8), HASH(0xc310d78)) called at /usr/lib/thruk/perl5/Plack/Handler/FCGI.pm line 143
        Plack::Handler::FCGI::run(Plack::Handler::FCGI=HASH(0x2473058), CODE(0x8af32d8)) called at /usr/share/thruk/script/thruk_fastcgi.pl line 24
[2016/01/04 06:01:45][hypervision0][ERROR][Thruk] internal server error
[2016/01/04 06:01:45][hypervision0][ERROR][Thruk] on page: http://hypervision0/thruk/cgi-bin/status.cgi?style=combined&hidesearch=2&hidetop=1&title=Vue%20Supervision&title=Supervision&title=Vue%20Supervision&hst_s0_hoststatustypes=4&hst_s0_servicestatustypes=31&hst_s0_hostprops=4466730&hst_s0_serviceprops=0&hst_s0_type=host&hst_s0_type=host&hst_s0_type=hostgroup&hst_s0_val_pre=&hst_s0_val_pre=&hst_s0_val_pre=&hst_s0_op=!~&hst_s0_op=!~&hst_s0_op=!~&hst_s0_value=SMH_%7CDSI_%7CCEAC_%7CDESD_%7CLVM_%7CVILM_&hst_s0_value=GOP_%7CDCP_&hst_s0_value=NON-RMS&hst_s0_value_sel=5&hst_s0_value_sel=5&hst_s0_value_sel=5&bookmarksp=Bookmarks%3A%3AVue%20Infra&bookmarksp=Bookmarks%3A%3AVILM&bookmarksp=Bookmarks%3A%3ALVM&bookmarksp=Bookmarks%3A%3AVue%20Supervision&all_col=&svc_s0_hoststatustypes=3&svc_s0_servicestatustypes=20&svc_s0_hostprops=2090&svc_s0_serviceprops=4466698&svc_s0_type=host&svc_s0_type=plugin%20output&svc_s0_type=host&svc_s0_type=servicegroup&svc_s0_type=hostgroup&svc_s0_val_pre=&svc_s0_val_pre=&svc_s0_val_pre=&svc_s0_val_pre=&svc_s0_val_pre=&svc_s0_op=!~&svc_s0_op=!~&svc_s0_op=!~&svc_s0_op=!~&svc_s0_op=!~&svc_s0_value=SMH_%7CDSI_%7CCEAC_%7CDESD_%7CLVM_%7CVILM_&svc_s0_value=Service%20Check%20Timed%20Out%7CReturn%20code%20of%20255%20is%20out%20of%20bounds%7CSocket%20timeout%20after&svc_s0_value=GOP_%7CDCP_&svc_s0_value=NON-RMS&svc_s0_value=NON-RMS&svc_s0_value_sel=5&svc_s0_value_sel=5&svc_s0_value_sel=5&svc_s0_value_sel=5&svc_s0_value_sel=5

In the Apache log, there are messages like 'mod_fcgid: read data timeout in 120 seconds'.

Could it be a problem with filters? This filter works fine under normal conditions. I have noticed that when our NOC operators try to modify such a filter, they cannot save it, and the server goes crazy too.
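
One thing that can be checked from the log itself is what the filter type in that URL decodes to. The error message names the filter as `plugin%20output`, still percent-encoded, whereas a single URL decode of the logged parameter gives `plugin output`. One possible reading (an assumption, not confirmed anywhere in this thread) is that the value was encoded twice somewhere, for example when the bookmark was saved. A minimal Python sketch using the standard `urllib.parse` module, with an abbreviated copy of the logged URL and a hypothetical helper `filter_types`:

```python
from urllib.parse import parse_qs, unquote, urlsplit

def filter_types(url, prefix="svc_s0"):
    """Return the decoded values of the repeated `<prefix>_type` parameter."""
    query = urlsplit(url).query
    return parse_qs(query, keep_blank_values=True).get(prefix + "_type", [])

# Abbreviated from the URL in the Thruk log above; only the type parameters are kept.
url = (
    "http://hypervision0/thruk/cgi-bin/status.cgi?style=combined"
    "&svc_s0_type=host&svc_s0_type=plugin%20output&svc_s0_type=host"
    "&svc_s0_type=servicegroup&svc_s0_type=hostgroup"
)

print(filter_types(url))              # decoded once, as a web framework normally would
print(unquote("plugin%20output"))     # a single decode gives a plain space
print(unquote("plugin%2520output"))   # a double-encoded value decodes to 'plugin%20output'
```

If the double-encoding assumption holds, comparing the saved bookmark with a freshly built search for the same filter should show the difference.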



On Tuesday, September 8, 2015 at 10:30:58 UTC+2, Sven Nierlein wrote: