Check latency issues

Thomas Wittmann

Mar 31, 2017, 5:50:59 AM

to synagios-users

Hi,
now I have a question too:
I have Synagios running on a Synology DS216j, with 3 hosts and 28 services in total.
Although Nagios seems to execute all checks on time, the 'Check latency' check always reports at least 5 checks running more than 600s late.
The Load and CPU checks look fine.
Can anyone help me with this?

Tom

Mark Clarkson

Apr 2, 2017, 5:21:54 PM

to synagios-users

Hi Tom,
The 'Check latency' check is not particularly smart. It assumes that every service check is set to run every 5 minutes. I assume you have 5 checks that run less often.
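To illustrate why a fixed 5-minute assumption flags slower checks, here is a minimal sketch. The `lateness` helper is hypothetical, and I'm assuming a check counts as late once its age exceeds the expected interval; the plugin's exact formula may differ.

```shell
#!/bin/sh
# Hypothetical helper (not part of the plugin): seconds a check is overdue,
# given its age and the interval it is expected to run at.
lateness() {
    age=$1        # seconds since last_check
    interval=$2   # expected interval in seconds
    late=$((age - interval))
    if [ "$late" -lt 0 ]; then
        late=0
    fi
    echo "$late"
}

age=540                       # a check that last ran 9 minutes ago
fixed=$(lateness "$age" 300)  # judged against the assumed 5-minute interval
real=$(lateness "$age" 600)   # judged against its real 10-minute interval

echo "assumed 5 min interval: ${fixed}s late"
echo "actual 10 min interval: ${real}s late"
```

A check that legitimately runs every 10 minutes looks 240s late under the 5-minute assumption, even though it is on schedule.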

Here's the help from that plugin:

------------------- snip -----------------

Usage: check_statusdat_latency [options]

-h  : Display this help text.
-np : Don't output performance data.
-wp : Warning threshold for percentage of checks that are late.
-cp : Critical threshold for percentage of checks that are late.
-wl : Warning threshold for average lateness of late checks.
-cl : Critical threshold for average lateness of late checks.
-i  : Check interval of most checks.
-s  : Location of nagios status.dat.

Running with no options is equivalent to:
    check_statusdat_latency -wp 5 -cp 50 -wl 60 -cl 300

------------------- snip -----------------

That plugin needs an update so that it reads each service's check interval dynamically instead of relying on the -i option. This should fix it (it does for me!):

Find the lines in that plugin:

    stat=`sed -n '/servicestatus {/,/ *}/p' $statusdat | \
        awk -v interval=$interval -v offset=$offset '
            BEGIN { tt=0;t=0;ttn=0;min=1000;
            "/bin/date +%s" | getline b
            b=b-offset} /current_state=[123]/ { num_warncrit=num_warncrit+1; }
            /last_check=/ {
                tot=tot+1;
                a=substr( $0, index($0, "=")+1 );
                if(a<=1) { pending=pending+1; next };

and add the section:

            /check_interval=/ {
                interval=substr( $0, index($0, "=")+1 );
                interval=interval*60;
            };

so it looks like:

    stat=`sed -n '/servicestatus {/,/ *}/p' $statusdat | \
        awk -v interval=$interval -v offset=$offset '
            BEGIN { tt=0;t=0;ttn=0;min=1000;
            "/bin/date +%s" | getline b
            b=b-offset} /current_state=[123]/ { num_warncrit=num_warncrit+1; }
            /check_interval=/ {
                interval=substr( $0, index($0, "=")+1 );
                interval=interval*60;
            };
            /last_check=/ {
                tot=tot+1;
                a=substr( $0, index($0, "=")+1 );
                if(a<=1) { pending=pending+1; next };

The above fix assumes that "check_interval" always comes before "last_check" within each service entry in the status.dat file. This is a fragile assumption, but in practice I think it always holds. If it doesn't, some extra checks (flags) would need to be added to the code to cover those cases.
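If the ordering ever turned out not to hold, one way to drop the assumption would be to collect both values per servicestatus block and only evaluate them at the closing brace. This is a rough standalone sketch, not the plugin's code: the sample data, the fixed `now` value and the simple "age greater than interval" lateness rule are all assumptions for illustration.

```shell
#!/bin/sh
# Sample data in status.dat style (made up for this sketch).
# In the first block last_check deliberately comes BEFORE check_interval.
statusdat=$(mktemp)
cat > "$statusdat" <<'EOF'
servicestatus {
    host_name=nas
    service_description=Disk
    last_check=1000
    check_interval=10.000000
    }
servicestatus {
    host_name=nas
    service_description=Load
    check_interval=5.000000
    last_check=1000
    }
EOF

now=1500   # fixed "current time" so the result is reproducible

late=$(awk -v now="$now" '
    # Start of a service entry: reset the per-block values.
    /servicestatus [{]/ { inblock=1; interval=-1; last=-1; next }
    # Remember both fields, in whichever order they appear.
    inblock && /^[ \t]*check_interval=/ { interval=substr($0, index($0, "=")+1) * 60 }
    inblock && /^[ \t]*last_check=/     { last=substr($0, index($0, "=")+1) + 0 }
    # End of the entry: both values are known now, so evaluate here.
    inblock && /^[ \t]*[}]/ {
        if (interval >= 0 && last > 1 && now - last > interval) n++
        inblock=0
    }
    END { print n+0 }
' "$statusdat")

rm -f "$statusdat"
echo "late checks: $late"
```

With this sample only the Load service counts as late (500s old against a 300s interval); the Disk service is within its 600s interval even though its fields appear in the opposite order.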

Cheers!
Mark

Thomas Wittmann

Apr 3, 2017, 5:50:11 AM

to synagios-users

Hi Mark,
thank you. I thought this check was for verifying that all checks (hahaha, 3 times "check" in one sentence) are executed and their results processed on time.
At the moment I have no SSH access, so this works for me for now: check_statusdat_latency!-s /var/cache/nagios3/status.dat -wp 30 -cp 60 -wl 600 -cl 1000
Reading the plugin help would have been too easy :-D

Cheers from "Home of quattro" ;-)

Tom
