Thruk alert history log parsing - slower and with side effects

bjornf

Sep 4, 2012, 2:28:33 AM
to th...@googlegroups.com
Hi,

Alert history log parsing seems to be quite a bit slower via Thruk than directly from the Nagios web interface. Searches in Thruk also seem to affect the parent Nagios process, causing check latency to increase. Directly in Nagios, history.cgi is used, and that does not seem to affect the parent Nagios process much, if at all.

Is there anything that can be done to prevent this and speed things up? I suppose this is not limited to alert history but applies to reporting in general.

Regards, Bjorn

bjornf

Sep 4, 2012, 2:42:56 AM
to th...@googlegroups.com

This is the "state" of the Nagios parent process, even after alert history parsing was completed:

10426 nagios    25   0  939m 766m 2176 R 98.6  6.4 563:14.35 nagios

It continues to use a lot of CPU and memory, which keeps check latency high; see the attached image. I think restarting the Nagios process will "resolve" this.

bjornf

Sep 4, 2012, 3:28:49 AM
to th...@googlegroups.com
Actually, things recovered by themselves after a bit less than an hour. Check latency is down to 0.5 sec and the CPU and memory usage of the Nagios parent process is "OK":

10426 nagios    25   0  435m 268m 2176 R 73.2  2.2 601:09.59 nagios

Still, I would be interested to know if this can be prevented. I think it was triggered by alert history searching via Thruk. It also seems to apply to SLA reports.

Sven Nierlein

Sep 4, 2012, 3:40:38 AM
to th...@googlegroups.com
According to
http://mathias-kettner.de/checkmk_livestatus.html#H1:%20Setting%20up%20and%20using%20Livestatus
you may want to set a lower "max_cached_messages" value.
If that does not help, Thruk supports a transparent log cache mechanism based on MongoDB.
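
As a rough sketch of where that option goes: Livestatus options are appended to the broker_module line in nagios.cfg, after the socket path. The module path, the socket path and the value 100000 below are placeholders for illustration only, not recommendations; see the Livestatus documentation linked above for the default.

```
# nagios.cfg fragment (both paths and the number are assumptions)
broker_module=/usr/lib/check_mk/livestatus.o /var/lib/nagios/rw/live max_cached_messages=100000
```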


On 9/4/12 9:28, bjornf wrote:
> Actually, things recovered by themselves after a bit less than an hour. Check latency is down to 0.5 sec and the CPU and memory usage of the Nagios parent process is "OK":
>
> 10426 nagios 25 0 435m 268m 2176 R 73.2 2.2 601:09.59 nagios
>
> Still, I would be interested to know if this can be prevented. I think it was triggered by alert history searching via Thruk. It also seems to apply to SLA reports.
>
>
> On Tuesday, September 4, 2012 8:42:56 AM UTC+2, bjornf wrote:
>
>
> This is the "state" of the Nagios parent process, even after alert history parsing was completed:
>
> 10426 nagios 25 0 939m 766m 2176 R 98.6 6.4 563:14.35 nagios
>
> It continues to use a lot of CPU and memory, which keeps check latency high; see the attached image. I think restarting the Nagios process will "resolve" this.
>
> <https://lh6.googleusercontent.com/-lCsbza1iats/UEWiihPyjII/AAAAAAAAAFs/sqhxiZi89pg/s1600/check-latency.png>

bjornf

Sep 4, 2012, 3:03:27 PM
to th...@googlegroups.com
MongoDB means Shinken, right? Meaning it's not supported by Nagios.

Sven Nierlein

Sep 7, 2012, 3:12:49 AM
to th...@googlegroups.com
On 9/4/12 21:03, bjornf wrote:
> MongoDB means Shinken, right? Meaning it's not supported by Nagios.

No, MongoDB has nothing to do with Shinken. It's just used as a transparent cache for logfiles.
That works with all cores.

Sven

bjornf

Sep 14, 2012, 1:35:57 AM
to th...@googlegroups.com
Tried searching for installation guidelines but couldn't find any.   Any hints on how it's installed and configured for Nagios? Or perhaps it's straightforward?

Sven Nierlein

Sep 15, 2012, 3:56:09 AM
to th...@googlegroups.com
On 9/14/12 7:35, bjornf wrote:
> Tried searching for installation guidelines but couldn't find any. Any hints on how it's installed and configured for Nagios? Or perhaps it's straightforward?

There are no guides currently, but it's pretty straightforward. You just need to configure the
MongoDB connection string. This feature is still somewhat experimental, though. I don't know
of any bugs, but it's probably not used by a large number of users.

You should then run the thruk cli binary to import the logfiles initially:

thruk -a importlogs

From time to time you could run
thruk -a updatelogs
but that's done automatically on pages which access logfiles; doing this regularly just speeds
things up a bit.
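
If you do want the periodic update, a cron entry is one way to run it. This is only a sketch: the five-minute interval is an arbitrary example, and it assumes the entry lives in the crontab of a user that is allowed to run thruk and has it in its PATH.

```
# crontab fragment (interval chosen for illustration)
*/5 * * * * thruk -a updatelogs >/dev/null 2>&1
```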

Sven

bjornf

Sep 15, 2012, 7:35:29 AM
to th...@googlegroups.com
Hi,

Why is it trying to connect to localhost port 80? Fails:

thruk -a importlogs
[Sat Sep 15 13:24:18 2012][ERROR]  -> failed: $VAR1 = bless( {
[Sat Sep 15 13:24:18 2012][ERROR]                  '_content' => 'Can\'t connect to localhost:80 (Connection refused)
[Sat Sep 15 13:24:18 2012][ERROR]
[Sat Sep 15 13:24:18 2012][ERROR] LWP::Protocol::http::Socket: connect: Connection refused at /usr/lib/thruk/perl5/LWP/Protocol/http.pm line 51.
[Sat Sep 15 13:24:18 2012][ERROR] ',
[Sat Sep 15 13:24:18 2012][ERROR]                  '_rc' => 500,
[Sat Sep 15 13:24:18 2012][ERROR]                  '_headers' => bless( {
[Sat Sep 15 13:24:18 2012][ERROR]                                         'client-warning' => 'Internal response',
[Sat Sep 15 13:24:18 2012][ERROR]                                         'client-date' => 'Sat, 15 Sep 2012 11:24:18 GMT',
[Sat Sep 15 13:24:18 2012][ERROR]                                         'content-type' => 'text/plain'
[Sat Sep 15 13:24:18 2012][ERROR]                                       }, 'HTTP::Headers' ),
[Sat Sep 15 13:24:18 2012][ERROR]                  '_msg' => 'Can\'t connect to localhost:80 (Connection refused)',
[Sat Sep 15 13:24:18 2012][ERROR]                  '_request' => bless( {
[Sat Sep 15 13:24:18 2012][ERROR]                                         '_content' => 'data=%7B%22options%22%3A%7B%22verbose%22%3A0%2C%22version%22%3Anull%2C%22listbackends%22%3Anull%2C%22remoteurl_specified%22%3A0%2C%22local%22%3Anull%2C%22auth%22%3Anull%2C%22backends%22%3A%5B%5D%2C%22remoteurl%22%3A%22http%3A%2F%2Flocalhost%2Fthruk%2Fcgi-bin%2Fremote.cgi%22%2C%22action%22%3A%22importlogs%22%2C%22help%22%3Anull%2C%22credential%22%3A%225518882a633a5df72d4704e0c5390b64%22%7D%2C%22credential%22%3A%225518882a633a5df72d4704e0c5390b64%22%7D',
[Sat Sep 15 13:24:18 2012][ERROR]                                         '_uri' => bless( do{\(my $o = 'http://localhost/thruk/cgi-bin/remote.cgi')}, 'URI::http' ),
[Sat Sep 15 13:24:18 2012][ERROR]                                         '_headers' => bless( {
[Sat Sep 15 13:24:18 2012][ERROR]                                                                'user-agent' => 'libwww-perl/6.04',
[Sat Sep 15 13:24:18 2012][ERROR]                                                                'content-type' => 'application/x-www-form-urlencoded',
[Sat Sep 15 13:24:18 2012][ERROR]                                                                'content-length' => 454
[Sat Sep 15 13:24:18 2012][ERROR]                                                              }, 'HTTP::Headers' ),
[Sat Sep 15 13:24:18 2012][ERROR]                                         '_method' => 'POST'
[Sat Sep 15 13:24:18 2012][ERROR]                                       }, 'HTTP::Request' )
[Sat Sep 15 13:24:18 2012][ERROR]                }, 'HTTP::Response' );

Sven Nierlein

Sep 15, 2012, 12:35:01 PM
to th...@googlegroups.com
On 9/15/12 13:35, bjornf wrote:
> Why is it trying to connect to localhost port 80? Fails:


Because the command line tool tries to run the command via the FastCGI server first, which is much faster.
It therefore contacts the FastCGI server unless you run the command with "--local".
If you installed Thruk somewhere else, you can create a .thruk file and put
export REMOTEURL="http://server/thruk/cgi-bin/remote.cgi"
into it.

bjornf

Sep 15, 2012, 2:18:37 PM
to th...@googlegroups.com
Indeed local seems very slow. I have around 10GB of logs. Is that even worth trying to import?

How do I provide username and password when going via fastcgi? It's on the same machine so I should perhaps make them available without authentication for localhost?

bjornf

Sep 16, 2012, 3:35:32 AM
to th...@googlegroups.com
Btw, does Thruk (time-)limit the log query to Livestatus as described here:

http://mathias-kettner.de/checkmk_livestatus.html#H1:Access%20to%20Logfiles

Sven Nierlein

Sep 17, 2012, 3:23:22 AM
to th...@googlegroups.com
On 9/15/12 20:18, bjornf wrote:
> Indeed local seems very slow. I have around 10GB of logs. Is that even worth trying to import?


Depends on what you want to do... Usually keeping the last year of logfiles is enough.

Sven Nierlein

Sep 17, 2012, 3:24:13 AM
to th...@googlegroups.com
On 9/16/12 9:35, bjornf wrote:
> Btw, does Thruk (time)limit the log query to livestatus according to this:
>
> http://mathias-kettner.de/checkmk_livestatus.html#H1:Access%20to%20Logfiles


Definitely. You don't want to know what would happen if you left the filter out on
10 GB of logfiles.
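
To illustrate what such a time filter looks like, here is a minimal sketch of a Livestatus query against the log table, restricted to the last N days. It is not the query Thruk itself sends; the function name build_log_query, the selected columns and the class filter (1 = host and service alerts) are only examples.

```shell
#!/bin/sh
# Print a Livestatus query for the "log" table, limited to the last N days.
# The two "Filter: time" lines are what keeps Livestatus from reading
# every logfile it can find.
build_log_query() {
    days="$1"
    now=$(date +%s)
    start=$((now - days * 86400))
    printf 'GET log\n'
    printf 'Columns: time type host_name service_description state\n'
    printf 'Filter: time >= %s\n' "$start"
    printf 'Filter: time < %s\n' "$now"
    printf 'Filter: class = 1\n'
    # An empty line terminates the query.
    printf '\n'
}

# Example: alerts of the last 7 days
build_log_query 7
```

The output could then be piped to the Livestatus socket, for instance with the unixcat helper that ships with Livestatus: build_log_query 7 | unixcat /var/lib/nagios/rw/live (the socket path is an assumption and depends on your installation).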

Sven Nierlein

Sep 17, 2012, 3:38:21 AM
to th...@googlegroups.com, bjornf
On 9/15/12 13:35, bjornf wrote:
> Tried searching for installation guidelines but couldn't find any.

Hi,

Do you want to write a few lines for http://thruk.org/advanced.html once your
setup works?

Thanks,
Sven

bjornf

Sep 17, 2012, 7:53:39 AM
to th...@googlegroups.com, bjornf
Sure, but since moving to Mod-Gearman yesterday I'm not sure I need MongoDB. Things are working much better now that the Nagios process is not as busy. All log tasks via the Thruk interface are much faster and do not seem to cause check latency to increase dramatically as they did before. At least so far.

Sven Nierlein

Sep 17, 2012, 7:57:24 AM
to th...@googlegroups.com
The side effect of accessing logfiles via Livestatus is a growing Nagios process.
If the process grows in memory, all forks of Nagios get slower and use more
system resources. Using Mod-Gearman helps a lot, because its workers do not
grow in memory.
So yes, Thruk and Mod-Gearman are a good combo.

bjornf

Sep 18, 2012, 9:21:17 AM
to th...@googlegroups.com
But, I suppose it grows even with Mod-Gearman and Thruk then? Is it to a point that Nagios will have to be restarted sooner or later?

Sven Nierlein

Sep 19, 2012, 4:18:35 AM
to th...@googlegroups.com
On 9/18/12 15:21, bjornf wrote:
> But, I suppose it grows even with Mod-Gearman and Thruk then? Is it to a point that Nagios will have to be restarted sooner or later?

It grows, but that does not affect performance when using Mod-Gearman. But you have to restart Nagios all the time due to
config changes anyway, right?

bjornf

Sep 19, 2012, 5:25:47 AM
to th...@googlegroups.com
Not restart but reload. I noticed in the past that restarting Nagios helps from a performance point of view, but that was before Mod-Gearman. I'm not sure yet how it is now.