Memory leak in rrdbotd


Bart van den Heuvel

Jun 3, 2009, 3:26:05 AM
to rrd...@googlegroups.com
Hi Rrdbotters,
 
I'm running rrdbot in a somewhat larger test. It is now polling around 4,000 network ports. It's fast and simple, but it is not as stable as I would like. It turns out that rrdbot runs for just a few hours before it consumes all memory. This is the error we see in the log file (which has been piped from stderr):
 
rrdbotd: out of memory: Cannot allocate memory
 
 
ps -v <pid> shows the following:
 
  PID TTY   STAT   TIME  MAJFL  TRS     DRS     RSS %MEM COMMAND
16685 ?     Sl   791:08 149794  110 3139965 2280744 81.1 /usr/local/sbin/rrdbotd
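A single ps snapshot shows how large rrdbotd has become, not whether it keeps growing. A minimal sketch, assuming a Linux /proc filesystem, of reading the same resident set size that ps reports as RSS so it can be logged at intervals (the approach and the names `parse_vmrss_kb`/`read_vmrss_kb` are illustrative, not part of rrdbot):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Extract the value of the "VmRSS:" field (in kB) from the text of
 * /proc/<pid>/status. Returns -1 if the field is not present. */
long parse_vmrss_kb(const char *status_text)
{
    const char *field;

    assert(status_text != NULL);
    field = strstr(status_text, "VmRSS:");
    if (field == NULL)
        return -1;
    /* strtol skips the tabs/spaces between the label and the number. */
    return strtol(field + strlen("VmRSS:"), NULL, 10);
}

/* Read /proc/<pid>/status and return the process's VmRSS in kB,
 * or -1 if the file cannot be read or lacks the field. */
long read_vmrss_kb(pid_t pid)
{
    char path[64];
    char buf[8192];
    size_t n;
    FILE *f;

    snprintf(path, sizeof path, "/proc/%ld/status", (long)pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_vmrss_kb(buf);
}
```

Calling read_vmrss_kb(16685) every few minutes and logging the result would show whether the figure climbs steadily over hours (pointing to a leak) or levels off.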
 
free shows the following:
             total       used       free     shared    buffers     cached
Mem:       2810880    2745856      65024          0      26532     183112
-/+ buffers/cache:    2536212     274668
Swap:      2031608     847648    1183960
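The figures in these two outputs are consistent with each other: the "-/+ buffers/cache" line is the Mem: line with buffers and cache moved from "used" to "free" (as the procps free of that era computes it), and ps's %MEM is RSS as a fraction of total memory. A small sketch of that arithmetic (the struct and function names are illustrative):

```c
#include <assert.h>

/* Fields of the "Mem:" line of free(1), in kB. */
struct mem_line {
    long long total, used, free, buffers, cached;
};

/* "-/+ buffers/cache" used column: memory held by processes. */
long long app_used_kb(const struct mem_line *m)
{
    return m->used - m->buffers - m->cached;
}

/* "-/+ buffers/cache" free column: memory still available to processes. */
long long app_free_kb(const struct mem_line *m)
{
    return m->free + m->buffers + m->cached;
}

/* ps %MEM in tenths of a percent: RSS / total * 1000 (truncated).
 * long long avoids overflowing rss_kb * 1000 on a 32-bit system. */
long long pct_mem_tenths(long long rss_kb, long long total_kb)
{
    assert(total_kb > 0);
    return rss_kb * 1000 / total_kb;
}
```

With the numbers posted here, processes hold 2536212 kB, and rrdbotd's RSS of 2280744 kB accounts for about 90% of that, so the pressure comes from rrdbotd itself rather than from the page cache.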
 
rrdbot is running on a CentOS 5.2 machine and has the whole machine to itself for its polling duty.
It's an old HP ProLiant DL380 with dual Xeon processors.
 
These packages are installed (from RPMforge):
rpm -qa | grep rrd

perl-rrdtool-1.2.30-1.el5.rf
rrdtool-1.2.30-1.el5.rf
php-rrdtool-1.0.50-3.el5.rf
rrdtool-devel-1.2.30-1.el5.rf
ldd rrdbotd shows the following:
 
        linux-gate.so.1 =>  (0x00d8c000)
        librrd.so.2 => /usr/lib/librrd.so.2 (0x00b91000)
        libpthread.so.0 => /lib/i686/nosegneg/libpthread.so.0 (0x00537000)
        libc.so.6 => /lib/i686/nosegneg/libc.so.6 (0x003ba000)
        libfreetype.so.6 => /usr/lib/libfreetype.so.6 (0x00101000)
        libpng12.so.0 => /usr/lib/libpng12.so.0 (0x00dd7000)
        libz.so.1 => /usr/lib/libz.so.1 (0x00961000)
        libart_lgpl_2.so.2 => /usr/lib/libart_lgpl_2.so.2 (0x041a5000)
        libm.so.6 => /lib/i686/nosegneg/libm.so.6 (0x00594000)
        /lib/ld-linux.so.2 (0x00519000)
 
Could someone please help me with this issue? Any ideas on what causes this?
Kind regards,
 
Bart

Stef Walter

Jun 10, 2009, 3:52:58 PM
to rrd...@googlegroups.com
Bart van den Heuvel wrote:
> Hi Rrdbotters,
>
> I'm running rrdbot in a somewhat larger test. It is now polling around 4,000
> network ports. It's fast and simple, but it is not as stable as I would like. It
> turns out that rrdbot runs for just a few hours before it consumes all
> memory. This is the error we see in the log file (which has been piped from
> stderr):
>
> rrdbotd: out of memory: Cannot allocate memory

Yikes, that'd be good to get fixed. I personally run it with thousands
too, but haven't seen a leak like that.

Do you have valgrind installed (or could you install it) on your system?
Once that's done, we'd run it like this:

valgrind --leak-check=full --show-reachable=yes rrdbotd -d 1

rrdbot will run in the foreground and slower than normal. Once you see
memory leaking could you kill rrdbotd (with Ctrl-C) and send me the output?

Thanks and looking forward to getting this fixed,

Stef

Bart van den Heuvel

Jun 11, 2009, 1:23:26 PM
to rrd...@googlegroups.com
Hi Stef,
 
Thank you for responding. It seems like a simple task; however, I have a bit of a workload this week.
I will get back to you sometime next week.
 
(Why don't I have a memory leak? I just put a whole lot of *nice* thoughts in my brain until it's full and then roll over ;-)

Regards, Bart
 
2009/6/10 Stef Walter <stef...@memberwebs.com>

Bart van den Heuvel

Sep 23, 2009, 4:22:09 AM
to rrd...@googlegroups.com
Hi Stef,

It's been a while, sorry about that. Yesterday I spent some time on rrdbot and its memory mystery. I started a valgrind session. It too ran out of memory and gave up. I will monitor the session more closely and quit sooner.

Anyway, here are some results for now; I will post more results later today:


rrdbotd: out of memory: Numerical result out of range
[the line above repeated 27 times in total]

Valgrind's memory management: out of memory:
   newSuperblock's request for 1048576 bytes failed.
   3127234560 bytes have already been allocated.
Valgrind cannot continue.  Sorry.
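For scale, a rough check of Valgrind's figure, assuming the usual 3 GiB user address space of a 32-bit kernel (the i686 libraries in the ldd output suggest a 32-bit system). Valgrind and the program it runs share one process address space, so at roughly 97% of that limit it likely ran out of address space rather than RAM plus swap:

```latex
\frac{3\,127\,234\,560\ \text{B}}{2^{30}\ \text{B/GiB}} \approx 2.91\ \text{GiB},
\qquad
\frac{3\,127\,234\,560}{3 \cdot 2^{30}} = \frac{3\,127\,234\,560}{3\,221\,225\,472} \approx 0.971
```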



2009/6/11 Bart van den Heuvel <zok...@gmail.com>



--
Regards,

Bart van den Heuvel

Any society that would give up a little liberty to gain a little security
will deserve neither and lose both.
Benjamin Franklin

Bart van den Heuvel

Sep 24, 2009, 5:10:58 AM
to rrd...@googlegroups.com
Hello Stef,

Here's the data. I've cleared the IP addresses from the logs, as my client won't let me publish them; they have been replaced by <ipaddress>. rrdbotd ran for 12-13 hours and drained the machine of all its memory.
See the attached logfile.

I hope this helps to solve the situation.

Regards,
Bart van den Heuvel

2009/9/23 Bart van den Heuvel <zok...@gmail.com>
rrdbot.valgrind.txt

Stef Walter

Sep 24, 2009, 3:45:37 PM
to rrd...@googlegroups.com, zok...@gmail.com
Bart van den Heuvel wrote:
> Hello Stef,
>
> Here's the data. I've cleared the IP addresses from the logs, as my client
> won't let me publish them; they have been replaced by <ipaddress>. rrdbotd
> ran for 12-13 hours and drained the machine of all its memory.
> See the attached logfile.
>
> I hope this helps to solve the situation.

Yes, it certainly has. Attached is a patch, which should fix the
problem. I've identified three leaks. The one related to snmp_pdu_clear
(in the patch) was the main one leaking in your case.
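The patch is only attached, not quoted, so its contents are not shown here. As an illustration of the general kind of bug described, here is a minimal sketch assuming a PDU structure that copies octet-string values onto the heap; the types and the `pdu_add_octets`/`pdu_clear` names are hypothetical and are not rrdbot's code. A clear function that only zeroes the structure would drop the last pointer to each copied string, leaking it every time the PDU is reused:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified value types (not rrdbot's definitions). */
enum value_type { VALUE_INTEGER, VALUE_COUNTER, VALUE_OCTET_STRING };

struct varbind {
    enum value_type type;
    union {
        unsigned long number;
        struct { unsigned char *data; size_t len; } octets;
    } value;
};

#define MAX_VARBINDS 16

struct pdu {
    struct varbind vars[MAX_VARBINDS];
    int nvars;
};

/* Store an octet string; the payload is copied onto the heap. */
int pdu_add_octets(struct pdu *pdu, const void *data, size_t len)
{
    struct varbind *vb;

    assert(pdu != NULL);
    if (pdu->nvars >= MAX_VARBINDS)
        return -1;
    vb = &pdu->vars[pdu->nvars];
    vb->type = VALUE_OCTET_STRING;
    vb->value.octets.data = malloc(len ? len : 1);
    if (vb->value.octets.data == NULL)
        return -1;
    memcpy(vb->value.octets.data, data, len);
    vb->value.octets.len = len;
    pdu->nvars++;
    return 0;
}

/* Reset a PDU for reuse. A leaky version would only do
 * memset(pdu, 0, sizeof *pdu), losing every octet-string payload. */
void pdu_clear(struct pdu *pdu)
{
    int i;

    assert(pdu != NULL);
    for (i = 0; i < pdu->nvars; i++) {
        if (pdu->vars[i].type == VALUE_OCTET_STRING)
            free(pdu->vars[i].value.octets.data);
    }
    memset(pdu, 0, sizeof *pdu);
}
```

Leaks of this shape scale with the number of requests, which fits a daemon polling thousands of ports every interval.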

Just to cross-reference things: you were having rrdbotd query octet
strings (i.e. values that are not integers or counters), is that correct?
It seems that way. That normally shouldn't be a problem; I just wanted
to check that I understood the problem correctly.

Please let me know if this fixes your problem. Thanks!

Cheers,

Stef

rrdbot-leak-fixes.patch

Bart van den Heuvel

Sep 25, 2009, 4:23:21 AM
to st...@memberwebs.com, rrd...@googlegroups.com
Thanks!

I applied the patch and it's running now. We will see how it holds up.

Hmm... all I want is to get traffic from lots of interfaces; see the config below. I'm after counter data.
Maybe there is some unexpected data reported back?

Regards, Bart

# The two fields to store in the RRD: 'in' and 'out'. The
# interface number (at the end of the OID) may vary from
# router to router. In this example the SNMP community is
# 'public'
in.source: snmp://public@<ipaddress>/ifInOctets?ifDesc=GigabitEthernet1/0/4
out.source: snmp://public@<ipaddress>/ifOutOctets?ifDesc=GigabitEthernet1/0/4

# You might also use table queries to achieve the above.
# If the interface's name is 'eth0', then this would work.
#
# in.source: snmp://pub...@router.example.com/ifInOctets?ifDescr=eth0
# out.source: snmp://pub...@router.example.com/ifOutOctets?ifDescr=eth0

# Poll every 5 minutes or 300 seconds
interval: 300



# These settings are used by rrdbot-create --------------------------------
[create]

# Counters are values that continually increase
in.type: COUNTER
in.min: 0
out.type: COUNTER
out.min: 0
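Even though only counters are stored, the table query may explain the octet strings Stef asks about: `?ifDesc=GigabitEthernet1/0/4` asks rrdbotd to find the interface whose description matches, and interface descriptions are octet strings. A minimal sketch of that lookup, assuming the description column has already been walked into memory; the struct, the `resolve_index`/`build_oid` names, and the sample indexes are made up for illustration, and rrdbot's own implementation may differ:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One row of a walked interface-description column. Real SNMP octet
 * strings carry a length and need not be NUL-terminated; plain C
 * strings are used here to keep the sketch short. */
struct desc_row {
    unsigned int index;
    const char *descr;
};

/* Return the index whose description equals `name`, or 0 if none
 * matches (IF-MIB interface indexes start at 1, so 0 is unused). */
unsigned int resolve_index(const struct desc_row *rows, size_t nrows,
                           const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < nrows; i++) {
        if (strcmp(rows[i].descr, name) == 0)
            return rows[i].index;
    }
    return 0;
}

/* Append the resolved index to the counter's name, giving the
 * concrete object to poll, e.g. "ifInOctets.10104".
 * Returns 0 on success, -1 if `out` is too small. */
int build_oid(char *out, size_t outlen, const char *field, unsigned int index)
{
    int n = snprintf(out, outlen, "%s.%u", field, index);

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

If the index is looked up again on each poll, octet-string values pass through the SNMP code on every cycle even though the value finally written to the RRD is a counter.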



2009/9/24 Stef Walter <stef...@memberwebs.com>

Stef Walter

Sep 28, 2009, 10:05:33 PM
to rrd...@googlegroups.com
Bart van den Heuvel wrote:
> Thanks!
>
> I applied the patch and it's running now. We will see how it holds up.

How's it looking?

Cheers,

Stef

Bart van den Heuvel

Sep 29, 2009, 2:28:42 AM
to rrd...@googlegroups.com
Hi Stef,

It looks all right. It still takes a bit more memory over time, but I've run it for about 10 hours with the memory checker and there was no evidence of any leak. I'm afraid that I did not get around to a longer test.

Tomorrow I'll be in the office again and will start a test for a longer period.
Thanks again for the quick patch; I'll post the results of the Friday test.

Regards, Bart

2009/9/29 Stef Walter <stef...@memberwebs.com>

Bart van den Heuvel

Sep 30, 2009, 4:16:37 AM
to rrd...@googlegroups.com
Hi Stef,

This is last week's valgrind test. It shows good numbers.

Regards, Bart


2009/9/29 Bart van den Heuvel <zok...@gmail.com>
rrdbot-1.valgrind.txt

Stef Walter

Sep 30, 2009, 1:00:02 PM
to rrd...@googlegroups.com, zok...@gmail.com
Bart van den Heuvel wrote:
> Hi Stef,
>
> This is last week's valgrind test. It shows good numbers.

Yes. All of the still reachable blocks are ones that are loaded for the
lifetime of the application. So they're not memory leaks per se.

That said, I thought that they are freed when rrdbotd stops. Strange.
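The distinction Stef draws maps onto Memcheck's leak categories: a block that a live pointer (for example a global) still refers to at exit is reported as "still reachable", while a block whose last pointer is gone is "definitely lost". A minimal sketch with hypothetical function names, showing both kinds plus the optional shutdown cleanup that would remove the "still reachable" report:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Loaded once and kept for the life of the process. If never freed,
 * Memcheck reports it as "still reachable": the global pointer still
 * refers to it at exit. */
static char *lifetime_table = NULL;

int load_lifetime_table(size_t size)
{
    assert(size > 0);
    if (lifetime_table == NULL)
        lifetime_table = calloc(size, 1);
    return lifetime_table != NULL ? 0 : -1;
}

/* A genuine leak: the only pointer to the block goes out of scope on
 * return, so Memcheck reports it as "definitely lost". Called once per
 * poll, this makes memory grow without bound. */
void leaky_poll(size_t size)
{
    char *scratch = malloc(size);

    if (scratch != NULL)
        memset(scratch, 0, size);
    /* Missing: free(scratch); */
}

/* Optional cleanup at shutdown; calling it before exit removes the
 * "still reachable" report for lifetime_table. */
void free_lifetime_table(void)
{
    free(lifetime_table);
    lifetime_table = NULL;
}
```

Building a small driver around these with `gcc -g -O0` (optimization can remove the unused allocation in `leaky_poll`) and running it under `valgrind --leak-check=full --show-reachable=yes` without calling `free_lifetime_table` should list one block in each category.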

Anyway, thanks for your help in finding the leak. Much appreciated! I'll
include the patches in the next release.

Cheers,

Stef
