The OOM killer is killing my weewx process from time to time. I have been
seeing this problem since I upgraded to 2.5, but I am also sending my data to
quite a few online services (a recent change), so maybe the problem
existed in 2.4 too.
Currently my weewx process is growing by around 2 MB per hour so far. My
archive interval is 5 minutes. I will try to debug it myself, but
debugging memory leaks is tricky when the program is not using
different classes to store data :-).
Is anybody else seeing this?
On 25/11/13 15:24, Jesus Cea wrote:
> I have quite a few reports and REST services enabled. I played
> bisection with them and I am pretty sure that the issue is in
> report generation. It is not in the FTP service, nor in the REST
> client.
Neither is it in the "phone home" feature (for the user map on the weewx website).
I disabled everything. No leaks. I activated ONLY a simple report, with
no graphics and a single web page. Nothing else, not even the FTP
upload. It leaks a bit. Then I enabled my main report, which is heavier,
with graphics, etc., and it leaks about 90 MB/day.
So my bet is the report generation system.
I just added a "gc.collect()" call in the main loop, to discard cyclic
references. I hope I did it correctly. It still leaks.
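Roughly what I mean, as a minimal sketch (not the actual weewx loop; the function name is just a placeholder):

    import gc
    import time

    def process_archive_record():
        # Placeholder for the real archive/report work done each cycle.
        pass

    def main_loop(archive_interval=300):
        while True:
            process_archive_record()
            # gc.collect() only reclaims objects kept alive by reference cycles;
            # ordinary objects are freed as soon as their refcount drops to zero.
            unreachable = gc.collect()
            print("gc.collect() reclaimed %d unreachable objects" % unreachable)
            time.sleep(archive_interval)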
I’ve been looking at this since I am suffering from it as well, and it’s been confusing me a lot.
Part of the problem is separating fragmentation from leaks.
Anyway, I think I have just found the cause for me, which is that the PIL image library’s draw.text method leaks.
I’ve commented that out, and with most of the rest of the code the same (not completely, so I will need to put things back together to be completely sure), memory usage seems to be stable.
So this would depend on the system you have and whether it has replaced PIL with Pillow, and possibly also on what version of PIL is being used, I guess.
I’ll dig into that a bit more tonight.
Jesus, to test this, comment out all the calls to draw.text in bin/weeplot/genplot.py and see if that helps for you as well.
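If editing genplot.py is a pain, another way to test the same thing (just a sketch, I haven’t actually tried it this way) is to stub out PIL’s text rendering at startup; if memory then stays flat, the leak is in draw.text:

    import PIL.ImageDraw

    def _noop_text(self, xy, text, *args, **kwargs):
        # Render nothing; we only want to know whether text drawing leaks.
        pass

    # Monkey-patch so every ImageDraw.text() call becomes a no-op.
    PIL.ImageDraw.ImageDraw.text = _noop_text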
John
I think I’ve just found the leak in Pillow.
I’ve only been running for a short time, but there is a leak in its draw.text path: a FreeType glyph is leaked for each character on every call.
This is in the latest version, for which I downloaded the source.
I’ll report it to the Pillow & PIL mailing lists. But if you want to try it yourself, then in Pillow’s source, in the file _imagingft.c, in font_getsize(…), after the calls to

    FT_Get_Glyph(…)
    FT_Glyph_Get(…)

add

    FT_Done_Glyph(glyph);

so that the glyph is released again.
And yes, I agree with your comments about gc.collect(). The only thing I think it could possibly change is memory fragmentation. But now that I’ve got rid of my big leak I can put everything back together and see what happens.
John
And one more thing: this bug is only in Pillow, which explains why I don’t see it on the system using PIL, and why it started happening for me recently(-ish) when Arch Linux replaced PIL with Pillow.
So I’ll just report it to Pillow.
John
No, it’s the C code around the FreeType library, so it is not related to Python’s garbage collection.
It has been reported against Pillow and acknowledged, and they appear to be trying to do a point release (2.2.2) in the next few days, which might include this fix if we are lucky; if not, it looks like there might be a 2.3.0 within a month.
Unfortunately this isn’t the whole thing. It was my big leak, but memory usage has still been creeping up slowly overnight.
After 9½ hours it was using 10695 KB.
After about 13 hours it is now using 10733 KB.
So this isn’t the complete answer, but it is much better.
I am running most of the code again.
I’ve just removed the report generator and put the image generation back to the complete code again, so I’ll leave that running for the day and see how it looks.
John
I have continued investigating, and I was convinced there was still a leak in the Pillow libraries. I ended up writing a test app that sat in a loop creating images and drawing text, and I have just found it.
Pillow is leaking the font names every time we call getfont, which we do for every graph each time it is drawn.
So I have a fix for it; the problem also occurs in the original PIL code, and with the fix my test app has created over 200,000 images with no leaks.
We could avoid it (and probably speed things up very slightly) if we cached the font handles, since we recreate almost identical ones for each graph.
If you think this would be worth doing I can consider looking at it after Christmas.
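The sort of thing I have in mind, as a sketch (the names are made up; this isn’t weewx code):

    from PIL import ImageFont

    _font_cache = {}

    def get_cached_font(path, size):
        # Re-use one ImageFont handle per (file, size) instead of calling
        # ImageFont.truetype() again for every plot.
        key = (path, size)
        if key not in _font_cache:
            _font_cache[key] = ImageFont.truetype(path, size)
        return _font_cache[key]

Since image generation asks for the same two or three fonts over and over, the cache stays tiny, and it also sidesteps the leaked filename in getfont.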
Anyway I will submit an issue to PIL & Pillow later in the week as well.
I’ve just restarted weewx with the fix and will see what it looks like (just running imagegenerator at the moment).
Then I will start putting all the code back together again and see what it looks like.
So, just an update on where we are.
My system is using a Davis Vantage Vue with the archive interval at 1 minute, and of course, since most of this is related to archive processing, the archive rate will affect the rate at which we leak.
Issue 1) There is one issue in Pillow which leaks huge amounts of memory (MBs per hour), leaking one glyph per character rendered to an image.
https://github.com/python-imaging/Pillow/issues/446 has been submitted, and the fix should therefore be in the 3.0 release, which looks to be coming soon.
Issue 2) There is a leak of the filenames passed to getfont in Pillow and PIL. I have submitted a pull request to Pillow for this,
https://github.com/python-imaging/Pillow/pull/459 , and also reported it against PIL: https://bitbucket.org/effbot/pil-2009-raclette/issue/39/getfont-can-leak-the-memory-used-for . I hope this will get into the next Pillow release as well; I don’t expect anything to happen for PIL, though. This leak is approximately 40 KB per hour. I will try to create a workaround for it in weewx; I think I know what I will do now, and I don’t think it will be hard.
Remaining issues: I still have a leak of about 4 KB per hour, which I am seeing using the simulator and running only the StdArchive & StdPrint services.
So that rules out things like the USB libraries talking to the weather station. I am running the simulator as fast as I can with a 1 minute archive interval (any less than that and I hit a divide by zero somewhere).
This morning I stopped the archive service writing to the database, and I had a small memory increase after 1.4 hours, but it has stayed the same since then (9.5 hours). So I guess this leak is probably somewhere in the database writing code.
I’m going to leave it running overnight and see what it looks like tomorrow morning. Then I’ll try enabling statsdb writing only, see what happens, and move on from there.
This is with sqlite, and it could be different with MySQL depending on where the error is.
John
On 25 December 2013 at 13:14, Thomas Keffer wrote:
Hi, Jesus
Pillow has now merged the fix (https://github.com/python-imaging/Pillow/pull/459) into its latest code, so when Pillow 3.0 is released things should be much better.
It looks like our last leak is in sqlite writing to the database, so I wrote a test application to try it, and it shows the same results.
Using a copy of the archive database, it repeatedly adds rows, increasing the timestamp each time, and it showed the same memory increase of about 30 KB per ~640 insertions. This carried on until it had inserted 12231 times, at which point the memory stopped growing. In my test app I hit this point after 23 minutes; I left it running for 9 hours and it still hadn’t changed.
I tried using apsw instead of pysqlite, and that seems to be faster but shows similar growth. So I suspect this is some sort of caching going on inside sqlite and isn’t worth chasing.
If my calculations are correct, with one insertion happening at each archive interval (per database), and that happening at most every minute, it would take 8 to 9 days before our memory stabilises, and I haven’t been leaving any of these tests running that long.
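For what it’s worth, a sketch of that kind of loop (the schema here is made up; the real test ran against a copy of the archive database):

    import sqlite3
    import time

    def insert_forever(db_path="test_archive.sdb", interval_s=60):
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE IF NOT EXISTS archive"
                     " (dateTime INTEGER PRIMARY KEY, outTemp REAL)")
        ts = int(time.time())
        rows = 0
        while True:
            ts += interval_s                  # advance the timestamp for each new row
            conn.execute("INSERT INTO archive VALUES (?, ?)", (ts, 20.0))
            conn.commit()
            rows += 1
            if rows % 640 == 0:
                # Linux-only: read the current RSS from /proc to watch for the ~30 KB steps.
                status = open("/proc/self/status").read()
                rss_kb = int(status.split("VmRSS:")[1].split()[0])
                print("%d rows inserted, RSS %d KB" % (rows, rss_kb))

(12231 insertions at one per minute is 12231 / 1440 ≈ 8.5 days, which is where the 8 to 9 days comes from.)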
So I am not intending to chase this memory leak any more; I think that once I get the code written to work around the font leak, things are probably OK. So I’m going to start concentrating on that now.
John and Jesus: have you had a chance to try the patch? I've been running v2.6.0a5, which includes the patch, and I can't see it making any difference.
The problem is that my low-memory system, the one that was really suffering, has the fixed version of Pillow on it as well, so the patch doesn’t help there. But there is also an underlying growth in cached memory use within the sqlite libraries, which will go on for several days before hitting a peak, and how long that lasts will depend on your archive interval. I think it is about 5 days with a 1 minute archive interval, so with a 5 minute interval it could be a few weeks. So this could well be that.
Do you have a comparison for running without the patch?
John
Matthew, I can't read the vertical scale on your plot. Do you have the numbers? Also, is this the virtual image size or the resident image size?
Any way to tell how much of that memory is used by weewx? I'm sure hoping it isn't 200+ MB!
And, because you have no swap space, this is the equivalent of RES memory?
Tom, can you post how you made your graph? I can hook it into my 2.6.0a6 setup and let it run for a bit if you want.
That’s good to know.
It always seems strange that calling the garbage collector should be necessary, but it does seem to change the behaviour enough to matter.
John
On 07 February 2014 at 20:17, Thomas Keffer wrote:
I added a call to garbage collection about once an hour. It seems to stabilize memory use after a couple of days. I think John had it right.
I would be interested to see what it is like in a couple of weeks. If you are using sqlite, its memory usage and buffering are strange and take a long time to stabilise, so I still wouldn’t be surprised to see it settle down in another week or so.
If you can leave it running it would be interesting to see. I’ll add monitoring like this to my two systems as well and see what they look like over the next couple of weeks.
    a = [1, 2, 3]
    a = [4, 5, 6]

    a = [1, 2, 3]
    a = len(a)
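I assume those fragments were about when memory is actually freed; as I understand it (a sketch, nothing specific to weewx), rebinding a name frees the old object straight away through reference counting, and gc.collect() only matters for objects kept alive by reference cycles:

    import gc

    a = [1, 2, 3]
    a = [4, 5, 6]        # the first list's refcount drops to zero; CPython frees it at once

    class Node(object):
        pass

    n = Node()
    n.self_ref = n       # a reference cycle: refcounting alone never reclaims this
    del n
    print(gc.collect())  # the cyclic garbage is only reclaimed by the collector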
I “think” that chart is of system memory, not the memory consumption of weewx. If so, then that is something very different and not something I would look at based on just that graph.
On 24 March 2014 at 19:54, Thomas Keffer wrote:
200 MB sounds extremely high. I'm usually around 30 MB RSS, 80 MB VSS.
On Mon, Mar 24, 2014 at 11:39 AM, <nileg...@guitar-o-rama.com> wrote:
Just noticed...that legend should be in KBytes. I just corrected that in my skin.conf. -- Nile
On Monday, March 24, 2014 11:31:04 AM UTC-7, nileg...@guitar-o-rama.com wrote:
I'm definitely seeing memory leakage. Here is my current cmon memory graph:
Assuming that is for weewxd, then that is quite high.
You could post the output of ps -F -C weewxd as well.
When Tom was asking about services, I believe he was talking about weewx services, not system ones.
So my system, which just does the archiving & stdreport (including weewx-wd & a couple of small extra ones), has RES = 27M, and the above ps command shows SZ = 11869 (pages).
For your system you must have cmon as an additional weewx service; do you also use any of the forecasting services, uploads to other sites (Wunderground etc.), or anything non-standard?
Also, what are you using for a database, MySQL or sqlite?
John
On 24 March 2014 at 20:24, nileg...@guitar-o-rama.com wrote:
The other services are very lightweight ones: bcron, nullmailer, runit, socklog. Looking at the stats via top, the python process shows %MEM > 10%, RES = 44m, VIRT = 77212, SHR = 3928. Everything else looks normal.
Another data point...when I rebooted the RPi about 36 hours ago it was only using about 140 MB, and it has grown steadily since then.
Nile
On Monday, March 24, 2014 12:54:21 PM UTC-7, Thomas Keffer wrote:
200 MB sounds extremely high. I'm usually around 30 MB RSS, 80 MB VSS.
Are you running any other services besides cmon?
-tk
UID PID PPID C SZ RSS PSR STIME TTY STAT TIME CMD
root 2427 1 8 19447 45980 0 Mar23 ? Sl 193:59 python /usr/bin/weewxd --daemon --pidfile=/var/run/weewx.pid /etc/weewx/weewx.conf
As stated in an earlier post, the only external weewx service I'm running is cmon 0.2, the latest version. I'm also doing archiving, standard reporting, FTP publishing, and Wunderground posting with the RapidFire protocol, all specified in weewx.conf. No exfoliation or similar external services and no additional forecasting module. My daily graphs are a bit larger than the default (340x204 vs 300x180) and my weekly, monthly, and yearly graphs are 600x180 for better legibility. All databases are using the default sqlite install. That's it.
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 2427 8.7 10.3 78152 46352 ? Sl Mar23 197:21 python /usr/bin/weewxd --daemon --pidfile=/var/run/weewx.pid /etc/weewx/weewx.conf
Nile
That’s OK; the previous one had SZ, which I believe is almost the same as VSZ except in pages instead of kilobytes, so VSZ I think is just 4 times bigger than SZ.
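(For what it’s worth, that roughly checks out: 19447 pages x 4 KB/page = 77788 KB, which is close to the 78152 KB VSZ in the second listing.)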
They don’t look particularly bad at this time. You are running extra services so that possibly accounts for the extra memory usage.
Could you post the same output tomorrow at some point so we can see what happens over the next 20 hours or so?
Thanks
Mon Mar 24 16:16:11 PDT 2014
%MEM RSS VSZ
10.4 46784 78408
Tue Mar 25 11:05:09 PDT 2014
%MEM RSS VSZ
11.6 52108 83832
I'd say it's growing. Nile
Agreed, and that is nearly 7 MB per day, which is quite high.
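(52108 - 46784 = 5324 KB over roughly 19 hours, i.e. about 280 KB/hour, or close to 7 MB per day.)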
As far as I can see, the main difference (apart from probably a different OS etc., which will almost certainly mean different versions of Python and the libraries from mine) is that you are pushing to Weather Underground and using RapidFire.
Could you try disabling those and see if it makes any difference?
Running for a couple of hours should make it fairly obvious with that sort of rate.
Also, what sort of weather station are you using, and what are the update interval for the loop data and the archive interval?
And sorry, we would probably need more data points than just those two as well, so if we could get those figures at each hour interval that would help.
I.e. it might have grown to 53M soon after the first reading and then sat there for an hour and a half.
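If it helps, a throwaway script along these lines (nothing to do with weewx itself; the names are arbitrary) would capture those hourly figures:

    import subprocess
    import time

    def log_weewxd_memory(logfile="weewxd_mem.log", period_s=3600):
        while True:
            # Same columns as above: %MEM, RSS and VSZ for the weewxd process.
            out = subprocess.check_output(
                ["ps", "-C", "weewxd", "-o", "%mem,rss,vsz", "--no-headers"])
            with open(logfile, "a") as f:
                f.write("%s %s\n" % (time.strftime("%Y-%m-%d %H:%M"),
                                     out.decode().strip()))
            time.sleep(period_s)

    if __name__ == "__main__":
        log_weewxd_memory()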
Those last numbers are about where mine has sat for the last 3 weeks (in fact, RSS has just gone down a bit):
http://jpharvey1.no-ip.biz/weather/weewx/monthMem.png
So it could be stable now.
Thanks
John
I have it installed, but I don’t reference any of its items from my skin, so it probably isn’t being used. If you suspect it, I can easily modify the skin to add something in somewhere.
Very custom skins, re-using quite a few bits from the standard skins, but nothing that uses pyephem.
But let me modify a page to reference something and see if it makes a difference. I can do that on my test system.
John
That looks good to me, so at least with the same services you are seeing the same as me.
I don’t think it really matters which order you add things back in. My guess is it’s WU or RapidFire, but only the experiments will prove it, so if you could run 24 hours for each of: add only FTP, then disable FTP & add WU, then disable WU and add RapidFire, that should show which of those is leaking.
I also put pyephem back in on mine last night, and after an initial growth it became stable, so I don’t think that is the issue, but I’ll leave it enabled for a few more days to be certain.
I’ll try to add Weather Underground and RapidFire to mine and see if I see it as well with those enabled, and then, assuming I can, I’ll try to work out what that leak is.
But it won’t be quick; there are too many other things to do, and I’ve just been adding some code to allow better control of FTP times, since I wanted to be able to transfer different files at different rates, so I’ve been adding code which allows report jobs to be run at times specified like cron.
I will create a patch for it and post it soon to see if you think it is worth merging into the main code.
John
Strange, given that mine is now increasing very slightly, having moved to 2.6.3.
It’s a very slow increase, but it is definitely there now and wasn’t before.
It’s gone from about 53 MB to 56 MB over the last 2 weeks.
If I can find some time I’ll try to take a look, but it won’t be this week.
John
What a pain. Being so dependent on the environment it’s running on is going to make finding this harder.
What are you running on? I’ve just updated to a new release of Ubuntu, and what I had was stable, so I’ll keep an eye on it next week.