Potential memory issue

Gordie Stirling

Aug 10, 2019, 4:29:34 AM
to weewx-user
Hi all,

I have been running Weewx on a Raspberry Pi (Model A) for a couple of years with a Maplin NG96Y, without any issues.

I recently upgraded to a Pi 3 running Stretch and Lighttpd. The old Pi was starting to struggle and sat at 100% CPU most of the time.

All worked well for a month or so, then the station locked up. After a reset (batteries out, USB disconnected) it works for an hour or so, then locks the connection again. The base station still continues to function.

The error in syslog is: Aug 10 09:17:28 raspberrypi weewx[10377]: fousb: get_records failed: [Errno 110] Operation timed out.

So far, other than rebooting, I have used Easy Weather to clear the memory, but alas it had no effect; 85 minutes later, the timeouts start again.

I am thinking of downgrading the Pi to Jessie and reverting to Weewx 3.9.1, but wondered if anyone has had this problem and managed to fix it rather than rebuild.

TIA

G

Leon Shaner

Aug 10, 2019, 9:01:11 AM
to weewx...@googlegroups.com
It isn't the weewx version, it's the pi kernel / usb module version. The usb stack hangs. I wrote a watchdog to detect it and reboot the pi, because so far there has been no fix to the issue. I am not able to correlate this to a memory leak. What are you seeing re: memory?
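For readers who want to try something similar, here is a minimal sketch of the detection half of such a watchdog. It is not Leon's script, which is not shown in this thread. It assumes the hang shows up as repeated `fousb: get_records failed: [Errno 110]` lines like the one quoted earlier (other drivers, such as the WMR300, log different messages), and the function name and threshold are made up for illustration.

```python
import re

# Matches lines like the one quoted earlier in this thread:
#   "... weewx[10377]: fousb: get_records failed: [Errno 110] Operation timed out"
TIMEOUT_RE = re.compile(r"weewx\[\d+\]: fousb: get_records failed: \[Errno 110\]")


def usb_looks_hung(log_lines, threshold=3):
    """Return True once `threshold` consecutive weewx lines are USB timeouts.

    `threshold` is an arbitrary illustrative value, not a tested setting.
    """
    streak = 0
    for line in log_lines:
        if "weewx[" not in line:
            continue  # ignore output from other daemons
        if TIMEOUT_RE.search(line):
            streak += 1
            if streak >= threshold:
                return True
        else:
            streak = 0  # any other weewx line resets the count
    return False
```

What to do when this returns True (restart weewx, reset the USB device, or reboot) is a separate decision that depends on the system.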

Regards,
\Leon
--
Leon Shaner :: Dearborn, Michigan (iPhone)
> --
> You received this message because you are subscribed to the Google Groups "weewx-user" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to weewx-user+...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/weewx-user/afbf7556-425f-4eca-8088-caaf96c06805%40googlegroups.com.

Gordie Stirling

Aug 11, 2019, 6:39:00 AM
to weewx-user


On Saturday, 10 August 2019 14:01:11 UTC+1, Leon Shaner wrote:
It isn't the weewx version, it's the pi kernel / usb module version.  The usb stack hangs.  I wrote a watchdog to detect it and reboot the pi, because so far there has been no fix to the issue.  I am not able to correlate this to a memory leak.  What are you seeing re: memory?

Regards,
\Leon
--
Leon Shaner :: Dearborn, Michigan (iPhone)

Hi Leon,

The reason I suspected an issue with the memory was a line in /var/log/syslog

Aug 11 1:31:04 wstation weewx[15295]: fousb: unstable read: blocks differ for ptr 0x0001000

I will revert to Jessie and see if that fixes the issue. My station pushes updates to Wunderground, but sits in its own DMZ, so I don't mind running slightly older code.

Leon Shaner

Aug 11, 2019, 11:55:24 AM
to weewx...@googlegroups.com
TK, you may want to know...
There definitely is a memory leak in weewxd.
It grows by 1 KiB about every 2 seconds.

I haven't yet correlated whether weewxd can grow to a point where other things start dying, and whether that has any impact on the USB stack.

What tipped me off was a warning I just got from sar/sa1 stating it couldn't do its job due to "resource unavailable," so I went and had a peek.

This RPI is a Zero W(H), so it has only 1 GB of RAM.
It's full enough on memory that it's using some swap.

Here is just one snapshot. That RES (RZ) value for weewxd goes up by one digit every 2 seconds.
I gleaned that is 1 KiB, but I am not 100% certain. It might only be 1 byte every 2 seconds -- whatever the actual unit it goes up by one every 2 seconds. I've never seen it come down.
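The two readings of the unit give very different daily totals, which is one way to tell them apart after a day of uptime. A quick sketch of the arithmetic (the function name is just for illustration; procps `top` normally shows RES in KiB unless the display scale has been changed):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_growth_mib(step_kib, interval_s):
    """Convert 'step_kib every interval_s seconds' into MiB per day."""
    steps_per_day = SECONDS_PER_DAY / interval_s
    return step_kib * steps_per_day / 1024


kib_case = daily_growth_mib(1, 2)          # 1 KiB every 2 seconds
byte_case = daily_growth_mib(1 / 1024, 2)  # 1 byte every 2 seconds
print(f"{kib_case:.1f} MiB/day vs {byte_case:.3f} MiB/day")
# prints "42.2 MiB/day vs 0.041 MiB/day"
```

At roughly 42 MiB per day the growth would be obvious within a day or two; at 0.04 MiB per day it would be lost in the noise.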

I may enhance my watchdog to check the weewxd memory size and restart it after a limit, then see if that has any positive impact on avoiding the USB stack going south. Yeah, I guess I'll work on that this week. I'll put some logging in so anyone who wants to keep an eye on it can know whether their setup is memory leaking also. =D
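A memory check of that kind could look roughly like the sketch below. It is only an illustration under stated assumptions: it reads the `VmRSS` line that Linux exposes in `/proc/<pid>/status` (reported in kB), and the function names and the limit are hypothetical, not part of any existing watchdog.

```python
def rss_kib(status_text):
    """Extract VmRSS in KiB from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like "VmRSS:     45000 kB"
            return int(line.split()[1])
    return None  # no VmRSS line, e.g. the process has already exited


def over_limit(status_text, limit_kib):
    """True if the resident set size is known and above limit_kib."""
    rss = rss_kib(status_text)
    return rss is not None and rss > limit_kib


# Usage sketch (pid lookup and the restart command are left to the caller):
#   with open(f"/proc/{pid}/status") as f:
#       if over_limit(f.read(), 200 * 1024):   # 200 MiB, arbitrary
#           ...restart weewx via the system's init tooling...
```

Logging the value from `rss_kib` on every check gives the history needed to see whether a given setup grows at all.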

I have the WMR300 driver, in case the leak could be specific to that driver. :-/

image0.png

Thomas Keffer

Aug 11, 2019, 12:45:22 PM
to weewx-user
If memory increases every couple of seconds, that suggests the leak is in the part of the device driver that deals with LOOP packets. But you're on a WMR300, right? As I recall, their LOOP packets are more like 10-20 seconds apart, which is too long to correlate with what you're seeing. Or are you on something like a Vantage with a shorter LOOP packet interval?

I've noticed memory leaks in other instances, such as my own site http://www.threefools.org/weewx. In my case, I spent a couple of weeks trying to chase it down, and concluded the problem was somewhere in the sqlite database driver, but couldn't put my finger on it conclusively.

Then, magically, Debian did an update and the problem (nearly) went away. Memory leak dropped from 5 MB/day, to almost nothing (maybe a megabyte or two over a week). Again, it suggests the problem was (is) in a device driver somewhere.

-tk


Leon Shaner

Aug 11, 2019, 4:06:26 PM
to weewx...@googlegroups.com
Hey, TK.  WMR300.  I'll see if I can track it down.


Regards,
\Leon
--
Leon Shaner :: Dearborn, Michigan (iPad)


Thomas Keffer

Aug 11, 2019, 7:22:37 PM
to weewx-user
While the tools work well in smaller projects where you can identify and track every object, in a larger, real-time, project like weewx, there is just too much going on to make sense of the huge memory trees the tools produce.

They also suffer from a Heisenberg uncertainty principle: the very act of measuring consumes memory, which you then chase, looking for your leak.

So, I have taken to a patient, binary approach. I shut down half the program, say the reporting engine, and see if that makes a difference. If not, I shut down the other half. Keep dividing by half and eventually you'll locate the problem. With most leaks, you can only do one iteration every day or two, so it can take weeks to isolate the problem. With a large leak, like yours, it can be done in a matter of hours.
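The halving procedure described above can be written down as a short sketch. Everything here is illustrative: the component labels are hypothetical, and `still_leaks_with_disabled` stands in for the manual experiment of running with some parts switched off and watching memory for long enough to judge. It assumes exactly one component is responsible.

```python
def find_leaking_component(components, still_leaks_with_disabled):
    """Narrow a list of suspects down to one by repeated halving.

    still_leaks_with_disabled(disabled) should return True if memory
    still grows while every component in `disabled` is switched off.
    """
    suspects = list(components)
    while len(suspects) > 1:
        half = suspects[: len(suspects) // 2]
        rest = suspects[len(half):]
        if still_leaks_with_disabled(half):
            suspects = rest  # leak survived, so it is in what stayed on
        else:
            suspects = half  # leak stopped, so it is in what was turned off
    return suspects[0]


# Example with made-up component labels and a pretend culprit:
parts = ["driver", "archive", "upload", "reports"]
print(find_leaking_component(parts, lambda off: "upload" not in off))
# prints "upload"
```

With n components this takes about log2(n) experiments, which is why each one being a day or two long is tolerable.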

Patience is the only thing I've found that works well!

One thing worth noting: Python uses garbage collection, so it doesn't actually "leak" memory. It may struggle with memory held in cyclic references and take a long time to reclaim it, but eventually it will. The exception is when a __del__ method is defined on a Python version before 3.4; weewx never defines __del__, so it is not subject to this limitation.
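A small demonstration of the cyclic-reference behaviour, assuming CPython (the `Node` class is made up for the example). Automatic collection is switched off briefly so the result does not depend on when the collector happens to run:

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.partner = None


gc.disable()                 # pause automatic cycle collection for the demo
a, b = Node(), Node()
a.partner, b.partner = b, a  # build a reference cycle
probe = weakref.ref(a)       # lets us observe the object without keeping it alive
del a, b                     # reference counts stay above zero because of the cycle

alive_before = probe() is not None
gc.collect()                 # the cycle detector finds and frees both objects
alive_after = probe() is not None
gc.enable()

print(alive_before, alive_after)  # prints "True False"
```

So memory tied up in cycles is delayed rather than lost, which is consistent with growth that never comes back pointing at code outside pure Python.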

Most of the time the problem is in an underlying "C" routine, specifically drivers, occasionally graphic utilities.

-tk

On Sun, Aug 11, 2019 at 3:51 PM Leon Shaner <le...@isylum.org> wrote:
Hey, TK,

Can you please toss me a bone on how to instrument weewx for memory profiling?
I tried python3-memory-profiler but couldn't make it work.  It seems to block the normal importing of needed python modules (like weewx.engine).  Maybe it is because weewx uses precompiled modules?

Other than that I found a thread in the weewx google group where you suggested pympler, but I can't seem to find that on Raspbian, and you had a specially modified engine.py anyway.  :-/


Regards,
\Leon
--
Leon Shaner :: Dearborn, Michigan (iPad)


Leon Shaner

Aug 11, 2019, 8:29:07 PM
to weewx...@googlegroups.com
TK,
If it helps, I don't use any reporting stuff.
I'm strictly using WMR300 with rapid fire and archive simultaneously.
No other features.

Unfortunately, I've restarted weewxd several times today making the attempt to debug it, so I can't share the latest memory usage snapshot.  But as a data point my watchdog has to reboot the pi roughly every 3 days.  So now that I've restarted weewx I'll see if it goes 6 days instead of the usual 3.

Regards,
\Leon
--
Leon Shaner :: Dearborn, Michigan (iPad)


Cameron D

Aug 12, 2019, 10:34:11 AM
to weewx-user
If there is a memory leak in the wmr300 driver, it might be my fault, so I suppose I should look into it.

There is a thread from maybe a year ago where WMR300s and 200s were reportedly hanging Pis, but it seemed to be related to kernel version, so I had assumed it was a kernel driver issue that was triggered by something specific to the WMR code.

The loop interval is rather erratic - individual items report at different intervals, some per second-ish and others much longer.  Occasionally there are really long gaps that I've never tied down to a reason.

I just checked my running weewx, on Debian stretch on an Intel box. After 20 days uptime, top reported the RES was 220MiB and VIRT was about 450MiB.

I restarted weewx, and after a couple of hours the RES was 45MiB and VIRT 260MiB.

My RES value is not incrementing slowly, but completely stable and then jumping occasionally by 300kiB.  I'll put a timer on it when I'm a bit more awake.

I just noticed it drop then by 450kiB - presumably normal garbage collection.

Cameron.

Cameron D

Aug 13, 2019, 12:42:35 AM
to weewx-user
Here is a plot overnight tracking the various memory values in /proc/pid/status at sampling interval 20 sec.
My system is a wmr300, weewx 3.9.2, archive interval 1 minute to mysql, posting to WU, not using rapidfire.

I don't understand what I am talking about here, but just reporting what I see:
The memory usage increase seems to be all in RssAnon, tracked precisely by the value in VmRSS (because RssFile and RssShmem are unchanged).
That also seems to be the major contribution to VmData and VmSize, although they grow more slowly than RssAnon.

I have attached a plot of the memory increments, starting from a baseline about 3 hours after restarting weewx.
There was no history clearing (of WMR300 console) carried out in that time.
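For anyone wanting to collect the same numbers, here is a minimal sketch of parsing those fields and computing increments against a baseline. It is not the script used for the plot; it assumes the Linux `/proc/<pid>/status` layout of `Name:  value kB`, and the function names are made up.

```python
FIELDS = ("VmSize", "VmRSS", "VmData", "RssAnon", "RssFile", "RssShmem")


def parse_status(text):
    """Return the selected /proc/<pid>/status memory fields, in KiB."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name in FIELDS:
            values[name] = int(rest.split()[0])  # "   45000 kB" -> 45000
    return values


def increments(sample, baseline):
    """Per-field change of a sample relative to the baseline sample."""
    return {k: sample[k] - baseline[k] for k in baseline if k in sample}


# Usage sketch: take one sample as the baseline, then sample again later.
#   with open(f"/proc/{pid}/status") as f:
#       baseline = parse_status(f.read())
#   ...sleep for the sampling interval, re-read, then...
#   print(increments(parse_status(later_text), baseline))
```

The RssAnon, RssFile and RssShmem lines only exist on reasonably recent kernels, so missing fields are simply skipped.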

I can see no pattern or correlation in the chart, with increases occurring anywhere from 5 min apart to one hour. However, I guess a lot of this timing is hidden by malloc code reusing freed blocks of memory and all that this is showing is when malloc requests more memory from the OS.

Cameron.
weewx-mem0.png

Gordie Stirling

Aug 13, 2019, 1:09:39 PM
to weewx-user
Thanks for the updates, good to see a community thrive. The issue I have doesn't appear to be a memory leak, more of a call going wrong.

I am using the fousb / WH1080, so it may not be down to a driver issue, for me anyhoo.

I tried rolling back to Stretch, but the same issues reappeared. Jessie is next on my to-do list, along with weewx 3.9.1.

Will update when I get a chance to rebuild.

G

Cameron D

Nov 16, 2019, 11:27:20 PM
to weewx-user
Another update

I was seeing the same inexorable memory usage creep whether I used the default wmr300 driver or my version using libusb1. It also persisted after I removed the Weather Underground updates.

Between 22 days after restart and 32 days I had gained nearly 100MB VmData allocation.

But it did not happen with the default simulator station.

However, I then decided to upgrade my systems to Buster (Debian 10), and also implemented the mem.py example to log memory usage within weewx - my previous examples had simply been done with an external script, which stopped working whenever weewx was restarted and the pid changed.
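The pid problem with an external script can also be worked around by looking the pid up again on every sample instead of once at startup. A minimal sketch, assuming the daemon writes a pid file; the path in the comment is an assumed location and the function names are hypothetical:

```python
def current_pid(pid_file):
    """Read the daemon's pid from its pid file, or None if unavailable."""
    try:
        with open(pid_file) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None  # file missing, unreadable, or not a number


def read_status(pid_file):
    """Return the text of /proc/<pid>/status for the current pid, or None."""
    pid = current_pid(pid_file)
    if pid is None:
        return None
    try:
        with open(f"/proc/{pid}/status") as f:
            return f.read()
    except OSError:
        return None  # process went away between the two reads


# Usage sketch: call this once per sampling interval.
#   text = read_status("/var/run/weewx.pid")   # assumed pid file location
#   if text is None: skip this sample and try again next time
```

Logging from inside weewx avoids the lookup altogether, at the cost of the measurement itself using a little memory.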

5 days after restart I have no increase at all in any of the memory usage parameters.  They are just bouncing around +/- 1MB as resources are allocated and freed.

So, the most likely cause as far as I can see is that a memory leak in python 2.7 got fixed.
