MQTT lagging

532 views
Skip to first unread message

T Reid

unread,
Oct 24, 2020, 8:22:14 PM10/24/20
to weewx-user
I am using the MQTT add-in to pipe real time loop entries from my Davis Vantage Pro into the Belchertown skin, using the standard instructions for that skin.  It generally works great.  But every couple of days the MQTT feed will fall behind, sometimes by as much as an hour or more.  To fix it, I have to restart weewx.  Any idea what might be happening?  I am running weewx 4.1.1 with Belchertown 1.1.

Greg Troxel

unread,
Oct 25, 2020, 10:43:08 AM10/25/20
to T Reid, weewx-user
My suggestions:

explain if you are publishing aggregate or individual data

Use mosquitto_sub (from mosquitto) on the broker to watch the traffic.

Run netstat on the weewx host to look for full output queues

read the weewx logs

explain what you are seeing more precisely than saying "fall behind".

next time it fails, restart your broker instead of weewx and see what
happens. That will cause the weewx->broker connection to get shut
down and weewx/mqtt will open a new one.

if you are using some cloud broker, run your own

signature.asc

T Reid

unread,
Oct 28, 2020, 1:19:40 AM10/28/20
to weewx-user
Thank you for your very thorough response, Greg.  In response to your questions and suggestions:
  • I am publishing aggregate data. Here is the MQTT section of my weeewx.conf:
[[MQTT]]
    server_url = mqtts://user:pass...@mqtt.example.com:8883/
    topic = weather/belmont
    binding = archive, loop
    aggregation = aggregate
    [[[tls]]]
        ca_certs = /etc/ssl/certs/ca-certificates.crt
  • I looked at the traffic with mosquitto_sub, and the incoming weather data right now is from 45 minutes ago.
  • I'm afraid that I don't know what running "netstat ... to look for full output queues" means.
  • What am I looking for in the logs? I see new "INFO weewx.restx: MQTT" entries every two seconds or so. Each entry says that it has published a record with a time stamp from about 45 minutes ago.
  • What I am seeing on my weather page using the Belchertown skin are charts built using current archive records (updated every five minutes), but  "station observations" at the top of the page using MQTT delivered loop records that reflect weather data from 45 minutes ago.  Earlier today, the lag was only 30 minutes.  The lag grows over time.
  • I restarted the broker, and the MQTT weather data on the web page, on mosquitto_sub, and in the weewx logs is still running 45 minutes behind.  No change. 
  • I run my own broker using mosquitto running on a droplet on Digital Ocean.  My copy of weewx is running on a raspberry pi that is hooked up to the datalogger on my Davis Vantage Pro2 console with a USB cable.
To my eyes, it looks like weewx is running further and further behind in publishing loop records over MQTT, which makes no sense to me.

T Reid

unread,
Oct 29, 2020, 12:25:32 AM10/29/20
to weewx-user
Greg, I am not ready to declare victory yet, but after restarting my mosquitto broker, I let weewx keep running and overnight the MQTT loop records gradually caught up!  Not all at once, but over time you could see the delay shrinking.  All day today, the display has been pretty much spot on.  I would love to understand what caused this in the first place, and why restarting the MQTT broker would fix it.  Meanwhile, I have my fingers crossed that the delay does not come back.

T Reid

unread,
Oct 30, 2020, 10:07:53 PM10/30/20
to weewx-user
As I feared, the problem is not fixed. The lag between the posting of MQTT records by weewx and the current time has wandered all over the place over the past two days.  It has gone from dead on time to 1.5 hours behind, and everything in between.  Last night it was doing okay, this morning it was 80 minutes behind, and now it is 20 minutes behind.  There is no rhyme or reason to it.  The only constant is that weewx can't seem to post MQTT records on time on a consistent basis.

Greg Troxel

unread,
Oct 31, 2020, 9:47:04 AM10/31/20
to T Reid, weewx-user

T Reid <trgo...@hoghill.net> writes:

> As I feared, the problem is not fixed. The lag between the posting of MQTT
> records by weewx and the current time has wandered all over the place over
> the past two days. It has gone from dead on time to 1.5 hours behind, and
> everything in between. Last night it was doing okay, this morning it was
> 80 minutes behind, and now it is 20 minutes behind. There is no rhyme or
> reason to it. The only constant is that weewx can't seem to post MQTT
> records on time on a consistent basis.

My guess is that your weewx instance is CPU bound and really is running
behind.

Is your archive interval 5 minutes, or something longer? If it's
shorter than 5 minutes, then don't do that!


I suggested restarting the broker because if the problem was in the
network going to the broker, then the reconnect would start fresh. But
you see old records newly appearing after the broker restart, which more
or less proves that weewx is newly posting them after the restart.

So look with ps, top, and so on about CPU usage. Check if your html
skins are being generated on time, and transferred to your web server
etc., or if those also are slow.

Try to simplify your setup in terms of skins and generating things, and
see if that helps. If it does, that supports the "not enough CPU time"
theory.

Back off from loop to archive in mqtt, and see if that is reliably
timely. I'm not saying you should run that way, or have to, but it's an
interesting data point.

I am running weewx on a Raspberry Pi 3 (NetBSD 9) with a VP2, and every
5-minute archive interval:

generate 2 skins
rsync them
post aggregate data to mqtt

and I do not publish loop data. I just looked and the weewx python
process has used 17 hours of CPU time in 16 days.

> - I looked at the traffic with mosquitto_sub, and the incoming
> weather data right now is from 45 minutes ago.

It would be interesting to know if the inter-message arrival rate is the
same as the intended transmit rate, or if it catches up some between
archive intervals and then has a bigger lag over them.
signature.asc

T Reid

unread,
Nov 1, 2020, 12:52:51 PM11/1/20
to weewx-user
>  My guess is that your weewx instance is CPU bound and really is running behind.  

You may be on to something.  I watched top for a while.  CPU load on the python2 process runs under 10% when MQTT is posting loop records.  But every five minutes, when the archive records and web pages are being generated, the CPU load of the python2 process jumps up over 100%.  In /var/log/messages, the MQTT records are posting on time, but fall behind by 10 seconds or more when the archive records and web pages are generated.  The MQTT records then catch up again almost immediately after the archive activity dies down.  I may have to dig around in the log files a bit more to find lags that it did not recover from.

vince

unread,
Nov 1, 2020, 1:32:27 PM11/1/20
to weewx-user
On Sunday, November 1, 2020 at 9:52:51 AM UTC-8, T Reid wrote:
>  My guess is that your weewx instance is CPU bound and really is running behind.  

You may be on to something.  I watched top for a while.  CPU load on the python2 process runs under 10% when MQTT is posting loop records.  But every five minutes, when the archive records and web pages are being generated, the CPU load of the python2 process jumps up over 100%.  


I might have missed it, but what kind of box are you running on ?   An old model-B pi with a slow SD card, a 4GB pi4 with a SSD, etc. ???

My experience is that Belchertown is pretty heavyweight.  Look at the time it takes the archive interval skins to run.   Disable all except the standard or seasons skin and do a test to baseline your system.  Then flip back to Belchertown and see if things bog down again.

T Reid

unread,
Nov 1, 2020, 2:09:53 PM11/1/20
to weewx-user
I am running weewx and Belchertown on an older rpi 3 model B version 1.2.  Cheetah takes around 20 seconds to generate the Belchertown reports.  Time for new hardware?

vince

unread,
Nov 1, 2020, 4:39:58 PM11/1/20
to weewx-user
On Sunday, November 1, 2020 at 11:09:53 AM UTC-8, T Reid wrote:
I am running weewx and Belchertown on an older rpi 3 model B version 1.2.  Cheetah takes around 20 seconds to generate the Belchertown reports.  Time for new hardware?


"I'm good. My computer is too fast already", said nobody ever...

I'm running on far wimpier hardware than even a model-B pi, and it takes 42 seconds for me, but I do have my MQTT broker on a pi3 that has a wired.

Still think you should do a quick test with Belchertown disabled to see if that breaks looks your MQTT issue.  That would be a good data point to have.

Greg Troxel

unread,
Nov 2, 2020, 8:42:50 AM11/2/20
to vince, weewx-user

vince <vince...@gmail.com> writes:

> On Sunday, November 1, 2020 at 11:09:53 AM UTC-8, T Reid wrote:
>>
>> I am running weewx and Belchertown on an older rpi 3 model B version 1.2.
>> Cheetah takes around 20 seconds to generate the Belchertown reports. Time
>> for new hardware?
>
> "*I'm good. My computer is too fast already*", said nobody ever...
>
> I'm running on far wimpier hardware than even a model-B pi, and it takes 42
> seconds for me, but I do have my MQTT broker on a pi3 that has a wired.

If you are generating a skin once every 5m (T: you *are* set to 5m,
right?) and it takes 20s, that seems totally fine.

You should also check any upload timings.

> Still think you should do a quick test with Belchertown disabled to see if
> that breaks looks your MQTT issue. That would be a good data point to have.

Agreed that more data is needed. Something is wrong but so far we don't
know what. I expect it will be something odd, when found, but util then
it won't make sense.

signature.asc
Reply all
Reply to author
Forward
0 new messages