Vantage driver timeouts

Les Niles

Feb 17, 2021, 3:28:13 PM
to weewx-development
In the course of doing a fresh install (4.3.0 Debian package), I’ve been restarting weewx a lot. The restart would fail fairly often because the driver couldn’t wake up the Vantage console (Davis Ethernet datalogger). The driver would report an ip-read error, followed by a series of ip-write errors until max_tries was used up. This all happens very quickly because the sleep for wait_before_retry is inside the try clause, so there’s no delay when a WeeWxIOError exception is raised (lines 110-115 in vantage.py). I moved the sleep outside the try/except block and that fixed the problem: with the delay, the wakeup succeeds after a few retries. (Diff attached.)

I’m not sure whether there was a specific reason for skipping the delay on a WeeWxIOError. I don’t see any disadvantage to putting the delay outside the exception handler, other than weewx taking slightly longer to exit after an unrecoverable error, and it’s quite possible that a delay between retries makes success more likely. Nor do I see any reason this wouldn’t work with a USB datalogger, but I have no way to test that.
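A rough way to see the trade-off: the sketch below counts how long the retry loop sleeps across a run of failed wakeup tries, before and after moving the sleep. The outcome labels, the helper name `total_retry_delay`, and the values `max_tries = 4` and `wait_before_retry = 1.2` are illustrative assumptions, not taken from vantage.py or from anyone's weewx.conf.

```python
def total_retry_delay(outcomes, wait_before_retry, patched):
    """Seconds spent sleeping across a run of failed wakeup tries.

    outcomes: one label per failed try, either 'io_error' (a WeeWxIOError
    was raised) or 'bad_response' (the console answered, but not with b'\\n\\r').
    Before the patch only 'bad_response' tries reached the sleep, because
    the sleep sat inside the try clause; after the patch every failed try sleeps.
    """
    if patched:
        sleeps = len(outcomes)
    else:
        sleeps = sum(1 for o in outcomes if o == 'bad_response')
    return sleeps * wait_before_retry


# Illustrative run: every try ends in an I/O error until max_tries (assumed 4) is used up.
run = ['io_error'] * 4
print(total_retry_delay(run, 1.2, patched=False))  # 0.0: no delay at all
print(total_retry_delay(run, 1.2, patched=True))
```

When every try fails with an I/O error, the unpatched loop never sleeps, which matches how quickly max_tries was exhausted; the patched loop adds at most max_tries * wait_before_retry seconds before weewx gives up.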
Thoughts?

  -Les


*** /usr/share/weewx/weewx/drivers/vantage.py.dist 2021-01-04 11:43:12.000000000 -0800
--- /usr/share/weewx/weewx/drivers/vantage.py 2021-02-13 10:11:53.084750115 -0800
***************
*** 107,117 ****
                  if _resp == b'\n\r':
                      log.debug("Rude wake up of console successful")
                      return
-                 print("Unable to wake up console... sleeping")
-                 time.sleep(self.wait_before_retry)
-                 print("Unable to wake up console... retrying")
              except weewx.WeeWxIOError:
                  pass
              log.debug("Retry #%d failed", count)
  
          log.error("Unable to wake up console")
--- 107,118 ----
                  if _resp == b'\n\r':
                      log.debug("Rude wake up of console successful")
                      return
              except weewx.WeeWxIOError:
                  pass
+             
+             print("Unable to wake up console... sleeping")
+             time.sleep(self.wait_before_retry)
+             print("Unable to wake up console... retrying")
              log.debug("Retry #%d failed", count)
  
          log.error("Unable to wake up console")
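The control flow of the patched loop can be sketched as a self-contained Python function. This is not the code in vantage.py: `WeeWxIOError` here is a local stand-in for `weewx.WeeWxIOError`, `FakeLink` and `wake_up_console` are hypothetical names, and the debugging `print` calls from the diff are left out. The `sleep` argument is injectable so the delays can be observed without actually waiting.

```python
import time


class WeeWxIOError(IOError):
    """Local stand-in for weewx.WeeWxIOError (hypothetical, for this sketch only)."""


class FakeLink:
    """Hypothetical link whose first `failures` wakeup attempts fail.

    Odd-numbered failures raise an I/O error; even-numbered ones return a
    wrong response, so both failure paths of the loop get exercised.
    """

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def wakeup(self):
        self.calls += 1
        if self.calls <= self.failures:
            if self.calls % 2:
                raise WeeWxIOError("ip-write error")
            return b'\x00'
        return b'\n\r'


def wake_up_console(link, max_tries=4, wait_before_retry=1.2, sleep=time.sleep):
    """Patched retry loop: sleep after every failed try, I/O error or not.

    Returns the number of tries it took; raises after max_tries failures.
    """
    for count in range(max_tries):
        try:
            if link.wakeup() == b'\n\r':
                return count + 1
        except WeeWxIOError:
            pass
        # Outside the try/except, so an I/O error no longer skips the delay.
        sleep(wait_before_retry)
    raise WeeWxIOError("Unable to wake up console")


if __name__ == "__main__":
    delays = []
    print(wake_up_console(FakeLink(failures=2), sleep=delays.append), delays)
    # prints: 3 [1.2, 1.2]
```

With two failures (first an I/O error, then a wrong response) the function succeeds on the third try and has slept twice; with the sleep inside the try clause, the I/O-error try would have retried immediately.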


Tom Keffer

Feb 18, 2021, 8:11:36 AM
to Les Niles, weewx-development
Normally I am very reluctant to make changes in the Vantage driver because it has been so reliable for so long.

However, thanks to your careful sleuthing, you have uncovered a subtle, hard-to-find bug. Thanks so much, Les! 

Commit 9605ec9.

-tk

--
You received this message because you are subscribed to the Google Groups "weewx-development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to weewx-developm...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/weewx-development/7D296A46-154D-4518-AF5D-AA2DA6B4F2AD%402pi.org.

Les Niles

Feb 19, 2021, 1:28:12 AM
to Tom Keffer, weewx-development
Well now I really hope this doesn’t turn out to be an instance of “what could possibly go wrong?” :)

I did some poking around at the intermittent failures and am pretty convinced they occur when weewx starts up while the datalogger is uploading to Weatherlink. As long as weewx is running, the datalogger doesn’t try to upload to Weatherlink, but as soon as weewx stops, the uploads begin: a single-packet HTTP PUT every minute. Including opening and closing the connection, the whole transaction takes 1-1.5 sec. If weewx connects and tries the wakeup during or immediately after an upload, the TCP connection succeeds but the datalogger doesn’t respond to the wakeup.

The failure is only occasional for random restarts because there’s just a 2-3 second window each minute when it can happen. But “systemctl restart weewx” restarts weewx fast enough that it is very likely to hit the problem. The driver’s retry on the established TCP connection after a short delay seems to work reliably. For this particular issue, that’s better than closing and reopening the connection, which would probably trigger the race with Weatherlink all over again.
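The 2-3 second window per minute also accounts for why random restarts fail only occasionally. Assuming start times are spread uniformly over the 60-second upload cycle (an assumption; a quick “systemctl restart” is not random, since stopping weewx is what kicks off the uploads), the chance of landing in the window is simply window / period. The helper name `chance_of_hitting_upload` is hypothetical.

```python
def chance_of_hitting_upload(window_s, period_s=60.0):
    """Probability that a start time, uniform over one upload period,
    falls inside the window where the datalogger ignores the wakeup."""
    return min(window_s / period_s, 1.0)


for window in (2.0, 3.0):
    print(f"{window:.0f} s window: {chance_of_hitting_upload(window):.1%}")
# prints:
# 2 s window: 3.3%
# 3 s window: 5.0%
```

Roughly one random restart in 20 to 30 would hit the race, while a back-to-back stop/start lands near the start of an upload almost every time.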

BTW, my LAN for the datalogger is just a cable between it and the RPi running weewx; there’s no other network traffic to confuse things.

  -Les


Tom Keffer

Feb 19, 2021, 7:56:42 AM
to Les Niles, weewx-development
Most people disable the WL upload. In fact, I didn't even think it was possible to do both simultaneously. 