gw1000 lost connection to gateway


vince

Mar 17, 2021, 9:30:36 PM
to weewx-user
(Apologies if this is badly formatted - Google Groups is going whacko for me currently and I'm now stuck in a tiny overlaid window in Safari.)

I noticed that my weewx (Docker) instance running the gw1000 driver lost its connection to the gateway on Sunday. I'm going to guess it had nothing to do with the daylight saving time transition here in the US, as the failure was at 20:42 local time on Sunday 3/14.

Restarting the weewx container did not help. While the Ubuntu system could ping the IP address of the gw1000 just fine, weewx complains that it gets no response to the command that reads the gateway's MAC address (CMD_READ_STATION_MAC) and logs the usual 'Waiting 60 seconds then retrying...' messages.
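Why ping can succeed while the driver gets no answer: ping uses ICMP, whereas the driver talks to the GW1000 API over a connection to the gateway's IP address and port. The sketch below assumes that the API is reached over TCP on port 45000 (stated here as an assumption about the driver's default, not taken from this thread) and only checks whether that port accepts a connection at all. An open port does not prove the API will answer a command such as CMD_READ_STATION_MAC, but a refused or timed-out connection points at the network path or the gateway rather than at weewx. The function name api_port_reachable is hypothetical.

```python
import socket

# Assumption: 45000 is the GW1000 API port used by the driver by default.
DEFAULT_API_PORT = 45000


def api_port_reachable(host: str, port: int = DEFAULT_API_PORT,
                       timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only tests the transport layer; no API command is sent.
    """
    try:
        # create_connection resolves the host and connects within `timeout`.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and unreachable hosts all raise
        # subclasses of OSError.
        return False
```

Usage would be something like api_port_reachable("192.168.1.50"), where the address is a placeholder for your gateway's IP.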

I might add that the gateway's communication with the Ecowitt servers kept working throughout this multi-day outage, before I noticed weewx wasn't updating. The Ecowitt mobile app didn't miss any readings; it was only weewx that couldn't communicate with the gateway.

Transcript follows.

(note - a power reset on the gateway 'did' result in weewx automatically working again)

Mar 14 20:41:55 d75bb2f0dc58 weewx[8] CRITICAL __main__: Caught WeeWxIOError: Failed to obtain response to command 'CMD_READ_SENSOR_ID' after 3 attempts
Mar 14 20:41:55 d75bb2f0dc58 weewx[8] CRITICAL __main__:     ****  Waiting 60 seconds then retrying...
Mar 14 20:42:55 d75bb2f0dc58 weewx[8] INFO __main__: retrying...
Mar 14 20:42:55 d75bb2f0dc58 weewx[8] INFO __main__: Using configuration file /home/weewx/weewx.conf
Mar 14 20:42:55 d75bb2f0dc58 weewx[8] INFO __main__: Debug is 0
Mar 14 20:42:55 d75bb2f0dc58 weewx[8] INFO weewx.engine: Loading station type GW1000 (user.gw1000)
Mar 14 20:43:21 d75bb2f0dc58 weewx[8] ERROR user.gw1000: Failed to obtain response to command 'CMD_READ_STATION_MAC' after 3 attempts
Mar 14 20:43:21 d75bb2f0dc58 weewx[8] ERROR weewx.engine: Import of driver failed: Failed to obtain response to command 'CMD_READ_STATION_MAC' after 3 attempts (<class 'user.gw1000.GW1000IOError'>)
Mar 14 20:43:21 d75bb2f0dc58 weewx[8] CRITICAL weewx.engine:     ****  Traceback (most recent call last):
Mar 14 20:43:21 d75bb2f0dc58 weewx[8] CRITICAL weewx.engine:     ****    File "/home/weewx/bin/weewx/engine.py", line 119, in setupStation
Mar 14 20:43:21 d75bb2f0dc58 weewx[8] CRITICAL weewx.engine:     ****      self.console = loader_function(config_dict, self)
Mar 14 20:43:21 d75bb2f0dc58 weewx[8] CRITICAL weewx.engine:     ****    File "/home/weewx/bin/user/gw1000.py", line 1498, in loader
Mar 14 20:43:21 d75bb2f0dc58 weewx[8] CRITICAL weewx.engine:     ****      return Gw1000Driver(**config_dict[DRIVER_NAME])
Mar 14 20:43:21 d75bb2f0dc58 weewx[8] CRITICAL weewx.engine:     ****    File "/home/weewx/bin/user/gw1000.py", line 1844, in __init__
Mar 14 20:43:21 d75bb2f0dc58 weewx[8] CRITICAL weewx.engine:     ****      super(Gw1000Driver, self).__init__(**stn_dict)
Mar 14 20:43:21 d75bb2f0dc58 weewx[8] CRITICAL weewx.engine:     ****    File "/home/weewx/bin/user/gw1000.py", line 972, in __init__
Mar 14 20:43:21 d75bb2f0dc58 weewx[8] CRITICAL weewx.engine:     ****      debug_wind=self.debug_wind)
Mar 14 20:43:21 d75bb2f0dc58 weewx[8] CRITICAL weewx.engine:     ****    File "/home/weewx/bin/user/gw1000.py", line 2166, in __init__
Mar 14 20:43:21 d75bb2f0dc58 weewx[8] CRITICAL weewx.engine:     ****      lost_contact_log_period=lost_contact_log_period)
Mar 14 20:43:21 d75bb2f0dc58 weewx[8] CRITICAL weewx.engine:     ****    File "/home/weewx/bin/user/gw1000.py", line 2964, in __init__
Mar 14 20:43:21 d75bb2f0dc58 weewx[8] CRITICAL weewx.engine:     ****      self.mac = self.get_mac_address()
Mar 14 20:43:21 d75bb2f0dc58 weewx[8] CRITICAL weewx.engine:     ****    File "/home/weewx/bin/user/gw1000.py", line 3173, in get_mac_address
Mar 14 20:43:21 d75bb2f0dc58 weewx[8] CRITICAL weewx.engine:     ****      return self.send_cmd_with_retries('CMD_READ_STATION_MAC')
Mar 14 20:43:21 d75bb2f0dc58 weewx[8] CRITICAL weewx.engine:     ****    File "/home/weewx/bin/user/gw1000.py", line 3375, in send_cmd_with_retries
Mar 14 20:43:21 d75bb2f0dc58 weewx[8] CRITICAL weewx.engine:     ****      raise GW1000IOError(_msg)
Mar 14 20:43:21 d75bb2f0dc58 weewx[8] CRITICAL weewx.engine:     ****  user.gw1000.GW1000IOError: Failed to obtain response to command 'CMD_READ_STATION_MAC' after 3 attempts

gjr80

Mar 17, 2021, 10:50:44 PM
to weewx-user
Vince,

I'll have a look at this in the coming days. Out of interest, have you specified an IP address for the GW1000 in weewx.conf, or is discovery being used?

Gary

vince

Mar 18, 2021, 11:27:49 AM
to weewx-user
I specified the IP address, but I should note that the gateway is on a different VLAN/subnet than the weewx container, so no broadcast would reach it if you're using broadcasts under the hood. Given that I'm hard-specifying ip_address, shouldn't all the relatively complicated discovery code be skipped?

This isn't a big deal for me so no rush required, but I thought I'd post the abort in case others have run into the same issue...

gjr80

Mar 18, 2021, 9:02:20 PM
to weewx-user
From memory, if the driver loses connectivity (i.e., no response is received to an API request), it retries the same command two more times and, if there is still no response, falls back to a broadcast to locate the device (the MAC address is used to ensure the driver connects to the same device). If there is no response to the broadcast, the 60 second delayed driver restart kicks in, and if an IP address has been specified it should be used rather than a broadcast (need to check). I could probably have the driver try the IP address (if one has been specified) as well as the broadcast before bailing out into the 60 second delayed driver restart.
There is the ability to reboot the GW1000, but you need connectivity to issue the command, plus I am loath to start doing things like that in the driver.

Gary

jovo...@googlemail.com

Sep 9, 2021, 4:22:14 AM
to weewx-user
I have the same config, i.e. a different VLAN. If the gw1000 loses connection, weewx stops working. Is there any solution for that problem?

Jovo

vince

Sep 9, 2021, 12:04:23 PM
to weewx-user
On Thursday, September 9, 2021 at 1:22:14 AM UTC-7 jovo...@googlemail.com wrote:
I have the same config, i.e. a different VLAN. If the gw1000 loses connection, weewx stops working. Is there any solution for that problem?

How would weewx get data from the gw1000 if it loses connection 'with' the gw1000?

You need (1) the gw1000 to have Internet connectivity to the Ecowitt servers, or its watchdog timers will reboot the gw1000, and (2) weewx to be able to talk to the gw1000.

If you VLAN your network, you still need to permit (1) and (2) above for everything to work together.

jovo...@googlemail.com

Sep 9, 2021, 12:37:48 PM
to weewx-user

Normally both have connectivity! One is in VLAN 1 and the other in VLAN 2, for example, but on different L2 switches: weewx is on switch 1, and the WiFi router with the GW1000 is on switch 2.
I had a small spanning-tree issue on the uplink between switch 1 and switch 2, and afterwards I saw that weewx was down.
As you describe, everything else works correctly. I'm thinking about going back to the interceptor driver: I lose data for the moment, but weewx keeps running and carries on when the connection comes back.

On Thursday, September 9, 2021 at 18:04:23 UTC+2, vince wrote:

If you VLAN your network, you still need to permit (1) and (2) above for everything to work together.


gjr80

Sep 9, 2021, 9:57:52 PM
to weewx-user
It's probably worthwhile outlining how the GW1000 driver communicates with the GW1000 (when I refer here to GW1000 I mean GW1000 or GW1100) and what happens when the GW1000 does not respond.

The GW1000 driver can communicate with a GW1000 via (1) an IP address and port number or (2) network broadcast. When the driver is loaded during WeeWX startup, if no IP address or no port number is provided in the weewx.conf driver config stanza, the driver will attempt to obtain the missing data via a network broadcast. If the GW1000 is on a different sub-net to the WeeWX machine, the broadcast will not be received by the GW1000 concerned (so if using a GW1000 on a different sub-net you must specify an IP address and port number for the GW1000 in weewx.conf).
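The rule above can be sketched as follows: a fully configured address suppresses the broadcast entirely, and otherwise only the missing piece is filled in from discovery. This illustrates the described behaviour and is not code from gw1000.py; resolve_endpoint and the discover callable (standing in for the network broadcast, returning (ip, port) pairs) are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


def resolve_endpoint(ip_address: Optional[str], port: Optional[int],
                     discover: Callable[[], List[Endpoint]]) -> Endpoint:
    """Return the (ip, port) to use for the GW1000 API."""
    if ip_address is not None and port is not None:
        # Both given in weewx.conf: no broadcast is sent at all.
        return ip_address, port
    found = discover()
    for found_ip, found_port in found:
        # Keep whatever was configured; take only the missing value
        # from the broadcast replies.
        if ip_address is None or found_ip == ip_address:
            return found_ip, port if port is not None else found_port
    # A GW1000 on another sub-net never sees the broadcast, so nothing
    # usable comes back and the driver cannot start.
    raise RuntimeError("no GW1000 found by broadcast")
```

The design point matching vince's question is the early return: with both ip_address and port set, discover() is never called.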

The process for lost contact is a little more complex. If no response is received to a command sent to the GW1000, the same command is re-sent a further two times (the default is to make three attempts before declaring lost contact) with a default delay of 10 seconds between each attempt (effectively 12 seconds if you add a two second timeout). If the third attempt results in no response, a network broadcast is sent and any responses are checked for the previously used GW1000 MAC address. If a match is found, the IP address and port number used by the driver are updated if necessary (the GW1000 could now be on a different IP address) and communication resumes. If the GW1000 does not respond, WeeWX waits 60 seconds and then does a restart that results in the driver being reloaded.
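A minimal sketch of the lost-contact sequence described above, assuming hypothetical callables send_to(endpoint, cmd) (returns a response, or None on timeout) and broadcast() (returns (mac, ip, port) tuples). The defaults use the numbers quoted in the post: three attempts, a 10 second delay between attempts and a two second timeout. worst_case_seconds gives 3 x 2 + 2 x 10 = 26 seconds before the broadcast fallback starts, which is consistent with the 26 seconds between 'retrying...' at 20:42:55 and the CMD_READ_STATION_MAC failure at 20:43:21 in the log at the top of the thread. Re-sending the command at the new address after a MAC match is this sketch's own choice for "communication resumes".

```python
import time
from typing import Callable, List, Optional, Tuple

Endpoint = Tuple[str, int]     # (ip, port)
Device = Tuple[str, str, int]  # (mac, ip, port) from a broadcast reply


class LostContact(Exception):
    """Neither the retries nor the broadcast reached the GW1000."""


def send_with_retries(send: Callable[[str], Optional[bytes]], cmd: str,
                      attempts: int = 3, delay: float = 10.0,
                      sleep: Callable[[float], None] = time.sleep) -> Optional[bytes]:
    """Send cmd up to `attempts` times, sleeping `delay` between attempts."""
    for attempt in range(attempts):
        response = send(cmd)
        if response is not None:
            return response
        if attempt < attempts - 1:
            sleep(delay)  # no sleep after the final attempt
    return None


def find_by_mac(mac: str, broadcast: Callable[[], List[Device]]) -> Optional[Endpoint]:
    """Return (ip, port) of the previously used GW1000, matched by MAC."""
    for dev_mac, ip, port in broadcast():
        if dev_mac.lower() == mac.lower():
            return ip, port
    return None


def query(cmd: str, mac: str, endpoint: Endpoint,
          send_to: Callable[[Endpoint, str], Optional[bytes]],
          broadcast: Callable[[], List[Device]],
          sleep: Callable[[float], None] = time.sleep) -> Tuple[bytes, Endpoint]:
    """Return (response, endpoint to use from now on) or raise LostContact."""
    response = send_with_retries(lambda c: send_to(endpoint, c), cmd, sleep=sleep)
    if response is not None:
        return response, endpoint
    # Three attempts failed: look for the same device (same MAC) by broadcast.
    # A GW1000 on another sub-net never sees this broadcast.
    new_endpoint = find_by_mac(mac, broadcast)
    if new_endpoint is not None:
        # The GW1000 may be on a new IP address; resume with what it reported.
        response = send_with_retries(lambda c: send_to(new_endpoint, c), cmd, sleep=sleep)
        if response is not None:
            return response, new_endpoint
    # Here WeeWX would wait 60 seconds and restart, reloading the driver.
    raise LostContact("no response to %s after retries and broadcast" % cmd)


def worst_case_seconds(attempts: int = 3, timeout: float = 2.0,
                       delay: float = 10.0) -> float:
    """Time spent on one command before the broadcast fallback starts."""
    return attempts * timeout + (attempts - 1) * delay
```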

During the restart the driver is loaded, and if the GW1000 IP address and port number are specified in weewx.conf, these are used to communicate with the GW1000. If the GW1000 responds, normal startup continues. If the GW1000 does not respond, a broadcast message is sent. Again, if the GW1000 responds, normal startup continues. If the GW1000 still does not respond, WeeWX looks at the loop_on_init option in weewx.conf: if it is set to True, WeeWX waits 60 seconds and attempts another restart; if it is set to False, WeeWX exits.
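The restart behaviour can be sketched like this, assuming hypothetical callables try_configured() (one attempt at the IP address and port from weewx.conf) and try_broadcast() (one discovery attempt), each returning True if the GW1000 answered. start_driver models loop_on_init: with True it waits 60 seconds and tries again, with False it gives up, which is where WeeWX exits. The max_tries parameter exists only so the sketch can be tested; as described above, WeeWX itself keeps restarting.

```python
import time
from typing import Callable, Optional


def load_once(try_configured: Callable[[], bool],
              try_broadcast: Callable[[], bool]) -> bool:
    """One driver load: configured IP/port first, then a broadcast."""
    # 'or' short-circuits, so no broadcast is sent if the configured address answers.
    return try_configured() or try_broadcast()


def start_driver(load: Callable[[], bool], loop_on_init: bool,
                 wait: float = 60.0,
                 sleep: Callable[[float], None] = time.sleep,
                 max_tries: Optional[int] = None) -> bool:
    """Return True once the driver loads, False where WeeWX would exit."""
    tries = 0
    while True:
        tries += 1
        if load():
            return True
        if not loop_on_init:
            return False  # loop_on_init = False: WeeWX exits
        if max_tries is not None and tries >= max_tries:
            return False  # test-only escape hatch
        sleep(wait)       # loop_on_init = True: wait, then restart
```

A call such as start_driver(lambda: load_once(try_configured, try_broadcast), loop_on_init=True) ties the two together.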

A GW1000 on a separate sub-net will not receive any network broadcasts, so it will never answer the broadcasts used during lost contact and startup. This does not really matter: when no response is received to the network broadcast during lost contact, WeeWX automatically restarts after 60 seconds. Likewise, if no response is received to the network broadcast that (may) occur during the restart, provided loop_on_init is set to True, WeeWX will again wait 60 seconds before attempting another restart.

The bottom line is that when using the GW1000 driver, the recommended approach is to add an address reservation for the GW1000 to your DHCP server so the GW1000 is effectively given a fixed IP address (the GW1000 cannot be programmed with a fixed IP). If the GW1000 is on a different sub-net, this reservation is essential. Specifying the GW1000 IP address and port number in weewx.conf is recommended in most cases and essential if the GW1000 is on a different sub-net. Setting loop_on_init to True is also recommended (in fact, when you use wee_config --reconfigure to select the GW1000 driver, you are prompted to set loop_on_init).
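The recommendations above can be turned into a quick self-check. This is a hypothetical helper, not part of WeeWX or the driver: it assumes the driver options are available as a plain dict with the keys ip_address and port (ip_address is mentioned earlier in the thread; the port key name is an assumption), and it uses Python's ipaddress module to decide whether the GW1000 sits on the WeeWX machine's sub-net. A DHCP reservation cannot be checked from here, so that part stays with the router.

```python
import ipaddress
from typing import Dict, List


def config_warnings(weewx_interface: str, gw1000_ip: str,
                    driver_opts: Dict[str, str], loop_on_init: bool) -> List[str]:
    """Return advice for a GW1000 setup; an empty list means nothing to fix.

    weewx_interface is the WeeWX host's address with prefix, e.g. '192.168.1.10/24'.
    """
    # strict=False accepts a host address with a prefix and derives the network.
    weewx_net = ipaddress.ip_network(weewx_interface, strict=False)
    same_subnet = ipaddress.ip_address(gw1000_ip) in weewx_net

    warnings: List[str] = []
    missing = [key for key in ("ip_address", "port") if not driver_opts.get(key)]
    if missing:
        if same_subnet:
            warnings.append("recommended: set %s" % ", ".join(missing))
        else:
            # Broadcasts are not normally routed between sub-nets,
            # so discovery can never find this GW1000.
            warnings.append("essential: set %s (GW1000 is on a different sub-net)"
                            % ", ".join(missing))
    if not loop_on_init:
        warnings.append("recommended: set loop_on_init = True")
    return warnings
```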

I'm not sure what other action can be taken by WeeWX (and the driver) when communication is lost. By definition communication has been lost with the GW1000 so the driver cannot reset/reboot (or do anything else to) the GW1000. WeeWX does a restart after 60 seconds which forces a reload of the driver so we are resetting as much as we can on the WeeWX/driver side of things.

By all means set debug = 3, then restart WeeWX and we will see the low-level communications with the GW1000, which may give some clues. But the log is going to be pretty chatty at that level, with lots of output, so it is not really practical unless you can manually trigger the error.

Gary

vince

Sep 9, 2021, 10:21:53 PM
to weewx-user
On Thursday, September 9, 2021 at 6:57:52 PM UTC-7 gjr80 wrote:
The bottom line is when using the GW1000 driver the recommended approach is to add an address reservation for the GW1000 to your DHCP server so the GW1000 is effectively given a fixed IP address (the GW1000 cannot be programmed with a fixed IP). If the GW1000 is on a different sub-net then this reservation is essential. Specifying the GW1000 IP address and port number in weewx.conf is recommended in most cases and essential if the GW1000 is on a different sub-net. Setting loop_on_init to True is also recommended (in fact when you use wee_config --reconfigure to select the GW1000 driver you are prompted to set loop_on_init)


Just to confirm, I switched to loop_on_init = True and used a fixed IP reservation in my DHCP server to get a stable address for the GW1000, and it 'does' recover nicely for me, so I'd recommend it to jovo... (sorry, can't see your name in Google Groups) as your best next step.

 

jovo...@googlemail.com

Sep 10, 2021, 4:10:44 AM
to weewx-user
@gjr80
Many thanks for your very extensive explanation. I've changed 'loop_on_init' to True and I think that will probably work. I'll try it later; the test is simple: close the port on the WiFi router.

It would be helpful to have a hint in 'readme.txt' about using a different subnet.

regards
Jochen

jovo...@googlemail.com

Sep 10, 2021, 4:15:47 AM
to weewx-user
Yep, I did it too: set 'loop_on_init' to True. gjr80 described the whole process, very helpful.

Many thanks for the help
Jochen

jovo...@googlemail.com

Sep 10, 2021, 11:48:41 AM
to weewx-user
I did the test and it works as expected: WeeWX restarts every 60 seconds.

thx to all.

Jochen