I agree that this is an area of the zwave binding that needs work. I added some functionality to get neighbor node information and started to add support for this into HABmin, but have currently stopped this until 1.4 is released. I suspect that many of the issues are associated with routing, so a) finding out what the routes are, and b) being able to manipulate them, is, in my opinion, probably quite important. From what I could tell, there doesn't seem to be any way to read the return routes that are currently set in the node, so the only way to be sure of what they are is to explicitly set them - this is one of the next things on my list...
Some sort of network 'healing' (similar to what Vera does) might also be useful, but I think this is a secondary issue - first is to try and understand what's happening...
I don't understand what marks a node dead (and I don't see dead nodes, but I do have some that stop working). I think it's the binding that marks nodes dead, but I also know that the stick holds a list of nodes that are considered dead as well, so it could be happening there (??). I've just not looked into this area of the code (yet!), but if anyone knows more it would be useful information to share so we can look at combating it.
Sorry - that hasn't answered your question, but I do think it's important, and if anyone knows more about how this works it would be useful to get ideas so we can implement some fixes...
Cheers
Chris
Didn't know that the Veras performed network healing activities. That may explain the smoother (in that and only that area) user experience.
It is definitely the binding that marks the nodes dead. Whether that in turn comes from the firmware/Zwave stack in the Aeon stick or not, I don't know. I have full Zwave binding debug logs where the behavior can be seen.
Today I spent fifteen minutes of openHAB restarts to get all my nodes up and running. Extremely annoying - it took about 15 restarts to get them all up. Which node(s) end up dead varies, but it is never the ones closest to the Aeon USB stick.
What documentation or information is the Zwave binding implemented from?
Talking about the Aeon USB stick: in HABmin my stick isn't identified as an Aeon Labs USB stick...
No idea why they are lost or set to 0 later on
I also experience zwave connection problems which do not exist in Homeseer. I configure the zwave routing with Homeseer. Once Homeseer has found a good forward and return path, the routing can be left static without further zwave errors in Homeseer. But with the openHAB binding, frequent errors are still reported for some nodes.
Battery-powered nodes also report errors during their sleep period. That is certainly not correct.
I also sometimes have dead nodes after openHAB has run untouched for several days. Declaring nodes dead is a bad strategy; it should indeed be self-healing.
I just had a quick look at the code. The dead node check simply checks whether a node has completed initialisation to the DONE stage after 2 minutes. It will only set a node to dead, though, if it is a 'listening' or 'frequently listening' node. Once dead, it stays there...
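The check described above could be sketched roughly like this. Note this is an illustrative sketch only - `NodeStage`, `ZWaveNode`, `lastStageChange` and so on are made-up names for the purpose of the example, not the binding's actual classes:

```java
// Hypothetical sketch of the dead-node check: NodeStage, ZWaveNode etc.
// are illustrative names, not the binding's real API.
enum NodeStage { INITIALIZING, DONE }

class ZWaveNode {
    NodeStage stage = NodeStage.INITIALIZING;
    boolean listening;
    boolean frequentlyListening;
    boolean dead;
    long lastStageChange = System.currentTimeMillis();

    // Mark the node dead if it has not reached DONE within the timeout and
    // it is a (frequently) listening node; sleeping battery devices are exempt.
    void checkDead(long timeoutMs) {
        boolean timedOut = System.currentTimeMillis() - lastStageChange > timeoutMs;
        if (stage != NodeStage.DONE && timedOut && (listening || frequentlyListening)) {
            dead = true; // nothing ever clears this flag: "once dead, it stays there"
        }
    }
}
```

This also shows why battery devices escape the check: they are neither 'listening' nor 'frequently listening', so the condition never fires for them while they sleep.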
The implementation in OZW seems to be the same, although as OZW is effectively just a driver it sends a notification that there are dead nodes and, I guess, assumes that the application will do something. In the OH case, I don't think that this same functionality is applicable - or to be more precise, the handling of any top-level 'retry' function also needs to be handled somewhere in the binding, since there is no higher-level application. I suspect that we need to implement a 'management layer' in the binding that handles this sort of thing, and any 'daily heal' or whatever, in order to ensure that the network remains healthy...
- We have to make sure we handle battery operated devices during the healing process and I am not sure how this can be done.
- On Fibaro forums, I have seen a thread on network healing and there is an example of some excellent visualization for the current routing. Adding such a view to HABmin will really help. http://forum.fibaro.com/viewtopic.php?t=1714
- I was reading on the Razberry site, and that controller implements a network healing function (http://razberry.z-wave.me/index.php?id=10). I am not sure about other controllers supporting this. Can we periodically trigger this in the controller, or are we planning to implement such a function in openHAB?
- It would be great if we could see the current routing table (graphical or textual) in the classic UI and trigger manual updates.
Not sure if you have seen this: http://wiki.zwaveeurope.com/index.php?title=SDK_Versions_and_Explorer_Frames. It also talks of dealing with dead nodes/failed routes in various versions of the SDK.
...
(I’m in Germany this week - I’ve even brought a small zwave network with me to play with in the hotel - that’s how dull I am :) )
I just thought I'd add my latest observations to this thread ... which are actually rather Z-Wave than openHAB related. After my Z-Wave network started behaving worse again, I decided to try a few things out again. Result: after deleting return routes on the nodes that keep dying in openHAB, things run much more smoothly. E.g. while my sunrise dimmer kept locking up after a few updates every single morning, it has now worked correctly for 3 days. Even my script restarting openHAB when it sees dead nodes appear in the logs is less busy lately (but still happens to be triggered from time to time).
Without knowing if the explanation is correct, I guess that with no static return routes the devices choose the (reverse) route the request went through, which is not a perfect choice but might be a better choice than always trying the same static route. I always wondered how routes were chosen - probably, neighborhood discovery does not take connection quality into account, and of course radio is something dynamic with conditions changing all the time.
--
You received this message because you are subscribed to the Google Groups "openhab" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openhab+u...@googlegroups.com.
To post to this group, send email to ope...@googlegroups.com.
Visit this group at http://groups.google.com/group/openhab.
For more options, visit https://groups.google.com/d/optout.
Hi Chris,
No - and this is clearly an issue that they are trying to solve, as new features were added to try and discover routes better. The problem (I think) is that most of these devices have crappy antennas.
The best thing (maybe) is to add a few more nodes as repeaters to fill the gaps and make the network reliable.
I agree - I don’t really know why the binding has a “dead node” concept such that when a node is dead we don’t communicate with it. Maybe it’s to avoid locking up the network and slowing down all the ‘good’ nodes (?), but otherwise I don’t really see why we can’t send messages out rather than marking it dead...
- Implement a low rate polling so that we find out if nodes stop working before we actually want to use them ‘in anger’ - maybe poll all nodes every 5 or 10 minutes.
- When nodes go DEAD, I think we should still send ‘required’ messages.
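The low-rate polling idea could look something like the sketch below. All names here are hypothetical (the real binding would presumably ping with something like a NoOperation frame); the point is the shape: a low-rate pass that notices silent nodes and queues a heal rather than marking them dead:

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical node interface: ping() could send a NoOperation frame,
// requestHeal() would queue the node for a heal instead of killing it.
interface PollableNode {
    boolean ping();
    void requestHeal();
}

class NetworkMonitor {
    // One poll pass: ping every node, and heal (not kill) the silent ones.
    static void pollOnce(List<? extends PollableNode> nodes) {
        for (PollableNode node : nodes) {
            if (!node.ping()) {
                node.requestHeal();
            }
        }
    }

    // Run the pass at a low rate, e.g. every 5 or 10 minutes.
    static ScheduledExecutorService start(List<? extends PollableNode> nodes,
                                          long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> pollOnce(nodes), period, period, unit);
        return scheduler;
    }
}
```

Keeping the rate low matters: Z-Wave is a slow, shared medium, so the poll must stay well below the traffic level that would itself congest the network.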
At first glance this seems to improve things ... no long-term observations yet, but at least I can now get the network up with all nodes ready by starting openHAB and starting a "heal" from HABmin. I did not actually understand the "DEAD node check" messages yet: isn't this supposed to immediately try to heal nodes being marked DEAD?
At least it does not for nodes being DEAD from the start (it happens every now and then that after a restart one or two nodes do not come up). Maybe a heal should preventively be scheduled a few minutes after starting up (or at the moment when the binding declares the ZWave network ready)?
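That preventive heal could be sketched as a one-shot task hooked to the network-ready event. The scheduling is plain Java; `healDeadNodes` is a stand-in for whatever the binding's actual heal entry point is:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class StartupHealer {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Call this when the binding declares the Z-Wave network ready: it runs
    // a single heal pass after a short settling delay, catching nodes that
    // were DEAD from the start.
    void onNetworkReady(Runnable healDeadNodes, long delay, TimeUnit unit) {
        scheduler.schedule(healDeadNodes, delay, unit);
    }
}
```

Delaying a few minutes gives slow or frequently-listening nodes a chance to finish initialisation on their own before the heal fires.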
I fully agree. ZWave *is* a radio network, and there is no such thing as a completely reliable radio network. So, using a radio network includes having a plan for what to do when things temporarily go wrong. Declaring a node DEAD forever just because it did not respond to a message is a very severe reaction.
- Implement a low rate polling so that we find out if nodes stop working before we actually want to use them ‘in anger’ - maybe poll all nodes every 5 or 10 minutes.
- When nodes go DEAD, I think we should still send ‘required’ messages.
Yes, or at least ping them in order to notice when they become reachable again.
Yes - it should. Every 2 minutes it should check for DEAD nodes and it should then try and heal them. I’m happy to look at some logs if you want to send them over...
Effectively this is what should happen - every 2 minutes… It might not work though on devices that don’t complete initialisation - I’ll need to check this.
When I get a chance I will take a look at these things. I'd like to get the current stuff merged soon, since it seems to be working at least as well as the existing system.
You've got mail :-)
I am not sure if it is auto-healing at all. I see the regular dead node checks being logged but in the meantime I had not only the dead-on-start case but also the dead-during-operation case. Neither of them was answered by an automatic heal.
Timed healing and healing triggered from HABmin seem to run, however.
As far as I can tell no new issues have appeared, and a few of the old ones are alleviated :-)