Scott, I am not sure if the change to ignore m_expectedCallbackId is going to resolve the issue. Here is what I see is going on.
First SET transaction works like we (and the code) expects it to. ZW_SEND_DATA request with callback ID is received in .76 seconds.
2012-11-13 20:43:00:564 Node012, Sending command (Callback ID=0x68, Expected Reply=0x13) - Basic Set (Node=12): 0x01, 0x0a, 0x00, 0x13, 0x0c, 0x03, 0x20, 0x01, 0x00, 0x05, 0x68, 0xa5
2012-11-13 20:43:00:565 Node012, Queuing command: BasicCmd_Get (Node=12): 0x01, 0x09, 0x00, 0x13, 0x0c, 0x02, 0x20, 0x02, 0x05, 0x69, 0xa5
2012-11-13 20:43:00:566 Node012, ACK received CallbackId 0x68 Reply 0x13
2012-11-13 20:43:00:570 Node012, Received: 0x01, 0x04, 0x01, 0x13, 0x01, 0xe8
2012-11-13 20:43:00:570 Node012, ZW_SEND_DATA delivered to Z-Wave stack
2012-11-13 20:43:01:262 Node012, Received: 0x01, 0x09, 0x00, 0x04, 0x00, 0x0c, 0x03, 0x20, 0x03, 0x00, 0xde
2012-11-13 20:43:01:262 2012-11-13 20:43:01:262 Node012, Received Basic report from node 12: level=0
2012-11-13 20:43:01:263 Node012, Refreshed Value: old value=255, new value=0, type=uint8
2012-11-13 20:43:01:263 Node012, Changes to this value are not verified
2012-11-13 20:43:01:330 Node012, Received: 0x01, 0x05, 0x00, 0x13, 0x68, 0x00, 0x81
2012-11-13 20:43:01:330 Node012, ZW_SEND_DATA Request with callback ID 0x68 received (expected 0x68)
2012-11-13 20:43:01:330 Node012, Expected reply was received
2012-11-13 20:43:01:330 Node012, Message transaction complete
The second GET transaction is messy but does complete.
2012-11-13 20:43:01:331 Node012, Sending command (Callback ID=0x69, Expected Reply=0x04) - BasicCmd_Get (Node=12): 0x01, 0x09, 0x00, 0x13, 0x0c, 0x02, 0x20, 0x02, 0x05, 0x69, 0xa5
2012-11-13 20:43:01:333 Node012, ACK received CallbackId 0x69 Reply 0x04
2012-11-13 20:43:01:336 Node012, Received: 0x01, 0x04, 0x01, 0x13, 0x01, 0xe8
2012-11-13 20:43:01:336 Node012, ZW_SEND_DATA delivered to Z-Wave stack
2012-11-13 20:43:01:399 Node012, Received: 0x01, 0x09, 0x00, 0x04, 0x00, 0x0c, 0x03, 0x20, 0x03, 0x00, 0xde
2012-11-13 20:43:01:399 2012-11-13 20:43:01:399 Node012, Received Basic report from node 12: level=0
2012-11-13 20:43:01:400 Node012, Refreshed Value: old value=0, new value=0, type=uint8
2012-11-13 20:43:01:400 Node012, Changes to this value are not verified
2012-11-13 20:43:03:332 Node012, WriteMsg Wait Timeout m_currentMsg=b0534730
2012-11-13 20:43:03:332
We see the basic report response come back from the remote device pretty quickly (.068 seconds) but we never receive the ZW_SEND_DATA request. The big question is what happened to that final ZW_SEND_DATA? Note that all the retransmissions use the same callback ID of 0x69.
We time out and retransmit again. So now we have two outstanding ZW_SEND_DATAs out there. Still no final ZW_SEND_DATA request with the final transmit status. Same callback ID of 0x69.
So we transmit a third and final time based on the current code using 3 transmissions. This time the message isn't transmitted (0x00 response). However the final ZW_SEND_DATA request with callback ID of 0x69 is received. I am going to claim this is the first ZW_SEND_DATA request that just took this long to get back to the controller and our code wasn't patient enough to wait for it. We already received the basic report at 20:43:01:399. It just took 4.2 seconds for the ZW_SEND_DATA request to get there!
2012-11-13 20:43:05:948 Node012, Received Basic report from node 12: level=0
2012-11-13 20:43:05:948 Node012, Refreshed Value: old value=0, new value=0, type=uint8
2012-11-13 20:43:05:948 Node012, Changes to this value are not verified
We process the next read message which is the second (attempt 2) basic report.
2012-11-13 20:43:05:948 Node012, Expected reply and command class was received
2012-11-13 20:43:05:948 Node012, Message transaction complete
2012-11-13 20:43:05:948
Then we complete the transaction based on the ZW_SEND_DATA request callback id 0x69. It is done but more responses are flowing in from the retransmissions.
2012-11-13 20:43:09:679 Received: 0x01, 0x05, 0x00, 0x13, 0x69, 0x00, 0x80
2012-11-13 20:43:09:679 ZW_SEND_DATA Request with callback ID 0x69 received (expected 0x00)
2012-11-13 20:43:09:679 WARNING: Unexpected Callback ID received
The second retransmission's ZW_SEND_DATA request with the same callback ID of 0x69 arrives but the code is no longer expecting it.
2012-11-13 20:43:09:757 Node012, Received: 0x01, 0x09, 0x00, 0x04, 0x00, 0x0c, 0x03, 0x20, 0x03, 0x00, 0xde
2012-11-13 20:43:09:757 2012-11-13 20:43:09:757 Node012, Received Basic report from node 12: level=0
2012-11-13 20:43:09:757 Node012, Refreshed Value: old value=0, new value=0, type=uint8
2012-11-13 20:43:09:757 Node012, Changes to this value are not verified
2012-11-13 20:43:09:958 Node012, Received: 0x01, 0x09, 0x00, 0x04, 0x00, 0x0c, 0x03, 0x20, 0x03, 0x00, 0xde
2012-11-13 20:43:09:959 2012-11-13 20:43:09:959 Node012, Received Basic report from node 12: level=0
2012-11-13 20:43:09:959 Node012, Refreshed Value: old value=0, new value=0, type=uint8
2012-11-13 20:43:09:959 Node012, Changes to this value are not verified
Then two more basic reports flow in. Unless the device is set up to asynchronously deliver basic reports I am at a loss to explain this one other then they are retransmissions from the remove device because it never saw an ACK from the controller.
So the big question for me is: why does the first round trip sequence (SET) complete in under 1 second but the second transaction (GET) takes over 4 seconds? This is the smoking gun here. I would bet on RF interference or nodes that are barely within range. It is claimed the Vera did a better job but I would love to see more message sequences from the Vera as it is quite possible it had the same issue and their timeouts were more adaptive then ours.
I don't think relaxing the m_expectedCallbackId test will change anything since all the retransmitted frames use the same callback id.
I have been adding code to measure lost messages and round trip times in an effort to categorize device reliability. Right now there is no clear feedback on how reliable the communications are between devices. It also isn't clear why things are unreliable. Getting some more data will help us determine if it is environment, library limitations (or bugs), user error, etc. Also better RTT data will help us improve our timeout algorithms. TCP has solved all these problems so there is no doubt we can too.
> It looks like the retries are because the library only considers a "message transaction" complete if it follows the expected order. In your case, the information arrives, but the strict order isn't followed. I've just committed a revision that relaxes the criteria for when a transaction is complete...this should solve your issue and, I hope, not introduce other issues. For the moment, I've just commented out one part of a multi-prong test (in Driver::ProcessMsg) in case we find some other issues and have to try a different strategy.
I agree with your analysis...and I began a similar analysis of the log sequence as a possible example of the controller "hanging." It would be great if we can really fix the issue over time (and, like you, I'm focused on it), but the change I've made does address it in the near term. I've been thinking about changing the code to use a different callback id for retries (this would involve changing that byte as well as the checksum before resending), which would help with diagnostics.
It isn't clear whether the frame received at 43:01.399 is a duplicate response to the 43:00.564 request or a really fast response to the 43:01.331 request. If it's the latter, then I'm not sure we really care what happened to the second ZW_SEND_DATA message in the transaction...we asked for data and received it. (That was the logic behind relaxing the condition.) But if it is a duplicate response to the first message (and I have seen a fair number of "dupes" on my network over time), then we haven't really re-read the value and we have to wait for the second ZW_SEND_DATA to come through.
I'll try to prioritize changing the callback ID for each retry and see what that tells us about which messages belong to which transaction.
The trace above is just switching on node 12. I have a lot of these retries as well in other parts (e.g. at system start up). I will try to add a more complete (and large) logfile later (Where is a good place to dump these?)
I wonder if it would be possible to add a configuration flag which would change message transmission behavior. As I understand Basic Set is sent with TRANSMIT_OPTION_ACK which requires the message to be acknowledged at each hop towards the destination. However, openzwave uses immediately after the Basic Set a Basic Get to confirm if the transmission was successful. Thus, two mechanisms ensuring reliability are used concurrently. In my opinion either we should send a message with TRANSMIT_OPTION_ACK and in this case we would not need the Basic Get (as we should know from the Basic Set with TRANSMIT_OPTION_ACK if it was delivered or not) or we should use Basic Set without TRANSMIT_OPTION_ACK followed by Basic Get. (Which seems to be more what the vera controller does; vera knows as well if messages were properly processed but it does that without the Basic Get after the Basic Set).
I do not believe it is a network problem with lost messages as I can use the vera controller included in the same network to send messages to node 12 without problem (both controllers use the same network and same routing, one works one has some problems). I will try to add more traces to show similarities/differences here.
Don't do too much testing just yet. I'm going to try to write the code that changes the callback ID for retries today or tomorrow. That will (hopefully) give us better insight into what's going on. On your other point about changing the TRANSMIT_OPTION, I'm open to doing that for efficiency purposes once we know what's going on. You may not have observed the following behavior on your network, but others (including me) have seen situations in which the "Set" is executed and acknowledged flawlessly but the value doesn't actually change. Similarly, we have seen devices report spurious changes (like temperature going from 68F to 0F to 68F in three sequential polls within 60 seconds). This abnormal behavior was the impetus behind "verifying" changed values and only trusting a value change when we've actually read it (sometimes twice!) from the device. Otherwise, we found that we couldn't "trust" the values held in OZW Value objects.
But we may be able to be more sparing in the use of TRANSMIT_OPTION_ACK...
I'll try to commit code with the changing callback IDs soon.
Ok, I wait a bit with testing to see what your changes may do.
I think it might be useful to be able to disable TRANSMIT_OPTION_ACK. I do not see the need to have two concurrent end-to-end reliability mechanisms on the go.
If you google "zwave routing" you find a document describing the ack process. It seems to me that a message requiring an ACK transported over 1 intermediate hop requires 8 individual messages. Without ack would be 2. Given that we confirm anyway after we should maybe reduce network load.
On Wednesday, November 14, 2012 4:28:58 PM UTC, dscott wrote:
> Don't do too much testing just yet. I'm going to try to write the code > that changes the callback ID for retries today or tomorrow. That will > (hopefully) give us better insight into what's going on. > On your other point about changing the TRANSMIT_OPTION, I'm open to doing > that for efficiency purposes once we know what's going on. You may not > have observed the following behavior on your network, but others (including > me) have seen situations in which the "Set" is executed and acknowledged > flawlessly but the value doesn't actually change. Similarly, we have seen > devices report spurious changes (like temperature going from 68F to 0F to > 68F in three sequential polls within 60 seconds). This abnormal behavior > was the impetus behind "verifying" changed values and only trusting a value > change when we've actually read it (sometimes twice!) from the device. > Otherwise, we found that we couldn't "trust" the values held in OZW Value > objects.
> But we may be able to be more sparing in the use of TRANSMIT_OPTION_ACK...
> I'll try to commit code with the changing callback IDs soon.
Staring at the Vera logs Utz sent revealed some different behavior then what we see from the library. It is why I was asking what firmware his Vera's controller is using.
With the vera the get data applicationcontroller update shows up after the ZW_SEND_DATA request. With the library it shows up in between the ZW_SEND_DATA response and the request. All of the Vera log's show round trips less then .2 seconds though some of his health checks didn't retrieve an ack.
Also the Vera's basic sets didn't return a basic report so I can see why you think the basic report at 20:43:01:262 may be from another transaction.
What I do see is that our transactions are really messy and lots of data is flying about so keeping track of it all is very hard. The Vera transaction is pretty straight forward with simple command/response interactions. Is it the controller firmware that is different or what we are doing?
Is it possible to pop out the usb stick from the Vera and use it against the library to see if anything is different?
> I agree with your analysis...and I began a similar analysis of the log sequence as a possible example of the controller "hanging." It would be great if we can really fix the issue over time (and, like you, I'm focused on it), but the change I've made does address it in the near term. I've been thinking about changing the code to use a different callback id for retries (this would involve changing that byte as well as the checksum before resending), which would help with diagnostics.
> It isn't clear whether the frame received at 43:01.399 is a duplicate response to the 43:00.564 request or a really fast response to the 43:01.331 request. If it's the latter, then I'm not sure we really care what happened to the second ZW_SEND_DATA message in the transaction...we asked for data and received it. (That was the logic behind relaxing the condition.) But if it is a duplicate response to the first message (and I have seen a fair number of "dupes" on my network over time), then we haven't really re-read the value and we have to wait for the second ZW_SEND_DATA to come through.
> I'll try to prioritize changing the callback ID for each retry and see what that tells us about which messages belong to which transaction.
I agree that the Vera transactions for the same thing look much simpler. If this way works it might be beneficial to try this (simpler is better).
The Vera2 I have has the zwave chip integrated, I can't take it out. However, Vera 1 had it external and it was the same as a zstick. The chip is used via serial in Vera so I assume it is the same as a USB zstick just integrated. Vera is an openwrt Linux router with odd processor. Compiling openzwave for this hardware is generally possible but not easy (tool chain setup).
Generally I am quite convinced it is the same hardware, not with a particular special firmware. I have a look if I can check firmware numbers...
I've been testing the code that changes the callback ID for retries of the same message. Attached is an interesting log snippet that highlights some issues with our code. Some observations below, then a summary of the sequencing under the five attempts for this message (attempt 5 is a "clean" transaction, executing as expected). I'll be investigating some of the issues cited below and encourage others to do the same. I'll commit a change that changes the callback ID shortly...I haven't seen any adverse effects.
Observations: 1) interspersed second and third attempts demonstrate that the "report" may(?) arrive before the ZW_SEND_DATA Request message...or was the report associated with Attempt 1? 2) not sure it's a coincidence that the "ZW_SEND_DATA_Request" message appears to arrive from the controller at the same instant (05:39:14:078) the timeout occurs--need to look at the code more carefully 3) note the overlapping "Put"s (data received from the controller and "Write". Even though some data comes in from the controller, it isn't passed to the application until later(see 05:39:14:098) 4) further effect of the same issue, the CAN sent by the controller isn't provided to the application until 100ms later (received at 05:39:14:158, processed at :278) 5) we need to focus on the cause of the delay between receipt of bytes from the controller and the time they are supplied to the application (maybe make sure there's nothing in the "read from controller" buffer before writing?) 6) I have to look at my code to evaluate the message at 05:39:14:448 (not sure what makes it think the callback ID is 0xd8) 7) As the summary (below) shows, when all was complete, three messages were sent without a CAN and three ZW_SEND_DATA Receiveds and 3 Reports came back
First attempt 05:39:03:342 Message written to controller (attempt 1, callback 0xd5) 05:39:03:523 ZW_SEND_DATA delivered (presumably 0xd5) 05:39:03:783 ZW_SEND_DATA Request (callback 0xd5) 05:39:08:350 Timeout
Second and third attempts (interspersed) 05:39:08:390 Message written to controller (attempt 2, callback 0xd6) 05:39:08:620 ZW_SEND_DATA delivered (presumably 0xd6) 05:39:10:523 Report received (unclear whether this is 0xd5 or 0xd6) 05:39:14:078 Timeout 05:39:14:098 ZW_SEND_DATA Request message received from controller, but not processed until 05:39:14:208 05:39:14:138 Message written to controller (attempt 3, callback 0xd7) 05:39:14:158 Controller sends CAN , but not processed until 05:39:14:278 05:39:14:208 ZW_SEND_DATA Request (callback 0xd6) from 05:39:14:098 processed (but now expecting 0xd7) 05:39:14:238 Report received from controller, but not processed until 05:39:14:398 05:39:14:278 Application processes the 05:39:14:158 CAN, triggering resend
Fourth attempt 05:39:14:308 Message written to controller (attempt 4, callback 0xd8) 05:39:14:328 Controller sends CAN , but not processed until 05:39:14:488 05:39:14:398 Report received from controller 05:39:14:488 Application processes the 05:39:14:328 CAN, triggering resend
Fifth attempt 05:39:14:528 Message written to controller (attempt 5, callback 0xd9) 05:39:14:639 ZW_SEND_DATA delivered (presumably 0xd9) 05:39:14:759 ZW_SEND_DATA Request (callback 0xd9) 05:39:14:949 Report received from controller Message transaction complete
Summary 5 messages written 3 ZW_SEND_DATA delivered messages received, 2 CANs received 3 ZW_SEND_DATA Request messages received (0xd5, 0xd6, 0xd9...the three that weren't CANned) 3 reports received
when using FUNC_ID_ZW_SEND_DATA the following things happen: 1) an ACK is received from the controller 2) then a message starting with SOF is coming in, normally of length 4 (e.g. 0x01, 0x04, 0x01, 0x13, 0x00, 0xe9). I believe this message just indicates that the controller is processing the FUNC_ID_ZW_SEND_DATA request. This message does not contain the CallbackId. I guess we need to send an ACK to the controller for this message (which I think is done). 3) then another message starting with SOF is coming in, normally of length 5 (e.g. 0x01, 0x05, 0x00, 0x13, 0x9f, 0x00, 0x76). This message confirms that the FUNC_ID_ZW_SEND_DATA request has been processed by the controller. Here the CallbackId is included. I guess we need to send an ACK to the controller for this message (which I think is done).
I think that openzwave gets mixed up with the messages in step 2 and 3. Message in 2 is interpreted as it would be 3, the callbackid is not found (as not included) and the message is marked as failed delivered and a resend is triggered. While this is happening the message of step 3 might come in. Depending on timing this message may still be processed and all is good, if not we have 3 retries and a fail at the end.
Can someone have a look at the code that understands it better and see if this might be the culprit regarding the behaviour described here?
> when using FUNC_ID_ZW_SEND_DATA the following things happen:
> 1) an ACK is received from the controller
> 2) then a message starting with SOF is coming in, normally of length 4 (e.g. 0x01, 0x04, 0x01, 0x13, 0x00, 0xe9). I believe this message just indicates that the controller is processing the FUNC_ID_ZW_SEND_DATA request. This message does not contain the CallbackId. I guess we need to send an ACK to the controller for this message (which I think is done).
> 3) then another message starting with SOF is coming in, normally of length 5 (e.g. 0x01, 0x05, 0x00, 0x13, 0x9f, 0x00, 0x76). This message confirms that the FUNC_ID_ZW_SEND_DATA request has been processed by the controller. Here the CallbackId is included. I guess we need to send an ACK to the controller for this message (which I think is done).
> I think that openzwave gets mixed up with the messages in step 2 and 3. Message in 2 is interpreted as it would be 3, the callbackid is not found (as not included) and the message is marked as failed delivered and a resend is triggered. While this is happening the message of step 3 might come in. Depending on timing this message may still be processed and all is good, if not we have 3 retries and a fail at the end.
> Can someone have a look at the code that understands it better and see if this might be the culprit regarding the behaviour described here?
I can try this tomorrow. I assume it will only help partly as timing is one issue, interpreting a field in a message that is not there causes other issues as well (this is what I assume). I can't check it with the code in trunk as this currently spins on a printf when errors occur (I assume recent changes to deal with this).
This thread has been quiet for a while so I'm not certain of the context, but:
1) The theory offered today by Utz doesn't hold water. The library recognizes the difference between the two incoming messages (the 4 byte one and the 5 byte one). So the sequencing issue (which I think is the basis for this conjecture) remains an anomaly. I can provide logs to demonstrate this.
I can also confirm that extending the wait timeout doesn't solve the issue.
But maybe someone else (Linux?) will get a different result, so it's probably still worth a try.
-----Original Message-----
From: openzwave@googlegroups.com [mailto:openzwave@googlegroups.com] On Behalf Of utz
Sent: Wednesday, December 05, 2012 7:29 PM
To: openzwave@googlegroups.com
Subject: Re: [openzwave] Wait Timeout
I can try this tomorrow. I assume it will only help partly as timing is one issue, interpreting a field in a message that is not there causes other issues as well (this is what I assume). I can't check it with the code in trunk as this currently spins on a printf when errors occur (I assume recent changes to deal with this).
I agree with Scott. The second message comes from the controller as a response. The 0x01 in the third byte is checked by the code before the the function code in the fourth byte. The third message containing the callback ID has a 0x00 in the third byte and is a request. So you can see the code is able to distinguish these two messages correctly.
Scott, are the logs utz posted earlier in this thread show these anomalies? How does increasing the timeout change (or not change) the behavior? I would think it should minimize the issues at least.
Thanks,
Greg
On Dec 5, 2012, at 7:52 PM, D. Scott Miller wrote:
> This thread has been quiet for a while so I'm not certain of the context, but:
> 1) The theory offered today by Utz doesn't hold water. The library recognizes the difference between the two incoming messages (the 4 byte one and the 5 byte one). So the sequencing issue (which I think is the basis for this conjecture) remains an anomaly. I can provide logs to demonstrate this.
> I can also confirm that extending the wait timeout doesn't solve the issue.
> But maybe someone else (Linux?) will get a different result, so it's probably still worth a try.
> Scott
> -----Original Message-----
> From: openzwave@googlegroups.com [mailto:openzwave@googlegroups.com] On Behalf Of utz
> Sent: Wednesday, December 05, 2012 7:29 PM
> To: openzwave@googlegroups.com
> Subject: Re: [openzwave] Wait Timeout
> I can try this tomorrow. I assume it will only help partly as timing is one issue, interpreting a field in a message that is not there causes other issues as well (this is what I assume). I can't check it with the code in trunk as this currently spins on a printf when errors occur (I assume recent changes to deal with this).
Utz's original message:
Here another theory:
when using FUNC_ID_ZW_SEND_DATA the following things happen:
1) an ACK is received from the controller
2) then a message starting with SOF is coming in, normally of length 4 (e.g. 0x01, 0x04, 0x01, 0x13, 0x00, 0xe9). I believe this message just indicates that the controller is processing the FUNC_ID_ZW_SEND_DATA request. This message does not contain the CallbackId. I guess we need to send an ACK to the controller for this message (which I think is done).
3) then another message starting with SOF is coming in, normally of length 5 (e.g. 0x01, 0x05, 0x00, 0x13, 0x9f, 0x00, 0x76). This message confirms that the FUNC_ID_ZW_SEND_DATA request has been processed by the controller. Here the CallbackId is included. I guess we need to send an ACK to the controller for this message (which I think is done).
I think that openzwave gets mixed up with the messages in step 2 and 3. Message in 2 is interpreted as it would be 3, the callbackid is not found (as not included) and the message is marked as failed delivered and a resend is triggered. While this is happening the message of step 3 might come in. Depending on timing this message may still be processed and all is good, if not we have 3 retries and a fail at the end.
Can someone have a look at the code that understands it better and see if this might be the culprit regarding the behaviour described here?
yep, my assessment of the situation is crap; that is not the problem indeed. I think I was staring too much on program code last night and got lost.
The problem is shown in the first post of the thread. Two times the same case of transmitting to a node. In one case it works, the other one I get the 3 retries and then the stack gives up to deliver the message. I have this issue a lot in my setup, basically communicating with all nodes. As a result I have messages sometimes delivered and sometimes not. This is bad if you control your heating at home as setpoint changes might get missed (heating running all night). So I have at the moment 3 options : 1) build a layer on top of openzwave which monitors success of transmissions and then retransmit on this level if needed. 2) fix openzwave 3) write my own little stack that can do far less than openzwave but is more reliable (for the few things I want to do). I would prefer 2) as 1) is a hack and 3) is duplication of work already done.
I try the timer setting, I run on linux, maybe its different there.
On Thursday, December 6, 2012 5:34:59 AM UTC, satz wrote:
> I agree with Scott. The second message comes from the controller as a > response. The 0x01 in the third byte is checked by the code before the the > function code in the fourth byte. The third message containing the callback > ID has a 0x00 in the third byte and is a request. So you can see the code > is able to distinguish these two messages correctly.
> Scott, are the logs utz posted earlier in this thread show these > anomalies? How does increasing the timeout change (or not change) the > behavior? I would think it should minimize the issues at least.
> Thanks, > Greg
> On Dec 5, 2012, at 7:52 PM, D. Scott Miller wrote:
> > This thread has been quiet for a while so I'm not certain of the > context, but: > > 1) The theory offered today by Utz doesn't hold water. The library > recognizes the difference between the two incoming messages (the 4 byte one > and the 5 byte one). So the sequencing issue (which I think is the basis > for this conjecture) remains an anomaly. I can provide logs to demonstrate > this.
> > I can also confirm that extending the wait timeout doesn't solve the > issue.
> > But maybe someone else (Linux?) will get a different result, so it's > probably still worth a try.
> > Scott
> > -----Original Message----- > > From: open...@googlegroups.com <javascript:> [mailto: > open...@googlegroups.com <javascript:>] On Behalf Of utz > > Sent: Wednesday, December 05, 2012 7:29 PM > > To: open...@googlegroups.com <javascript:> > > Subject: Re: [openzwave] Wait Timeout
> > I can try this tomorrow. I assume it will only help partly as timing is > one issue, interpreting a field in a message that is not there causes other > issues as well (this is what I assume). I can't check it with the code in > trunk as this currently spins on a printf when errors occur (I assume > recent changes to deal with this).
> Utz's original message:
> Here another theory:
> when using FUNC_ID_ZW_SEND_DATA the following things happen: > 1) an ACK is received from the controller > 2) then a message starting with SOF is coming in, normally of length 4 > (e.g. 0x01, 0x04, 0x01, 0x13, 0x00, 0xe9). I believe this message just > indicates that the controller is processing the FUNC_ID_ZW_SEND_DATA > request. This message does not contain the CallbackId. I guess we need to > send an ACK to the controller for this message (which I think is done). > 3) then another message starting with SOF is coming in, normally of length > 5 (e.g. 0x01, 0x05, 0x00, 0x13, 0x9f, 0x00, 0x76). This message confirms > that the FUNC_ID_ZW_SEND_DATA request has been processed by the controller. > Here the CallbackId is included. I guess we need to send an ACK to the > controller for this message (which I think is done).
> I think that openzwave gets mixed up with the messages in step 2 and 3. > Message in 2 is interpreted as it would be 3, the callbackid is not found > (as not included) and the message is marked as failed delivered and a > resend is triggered. While this is happening the message of step 3 might > come in. Depending on timing this message may still be processed and all is > good, if not we have 3 retries and a fail at the end.
> Can someone have a look at the code that understands it better and see if > this might be the culprit regarding the behaviour described here?
I am beginning to wonder if using SUC has something to do with more varying delays. The fact that it works more consistently with Vera is what keeps me coming back to what we are doing differently in the library.
I agree the way the library handles delays needs to change. Changing your delay should change the behavior toward the better. For some reason Scott doesn't see that and this may be part of the clue too. In a soon to be committed patch a notification for timeouts will be generated and counters will track that interaction. You folks with larger networks are testing the library further and it will require more changes to keep up with it.
> yep, my assessment of the situation is crap; that is not the problem indeed. I think I was staring too much on program code last night and got lost.
> The problem is shown in the first post of the thread. Two times the same case of transmitting to a node. In one case it works, the other one I get the 3 retries and then the stack gives up to deliver the message. I have this issue a lot in my setup, basically communicating with all nodes. As a result I have messages sometimes delivered and sometimes not. This is bad if you control your heating at home as setpoint changes might get missed (heating running all night). So I have at the moment 3 options : 1) build a layer on top of openzwave which monitors success of transmissions and then retransmit on this level if needed. 2) fix openzwave 3) write my own little stack that can do far less than openzwave but is more reliable (for the few things I want to do). I would prefer 2) as 1) is a hack and 3) is duplication of work already done.
> I try the timer setting, I run on linux, maybe its different there.
> Utz
> On Thursday, December 6, 2012 5:34:59 AM UTC, satz wrote:
> I agree with Scott. The second message comes from the controller as a response. The 0x01 in the third byte is checked by the code before the the function code in the fourth byte. The third message containing the callback ID has a 0x00 in the third byte and is a request. So you can see the code is able to distinguish these two messages correctly.
> Scott, are the logs utz posted earlier in this thread show these anomalies? How does increasing the timeout change (or not change) the behavior? I would think it should minimize the issues at least.
> Thanks, > Greg
> On Dec 5, 2012, at 7:52 PM, D. Scott Miller wrote:
> > This thread has been quiet for a while so I'm not certain of the context, but: > > 1) The theory offered today by Utz doesn't hold water. The library recognizes the difference between the two incoming messages (the 4 byte one and the 5 byte one). So the sequencing issue (which I think is the basis for this conjecture) remains an anomaly. I can provide logs to demonstrate this.
> > I can also confirm that extending the wait timeout doesn't solve the issue.
> > But maybe someone else (Linux?) will get a different result, so it's probably still worth a try.
> > Scott
> > -----Original Message----- > > From: open...@googlegroups.com [mailto:open...@googlegroups.com] On Behalf Of utz > > Sent: Wednesday, December 05, 2012 7:29 PM > > To: open...@googlegroups.com > > Subject: Re: [openzwave] Wait Timeout
> > I can try this tomorrow. I assume it will only help partly as timing is one issue, interpreting a field in a message that is not there causes other issues as well (this is what I assume). I can't check it with the code in trunk as this currently spins on a printf when errors occur (I assume recent changes to deal with this).
> Utz's original message:
> Here another theory:
> when using FUNC_ID_ZW_SEND_DATA the following things happen: > 1) an ACK is received from the controller > 2) then a message starting with SOF is coming in, normally of length 4 (e.g. 0x01, 0x04, 0x01, 0x13, 0x00, 0xe9). I believe this message just indicates that the controller is processing the FUNC_ID_ZW_SEND_DATA request. This message does not contain the CallbackId. I guess we need to send an ACK to the controller for this message (which I think is done). > 3) then another message starting with SOF is coming in, normally of length 5 (e.g. 0x01, 0x05, 0x00, 0x13, 0x9f, 0x00, 0x76). This message confirms that the FUNC_ID_ZW_SEND_DATA request has been processed by the controller. Here the CallbackId is included. I guess we need to send an ACK to the controller for this message (which I think is done).
> I think that openzwave gets mixed up with the messages in step 2 and 3. Message in 2 is interpreted as it would be 3, the callbackid is not found (as not included) and the message is marked as failed delivered and a resend is triggered. While this is happening the message of step 3 might come in.. Depending on timing this message may still be processed and all is good, if not we have 3 retries and a fail at the end.
> Can someone have a look at the code that understands it better and see if this might be the culprit regarding the behaviour described here?
I don't use a SUC on the network and have the problem (as you know).
From: openzwave@googlegroups.com [mailto:openzwave@googlegroups.com] On Behalf Of Greg Satz
Sent: Thursday, December 06, 2012 11:55 AM
To: openzwave@googlegroups.com
Subject: Re: [openzwave] Wait Timeout
I am beginning to wonder if using SUC has something to do with more varying delays. The fact that it works more consistently with Vera is what keeps me coming back to what we are doing differently in the library.
I agree the way the library handles delays needs to change. Changing your delay should change the behavior toward the better. For some reason Scott doesn't see that and this may be part of the clue too. In a soon to be committed patch a notification for timeouts will be generated and counters will track that interaction. You folks with larger networks are testing the library further and it will require more changes to keep up with it.
Thanks,
Greg
On Dec 6, 2012, at 3:05 AM, utz wrote:
Scott, Greg,
yep, my assessment of the situation is crap; that is not the problem indeed. I think I was staring too much on program code last night and got lost.
The problem is shown in the first post of the thread. Two times the same case of transmitting to a node. In one case it works, the other one I get the 3 retries and then the stack gives up to deliver the message. I have this issue a lot in my setup, basically communicating with all nodes. As a result I have messages sometimes delivered and sometimes not. This is bad if you control your heating at home as setpoint changes might get missed (heating running all night). So I have at the moment 3 options : 1) build a layer on top of openzwave which monitors success of transmissions and then retransmit on this level if needed. 2) fix openzwave 3) write my own little stack that can do far less than openzwave but is more reliable (for the few things I want to do). I would prefer 2) as 1) is a hack and 3) is duplication of work already done.
I try the timer setting, I run on linux, maybe its different there.
Utz
On Thursday, December 6, 2012 5:34:59 AM UTC, satz wrote:
I agree with Scott. The second message comes from the controller as a response. The 0x01 in the third byte is checked by the code before the the function code in the fourth byte. The third message containing the callback ID has a 0x00 in the third byte and is a request. So you can see the code is able to distinguish these two messages correctly.
Scott, are the logs utz posted earlier in this thread show these anomalies? How does increasing the timeout change (or not change) the behavior? I would think it should minimize the issues at least.
Thanks,
Greg
On Dec 5, 2012, at 7:52 PM, D. Scott Miller wrote:
> This thread has been quiet for a while so I'm not certain of the context, but:
> 1) The theory offered today by Utz doesn't hold water. The library recognizes the difference between the two incoming messages (the 4 byte one and the 5 byte one). So the sequencing issue (which I think is the basis for this conjecture) remains an anomaly. I can provide logs to demonstrate this.
> I can also confirm that extending the wait timeout doesn't solve the issue.
> But maybe someone else (Linux?) will get a different result, so it's probably still worth a try.
> Scott
> -----Original Message-----
> From: open...@googlegroups.com<javascript:> [mailto:open...@googlegroups.com<javascript:>] On Behalf Of utz
> Sent: Wednesday, December 05, 2012 7:29 PM
> To: open...@googlegroups.com<javascript:>
> Subject: Re: [openzwave] Wait Timeout
> I can try this tomorrow. I assume it will only help partly as timing is one issue, interpreting a field in a message that is not there causes other issues as well (this is what I assume). I can't check it with the code in trunk as this currently spins on a printf when errors occur (I assume recent changes to deal with this).
Utz's original message:
Here another theory:
when using FUNC_ID_ZW_SEND_DATA the following things happen:
1) an ACK is received from the controller
2) then a message starting with SOF is coming in, normally of length 4 (e.g. 0x01, 0x04, 0x01, 0x13, 0x00, 0xe9). I believe this message just indicates that the controller is processing the FUNC_ID_ZW_SEND_DATA request. This message does not contain the CallbackId. I guess we need to send an ACK to the controller for this message (which I think is done).
3) then another message starting with SOF is coming in, normally of length 5 (e.g. 0x01, 0x05, 0x00, 0x13, 0x9f, 0x00, 0x76). This message confirms that the FUNC_ID_ZW_SEND_DATA request has been processed by the controller. Here the CallbackId is included. I guess we need to send an ACK to the controller for this message (which I think is done).
I think that openzwave gets mixed up with the messages in step 2 and 3. Message in 2 is interpreted as it would be 3, the callbackid is not found (as not included) and the message is marked as failed delivered and a resend is triggered. While this is happening the message of step 3 might come in.. Depending on timing this message may still be processed and all is good, if not we have 3 retries and a fail at the end.
Can someone have a look at the code that understands it better and see if this might be the culprit regarding the behaviour described here?
I did this and it improved things. Network seems to operate better (Still looking at it). (I am not using the most recent version from trunk btw)
However, still see some of these errors. Below an example sending to node 9. It says here that it expected 0x00 as callbackid and it received 0x6e, therefore it assumes it must be retransmitted. However, 0x6e seems perfectly ok to me as a callback. Why would it expect 0x00?
On Wednesday, December 5, 2012 11:48:49 PM UTC, satz wrote:
> Try increasing your timeout. In the file cpp/src/Defs.h at:
> #define RETRY_TIMEOUT 2000 // Retry send after two seconds
> change the 2000 to 10000 and report back what happens.
> Thanks, > Greg
> On Dec 5, 2012, at 4:16 PM, utz wrote:
> > Here another theory:
> > when using FUNC_ID_ZW_SEND_DATA the following things happen: > > 1) an ACK is received from the controller > > 2) then a message starting with SOF is coming in, normally of length 4 > (e.g. 0x01, 0x04, 0x01, 0x13, 0x00, 0xe9). I believe this message just > indicates that the controller is processing the FUNC_ID_ZW_SEND_DATA > request. This message does not contain the CallbackId. I guess we need to > send an ACK to the controller for this message (which I think is done). > > 3) then another message starting with SOF is coming in, normally of > length 5 (e.g. 0x01, 0x05, 0x00, 0x13, 0x9f, 0x00, 0x76). This message > confirms that the FUNC_ID_ZW_SEND_DATA request has been processed by the > controller. Here the CallbackId is included. I guess we need to send an > ACK to the controller for this message (which I think is done).
> > I think that openzwave gets mixed up with the messages in step 2 and 3. > Message in 2 is interpreted as it would be 3, the callbackid is not found > (as not included) and the message is marked as failed delivered and a > resend is triggered. While this is happening the message of step 3 might > come in. Depending on timing this message may still be processed and all is > good, if not we have 3 retries and a fail at the end.
> > Can someone have a look at the code that understands it better and see > if this might be the culprit regarding the behaviour described here?
What I am seeing in this log sequence is a set request not getting an ACK in 4 seconds. Prior to increasing the timeout the library would have tried to retransmit after two seconds. With the increased timer the controller gets a chance to do its radio layer retransmissions until it decides to give up and return a No Ack error. What is even more interesting is the controller returns a success status 11ms later. We get the unexpected callback ID because the No Ack response satisfied the library's state machine that the transaction has completed and the callback ID is now unexpected. I haven't seen this fail/success scenario before because our timeout timer got in the way. The succeeding get responds in 12ms and the resulting data arrives 7ms after that.
Wireless networks aren't going to be 100% reliable. The question is how do we recover and whether we lose data. In this case the set actually worked but it took a little more then 4 seconds to figure it out. We cannot control how the controller and its radios work. So I wouldn't call this a bug and the following get should show that the 'on' state is accurate.
To the question of why one transaction takes over 4 seconds and another takes less then quarter second is a good question. It could be environmental, such as radio interference or it could still be something to do with the library though it isn't clear how yet. It could also have to do with the difference in location of your usb stick vs. where the vera is/was situated. Even a few feet one direction or another can make a difference depending on your topology. It would help if we had more statistics in the library that can track trends and show where the network may be experiencing problems. That is what we are trying to work on now.
> I did this and it improved things. Network seems to operate better (Still looking at it). (I am not using the most recent version from trunk btw)
> However, still see some of these errors. Below an example sending to node 9. It says here that it expected 0x00 as callbackid and it received 0x6e, therefore it assumes it must be retransmitted. However, 0x6e seems perfectly ok to me as a callback. Why would it expect 0x00?
> 2012-12-06 20:33:36:894 Node009, Queuing command: SwitchBinary Set (Node=9): 0x01, 0x0a, 0x00, 0x13, 0x09, 0x03, 0x25, 0x01, 0xff, 0x05, 0x6e, 0x5c
> 2012-12-06 20:33:36:894 Node009, Queuing command: SwitchBinaryCmd_Get (Node=9): 0x01, 0x09, 0x00, 0x13, 0x09, 0x02, 0x25, 0x02, 0x05, 0x6f, 0xa3
> 2012-12-06 20:33:36:899 Node009, WriteMsg WriteNextMsg m_currentMsg=af68e398
> 2012-12-06 20:33:36:899 > 2012-12-06 20:33:36:899 Node009, Sending command (Callback ID=0x6e, Expected Reply=0x13) - SwitchBinary Set (Node=9): 0x01, 0x0a, 0x00, 0x13, 0x09, 0x03, 0x25, 0x01, 0xff, 0x05, 0x6e, 0x5c
> 2012-12-06 20:33:36:901 Node009, ACK received CallbackId 0x6e Reply 0x13
> 2012-12-06 20:33:36:904 Node009, Received: 0x01, 0x04, 0x01, 0x13, 0x01, 0xe8
> 2012-12-06 20:33:36:904 Node009, ZW_SEND_DATA delivered to Z-Wave stack
> 2012-12-06 20:33:40:998 Node009, Received: 0x01, 0x05, 0x00, 0x13, 0x6e, 0x01, 0x86
> 2012-12-06 20:33:40:999 Node009, ZW_SEND_DATA Request with callback ID 0x6e received (expected 0x6e)
> 2012-12-06 20:33:40:999 Node009, ERROR: ZW_SEND_DATA failed. No ACK received - device may be asleep.
> 2012-12-06 20:33:40:999 Node009, WARNING: Device is not a sleeping node - retrying the send.
> 2012-12-06 20:33:40:999 Node009, Expected callbackId was received
> 2012-12-06 20:33:40:999 IsExpectedReply: m_expectedNodeId = 9 m_expectedReply = 13
> 2012-12-06 20:33:41:010 Node009, Received: 0x01, 0x05, 0x00, 0x13, 0x6e, 0x00, 0x87
> 2012-12-06 20:33:41:011 Node009, ZW_SEND_DATA Request with callback ID 0x6e received (expected 0x00)
> 2012-12-06 20:33:41:011 Node009, WARNING: Unexpected Callback ID received
> 2012-12-06 20:33:41:011 Node009, Expected reply was received
> 2012-12-06 20:33:41:011 Node009, Message transaction complete
> 2012-12-06 20:33:41:011 > 2012-12-06 20:33:41:011 Node009, WriteMsg WriteNextMsg m_currentMsg=a9c41a88
> 2012-12-06 20:33:41:012 > 2012-12-06 20:33:41:012 Node009, Sending command (Callback ID=0x6f, Expected Reply=0x04) - SwitchBinaryCmd_Get (Node=9): 0x01, 0x09, 0x00, 0x13, 0x09, 0x02, 0x25, 0x02, 0x05, 0x6f, 0xa3
> 2012-12-06 20:33:41:014 Node009, ACK received CallbackId 0x6f Reply 0x04
> 2012-12-06 20:33:41:017 Node009, Received: 0x01, 0x04, 0x01, 0x13, 0x01, 0xe8
> 2012-12-06 20:33:41:018 Node009, ZW_SEND_DATA delivered to Z-Wave stack
> 2012-12-06 20:33:41:029 Node009, Received: 0x01, 0x05, 0x00, 0x13, 0x6f, 0x00, 0x86
> 2012-12-06 20:33:41:030 Node009, ZW_SEND_DATA Request with callback ID 0x6f received (expected 0x6f)
> 2012-12-06 20:33:41:037 Node009, Received: 0x01, 0x09, 0x00, 0x04, 0x00, 0x09, 0x03, 0x25, 0x03, 0xff, 0x21
> 2012-12-06 20:33:41:037 > 2012-12-06 20:33:41:037 Node009, Received SwitchBinary report from node 9: level=On
> 2012-12-06 20:33:41:037 Node009, Refreshed Value: old value=false, new value=true, type=bool
> 2012-12-06 20:33:41:038 Node009, Changes to this value are not verified
> 2012-12-06 20:33:41:038 Node009, Expected reply and command class was received
> 2012-12-06 20:33:41:038 Node009, Message transaction complete
> On Wednesday, December 5, 2012 11:48:49 PM UTC, satz wrote:
> Try increasing your timeout. In the file cpp/src/Defs.h at:
> #define RETRY_TIMEOUT 2000 // Retry send after two seconds
> change the 2000 to 10000 and report back what happens.
> Thanks, > Greg
> On Dec 5, 2012, at 4:16 PM, utz wrote:
> > Here another theory:
> > when using FUNC_ID_ZW_SEND_DATA the following things happen: > > 1) an ACK is received from the controller > > 2) then a message starting with SOF is coming in, normally of length 4 (e.g. 0x01, 0x04, 0x01, 0x13, 0x00, 0xe9). I believe this message just indicates that the controller is processing the FUNC_ID_ZW_SEND_DATA request. This message does not contain the CallbackId. I guess we need to send an ACK to the controller for this message (which I think is done). > > 3) then another message starting with SOF is coming in, normally of length 5 (e.g. 0x01, 0x05, 0x00, 0x13, 0x9f, 0x00, 0x76). This message confirms that the FUNC_ID_ZW_SEND_DATA request has been processed by the controller. Here the CallbackId is included. I guess we need to send an ACK to the controller for this message (which I think is done).
> > I think that openzwave gets mixed up with the messages in step 2 and 3. Message in 2 is interpreted as it would be 3, the callbackid is not found (as not included) and the message is marked as failed delivered and a resend is triggered. While this is happening the message of step 3 might come in. Depending on timing this message may still be processed and all is good, if not we have 3 retries and a fail at the end.
> > Can someone have a look at the code that understands it better and see if this might be the culprit regarding the behaviour described here?
I assume that you changed the timeout to 10000 as Greg suggested. If you can, please be on the lookout for messages received right at the end of the timeout threshold. That was a symptom I noted…the longer I set the timeout, the longer it took for some responses to come back. If you see the same behavior, that would suggest that there’s a problem with OZW’s event/wait mechanics. If you don’t see it, then I must have been dreaming!
He had to have changed the timeout to something longer then the default 2000ms or the timeout would have fired prior to the no ack failure. I recall you having issues with the wait code since Mal wrote it. We spent quite some time trying to reproduce what you saw and never did. Seems like that is still an issue for you. I don't recall how it looked in the logs but it might be useful to post a snippet so others know what to look for. Are you saying that elongating the timeout timer causes the problem to occur?
Thanks,
Greg
On Dec 7, 2012, at 4:44 AM, D. Scott Miller wrote:
> I assume that you changed the timeout to 10000 as Greg suggested. If you can, please be on the lookout for messages received right at the end of the timeout threshold. That was a symptom I noted…the longer I set the timeout, the longer it took for some responses to come back. If you see the same behavior, that would suggest that there’s a problem with OZW’s event/wait mechanics. If you don’t see it, then I must have been dreaming!
I was asking because the behavior I saw was that nothing would come back during the timeout waiting period, but then it would (somehow!) show up when the timeout elapsed.
I asked the question because his log would have been consistent with what I have seen if he had set the timeout to 4000 instead of 10000.
I'll locate a snippet and post it. If you don't see something soon, it'll be because I've had to reset things to regenerate the error (which can take some time to show up).
From: openzwave@googlegroups.com [mailto:openzwave@googlegroups.com] On Behalf Of Greg Satz
Sent: Friday, December 07, 2012 12:17 PM
To: openzwave@googlegroups.com
Subject: Re: [openzwave] Wait Timeout
He had to have changed the timeout to something longer then the default 2000ms or the timeout would have fired prior to the no ack failure. I recall you having issues with the wait code since Mal wrote it. We spent quite some time trying to reproduce what you saw and never did. Seems like that is still an issue for you. I don't recall how it looked in the logs but it might be useful to post a snippet so others know what to look for. Are you saying that elongating the timeout timer causes the problem to occur?
Thanks,
Greg
On Dec 7, 2012, at 4:44 AM, D. Scott Miller wrote:
I assume that you changed the timeout to 10000 as Greg suggested. If you can, please be on the lookout for messages received right at the end of the timeout threshold. That was a symptom I noted...the longer I set the timeout, the longer it took for some responses to come back. If you see the same behavior, that would suggest that there's a problem with OZW's event/wait mechanics. If you don't see it, then I must have been dreaming!
I was testing during start-up of the open-zwave (r585-588) and i noticed
that my sensors (sleeping devices) are always (tried to be) polled. This
results in time-outs (resend attempts), but that's not the major problem.
After those resend attempts, the NEXT poll requests towards none-sleeping
devices are going wrong - and results also in failed resend attempts.
Asfar i am aware off, (all or almost all?) sleeping devices have a WakeUp
commandclass, thus we should queue up the query of the device until it
wakes up.
I modified "cpp/src/command_classes/WakeUp.cpp" from:
m_awake( true ),
to:
m_awake( false ),
Then the initial value if a sleeping device is awake, will be "false". See
the attached OZW_Log.txt* files for the pre and post patch. All looks to be
working perfectly now (during start-up). Can this patch be applied in the
main code line?
The latest version of the library has changes to address this thread, including the sleeping nodes behavior. The retransmit timer has been increased to 40 seconds. The No Ack (or any other failure return from Send Data) will be treated as a final conclusion to that particular transaction. A No Operation will be sent to nodes during discovery that will help determine if they are alive or asleep. This patch shouldn't be needed. It would be good to get some feedback on these changes. There are some other things that can be done to improve the reliability of the message handling of the library. However before continuing it would be nice to know how the current software is working.
> I was testing during start-up of the open-zwave (r585-588) and i noticed that my sensors (sleeping devices) are always (tried to be) polled. This results in time-outs (resend attempts), but that's not the major problem. After those resend attempts, the NEXT poll requests towards none-sleeping devices are going wrong - and results also in failed resend attempts.
> Asfar i am aware off, (all or almost all?) sleeping devices have a WakeUp commandclass, thus we should queue up the query of the device until it wakes up.
> I modified "cpp/src/command_classes/WakeUp.cpp" from:
> m_awake( true ),
> to:
> m_awake( false ),
> Then the initial value if a sleeping device is awake, will be "false". See the attached OZW_Log.txt* files for the pre and post patch. All looks to be working perfectly now (during start-up). Can this patch be applied in the main code line?
> Regards,
> Alex
> 2012/12/8 utz <utz.roe...@gmail.com>
> I set it to 10000. I will look for the things you point out and post any findings here.