ThreadStart Command -Reg

velag...@pathpartnertech.com

Apr 28, 2017, 5:03:28 AM
to openthread-users
Hi all,

I have tried porting OpenThread to a new radio, and I am facing a problem at thread start. When I issue "thread start", the device enables its receiver and tries to receive data, but it does not transmit any frame.

As I understand OpenThread, when "thread start" is given, the device listens for nodes in the network and, after a timeout, transmits a broadcast message; if there is no parent in the network, the node becomes the leader (please correct me if I am wrong). In my case, however, the device only tries to receive and never transmits a frame. What might be the issue?


Thanks for response,

Jayaram.

velag...@pathpartnertech.com

Apr 29, 2017, 5:16:57 AM
to openthread-users
And the top-level issue is that if I start two nodes and run "thread start" on both, both devices are shown as leader.

xiaom

Apr 29, 2017, 6:04:08 AM
to openthread-users
Hi Jayaram,

The “thread start” command makes the node broadcast an “MLE parent request” message to search for an available parent through which to attach to the Thread network (the node goes from the receive state to the transmit state and, once the transmission is done, goes back to the receive state).
All configured network parameters (e.g. channel, panid, extpanid, network name, master key…) must be the same on the two nodes for the attachment to succeed.

If no “MLE parent response” is received, the node forms a new Thread network by itself as the Leader.

For the above case, where both nodes are shown as Leader, please check the following:
1. All configured network parameters on the two nodes are the same, especially channel, panid, extpanid, and master key.
2. The "otPlatRadioReceive()" function really enables the radio receiver. If it does not, the other node cannot receive the MLE parent request message and therefore will not reply with an MLE parent response message.
3. The "otPlatRadioTransmit()" function really sends out a message. (If the tx power is too low, that can also affect reception on other nodes.)
4. There is no CCA failure or filter mechanism on your platform that would keep the MLE parent request message from getting through.
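
For the first check, a small comparison helper can make a mismatch obvious when you dump the settings of both nodes. This is only a sketch: `net_params_t` and `first_mismatch()` are hypothetical names, not OpenThread types or APIs, and the field sizes are assumptions based on the parameters listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the parameters listed above (not an OpenThread type). */
typedef struct {
    uint8_t  channel;
    uint16_t pan_id;
    uint8_t  ext_pan_id[8];
    uint8_t  master_key[16];
    char     network_name[17]; /* assumed: up to 16 characters plus terminator */
} net_params_t;

/* Returns the name of the first field that differs, or NULL if everything matches. */
const char *first_mismatch(const net_params_t *a, const net_params_t *b)
{
    assert(a != NULL && b != NULL);
    if (a->channel != b->channel) return "channel";
    if (a->pan_id != b->pan_id) return "panid";
    if (memcmp(a->ext_pan_id, b->ext_pan_id, sizeof a->ext_pan_id) != 0) return "extpanid";
    if (memcmp(a->master_key, b->master_key, sizeof a->master_key) != 0) return "masterkey";
    if (strncmp(a->network_name, b->network_name, sizeof a->network_name) != 0) return "networkname";
    return NULL;
}
```

If this returns anything other than NULL for the two nodes, they will not attach to each other and each one can end up as its own Leader.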

Please also see the Porting Guide for more information. I hope that helps.

Thanks,
Xiao

mad...@pathpartnertech.com

Apr 29, 2017, 9:45:19 AM
to openthread-users
Thanks for your response xiaom,

1. I am using the same network parameters on both nodes.
2. Receive really does enable the radio receiver.
3. The CPU never enters otPlatRadioTransmit().

I have traced the CLI "thread start" command through src/cli/cli.cpp, but I am not sure where the radio transmit is initiated. Could you help me find where exactly the radio transmit is called after thread start?

Regards,
Jayaram

xiaom

Apr 29, 2017, 11:55:38 AM
to openthread-users
Hi Jayaram,

The radio transmit function is called by the upper MAC layer. Taking the MLE parent request transmission as an example, it goes through the following call sequence:

Mle::SendParentRequest()

Mle::SendMessage()

Udp::SendDatagram()

Ip6.SendDatagram()

Ip6.EnqueueDatagram()

SendQueueTask.Post()

Ip6::HandleSendQueue()

Ip6::HandleDatagram()

netif->SendMessage()

MeshForwarder.SendMessage()

mNetif.GetMac().SendFrameRequest()

StartCsmaBackoff()

Mac::HandleBeginTransmit()

otPlatRadioTransmit()


Generally speaking, when to transmit a message is decided by the upper layers; for the PHY layer, we only need to implement otPlatRadioTransmit() correctly per its definition in the radio header file (include/openthread/platform/radio.h).


Could you place a breakpoint in Mac::HandleBeginTransmit() while debugging, to check whether radio transmit gets called on your platform?


Thanks

Xiao

velag...@pathpartnertech.com

May 5, 2017, 4:20:58 AM
to openthread-users

Thank you, xiaom, and sorry for the late response. I found the issue. Now I can see a leader and a child, and I can also do an active scan and ping. But the behavior is inconsistent: sometimes ping and scan work and sometimes they do not. I printed the log info and found that frames are being dropped in the MAC layer. Here are a few logs. Could you help me out?

[0000272830] -MAC-----: Error Duplicated: Dropping received frame
[0000272847] -MAC-----: Error Duplicated: Dropping received frame
[0000272863] -MAC-----: Error Duplicated: Dropping received frame
[0000275015] -ICMP----: Received Echo Request
[0000275015] -ICMP----: Sent Echo Reply (seq = 6)
[0000275030] -MAC-----: Error Duplicated: Dropping received frame
[0000275049] -MAC-----: Error Duplicated: Dropping received frame
[0000275074] -MAC-----: Error Duplicated: Dropping received frame
[0000275111] -MAC-----: Error Duplicated: Dropping received frame
[0000282084] -ICMP----: Sent echo request: (seq = 3)

Regards,
jayaram.

xiaom

May 5, 2017, 11:34:43 PM
to openthread-users
Hi Jayaram,

From this log, it seems that the sender is not receiving the ACK frame from the peer. For a unicast frame transmission, the sender must wait for the ACK from the receiver before it can proceed to the next operation. Otherwise, the sender retransmits the frame up to 4 times with the same sequence number. The receiver then sees the same frame again, which would explain the "Error Duplicated" lines.
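
To make the link between a lost ACK and the "Error Duplicated" lines concrete, here is a deliberately simplified model of duplicate detection. It is an assumption for illustration, not OpenThread's actual logic (which is more involved); `neighbor_rx_state_t` and `accept_frame()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-neighbor receive state. */
typedef struct {
    bool    seen_any;  /* false until the first frame from this neighbor */
    uint8_t last_seq;  /* sequence number of the last accepted frame */
} neighbor_rx_state_t;

/* Returns true if the frame should go to the upper layer,
 * false if it repeats the previous sequence number and is dropped. */
bool accept_frame(neighbor_rx_state_t *n, uint8_t seq)
{
    assert(n != NULL);
    if (n->seen_any && n->last_seq == seq) {
        return false; /* retransmission of a frame we already have */
    }
    n->seen_any = true;
    n->last_seq = seq;
    return true;
}
```

In this model, a sender that never sees the ACK retries with the same sequence number, so the receiver accepts the first copy and drops every retry.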

Please check whether the receiver replies with an ACK properly after receiving a unicast frame (as far as I know, most SoC platforms currently support replying with an ACK automatically in hardware). Another point to check is that the radio should go back to the receive state as soon as a transmission or reception finishes, so that it is ready for incoming frames.
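
The second point can be pictured as a small state machine. The sketch below is loosely modeled on the state diagram in radio.h; the enum and function names are hypothetical, and a real driver would also have to program the hardware on each transition.

```c
#include <assert.h>

typedef enum { RADIO_DISABLED, RADIO_SLEEP, RADIO_RECEIVE, RADIO_TRANSMIT } radio_state_t;

typedef enum {
    EV_ENABLE, EV_DISABLE, EV_SLEEP, EV_RECEIVE, EV_TRANSMIT,
    EV_TX_DONE, /* transmission finished (with or without ACK) */
    EV_RX_DONE  /* a frame was received, successfully or not */
} radio_event_t;

/* Returns the state after the event; invalid requests leave the state unchanged. */
radio_state_t radio_next_state(radio_state_t s, radio_event_t ev)
{
    assert(s <= RADIO_TRANSMIT);
    switch (ev) {
    case EV_ENABLE:   return (s == RADIO_DISABLED) ? RADIO_SLEEP : s;
    case EV_DISABLE:  return (s == RADIO_SLEEP) ? RADIO_DISABLED : s;
    case EV_SLEEP:    return (s == RADIO_RECEIVE) ? RADIO_SLEEP : s;
    case EV_RECEIVE:  return (s == RADIO_SLEEP || s == RADIO_RECEIVE) ? RADIO_RECEIVE : s;
    case EV_TRANSMIT: return (s == RADIO_RECEIVE) ? RADIO_TRANSMIT : s;
    case EV_TX_DONE:  return (s == RADIO_TRANSMIT) ? RADIO_RECEIVE : s;
    case EV_RX_DONE:  return s; /* stay in receive: re-arm the receiver here */
    }
    return s;
}
```

If the hardware drops to idle or sleep by itself after a frame, the handlers for `EV_TX_DONE` and `EV_RX_DONE` are the places to switch the receiver back on.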

Thanks,
Xiao

velag...@pathpartnertech.com

May 5, 2017, 11:44:56 PM
to openthread-users
Thank you, xiaom. I will check the ACK frame and the receive state, and I will get back to you.

Regards,
jayaram

velag...@pathpartnertech.com

May 6, 2017, 4:59:57 AM
to openthread-users
I found the issue. Sometimes the receiver was not being re-enabled after a frame reception. The issue is now solved by re-enabling the receiver after every reception.
Thank you, xiaom.

Regards,
jayaram

xiaom

May 6, 2017, 11:15:25 AM
to openthread-users
Glad to hear that, thanks!

velag...@pathpartnertech.com

May 8, 2017, 12:06:24 PM
to openthread-users
Hi xiaom,

I am seeing a few issues with ping. I am testing with two nodes.
 1. I can ping between the nodes right after thread start. After the child becomes a router, ping succeeds only rarely.
 2. When I ping continuously between the two nodes, there is a hang.

What might be the reason?

Regards,
jayaram.

xiaom

May 9, 2017, 9:03:54 AM
to openthread-users
Hi Jayaram,

May I know which IPv6 address you are using for ping? If it is the Mesh-Rloc address (such as fd00:0db8::ff:fe00:5000), the last two bytes (rloc16) change when the Child becomes a Router, and that would cause the ping failure. You can ping the Mesh-EID address instead (such as fd00:0db8::d359:a1e2:960d:900d); it does not change when the device's role changes.
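
As a rough illustration of why the RLOC address moves with the role, the sketch below builds it from the mesh-local prefix and the rloc16, using the 0000:00ff:fe00:xxxx interface identifier visible in the example address above. `make_rloc_address()` is a hypothetical helper, and the rloc16 values in the usage note are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fills out[16] with <8-byte mesh-local prefix> : 0000:00ff:fe00:<rloc16>. */
void make_rloc_address(const uint8_t prefix[8], uint16_t rloc16, uint8_t out[16])
{
    assert(prefix != NULL && out != NULL);
    memcpy(out, prefix, 8);
    out[8]  = 0x00;
    out[9]  = 0x00;
    out[10] = 0x00;
    out[11] = 0xff;
    out[12] = 0xfe;
    out[13] = 0x00;
    out[14] = (uint8_t)(rloc16 >> 8);   /* rloc16 occupies the last two bytes */
    out[15] = (uint8_t)(rloc16 & 0xff);
}
```

Calling this with, say, 0x5001 before and 0x5400 after the role change gives two addresses that differ only in the last two bytes, so a ping sent to the old one no longer reaches the node.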

For the hang issue, it is hard to name a specific reason. Generally speaking, it helps to use JLINK to trace the function calls on the stack. For example, if the hang is easy to reproduce, you can connect the JLINK to your device and open the GDBServer; when the device hangs, use the "bt" command to see which function call causes the device to enter the HardFault handler.

A failed ping might also be caused by running out of available buffers. You can add an interval parameter to ping to check whether the failures are caused by sending too fast. For example:

ping <dst-ipv6-address> <payload length> <times> <interval>

Thanks,
Xiao

velag...@pathpartnertech.com

May 9, 2017, 9:48:05 AM
to openthread-users
I am using the mesh EID address for ping, and I ran ping as you suggested. These are the logs (on the router side).

> ping fdde:ad00:beef:0:13b2:de93:dfc:6ad6 8 50 1
2017-05-09 19:13:23,897 - INFO #  ping fdde:ad00:beef:0:13b2:de93:dfc:6ad6 8 50 1
> 2017-05-09 19:13:25,915 - INFO #  16 bytes from fdde:ad00:beef:0:13b2:de93:dfc:6ad6: icmp_seq=3 hlim=64 time=11ms
2017-05-09 19:13:34,016 - INFO # 16 bytes from fdde:ad00:beef:0:13b2:de93:dfc:6ad6: icmp_seq=11 hlim=64 time=108ms
2017-05-09 19:13:36,062 - INFO # 16 bytes from fdde:ad00:beef:0:13b2:de93:dfc:6ad6: icmp_seq=13 hlim=64 time=154ms
2017-05-09 19:13:36,933 - INFO # 16 bytes from fdde:ad00:beef:0:13b2:de93:dfc:6ad6: icmp_seq=14 hlim=64 time=29ms
2017-05-09 19:13:38,935 - INFO # 16 bytes from fdde:ad00:beef:0:13b2:de93:dfc:6ad6: icmp_seq=16 hlim=64 time=30ms
2017-05-09 19:13:39,936 - INFO # 16 bytes from fdde:ad00:beef:0:13b2:de93:dfc:6ad6: icmp_seq=17 hlim=64 time=29ms
2017-05-09 19:13:40,918 - INFO # 16 bytes from fdde:ad00:beef:0:13b2:de93:dfc:6ad6: icmp_seq=18 hlim=64 time=11ms
2017-05-09 19:13:41,919 - INFO # 16 bytes from fdde:ad00:beef:0:13b2:de93:dfc:6ad6: icmp_seq=19 hlim=64 time=11ms
2017-05-09 19:13:42,918 - INFO # 16 bytes from fdde:ad00:beef:0:13b2:de93:dfc:6ad6: icmp_seq=20 hlim=64 time=11ms
2017-05-09 19:13:44,918 - INFO # 16 bytes from fdde:ad00:beef:0:13b2:de93:dfc:6ad6: icmp_seq=22 hlim=64 time=11ms
2017-05-09 19:13:45,936 - INFO # 16 bytes from fdde:ad00:beef:0:13b2:de93:dfc:6ad6: icmp_seq=23 hlim=64 time=29ms
2017-05-09 19:13:48,005 - INFO # 16 bytes from fdde:ad00:beef:0:13b2:de93:dfc:6ad6: icmp_seq=25 hlim=64 time=98ms
2017-05-09 19:13:48,919 - INFO # 16 bytes from fdde:ad00:beef:0:13b2:de93:dfc:6ad6: icmp_seq=26 hlim=64 time=11ms
2017-05-09 19:13:49,937 - INFO # 16 bytes from fdde:ad00:beef:0:13b2:de93:dfc:6ad6: icmp_seq=27 hlim=64 time=30ms
2017-05-09 19:13:52,954 - INFO # 16 bytes from fdde:ad00:beef:0:13b2:de93:dfc:6ad6: icmp_seq=30 hlim=64 time=46ms
2017-05-09 19:13:53,918 - INFO # 16 bytes from fdde:ad00:beef:0:13b2:de93:dfc:6ad6: icmp_seq=31 hlim=64 time=11ms
2017-05-09 19:13:54,918 - INFO # 16 bytes from fdde:ad00:beef:0:13b2:de93:dfc:6ad6: icmp_seq=32 hlim=64 time=11ms
2017-05-09 19:13:55,918 - INFO # 16 bytes from fdde:ad00:beef:0:13b2:de93:dfc:6ad6: icmp_seq=33 hlim=64 time=11ms
2017-05-09 19:14:00,041 - INFO # 16 bytes from fdde:ad00:beef:0:13b2:de93:dfc:6ad6: icmp_seq=37 hlim=64 time=132ms
2017-05-09 19:14:00,918 - INFO # 16 bytes from fdde:ad00:beef:0:13b2:de93:dfc:6ad6: icmp_seq=38 hlim=64 time=11ms

I have observed that some sequence numbers are missing.

xiaom

May 9, 2017, 10:15:20 PM
to openthread-users
I recall one case of missing replies during continuous ping that I ran into previously: the receiver had actually received the ping request, but the next ping request arrived before it got the chance to pass the first one to the upper layer for processing, so the earlier ping request was overwritten.

What I did for that case was to protect the part of xxxRadioProcess() that handles the received frame and passes it to the upper layer: it disables the interrupt temporarily and re-enables it only after the current frame has been processed. Maybe you can check this point.
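
A minimal sketch of that idea follows, assuming a single receive buffer shared between the radio ISR and the main loop. All names are hypothetical, the interrupt enable/disable functions are stand-ins for whatever the platform provides, and this variant copies the frame out inside the protected region instead of keeping interrupts off for the whole upper-layer processing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_FRAME_LEN 127

/* Stand-ins for the platform's interrupt control (hypothetical). */
static int irq_disable_depth;
static void radio_irq_disable(void) { irq_disable_depth++; }
static void radio_irq_enable(void)  { irq_disable_depth--; }

static uint8_t       rx_buf[MAX_FRAME_LEN];
static uint8_t       rx_len;
static volatile bool rx_pending;  /* set by the ISR, cleared by the main loop */
static unsigned      rx_overruns; /* frames refused because one was still pending */

/* Called from the radio ISR when a frame has arrived. */
void radio_isr_frame_received(const uint8_t *frame, uint8_t len)
{
    assert(len <= MAX_FRAME_LEN);
    if (rx_pending) {
        rx_overruns++; /* do not overwrite the frame that is still waiting */
        return;
    }
    memcpy(rx_buf, frame, len);
    rx_len = len;
    rx_pending = true;
}

/* Called from the main loop; copies a pending frame to out and returns its
 * length, or returns 0 if nothing is pending. */
uint8_t radio_process(uint8_t *out)
{
    uint8_t len = 0;
    radio_irq_disable(); /* the ISR cannot touch rx_buf while we copy */
    if (rx_pending) {
        memcpy(out, rx_buf, rx_len);
        len = rx_len;
        rx_pending = false;
    }
    radio_irq_enable();
    return len;
}
```

The overrun counter is there only to make lost frames visible while debugging; a non-zero value means frames arrive faster than the main loop picks them up.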

Thanks,
Xiao

velag...@pathpartnertech.com

May 10, 2017, 12:24:44 AM
to openthre...@googlegroups.com
Hi xiaom,

Thanks for recalling the issue you faced. I am already controlling radioProcess in the same way. In the log messages I observed that the frame counter value is not being incremented and I am getting the Frame Duplicated error, along with "Sent ICMPv6 Error" and "cache entry removed". That is the case where I lose pings and finally hang at some point.

Here are a few logs:
[DEBG]-MAC-----: Received from short address 8401
[0000084062] [DEBG]-MAC-----: Frame counter 0
[0000084062] [WARN]-MAC-----: Error Duplicated: Dropping received frame
[0000084063] ==============================[RX len=054]===============================
[0000084064] | 69 98 93 CA DE 00 84 01 | 84 0D 00 00 00 00 01 D5 | i..J^..........U
[0000084065] | 2F 9D B1 64 EF F5 ED 83 | 92 50 B5 20 A3 28 C9 C5 | /.1doum..P5 #(IE
[0000084065] | 5B 3E B5 DC 96 C3 B9 3C | E4 E7 AC 68 70 3E A8 97 | [>5\.C9<dg,hp>(.
[0000084066] | E7 08 22 9C 6C 2B .. .. | .. .. .. .. .. .. .. .. | g.".l+..........
[0000084066] ------------------------------------------------------------------------
[0000084067] [DEBG]-MAC-----: ack timer start
[0000084083] [DEBG]-MAC-----: ack timer fired
[0000084083] ============================[NO ACK len=016]=============================
[0000084084] | 69 98 14 CA DE 01 84 00 | 84 0D 02 00 00 00 01 0C | i..J^...........
[0000084084] ------------------------------------------------------------------------
[0000084105] [DEBG]-MAC-----: ack timer start
[0000084121] [DEBG]-MAC-----: ack timer fired

[0000096972] [INFO]-ICMP----: Sent ICMPv6 Error
[0000096972] [INFO]-ARP-----: cache entry removed!

velag...@pathpartnertech.com

May 10, 2017, 11:29:47 AM
to openthread-users
I have resolved the hang issue. Dynamic memory allocations at some points were leading to the HardFault handler. I replaced them with a static array, which solved it.
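
For anyone hitting the same fault, one common way to replace dynamic allocation with a static array is a fixed-size block pool. The sketch below is an assumption about what such a replacement could look like, not the exact change made in this port; the pool dimensions and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCKS     8    /* assumed number of buffers */
#define POOL_BLOCK_SIZE 128  /* assumed size of each buffer in bytes */

static uint8_t pool_mem[POOL_BLOCKS][POOL_BLOCK_SIZE];
static bool    pool_used[POOL_BLOCKS];

/* Returns a free block, or NULL when the pool is exhausted. */
void *pool_alloc(void)
{
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        if (!pool_used[i]) {
            pool_used[i] = true;
            return pool_mem[i];
        }
    }
    return NULL;
}

/* Returns a block to the pool; p must come from pool_alloc() or be NULL. */
void pool_free(void *p)
{
    if (p == NULL) {
        return;
    }
    ptrdiff_t off = (uint8_t *)p - &pool_mem[0][0];
    assert(off >= 0 && off < (ptrdiff_t)sizeof pool_mem && off % POOL_BLOCK_SIZE == 0);
    pool_used[off / POOL_BLOCK_SIZE] = false;
}
```

Because the memory is reserved at link time, exhaustion shows up as a NULL return that the caller can handle instead of a corrupted heap.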
But the ping issue still persists. When the device becomes a router, ping does not work with the EID IPv6 address, but with the changed Mesh-Rloc address I am able to ping.

What could be the issue?

Regards,
jayaram

xiaom

May 11, 2017, 2:53:04 AM
to openthread-users
Hi Jayaram,

From the above log, it seems that the sender is not receiving the ACK. Could you check whether the ACK corresponding to each ping request has been received? You can use Wireshark to capture the ping traffic between the two nodes.

I captured the ping traffic between two efr32 dev boards: one is the Leader, and the other starts as a Child and then becomes a Router after its mode is changed to "rsdn". I pinged the mesh EID of each. I have also attached a pcap file for your reference.

Thanks,
Xiao
test-ping.pcapng

velag...@pathpartnertech.com

May 11, 2017, 8:53:33 PM
to openthread-users
Thanks for sharing your capture. May I know how to capture the communication between two CLI nodes in Wireshark? With NCP we have pcap, but how can I do it with CLI?

Regards,
jayaram.

xiaom

May 12, 2017, 4:31:29 PM
to openthread-users
The sniffer I am using is an internal version, developed for capturing Thread packets and based on the KW24D USB Dongle.

You can refer to the link below for how to enable the sniffer provided by NXP.

Thanks,
Xiao

velag...@pathpartnertech.com

May 21, 2017, 1:46:30 AM
to openthread-users
Hi xiaom, thank you. I was copying the sequence number into the wrong place in the ACK frame. Now I am able to ping the router with the EID address.
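
For reference, this is roughly what a software-built IEEE 802.15.4 immediate ACK looks like: two frame control octets, then the sequence number copied from the third octet of the received frame. This is a sketch under assumptions: `build_imm_ack()` is a hypothetical helper, the frame pending bit is left clear, and the 2-byte FCS is assumed to be appended by the radio hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FCF_FRAME_TYPE_ACK 0x02u /* frame type field = 0b010 */
#define FCF_ACK_REQUEST    0x20u /* bit 5 of the first frame control octet */
#define IMM_ACK_LEN        3u    /* frame control (2) + sequence number (1), FCS excluded */

/* Builds the ACK for a received frame. Returns false if the frame is too
 * short or did not request an acknowledgment. */
bool build_imm_ack(const uint8_t *rx, size_t rx_len, uint8_t ack[IMM_ACK_LEN])
{
    assert(rx != NULL && ack != NULL);
    if (rx_len < 3 || (rx[0] & FCF_ACK_REQUEST) == 0) {
        return false;
    }
    ack[0] = FCF_FRAME_TYPE_ACK; /* frame control, low octet */
    ack[1] = 0x00;               /* frame control, high octet */
    ack[2] = rx[2];              /* sequence number of the frame being acknowledged */
    return true;
}
```

For the received frame starting 69 98 93 in the log earlier in this thread, the ACK body would be 02 00 93.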

There is one more problem: either the leader or the router goes into the sleep state for a long time (~30 secs). What could be the issue?

Regards,
jayaram

xiaom

May 22, 2017, 11:58:56 PM
to openthread-users
Could this issue be because "otPlatRadioReceive()" does not work in your case? According to the state transition diagram defined in radio.h, "otPlatRadioReceive" (enabling the receiver) triggers the state transition from Sleep to Receive immediately.

Thanks,
Xiao

velag...@pathpartnertech.com

Jun 5, 2017, 8:47:15 AM
to openthread-users
Hi xiaom,

The sleep issue is solved. The radio was going to the sleep state after every frame reception, successful or not. I now re-enable the receiver manually, and everything is working fine.

Thanks a lot.

Regards,
jayaram.