OT Multicasting ff03::1 vs ff002::1


Michael Simpson

Oct 5, 2021, 11:49:58 PM
to openthread-users
I have an OT network with an OTBR and a number of REEDs.
On startup, my REEDs multicast to ff03::1 to find and "log on" to my hostapp, which runs on the Linux platform alongside the OTBR.
When the hostapp responds, my REED learns its MLEID address, and from then on everything switches to unicast.
When I run tcpdump on my Linux platform I can see the raw UDP messages.
I have a problem where my hostapp cannot receive multicast messages from some REEDs, while others work fine.
But tcpdump sees them all.
My hostapp uses Californium for CoAP, so I stopped it and ran netcat -u -6 -l 5683 to see whether netcat could pick up these messages, but it misses the same ones my hostapp misses.
I tried switching to multicasting to ff02::1 and found that netcat could receive all my multicast messages.
Can you please explain why multicasting to ff02::1 works on all REEDs and ff03::1 only works on some?
I understand ff02::1 is link-local scope whereas ff03::1 is mesh-local scope, so I thought I needed mesh-local scope to receive messages from a REED acting as a child, which are routed through another REED acting as a router to reach my OTBR. Please confirm or correct this.
If I should be using ff03::1, is there a setting in the OTBR I need to adjust so my hostapp and netcat can see all the ff03::1 multicasts from all REEDs?
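For reference, the scope of an IPv6 multicast address is carried in the low four bits of its second byte: 0x2 is link-local (ff02::1) and 0x3 is realm-local (ff03::1), which Thread uses as its mesh-local scope. A minimal sketch of reading that field follows; the function and constant names are hypothetical, and the forwarding rule is a simplification of how a Thread mesh treats the two scopes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scope values from RFC 4291 / RFC 7346. */
enum { SCOPE_LINK_LOCAL = 0x2, SCOPE_REALM_LOCAL = 0x3 };

/* The two all-nodes groups discussed in this thread. */
const uint8_t ALL_NODES_LINK_LOCAL[16]  = {0xff, 0x02, 0, 0, 0, 0, 0, 0,
                                           0, 0, 0, 0, 0, 0, 0, 0x01};
const uint8_t ALL_NODES_REALM_LOCAL[16] = {0xff, 0x03, 0, 0, 0, 0, 0, 0,
                                           0, 0, 0, 0, 0, 0, 0, 0x01};

/* Returns the 4-bit scope of an IPv6 multicast address, or -1 if the
 * address is not multicast (first byte is not 0xff). */
int ipv6_multicast_scope(const uint8_t addr[16])
{
    assert(addr != NULL);
    if (addr[0] != 0xff) {
        return -1;
    }
    return addr[1] & 0x0f;
}

/* Simplified rule: link-local multicast stays on one radio hop, while
 * realm-local (mesh-local) multicast is forwarded across the mesh. */
int is_forwarded_across_mesh(const uint8_t addr[16])
{
    return ipv6_multicast_scope(addr) >= SCOPE_REALM_LOCAL;
}
```

Under this simplified rule, a child whose parent is not the border router would need ff03::1 rather than ff02::1 for its request to travel more than one hop.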


Michael Simpson

Oct 5, 2021, 11:51:05 PM
to openthread-users
Sorry typo in the title - should read ff02::1

Jonathan Hui

Oct 6, 2021, 3:24:33 PM
to Michael Simpson, openthread-users
It's not clear to me why tcpdump would see all the messages when the application will only receive some.

Can you provide a pcap file for more visibility?

--
Jonathan Hui




Michael Simpson

Oct 6, 2021, 6:45:16 PM
to openthread-users
Hi Jonathan
Thanks for your help.
Please see attached:

goodreed.pcap which shows only one REED which communicates normally
badreed.pcap which shows only one REED which my host app and netcat do not see

.txt files also attached showing decode in HEX format

In goodreed, the first packet is my REED multicast CoAP request to ff03::1, followed by my hostapp response.
After this we see bidirectional unicast CoAP requests and responses.
10:40:48.692179 IP6 fdde:ad00::7695:70cf:a63a:895a.5683 > ff03::1.5683: UDP, length 44
    0x0000:  6000 0000 0034 1140 fdde ad00 0000 0000  `....4.@........
    0x0010:  7695 70cf a63a 895a ff03 0000 0000 0000  v.p..:.Z........
    0x0020:  0000 0000 0000 0001 1633 1633 0034 1196  .........3.3.4..
    0x0030:  4202 47cb 790e b56c 6f67 6f6e ff72 6562  B.G.y..logon.reb
    0x0040:  6f6f 743d 3126 6575 6936 343d 4430 4346  oot=1&eui64=D0CF
    0x0050:  3545 4646 4645 3837 3445 3738            5EFFFE874E78
10:40:48.708907 IP6 Khadas.5683 > fdde:ad00::7695:70cf:a63a:895a.5683: UDP, length 6
    0x0000:  600c 4f7f 000e 1140 fdde ad00 0000 0000  `.O....@........
    0x0010:  daf2 97e7 1979 cace fdde ad00 0000 0000  .....y..........
    0x0020:  7695 70cf a63a 895a 1633 1633 000e ec73  v.p..:.Z.3.3...s
    0x0030:  6243 47cb 790e                           bCG.y.


In badreed, we see my REED multicast CoAP request to ff03::1, but no response from my host app, so it retries
10:44:04.245192 IP6 fdde:ad00::7655:b9f0:8d91:1cd4.5683 > ff03::1.5683: UDP, length 44
    0x0000:  6000 0000 0034 1140 fdde ad00 0000 0000  `....4.@........
    0x0010:  7655 b9f0 8d91 1cd4 ff03 0000 0000 0000  vU..............
    0x0020:  0000 0000 0000 0001 1633 1633 0034 d0ad  .........3.3.4..
    0x0030:  4202 5403 3118 b56c 6f67 6f6e ff72 6562  B.T.1..logon.reb
    0x0040:  6f6f 743d 3126 6575 6936 343d 3930 4644  oot=1&eui64=90FD
    0x0050:  3946 4646 4645 3742 3546 3532            9FFFFE7B5F52

The two logon CoAP multicast requests look pretty much the same to me.

On Linux platform where BorderRouter and hostapp run:
state
leader

> ipaddr
fdde:ad00:0:0:0:ff:fe00:fc00
fdde:ad00:0:0:0:ff:fe00:1000
fdde:ad00:0:0:daf2:97e7:1979:cace
fe80:0:0:0:5083:d1ac:22d0:5040
Done
> ipaddr mleid
fdde:ad00:0:0:daf2:97e7:1979:cace
Done
>

GoodREED
ipaddr
fdde:ad00:0:0:0:ff:fe00:1005
fdde:ad00:0:0:7695:70cf:a63a:895a
fe80:0:0:0:825:f9c6:6de7:d47a

ipaddr mleid
fdde:ad00:0:0:7695:70cf:a63a:895a

Multicast request Source address:             fdde ad00 0000 0000 7695 70cf a63a 895a 
Multicast request Destination Address:     ff03 0000 0000 0000 0000 0000 0000 0001

Unicast response Source address:              fdde ad00 0000 0000 daf2 97e7 1979 cace
Unicast response Destination address:      fdde ad00 0000 0000 7695 70cf a63a 895a

BadREED
State
child

ipaddr
fdde:ad00:beef:0:0:ff:fe00:1006
fdde:ad00:beef:0:7655:b9f0:8d91:1cd4
fe80:0:0:0:acce:bc65:be79:1630

ipaddr mleid
fdde:ad00:beef:0:7655:b9f0:8d91:1cd4

Multicast request Source address:            fdde ad00 0000 0000 7655 b9f0 8d91 1cd4 
Multicast request Destination Address:    ff03 0000 0000 0000 0000 0000 0000 0001

No response

The difference I can see is that:
  • The good REED is a router, whereas the bad REED is a child
  • tcpdump shows a source address for the bad REED that does not completely match its MLEID: the REED reports fdde:ad00:beef:0:7655:b9f0:8d91:1cd4, but the capture shows fdde:ad00:0:0:7655:b9f0:8d91:1cd4. I don't understand this.
  • The Border Router and the good REED share the MLEID prefix fdde:ad00:0:0, whereas the bad REED only shares fdde:ad00.
Just a reminder: I can substitute netcat for my hostapp as follows and see the logon request from the good REED but not the bad one.
khadas@Khadas:~$ sudo netcat -u -6 -l 5683
B▒▒▒▒▒logon▒reboot=1&eui64=D0CF5EFFFE874E78

Thanks
Michael
badreed.pcap
goodreed.pcap
goodreed.txt
badreed.txt

Jonathan Hui

Oct 6, 2021, 6:57:57 PM
to Michael Simpson, openthread-users
The dropped packets are due to an invalid UDP checksum.

I presume the invalid UDP checksum is due to the mismatch in the mesh-local IPv6 address that you pointed out.

Can you double check that the Mesh Local Prefix in the Active Operational Dataset matches on both the "good" and "bad" REEDs? A mismatch in the Mesh Local Prefix can cause these kinds of failures.
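The link between the prefix mismatch and the checksum can be checked by hand, because the UDP checksum over IPv6 covers a pseudo-header that includes the source address. The sketch below is an illustration rather than what the Linux stack literally runs: it recomputes the checksum for the badreed packet using the bytes from the hex dump earlier in the thread. The function and array names are hypothetical. One plausible explanation for the differing source address, stated here as an assumption, is that the mesh-local prefix is elided on the air and rebuilt by the receiver from its own prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Source address as the REED reports it (ipaddr mleid) and as tcpdump shows it. */
const uint8_t SRC_ON_REED[16]     = {0xfd, 0xde, 0xad, 0x00, 0xbe, 0xef, 0x00, 0x00,
                                     0x76, 0x55, 0xb9, 0xf0, 0x8d, 0x91, 0x1c, 0xd4};
const uint8_t SRC_AS_CAPTURED[16] = {0xfd, 0xde, 0xad, 0x00, 0x00, 0x00, 0x00, 0x00,
                                     0x76, 0x55, 0xb9, 0xf0, 0x8d, 0x91, 0x1c, 0xd4};
const uint8_t DST_FF03_1[16]      = {0xff, 0x03, 0, 0, 0, 0, 0, 0,
                                     0, 0, 0, 0, 0, 0, 0, 0x01};

/* UDP header + payload of the badreed logon request (checksum field 0xd0ad). */
const uint8_t BADREED_UDP[52] = {
    0x16, 0x33, 0x16, 0x33, 0x00, 0x34, 0xd0, 0xad,
    0x42, 0x02, 0x54, 0x03, 0x31, 0x18, 0xb5, 0x6c,
    0x6f, 0x67, 0x6f, 0x6e, 0xff, 0x72, 0x65, 0x62,
    0x6f, 0x6f, 0x74, 0x3d, 0x31, 0x26, 0x65, 0x75,
    0x69, 0x36, 0x34, 0x3d, 0x39, 0x30, 0x46, 0x44,
    0x39, 0x46, 0x46, 0x46, 0x46, 0x45, 0x37, 0x42,
    0x35, 0x46, 0x35, 0x32};

/* Adds len bytes to a running sum as big-endian 16-bit words. */
static uint32_t sum_words(uint32_t sum, const uint8_t *data, size_t len)
{
    size_t i;
    for (i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    }
    if (i < len) { /* an odd trailing byte is padded with zero */
        sum += (uint32_t)data[i] << 8;
    }
    return sum;
}

/* One's-complement checksum over the IPv6 pseudo-header (RFC 8200, section 8.1)
 * plus the UDP header and payload. With the received checksum left in place,
 * a result of 0 means the datagram verifies. */
uint16_t udp6_checksum(const uint8_t src[16], const uint8_t dst[16],
                       const uint8_t *udp, size_t udp_len)
{
    assert(udp != NULL && udp_len >= 8 && udp_len <= 0xffff);
    uint32_t sum = 0;
    sum = sum_words(sum, src, 16);
    sum = sum_words(sum, dst, 16);
    sum += (uint32_t)udp_len; /* upper-layer packet length */
    sum += 17;                /* next header: UDP */
    sum = sum_words(sum, udp, udp_len);
    while (sum >> 16) {       /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t)~sum;
}
```

With these bytes the packet verifies against the fdde:ad00:beef:0 source and fails against the fdde:ad00:0:0 source seen in the capture, leaving a residue of 0xbeef. That is consistent with the REED having computed its checksum over an address with a different Mesh Local Prefix.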

--
Jonathan Hui



Michael Simpson

Nov 7, 2021, 9:16:53 PM
to Jonathan Hui, openthread-users
Hi Jonathan

Following on from your email below, I thought I had this sorted but I really don't understand what I am seeing.

I installed Wireshark so I could check the packets I captured using tcpdump, as you advised: "The dropped packets are due to an invalid UDP checksum."

I examined the first login CoAP request comparing both the good and bad pcap files and under UDP they show the same as follows. They both have the same checksum 0x1196 and state "checksum unverified"


Can you please advise how you diagnosed the checksum error?

I also compared the CoAP section and apart from the Message ID and Token, they look the same.


You asked me to double check that the Mesh Local Prefix in the Active Operational Dataset matches on both the "good" and "bad" REEDs?

My Border-Router RCP MLEID is:
> ipaddr mleid
fdde:ad00:0:0:758d:1e3a:ab54:d5d1
Done

> dataset active
Active Timestamp: 1
Channel: 15
Channel Mask: 0x07fff800
Ext PAN ID: 1111111122222222
Mesh Local Prefix: fdde:ad00:0:0::/64
Master Key: f34dd4690e631a68b0611565036cc853
Network Name: AGS-FCC-OT-1
PAN ID: 0x1234
PSKc: dc260358c2f24888fd9110ac0850b27d
Security Policy: 0, onrcb
Done

I only give my REEDs a partial dataset consisting of the networkname and the masterkey (now networkkey)

I found that on the older Silicon Labs SDK, my working REED's MLEID is:
> ipaddr mleid
fdde:ad00:0:0:1133:7529:9efa:f41e
Done

After connection the dataset is completed and reports:
> dataset active
Active Timestamp: 1
Channel: 15
Channel Mask: 0x07fff800
Ext PAN ID: 1111111122222222
Mesh Local Prefix: fdde:ad00:0:0::/64
Master Key: f34dd4690e631a68b0611565036cc853
Network Name: AGS-FCC-OT-1
PAN ID: 0x1234
PSKc: dc260358c2f24888fd9110ac0850b27d
Security Policy: 0, onrcb
Done

So the first 64 bits of the MLEID matches fdde:ad00:0:0

On the new Silicon Labs SDK, my problem REED's MLEID is:
> ipaddr mleid
fdde:ad00:beef:0:1c37:1851:cef8:a7a5
Done

So only the first 32 bits of the MLEID matches fdde:ad00
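The comparison done by eye above can be made mechanical. The following sketch counts the leading bits two IPv6 addresses share and treats 64 or more as "same Mesh Local Prefix". The MLEIDs are the ones printed in this message; the function and constant names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MLEIDs reported by `ipaddr mleid` in this message. */
const uint8_t BR_MLEID[16]           = {0xfd, 0xde, 0xad, 0x00, 0x00, 0x00, 0x00, 0x00,
                                        0x75, 0x8d, 0x1e, 0x3a, 0xab, 0x54, 0xd5, 0xd1};
const uint8_t OLD_SDK_REED_MLEID[16] = {0xfd, 0xde, 0xad, 0x00, 0x00, 0x00, 0x00, 0x00,
                                        0x11, 0x33, 0x75, 0x29, 0x9e, 0xfa, 0xf4, 0x1e};
const uint8_t NEW_SDK_REED_MLEID[16] = {0xfd, 0xde, 0xad, 0x00, 0xbe, 0xef, 0x00, 0x00,
                                        0x1c, 0x37, 0x18, 0x51, 0xce, 0xf8, 0xa7, 0xa5};

/* Counts how many leading bits two 128-bit IPv6 addresses have in common. */
int common_prefix_bits(const uint8_t a[16], const uint8_t b[16])
{
    assert(a != NULL && b != NULL);
    int bits = 0;
    for (int i = 0; i < 16; i++) {
        uint8_t diff = (uint8_t)(a[i] ^ b[i]);
        if (diff == 0) {
            bits += 8; /* whole byte matches */
            continue;
        }
        while ((diff & 0x80) == 0) { /* count matching high bits in this byte */
            bits++;
            diff = (uint8_t)(diff << 1);
        }
        break;
    }
    return bits;
}

/* The Mesh Local Prefix is a /64, so two MLEIDs on the same network
 * should agree on at least their first 64 bits. */
int same_mesh_local_prefix(const uint8_t a[16], const uint8_t b[16])
{
    return common_prefix_bits(a, b) >= 64;
}
```

Applied to the addresses above, the old-SDK REED shares the border router's /64 while the new-SDK REED agrees only on the first 32 bits.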

Although the REED state changes to child, it doesn't get a complete dataset:
> dataset active
Network Key: f34dd4690e631a68b0611565036cc853
Network Name: AGS-FCC-OT-1
Done

I tried adding the Mesh Local Prefix to my partial dataset before I activate it, as follows:
//    Set Mesh Local Prefix (fdde:ad00:0:0::/64, matching the OTBR)
    uint8_t extMeshLocalPrefix[OT_MESH_LOCAL_PREFIX_SIZE] = {0xFD, 0xDE, 0xAD, 0x00, 0x00, 0x00, 0x00, 0x00};
    // memcpy needs the array itself as the source pointer, not its first element
    memcpy(aDataset.mMeshLocalPrefix.m8, extMeshLocalPrefix, sizeof(aDataset.mMeshLocalPrefix.m8));
    aDataset.mComponents.mIsMeshLocalPrefixPresent = true;

Now my REED MLEID changes to:
> ipaddr mleid
fdde:ad00:0:0:1c37:1851:cef8:a7a5
Done

So now the first 64 bits of the MLEID match fdde:ad00:0:0

Now my REED can log on to my controller app running on the Linux platform of the OTBR and get a response back.

But the active dataset still does not get completed.
> dataset active
Mesh Local Prefix: fdde:ad00:0:0::/64
Network Key: f34dd4690e631a68b0611565036cc853
Network Name: AGS-FCC-OT-1
Done


I set the Mesh Local Prefix to fdde:ad00:0:0::/64 to match my OTBR / RCP.
I am forming my Thread network using the following script:
sudo ot-ctl reset
sleep 4
sudo ot-ctl dataset init new
sleep 2
sudo ot-ctl dataset channel 15
sudo ot-ctl dataset meshlocalprefix fdde:ad00::
sudo ot-ctl dataset panid 0x1234
sudo ot-ctl dataset extpanid 1111111122222222
sudo ot-ctl dataset networkname AGS-FCC-OT-1
sudo ot-ctl dataset masterkey f34dd4690e631a68b0611565036cc853
sudo ot-ctl dataset commit active
sleep 1
sudo ot-ctl ifconfig up
sleep 3
sudo ot-ctl thread start
sleep 10
sudo ot-ctl state

I don't understand the requirements for the meshlocalprefix. Should it be set in the dataset for the OTBR which forms the network and the REEDs with partial datasets which join the network?
Or should it be left unset and get passed to the REEDs when they connect to the network which I form on the OTBR?

otbr-web does not provide any configuration for the meshlocalprefix, but if it forms the network it reports the Mesh Local Prefix shown below.
> dataset active
Active Timestamp: 0
Channel: 15
Channel Mask: 0x07fff800
Ext PAN ID: 1111111122222222
Mesh Local Prefix: fda5:2878:1b6e:3e0d::/64
Master Key: f34dd4690e631a68b0611565036cc853
Network Name: AGS-FCC-OT-1
PAN ID: 0x1234
PSKc: 16e480b755b5261b9419e46b154fc045
Security Policy: 672, onrcb
Done

Kind regards
Michael
badreed.pcap
goodreed.pcap

Gabe Kassel

Nov 7, 2021, 9:31:19 PM
to Michael Simpson, Jonathan Hui, openthread-users
I haven’t followed the longer discussion, but is the activetimestamp higher on the node with a complete dataset than the new REED where you’re only setting network name and network key?

I suspect you have the default timestamp of “1” on both nodes causing a new sync not to happen. 

--
_______

Gabe Kassel
Technology Strategist | Office of the CTO

m 770.490.0431
e  ga...@eero.com

Michael Simpson

Nov 7, 2021, 11:16:08 PM
to Gabe Kassel, Jonathan Hui, openthread-users
Hi Gabe

You are correct, I don't set this and presume it goes to a default value of 1.

I have no idea what this field is for. Can you point me to a reference which explains what it does and how I should use it? The API only defines it as uint64_t.

It always seems to have worked ok with the default value of 1 and still does on Silab SDK 3.1.2, but this may be dumb luck on my part.

I would like confirmation on whether I should set the Mesh Local Prefix, or whether I can leave it at its default. Pros & Cons.

Thanks
Michael

Jonathan Hui

Nov 9, 2021, 5:27:37 PM
to Michael Simpson, Gabe Kassel, openthread-users
To enable UDP checksum validation in wireshark: Preferences > Protocols > UDP > Validate the UDP checksum if possible

It seems you are running an old version of OTBR that is not properly forming a complete Active Operational Dataset (the Active Timestamp is being set to 0). A newer version of OTBR should use the new `dataset init new` method (see src/web/web-service/wpan_service.cpp#L626).

The Active Timestamp is what's used to know which Active Operational Dataset is "newer". If they have the same value, then there is no attempt to synchronize the Active Operational Datasets. In your scenario, since OTBR and the devices all have the same Active Timestamp value of 0, then there is no attempt to synchronize.
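The rule described above can be modeled in a few lines of C. This is a simplified sketch, not OpenThread code: `dataset_t`, `dataset_order_t`, and `compare_active_timestamp` are hypothetical names, and the timestamp is treated as a plain 64-bit value, which is how the API is described earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an Active Operational Dataset: only the
 * field that decides which copy is "newer" is modeled here. */
typedef struct {
    uint64_t active_timestamp;
} dataset_t;

typedef enum {
    DATASET_NO_SYNC,     /* equal timestamps: nothing is exchanged */
    DATASET_PEER_NEWER,  /* the peer's dataset should replace the local one */
    DATASET_LOCAL_NEWER  /* the local dataset is the one to keep */
} dataset_order_t;

/* Decides which of two datasets is newer by Active Timestamp alone. */
dataset_order_t compare_active_timestamp(const dataset_t *local, const dataset_t *peer)
{
    assert(local != NULL && peer != NULL);
    if (peer->active_timestamp > local->active_timestamp) {
        return DATASET_PEER_NEWER;
    }
    if (peer->active_timestamp < local->active_timestamp) {
        return DATASET_LOCAL_NEWER;
    }
    return DATASET_NO_SYNC;
}
```

In this model, a REED holding a partial dataset with the same timestamp as the border router would fall into the DATASET_NO_SYNC case and never pick up the missing fields, which matches the behavior reported in this thread.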

--
Jonathan Hui


