OT Border Router Form - On Mesh Prefix


Michael Simpson

Dec 15, 2020, 2:00:44 PM
to openthread-users
Hi
I am a bit confused about the on mesh prefix and mesh local prefix. Could you please give me a brief summary on what these are for, or point me to a document on these.

On my REEDs I can set the meshlocalprefix in my dataset  

In the form tab of the OTBR-WEB, I have set my on mesh prefix to fd11:22::

How do I set the Mesh Local Prefix on the border router?
Thanks


Jonathan Hui

Dec 15, 2020, 6:26:18 PM
to Michael Simpson, openthread-users
Here is a brief summary on mesh-local vs. on-mesh prefixes:
  1. Mesh-Local Prefix
    1. Only scoped for communication within a given Thread partition.
    2. Only configurable via the Operational Dataset.
  2. On-Mesh Prefix
    1. Intended for IPv6 prefixes that have scope beyond the Thread partition. This may be a globally routable IPv6 prefix or a unique local prefix that is only routable within a given administrative domain.
    2. Only configurable via the Network Data.
The mesh-local prefix is configured as part of forming the Thread network. In general, you should not need to configure this manually. It is better to have the mesh-local prefix randomly generated to minimize the chance for collisions.
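
To illustrate what a randomly generated mesh-local prefix looks like, here is a minimal Python sketch. It assumes an RFC 4193-style layout (0xfd, a 40-bit random global ID, a zero 16-bit subnet ID); the function name is made up for this example, and OpenThread's own generator may fill the bits differently. In practice you would let the stack generate the prefix when the network is formed.

```python
import ipaddress
import secrets


def random_mesh_local_prefix() -> ipaddress.IPv6Network:
    """Return a random unique-local /64 prefix (hypothetical helper).

    Assumed layout: 0xfd, 40 random bits of global ID, 16-bit subnet ID of zero.
    """
    global_id = secrets.token_bytes(5)                  # 40 random bits
    prefix_bytes = b"\xfd" + global_id + b"\x00\x00"    # 8 bytes = 64-bit prefix
    address_bytes = prefix_bytes + bytes(8)             # interface ID left as zeros
    return ipaddress.IPv6Network((address_bytes, 64))


prefix = random_mesh_local_prefix()
print(prefix)  # a /64 under fd00::/8, different on every run
```

With 40 random bits, two independently formed networks are very unlikely to pick the same prefix, which is the collision argument made above.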

--
Jonathan Hui



--
You received this message because you are subscribed to the Google Groups "openthread-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openthread-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openthread-users/4219c8cc-294f-4e37-8f5d-319e548d3fdbn%40googlegroups.com.

Michael Simpson

Dec 16, 2020, 3:50:51 PM
to openthread-users
Hi Jonathan
Thanks for the explanation and your advice.

On startup, my REEDs send a Multicast "logon" message which only my host application will answer to exchange IP addresses. From then on I only unicast messages.

I am seeing strange things happen occasionally.
E.g. sometimes my Border Router/RCP and my REEDs start up with IP addresses which are completely different, including the MS 16 bits.

Also last night my RCP and REED were running as Leader on the same ALOC. Is this ok? It doesn't sound right to me. My BR could receive multi-cast addresses but my Linux host app on BR could not reply for some reason. I tried restarting my REED a few times and in the end I had to power cycle my RCP to "fix" it. I am not doing "Anycasting".
REED
> state
state
leader
Done
> ipaddr
fdde:ad00:beef:0:0:ff:fe00:fc00
fdde:ad00:beef:0:0:ff:fe00:e000
fdde:ad00:beef:0:46ac:c08:1a1a:8da7
fe80:0:0:0:ec79:5abd:91be:afca
Done

OTBR
> state
leader
Done
> ipaddr
fdde:ad00:beef:0:0:ff:fe00:fc00
fdde:ad00:beef:0:0:ff:fe00:2000
fdde:ad00:beef:0:a1da:7199:3f61:b522
fe80:0:0:0:1040:1067:8652:978c
Done

My application is for installation in an industrial control application where it is extremely unlikely there will be other Thread devices. I thought setting each node's mesh-local prefix to be the same might help with establishing reliable communications. Again, your experienced advice is valued.

In my application my REEDs only talk to my host app; they don't talk to each other application-wise (except for routing, of course). Is there any way (and would it be useful) to stop the REEDs from becoming the Leader? My concern is this: if a REED starts up first as the Leader, and then my Border Router and my other REEDs connect to the network through the Leader REED, what will happen if that REED goes down? That is:
  1. Will my REEDs already on the network continue to talk with the BR host while there is no Leader?
  2. Will new REEDs be unable to connect to the network until a new Leader is established?
  3. How long should it take for a new Leader to be established?
Thanks for your ongoing help.

Jonathan Hui

Dec 16, 2020, 6:44:50 PM
to Michael Simpson, openthread-users
On Wed, Dec 16, 2020 at 12:50 PM Michael Simpson <michae...@gmail.com> wrote:

On startup, my REEDs send a Multicast "logon" message which only my host application will answer to exchange IP addresses. From then on I only unicast messages.

I am seeing strange things happen occasionally.
E.g. sometimes my Border Router/RCP and my REEDs start up with IP addresses which are completely different, including the MS 16 bits.

By "MS 16" bits, you mean values other than "fe80" and "fdde"? You should not see other IPv6 prefixes unless they have been registered by some device on the Thread network.

Also last night my RCP and REED were running as Leader on the same ALOC. Is this ok? It doesn't sound right to me. My BR could receive multi-cast addresses but my Linux host app on BR could not reply for some reason. I tried restarting my REED a few times and in the end I had to power cycle my RCP to "fix" it. I am not doing "Anycasting".
REED
> state
state
leader
Done
> ipaddr
fdde:ad00:beef:0:0:ff:fe00:fc00
fdde:ad00:beef:0:0:ff:fe00:e000
fdde:ad00:beef:0:46ac:c08:1a1a:8da7
fe80:0:0:0:ec79:5abd:91be:afca
Done

OTBR
> state
leader
Done
> ipaddr
fdde:ad00:beef:0:0:ff:fe00:fc00
fdde:ad00:beef:0:0:ff:fe00:2000
fdde:ad00:beef:0:a1da:7199:3f61:b522
fe80:0:0:0:1040:1067:8652:978c
Done

There should be exactly one leader per partition. However, it appears that the OTBR and REED have formed their own partitions - both devices are operating as leaders.

If the two devices start up at the same time, it could be that they initially form their own partitions, but they should converge to a single partition if they have 802.15.4 radio connectivity.
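
The two `ipaddr` listings above can be decoded to see why they indicate two partitions. The following is a minimal Python sketch that assumes the standard Thread locator layout (an interface ID of `0000:00ff:fe00:xxxx` marks an RLOC or ALOC, `fc00` is the Leader ALOC, and the router ID is the top 6 bits of the RLOC16). The function and constant names are invented for this example.

```python
import ipaddress

# Mesh-local prefix taken from the listings above.
MESH_LOCAL_PREFIX = ipaddress.IPv6Network("fdde:ad00:beef::/64")
LOCATOR_IID_HIGH = 0x000000FFFE00  # upper 48 bits of an RLOC/ALOC interface ID


def classify(addr_text: str) -> str:
    """Label one address from `ipaddr` output (hypothetical helper)."""
    addr = ipaddress.IPv6Address(addr_text)
    if addr.is_link_local:
        return "link-local"
    if addr not in MESH_LOCAL_PREFIX:
        return "other prefix"
    iid = int(addr) & 0xFFFFFFFFFFFFFFFF      # lower 64 bits
    if iid >> 16 != LOCATOR_IID_HIGH:
        return "ML-EID"                       # random per-node identifier
    locator16 = iid & 0xFFFF
    if locator16 == 0xFC00:
        return "Leader ALOC"
    if locator16 >> 8 == 0xFC:
        return "ALOC"
    return f"RLOC (router ID {locator16 >> 10})"


REED = ["fdde:ad00:beef:0:0:ff:fe00:fc00", "fdde:ad00:beef:0:0:ff:fe00:e000",
        "fdde:ad00:beef:0:46ac:c08:1a1a:8da7", "fe80:0:0:0:ec79:5abd:91be:afca"]
OTBR = ["fdde:ad00:beef:0:0:ff:fe00:fc00", "fdde:ad00:beef:0:0:ff:fe00:2000",
        "fdde:ad00:beef:0:a1da:7199:3f61:b522", "fe80:0:0:0:1040:1067:8652:978c"]

for name, addrs in (("REED", REED), ("OTBR", OTBR)):
    print(name, [classify(a) for a in addrs])
```

Both nodes hold the Leader ALOC while using different router IDs, which is consistent with each one leading its own partition rather than sharing one.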

If they do not converge after some amount of time, it could mean that there is an issue with the 802.15.4 radio driver. If you believe you are experiencing this issue, could you provide a packet sniffer capture for further analysis? Also, what hardware platform and git commit version are you using for OpenThread?

My application is for installation in an industrial control application where it is extremely unlikely there will be other Thread devices. I thought setting each node's mesh-local prefix to be the same might help with establishing reliable communications. Again, your experienced advice is valued.

Thread network configurations are encapsulated in an Operational Dataset, which includes things like the master key and mesh-local prefix. All Thread devices configured for the same Thread network should have the same Operational Dataset (and thus the same mesh-local prefix).

In my application my REEDs only talk to my host app; they don't talk to each other application-wise (except for routing, of course). Is there any way (and would it be useful) to stop the REEDs from becoming the Leader?

Thread currently requires all router-capable devices to also be capable of serving as a leader.
 
My concern is this: if a REED starts up first as the Leader, and then my Border Router and my other REEDs connect to the network through the Leader REED, what will happen if that REED goes down? That is:
  1. Will my REEDs already on the network continue to talk with the BR host while there is no Leader?
If a Leader momentarily disconnects from the network, other devices in the Thread partition should still be able to communicate with each other.
  2. Will new REEDs be unable to connect to the network until a new Leader is established?
New REEDs will be able to attach as end devices to other active routers. However, REEDs will not be able to upgrade to router until the leader is active again.
  3. How long should it take for a new Leader to be established?
Thread has a 120 second timeout before selecting a new leader.

--
Jonathan Hui
 

r...@farmjenny.com

Dec 17, 2020, 5:06:51 PM
to openthread-users

I have a similar application – which I would describe as 'otbr-dependent', or ‘cloud-dependent’.  That is, I don’t really care if my REEDs form a network if they can’t reach the border router (and in most cases, the internet).

While I know that multiple leaders operating with the same operational dataset *should* work things out and merge their partitions, I prefer to avoid my REEDs powering up (resetting) and becoming leaders of their own networks, using whatever operational dataset they found stored in their openthread flash. Instead, I really want REEDs to 'connect' or 'reconnect' to the network using credentials they had acquired earlier (Joiner/Commissioning).

My biggest breakthrough came when I figured out the role activetimestamp plays in ‘Forming’ vs. ‘connecting’ (see below).  I have also started exclusively using the ‘dataset’ commands to configure operating parameters. Finally, you really only need the masterkey to connect to an existing Thread network (see https://github.com/openthread/openthread/issues/4036).

The otbr is the only device I program to “Form” a network.  On otbr, I explicitly specify all parameters in the operational dataset using a series of ‘sudo ot-ctl dataset  …’ commands, starting with ‘dataset new’.  Any parameters you don’t specify here get filled in with defaults and random values – I specify them all so I can bring up redundant border routers on the same network.

I DO NOT allow my REEDs to autostart at boot.  Autostart retrieves the entire last operational dataset from openthread flash, sets it as active and enables Thread.  To prevent this, in my REED initialization I set autostart_disable=true (Nordic SDK, may be different on your platform) during Thread initialization.

I also DO NOT allow my REEDs to Form their own networks.  After (re)boot, my REEDs perform the following process to ‘connect’ using only the stored masterkey:

- Pull ONLY the existing masterkey out of openthread flash using otThreadGetMasterKey()
- Create a new empty operational dataset buffer (type otOperationalDataset)
- Copy the existing masterkey into the buffer (copy to dataset.mMasterKey.m8)
- Set only the dataset.mComponents.mIsMasterKeyPresent flag
- Commit the incomplete dataset using otDatasetSetActive()
- Bring up the interface with otIp6SetEnabled()
- Start Thread using otThreadSetEnabled()
- At this point, the REED will scan until it finds a network that accepts its masterkey

You can simulate this process using CLI dataset commands, but you have to remember to use ‘> dataset clear’ to start with an empty buffer (vs. ‘dataset new’).  I prefer to have REEDs learn everything except the masterkey from the network, but you can specify as many dataset parameters as you want (if you don’t want to scan or leave them to chance), except for ‘activetimestamp’.  Any dataset with activetimestamp set will cause the REED to become leader of its own new network partition.
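
As a rough illustration of the CLI simulation described above, the Python sketch below only builds the command strings; it does not talk to a device. The function name and the example key are placeholders. The commands are the OpenThread CLI names from the time of this thread (later releases renamed `masterkey` to `networkkey`), so check them against your build.

```python
import re
from typing import List


def join_only_commands(master_key_hex: str) -> List[str]:
    """Build a CLI sequence that joins using a masterkey-only dataset.

    Hypothetical helper: it deliberately never emits 'dataset new' or
    'dataset activetimestamp', so the node should not form its own partition.
    """
    if not re.fullmatch(r"[0-9a-fA-F]{32}", master_key_hex):
        raise ValueError("master key must be 16 bytes (32 hex digits)")
    return [
        "dataset clear",                               # start from an empty buffer
        f"dataset masterkey {master_key_hex.lower()}", # the only field we set
        "dataset commit active",                       # commit the incomplete dataset
        "ifconfig up",                                 # bring up the IPv6 interface
        "thread start",                                # scan for a matching network
    ]


# Placeholder key for illustration only; never use it on a real network.
for line in join_only_commands("00112233445566778899AABBCCDDEEFF"):
    print(line)
```

Keeping the sequence in one place makes it harder to slip in a field such as `activetimestamp` that would turn a join-only node into a potential leader.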

Best,
Rob

Michael Simpson

Dec 17, 2020, 7:01:38 PM
to openthre...@googlegroups.com

Thanks for this Rob.

I will do some tests along these lines.


Michael Simpson

Jan 17, 2021, 10:33:15 PM
to openthread-users
Hi Rob
Hope you get this.
I tried this today and it works well except...
On startup my REED does not create its own network, so if my OTBR is down and there is no Leader, it remains detached - so far so good.
Then if I bring up my Border Router, my REED attaches and the state changes to child or router - so far so good.
But then if my OTBR goes down, my REED role changes to leader.
I suspect it is somehow getting what it needs to become a leader from the connection to the network when my OTBR is up and running as the leader.
Have you found this?
Did you do anything to prevent this from occurring?
Thanks
Michael

r...@farmjenny.com

Jan 18, 2021, 1:33:24 PM
to openthread-users
Hi Michael,
Glad to hear of your progress.

Yes, once connected, your REEDs possess all the information they need to assume the Leader role and have the *duty* to do so if they believe the initial leader (the OTBR) becomes inoperable.  Over time, I believe it is inevitable that due to momentary communications losses, interference, etc., the leader role will move from your OTBR to one of your REEDs (and from REED to REED and even back to the OTBR).  This doesn't particularly bother me (it is central to Thread's robustness, and I don't try to prevent it).  My only concern with having a network operating only with the REEDs left behind (because OTBR left) was that they might have memory of network settings (operational dataset) that were no longer correct/desired, potentially leading to a network partition that can't be merged when the OTBR comes back.  The rest of this message describes how I dealt with that concern.  Are there other reasons that you don't want your REEDs becoming leaders?

Timestamps:
I haven't always been so careful in setting my mesh prefixes, and learned that in cases where there is an existing Thread network operating with *similar* settings (say, hypothetically, a slightly different prefix), the settings that “win” depend on who has the higher activetimestamp in their operational dataset:
  • CASE 1: If the new node has a lower timestamp, it works and the new node inherits the settings from the network leader. 
  • CASE 2: If these timestamps are the same value, the connection can fail -- the new node gets stuck as a child.
  • CASE 3: If the new node has a higher activetimestamp, it will probably connect, but its operational dataset may not be correctly propagated through the network unless/until the new node becomes a leader.*
*Note:  I'm not sure what the spec says about this case -- the behavior is probably related to the fact that the new node is using a mis-matched prefix and really doesn't have full network connectivity.

As discussed previously, my REEDs don't form networks (they can only connect to an existing network). Upon connection, my REEDs inherit the activetimestamp of the network they just joined. If the OTBR goes away, this timestamp would remain the same indefinitely. I wanted to ensure that if my OTBR rebooted, it would always use a higher activetimestamp (preventing the same/lower cases above) in Forming a network, ensuring it can connect. To handle this, I have the OTBR use its unix timestamp when Forming a network. This ensures the new node always falls into CASE 3 above, and the OTBR will connect to the network even if some of its operational dataset (like the prefix) is mis-matched.  At that point, we can't be 100 percent sure full connectivity has been restored, if for example it is using a different prefix. That's where the next step comes in. . .

Forcing Dataset Propagation when OTBR is not the Leader
Since the OTBR has no way of seizing the Leader role, I needed another way of resolving any discrepancies in the operational dataset.  This can be accomplished using Commissioner, by simply incrementing the timestamp.  Just after the OTBR forms the network, I add the border router prefix (ot-ctl prefix add ..., then ot-ctl netdata register), then wait 5 seconds before starting the process below:
  • sudo ot-ctl commissioner start # petition then check state to confirm we became commissioner
  • sudo ot-ctl dataset init active  # copy the OTBR's existing active dataset to the buffer
  • set  "$pendingTimestamp" and  "$activeTimestamp" to the current unix time
  • sudo ot-ctl dataset pendingtimestamp "$pendingTimestamp"
  • sudo ot-ctl dataset activetimestamp  "$activeTimestamp" # probably an unnecessary step
  • sudo ot-ctl dataset delay 5000
  • sudo ot-ctl dataset commit pending
  • sleep 5
  • sudo ot-ctl commissioner stop
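
The sequence above can be sketched as a small Python helper that fills in the current unix time and returns the commands to run. This is only an illustration: the function name is hypothetical, tying the `sleep` to the dataset delay is an assumption made for this example, and nothing is executed here.

```python
import time
from typing import List, Optional


def propagation_commands(now: Optional[int] = None, delay_ms: int = 5000) -> List[str]:
    """Return the commissioner sequence with unix-time timestamps filled in.

    Hypothetical helper. 'now' can be passed explicitly for repeatable output.
    """
    ts = int(time.time()) if now is None else now
    return [
        "sudo ot-ctl commissioner start",
        "sudo ot-ctl dataset init active",               # copy active dataset to buffer
        f"sudo ot-ctl dataset pendingtimestamp {ts}",
        f"sudo ot-ctl dataset activetimestamp {ts}",     # possibly unnecessary, per above
        f"sudo ot-ctl dataset delay {delay_ms}",
        "sudo ot-ctl dataset commit pending",
        f"sleep {delay_ms // 1000}",                     # assumed: wait out the delay
        "sudo ot-ctl commissioner stop",
    ]


print("\n".join(propagation_commands()))
```

A real script would also check the commissioner state after the first command before continuing, as the comment in the list notes.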
As you mentioned in another post, messing with the operational dataset may impact flash wear.  While this isn't an issue for my RCP-based OTBR (where I understand these settings live in the unix filesystem), I believe all of the REEDs are going to write their flash anytime the operational dataset changes (including each time the activetimestamp increments).

Hope this helps,
Rob


Jonathan Hui

Jan 18, 2021, 7:55:20 PM
to r...@farmjenny.com, openthread-users
On Mon, Jan 18, 2021 at 10:33 AM r...@farmjenny.com <r...@farmjenny.com> wrote:

Yes, once connected, your REEDs possess all the information they need to assume the Leader role and have the *duty* to do so if they believe the initial leader (the OTBR) becomes inoperable.  Over time, I believe it is inevitable that due to momentary communications losses, interference, etc., the leader role will move from your OTBR to one of your REEDs (and from REED to REED and even back to the OTBR).  This doesn't particularly bother me (it is central to Thread's robustness, and I don't try to prevent it).  My only concern with having a network operating only with the REEDs left behind (because OTBR left) was that they might have memory of network settings (operational dataset) that were no longer correct/desired, potentially leading to a network partition that can't be merged when the OTBR comes back.  The rest of this message describes how I dealt with that concern.  Are there other reasons that you don't want your REEDs becoming leaders?

Timestamps:
I haven't always been so careful in setting my mesh prefixes, and learned that in cases where there is an existing Thread network operating with *similar* settings (say, hypothetically, a slightly different prefix), the settings that “win” depend on who has the higher activetimestamp in their operational dataset:
  • CASE 1: If the new node has a lower timestamp, it works and the new node inherits the settings from the network leader. 
  • CASE 2: If these timestamps are the same value, the connection can fail -- the new node gets stuck as a child.
  • CASE 3: If the new node has a higher activetimestamp, it will probably connect, but its operational dataset may not be correctly propagated through the network unless/until the new node becomes a leader.*
*Note:  I'm not sure what the spec says about this case -- the behavior is probably related to the fact that the new node is using a mis-matched prefix and really doesn't have full network connectivity.

Thread devices should always adopt the Operational Dataset that has the larger Active Timestamp - this should occur in both case 1 and case 3 above. In case 1, the attaching device should retrieve the latest Operational Dataset from its parent after attaching. In case 3, the attaching device should use the Operational Dataset from its parent to attach and then notify the leader of its "newer" Operational Dataset. The attaching device should not need to become a leader before providing its "newer" Operational Dataset.
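
The rule described here can be written down as a tiny decision function. This Python sketch models only the comparison of Active Timestamps as plain integers; the function name and return labels are made up for illustration, and it is not how OpenThread implements the exchange.

```python
def winning_dataset(attaching_ts: int, network_ts: int) -> str:
    """Say whose Operational Dataset should end up in use (illustrative only)."""
    if attaching_ts < network_ts:
        return "network"     # case 1: attaching device fetches the dataset from its parent
    if attaching_ts > network_ts:
        return "attaching"   # case 3: attaching device notifies the leader of its newer dataset
    return "tie"             # case 2: equal timestamps, nothing marks either side as newer


print(winning_dataset(attaching_ts=1, network_ts=5))
print(winning_dataset(attaching_ts=9, network_ts=5))
print(winning_dataset(attaching_ts=5, network_ts=5))
```

The tie case is the awkward one: if the two datasets differ in content but not in timestamp, neither is identified as newer.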
 
As discussed previously, my REEDs don't form networks (they can only connect to an existing network). Upon connection, my REEDs inherit the activetimestamp of the network they just joined. If the OTBR goes away, this timestamp would remain the same indefinitely. I wanted to ensure that if my OTBR rebooted, it would always use a higher activetimestamp (preventing the same/lower cases above) in Forming a network, ensuring it can connect. To handle this, I have the OTBR use its unix timestamp when Forming a network. This ensures the new node always falls into CASE 3 above, and the OTBR will connect to the network even if some of its operational dataset (like the prefix) is mis-matched.  At that point, we can't be 100 percent sure full connectivity has been restored, if for example it is using a different prefix. That's where the next step comes in. . .

Forcing Dataset Propagation when OTBR is not the Leader
Since the OTBR has no way of seizing the Leader role, I needed another way of resolving any discrepancies in the operational dataset.  This can be accomplished using Commissioner, by simply incrementing the timestamp.  Just after the OTBR forms the network, I add the border router prefix (ot-ctl prefix add ..., then ot-ctl netdata register), then wait 5 seconds before starting the process below:
  • sudo ot-ctl commissioner start # petition then check state to confirm we became commissioner
  • sudo ot-ctl dataset init active  # copy the OTBR's existing active dataset to the buffer
  • set  "$pendingTimestamp" and  "$activeTimestamp" to the current unix time
  • sudo ot-ctl dataset pendingtimestamp "$pendingTimestamp"
  • sudo ot-ctl dataset activetimestamp  "$activeTimestamp" # probably an unnecessary step
  • sudo ot-ctl dataset delay 5000
  • sudo ot-ctl dataset commit pending
  • sleep 5
  • sudo ot-ctl commissioner stop
I don't think the commissioner trick is necessary to ensure that the attaching device provides its "newer" Operational Dataset to the leader. Thread should handle this automatically as long as the Active Timestamp differs.

--
Jonathan Hui

Michael Simpson

Jan 18, 2021, 11:41:53 PM
to openthread-users
Hi Rob and Jonathan

Not giving my REED a complete dataset, so it can't form a network, seems to fix my other problem where the REED IPv6 addresses sometimes don't match my OTBR IPv6 address.
My script now reliably starts the network and my REEDs reliably join with matching IP addresses and my SBC app using Californium CoAP communicates with my REEDs so I am happy.

For my understanding, how much of the IPv6 address (ML-EID specifically) should match between nodes on my Thread network (REEDs and RCP)? Even when these are completely different, tcpdump running on my OTBR sees multicast messages from my REEDs. But my application using Californium CoAP does not see them, though we could be using this incorrectly. With IPv4 the network mask defines the class of network, but I don't know how this works in IPv6.
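
For reference on the question above: the Thread mesh-local prefix is a /64, and the lower 64 bits of an ML-EID are chosen randomly per node, so only the first 64 bits are expected to match between nodes on the same network. The Python sketch below checks that with the standard `ipaddress` module; the function name is made up, and the addresses are the ML-EIDs posted earlier in this thread.

```python
import ipaddress


def same_mesh_local_prefix(addr_a: str, addr_b: str, prefix_len: int = 64) -> bool:
    """True if both addresses fall inside the same /prefix_len network."""
    network = ipaddress.ip_network(f"{addr_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(addr_b) in network


reed_mleid = "fdde:ad00:beef:0:46ac:c08:1a1a:8da7"
otbr_mleid = "fdde:ad00:beef:0:a1da:7199:3f61:b522"

print(same_mesh_local_prefix(reed_mleid, otbr_mleid))    # same fdde:ad00:beef::/64
print(same_mesh_local_prefix(reed_mleid, "fd11:22::1"))  # different prefix
```

If the first call is false for two of your nodes, they are using different mesh-local prefixes, which usually means different Operational Datasets.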

Thanks for the information on setting the Mesh Local Prefix.

My main reason for wanting to prevent REEDs from becoming the Leader is that, after a power cut, a REED could become the Leader and all other nodes would join it. But later that REED might go down and bring down my Thread network for 90 seconds until another node becomes the Leader. My preference is that only the OTBR RCP can be the Leader, as it has a more reliable power supply, and the REEDs only need to communicate with my app on the OTBR and not with each other.

So I tapped into otSetStateChangedCallback and if the otThreadGetDeviceRole changes to Leader I create a new incomplete dataset with Master key only and set this active again which causes my REED to detach and go back to looking for a network to join again.

Could you respond to my query about the OTBR-WEB Topology tab which is
 "Also is there a known problem with the OTBR-WEB Topology screen. When I use OTBR-WEB to successfully form my network and have several REEDs communicating, sometimes the topology display shows a Leader and one Child only. Sometimes it only shows one node (eg Router or Leader), and sometimes it shows nothing at all.  I am not sure what it should display and when. It would be nice if it reliably showed the entire Thread network (Leader, all Routers and all Children) " 

r...@farmjenny.com

Jan 19, 2021, 7:57:58 PM
to openthread-users
Jonathan,

```Thread devices should always adopt the Operational Dataset that has the larger Active Timestamp - this should occur in both case 1 and case 3 above. In case 1, the attaching device should retrieve the latest Operational Dataset from its parent after attaching. In case 3, the attaching device should use the Operational Dataset from its parent to attach and then notify the leader of its "newer" Operational Dataset. The attaching device should not need to become a leader before providing its "newer" Operational Dataset.```

I conducted careful testing of this today.  When I test using CLI nodes, behavior is correct and as you describe above.  Specifically, in Case 3, the attaching device with the higher timestamp does push its operational dataset into the network -- for example, causing all existing nodes to change their mesh local prefix, and reflecting the new timestamp.  However, when I repeated the same sequence using an ot-br-posix/RCP as the attaching device, the ot-br-posix connected, but used addresses in the existing network's prefix, and it did not push its operational dataset into the network.  Only after the other devices in the network were powered off, and the otbr became leader, did it assume an IP address using the prefix found in the higher-timestamp dataset.  This combination is what led me to my earlier statement and the commissioner trick currently employed in my border router design.

Details of my configuration, commits, procedure and results in the attached text file.  All credentials used are random.  I would really appreciate if someone could replicate this -- I spent a good part of the day thinking I was losing my mind when everything worked perfectly on the CLI devices.

Beyond the apparent issue with Case 3 on otbr, I saw some troubling behaviors with Case 2 (attaching with the same timestamp as an existing network).  I would welcome your input on these to see if they warrant further investigation.

Best,
Rob





Timestamp Cases.txt

Jonathan Hui

Jul 20, 2021, 1:08:02 AM
to r...@farmjenny.com, openthread-users
FYI - we recently merged openthread/openthread#6813, which may address this issue.

--
Jonathan Hui


