Zigbee model, node FAILED to join the network - ns-3-dev

Piotr Biernacki

Dec 27, 2024, 5:21:01 PM
to ns-3-users
Hello ns-3 users,

I have tried to use the Zigbee model and its NWK layer from the latest ns-3-dev version. Unfortunately, I ran into an exception while running one of the examples (zigbee-nwk-routing2.cc), and I would like to ask for your help.
The example has 50 nodes which, at the beginning of the simulation, try to associate with the discovered networks. Partway through the associations the following exception occurs:

```
+382.122880000s 38 [ff:ff | 00:00:00:00:00:00:00:27] ZigbeeNwk:NlmeJoinRequest(0x557691aeaee0)
+382.122880000s 38 [ff:ff | 00:00:00:00:00:00:00:27] ZigbeeNwk:NlmeJoinRequest(): Send Assoc. Req. to [c7:4b] in PAN id and Ext PAN id: (0xf4d1 | 0x1)
+382.127040000s 21 [c7:4b | 00:00:00:00:00:00:00:16] ZigbeeNwk:MlmeAssociateIndication(0x557691ad0d60)
+382.127040000s 21 [c7:4b | 00:00:00:00:00:00:00:16] ZigbeeNwk:MlmeAssociateIndication(): Storing an Associate response command with the allocated address ff:ff
+382.624608000s 38 [ff:ff | 00:00:00:00:00:00:00:27] ZigbeeNwk:MlmeAssociateConfirm(0x557691aeaee0)
 +382.625s The device FAILED to join the network with status 195
msg="MLME-COMM-Status.Indication: params do not match", +382.624608000s 21 file=/home/piotrek/Projects/IITIS/Impress/workspace/ns-3-dev/src/zigbee/model/zigbee-nwk.cc, line=1694
NS_FATAL, terminating
terminate called without an active exception
Aborted (core dumped)
```


I have tried to debug and fix it. It looks like the problem appears when the neighbor table of a Zigbee router (ZRx) that another Zigbee node is trying to associate with reaches its full capacity.

The exception is triggered in the function:
        ZigbeeNwk::MlmeCommStatusIndication(MlmeCommStatusIndicationParams params)
by the statement:
        params.m_dstExtAddr == m_joinIndParams.m_extendedAddress
where these two addresses are not equal.

I have tried to fix it by adding the line:
        m_joinIndParams.m_extendedAddress = params.m_extDevAddr;
in the function:
        ZigbeeNwk::MlmeAssociateIndication(MlmeAssociateIndicationParams params)
at line 1900 of the zigbee-nwk.cc file.

It helped: the exception no longer occurred, but the rest of the Zigbee nodes still couldn't associate with the network.

Then I modified the NwkJoinConfirm callback (in the zigbee-nwk-routing2.cc file) so that, when the NwkStatus equals FULL_CAPACITY, a new ZigbeeNwk::NlmeJoinRequest is scheduled immediately. This repeats until no neighbors remain as potential parents. A neighbor is discarded as a potential parent (entry->SetPotentialParent(false);) in the ZigbeeNwk::MlmeAssociateConfirm(MlmeAssociateConfirmParams params) function every time a given ZRx responds with the FULL_CAPACITY status.

I have also noticed that the lines:

        case MacStatus::FULL_CAPACITY:
        // Discard neighbor as potential parent
        if (m_nwkNeighborTable.LookUpEntry(m_associateParams.extAddress, entry))
        {
            entry->SetPotentialParent(false);
        ...
        }


fetch and discard the first entry that has ff:ff:ff:ff:ff:ff:ff:ff as its extAddress. However, many entries share that same ff:ff:ff:ff:ff:ff:ff:ff address, whereas the one entry that should be discarded can be fetched by its nwkAddress. So I have added

        Mac16Address nwkAddress;
to the struct AssociateParams, assigned the nwkAddress to the above variable in the function:

        ZigbeeNwk::NlmeJoinRequest(NlmeJoinRequestParams params)
        ...
        // Temporally store some associate values until the process concludes
        m_associateParams.panId = panId;
        m_associateParams.extAddress = bestParentEntry->GetExtAddr();
        m_associateParams.nwkAddress = bestParentEntry->GetNwkAddr();

        m_mac->MlmeAssociateRequest(assocParams);


and changed the discarding lines to:

        case MacStatus::FULL_CAPACITY:
        // Discard neighbor as potential parent
        if (m_nwkNeighborTable.LookUpEntry(m_associateParams.nwkAddress, entry))
        {
            entry->SetPotentialParent(false);
            // m_nwkNeighborTable.Update(m_associateParams.extAddress, entry);
            joinConfirmParams.m_status = NwkStatus::FULL_CAPACITY;
        }
        else
        {
            NS_LOG_ERROR("Neighbor not found when discarding as potential parent");
            joinConfirmParams.m_status = NwkStatus::NOT_PERMITED;
        }
        break;


After these modifications the example no longer crashed. There were further join attempts, and some of the nodes that had failed before joined the network, but some still did not. I wanted to use 100 or even 1000 nodes in my simulation, but even when I increased the maximum size of the neighbor table (m_maxTableSize) to, e.g., 64, some of the nodes still couldn't join the network.

I have also tried to spread the nodes farther apart (80/150 or even 1000 m) in the simulation space using the mobility object, but it seems that this hasn't changed anything. The tables were still filling up and didn't let new nodes join the network.
I am not an expert in Zigbee networks, and I am not sure whether the changes that I have made are correct or whether they broke something else. I am also aware that this is not the final/release version of the Zigbee model, but I wonder whether I can do something to make it work in the scenario that I have planned (a simulation of up to 100/1000 nodes with packet transmission like in the Zigbee examples).

Q1. Is the zigbee-nwk-routing2.cc example working at your side without crashes? - I used the latest ns-3-dev version.

Q2. If you encounter the same behavior as me, is there any fix/workaround or any other way to configure it so all the nodes in the simulation can join the network?

Q3. If the simulation works as it should and the scenario where some of the nodes are not supposed to join the network is correct, what can I do to run the simulation with 1000 nodes connected to the network?

Q4. If the zigbee-nwk-routing2.cc example works fine at your side, what should I do/check to make it work at my side?

Q5. Why doesn't increasing the distances between the nodes influence the range of network discovery and node association?

Q6. Are the modifications that I made valid, or did I break something else by applying them or misunderstand how something is supposed to work?

Thank you for your time.
Best regards,
Piotr

Jack Higgins

Dec 28, 2024, 8:37:15 AM
to ns-3-users
Thank you Piotr for the detailed report.

I am aware of the problem and I am working on the solution. I introduced a problem while adding some changes in my last updates, and I haven't been able to implement the solution yet.
I can have it fixed, but I need some time. I am on a winter break and I am running some errands with the family.

A1. Yeah, it is not supposed to crash.
A2. Same problem on my side. There is for sure a fix. I just need to work on it :D.
A3. In the example, all nodes are supposed to join the network and not crash.
That being said, scenarios where nodes are not able to join the network naturally do exist. Just be aware that in these situations nodes will not retry joining the network unless you implement it.
The reason is that this is not supposed to be handled or decided by the NWK layer, so keep it in mind in large simulations such as the one you describe with 1000 nodes.
Also, with 1000 nodes, the chances of two nodes being assigned the same short address increase (addresses are assigned randomly), and there is no mechanism to detect address collisions yet.
There should be one, but we haven't added support for it yet.
A4. Unfortunately, I have no concrete solution right now until I check the problem in depth.
A5. The distance between nodes influences network discovery, but it probably does not have as much impact as you expect. Network discovery is based entirely on LQI, and distance affects LQI, but LQI values do not change much until the last few meters before the end of the communication range (you can read more about this in our paper: https://www.mdpi.com/2079-9292/11/24/4090).
This behavior is correct. Of course, there are other factors that should affect LQI, but these are not modeled in ns-3. This is a limitation that I wish to address sometime in the future, but for now, it is not considered.
A6. I haven't checked your changes, but I will definitely take a look as soon as I can.

I will keep you posted on any changes here.

Regards,
Al.

Piotr Biernacki

Dec 28, 2024, 11:16:52 AM
to ns-3-users
Thank you Albert for all your answers.
I am sorry for the redundant questions that I asked, but they were a kind of sanity check for me :) Thank you for the link to the paper: it is a great performance analysis and an in-depth look into the ns-3 internals.
Of course, I understand that it's the holiday season and that the fix needs time. I will stay tuned.

Some information from me that may be useful for you later:
Regarding Q5 about the distance: I checked by placing the 100 nodes in a 10x10 grid where each node was 200 or even 1200 m away from its neighbors, and the discovery results reported by the static void NwkNetworkDiscoveryConfirm() were the same or similar (7/9/13 networks found). It seemed odd to me that the nodes were still able to find the networks, because I remember your answer about the effective range of the nodes with default sensitivity and Tx power.

This may also be connected with the neighbor table filling up, since not only the true neighboring nodes (those in radio range) are added to the table but almost all nodes that have sent a beacon and been heard, even by a distant node.

I have also noticed that most of the entries in the neighbor table have ff:ff:ff:ff:ff:ff:ff:ff as the IEEE extended address. I don't know whether these are correct entries when the address mode is set to SHORT_ADDR, but that's what I observed and thought worth mentioning, e.g.:

```
[00:00:00:00:00:00:00:48 | ff:ff] | Time: +712.623s | ZigBee Neighbor Table
IEEE Address             Nwk Address  Device Timeout  Relationship  Device type  Tx Failure  LQI  Outgoing Cost  Age  Ext PAN ID
ff:ff:ff:ff:ff:ff:ff:ff  58:a1        +1.536e+13ns    NONE          ROUTER       0           242  1              0    0x1
ff:ff:ff:ff:ff:ff:ff:ff  55:53        +1.536e+13ns    NONE          ROUTER       0           242  1              0    0x1
ff:ff:ff:ff:ff:ff:ff:ff  6b:85        +1.536e+13ns    NONE          ROUTER       0           242  1              0    0x1
ff:ff:ff:ff:ff:ff:ff:ff  00:00        +1.536e+13ns    NONE          COORDINATOR  0           242  1              0    0x1
ff:ff:ff:ff:ff:ff:ff:ff  10:42        +1.536e+13ns    NONE          ROUTER       0           255  1              0    0x1
ff:ff:ff:ff:ff:ff:ff:ff  be:78        +1.536e+13ns    NONE          ROUTER       0           242  1              0    0x1
ff:ff:ff:ff:ff:ff:ff:ff  a3:d8        +1.536e+13ns    NONE          ROUTER       0           242  1              0    0x1
ff:ff:ff:ff:ff:ff:ff:ff  f2:5b        +1.536e+13ns    NONE          ROUTER       0           255  1              0    0x1
```

These entries are added at the end of the ZigbeeNwk::MlmeBeaconNotifyIndication(MlmeBeaconNotifyIndicationParams params) function. I tried to change them to the real extended address taken from the params.m_panDescriptor.m_coorExtAddr variable. However, even after I modified the beacons sent from LrWpanMac::SendOneBeacon() so that they had the srcExtAddr embedded in the MAC header, the extended address was lost by the time the beacon was received by the other node in LrWpanMac::PdDataIndication().

Once again, thank you for your time!
Have lovely holidays!
Piotr

Jack Higgins

Jan 7, 2025, 1:28:09 AM
to ns-3-users
Dear Piotr,

I have checked the problems based on your feedback. Please check my comments below:

> Regarding Q5 about the distance: I checked by placing the 100 nodes in a 10x10 grid where each node was 200 or even 1200 m away from its neighbors, and the discovery results reported by the static void NwkNetworkDiscoveryConfirm() were the same or similar (7/9/13 networks found). It seemed odd to me that the nodes were still able to find the networks, because I remember your answer about the effective range of the nodes with default sensitivity and Tx power.

You are right, I was able to confirm this incorrect behavior. The problem has to do with the mobility helper. For some reason, the propagation is not working correctly when using the mobility helper. This may be caused by a recent change. I am still diagnosing the problem, but I thought I would post this comment so I do not keep you waiting while I find the solution. Note that zigbee-routing-example.cc does not use the mobility helper and works correctly (devices are not discoverable beyond 110 meters).

Jack Higgins

Jan 7, 2025, 5:08:13 AM
to ns-3-users
Hey Piotr, 

I found and fixed the issue with the MobilityHelper. A missing initialization of the PHY in the NetDevice was preventing the mobility from being updated when using the MobilityHelper. Thank you for your report.
I have submitted the fix as a merge request here: https://gitlab.com/nsnam/ns-3-dev/-/merge_requests/2286.

This will fix the issue in zigbee-example-routing2.cc (which I will rename in future merges). However, please be aware that it is still possible to fill the neighbor tables in other node distributions. The current assumption is that the APS should reissue a join request when this situation occurs, but this is outside the scope of the NWK layer and the current example.


> I have also noticed that most of the entries in the neighbor table have ff:ff:ff:ff:ff:ff:ff:ff as the IEEE extended address. I don't know whether these are correct entries when the address mode is set to SHORT_ADDR, but that's what I observed and thought worth mentioning, e.g.:

This is the normal behavior because the nodes are running in the MAC SHORT ADDRESS mode. When this mode is active, the MAC header only contains the short address.
Theoretically, if you were running in extended address mode, the MAC header would contain the extended address and the short address would be FF:FE for all devices. I say theoretically because, even though the possibility of using extended addresses in routing operations is somewhat implied in Zigbee, it is not clearly explained in the specification, and there is no clear way that I am aware of to "enable the extended address mode" from the Zigbee NWK.

> These entries are added at the end of the ZigbeeNwk::MlmeBeaconNotifyIndication(MlmeBeaconNotifyIndicationParams params) function. I tried to change them to the real extended address taken from the params.m_panDescriptor.m_coorExtAddr variable.

I think this would not work for the same reason given above: the MAC is working in short address mode, and therefore the descriptor does not contain the extended address of the coordinator. I would need to double-check this, but as far as I know the descriptor only contains either the short address or the extended address.


I hope this solves most of your issues.

Regards,

Al.

Piotr Biernacki

Jan 8, 2025, 1:11:52 PM
to ns-3-users
Dear Alberto,

Thank you very much for the information, your time and effort.

I have checked your solution, and the nodes are able to discover the networks advertised by their neighbors as well as to join them. I have added a simple reissue of the join request after the joining node receives the NwkStatus::NEIGHBOR_TABLE_FULL status, and the nodes are then able to connect to the network through another neighbor. While doing that, I noticed that the neighbor with the full table (the potential parent) is discarded using its extended address, but very often the neighbors' extended addresses are all the same (I understand the reason thanks to your last answer). As a result, the first entry with the ff:ff:ff:ff:ff:ff:ff:ff address in the neighbor table gets discarded every time, and not the one that responded with the MacStatus::FULL_CAPACITY status.

The current discarding algorithm in zigbee-nwk.cc file:

    case MacStatus::FULL_CAPACITY:
        // Discard neighbor as potential parent
        if (m_nwkNeighborTable.LookUpEntry(m_associateParams.extAddress, entry))
        {
            entry->SetPotentialParent(false);
        }

I have modified the parent discarding to use the short address of the neighbor, but I am not sure whether my modifications are correct.

I have added
        Mac16Address nwkAddress;
to the struct AssociateParams, assigned the nwkAddress to the above variable in the function:

        ZigbeeNwk::NlmeJoinRequest(NlmeJoinRequestParams params)
        ...
        // Temporally store some associate values until the process concludes
        m_associateParams.panId = panId;
        m_associateParams.extAddress = bestParentEntry->GetExtAddr();
        m_associateParams.nwkAddress = bestParentEntry->GetNwkAddr();

and modified the discarding algorithm to:


    case MacStatus::FULL_CAPACITY:
        // Discard neighbor as potential parent
        if (m_nwkNeighborTable.LookUpEntry(m_associateParams.nwkAddress, entry))
        {
            entry->SetPotentialParent(false);
        }

This modification discards the neighbor which responded with the FULL_CAPACITY status. I tried to check whether this modification is compliant with the standard, but I couldn't find a definite answer.
I wanted to mention this issue and the fix I have made, hoping it may be useful for others or during the APS layer development.

Thank you once again.
Kind regards,
Piotr

Jack Higgins

Jan 10, 2025, 1:12:29 AM
to ns-3-users
Thank you for the report Piotr.
I will check the proposed patch and add it to Zigbee soon if correct.

Regards,

Al.

Jack Higgins

Feb 6, 2025, 3:31:09 AM
to ns-3-users
Piotr,

I have fixed the issue that you reported above. Sorry it took so long.
Some urgent patches and fixes for related problems had to be added before this fix.
The 2 issues related to the problem that you reported were patched in:


I used your suggestions as a base, but the final result is slightly different.
The patches are ready and they should be merged into the dev repository soon. They should also appear in the soon-to-be-released ns-3.44.
Thank you again for the report, and let me know if you find others.

Regards,

Al.

Piotr Biernacki

Feb 6, 2025, 8:44:01 AM
to ns-3-users
Dear Alberto,

Thank you for your time and the information about the fixes. I have reviewed your merge request and can confirm that the issue has been resolved.
I will certainly let you know if I find anything.

Regards,
Piotr