Some nodes do not work in an ONOS cluster


donghui lu

Sep 12, 2023, 5:34:49 AM
to ONOS Developers
Hi All,

We have deployed a cluster of 3 ONOS nodes and 3 Atomix nodes on 3 Ubuntu PCs. Each machine runs one Atomix Docker container and one ONOS container, but the cluster does not work well.

After we start each node with its own configuration (the Atomix and ONOS
configuration files are generated by the ONOS/tools/test/bin/atomix-gen-config and onos-gen-config scripts), one ONOS node starts very slowly.

After a while, we cannot open its GUI, although we can open the UI of the other 2 ONOS nodes. Meanwhile, we can attach to its CLI, but when we type "devices" or other commands, it returns service-not-found errors such as "Service org.onosproject.security.AuditService not found". This bad node also cannot become master for any device.

We checked all the ONOS logs, and the first ERROR in the bad node's log is:
ERROR [onos-core-net] bundle org.onosproject.onos-core-net:2.6.0 (193)[org.onosproject.upgrade.impl.UpgradeManager(84)] : The activate method has thrown an exception
org.onosproject.store.service.StorageException$Timeout
After that, there are several errors like:
ERROR [FrameworkEvent] FrameworkEvent ERROR
org.osgi.framework.ServiceException: Service factory returned null. (Component: org.onosproject.upgrade.impl.UpgradeManager (84))
ERROR [FrameworkEvent] FrameworkEvent ERROR
org.osgi.framework.ServiceException: Service factory returned null. (Component: org.onosproject.store.cluster.impl.DistributedLeadershipStore (87))

The ONOS version is 2.6.0, the Atomix version is 3.1.9, the Ubuntu version is 22.04.2 or 20.04.1, and the Docker version is 24.0.5. The Docker network mode is host, and the containers on the 3 machines can all reach each other.
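
The reachability claim above can be re-checked with a short script. This is only a minimal sketch: it assumes that a plain TCP connect to ports 5679 (Atomix) and 9876 (ONOS), the ports that appear in the configurations in this post, is a meaningful test. The helper names `can_connect` and `check_all` are hypothetical and not part of ONOS or Atomix.

```python
import socket

# Addresses and ports taken from the configurations in this post.
HOSTS = ["192.168.0.221", "192.168.0.222", "192.168.0.223"]
PORTS = [5679, 9876]  # 5679: Atomix, 9876: ONOS cluster port


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_all(hosts=HOSTS, ports=PORTS, timeout=2.0):
    """Map every (host, port) pair to the result of a TCP connect attempt."""
    return {(h, p): can_connect(h, p, timeout) for h in hosts for p in ports}


if __name__ == "__main__":
    for (host, port), ok in check_all().items():
        print(f"{host}:{port} -> {'open' if ok else 'NOT reachable'}")
```

Running it on each of the 3 machines shows whether every Atomix and ONOS port is reachable from that machine. A successful connect only proves the port is open, not that the Raft partitions are healthy.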

The Atomix configuration is:
{
    "cluster": {
        "clusterId": "onos",
        "node": {
            "id": "atomix-1",
            "address": "192.168.0.221:5679"
        },
        "discovery": {
            "type": "bootstrap",
            "nodes": [
                {
                    "id": "atomix-1",
                    "address": "192.168.0.221:5679"
                },
                {
                    "id": "atomix-2",
                    "address": "192.168.0.222:5679"
                },
                {
                    "id": "atomix-3",
                    "address": "192.168.0.223:5679"
                }
            ]
        }
    },
    "managementGroup": {
        "type": "raft",
        "partitions": 1,
        "partitionSize": 3,
        "members": [
            "atomix-1",
            "atomix-2",
            "atomix-3"
        ],
        "storage": {
            "level": "mapped"
        }
    },
    "partitionGroups": {
        "raft": {
            "type": "raft",
            "partitions": 3,
            "partitionSize": 3,
            "members": [
                "atomix-1",
                "atomix-2",
                "atomix-3"
            ],
            "storage": {
                "level": "mapped"
            }
        }
    }
}
The ONOS configuration is:
{
    "name": "onos",
    "node": {
        "id": "192.168.0.221",
        "ip": "192.168.0.221",
        "port": 9876
    },
    "storage": [
        {
            "id": "atomix-1",
            "ip": "192.168.0.221 192.168.0.222 192.168.0.223",
            "port": 5679
        }
    ]
}
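
One detail in the configuration above may be worth checking: the single "storage" entry carries three space-separated addresses in its "ip" field. Assuming each storage entry is meant to hold exactly one address (an assumption, not something confirmed in this post), a small check like the following flags such entries. The function name `find_bad_storage_entries` is hypothetical.

```python
import ipaddress
import json

# The ONOS configuration exactly as posted above.
POSTED_CONFIG = json.loads("""
{
    "name": "onos",
    "node": {"id": "192.168.0.221", "ip": "192.168.0.221", "port": 9876},
    "storage": [
        {
            "id": "atomix-1",
            "ip": "192.168.0.221 192.168.0.222 192.168.0.223",
            "port": 5679
        }
    ]
}
""")


def find_bad_storage_entries(config):
    """Return (id, ip) pairs whose "ip" value is not a single valid IP address."""
    bad = []
    for entry in config.get("storage", []):
        ip = entry.get("ip", "")
        try:
            ipaddress.ip_address(ip)  # raises ValueError unless ip is one address
        except ValueError:
            bad.append((entry.get("id"), ip))
    return bad


if __name__ == "__main__":
    for node_id, ip in find_bad_storage_entries(POSTED_CONFIG):
        print(f"storage entry {node_id!r} has an unexpected ip value: {ip!r}")
```

If the assumption holds, comparing this file with the one generated for the two working nodes would show whether the bad node's configuration differs from theirs.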

Could you please tell us under what conditions this can happen, and how we can debug it?

Regards,
hui