New to ONOS clustering


Chris Ward

Apr 26, 2021, 11:00:03 AM
to ONOS Developers
I'm new to ONOS clustering, and I'm trying to get it to work.
I have set up 3 virtual machines, each running atomix (3.0.7) and ONOS (2.5 Snapshot).
ONOS runs, but doesn't seem to be clustered; I get 3 individual instances.
What should I expect to see in the logs to help debug the problem?

I am linking
1) A screenshot of the ONOS GUI 'cluster' screen, showing one of the ONOSes not clustered. http://tjcw.freeshell.org/onos-cluster/onos-screenshot.png
2) Atomix conf/atomix.conf from one of the nodes. Atomix was started with "./bin/atomix-agent -m atomix-1 -a 192.168.122.20:5679". http://tjcw.freeshell.org/onos-cluster/onos1.conf
3) ONOS config/cluster.json . All nodes have the same cluster.json . ONOS was started with "bazel run onos-local -- clean debug". http://tjcw.freeshell.org/onos-cluster/cluster.json

(I tried attaching the files, but got errors from the attach page)

Anurag Chadha

Apr 26, 2021, 11:08:42 AM
to Chris Ward, ONOS Developers
Hi Chris,

You are using an incorrect cluster-id in atomix.conf. The cluster-id in atomix.conf should match the name field in cluster.json.
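
As a quick pre-flight check of this point, here is a minimal Python sketch (not part of ONOS or Atomix) that compares a cluster-id with the name field of cluster.json. The function name is made up for illustration, and the cluster-id is passed in by hand (copied from atomix.conf) rather than parsed out of that file.

```python
import json


def cluster_name_matches(cluster_json_text: str, atomix_cluster_id: str) -> bool:
    """Return True if the "name" field of cluster.json equals the Atomix cluster-id.

    Hypothetical helper: the cluster-id is supplied by the caller, copied by
    hand from atomix.conf, because this sketch does not parse that file.
    """
    config = json.loads(cluster_json_text)
    return config.get("name") == atomix_cluster_id


if __name__ == "__main__":
    example = '{"name": "onos", "storage": []}'
    print(cluster_name_matches(example, "onos"))    # True
    print(cluster_name_matches(example, "atomix"))  # False
```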

For more details, please refer:

Regards,
Anurag



Chris Ward

Apr 26, 2021, 11:51:38 AM
to ONOS Developers, Anurag Chadha, ONOS Developers, Chris Ward
I changed my ONOS cluster.json to
{
  "name": "atomix",
  "storage": [
    {
      "id": "atomix-1",
      "ip": "192.168.122.20",
      "port": 5679
    },
    {
      "id": "atomix-2",
      "ip": "192.168.122.21",
      "port": 5679
    },
    {
      "id": "atomix-3",
      "ip": "192.168.122.22",
      "port": 5679
    }
  ]
}
but this didn't fix the issue; I still have 3 separate ONOS instances, and the screenshot is the same as before.
Next I will try copying the ONOS and Atomix configurations from the wiki page mentioned above. All help appreciated !

Anurag Chadha

Apr 26, 2021, 12:37:51 PM
to Chris Ward, ONOS Developers
Hi Chris,

Do try the configuration mentioned in the link shared earlier and then let me know.

Regards,
Anurag

Chris Ward

Apr 27, 2021, 5:49:28 AM
to ONOS Developers, Anurag Chadha, ONOS Developers, Chris Ward
I have set up the configuration as per the wiki; my Atomix configuration is http://tjcw.freeshell.org/onos-cluster-2/onos1.conf
(with appropriate changes for nodes 2 and 3) and my cluster.json is unchanged as http://tjcw.freeshell.org/onos-cluster-2/cluster.json
I still get the same screenshot, i.e. the ONOS nodes are not clustering.

I don't see anything in the ONOS log about processing the cluster.json file. As part of my attempts to set things up, I made a run with an invalid JSON file, and nothing in the log reported it. My JSON file is at /home/openstack/onos-main/onos/config/cluster.json , and ONOS is built at /home/openstack/onos-main/onos/ ; can you check that this is the right location for the cluster.json file?
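
Since ONOS did not report the bad file here, the file can be checked by hand before starting ONOS. The following is a minimal Python sketch, not an official schema check: the expected keys ("name", and "storage" entries with "id", "ip", "port") are taken only from the cluster.json posted earlier in this thread, and the function name is hypothetical.

```python
import json
from typing import List


def check_cluster_json(text: str) -> List[str]:
    """Return a list of problems found in a cluster.json document (empty if none).

    The expected layout is an assumption based on the file shown in this thread.
    """
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(config, dict):
        return ["top level is not a JSON object"]

    problems = []
    if not isinstance(config.get("name"), str):
        problems.append('missing or non-string "name"')

    storage = config.get("storage")
    if not isinstance(storage, list) or not storage:
        problems.append('missing or empty "storage" list')
        return problems

    # Each storage entry should describe one Atomix node.
    for i, node in enumerate(storage):
        for key in ("id", "ip", "port"):
            if not isinstance(node, dict) or key not in node:
                problems.append(f'storage[{i}] is missing "{key}"')
    return problems


if __name__ == "__main__":
    # Example path only; point it at your own cluster.json.
    with open("config/cluster.json") as handle:
        for problem in check_cluster_json(handle.read()):
            print(problem)
```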

One of the lines in the ONOS log is
                  | 128 - io.atomix.utils - 3.1.5 | RaftServer{raft-partition-1} - Found leader 127.0.0.1
Does this mean ONOS is built with atomix client library version 3.1.5, and is this compatible with the atomix server version 3.0.7 that the wiki page had me set up ?
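
For reading the Atomix version off a log line like the one quoted above, a minimal Python sketch follows. The regular expression is an assumption based only on the lines shown in this thread (the "io.atomix.<module> - <version>" field), not on a documented log format, and the function name is hypothetical.

```python
import re
from typing import Optional

# Matches e.g. "io.atomix.utils - 3.1.5" and captures the version number.
_ATOMIX_BUNDLE = re.compile(r"io\.atomix\.[\w.]+ - (\d+(?:\.\d+)+)")


def atomix_version_in_log_line(line: str) -> Optional[str]:
    """Return the Atomix bundle version mentioned in an ONOS log line, or None."""
    match = _ATOMIX_BUNDLE.search(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    line = "| 128 - io.atomix.utils - 3.1.5 | RaftServer{raft-partition-1} - Found leader 127.0.0.1"
    print(atomix_version_in_log_line(line))  # 3.1.5
```

The value returned can then be compared with the version of the Atomix server distribution that was installed.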

Anurag Chadha

Apr 27, 2021, 8:08:20 AM
to Chris Ward, ONOS Developers
Hi Chris,

You can build ONOS on any machine and then copy onos.tar.gz to your VMs, placing cluster.json in the config folder of the directory where you untar onos.tar.gz.
I guess this is the step that is missing, as cluster.json is required at runtime and not at compile time.

Similarly, atomix.conf is also required at runtime, so you need to copy it into the conf folder of the directory where you untarred the Atomix distribution.

Regards,
Anurag

Chris Ward

Apr 27, 2021, 9:29:35 AM
to ONOS Developers, Anurag Chadha, ONOS Developers, Chris Ward
I build ONOS in the VMs with
cd /home/openstack/onos-main/onos
bazel build onos

and run with
cd /home/openstack/onos-main/onos
bazel run onos-local -- clean debug

It looks like the 'run' command makes a directory /tmp/onos-2.5.0-SNAPSHOT and sets things up to run from there. There is a file /tmp/onos-2.5.0-SNAPSHOT/config/cluster.json with contents
    {
      "name": "default-12418",
      "node": {
        "id": "127.0.0.1",
        "ip": "127.0.0.1",
        "port": 9876
      },
      "clusterSecret": "17440"
    }
i.e. not my cluster.json, but looks to be what the node is running with.
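
To tell the two kinds of file apart at a glance, here is a rough Python heuristic. It is an assumption drawn only from the two files shown in this thread: the clustered cluster.json has a non-empty "storage" list, while the generated one has a single "node" and no "storage". The function name is hypothetical.

```python
import json


def looks_standalone(cluster_json_text: str) -> bool:
    """Guess whether a cluster.json is a single-node default, not a cluster config.

    Heuristic only: treats a missing or empty "storage" list as standalone.
    """
    config = json.loads(cluster_json_text)
    return not config.get("storage")


if __name__ == "__main__":
    generated = '{"name": "default-12418", "node": {"id": "127.0.0.1", "ip": "127.0.0.1", "port": 9876}}'
    print(looks_standalone(generated))  # True
```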

How should I run ONOS? I don't know where the onos.tar.gz file is, and I don't know what command I should issue to run ONOS after unpacking it. If you could tell me the right command I would be most grateful (or point me at the right wiki page).

Anurag Chadha

Apr 27, 2021, 9:49:19 AM
to Chris Ward, ONOS Developers
Hi Chris,

After building ONOS, onos.tar.gz is created in the bazel-bin folder. You can untar the file anywhere; just remember to place the cluster.json file in the config folder inside the untarred folder.
To run ONOS, run ./bin/karaf from the apache-karaf directory inside the untarred folder.
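
The unpack-and-copy step can be sketched in Python as below. This is an illustration, not an ONOS tool: the function name is hypothetical, the paths in the usage example are placeholders, and the code assumes the tarball contains a single top-level directory (its name is read from the archive rather than guessed).

```python
import shutil
import tarfile
from pathlib import Path, PurePosixPath


def unpack_with_cluster_config(tarball: str, dest: str, cluster_json: str) -> Path:
    """Unpack an ONOS tarball into dest and copy cluster_json into its config folder.

    Returns the path of the unpacked top-level directory.
    """
    with tarfile.open(tarball) as archive:
        # Collect the first path component of every member, ignoring "." entries.
        top_levels = {
            PurePosixPath(name).parts[0]
            for name in archive.getnames()
            if PurePosixPath(name).parts
        }
        if len(top_levels) != 1:
            raise ValueError(f"expected one top-level directory, found {sorted(top_levels)}")
        archive.extractall(dest)

    onos_dir = Path(dest) / top_levels.pop()
    config_dir = onos_dir / "config"
    config_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(cluster_json, config_dir / "cluster.json")
    return onos_dir


if __name__ == "__main__":
    # Placeholder paths; adjust to your own build tree and target directory.
    target = unpack_with_cluster_config("bazel-bin/onos.tar.gz", "/opt", "cluster.json")
    print(f"unpacked to {target}")
```

ONOS is then started from the apache-karaf directory inside the returned folder, as described above.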

You can also set up ONOS as a service by following the link below: (the init folder will be inside the untarred folder)

Regards,
Anurag

Chris Ward

Apr 27, 2021, 9:49:25 AM
to ONOS Developers, Chris Ward, Anurag Chadha, ONOS Developers
I found onos.tar.gz in the bazel-bin directory in the build tree. It looks as if to run a cluster I need to set up file /tmp/onos-2.5.0-SNAPSHOT/config/cluster.json and then run with
cd /home/openstack/onos-main/onos
bazel run onos-local -- debug

i.e. leaving out the 'clean' parameter. Is that the expected approach?

Chris Ward

Apr 27, 2021, 10:36:42 AM
to ONOS Developers, Chris Ward, Anurag Chadha, ONOS Developers
I'm getting a bit further now. But the atomix servers are giving exceptions :
15:33:29.027 [netty-messaging-event-epoll-client-2] ERROR i.a.c.messaging.impl.MessageDecoder - Exception inside channel handling pipeline.
io.netty.handler.codec.DecoderException: java.net.ProtocolException: Unsupported protocol version: 8195
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:459) ~[netty-codec-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:392) ~[netty-codec-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:359) ~[netty-codec-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:342) ~[netty-codec-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245) ~[netty-transport-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231) ~[netty-transport-4.1.27.Final.jar:4.1.27.Final]
and the ONOS servers are writing error messages to log :
2021-04-27T15:34:36,787 | WARN  | atomix-partition-group-membership-service-0 | DefaultPartitionGroupMembershipService | 125 - io.atomix.primitive - 3.1.5 | Failed to locate system partition group via bootstrap nodes. Please ensure partition groups are configured either locally or remotely and the node is able to reach partition group members.
2021-04-27T15:34:41,789 | WARN  | atomix-partition-group-membership-service-0 | DefaultPartitionGroupMembershipService | 125 - io.atomix.primitive - 3.1.5 | Failed to locate system partition group via bootstrap nodes. Please ensure partition groups are configured either locally or remotely and the node is able to reach partition group members.

This looks like it could be a mismatch in atomix levels between client and server. My next try will be to set up atomix servers at level 3.1.5 .

Chris Ward

Apr 27, 2021, 11:04:58 AM
to ONOS Developers, Chris Ward, Anurag Chadha, ONOS Developers
Am I right in thinking that I need Atomix 3.1.5 ? I can't find a prebuilt version of this level; I can find source code which builds with 'mvn compile', but I don't know how to run the resulting package.

Anurag Chadha

Apr 27, 2021, 11:25:06 AM
to Chris Ward, ONOS Developers
You can run the curl command to get the 3.1.9 distribution of Atomix, which is currently the latest version.
Anurag

Chris Ward

Apr 27, 2021, 12:04:09 PM
to ONOS Developers, Anurag Chadha, ONOS Developers, Chris Ward
I downloaded the 3.1.5 distribution of Atomix from https://repo1.maven.org/maven2/io/atomix/atomix-dist/ , and clustered ONOS is working for me now. I think the wiki needs updating:
1) The version of the Atomix server needs to agree with the version of Atomix in ONOS. ONOS in non-cluster mode writes the Atomix version in its log.
2) There should be an approved way of placing cluster.json into /tmp/*/config (or whichever the right directory is).

I need to think a little more about the right words for the wiki.