New to ONOS clustering


Chris Ward

Apr 26, 2021, 11:00:03 AM
to ONOS Developers
I'm new to ONOS clustering, and I'm trying to get it to work.
I have set up 3 virtual machines, each running Atomix (3.0.7) and ONOS (2.5 snapshot).
ONOS runs, but doesn't seem to be clustered; I get 3 individual instances.
What should I expect to see in the logs, to help debug the problem?

I am linking
1) A screenshot of the ONOS GUI 'cluster' screen, showing one of the ONOSes not clustered. http://tjcw.freeshell.org/onos-cluster/onos-screenshot.png
2) Atomix conf/atomix.conf from one of the nodes. Atomix was started with "./bin/atomix-agent -m atomix-1 -a 192.168.122.20:5679". http://tjcw.freeshell.org/onos-cluster/onos1.conf
3) ONOS config/cluster.json . All nodes have the same cluster.json . ONOS was started with "bazel run onos-local -- clean debug". http://tjcw.freeshell.org/onos-cluster/cluster.json

(I tried attaching the files, but got errors from the attach page)

Anurag Chadha

Apr 26, 2021, 11:11:20 AM
to Chris Ward, ONOS Developers
Hi Chris,

You are using an incorrect cluster-id in atomix.conf. The cluster-id in atomix.conf should match the name field in cluster.json.
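
A quick way to check this could look like the sketch below. It assumes atomix.conf spells the setting either as cluster-id: "..." (HOCON style) or "clusterId": "..." (JSON style); the function names are made up for illustration and are not part of ONOS or Atomix.

```python
import json
import re


def atomix_cluster_id(conf_text):
    """Return the cluster id found in atomix.conf text, or None if absent."""
    # Assumption: the key is written as cluster-id or clusterId, optionally quoted.
    match = re.search(r'"?cluster-?[iI]d"?\s*[:=]\s*"([^"]+)"', conf_text)
    return match.group(1) if match else None


def onos_cluster_name(cluster_json_text):
    """Return the top-level "name" field of an ONOS cluster.json."""
    return json.loads(cluster_json_text).get("name")


def ids_match(conf_text, cluster_json_text):
    """True when the Atomix cluster id equals the ONOS cluster name."""
    cluster_id = atomix_cluster_id(conf_text)
    return cluster_id is not None and cluster_id == onos_cluster_name(cluster_json_text)
```

Reading both files from disk and passing their contents to ids_match would flag the mismatch before either service is started.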

For more details, please refer:

Regards,
Anurag


Chris Ward

Apr 26, 2021, 11:51:38 AM
to ONOS Developers, Anurag Chadha, ONOS Developers, Chris Ward
I changed my ONOS cluster.json to
{
  "name": "atomix",
  "storage": [
    {
      "id": "atomix-1",
      "ip": "192.168.122.20",
      "port": 5679
    },
    {
      "id": "atomix-2",
      "ip": "192.168.122.21",
      "port": 5679
    },
    {
      "id": "atomix-3",
      "ip": "192.168.122.22",
      "port": 5679
    }
  ]
}
but this didn't fix the issue; I still have 3 separate ONOS instances, and the screenshot is the same as before.
Next I will try copying the ONOS and Atomix configurations from the wiki page mentioned above. All help appreciated !

Anurag Chadha

Apr 26, 2021, 12:37:36 PM
to Chris Ward, ONOS Developers
Hi Chris,

Do try the configuration mentioned in the link shared earlier and then let me know.

Regards,
Anurag

Chris Ward

Apr 27, 2021, 5:49:28 AM
to ONOS Developers, Anurag Chadha, ONOS Developers, Chris Ward
I have set up the configuration as per the wiki; my Atomix configuration is http://tjcw.freeshell.org/onos-cluster-2/onos1.conf
(with appropriate changes for nodes 2 and 3) and my cluster.json is unchanged as http://tjcw.freeshell.org/onos-cluster-2/cluster.json
I still get the same screenshot, i.e. the ONOS nodes are not clustering.

I don't see anything in the ONOS log about processing the cluster.json file, and as part of my attempts to set things up I made a run with an incorrect JSON file and there was nothing in the log to report this. My JSON file is at /home/openstack/onos-main/onos/config/cluster.json , and ONOS is built at /home/openstack/onos-main/onos/ ; can you check that this is the right location for the cluster.json file ?

One of the lines in the ONOS log is
                  | 128 - io.atomix.utils - 3.1.5 | RaftServer{raft-partition-1} - Found leader 127.0.0.1
Does this mean ONOS is built with atomix client library version 3.1.5, and is this compatible with the atomix server version 3.0.7 that the wiki page had me set up ?

Anurag Chadha

Apr 27, 2021, 8:08:09 AM
to Chris Ward, ONOS Developers
Hi Chris,

You can build ONOS on any machine and then copy onos.tar.gz to your VMs. Place cluster.json in the config folder of the directory where you untar onos.tar.gz.
I guess this is the step that is missing, as cluster.json is required at runtime and not at compile time.

Similarly, atomix.conf is also required at runtime, so you need to copy it into the conf folder of the directory where you untarred the Atomix distribution.

Regards,
Anurag

Chris Ward

Apr 27, 2021, 9:29:35 AM
to ONOS Developers, Anurag Chadha, ONOS Developers, Chris Ward
I build ONOS in the VMs with
cd /home/openstack/onos-main/onos
bazel build onos

and run with
cd /home/openstack/onos-main/onos
bazel run onos-local -- clean debug

It looks like the 'run' command makes a directory /tmp/onos-2.5.0-SNAPSHOT and sets things up to run from there. There is a file /tmp/onos-2.5.0-SNAPSHOT/config/cluster.json with contents
    {
      "name": "default-12418",
      "node": {
        "id": "127.0.0.1",
        "ip": "127.0.0.1",
        "port": 9876
      },
      "clusterSecret": "17440"
    }
i.e. not my cluster.json, but looks to be what the node is running with.
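
To tell the two kinds of file apart quickly, a small summary helper could look like the sketch below. It assumes, based only on the two files shown in this thread, that a cluster.json without a "storage" list is the generated single-node default; the function name is hypothetical.

```python
import json


def describe_cluster_config(text):
    """Summarise a cluster.json: its name and the storage (Atomix) node addresses."""
    config = json.loads(text)
    storage = config.get("storage", [])
    return {
        "name": config.get("name"),
        "storage_nodes": [f'{node["ip"]}:{node["port"]}' for node in storage],
        # Assumption: no "storage" list means the generated single-node default.
        "looks_standalone": not storage,
    }
```

Running this against the file in the /tmp run directory shows whether the node picked up the hand-written configuration or the generated one.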

How should I run ONOS? I don't know where the onos.tar.gz file is, and I don't know what command I should issue to run ONOS after unpacking it. If you could tell me the right command I would be most grateful (or point me at the right wiki page).

Anurag Chadha

Apr 27, 2021, 9:49:08 AM
to Chris Ward, ONOS Developers
Hi Chris,

After building ONOS, onos.tar.gz is created in the bazel-bin folder. You can untar the file anywhere; just remember to place the cluster.json file in the config folder inside the extracted folder.
To run ONOS, run ./bin/karaf from the apache-karaf directory inside the extracted folder.
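
The unpack-and-copy step could be scripted roughly as below. This is only a sketch: it assumes the archive has a single top-level directory, and the function name and arguments are made up for illustration. Starting Karaf afterwards is left as a manual step.

```python
import shutil
import tarfile
from pathlib import Path


def install_cluster_config(tarball, cluster_json, dest):
    """Unpack an ONOS tarball under dest and copy cluster.json into its config folder."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball) as archive:
        # Assumption: the archive contains exactly one top-level directory.
        top_level = archive.getnames()[0].split("/")[0]
        archive.extractall(dest)
    config_dir = dest / top_level / "config"
    config_dir.mkdir(parents=True, exist_ok=True)
    target = config_dir / "cluster.json"
    shutil.copyfile(cluster_json, target)
    return target
```

The returned path is where the copied cluster.json ends up, which makes it easy to confirm the file is in place before launching ONOS.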

You can also run ONOS as a service by following the link below (the init folder is inside the extracted folder):

Regards,
Anurag

Chris Ward

Apr 27, 2021, 9:49:25 AM
to ONOS Developers, Chris Ward, Anurag Chadha, ONOS Developers
I found onos.tar.gz in the bazel-bin directory in the build tree. It looks as if to run a cluster I need to set up file /tmp/onos-2.5.0-SNAPSHOT/config/cluster.json and then run with
cd /home/openstack/onos-main/onos
bazel run onos-local -- debug

i.e. miss out the 'clean' parameter. Is that what is expected ?

Chris Ward

Apr 27, 2021, 10:36:41 AM
to ONOS Developers, Chris Ward, Anurag Chadha, ONOS Developers
I'm getting a bit further now. But the atomix servers are giving exceptions :
15:33:29.027 [netty-messaging-event-epoll-client-2] ERROR i.a.c.messaging.impl.MessageDecoder - Exception inside channel handling pipeline.
io.netty.handler.codec.DecoderException: java.net.ProtocolException: Unsupported protocol version: 8195
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:459) ~[netty-codec-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:392) ~[netty-codec-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:359) ~[netty-codec-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:342) ~[netty-codec-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245) ~[netty-transport-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231) ~[netty-transport-4.1.27.Final.jar:4.1.27.Final]
and the ONOS servers are writing error messages to log :
2021-04-27T15:34:36,787 | WARN  | atomix-partition-group-membership-service-0 | DefaultPartitionGroupMembershipService | 125 - io.atomix.primitive - 3.1.5 | Failed to locate system partition group via bootstrap nodes. Please ensure partition groups are configured either locally or remotely and the node is able to reach partition group members.
2021-04-27T15:34:41,789 | WARN  | atomix-partition-group-membership-service-0 | DefaultPartitionGroupMembershipService | 125 - io.atomix.primitive - 3.1.5 | Failed to locate system partition group via bootstrap nodes. Please ensure partition groups are configured either locally or remotely and the node is able to reach partition group members.

This looks like it could be a mismatch in atomix levels between client and server. My next try will be to set up atomix servers at level 3.1.5 .

Chris Ward

Apr 27, 2021, 11:04:57 AM
to ONOS Developers, Chris Ward, Anurag Chadha, ONOS Developers
Am I right in thinking that I need Atomix 3.1.5 ? I can't find a prebuilt version of this level; I can find source code which builds with 'mvn compile', but I don't know how to run the resulting package.

Anurag Chadha

Apr 27, 2021, 11:24:50 AM
to Chris Ward, ONOS Developers
You can run the curl command to get the 3.1.9 distribution of Atomix, which is the latest version now.
Anurag

Chris Ward

Apr 27, 2021, 12:04:09 PM
to ONOS Developers, Anurag Chadha, ONOS Developers, Chris Ward
I downloaded the 3.1.5 distribution for atomix from https://repo1.maven.org/maven2/io/atomix/atomix-dist/ , and cluster ONOS is working for me now. I think the wiki needs updating :
1) The version of atomix server needs to agree with the version of atomix in ONOS. ONOS in non-cluster-mode will write the atomix version in its logging.
2) There should be an approved way of placing the cluster.json into /tmp/*/config  (or whichever the right directory is).
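
Point 1 can be checked mechanically. The sketch below pulls the Atomix version out of an ONOS log line of the shape quoted earlier in this thread ("| 128 - io.atomix.utils - 3.1.5 |") and compares it with the server version. The exact-match rule simply mirrors what worked here; it is an assumption, not a documented compatibility policy, and the function names are hypothetical.

```python
import re

# Matches the bundle column of an ONOS log line, e.g. "| 128 - io.atomix.utils - 3.1.5 |".
ATOMIX_BUNDLE = re.compile(
    r"\|\s*\d+\s*-\s*io\.atomix\.[\w.]+\s*-\s*(\d+(?:\.\d+)+)\s*\|"
)


def atomix_version_from_log(line):
    """Return the Atomix bundle version named in an ONOS log line, or None."""
    match = ATOMIX_BUNDLE.search(line)
    return match.group(1) if match else None


def versions_agree(log_line, server_version):
    """True when the client version in the log equals the Atomix server version."""
    client_version = atomix_version_from_log(log_line)
    return client_version is not None and client_version == server_version
```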

I need to think a little more about the right words for the wiki.

Johnson0707

Jan 29, 2023, 4:07:08 AM
to ONOS Developers, Chris Ward, anurag1...@gmail.com, ONOS Developers

Hi, I'm new to ONOS clusters.
I want to use onos-2.7.0 and atomix-3.1.12 on three virtual machines.
Could you tell me the relevant steps, or point me to a tutorial?
I've followed the instructions on the wiki once, but that didn't work.
Thanks a lot.
On Wednesday, April 28, 2021 at 12:04:09 AM UTC+8, Chris Ward wrote:

Emre KARAKIŞ

Aug 18, 2024, 2:41:57 PM
to ONOS Developers, Johnson0707, Chris Ward, anurag1...@gmail.com, ONOS Developers
Hi all, 

I am also having trouble setting up an ONOS cluster on 3 physical machines with IP addresses 192.168.1.117, 192.168.1.108 and 192.168.1.107. The latest ONOS 2.7.0 distribution (X-Wing), cloned from the master branch on GitHub, is installed on each of the machines, all of which run Ubuntu 20.04. I noticed a config folder under the ONOS directory and put a cluster.json file into it on each machine, configured as shown below.

Cluster.json on 192.168.1.117 machine
{
    "name": "onos",
    "node": {
        "id": "192.168.1.117",
        "ip": "192.168.1.117",
        "port": 9876

    },
    "storage": [
        {
            "id": "atomix-1",
            "ip": "192.168.1.117",

            "port": 5679
        },
        {
            "id": "atomix-2",
            "ip": "192.168.1.108",

            "port": 5679
        },
        {
            "id": "atomix-3",
            "ip": "192.168.1.107",
            "port": 5679
        }
    ]
}


Cluster.json on 192.168.1.108 machine
{
    "name": "onos",
    "node": {
        "id": "192.168.1.108",
        "ip": "192.168.1.108",
        "port": 9876

    },
    "storage": [
        {
            "id": "atomix-1",
            "ip": "192.168.1.117",

            "port": 5679
        },
        {
            "id": "atomix-2",
            "ip": "192.168.1.108",

            "port": 5679
        },
        {
            "id": "atomix-3",
            "ip": "192.168.1.107",
            "port": 5679
        }
    ]
}


Cluster.json on 192.168.1.107 machine
{
    "name": "onos",
    "node": {
        "id": "192.168.1.107",
        "ip": "192.168.1.107",
        "port": 9876

    },
    "storage": [
        {
            "id": "atomix-1",
            "ip": "192.168.1.117",

            "port": 5679
        },
        {
            "id": "atomix-2",
            "ip": "192.168.1.108",

            "port": 5679
        },
        {
            "id": "atomix-3",
            "ip": "192.168.1.107",
            "port": 5679
        }
    ]
}
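
The three files above differ only in the "node" block, so they could be produced from one list of addresses. The sketch below is an illustration only: the function name, the defaults and the atomix-N naming simply copy the files above, and it is not a substitute for the onos-gen-config tool.

```python
import json


def build_cluster_config(node_ip, storage_ips, name="onos",
                         node_port=9876, storage_port=5679):
    """Build the cluster.json structure for one ONOS node."""
    return {
        "name": name,
        "node": {"id": node_ip, "ip": node_ip, "port": node_port},
        # Storage ids follow the atomix-1, atomix-2, ... pattern used above.
        "storage": [
            {"id": f"atomix-{index}", "ip": ip, "port": storage_port}
            for index, ip in enumerate(storage_ips, start=1)
        ],
    }


if __name__ == "__main__":
    ips = ["192.168.1.117", "192.168.1.108", "192.168.1.107"]
    for ip in ips:
        print(json.dumps(build_cluster_config(ip, ips), indent=4))
```

Generating all three from the same list removes the risk of one file drifting from the others through a copy-and-paste slip.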

In addition to the cluster.json file, I cloned Atomix version 3.1.12 into my home directory and renamed the folder to atomix. I copied it to each machine under the same name. I do not know whether Atomix needs to be installed on each machine separately, but I copied it to all of them anyway. There is an atomix.conf file within this directory (/home/emrekarakis/atomix/config). I configured these files as shown below:

Atomix.conf on 192.168.1.117
{
    "cluster": {
        "clusterId": "onos",
        "node": {
            "id": "atomix-1",
            "address": "192.168.1.117:5679"
        },
        "discovery": {
            "type": "bootstrap",
            "nodes": [
                {
                    "id": "atomix-1",
                    "address": "192.168.1.117:5679"
                },
                {
                    "id": "atomix-2",
                    "address": "192.168.1.108:5679"
                },
                {
                    "id": "atomix-3",
                    "address": "192.168.1.107:5679"
                }
            ]
        }
    },
    "managementGroup": {
        "type": "raft",
        "partitions": 1,
        "partitionSize": 3,
        "members": [
            "atomix-1",
            "atomix-2",
            "atomix-3"
        ],
        "storage": {
            "level": "mapped"
        }
    },
    "partitionGroups": {
        "raft": {
            "type": "raft",
            "partitions": 3,
            "partitionSize": 3,
            "members": [
                "atomix-1",
                "atomix-2",
                "atomix-3"
            ],
            "storage": {
                "level": "mapped"
            }
        }
    }
}
Atomix.conf on 192.168.1.108
cluster {
  cluster-id: "onos"
  node {
    id: "atomix-2"
    address: "192.168.1.108:5679"
  }
  discovery {
    type: "bootstrap"
    nodes {
      node.1 {
        id: "atomix-1"
        address: "192.168.1.117:5679"
      }
      node.2 {
        id: "atomix-2"
        address: "192.168.1.108:5679"
      }
      node.3 {
        id: "atomix-3"
        address: "192.168.1.107:5679"
      }
    }
  }
}

management-group {
  type: "raft"
  partitions: 1
  partition-size: 3
  members: ["atomix-1", "atomix-2", "atomix-3"]
  storage.level: "disk"
}

partition-groups {
  raft {
    type: "raft"
    partitions: 3
    partition-size: 3
    members: ["atomix-1", "atomix-2", "atomix-3"]
    storage.level: "disk"
  }
}

Atomix.conf on 192.168.1.107
cluster {
  cluster-id: "onos"
  node {
    id: "atomix-3"
    address: "192.168.1.107:5679"
  }
  discovery {
    type: "bootstrap"
    nodes {
      node.1 {
        id: "atomix-1"
        address: "192.168.1.117:5679"
      }
      node.2 {
        id: "atomix-2"
        address: "192.168.1.108:5679"
      }
      node.3 {
        id: "atomix-3"
        address: "192.168.1.107:5679"
      }
    }
  }
}

management-group {
  type: "raft"
  partitions: 1
  partition-size: 3
  members: ["atomix-1", "atomix-2", "atomix-3"]
  storage.level: "disk"
}

partition-groups {
  raft {
    type: "raft"
    partitions: 3
    partition-size: 3
    members: ["atomix-1", "atomix-2", "atomix-3"]
    storage.level: "disk"
  }
}
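
Since the HOCON files above differ only in the local node block, a single template could render all of them in one consistent syntax. The sketch below copies the layout and values of the files above; the function name and parameters are hypothetical, and the output has not been validated against an Atomix agent.

```python
def render_atomix_conf(local_index, ips, cluster_id="onos", port=5679,
                       partitions=3, partition_size=3):
    """Render atomix.conf text for the node at 1-based position local_index in ips."""
    members = [f"atomix-{i}" for i in range(1, len(ips) + 1)]
    quoted = ", ".join(f'"{member}"' for member in members)
    lines = [
        "cluster {",
        f'  cluster-id: "{cluster_id}"',
        "  node {",
        f'    id: "{members[local_index - 1]}"',
        f'    address: "{ips[local_index - 1]}:{port}"',
        "  }",
        "  discovery {",
        '    type: "bootstrap"',
        "    nodes {",
    ]
    for i, (member, ip) in enumerate(zip(members, ips), start=1):
        lines += [
            f"      node.{i} {{",
            f'        id: "{member}"',
            f'        address: "{ip}:{port}"',
            "      }",
        ]
    lines += ["    }", "  }", "}", ""]
    # Settings shared by the management group and the raft partition group.
    raft = [
        '  type: "raft"',
        f"  partition-size: {partition_size}",
        f"  members: [{quoted}]",
        '  storage.level: "disk"',
    ]
    lines += ["management-group {", "  partitions: 1", *raft, "}", ""]
    lines += ["partition-groups {", "  raft {", f"    partitions: {partitions}"]
    lines += ["  " + entry for entry in raft]
    lines += ["  }", "}"]
    return "\n".join(lines) + "\n"
```

Calling render_atomix_conf(1, ips), render_atomix_conf(2, ips) and render_atomix_conf(3, ips) yields one file per machine with identical member lists.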


Additionally, I noticed that after building ONOS, a folder called ONOS-SNAPSHOT-3.0.0 is created in the /tmp directory. Similar to the problem Chris Ward experienced, even if I configure the cluster.json file and copy it manually to the /tmp/ONOS-SNAPSHOT-3.0.0/config/ folder, it gets overwritten by the automatically generated version at runtime. After performing these configurations on the machines, the cluster is still not created on the instances described above, and each node recognizes only itself.

 atomix-gen-config 192.168.1.117 ~/atomix.conf 192.168.1.117 192.168.1.108 192.168.1.107 
 onos-gen-config 192.168.1.117 ~/cluster.json -n 192.168.1.117 192.168.1.108 192.168.1.107

I would like to ask several questions related to the cluster configuration of ONOS instances. 

  • Does ONOS contain an Atomix installation folder within itself? Where should the atomix.conf files be placed?
  • Similarly, I do not know exactly where I should put cluster.json. What is wrong with this configuration, and what additional configuration do I need?
  • After a bit of searching, I found a cluster.sh file and adapted it to my environment, inspired by Raúl Álvarez. It does not give any error, and it is able to copy the corresponding files into the specified directory on the other machines. However, after running the nodes again, the cluster still does not appear.
  • From which directory does ONOS read the atomix.conf and cluster.json files, and how do I integrate an Atomix installation residing outside of ONOS with ONOS?

#!/bin/bash

# Set IP addresses of your machines
ATOMIX_NODES=("192.168.1.117" "192.168.1.108" "192.168.1.107")
ONOS_NODES=("192.168.1.117" "192.168.1.108" "192.168.1.107")

# SSH user
USER="emrekarakis"
PASSWORD="1"

# Path to ONOS and Atomix directories
ONOS_DIR="/home/$USER/onos"
ATOMIX_DIR="/home/$USER/atomix"

# Function to check and kill process using port 5679
check_and_kill_port() {
    local node_ip=$1
    ssh $USER@$node_ip "if sudo lsof -i :5679; then echo $PASSWORD | sudo -S fuser -k 5679/tcp; fi"
}

# Function to generate Atomix config
generate_atomix_config() {
    local node_ip=$1
    ssh $USER@$node_ip "cd $ONOS_DIR/tools/test/bin && ./atomix-gen-config $node_ip /tmp/atomix.conf ${ATOMIX_NODES[*]}"
    ssh $USER@$node_ip "echo $PASSWORD | sudo -S mv /tmp/atomix.conf $ATOMIX_DIR/conf/atomix.conf"
    echo "Atomix config generated on $node_ip"
}

# Function to generate ONOS cluster config
generate_onos_config() {
    local node_ip=$1
    ssh $USER@$node_ip "cd $ONOS_DIR/tools/test/bin && echo $PASSWORD | sudo -S ./onos-gen-config $node_ip /tmp/cluster.json -n ${ATOMIX_NODES[*]}"
    ssh $USER@$node_ip "if [ -f /tmp/cluster.json ]; then echo $PASSWORD | sudo -S mv /tmp/cluster.json $ONOS_DIR/config/cluster.json; else echo 'cluster.json not found on $node_ip'; fi"
    ssh $USER@$node_ip "if [ -f /tmp/cluster.json ]; then echo $PASSWORD | sudo -S cp /tmp/cluster.json /tmp/onos-3.0.0-SNAPSHOT/config/cluster.json; fi"
    echo "ONOS config generated on $node_ip and copied to necessary directories"
}

# Function to ensure the correct cluster.json is used during ONOS startup
ensure_correct_cluster_config() {
    local node_ip=$1
    # Copy the correct cluster.json to the ONOS config directory
    ssh $USER@$node_ip "echo $PASSWORD | sudo -S cp $ONOS_DIR/config/cluster.json $ONOS_DIR/config/cluster.json"
    # Ensure it's also copied to the runtime config directory
    ssh $USER@$node_ip "echo $PASSWORD | sudo -S cp $ONOS_DIR/config/cluster.json /tmp/onos-3.0.0-SNAPSHOT/config/cluster.json"
}

# Function to start Atomix and ONOS on each node
start_services() {
    local node_ip=$1

    # Check and kill any process using port 5679
    check_and_kill_port $node_ip

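    # \$(hostname) is escaped so that it expands on the remote node, not on the machine running this script
    ssh $USER@$node_ip "cd $ATOMIX_DIR && echo $PASSWORD | sudo -S ./bin/atomix-agent -m \$(hostname) -a $node_ip:5679"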
    ssh $USER@$node_ip "cd $ONOS_DIR && echo $PASSWORD | sudo -S ./tools/package/bin/onos-service start"
    echo "Services started on $node_ip"
}

# Main function to setup the cluster
setup_cluster() {
    for ip in "${ATOMIX_NODES[@]}"; do
        generate_atomix_config $ip
    done

    for ip in "${ONOS_NODES[@]}"; do
        generate_onos_config $ip
        ensure_correct_cluster_config $ip
    done

    for ip in "${ATOMIX_NODES[@]}"; do
        start_services $ip
    done
}

# Run the setup
setup_cluster


What else needs to be done in the cluster configuration so that all nodes work together properly?

Best Regards,
Emre Karakış  

On Sunday, January 29, 2023 at 12:07:08 UTC+3, Johnson0707 wrote: