VOLTHA deployment in a 3-node Kubernetes cluster

Jerry Travlos

Sep 5, 2018, 7:34:02 AM

to VOLTHA Discuss

Hello,

I'm trying to deploy VOLTHA in a 3-node Kubernetes cluster.

The 3 target nodes are Ubuntu Server 16.04 VMs running on a bare-metal server.

I'm using a second bare-metal server (Ubuntu 16.04) as a development machine.

All machines can ping each other.

I installed Kubernetes in the 3-node cluster by following https://guide.opencord.org/prereqs/k8s-multi-node.html.

Then I installed kubectl by following https://kubernetes.io/docs/tasks/tools/install-kubectl/ and Helm by following https://guide.opencord.org/prereqs/helm.html.

Finally, I'm trying to deploy VOLTHA by following https://guide.opencord.org/charts/voltha.html.

Could someone clarify the following:

1. Should the process of installing the VOLTHA Helm chart (https://guide.opencord.org/charts/voltha.html) be repeated on all target nodes?

2. What if I would like to test some code changes and need to redeploy with the changed VOLTHA?

3. Is this the proper way to deploy VOLTHA in a multi-node Kubernetes cluster, or should the installer script approach described in voltha/install/BuildingTheInstaller.md be followed?

Thanks

Jerry

David Bainbridge

Sep 5, 2018, 9:44:45 AM

to Jerry Travlos, VOLTHA Discuss

On Wed, Sep 5, 2018 at 4:34 AM Jerry Travlos <makis....@gmail.com> wrote:

1. Should the process of installing the VOLTHA Helm chart (https://guide.opencord.org/charts/voltha.html) be repeated on all target nodes?

The Helm command only needs to be executed on one of the nodes.

2. What if I would like to test some code changes and need to redeploy with the changed VOLTHA?

This gets a little trickier, and there are likely multiple ways to accomplish what you want. The essence of what you need to do is create a new version of the Docker image that contains your changes and update the Kubernetes cluster with that new image. It might be easiest to delete the VOLTHA instance and just reinstall it via Helm. You won't need to reinstall Kubernetes, just VOLTHA using Helm.
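
As a rough illustration, the rebuild-and-redeploy cycle could be scripted along the following lines. This is only a sketch: the registry address `registry.local:5000`, the image name `voltha-voltha`, the tag `dev`, and the build directory are placeholders rather than values from this thread, and the chart would also have to be configured to pull from that registry. The build step is shown as a plain `docker build`; the VOLTHA repository has its own build tooling, which may be the better choice. `helm delete --purge` assumes Helm 2, and `helm install -n voltha voltha` is expected to be run from the helm-charts directory. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
# Sketch: rebuild one VOLTHA image and redeploy the chart.
# REGISTRY, IMAGE, TAG and SRC_DIR are placeholders (assumptions).
REGISTRY=${REGISTRY:-registry.local:5000}
IMAGE=${IMAGE:-voltha-voltha}
TAG=${TAG:-dev}
SRC_DIR=${SRC_DIR:-.}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

redeploy_voltha() {
  # Build and publish an image that contains the code changes.
  run docker build -t "$REGISTRY/$IMAGE:$TAG" "$SRC_DIR" || return
  run docker push "$REGISTRY/$IMAGE:$TAG" || return
  # Remove the running release, then install the chart again (Helm 2 syntax).
  run helm delete --purge voltha || return
  run helm install -n voltha voltha
}

redeploy_voltha
```

Kubernetes itself is left untouched; only the VOLTHA release is removed and installed again.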

3. Is this the proper way to deploy VOLTHA in a multi-node Kubernetes cluster, or should the installer script approach described in voltha/install/BuildingTheInstaller.md be followed?

Using Helm is the correct way.

--
You received this message because you are subscribed to the Google Groups "VOLTHA Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to voltha-discus...@opencord.org.
To post to this group, send email to voltha-...@opencord.org.
Visit this group at https://groups.google.com/a/opencord.org/group/voltha-discuss/.
To view this discussion on the web visit https://groups.google.com/a/opencord.org/d/msgid/voltha-discuss/3730a328-d23d-4ba9-a90a-c2ab4ce86efe%40opencord.org.
For more options, visit https://groups.google.com/a/opencord.org/d/optout.

makis....@gmail.com

Sep 17, 2018, 7:11:35 AM

to VOLTHA Discuss, makis....@gmail.com

Hi,

I deployed VOLTHA in a 3-node cluster ("helm install -n voltha voltha") using a local Docker registry with images from VOLTHA 1.4.

The voltha container fails to start because the lookup of the KV store's IP address fails:

cord@node1:~/helm-charts$ kubectl get pod -n voltha
NAME                                        READY   STATUS             RESTARTS   AGE
default-http-backend-5c6d95c48-mprgr        1/1     Running            0          15m
freeradius-6d49d9588b-xxtxx                 1/1     Running            0          15m
netconf-75796c6558-pvpfr                    1/1     Running            0          15m
nginx-ingress-controller-566c84c9fd-frwwz   1/1     Running            0          15m
ofagent-57b8c8d77d-rxsh7                    1/1     Running            0          15m
vcli-5dd959d78f-mmms2                       1/1     Running            0          15m
vcore-0                                     1/1     Running            0          15m
voltha-6dd5f6d69-5zj8x                      0/1     CrashLoopBackOff   4          15m

cord@node1:~/helm-charts$ kubectl -n voltha logs voltha-6dd5f6d69-7q2qm

2018-09-13 11:17:23.600161 I | KV-store etcd at etcd-cluster.default.svc.cluster.local:2379

2018-09-13 11:17:35.811178 I | etcd-cluster.default.svc.cluster.local name resolution failed 1 time(s) retrying...

2018-09-13 11:17:46.946168 I | etcd-cluster.default.svc.cluster.local name resolution failed 2 time(s) retrying...

2018-09-13 11:17:56.048312 I | etcd-cluster.default.svc.cluster.local name resolution failed 3 time(s) retrying...

2018-09-13 11:17:58.053973 I | etcd-cluster.default.svc.cluster.local name resolution failed 4 time(s) retrying...

2018-09-13 11:18:00.058159 I | etcd-cluster.default.svc.cluster.local name resolution failed 5 time(s) retrying...

2018-09-13 11:18:02.158812 I | etcd-cluster.default.svc.cluster.local name resolution failed 6 time(s) retrying...

2018-09-13 11:18:04.163029 I | etcd-cluster.default.svc.cluster.local name resolution failed 7 time(s) retrying...

2018-09-13 11:18:06.231449 I | etcd-cluster.default.svc.cluster.local name resolution failed 8 time(s) retrying...

2018-09-13 11:18:08.235489 I | etcd-cluster.default.svc.cluster.local name resolution failed 9 time(s) retrying...

2018-09-13 11:18:10.239725 I | etcd-cluster.default.svc.cluster.local name resolution failed 10 time(s) retrying...

2018-09-13 11:18:12.243695 I | etcd-cluster.default.svc.cluster.local name resolution failed 10 times giving up

2018-09-13 11:18:12.243735 I | Can't proceed without KV store's vIP address: %slookup etcd-cluster.default.svc.cluster.local on 10.233.0.3:53: no such host

The nslookup command also fails inside a busybox pod:

cord@node1:~/helm-charts$ kubectl exec -ti busybox -- nslookup kubernetes.default

Server: 10.233.0.3

Address 1: 10.233.0.3 kube-dns.kube-system.svc.cluster.local

nslookup: can't resolve 'kubernetes.default'

command terminated with exit code 1

Here is resolv.conf:

cord@node1:~/helm-charts$ kubectl exec busybox cat /etc/resolv.conf

nameserver 10.233.0.3

search default.svc.cluster.local svc.cluster.local cluster.local

options ndots:5
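
To make the effect of the `search` and `ndots` lines concrete, the sketch below lists the names a glibc-style resolver would try, in order, for a short name such as `kubernetes.default`. The function name `expand_candidates` is hypothetical, and busybox's resolver may not follow exactly the same order, so treat this as an approximation of the lookup sequence rather than a trace of what nslookup actually sent.

```shell
# Print the names a glibc-style resolver would try, in order, for NAME,
# using the "search" and "options ndots:N" lines of a resolv.conf file.
expand_candidates() {
  local name conf ndots search key rest opt dots domain
  name=$1
  conf=$2
  ndots=1        # resolver default when no ndots option is present
  search=""
  while read -r key rest; do
    case $key in
      search) search=$rest ;;
      options)
        for opt in $rest; do
          case $opt in ndots:*) ndots=${opt#ndots:} ;; esac
        done ;;
    esac
  done < "$conf"
  # Count the dots in the queried name.
  dots=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))
  # A name with at least ndots dots is tried as-is before the search list.
  if [ "$dots" -ge "$ndots" ]; then echo "$name."; fi
  for domain in $search; do echo "$name.$domain."; done
  # Otherwise the name is tried as-is only after the search list.
  if [ "$dots" -lt "$ndots" ]; then echo "$name."; fi
}

# Example with the resolv.conf contents shown above.
conf=$(mktemp)
cat > "$conf" <<'EOF'
nameserver 10.233.0.3
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
EOF
expand_candidates kubernetes.default "$conf"
rm -f "$conf"
```

The second candidate printed, `kubernetes.default.svc.cluster.local.`, is the full service name, so the search list itself looks correct for this lookup.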

But I don't see any failures in the kube-dns pods:

cord@node1:~/helm-charts$ kubectl get pod -n kube-system
NAME                                    READY   STATUS    RESTARTS   AGE
calico-node-6spkf                       1/1     Running   0          17d
calico-node-qqdb2                       1/1     Running   0          17d
calico-node-zwkws                       1/1     Running   1          17d
kube-apiserver-node1                    1/1     Running   1          17d
kube-apiserver-node2                    1/1     Running   1          17d
kube-controller-manager-node1           1/1     Running   0          17d
kube-controller-manager-node2           1/1     Running   0          17d
kube-dns-7bd4d5fbb6-85glj               3/3     Running   13         2d
kube-dns-7bd4d5fbb6-c6tlh               3/3     Running   18         2d
kube-proxy-node1                        1/1     Running   1          17d
kube-proxy-node2                        1/1     Running   1          17d
kube-proxy-node3                        1/1     Running   1          17d
kube-scheduler-node1                    1/1     Running   0          17d
kube-scheduler-node2                    1/1     Running   0          17d
kubedns-autoscaler-679b8b455-c89zg      1/1     Running   0          17d
kubernetes-dashboard-55fdfd74b4-gx6gg   1/1     Running   0          17d
nginx-proxy-node3                       1/1     Running   1          17d
tiller-deploy-5c688d5f9b-sshbh          1/1     Running   0          17d

cord@node1:~/helm-charts$ kubectl -n kube-system logs kube-dns-7bd4d5fbb6-c6tlh kubedns

I0914 13:30:50.684351       1 dns.go:48] version: 1.14.10

I0914 13:30:50.689707       1 server.go:69] Using configuration read from directory: /kube-dns-config with period 10s

I0914 13:30:50.689784       1 server.go:121] FLAG: --alsologtostderr="false"

I0914 13:30:50.689798       1 server.go:121] FLAG: --config-dir="/kube-dns-config"

I0914 13:30:50.689807       1 server.go:121] FLAG: --config-map=""

I0914 13:30:50.689813       1 server.go:121] FLAG: --config-map-namespace="kube-system"

I0914 13:30:50.689818       1 server.go:121] FLAG: --config-period="10s"

I0914 13:30:50.689826       1 server.go:121] FLAG: --dns-bind-address="0.0.0.0"

I0914 13:30:50.689832       1 server.go:121] FLAG: --dns-port="10053"

I0914 13:30:50.689841       1 server.go:121] FLAG: --domain="cluster.local."

I0914 13:30:50.689849       1 server.go:121] FLAG: --federations=""

I0914 13:30:50.689857       1 server.go:121] FLAG: --healthz-port="8081"

I0914 13:30:50.689863       1 server.go:121] FLAG: --initial-sync-timeout="1m0s"

I0914 13:30:50.689868       1 server.go:121] FLAG: --kube-master-url=""

I0914 13:30:50.689875       1 server.go:121] FLAG: --kubecfg-file=""

I0914 13:30:50.689881       1 server.go:121] FLAG: --log-backtrace-at=":0"

I0914 13:30:50.689889       1 server.go:121] FLAG: --log-dir=""

I0914 13:30:50.689896       1 server.go:121] FLAG: --log-flush-frequency="5s"

I0914 13:30:50.689901       1 server.go:121] FLAG: --logtostderr="true"

I0914 13:30:50.689907       1 server.go:121] FLAG: --nameservers=""

I0914 13:30:50.689912       1 server.go:121] FLAG: --stderrthreshold="2"

I0914 13:30:50.689918       1 server.go:121] FLAG: --v="2"

I0914 13:30:50.689923       1 server.go:121] FLAG: --version="false"

I0914 13:30:50.689932       1 server.go:121] FLAG: --vmodule=""

I0914 13:30:50.690050       1 server.go:169] Starting SkyDNS server (0.0.0.0:10053)

I0914 13:30:50.690382       1 server.go:179] Skydns metrics enabled (/metrics:10055)

I0914 13:30:50.690396       1 dns.go:188] Starting endpointsController

I0914 13:30:50.690402       1 dns.go:191] Starting serviceController

I0914 13:30:50.690502       1 dns.go:184] Configuration updated: {TypeMeta:{Kind: APIVersion:} Federations:map[] StubDomains:map[] UpstreamNameservers:[]}

I0914 13:30:50.693195       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]

I0914 13:30:50.693222       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]

I0914 13:30:51.197440       1 dns.go:222] Initialized services and endpoints from apiserver

I0914 13:30:51.197469       1 server.go:137] Setting up Healthz Handler (/readiness)

I0914 13:30:51.197485       1 server.go:142] Setting up cache handler (/cache)

I0914 13:30:51.197494       1 server.go:128] Status HTTP port 8081

I0917 09:25:15.717215       1 dns.go:601] Could not find endpoints for service "freeradius" in namespace "voltha". DNS records will be created once endpoints show up.

I0917 09:25:15.811153       1 dns.go:601] Could not find endpoints for service "netconf" in namespace "voltha". DNS records will be created once endpoints show up.

I0917 09:25:16.039319       1 dns.go:601] Could not find endpoints for service "vcore" in namespace "voltha". DNS records will be created once endpoints show up.

I0917 09:26:09.983292       1 dns.go:601] Could not find endpoints for service "etcd-cluster" in namespace "default". DNS records will be created once endpoints show up.
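
The last kubedns lines say that DNS records are only created once a service has endpoints, and one of the services listed is "etcd-cluster" in namespace "default", the same name the voltha container fails to resolve. A small helper can pull those services out of a longer log. This is a sketch that assumes the message format shown above; `services_without_endpoints` is a hypothetical name.

```shell
# Read kubedns log lines on stdin and print "namespace/service" for every
# service that kube-dns reported as having no endpoints.
services_without_endpoints() {
  sed -n 's|.*Could not find endpoints for service "\([^"]*\)" in namespace "\([^"]*\)".*|\2/\1|p' | sort -u
}

# Example using two of the log lines above.
services_without_endpoints <<'EOF'
I0917 09:25:16.039319       1 dns.go:601] Could not find endpoints for service "vcore" in namespace "voltha". DNS records will be created once endpoints show up.
I0917 09:26:09.983292       1 dns.go:601] Could not find endpoints for service "etcd-cluster" in namespace "default". DNS records will be created once endpoints show up.
EOF
```

Against the live cluster the input would come from `kubectl -n kube-system logs kube-dns-7bd4d5fbb6-c6tlh kubedns`. These messages are logged when a service first appears, so endpoints may have shown up since; `kubectl get ep etcd-cluster -n default` shows the current state.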

cord@node1:~/helm-charts$ kubectl -n kube-system logs kube-dns-7bd4d5fbb6-c6tlh dnsmasq

I0916 17:51:08.776540       1 main.go:74] opts: {{/usr/sbin/dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053] true} /etc/k8s/dns/dnsmasq-nanny 10000000000}

I0916 17:51:08.776929       1 nanny.go:94] Starting dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053]

I0916 17:51:08.869265       1 nanny.go:119]

W0916 17:51:08.869293       1 nanny.go:120] Got EOF from stdout

I0916 17:51:08.869352       1 nanny.go:116] dnsmasq[9]: started, version 2.78 cachesize 1000

I0916 17:51:08.869390       1 nanny.go:116] dnsmasq[9]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify

I0916 17:51:08.869421       1 nanny.go:116] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain ip6.arpa

I0916 17:51:08.869446       1 nanny.go:116] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa

I0916 17:51:08.869469       1 nanny.go:116] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain cluster.local

I0916 17:51:08.869535       1 nanny.go:116] dnsmasq[9]: reading /etc/resolv.conf

I0916 17:51:08.869563       1 nanny.go:116] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain ip6.arpa

I0916 17:51:08.869586       1 nanny.go:116] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa

I0916 17:51:08.869609       1 nanny.go:116] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain cluster.local

I0916 17:51:08.869633       1 nanny.go:116] dnsmasq[9]: using nameserver 10.233.0.3#53

I0916 17:51:08.869714       1 nanny.go:116] dnsmasq[9]: read /etc/hosts - 7 addresses

cord@node1:~/helm-charts$ kubectl -n kube-system logs kube-dns-7bd4d5fbb6-c6tlh sidecar

I0916 17:51:15.199545       1 main.go:51] Version v1.14.8.3

I0916 17:51:15.199834       1 server.go:45] Starting server (options {DnsMasqPort:53 DnsMasqAddr:127.0.0.1 DnsMasqPollIntervalMs:5000 Probes:[{Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1} {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}] PrometheusAddr:0.0.0.0 PrometheusPort:10054 PrometheusPath:/metrics PrometheusNamespace:kubedns})

I0916 17:51:15.199947       1 dnsprobe.go:75] Starting dnsProbe {Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}

I0916 17:51:15.203488       1 dnsprobe.go:75] Starting dnsProbe {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}

DNS service is up:

cord@node1:~/helm-charts$ kubectl get svc --namespace=kube-system
NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
kube-dns               ClusterIP   10.233.0.3      <none>        53/UDP,53/TCP   17d
kubernetes-dashboard   ClusterIP   10.233.55.167   <none>        443/TCP         17d
tiller-deploy          ClusterIP   10.233.31.12    <none>        44134/TCP       17d

DNS endpoints are exposed:

cord@node1:~/helm-charts$ kubectl get ep kube-dns --namespace=kube-system
NAME       ENDPOINTS                                                     AGE
kube-dns   10.233.71.10:53,10.233.71.14:53,10.233.71.10:53 + 1 more...   17d

The VM is running Ubuntu server 16.04 and uses a static IP address:

cord@node1:~/helm-charts$ cat /etc/network/interfaces

# This file describes the network interfaces available on your system

# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface

auto lo

iface lo inet loopback

# The primary network interface

auto enp0s8

iface enp0s8 inet static

address 10.85.185.188

gateway 10.85.185.145

netmask 255.255.255.224

Version of kubectl:

cord@node1:~/helm-charts$ kubectl version

Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.4", GitCommit:"5ca598b4ba5abb89bb773071ce452e33fb66339d", GitTreeState:"clean", BuildDate:"2018-06-06T08:00:59Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.4", GitCommit:"5ca598b4ba5abb89bb773071ce452e33fb66339d", GitTreeState:"clean", BuildDate:"2018-06-06T08:00:59Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

I have already tried various solutions suggested on the internet, with no luck.

Could some DNS configuration be missing or wrong?

Any ideas on how to solve this, or what else to check?

Thanks

makis....@gmail.com

Sep 24, 2018, 10:11:29 AM

to VOLTHA Discuss, makis....@gmail.com

Hi,

Could someone give a hint on the following please:

The voltha container no longer crashes, but etcd-operator logs errors and warnings such as the following (full etcd-operator log attached):

time="2018-09-24T11:35:00Z" level=error msg="failed to reconcile: fail to add new member (etcd-cluster-0001): context deadline exceeded" cluster-name=etcd-cluster pkg=cluster

time="2018-09-24T11:35:10Z" level=error msg="failed to reconcile: lost quorum" cluster-name=etcd-cluster pkg=cluster

time="2018-09-24T11:36:13Z" level=error msg="failed to update members: list members failed: creating etcd client failed: grpc: timed out when dialing" cluster-name=etcd-cluster pkg=cluster

time="2018-09-24T11:38:37Z" level=warning msg="all etcd pods are dead." cluster-name=etcd-cluster pkg=cluster

Nevertheless, voltha-etcd-operator is in the Running state.
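
For a quick overview of how often the operator logs errors versus warnings, the attached log could be summarized by level. The sketch below assumes every line has the `time="..." level=... msg="..."` layout shown above; `count_log_levels` is a hypothetical helper name.

```shell
# Read etcd-operator log lines on stdin and print how many lines were
# logged at each level, one "level count" pair per line.
count_log_levels() {
  sed -n 's/^time="[^"]*" level=\([a-z]*\) .*/\1/p' | sort | uniq -c | awk '{ print $2, $1 }'
}

# Example using the four log lines quoted above.
count_log_levels <<'EOF'
time="2018-09-24T11:35:00Z" level=error msg="failed to reconcile: fail to add new member (etcd-cluster-0001): context deadline exceeded" cluster-name=etcd-cluster pkg=cluster
time="2018-09-24T11:35:10Z" level=error msg="failed to reconcile: lost quorum" cluster-name=etcd-cluster pkg=cluster
time="2018-09-24T11:36:13Z" level=error msg="failed to update members: list members failed: creating etcd client failed: grpc: timed out when dialing" cluster-name=etcd-cluster pkg=cluster
time="2018-09-24T11:38:37Z" level=warning msg="all etcd pods are dead." cluster-name=etcd-cluster pkg=cluster
EOF
```

The same function can be fed the full attachment with `count_log_levels < voltha-etcd-operator.log`.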

Could we continue with voltha testing, even with these errors/warnings?

We have also followed the workaround for the known etcd-operator bug described in https://guide.opencord.org/charts/voltha.html.

Could the above errors be related to this bug?

Thanks,

Jerry

voltha-etcd-operator.log

makis....@gmail.com

Sep 28, 2018, 7:18:42 AM

to VOLTHA Discuss, makis....@gmail.com

Issues resolved by following:

https://groups.google.com/a/opencord.org/forum/#!topic/voltha-discuss/tHw6kfALzTc

Saurav Das

Sep 28, 2018, 1:48:41 PM

to makis....@gmail.com, VOLTHA Discuss

Questions regarding the deployment of VOLTHA in SEBA/CORD are best asked on the SEBA mailing list:

https://wiki.opencord.org/display/CORD/SEBA

