RCORD 6.0 : cord-kafka and voltha-kafka in CrashLoopBackOff state

udupa....@inventec.com

Sep 12, 2018, 2:03:35 PM
to CORD Discuss
Hi,

I am installing RCORD version 6.0 on a single-node cluster.
However, the cord-kafka and voltha-kafka pods are continuously crashing.
The error log is below:

[2018-09-12 17:51:24,054] INFO Client environment:user.dir=/ (org.apache.zookeeper.ZooKeeper)
[2018-09-12 17:51:24,056] INFO Initiating client connection, connectString=cord-kafka-zookeeper:2181 sessionTimeout=6000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@42a48628 (org.apache.zookeeper.ZooKeeper)
[2018-09-12 17:51:32,998] INFO [ZooKeeperClient] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2018-09-12 17:51:39,004] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
        at kafka.zookeeper.ZooKeeperClient$$anonfun$kafka$zookeeper$ZooKeeperClient$$waitUntilConnected$1.apply$mcV$sp(ZooKeeperClient.scala:225)
        at kafka.zookeeper.ZooKeeperClient$$anonfun$kafka$zookeeper$ZooKeeperClient$$waitUntilConnected$1.apply(ZooKeeperClient.scala:221)
        at kafka.zookeeper.ZooKeeperClient$$anonfun$kafka$zookeeper$ZooKeeperClient$$waitUntilConnected$1.apply(ZooKeeperClient.scala:221)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:250)
        at kafka.zookeeper.ZooKeeperClient.kafka$zookeeper$ZooKeeperClient$$waitUntilConnected(ZooKeeperClient.scala:221)
        at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:95)
        at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1538)
        at kafka.server.KafkaServer.kafka$server$KafkaServer$$createZkClient$1(KafkaServer.scala:348)
        at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:372)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:202)
        at io.confluent.support.metrics.SupportedServerStartable.startup(SupportedServerStartable.java:117)
        at io.confluent.support.metrics.SupportedKafka.main(SupportedKafka.java:62)
[2018-09-12 17:51:39,007] INFO shutting down (kafka.server.KafkaServer)
[2018-09-12 17:51:39,008] WARN  (kafka.utils.CoreUtils$)
java.lang.NullPointerException
        at kafka.server.KafkaServer$$anonfun$shutdown$5.apply$mcV$sp(KafkaServer.scala:572)
        at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:85)
        at kafka.server.KafkaServer.shutdown(KafkaServer.scala:572)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:329)
        at io.confluent.support.metrics.SupportedServerStartable.startup(SupportedServerStartable.java:117)
        at io.confluent.support.metrics.SupportedKafka.main(SupportedKafka.java:62)
[2018-09-12 17:51:39,012] INFO shut down completed (kafka.server.KafkaServer)
[2018-09-12 17:51:39,013] INFO shutting down (kafka.server.KafkaServer)


Does anybody know how I can fix this? Please help.

Regards,
Ashwini

Matteo Scandolo

Sep 12, 2018, 2:15:02 PM
to udupa....@inventec.com, CORD Discuss

--
You received this message because you are subscribed to the Google Groups "CORD Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cord-discuss...@opencord.org.
To post to this group, send email to cord-d...@opencord.org.
Visit this group at https://groups.google.com/a/opencord.org/group/cord-discuss/.
To view this discussion on the web visit https://groups.google.com/a/opencord.org/d/msgid/cord-discuss/d92fb953-f25b-4ddb-835f-f58e63ee286d%40opencord.org.
For more options, visit https://groups.google.com/a/opencord.org/d/optout.


--
Matteo Scandolo
Member of Technical Staff
--
ONF

Udupa.Ashwini ISV

Sep 12, 2018, 5:35:01 PM
to Matteo Scandolo, CORD Discuss

Hi Matteo,

Thanks for the reply.
cord-kafka-zookeeper is running:

cord@cordmc:~$ kubectl get pods
NAME                             READY     STATUS             RESTARTS   AGE
cord-kafka-0                     0/1       CrashLoopBackOff   24         1h
cord-kafka-zookeeper-0           1/1       Running            0          1h
cord-kafka-zookeeper-1           1/1       Running            0          1h
cord-kafka-zookeeper-2           1/1       Running            0          1h
xos-chameleon-5b7d8f6bd6-nj85x   1/1       Running            0          15h
xos-core-6fffcf6976-jsh4d        1/1       Running            0          15h
xos-db-f9ddc6589-5hxz7           1/1       Running            0          15h
xos-gui-65cc477945-jjjkn         1/1       Running            0          15h
xos-redis-74c5cdc969-dcrwb       1/1       Running            0          15h
xos-tosca-6898f97c86-n78j8       1/1       Running            0          15h
xos-ws-7dcfc46b87-8xcsm          1/1       Running            0          15h

In the logs, I see that the connection succeeded once, with sessionTimeout=40000. But with sessionTimeout=6000, it timed out while in the CONNECTING state.

[main] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=cord-kafka-zookeeper:2181 sessionTimeout=40000 watcher=io.confluent.admin.utils.ZookeeperConnectionWatcher@1ddc4ec2
[main-SendThread(cord-kafka-zookeeper.default.svc.cluster.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server cord-kafka-zookeeper.default.svc.cluster.local/10.101.126.60:2181. Will not attempt to authenticate using SASL (unknown error)
[main-SendThread(cord-kafka-zookeeper.default.svc.cluster.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to cord-kafka-zookeeper.default.svc.cluster.local/10.101.126.60:2181, initiating session
[main-SendThread(cord-kafka-zookeeper.default.svc.cluster.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server cord-kafka-zookeeper.default.svc.cluster.local/10.101.126.60:2181, sessionid = 0x165cf302b770001, negotiated timeout = 40000
[main] INFO org.apache.zookeeper.ZooKeeper - Session: 0x165cf302b770001 closed

[2018-09-12 19:17:43,471] INFO Initiating client connection, connectString=cord-kafka-zookeeper:2181 sessionTimeout=6000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@42a48628 (org.apache.zookeeper.ZooKeeper)
[2018-09-12 19:17:52,404] INFO [ZooKeeperClient] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2018-09-12 19:17:58,410] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING

Should I increase the ZooKeeper sessionTimeout? If so, how do I do it?
I am not sure if it is failing for other reasons.
Can you please help?

Regards,

Ashwini
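
The two timeouts in the message above come from two different clients in the same pod log: the watcher names differ (io.confluent.admin.utils.ZookeeperConnectionWatcher for 40000, kafka.zookeeper.ZooKeeperClient for 6000). A minimal sketch for listing every sessionTimeout value that appears in a log is shown here. The function name summarize_session_timeouts is hypothetical, and the usage line assumes kubectl access and the pod name cord-kafka-0 from this thread.

```shell
# Summarize the ZooKeeper session timeouts requested in a Kafka pod log.
# Reads log text on stdin and prints one "count sessionTimeout=N" line
# per distinct value found.
summarize_session_timeouts() {
  grep -o 'sessionTimeout=[0-9]*' | sort | uniq -c
}

# Usage against the last crashed container (assumed pod name):
#   kubectl logs cord-kafka-0 --previous | summarize_session_timeouts
```

If the value requested by kafka.zookeeper.ZooKeeperClient stays the same after a configuration change, the change probably did not reach the broker.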

Matteo Scandolo

Sep 12, 2018, 5:37:52 PM
to udupa....@inventec.com, CORD Discuss
I have never seen this issue myself.
If this is a timing issue, I would look at resources.

On what system are you running this?

Teo

Udupa.Ashwini ISV

Sep 12, 2018, 5:41:33 PM
to Matteo Scandolo, CORD Discuss

Hi Teo,

I am running this on a freshly installed Ubuntu 16.04 (AMD64).

Zack Williams

Sep 12, 2018, 6:05:32 PM
to Udupa.Ashwini ISV, Matteo Scandolo, CORD Discuss
Does:

kubectl describe pod cord-kafka-0

say anything interesting?

Which version of the upstream kafka chart is this?

Are you using any values files that specify additional configuration?

More recent versions of the incubator kafka chart changed the configuration keys - see
here for updated values we're using in master with v0.8.8 of the kafka chart:

https://gerrit.opencord.org/gitweb?p=helm-charts.git;a=blob;f=examples/kafka-single.yaml

- Zack

> On Sep 12, 2018, at 2:41 PM, Udupa.Ashwini ISV <udupa....@inventec.com> wrote:
>
> I am running this on freshly installed Ubuntu 16.04 (AMD64).

---
Zack Williams
z...@opennetworking.org
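
The questions above (pod events, chart version, extra values) can be gathered in one pass. This is a rough sketch rather than an official CORD tool: the function name collect_kafka_diagnostics is hypothetical, and the default pod and release names are taken from this thread. It uses only standard kubectl and helm subcommands.

```shell
# Collect basic diagnostics for a crashing Kafka pod.
# $1: pod name (default: cord-kafka-0), $2: Helm release (default: cord-kafka).
collect_kafka_diagnostics() {
  diag_pod="${1:-cord-kafka-0}"
  diag_release="${2:-cord-kafka}"
  kubectl describe pod "$diag_pod"      # events, restart reasons, resource limits
  kubectl logs "$diag_pod" --previous   # log of the last crashed container
  helm list                             # installed releases with chart versions
  helm get values "$diag_release"       # user-supplied values for the release
}

# Usage:
#   collect_kafka_diagnostics > kafka-diagnostics.txt 2>&1
```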

Udupa.Ashwini ISV

Sep 13, 2018, 2:38:04 PM
to Zack Williams, Matteo Scandolo, CORD Discuss
Hi Zack,

On a freshly installed Ubuntu 16.04, I installed minikube as per this link:
https://github.com/kubernetes/minikube#linux-continuous-integration-without-vm-support

and Helm as per this link: https://guide.opencord.org/prereqs/helm.html.

Then, I tried installing Kafka.
I also tried with --version 0.8.8 as specified in this link: https://guide.opencord.org/charts/kafka.html.

But I am still seeing cord-kafka in a continuous crash state. I am able to run all other containers successfully. I am only seeing issues with Kafka.

Is there a specific Ubuntu version on which you have successfully run cord-kafka and voltha-kafka? If so, I can try running on that version.
Also, please tell me if there are any changes I need to make. The cord-kafka-zookeeper is running. Do I have to change anything in cord-kafka-zookeeper?

Regards,
Ashwini

Udupa.Ashwini ISV

Sep 15, 2018, 9:12:25 PM
to Zack Williams, Matteo Scandolo, CORD Discuss
Hi Zack, Matteo,

I can run kafka successfully now.
In examples/kafka-single.yaml, I set "zookeeper.session.timeout.ms" to 60000 under configurationOverrides. The default was 6000, and it was timing out before it could establish a connection. I guess the default of 6000 ms is too low.

configurationOverrides:
  "offsets.topic.replication.factor": 1
  "log.retention.hours": 4
  "zookeeper.connection.timeout.ms": 60000
  "zookeeper.session.timeout.ms": 60000

After making this change and running Kafka, the session timeout value changed and it is running successfully. I think adding this change to the CORD code would help.
Once again, thanks for your reply.

Regards,
Ashwini
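
For anyone repeating the fix described above, a sketch of re-installing the release with the edited values file follows. It assumes Helm 2 (the `-n` release-name flag used in this thread, and `helm delete --purge`), and the function name reinstall_kafka is hypothetical. The install line is the one given in the CORD guide for chart version 0.8.8.

```shell
# Remove the existing cord-kafka release and install it again with a values file.
# $1: path to the values file (default: examples/kafka-single.yaml).
# Assumes Helm 2; purging the release discards its state.
reinstall_kafka() {
  kafka_values="${1:-examples/kafka-single.yaml}"
  helm delete --purge cord-kafka
  helm install -f "$kafka_values" --version 0.8.8 -n cord-kafka incubator/kafka
}

# Usage:
#   reinstall_kafka examples/kafka-single.yaml
#   kubectl get pods    # cord-kafka-0 should eventually report Running
```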

-----Original Message-----
From: Zack Williams <z...@opennetworking.org>
Sent: Thursday, September 13, 2018 4:33 PM
To: Udupa.Ashwini ISV <udupa....@inventec.com>
Cc: Matteo Scandolo <t...@opennetworking.org>
Subject: Re: [CORD Discuss] RCORD 6.0 : cord-kafka and voltha-kafka in CrashLoopBackOff state


> On Sep 13, 2018, at 4:23 PM, Udupa.Ashwini ISV <udupa....@inventec.com> wrote:
>
> I have just installed Kubernetes and Helm on freshly installed Ubuntu 16.04 and then trying to install Kafka.

How did you install k8s? Kubeadm/kubespray/minikube?

> I also tried with -version 0.8.8 as specified in this link : https://guide.opencord.org/charts/kafka.html. But, still getting same error.

Did you use the linked values file with helm that changes the number of replicas of kafka and zookeeper?

https://gerrit.opencord.org/gitweb?p=helm-charts.git;a=blob;f=examples/kafka-single.yaml

If you're not reducing these numbers and running on a single-node system (set up by minikube or kubeadm), zookeeper might be waiting for a quorum of replicas and failing like you're seeing.

Thanks,
Zack
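
To check the replica counts mentioned above on a running cluster, something like the following sketch could be used. The function name show_replicas is hypothetical. The StatefulSet names are inferred from the pod names in this thread (cord-kafka-0, cord-kafka-zookeeper-0), since StatefulSet pods are named after their StatefulSet plus an ordinal.

```shell
# Print the configured replica count of each StatefulSet named on the command line.
show_replicas() {
  for sts_name in "$@"; do
    sts_replicas=$(kubectl get statefulset "$sts_name" -o jsonpath='{.spec.replicas}')
    echo "$sts_name replicas=$sts_replicas"
  done
}

# Usage (names assumed from the pod names in this thread):
#   show_replicas cord-kafka cord-kafka-zookeeper
```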

Sivakolunthu A

Nov 1, 2018, 1:57:34 PM
to CORD Discuss, z...@opennetworking.org, t...@opennetworking.org
Hi,

We are facing a similar issue.
We updated the YAML file with new timeout values under the overrides, but the -f option does not seem to apply our file, and Kafka still reports 40000.

We used the command below. Any pointers would help. Thanks much.

helm install -n cord-kafka incubator/kafka --version 0.8.8 --set persistence.enabled=false --set zookeeper.persistence.enabled=false -f kafka-single-override.yaml

The file contents are:
configurationOverrides:
  "offsets.topic.replication.factor": 1
  "log.retention.hours": 4
  "log.message.timestamp.type": "LogAppendTime"
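
The file contents as posted above do not include either ZooKeeper timeout key. A quick way to confirm that a values file carries both keys before installing is sketched here. The function name missing_timeout_overrides is hypothetical, and the check is a plain text match, not a YAML parse, so it does not verify that the keys sit under configurationOverrides.

```shell
# Report which ZooKeeper timeout keys are absent from a Helm values file.
# Prints one "missing: KEY" line per absent key; returns 0 only if both are present.
missing_timeout_overrides() {
  values_file="$1"
  check_status=0
  for key in zookeeper.connection.timeout.ms zookeeper.session.timeout.ms; do
    # -F: match the quoted key literally, so dots are not treated as wildcards
    if ! grep -qF "\"$key\":" "$values_file"; then
      echo "missing: $key"
      check_status=1
    fi
  done
  return "$check_status"
}

# Usage:
#   missing_timeout_overrides kafka-single-override.yaml
```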

Udupa.Ashwini ISV

Nov 1, 2018, 2:11:01 PM
to Sivakolunthu A, CORD Discuss, z...@opennetworking.org, t...@opennetworking.org

Hi,

I just added these 2 lines

  "zookeeper.connection.timeout.ms": 60000
  "zookeeper.session.timeout.ms": 60000

to the examples/kafka-single.yaml file and ran "helm install -f examples/kafka-single.yaml --version 0.8.8 -n cord-kafka incubator/kafka" as mentioned in the guide.

From the logs, I saw that the connection was made twice, once with sessionTimeout=40000 and once with sessionTimeout=6000. The connection with sessionTimeout=40000 was not affected by my changes. The override took effect for the sessionTimeout=6000 case, and I was able to install cord-kafka successfully.

Logs:

[main] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=cord-kafka-zookeeper:2181 sessionTimeout=40000 watcher=io.confluent.admin.utils.ZookeeperConnectionWatcher@1ddc4ec2
[main-SendThread(cord-kafka-zookeeper.default.svc.cluster.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server cord-kafka-zookeeper.default.svc.cluster.local/10.101.126.60:2181. Will not attempt to authenticate using SASL (unknown error)
[main-SendThread(cord-kafka-zookeeper.default.svc.cluster.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to cord-kafka-zookeeper.default.svc.cluster.local/10.101.126.60:2181, initiating session
[2018-09-12 19:17:43,471] INFO Initiating client connection, connectString=cord-kafka-zookeeper:2181 sessionTimeout=60000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@42a48628 (org.apache.zookeeper.ZooKeeper)
[2018-09-12 19:17:52,404] INFO [ZooKeeperClient] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2018-09-12 19:17:58,410] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)

Regards,

Ashwini

Udupa.Ashwini ISV

Nov 1, 2018, 2:17:45 PM
to Sivakolunthu A, CORD Discuss, z...@opennetworking.org, t...@opennetworking.org

Hi,

The logs I sent before are for the sessionTimeout=6000 case, which failed. I do not have the success logs now, but the value did change to 60000 and cord-kafka installed successfully.

Regards,

Ashwini
