Not able to start cask services


Atul Pundhir

Nov 25, 2015, 10:35:39 PM
to cdap...@googlegroups.com
Hi All,

I am trying to connect Cask to my existing Hadoop cluster, which runs HDP. However, I am not able to start the Cask services.

I am getting:

"Unknown / Unsupported version of HBase found"

I have installed Hadoop and HBase on the CDAP node, but no service has started yet.

Any help?

/Atul

Sreevatsan Raman

Nov 25, 2015, 10:48:07 PM
to Atul Pundhir, cdap...@googlegroups.com
Hey Atul,

What version of HDP and what version of CDAP are you using? The latest version of CDAP, 3.2.1, supports HDP 2.0, 2.1, 2.2 and 2.3.

Also, have you installed all the Hadoop client libraries and configurations on the cdap-master nodes, as listed in the software prerequisites?

Try running:

/etc/init.d/cdap-master classpath

The output should include the Hadoop and HBase libraries and configurations. Are you seeing those? If not, the CDAP master is not configured correctly.
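
As a rough sketch of that check (not part of the CDAP tooling), the classpath can be split on `:` and filtered for Hadoop and HBase entries. The function name `filter_hadoop_hbase` is made up for illustration; the only real command assumed is the `/etc/init.d/cdap-master classpath` call above, shown here in a comment.

```shell
#!/bin/sh
# Print only the classpath entries that mention hadoop or hbase, one per line.
# Reads a colon-separated classpath on stdin.
filter_hadoop_hbase() {
  tr ':' '\n' | grep -i -E 'hadoop|hbase'
}

# Usage on a CDAP master node:
#   /etc/init.d/cdap-master classpath | filter_hadoop_hbase
```

Note that CDAP's own compatibility directories (for example `/opt/cdap/hbase-compat-0.98`) also match, so look specifically for entries that come from the Hadoop and HBase client installations.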

Thanks,
Sree





--
You received this message because you are subscribed to the Google Groups "CDAP User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cdap-user+...@googlegroups.com.
To post to this group, send email to cdap...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cdap-user/CANL3VMaAhc%2BpS5t1ok2EJdWprkO8EK-pW3VhoadOpeq-q0wgOg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Sreevatsan Raman

Nov 25, 2015, 11:07:12 PM
to Atul Pundhir, cdap...@googlegroups.com
What HBase version do the following commands print?

classpath=`/etc/init.d/cdap-master classpath`

java -cp $classpath co.cask.tephra.util.HBaseVersion

Thanks,

Sree


On Wed, Nov 25, 2015 at 7:53 PM, Atul Pundhir <atul.p...@gmail.com> wrote:
Hey Sree,

Thanks for your prompt response. I am using CDAP 3.2 and HDP 2.2, which I believe are compatible with each other. I already had an existing Hadoop cluster. And yes, I can see the HBase and Hadoop libraries. Below is a sample of the output, which I have truncated...


/opt/cdap/hbase-compat-0.98/lib/*:/opt/cdap/master/lib/*:/usr/local/hbase/conf:/usr/java/jdk1.7.0_79//lib/tools.jar:/usr/local/hbase:/usr/local/hbase/lib/activation-1.1.jar:/usr/local/hbase/lib/asm-3.1.jar:/usr/local/hbase/lib/commons-beanutils-1.7.0.jar:/usr/local/hbase/lib/commons-beanutils-core-1.8.0.jar:/usr/local/hbase/lib/commons-cli-1.2.jar:/usr/local/hbase/lib/commons-codec-1.7.jar:/usr/local/hbase/lib/commons-collections-3.2.1.jar:/usr/local/hbase/lib/commons-configuration-1.6.jar:/usr/local/hbase/lib/commons-digester-1.8.jar:/usr/local/hbase/lib/commons-el-1.0.jar:/usr/local/hbase/lib/commons-httpclient-3.1.jar:/usr/local/hbase/lib/commons-io-2.4.jar:/usr/local/hbase/lib/commons-lang-2.6.jar:/usr/local/hbase/lib/commons-logging-1.1.1.jar:/usr/local/hbase/lib/commons-math-2.1.jar:/usr/local/hbase/lib/commons-net-1.4.1.jar:/usr/local/hbase/lib/findbugs-annotations-1.3.9-1.jar: 


/Atul



--
Atul S.

Atul Pundhir

Nov 25, 2015, 11:28:28 PM
to Sreevatsan Raman, cdap...@googlegroups.com
It's 0.98

SLF4J: Found binding in [jar:file:/opt/cdap/master/lib/ch.qos.logback.logback-classic-1.0.9.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
0.98


Regards,
Atul
--
Atul S.

Atul Pundhir

Nov 25, 2015, 11:29:42 PM
to Sreevatsan Raman, cdap...@googlegroups.com
Also, please find attached a screenshot of the exact error I get when I try to start the CDAP services.

/Atul
--
Atul S.
Capture.PNG

Sreevatsan Raman

Nov 26, 2015, 1:18:31 AM
to Atul Pundhir, cdap...@googlegroups.com
Hey Atul,

How are you starting CDAP? Are you running the init script as root? Can you also share the full log and the output of the classpath command?

Thanks,
Sree

Atul Pundhir

Nov 26, 2015, 1:40:36 AM
to Sreevatsan Raman, cdap...@googlegroups.com
Hi Sree, 

Yes, I am running as root. I start the services with a shell script I have written:

service cdap-auth-server start
service cdap-kafka-server start
service cdap-master start
service cdap-router start
service cdap-ui start

Please find attached log files



--
Atul S.
logs.zip

Sreevatsan Raman

Nov 26, 2015, 2:02:55 AM
to Atul Pundhir, cdap...@googlegroups.com
Hey Atul,

I don't see the master log in the attachment. Can you please attach it?

It would also be good to get the complete classpath by executing /etc/init.d/cdap-master classpath. I am not sure whether there are multiple versions of the HBase jars in the classpath, so it would be good to confirm that as well.

Thanks,
Sree

Atul Pundhir

Nov 26, 2015, 2:08:38 AM
to Sreevatsan Raman, cdap...@googlegroups.com
Hi Sree,

Sure, I will try to join IRC. I started the services using /etc/init.d/cdap-master start, but I still cannot get it to work. Please find the master log file attached.

/Atul
--
Atul S.
master-cdap-localhost.localdomain.log

Sreevatsan Raman

Nov 26, 2015, 10:33:48 AM
to Atul Pundhir, cdap...@googlegroups.com
Hey Atul,

This seems to be a different issue from the one you saw earlier. Based on the logs, it looks like you don't have the right versions of the HBase and Hadoop client libraries. You seem to have client libraries compatible with hadoop-1. From your logs:

more master-cdap-localhost.localdomain.log | grep CLASSPATH | tr ':' '\n' | grep hbase-client

/usr/local/hbase/lib/hbase-client-0.98.16-hadoop1.jar

/usr/local/hbase/lib/hbase-client-0.98.16-hadoop1.jar


You will need to install HBase and Hadoop client libraries on the CDAP master that are compatible with the Hadoop version on your cluster.
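
A minimal sketch of how such jars could be spotted in any classpath string, assuming HBase artifacts carry a `-hadoop1` or `-hadoop2` suffix in their file names as in the log excerpt above. The function name `find_hadoop1_jars` is hypothetical.

```shell
#!/bin/sh
# Print each distinct classpath entry whose file name ends in -hadoop1.jar.
# Reads a colon-separated classpath on stdin; prints nothing if none are found.
find_hadoop1_jars() {
  tr ':' '\n' | grep -E -- '-hadoop1\.jar$' | sort -u
}

# Usage on a CDAP master node:
#   /etc/init.d/cdap-master classpath | find_hadoop1_jars
```

Any output here means a hadoop-1 build is still on the classpath and should be replaced with the matching hadoop-2 build.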

Please let us know if you need more help.

Thanks,
Sree

Atul Pundhir

Nov 26, 2015, 11:44:34 PM
to Sreevatsan Raman, cdap...@googlegroups.com
Thanks for the response, Sree. I have changed the hbase-client version to 0.98-hadoop2. However, I am still facing the same issue. I deleted the master log file, expecting that a new one would be created when CDAP starts. However, now it is not generating a log file at all.

Regards,
Atul
--
Atul S.

Sreevatsan Raman

Nov 27, 2015, 12:35:30 AM
to Atul Pundhir, CDAP User
Hi Atul,

Yes. You will need to have the right HBase and Hadoop configurations on the CDAP master nodes. Whatever is in /etc/hadoop/conf and /etc/hbase/conf on the Hadoop and HBase masters should be installed on the CDAP master.
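
A small sketch of a sanity check for this, assuming the client configuration lives in /etc/hadoop/conf and /etc/hbase/conf as described above. The file names in the usage comment (core-site.xml, hdfs-site.xml, yarn-site.xml, hbase-site.xml) are the usual Hadoop and HBase client configuration files, and the function name `check_conf_files` is made up for this example.

```shell
#!/bin/sh
# Report which of the given files are missing or empty in a config directory.
# Usage: check_conf_files DIR FILE...
# Returns 0 if every file exists and is non-empty, 1 otherwise.
check_conf_files() {
  dir=$1
  shift
  rc=0
  for f in "$@"; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f"
      rc=1
    fi
  done
  return $rc
}

# Usage on the CDAP master node:
#   check_conf_files /etc/hadoop/conf core-site.xml hdfs-site.xml yarn-site.xml
#   check_conf_files /etc/hbase/conf hbase-site.xml
```

This only confirms the files are present; it does not verify that their contents match the cluster.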

Thanks,
Sree


Sent from Mailbox


On Thu, Nov 26, 2015 at 8:48 PM, Atul Pundhir <atul.p...@gmail.com> wrote:

BTW, do we need to configure anything in hbase-site.xml, or start any Hadoop or HBase service?
--
Atul S.

Atul Pundhir

Nov 27, 2015, 4:49:05 AM
to Sreevatsan Raman, CDAP User
Hey Sree,

Thanks for your support. The services are running fine now and I can connect to the CDAP UI. However, the UI sometimes hangs. Please find a screenshot attached.

Regards,
Atul
--
Atul S.
Capture.PNG

Sreevatsan Raman

Nov 27, 2015, 8:05:21 AM
to Atul Pundhir, CDAP User
Hey Atul,
The CDAP master starts a YARN application, and it usually takes a couple of minutes to provision all the containers it needs. If you are still seeing this issue after a few minutes, I recommend trying the following:

1. Perform a health check to see if the services are running fine: http://docs.cask.co/cdap/3.2.1/en/admin-manual/installation/hadoop/starting-verification.html

2. Check that router.server.address is set to the right router address.

Thanks,
Sree


Sent from Mailbox


<Capture.PNG>

Atul Pundhir

Dec 11, 2015, 4:26:33 AM
to Sreevatsan Raman, CDAP User
Hi Sree,

Thanks for the response, and sorry for the delay; I was on leave. Everything seems to be working fine. However, logs are not loading in the UI. Please find the attachment.


Regards,
Atul
--
Atul S.
Capture.PNG

Sreevatsan Raman

Dec 11, 2015, 6:46:16 AM
to Atul Pundhir, CDAP User
Hey Atul,

Is this specific to Hydrator, or is it a problem for all logs? Are you able to see metrics?

I would start by looking at the HTTP endpoint that reports the log saver status:

Does http://<router_host>:<port>/v3/system/services list the log service? Does it return status OK for it? Example:
  • name: "log.saver",
  • description: "Service to collect and store logs.",
  • status: "OK",

Also check whether the Kafka service is running fine and whether it is getting files with newer timestamps. Look at the Kafka directory that you have configured.

Thanks,
Sree

Atul Pundhir

Dec 13, 2015, 11:21:04 PM
to Sreevatsan Raman, CDAP User
Hi Sree,

Yes, it seems the Kafka server is not starting, hence the error. PFA. Please suggest!

Regards,
Atul
--
Atul S.
Capture.PNG

Atul Pundhir

Dec 13, 2015, 11:29:43 PM
to Sreevatsan Raman, CDAP User
BTW, does Kafka need to be on the cluster as well? Right now I have embedded Kafka on the CDAP node. So, in cdap-site.xml, the ZooKeeper configuration should point to the cluster and the Kafka configuration to the local node, right?

Regards,
Atul
--
Atul S.

Atul Pundhir

Dec 14, 2015, 10:11:27 PM
to Sreevatsan Raman, CDAP User
Hi Sree,

Any update on this? 

Sent from my iPhone

Poorna Chandra

Dec 14, 2015, 10:30:19 PM
to Atul Pundhir, Sreevatsan Raman, CDAP User
Hi Atul,

As long as cdap-kafka-server is running and uses the same cdap-site.xml as cdap-master, everything should work fine.

From the screenshot, it looks like there is a bad config parameter. Can you attach the complete logs for cdap-kafka-server?

Thanks,
Poorna.


Atul Pundhir

Dec 14, 2015, 10:45:42 PM
to Poorna Chandra, Sreevatsan Raman, CDAP User
Hi Poorna,

Thanks for the response. Yes seems like configuration issue only. Please find attachment.

Regards,
Atul
--
Atul S.
kafka-server-cdap-localhost.localdomain.log-20151215.gz

Poorna Chandra

Dec 15, 2015, 12:37:02 AM
to Atul Pundhir, Sreevatsan Raman, CDAP User
Hi Atul,

I see that the Kafka server is throwing an exception when it tries to read logs:

2015-12-14 05:14:50,452 INFO  [EmbeddedKafkaServer STARTING] log.LogManager: Loading logs.
2015-12-14 05:14:50,465 ERROR [EmbeddedKafkaServer STARTING] log.LogManager: There was an error in one of the threads during logs loading: java.lang.NumberFormatException: For input string: "c12d71cc4475_resources"
2015-12-14 05:14:50,468 FATAL [EmbeddedKafkaServer STARTING] server.KafkaServer: [Kafka Server 2130706433], Fatal error during KafkaServer startup. Prepare to shutdown
java.lang.NumberFormatException: For input string: "c12d71cc4475_resources"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:492)
        at java.lang.Integer.parseInt(Integer.java:527)
        at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
        at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
        at kafka.log.Log$.parseTopicPartitionName(Log.scala:833)
        at kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$7$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:138)
        at kafka.utils.Utils$$anon$1.run(Utils.scala:54)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)


I also see the following message during startup -
2015-12-14 05:14:50,318 INFO  [main] utils.VerifiableProperties: Property log.dir is overridden to /tmp/


Is the kafka.log.dir property in cdap-site.xml set to /tmp? If so, that is the reason for this error: Kafka is trying to read all files in the /tmp directory as Kafka log files. Can you set the property to /tmp/kafka-logs and try again? Note that if you want the logs to persist across restarts, this property should be set to a non-tmp directory.
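
To illustrate why a shared directory such as /tmp breaks startup: the stack trace above shows Kafka parsing a name in `parseTopicPartitionName` and failing in `Integer.parseInt`. Here is a rough sketch, assuming Kafka expects every subdirectory of its log directory to be named `<topic>-<partition number>`, that lists the subdirectories which would not fit that pattern. The function name `list_non_kafka_dirs` is hypothetical.

```shell
#!/bin/sh
# List subdirectories of DIR whose names do not end in "-<digits>".
# Usage: list_non_kafka_dirs DIR
list_non_kafka_dirs() {
  for path in "$1"/*/; do
    # Skip non-directories (and the unexpanded glob when DIR has no subdirectories).
    [ -d "$path" ] || continue
    name=$(basename "$path")
    if ! printf '%s\n' "$name" | grep -q -E -- '-[0-9]+$'; then
      echo "$name"
    fi
  done
}

# Usage:
#   list_non_kafka_dirs /tmp
```

If this prints anything for the directory that kafka.log.dir points to, that directory is being shared with something other than Kafka.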

Thanks,
Poorna.

Atul Pundhir

Dec 15, 2015, 1:26:12 AM
to Poorna Chandra, Sreevatsan Raman, CDAP User
Hi Poorna,

Thanks for the response. I tried that as well, but it is still not working. I am attaching cdap-site.xml and screenshots.

Regards,
Atul
--
Atul S.
cdap-site.xml
logs.PNG
services.PNG

Poorna Chandra

Dec 15, 2015, 4:36:34 PM
to Atul Pundhir, CDAP User
Hi Atul,

Is cdap-kafka-server now starting up? 

Assuming that cdap-kafka-server is starting fine, from the previous logs you posted -
2015-12-14 05:14:49,712 - WARN [main:c.c.c.k.r.KafkaServerMain@72] - Binding to loopback address!

It looks like the server is binding to the loopback interface for some reason. Could you set "kafka.bind.address" to the right interface address in cdap-site.xml and restart the server?
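
A rough sketch for checking what is currently configured, assuming cdap-site.xml uses the usual Hadoop-style property layout with one `<name>` and one `<value>` element per property, and assuming the file lives at /etc/cdap/conf/cdap-site.xml (adjust to your installation). The function name `get_property` is made up for this example; it does plain text matching and is not a real XML parser.

```shell
#!/bin/sh
# Print the <value> of the named property from a Hadoop-style XML config file.
# Prints nothing if the property is absent or has no <value> element.
# Usage: get_property FILE NAME
get_property() {
  awk -v name="$2" '
    /<name>/ {
      n = $0
      sub(/.*<name>[ \t]*/, "", n)
      sub(/[ \t]*<\/name>.*/, "", n)
      found = (n == name)
    }
    found && /<value>/ {
      v = $0
      sub(/.*<value>/, "", v)
      sub(/<\/value>.*/, "", v)
      print v
      exit
    }
    /<\/property>/ { found = 0 }
  ' "$1"
}

# Usage (path is an assumption):
#   get_property /etc/cdap/conf/cdap-site.xml kafka.bind.address
```

An empty result means the property has no value in that file, in which case the server falls back to whatever default applies.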

Let me know if this works.

Thanks,
Poorna.

Atul Pundhir

Dec 15, 2015, 10:59:25 PM
to Poorna Chandra, CDAP User
Hi Poorna,

I did that. I added the following to cdap-site.xml:

 <property>
    <name>kafka.bind.address</name>
    <description>Kafka bind address</description>
  </property>

  <property>
    <name>kafka.bind.port</name>
    <value>9092</value>
    <description>Kafka bind port</description>
  </property>

But I still can't see logs on the Hydrator logs tab.

Regards,
Atul
--
Atul S.

Atul Pundhir

Dec 15, 2015, 11:06:56 PM
to Poorna Chandra, CDAP User
It seems to be throwing another error now:

2015-12-16 03:50:22,518 ERROR [EmbeddedKafkaServer STARTING] change.logger: Controller 167772304 epoch 5 initiated state change for partition [metrics,6] from OfflinePartition to OnlinePartition failed
kafka.common.NoReplicaOnlineException: No replica for partition [metrics,6] is alive. Live brokers are: [Set()], Assigned replicas are: [List(2130706433)]
        at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
        at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357)
        at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
        at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
        at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
        at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
        at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:70)
        at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:314)
        at kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:161)
        at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:81)
        at kafka.server.ZookeeperLeaderElector$$anonfun$startup$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:49)
        at kafka.server.ZookeeperLeaderElector$$anonfun$startup$1.apply(ZookeeperLeaderElector.scala:47)
        at kafka.server.ZookeeperLeaderElector$$anonfun$startup$1.apply(ZookeeperLeaderElector.scala:47)
        at kafka.utils.Utils$.inLock(Utils.scala:535)
        at kafka.server.ZookeeperLeaderElector.startup(ZookeeperLeaderElector.scala:47)
        at kafka.controller.KafkaController$$anonfun$startup$1.apply$mcV$sp(KafkaController.scala:650)
        at kafka.controller.KafkaController$$anonfun$startup$1.apply(KafkaController.scala:646)
        at kafka.controller.KafkaController$$anonfun$startup$1.apply(KafkaController.scala:646)
        at kafka.utils.Utils$.inLock(Utils.scala:535)
        at kafka.controller.KafkaController.startup(KafkaController.scala:646)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:117)
        at org.apache.twill.internal.kafka.EmbeddedKafkaServer.startUp(EmbeddedKafkaServer.java:58)
        at com.google.common.util.concurrent.AbstractIdleService$1$1.run(AbstractIdleService.java:43)
        at java.lang.Thread.run(Thread.java:745)

--
Atul S.

Poorna Chandra

Dec 15, 2015, 11:15:54 PM
to Atul Pundhir, CDAP User

Hi Atul,

Since the Kafka bind interface changed, the broker id also changed.

We can try resetting Kafka server state to fix this. Can you stop Kafka server, remove the Kafka log directory (kafka.log.dir), and restart Kafka again?
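
The reset could be scripted roughly as follows. This is a cautious sketch, not an official CDAP procedure: it moves the directory aside instead of deleting it, and it refuses obviously unsafe paths. The function name `move_kafka_log_dir` and the backup naming are assumptions; the `service cdap-kafka-server` commands are the ones used earlier in this thread and are left as comments.

```shell
#!/bin/sh
# Move a Kafka log directory aside so the server starts with empty local state.
# Refuses empty paths, "/", "/tmp" and "/var" to avoid moving shared directories.
# Usage: move_kafka_log_dir DIR   (prints the backup path on success)
move_kafka_log_dir() {
  dir=${1%/}
  case $dir in
    ''|/tmp|/var) echo "refusing to move '$1'" >&2; return 1 ;;
  esac
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  backup="$dir.bak.$(date +%Y%m%d%H%M%S)"
  mv "$dir" "$backup" && echo "$backup"
}

# Usage (the directory shown is a hypothetical kafka.log.dir value):
#   service cdap-kafka-server stop
#   move_kafka_log_dir /data/cdap/kafka-logs
#   service cdap-kafka-server start
```

Once the server is confirmed healthy, the backup directory can be deleted by hand.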

Thanks,
Poorna.

Atul Pundhir

Dec 15, 2015, 11:59:58 PM
to Poorna Chandra, CDAP User
Hi Poorna,

I did that, but it is still not working. I am attaching all the logs and the cdap-site.xml configuration. Please suggest.

Regards,
Atul
--
Atul S.
logs.zip

Nitin Motgi

Dec 16, 2015, 12:15:32 AM
to Atul Pundhir, Poorna Chandra, CDAP User
Hi Atul, 

Based on the error messages in the Kafka log, I found a JIRA that describes the issue you are facing: https://issues.apache.org/jira/browse/KAFKA-1460
  • Is ZooKeeper running on the same box as Kafka?
  • How many replicas are configured per partition?
Thanks,
Nitin






--
"Humility isn't thinking less of yourself, it's thinking of yourself less"

Atul Pundhir

Dec 16, 2015, 1:31:50 AM
to Nitin Motgi, Poorna Chandra, CDAP User
Hi Nitin,

Thanks for the response. To answer your questions:

1. No, it's running on a different box.
2. Right now it's set to 1.

--
Atul S.

Nitin Motgi

Dec 16, 2015, 1:33:05 AM
to Atul Pundhir, Poorna Chandra, CDAP User
Hi Atul, 

As per the JIRA, at least increasing the replication would help resolve the issue. How many brokers do you have?

Thanks,
Nitin

Atul Pundhir

Dec 16, 2015, 1:49:03 AM
to Nitin Motgi, Poorna Chandra, CDAP User
Hi Nitin,

I think the issue was caused by bad configuration and the ZooKeeper namespace. I changed the namespace and I can see the logs in the UI now. Just a quick question: does CDAP support audit logging, both technical and functional?

Regards,
Atul
--
Atul S.

Nitin Motgi

Dec 16, 2015, 1:56:02 AM
to Atul Pundhir, Poorna Chandra, CDAP User
Hi Atul, 

Can you please describe the actual issue and the change you made? It might help us add a check to the install process, if possible, or make sure that we document it in the docs. Help here would be really appreciated.

Every activity is tracked and currently presented in the form of lineage (http://docs.cask.co/cdap/3.2.1/en/reference-manual/http-restful-api/metadata.html#http-restful-api-metadata-lineage). We have all the information available to create a comprehensive audit log. In fact, you can create one yourself, as this information is published to Kafka (http://docs.cask.co/cdap/3.2.1/en/developers-manual/building-blocks/metadata-lineage.html#metadata-update-notifications). We will be adding full audit capability with UI support in upcoming release(s).

Also, Atul, could you please provide high-level information on the use case(s) you are attempting to use CDAP / Hydrator for? This would help us a lot in understanding your context and helping you.

Thanks,
Nitin
  

Poorna Chandra

Dec 16, 2015, 2:22:55 AM
to Nitin Motgi, Atul Pundhir, CDAP User
Hi Atul,

You encountered issue https://issues.cask.co/browse/CDAP-4206, due to which the Kafka state had to be reset in ZooKeeper. I'm curious why you decided to change the CDAP namespace, which effectively reset the Kafka state in ZooKeeper. If you could add your findings to https://issues.cask.co/browse/CDAP-4206, that would be helpful when we work on fixing the issue.

Thanks,
Poorna.

Atul Pundhir

Dec 16, 2015, 3:28:56 AM
to Poorna Chandra, Nitin Motgi, CDAP User
Hi Nitin / Poorna,

@Nitin, sure, I will document it and also let you know about the use case.

@Poorna, I didn't change the CDAP namespace. I changed Kafka's ZooKeeper namespace by adding the kafka.zookeeper.namespace property to cdap-site.xml.

Regards,
Atul
--
Atul S.