CDAP could not start

Teik Hooi Beh

Sep 20, 2016, 4:42:13 PM
to CDAP User
Hi,

I have been trying to install CDAP on my MapR cluster and keep getting the error below at startup. My setup -

1. CDAP version 3.4.3
2. MapR 5.1 (without Stream Client)


SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cdap/master/lib/ch.qos.logback.logback-classic-1.0.9.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/mapr/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
2016-09-21 08:23:28,524 - INFO  [main:c.c.c.d.r.m.MasterServiceMain@156] - Starting MasterServiceMain
2016-09-21 08:23:30,837 - WARN  [main:o.a.h.u.NativeCodeLoader@62] - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-09-21 08:23:33,152 - INFO  [main:c.c.c.c.g.LocationRuntimeModule$HDFSLocationModule@104] - HDFS namespace is /cdap
2016-09-21 08:23:33,414 - INFO  [main:c.c.c.s.TokenSecureStoreUpdater@209] - Setting token renewal time to: 86100000 ms
2016-09-21 08:23:33,654 - INFO  [main:c.c.c.e.s.ExploreServiceUtils@171] - Client Hive version: 1.2.0-mapr-1608
2016-09-21 08:23:33,656 - INFO  [main:c.c.c.c.i.URLConnections@36] - Turning off default caching in URLConnection
2016-09-21 08:23:34,082 INFO  [main-SendThread(node2.mesoslab.my:5181)] zookeeper.Login: successfully logged in.
2016-09-21 08:23:39,086 - INFO  [main:c.c.c.d.u.h.HBaseTableUtil@167] - Table created 'TableId{namespace=namespace:system, tableName=configuration}'
2016-09-21 08:23:39,087 - INFO  [main:c.c.c.d.u.h.ConfigurationTable@85] - Writing new config row with key DEFAULT
2016-09-21 08:23:39,203 - INFO  [main:c.c.c.d.u.h.ConfigurationTable@95] - Deleting any configuration from 1474403019086 or before
LOGBACK: No context given for co.cask.cdap.logging.appender.kafka.KafkaLogAppender[KafkaLogAppender]
2016-09-21 08:23:40,126 INFO  [main] utils.VerifiableProperties: Verifying properties
2016-09-21 08:23:40,205 INFO  [main] utils.VerifiableProperties: Property key.serializer.class is overridden to kafka.serializer.StringEncoder
2016-09-21 08:23:40,206 WARN  [main] utils.VerifiableProperties: Property log.publish.num.partitions is not valid
2016-09-21 08:23:40,206 INFO  [main] utils.VerifiableProperties: Property metadata.broker.list is overridden to node0.mesoslab.my:9092/cdap
2016-09-21 08:23:40,207 INFO  [main] utils.VerifiableProperties: Property partitioner.class is overridden to co.cask.cdap.logging.appender.kafka.StringPartitioner
2016-09-21 08:23:40,207 INFO  [main] utils.VerifiableProperties: Property producer.type is overridden to async
2016-09-21 08:23:40,207 INFO  [main] utils.VerifiableProperties: Property queue.buffering.max.ms is overridden to 1000
2016-09-21 08:23:40,207 INFO  [main] utils.VerifiableProperties: Property request.required.acks is overridden to 1
2016-09-21 08:23:40,208 INFO  [main] utils.VerifiableProperties: Property serializer.class is overridden to kafka.serializer.DefaultEncoder
Exception in thread "main" com.google.inject.ProvisionException: Guice provision errors:

1) Error injecting constructor, java.lang.NullPointerException
  at co.cask.cdap.logging.appender.kafka.KafkaLogAppender.<init>(KafkaLogAppender.java:43)
  while locating co.cask.cdap.logging.appender.kafka.KafkaLogAppender
  while locating co.cask.cdap.logging.appender.LogAppender
    for parameter 0 at co.cask.cdap.logging.appender.LogAppenderInitializer.<init>(LogAppenderInitializer.java:41)
  while locating co.cask.cdap.logging.appender.LogAppenderInitializer

1 error
at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1013)
at co.cask.cdap.data.runtime.main.MasterServiceMain.start(MasterServiceMain.java:219)
at co.cask.cdap.common.runtime.DaemonMain.doMain(DaemonMain.java:58)
at co.cask.cdap.data.runtime.main.MasterServiceMain.main(MasterServiceMain.java:157)
Caused by: java.lang.NullPointerException
at scala.Predef$.Integer2int(Predef.scala:392)
at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:103)
at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:102)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at kafka.client.ClientUtils$.parseBrokerList(ClientUtils.scala:102)
at kafka.producer.BrokerPartitionInfo.<init>(BrokerPartitionInfo.scala:32)
at kafka.producer.async.DefaultEventHandler.<init>(DefaultEventHandler.scala:41)
at kafka.producer.Producer.<init>(Producer.scala:60)
at kafka.javaapi.producer.Producer.<init>(Producer.scala:26)
at co.cask.cdap.logging.appender.kafka.SimpleKafkaProducer.<init>(SimpleKafkaProducer.java:56)
at co.cask.cdap.logging.appender.kafka.KafkaLogAppender.<init>(KafkaLogAppender.java:47)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.google.inject.internal.DefaultConstructionProxyFactory$2.newInstance(DefaultConstructionProxyFactory.java:85)
at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:54)
at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:84)
at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
at com.google.inject.internal.InjectorImpl$4$1.call(InjectorImpl.java:978)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:974)
... 4 more
2016-09-21 08:23:40,256 - INFO  [Thread-2:c.c.c.d.r.m.MasterServiceMain@235] - Stopping master.services


My cdap-site.xml -

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Copyright © 2014-2016 Cask Data, Inc.

  Licensed under the Apache License, Version 2.0 (the "License"); you may not
  use this file except in compliance with the License. You may obtain a copy of
  the License at


  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
  WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
  License for the specific language governing permissions and limitations under
  the License.
  -->
<configuration>

  <!-- General Configuration -->

  <property>
    <name>hdfs.namespace</name>
    <value>/${root.namespace}</value>
    <description>
      Root directory for HDFS files written by CDAP
    </description>
  </property>
  
  <property>
    <name>hdfs.user</name>
    <value>cdap</value>
    <description>
      User name for accessing HDFS
    </description>
  </property>

  <property>
    <name>root.namespace</name>
    <value>cdap</value>
    <description>
      Root for this CDAP instance; used as the parent (or root) node for
      ZooKeeper, as the directory under which all CDAP data and metadata is
      stored in HDFS, and as the prefix for all HBase tables created by
      CDAP; must be composed of alphanumeric characters
    </description>
  </property>

  <property>
    <name>zookeeper.quorum</name>
    <description>
      ZooKeeper quorum string; specifies the ZooKeeper host:port; substitute the quorum
      (FQDN1:2181,FQDN2:2181,...) for the components shown here
    </description>
  </property>


  <!-- Applications Configuration -->

  <property>
    <name>app.bind.address</name>
    <value>0.0.0.0</value>
    <description>
      App Fabric service bind address
    </description>
  </property>


  <!-- Datasets Configuration -->

  <property>
    <name>data.tx.bind.address</name>
    <value>0.0.0.0</value>
    <description>
      Transaction service bind address
    </description>
  </property>


  <!-- Kafka Server Configuration -->

  <property>
    <name>kafka.server.default.replication.factor</name>
    <value>1</value>
    <description>
      CDAP Kafka replication factor; used to replicate Kafka messages across
      multiple machines to prevent data loss in the event of a hardware
      failure. The recommended setting is to run at least two CDAP Kafka servers.
      If you are running two Kafka servers, set this value to 2; otherwise,
      set it to the number of Kafka servers.
    </description>
  </property>
  
  <property>
    <name>kafka.server.log.dirs</name>
    <value>/tmp/kafka-logs</value>
    <description>
      CDAP Kafka service log storage directory
    </description>
  </property>

  <property>
    <name>kafka.seed.brokers</name>
    <description>
      Comma-separated list of CDAP Kafka service brokers; for distributed CDAP, 
      replace with list of FQDN:port brokers
    </description>
  </property>


  <!-- Metrics Configuration -->
  
  <property>
    <name>metrics.query.bind.address</name>
    <value>0.0.0.0</value>
    <description>
      Metrics Query service bind address
    </description>
  </property>


  <!-- Router Configuration -->

  <property>
    <name>router.bind.address</name>
    <value>0.0.0.0</value>
    <description>
      CDAP Router service bind address
    </description>
  </property>
  
  <property>
    <name>router.bind.port</name>
    <value>10000</value>
    <description>
      CDAP Router service bind port
    </description>
  </property>

  <property>
    <name>router.server.address</name>
    <value>127.0.0.1</value>
    <description>
      CDAP Router service address to which CDAP UI connects
    </description>
  </property>

  <property>
    <name>router.server.port</name>
    <value>${router.bind.port}</value>
    <description>
      CDAP Router service port to which CDAP UI connects
    </description>
  </property>


  <!-- UI Configuration -->
  
  <property>
    <name>dashboard.bind.port</name>
    <value>9999</value>
    <description>
      CDAP UI bind port
    </description>
  </property>

  <property>
   <name>master.collect.containers.log</name>
   <value>false</value>
  </property>

  <property>
   <name>master.collect.app.containers.log.level</name>
   <value>OFF</value>
 </property>

</configuration>




Bhooshan Mogal

Sep 20, 2016, 4:52:54 PM
to cdap...@googlegroups.com
Hi Teik, 

Could you please remove the trailing /${root.namespace} from the "kafka.seed.brokers" property and try again?

kafka.seed.brokers is a comma-separated list in the format <Hostname1>:<port1>,<Hostname2>:<port2>,<Hostname3>:<port3> 
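
As a rough illustration of that format, here is a minimal Python sketch (not part of CDAP; the function name `parse_seed_brokers` is made up for this example) that splits a broker list and rejects any entry that is not plain host:port. With the value shown in the startup log above, `node0.mesoslab.my:9092/cdap`, the trailing `/cdap` makes the port non-numeric, so the check fails.

```python
def parse_seed_brokers(value):
    """Split a comma-separated broker list into (host, port) pairs.

    Hypothetical helper for illustration only. Raises ValueError for any
    entry that is not host:port with a purely numeric port.
    """
    brokers = []
    for entry in value.split(","):
        entry = entry.strip()
        # Split on the last colon so the port is everything after it.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdecimal():
            raise ValueError(f"not in host:port form: {entry!r}")
        brokers.append((host, int(port)))
    return brokers
```

Once the suffix is removed, `node0.mesoslab.my:9092` parses as a single host and port.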


Thanks,
Bhooshan

--
You received this message because you are subscribed to the Google Groups "CDAP User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cdap-user+unsubscribe@googlegroups.com.
To post to this group, send email to cdap...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cdap-user/29f72694-5cae-491c-8d39-b1290ae09df8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Teik Hooi Beh

Sep 20, 2016, 7:26:54 PM
to CDAP User
Thanks, that got past the first error, and a new error now appears, as below -

2016-09-21 10:32:25,777 - INFO  [leader-election-election-master.services:c.c.c.d.r.m.MasterServiceMain$2@424] - Starting service in master: AppFabricServer [NEW]
2016-09-21 10:32:25,780 - INFO  [DatasetService:c.c.c.d.d.d.s.DatasetService@123] - Starting DatasetService...
2016-09-21 10:32:25,783 - INFO  [DistributedSchedulerService STARTING:c.c.c.i.a.r.s.DistributedSchedulerService@73] - Starting scheduler.
2016-09-21 10:32:25,784 - INFO  [DistributedTransactionSystemClientService STARTING:c.c.c.d.t.DistributedTransactionSystemClientService@64] - Starting TransactionSystemClientService.
2016-09-21 10:32:25,784 - INFO  [DistributedTransactionSystemClientService STARTING:c.c.c.d.t.DistributedTransactionSystemClientService@64] - Starting TransactionSystemClientService.
2016-09-21 10:32:25,785 - INFO  [ApplicationLifecycleService STARTING:c.c.c.i.a.s.ApplicationLifecycleService@156] - Starting ApplicationLifecycleService
2016-09-21 10:32:25,789 - INFO  [ProgramLifecycleService STARTING:c.c.c.i.a.s.ProgramLifecycleService@129] - Starting ProgramLifecycleService
2016-09-21 10:32:25,849 - WARN  [reporter-scheduler:c.c.c.i.a.r.d.DistributedProgramRuntimeService$ClusterResourceReporter@513] - Exception getting cluster memory from 
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:1.8.0_102]
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[na:1.8.0_102]
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204) ~[na:1.8.0_102]
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[na:1.8.0_102]
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[na:1.8.0_102]
at java.net.Socket.connect(Socket.java:589) ~[na:1.8.0_102]
at java.net.Socket.connect(Socket.java:538) ~[na:1.8.0_102]
at sun.net.NetworkClient.doConnect(NetworkClient.java:180) ~[na:1.8.0_102]
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) ~[na:1.8.0_102]
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) ~[na:1.8.0_102]
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211) ~[na:1.8.0_102]
at sun.net.www.http.HttpClient.New(HttpClient.java:308) ~[na:1.8.0_102]
at sun.net.www.http.HttpClient.New(HttpClient.java:326) ~[na:1.8.0_102]
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169) ~[na:1.8.0_102]
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105) ~[na:1.8.0_102]
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999) ~[na:1.8.0_102]
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933) ~[na:1.8.0_102]
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1513) ~[na:1.8.0_102]
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441) ~[na:1.8.0_102]
at co.cask.cdap.internal.app.runtime.distributed.DistributedProgramRuntimeService$ClusterResourceReporter.reportClusterMemory(DistributedProgramRuntimeService.java:492) [co.cask.cdap.cdap-app-fabric-3.4.3.jar:na]
at co.cask.cdap.internal.app.runtime.distributed.DistributedProgramRuntimeService$ClusterResourceReporter.reportResources(DistributedProgramRuntimeService.java:472) [co.cask.cdap.cdap-app-fabric-3.4.3.jar:na]
at co.cask.cdap.internal.app.runtime.AbstractResourceReporter.runOneIteration(AbstractResourceReporter.java:72) [co.cask.cdap.cdap-app-fabric-3.4.3.jar:na]
at com.google.common.util.concurrent.AbstractScheduledService$1$1.run(AbstractScheduledService.java:170) [com.google.guava.guava-13.0.1.jar:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_102]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_102]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_102]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_102]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_102]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_102]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
2016-09-21 10:32:25,878 - WARN  [reporter-scheduler:c.c.c.i.a.r.d.DistributedProgramRuntimeService$ClusterResourceReporter@478] - unable to get resource manager metrics, cluster memory metrics will be unavailable

Bhooshan Mogal

Sep 20, 2016, 8:00:49 PM
to cdap...@googlegroups.com
Hi Teik,

A couple of questions:

1. What is the yarn.resourcemanager.webapp.address set to in yarn-site.xml? Is the YARN webapp running correctly at that address?
2. This is a warning that CDAP is unable to read cluster metrics from YARN. It should be fixed, but is CDAP running despite this warning?
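
For the first question, a small Python sketch like the one below could be used to look up the address in yarn-site.xml and see whether anything accepts connections there. It is only an illustration: the helper names `get_property` and `can_connect` are made up here, and the location of yarn-site.xml depends on your installation. A refused connection from the CDAP master host would match the java.net.ConnectException in the log above.

```python
import socket
import xml.etree.ElementTree as ET


def get_property(xml_text, name):
    """Return the <value> of the named <property> in Hadoop-style XML, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def can_connect(address, timeout=3.0):
    """Return True if a TCP connection to 'host:port' succeeds, else False.

    The address must be in host:port form; a non-numeric port raises ValueError.
    """
    host, _, port = address.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unresolvable hostnames.
        return False
```

To use it, read your yarn-site.xml into a string, pass it to `get_property` with the name `yarn.resourcemanager.webapp.address`, and, if a value comes back, pass that value to `can_connect` on the host where CDAP Master runs.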


Thanks,
Bhooshan

Teik Hooi Beh

Sep 20, 2016, 8:33:58 PM
to CDAP User
1. The YARN webapp is running correctly at that address, in my case on node1.
2. So far the master service could not start despite running for over 1 minute, so I stopped the service.

Teik Hooi Beh

Sep 20, 2016, 9:02:19 PM
to CDAP User
Letting it run longer, it seems to log warnings that DatasetService is not available, and then spills out errors as below -

Exception in thread "AppFabricServer STARTING"
2016-09-21 12:52:02,808 - ERROR [DistributedTransactionSystemClientService STARTING:c.c.c.c.s.UncaughtExceptionIdleService$1@34] - Uncaught exception from Thread[DistributedTransactionSystemClientService STARTING,5,main]
java.lang.RuntimeException: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractIdleService$1$1.run(AbstractIdleService.java:47) ~[com.google.guava.guava-13.0.1.jar:na]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_102]
Caused by: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at co.cask.cdap.data2.transaction.DistributedTransactionSystemClientService.startUp(DistributedTransactionSystemClientService.java:78) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
at com.google.common.util.concurrent.AbstractIdleService$1$1.run(AbstractIdleService.java:43) ~[com.google.guava.guava-13.0.1.jar:na]
... 1 common frames omitted
java.lang.RuntimeException: java.util.concurrent.ExecutionException: com.google.common.util.concurrent.UncheckedExecutionException: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at com.google.common.base.Throwables.propagate(Throwables.java:160)
2016-09-21 12:52:02,810 - ERROR [KafkaNotificationService STARTING:c.c.c.c.s.UncaughtExceptionIdleService$1@34] - Uncaught exception from Thread[KafkaNotificationService STARTING,5,main]
com.google.common.util.concurrent.UncheckedExecutionException: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1015) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1001) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractIdleService.startAndWait(AbstractIdleService.java:106) ~[com.google.guava.guava-13.0.1.jar:na]
at co.cask.cdap.notifications.service.AbstractNotificationService.startUp(AbstractNotificationService.java:69) ~[co.cask.cdap.cdap-notifications-3.4.3.jar:na]
at co.cask.cdap.notifications.service.kafka.KafkaNotificationService.startUp(KafkaNotificationService.java:91) ~[co.cask.cdap.cdap-notifications-3.4.3.jar:na]
at com.google.common.util.concurrent.AbstractIdleService$1$1.run(AbstractIdleService.java:43) ~[com.google.guava.guava-13.0.1.jar:na]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_102]
Caused by: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at co.cask.cdap.data2.transaction.DistributedTransactionSystemClientService.startUp(DistributedTransactionSystemClientService.java:78) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
... 2 common frames omitted
at com.google.common.util.concurrent.AbstractIdleService$1$1.run(AbstractIdleService.java:47)
at java.lang.Thread.run(Thread.java:745)
2016-09-21 12:52:02,810 - INFO  [leader-election-election-master.services:c.c.c.d.r.m.MasterServiceMain$2@455] - Stopping master twill application
Caused by: java.util.concurrent.ExecutionException: com.google.common.util.concurrent.UncheckedExecutionException: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:294)
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:281)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at co.cask.cdap.internal.app.services.AppFabricServer.startUp(AppFabricServer.java:128)
at com.google.common.util.concurrent.AbstractIdleService$1$1.run(AbstractIdleService.java:43)
... 1 more
Caused by: com.google.common.util.concurrent.UncheckedExecutionException: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
2016-09-21 12:52:02,808 - ERROR [DistributedTransactionSystemClientService STARTING:c.c.c.c.s.UncaughtExceptionIdleService$1@34] - Uncaught exception from Thread[DistributedTransactionSystemClientService STARTING,5,main]
java.lang.RuntimeException: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractIdleService$1$1.run(AbstractIdleService.java:47) ~[com.google.guava.guava-13.0.1.jar:na]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_102]
Caused by: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at co.cask.cdap.data2.transaction.DistributedTransactionSystemClientService.startUp(DistributedTransactionSystemClientService.java:78) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
at com.google.common.util.concurrent.AbstractIdleService$1$1.run(AbstractIdleService.java:43) ~[com.google.guava.guava-13.0.1.jar:na]
... 1 common frames omitted

at com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1015)
at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1001)
at com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220)
at com.google.common.util.concurrent.AbstractIdleService.startAndWait(AbstractIdleService.java:106)
at co.cask.cdap.notifications.service.AbstractNotificationService.startUp(AbstractNotificationService.java:69)
at co.cask.cdap.notifications.service.kafka.KafkaNotificationService.startUp(KafkaNotificationService.java:91)
... 2 more
Caused by: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at co.cask.cdap.data2.transaction.DistributedTransactionSystemClientService.startUp(DistributedTransactionSystemClientService.java:78)
... 2 more
2016-09-21 12:52:02,808 - ERROR [MDSDatasetsRegistry STARTING:c.c.c.c.s.UncaughtExceptionIdleService$1@34] - Uncaught exception from Thread[MDSDatasetsRegistry STARTING,5,main]
com.google.common.util.concurrent.UncheckedExecutionException: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1015) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1001) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractIdleService.startAndWait(AbstractIdleService.java:106) ~[com.google.guava.guava-13.0.1.jar:na]
at co.cask.cdap.data2.dataset2.tx.TransactionalDatasetRegistry.startUp(TransactionalDatasetRegistry.java:56) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
at co.cask.cdap.data2.datafabric.dataset.service.mds.MDSDatasetsRegistry.startUp(MDSDatasetsRegistry.java:49) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
at com.google.common.util.concurrent.AbstractIdleService$1$1.run(AbstractIdleService.java:43) ~[com.google.guava.guava-13.0.1.jar:na]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_102]
Caused by: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at co.cask.cdap.data2.transaction.DistributedTransactionSystemClientService.startUp(DistributedTransactionSystemClientService.java:78) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
... 2 common frames omitted
2016-09-21 12:52:02,810 - ERROR [DatasetService:c.c.c.c.s.UncaughtExceptionIdleService$1@34] - Uncaught exception from Thread[DatasetService,5,main]
com.google.common.util.concurrent.UncheckedExecutionException: com.google.common.util.concurrent.UncheckedExecutionException: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1015) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1001) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractIdleService.startAndWait(AbstractIdleService.java:106) ~[com.google.guava.guava-13.0.1.jar:na]
at co.cask.cdap.data2.datafabric.dataset.service.DatasetService.startUp(DatasetService.java:125) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47) ~[com.google.guava.guava-13.0.1.jar:na]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_102]
Caused by: com.google.common.util.concurrent.UncheckedExecutionException: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1015) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1001) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractIdleService.startAndWait(AbstractIdleService.java:106) ~[com.google.guava.guava-13.0.1.jar:na]
at co.cask.cdap.data2.dataset2.tx.TransactionalDatasetRegistry.startUp(TransactionalDatasetRegistry.java:56) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
at co.cask.cdap.data2.datafabric.dataset.service.mds.MDSDatasetsRegistry.startUp(MDSDatasetsRegistry.java:49) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
at com.google.common.util.concurrent.AbstractIdleService$1$1.run(AbstractIdleService.java:43) ~[com.google.guava.guava-13.0.1.jar:na]
... 1 common frames omitted
Caused by: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at co.cask.cdap.data2.transaction.DistributedTransactionSystemClientService.startUp(DistributedTransactionSystemClientService.java:78) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
... 2 common frames omitted
2016-09-21 12:52:03,110 - INFO  [DatasetService:c.c.c.d.d.d.s.DatasetService@123] - Starting DatasetService...
2016-09-21 12:52:03,113 - ERROR [DatasetService:c.c.c.c.s.UncaughtExceptionIdleService$1@34] - Uncaught exception from Thread[DatasetService,5,main]
com.google.common.util.concurrent.UncheckedExecutionException: com.google.common.util.concurrent.UncheckedExecutionException: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1015) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1001) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractIdleService.startAndWait(AbstractIdleService.java:106) ~[com.google.guava.guava-13.0.1.jar:na]
at co.cask.cdap.data2.datafabric.dataset.service.DatasetService.startUp(DatasetService.java:125) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47) ~[com.google.guava.guava-13.0.1.jar:na]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_102]
Caused by: com.google.common.util.concurrent.UncheckedExecutionException: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1015) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1001) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractIdleService.startAndWait(AbstractIdleService.java:106) ~[com.google.guava.guava-13.0.1.jar:na]
at co.cask.cdap.data2.dataset2.tx.TransactionalDatasetRegistry.startUp(TransactionalDatasetRegistry.java:56) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
at co.cask.cdap.data2.datafabric.dataset.service.mds.MDSDatasetsRegistry.startUp(MDSDatasetsRegistry.java:49) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
at com.google.common.util.concurrent.AbstractIdleService$1$1.run(AbstractIdleService.java:43) ~[com.google.guava.guava-13.0.1.jar:na]
... 1 common frames omitted
Caused by: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at co.cask.cdap.data2.transaction.DistributedTransactionSystemClientService.startUp(DistributedTransactionSystemClientService.java:78) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
... 2 common frames omitted
2016-09-21 12:52:03,586 - INFO  [DatasetService:c.c.c.d.d.d.s.DatasetService@123] - Starting DatasetService...
2016-09-21 12:52:03,588 - ERROR [DatasetService:c.c.c.c.s.UncaughtExceptionIdleService$1@34] - Uncaught exception from Thread[DatasetService,5,main]
com.google.common.util.concurrent.UncheckedExecutionException: com.google.common.util.concurrent.UncheckedExecutionException: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1015) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1001) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractIdleService.startAndWait(AbstractIdleService.java:106) ~[com.google.guava.guava-13.0.1.jar:na]
at co.cask.cdap.data2.datafabric.dataset.service.DatasetService.startUp(DatasetService.java:125) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47) ~[com.google.guava.guava-13.0.1.jar:na]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_102]
Caused by: com.google.common.util.concurrent.UncheckedExecutionException: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1015) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1001) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractIdleService.startAndWait(AbstractIdleService.java:106) ~[com.google.guava.guava-13.0.1.jar:na]
at co.cask.cdap.data2.dataset2.tx.TransactionalDatasetRegistry.startUp(TransactionalDatasetRegistry.java:56) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
at co.cask.cdap.data2.datafabric.dataset.service.mds.MDSDatasetsRegistry.startUp(MDSDatasetsRegistry.java:49) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
at com.google.common.util.concurrent.AbstractIdleService$1$1.run(AbstractIdleService.java:43) ~[com.google.guava.guava-13.0.1.jar:na]
... 1 common frames omitted
Caused by: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at co.cask.cdap.data2.transaction.DistributedTransactionSystemClientService.startUp(DistributedTransactionSystemClientService.java:78) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
... 2 common frames omitted
2016-09-21 12:52:04,469 - INFO  [DatasetService:c.c.c.d.d.d.s.DatasetService@123] - Starting DatasetService...
2016-09-21 12:52:04,471 - ERROR [DatasetService:c.c.c.c.s.UncaughtExceptionIdleService$1@34] - Uncaught exception from Thread[DatasetService,5,main]
com.google.common.util.concurrent.UncheckedExecutionException: com.google.common.util.concurrent.UncheckedExecutionException: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1015) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1001) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractIdleService.startAndWait(AbstractIdleService.java:106) ~[com.google.guava.guava-13.0.1.jar:na]
at co.cask.cdap.data2.datafabric.dataset.service.DatasetService.startUp(DatasetService.java:125) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47) ~[com.google.guava.guava-13.0.1.jar:na]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_102]
Caused by: com.google.common.util.concurrent.UncheckedExecutionException: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1015) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1001) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractIdleService.startAndWait(AbstractIdleService.java:106) ~[com.google.guava.guava-13.0.1.jar:na]
at co.cask.cdap.data2.dataset2.tx.TransactionalDatasetRegistry.startUp(TransactionalDatasetRegistry.java:56) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
at co.cask.cdap.data2.datafabric.dataset.service.mds.MDSDatasetsRegistry.startUp(MDSDatasetsRegistry.java:49) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
at com.google.common.util.concurrent.AbstractIdleService$1$1.run(AbstractIdleService.java:43) ~[com.google.guava.guava-13.0.1.jar:na]
... 1 common frames omitted
Caused by: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at co.cask.cdap.data2.transaction.DistributedTransactionSystemClientService.startUp(DistributedTransactionSystemClientService.java:78) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
... 2 common frames omitted
2016-09-21 12:52:06,138 - INFO  [DatasetService:c.c.c.d.d.d.s.DatasetService@123] - Starting DatasetService...
2016-09-21 12:52:06,140 - ERROR [DatasetService:c.c.c.c.s.UncaughtExceptionIdleService$1@34] - Uncaught exception from Thread[DatasetService,5,main]
com.google.common.util.concurrent.UncheckedExecutionException: com.google.common.util.concurrent.UncheckedExecutionException: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1015) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1001) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractIdleService.startAndWait(AbstractIdleService.java:106) ~[com.google.guava.guava-13.0.1.jar:na]
at co.cask.cdap.data2.datafabric.dataset.service.DatasetService.startUp(DatasetService.java:125) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47) ~[com.google.guava.guava-13.0.1.jar:na]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_102]
Caused by: com.google.common.util.concurrent.UncheckedExecutionException: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1015) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1001) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220) ~[com.google.guava.guava-13.0.1.jar:na]
at com.google.common.util.concurrent.AbstractIdleService.startAndWait(AbstractIdleService.java:106) ~[com.google.guava.guava-13.0.1.jar:na]
at co.cask.cdap.data2.dataset2.tx.TransactionalDatasetRegistry.startUp(TransactionalDatasetRegistry.java:56) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
at co.cask.cdap.data2.datafabric.dataset.service.mds.MDSDatasetsRegistry.startUp(MDSDatasetsRegistry.java:49) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
at com.google.common.util.concurrent.AbstractIdleService$1$1.run(AbstractIdleService.java:43) ~[com.google.guava.guava-13.0.1.jar:na]
... 1 common frames omitted
Caused by: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at co.cask.cdap.data2.transaction.DistributedTransactionSystemClientService.startUp(DistributedTransactionSystemClientService.java:78) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
... 2 common frames omitted
2016-09-21 12:52:07,807 - INFO  [leader-election-election-master.services:c.c.c.d.r.m.MasterServiceMain$2@466] - Stopping service in master: EndureService{DatasetService}
2016-09-21 12:52:07,807 - WARN  [Endure-Service-DatasetService:c.c.c.c.s.RetryOnStartFailureService$1@87] - Stop requested for service DatasetService during start failure retry.
2016-09-21 12:52:07,993 - WARN  [leader-election-election-master.services:o.a.t.i.z.LeaderElection@231] - Exception thrown when calling leader() method. Withdraw from the leader election process.
java.lang.RuntimeException: Unable to start service AppFabricServer [FAILED]: java.util.concurrent.ExecutionException: com.google.common.util.concurrent.UncheckedExecutionException: java.util.concurrent.TimeoutException: Timed out after 600 seconds while waiting to discover the transaction service. Check the logs for the service to see what went wrong.
at co.cask.cdap.data.runtime.main.MasterServiceMain$2.leader(MasterServiceMain.java:432) ~[co.cask.cdap.cdap-master-3.4.3.jar:na]
at org.apache.twill.internal.zookeeper.LeaderElection.becomeLeader(LeaderElection.java:229) [org.apache.twill.twill-zookeeper-0.7.0-incubating.jar:0.7.0-incubating]
at org.apache.twill.internal.zookeeper.LeaderElection.access$1800(LeaderElection.java:53) [org.apache.twill.twill-zookeeper-0.7.0-incubating.jar:0.7.0-incubating]
at org.apache.twill.internal.zookeeper.LeaderElection$5.onSuccess(LeaderElection.java:207) [org.apache.twill.twill-zookeeper-0.7.0-incubating.jar:0.7.0-incubating]
at org.apache.twill.internal.zookeeper.LeaderElection$5.onSuccess(LeaderElection.java:186) [org.apache.twill.twill-zookeeper-0.7.0-incubating.jar:0.7.0-incubating]
at com.google.common.util.concurrent.Futures$6.run(Futures.java:799) [com.google.guava.guava-13.0.1.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_102]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_102]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
2016-09-21 12:52:07,996 - INFO  [leader-election-election-master.services:c.c.c.d.r.m.MasterServiceMain$2@440] - Became follower for master services
2016-09-21 12:52:08,024 - ERROR [leader-election-election-master.services:c.c.c.d.r.m.MasterServiceMain$1@185] - CDAP Master failed to start
2016-09-21 12:52:08,028 - INFO  [Thread-2:c.c.c.d.r.m.MasterServiceMain@235] - Stopping master.services
2016-09-21 12:52:08,035 INFO  [kafka-publisher] producer.Producer: Shutting down producer
2016-09-21 12:52:08,048 INFO  [kafka-publisher] producer.ProducerPool: Closing all sync producers
2016-09-21 12:52:08,053 INFO  [kafka-publisher] producer.Producer: Producer shutdown completed in 17 ms

Ali Anwar

Sep 20, 2016, 10:08:17 PM
to cdap...@googlegroups.com
Hey Teik.

The transaction server failed to start within the expected time (10 minutes). Can you attach the full master logs so that we can investigate the reason?
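For readers following along: the retried stack traces in the log above all end in the same innermost cause. Here is a minimal Python sketch for confirming that on a full master log. It assumes logback-style records that begin with a timestamp, and the name `root_causes` is made up for illustration.

```python
import re
from collections import Counter

# Start of a log record, e.g. "2016-09-21 12:52:03,588 - ERROR [...]".
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def root_causes(log_text):
    """Count the innermost 'Caused by:' message of each record that has one."""
    counts = Counter()
    last_cause = None
    for line in log_text.splitlines():
        if RECORD_START.match(line):
            # A new record begins: flush the deepest cause of the previous one.
            if last_cause is not None:
                counts[last_cause] += 1
            last_cause = None
        elif line.lstrip().startswith("Caused by: "):
            # Later 'Caused by:' lines sit deeper in the chain, so overwrite.
            last_cause = line.lstrip()[len("Caused by: "):].strip()
    if last_cause is not None:
        counts[last_cause] += 1
    return counts
```

Run against the master log, a single key in the result (here the `TimeoutException` from the transaction client) would suggest every retry failed for the same reason.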

Regards,

Ali Anwar

To unsubscribe from this group and stop receiving emails from it, send an email to cdap-user+unsubscribe@googlegroups.com.

To post to this group, send email to cdap...@googlegroups.com.

Teik Hooi Beh

Sep 20, 2016, 10:18:00 PM
to CDAP User

Hi Ali,


I have attached the logs. Please look at the bottom of the files (from 12:30 to the end of the file); that is the latest run.

Thanks

Regards
Beh
...
cdap-logs.tar.gz

Ali Anwar

Sep 21, 2016, 1:43:36 AM
to cdap...@googlegroups.com
There's no indication in the master logs as to why the transaction service did not start up successfully.
Can you check the YARN application launched by the CDAP master? It is named 'master.services'. The transaction service runs as a container of that app, and it would be good to check that container's logs in YARN.

Regards,

Ali Anwar


Teik Hooi Beh

Sep 21, 2016, 2:22:59 AM
to cdap...@googlegroups.com
Hi Ali,

Anything in particular that I should look out for?

Regards
Beh

--
You received this message because you are subscribed to a topic in the Google Groups "CDAP User" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/cdap-user/cWfmtvxTMNM/unsubscribe.
To unsubscribe from this group and all its topics, send an email to cdap-user+unsubscribe@googlegroups.com.

To post to this group, send email to cdap...@googlegroups.com.

Ali Anwar

Sep 21, 2016, 1:10:02 PM
to cdap...@googlegroups.com
When the 'master.services' YARN application is starting up or running in YARN, look at the log files of the container named 'transaction.service'. I suspect it has some errors starting up, but I don't know what or why.
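A rough Python sketch of one way to locate that application and pull its logs from the command line. It assumes the `yarn` CLI is on the PATH, that `yarn application -list` prints tab-separated columns with the application ID first and the name second, and that log aggregation is enabled (aggregated logs are typically only available once the application has finished; while it is running, the ResourceManager web UI is the usual place to look). The function names are made up for illustration.

```python
import subprocess


def find_application_ids(listing, app_name="master.services"):
    """Return IDs of applications named `app_name` in `yarn application -list` output."""
    ids = []
    for line in listing.splitlines():
        fields = [field.strip() for field in line.split("\t")]
        # Data rows start with an ID of the form application_<timestamp>_<sequence>.
        if (len(fields) >= 2
                and fields[0].startswith("application_")
                and fields[1] == app_name):
            ids.append(fields[0])
    return ids


def list_applications():
    """Run `yarn application -list -appStates ALL` and return its stdout."""
    result = subprocess.run(
        ["yarn", "application", "-list", "-appStates", "ALL"],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        universal_newlines=True, check=True,
    )
    return result.stdout


def fetch_aggregated_logs(app_id):
    """Run `yarn logs -applicationId <id>` and return its stdout."""
    result = subprocess.run(
        ["yarn", "logs", "-applicationId", app_id],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        universal_newlines=True, check=True,
    )
    return result.stdout
```

On a cluster node, `find_application_ids(list_applications())` should give the IDs to pass to `fetch_aggregated_logs`; the output covers every container of the app, so the transaction service section still has to be picked out by hand.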

Teik Hooi Beh

Sep 21, 2016, 7:40:15 PM
to CDAP User
Hi Ali,

I will take a look at the YARN logs as you advised.

I did also notice the following in the cdap-master log, which seems to be trying to look up the Kafka queue:

No partition metadata for topic logs.user-v2 due to kafka.common.LeaderNotAvailableException}] for topic [logs.user-v2]: class kafka.common.LeaderNotAvailableException

My feeling is that Kafka is causing the whole issue. My question is: should I always use the Kafka from CDAP, or the one from MapR? I notice that people on the user list have succeeded in running CDAP on MapR, but only in the MapR sandbox, as no changes to the kafka-server settings were made in cdap-site.xml. Your thoughts?

Regards
Beh
...

Ali Anwar

Sep 21, 2016, 7:44:22 PM
to cdap...@googlegroups.com
Hi Beh.

I don't think Kafka errors can cause the transaction service to fail.
Also, we test with MapR frequently, on a 3-node cluster, so it's not restricted to just a sandbox cluster.
Please attach the transaction server log files (program.log, stderr, stdout, etc.), and we can help debug why it doesn't start up.

Regards,

Ali Anwar


Teik Hooi Beh

Sep 21, 2016, 7:53:36 PM
to cdap...@googlegroups.com
Hi Ali,

Quick question: which versions of CDAP and MapR would give a smoother deployment?

Thanks

Regards
Beh


Teik Hooi Beh

Sep 21, 2016, 7:54:20 PM
to cdap...@googlegroups.com
And I am running CentOS 7.2 on my 3 nodes. I hope that's not an issue.

Ali Anwar

Sep 21, 2016, 9:04:56 PM
to cdap...@googlegroups.com
We test against MapR 4.1, 5.0, and 5.1.
I suspect there's a configuration issue on your end, but it's difficult to determine without seeing the transaction server logs.

Regards,

Ali Anwar

Teik Hooi Beh

Sep 21, 2016, 9:15:04 PM
to cdap...@googlegroups.com
Agreed, it is a configuration issue. Tailing the logs as we speak...

BTW, should the hdfs.user in cdap-site.xml be set to cdap or kept as yarn?

Regards
Beh

Teik Hooi Beh

Sep 21, 2016, 10:09:44 PM
to cdap...@googlegroups.com
Any idea about the error below? My /tmp is even set readable and writable to the world...

main : user is cdap
main : requested yarn user is cdap
Can't create directory /tmp/hadoop-mapr/nm-local-dir/usercache/cdap/appcache/application_1474447860665_0020 - Permission denied
Did not create any app directories

Regards
Beh

Ali Anwar

Sep 21, 2016, 11:05:25 PM
to cdap...@googlegroups.com
Hi Beh.

Even though the /tmp directory is readable and writable to the world, the directories inside it can have more restrictive permissions.
Can you check the permissions of the /tmp/hadoop-mapr/nm-local-dir/usercache/cdap directory as well as its parent?
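As a sketch of that check in Python: the helper below (the name `describe_path_chain` is made up for illustration) reports the mode, owner and group of a directory and of every ancestor up to the root. It assumes a POSIX system, since it relies on the `pwd` and `grp` modules.

```python
import grp
import os
import pwd
import stat


def _name_or_id(lookup, numeric_id, attr):
    """Resolve a uid/gid to a name; fall back to the number if this node has no entry."""
    try:
        return getattr(lookup(numeric_id), attr)
    except KeyError:
        return str(numeric_id)


def describe_path_chain(path):
    """Return (path, mode, owner, group) for `path` and each ancestor up to the root."""
    rows = []
    current = os.path.abspath(path)
    while True:
        info = os.stat(current)
        rows.append((
            current,
            stat.filemode(info.st_mode),
            _name_or_id(pwd.getpwuid, info.st_uid, "pw_name"),
            _name_or_id(grp.getgrgid, info.st_gid, "gr_name"),
        ))
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return rows
```

Calling `describe_path_chain("/tmp/hadoop-mapr/nm-local-dir/usercache/cdap")` on a node lists that directory first and then each parent. A purely numeric owner in the result means that uid has no matching user on that node.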

Regards,

Ali Anwar

Teik Hooi Beh

Sep 21, 2016, 11:32:07 PM
to CDAP User
Looks like all the issues are related to the cdap user's permissions. That includes /tmp/hadoop-mapr....

Also, the cdap uid and gid were not consistent across the cluster, which is why it works more easily on the sandbox. I am able to see the CDAP console now, but the warning below still remains -

WARN  [reporter-scheduler:c.c.c.i.a.r.d.DistributedProgramRuntimeService$ClusterResourceReporter@513] - Exception getting cluster memory from 
java.net.ConnectException: Connection refused

Thanks heaps.

Regards
Beh

Ali Anwar

Sep 21, 2016, 11:33:58 PM
to cdap...@googlegroups.com
Hey Teik.

Can you describe what exactly the permission issues were? How did they come to be?

Also, what does it mean that the uid and gid were not consistent? Do they need to be?

Lastly, the WARN you pasted is not fatal, but you will not be able to see some cluster metrics.

Regards,

Ali Anwar


Teik Hooi Beh

Sep 21, 2016, 11:46:15 PM
to CDAP User
What needs to be noted as far as the cdap user is concerned -

1. I don't recall creating the cdap user manually, so I guess it was created automatically during the startup process (correct me if I am wrong), and the uid/gid was different on each node. The node on which I ran the startup services was OK, but the other nodes were not. Because of that, when I looked at hadoop fs on the other nodes, I saw the numeric uid/gid instead of cdap as the owner. That did not work well when master.services ran processes on the other nodes.
2. /tmp/hadoop-mapr had cdap as the owner, but the folders below it did not (they all seem to belong to mapr). Even when I changed them to world readable and writable, nm-local-dir somehow did not change. So I had to change the owner of everything below /tmp/hadoop-mapr to cdap.
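The uid/gid mismatch in point 1 can be checked by running `id cdap` on every node and comparing the results. A minimal Python sketch, assuming the outputs have already been collected (for example over ssh) into a dict keyed by node name; the function names are made up for illustration.

```python
import re

# Matches e.g. "uid=1001(cdap) gid=1001(cdap) groups=..."; the "(name)" parts are optional.
ID_PATTERN = re.compile(r"uid=(\d+)(?:\([^)]*\))?\s+gid=(\d+)(?:\([^)]*\))?")


def parse_id_output(text):
    """Extract (uid, gid) as integers from the output of `id <user>`."""
    match = ID_PATTERN.search(text)
    if match is None:
        raise ValueError("unrecognised id output: %r" % text)
    return int(match.group(1)), int(match.group(2))


def inconsistent_nodes(id_output_by_node):
    """Group nodes by reported (uid, gid); return {} when all nodes agree."""
    groups = {}
    for node, text in id_output_by_node.items():
        groups.setdefault(parse_id_output(text), []).append(node)
    return groups if len(groups) > 1 else {}
```

An empty result means every node reports the same uid and gid for the user. Otherwise the keys show which numeric ids are in use and the values show which nodes use them.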

Hope the above makes sense.

I will have to dig deeper into the WARN message, since without the metrics there is no visibility in the CDAP console. Maybe you could point me to where to start looking.
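One possible starting point: `Connection refused` means nothing accepted the TCP connection at the address that was tried. The pasted WARN does not show which address that was (it is blank after "from"), so any host and port passed to the sketch below are placeholders to be replaced with the real endpoint once it is known. The function name `can_connect` is made up for illustration.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False
```

Running this from the CDAP master node against the suspected endpoint separates "nothing is listening there" from problems further up the stack.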

Regards
Beh

Teik Hooi Beh

Sep 21, 2016, 11:50:11 PM
to CDAP User
Some services are still not OK, as shown below -



On Wednesday, 21 September 2016 08:42:13 UTC+12, Teik Hooi Beh wrote:

ali

Sep 26, 2016, 4:51:43 PM
to CDAP User
Hi.

Based upon a different ticket, you were able to get past this issue. How were you able to do so?

Regards,

Ali Anwar

Teik Hooi Beh

Sep 26, 2016, 5:35:57 PM
to cdap...@googlegroups.com
Probably my mistake. Getting too adventurous.

I broke the whole setup after updating MapR to 5.1. Logically it shouldn't have, but that was the only thing I changed. Now I am moving back to MapR 5.0 to see what happens.

The upgrade has given me a weekend of headaches -


Thanks

Regards
Beh
