NoNode for /master.services/discoverable? Tx server could not start up with CDH 5.7


xiang yuan

Dec 19, 2016, 10:38:39 PM
to cdap...@googlegroups.com

ENV:

CDAP 3.6.0 (distributed)

java 1.8.0_91

CDH 5.7

Vcores  32

Memory 64G


Here is the log of the YARN job "master.services":

Exception in thread "CompositeService STOPPING" java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /master.services/discoverable
	at com.google.common.base.Throwables.propagate(Throwables.java:160)
	at com.google.common.util.concurrent.AbstractIdleService$1$2.run(AbstractIdleService.java:61)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /master.services/discoverable
	at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:294)
	at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:267)
	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:96)
	at org.apache.twill.internal.appmaster.ApplicationMasterMain$AppMasterTwillZKPathService.shutDown(ApplicationMasterMain.java:280)
	at com.google.common.util.concurrent.AbstractIdleService$1$2.run(AbstractIdleService.java:57)
	... 1 more
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /master.services/discoverable
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.twill.internal.zookeeper.DefaultZKClientService$Callbacks$4.processResult(DefaultZKClientService.java:619)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:593)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.twill.launcher.TwillLauncher.main(TwillLauncher.java:89)
Caused by: com.google.common.util.concurrent.UncheckedExecutionException: java.util.concurrent.ExecutionException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /master.services/discoverable
	at com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1015)
	at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1001)
	at com.google.common.util.concurrent.AbstractService.stopAndWait(AbstractService.java:225)
	at com.google.common.util.concurrent.AbstractIdleService.stopAndWait(AbstractIdleService.java:122)
	at org.apache.twill.internal.ServiceMain.doMain(ServiceMain.java:113)
	at org.apache.twill.internal.appmaster.ApplicationMasterMain.main(ApplicationMasterMain.java:100)
	... 5 more
Caused by: java.util.concurrent.ExecutionException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /master.services/discoverable
	at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:294)
	at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:267)
	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:96)
	at org.apache.twill.internal.appmaster.ApplicationMasterMain$AppMasterTwillZKPathService.shutDown(ApplicationMasterMain.java:280)
	at com.google.common.util.concurrent.AbstractIdleService$1$2.run(AbstractIdleService.java:57)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /master.services/discoverable
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.twill.internal.zookeeper.DefaultZKClientService$Callbacks$4.processResult(DefaultZKClientService.java:619)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:593)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)

I could not find a node named "/master.services/discoverable" in ZooKeeper. I installed another environment with HDP 2.5, and there the tx server at least worked. But I could not find that node in its ZooKeeper either.

master-cdap.log

Sreevatsan Raman

Dec 19, 2016, 11:25:36 PM
to cdap...@googlegroups.com
Hi,

CDAP 3.6 doesn't support HDP 2.5. Support for HDP 2.5 will be available in our upcoming CDAP 4 release, which will be out this week.

For Hadoop compatibility with CDAP, please refer to the docs here:
http://docs.cask.co/cdap/3.6.0/en/admin-manual/hadoop-compatibility.html

I am not sure how the other environment with HDP 2.5 works. Did you make any changes in that environment?


Thanks,
Sree

--
You received this message because you are subscribed to the Google Groups "CDAP User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cdap-user+...@googlegroups.com.
To post to this group, send email to cdap...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cdap-user/CAM%3Dw8_yf-ty3GBeU3ZZB0r%2BSMMiuZEVLWGS%3DtVk1bACvs94HdA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Ali Anwar

Dec 20, 2016, 12:03:40 AM
to cdap...@googlegroups.com
Hi.

The master logs you attached only cover a timespan of 1-2 minutes.
The transaction service (and other system services) can take several minutes to come up. Can you allow them a few minutes (usually up to 5) and then check their status? If the issue persists, could you attach the complete log at that point?
Also, changing the log level is probably not necessary: if the CDAP services don't come up within a few minutes, the issue should appear at WARN or ERROR level.
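
A minimal sketch of pulling only the WARN/ERROR entries out of a master log, in Python. The line layout in the regular expression is an assumption based on the single YarnCheck log line quoted further down in this thread, and the function name `problem_lines` and the INFO sample line are made up for illustration.

```python
import re

# Assumed layout, based on the one log line quoted in this thread:
# "<date> <time>,<millis> - <LEVEL>  [<thread>:<logger>@<line>] - <message>"
LOG_LINE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} - (\w+)\s")


def problem_lines(lines, levels=("WARN", "ERROR")):
    """Yield only the log lines whose level is one of `levels`."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(1) in levels:
            yield line.rstrip("\n")


sample = [
    # Hypothetical INFO line, only here to show that it gets filtered out.
    "2016-12-21 16:57:00,000 - INFO  [main:example.Logger@1] - starting up",
    # WARN line as quoted in this thread.
    "2016-12-21 16:57:06,614 - WARN  [main:c.c.c.m.s.YarnCheck@190] - "
    "Services require 4608 MB of memory but the cluster only has "
    "2750 MB of memory available.",
]

for entry in problem_lines(sample):
    print(entry)
```

To scan a real file, the same function can be fed an open file object, for example `problem_lines(open("cdap-master-10min.log"))`. Continuation lines such as stack-trace frames do not start with a timestamp, so this sketch skips them.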

- Ali Anwar


soari...@gmail.com

Dec 21, 2016, 3:51:36 AM
to CDAP User
Thanks for the comments.
I created 2 environments: one with HDP 2.5 and one with CDH 5.7.
Yes, the tx server ran well with HDP 2.5 about 2 minutes after startup. But I ran into a Hive problem, and I saw that CDAP 3.6 doesn't support HDP 2.5, so I created the CDH 5.7 environment.
Unfortunately, the tx server never starts up successfully on CDH 5.7. The master log I attached is very short because I stripped out most of it. It contains a lot of 'Unable to discover tx service' messages.

xiang yuan

Dec 21, 2016, 4:14:20 AM
to cdap...@googlegroups.com
Here is the whole log from a 10-minute run.

cdap-master-10min.log

Sreevatsan Raman

Dec 21, 2016, 6:42:59 AM
to cdap...@googlegroups.com
Hey Xiang Yuan,

From the logs, I see that the cluster doesn't have enough configured memory to run all of the CDAP Master services in YARN:

2016-12-21 16:57:06,614 - WARN  [main:c.c.c.m.s.YarnCheck@190] - Services require 4608 MB of memory but the cluster only has 2750 MB of memory available.

Please make sure you have enough cluster resources. Based on your configuration, we need at least ~5 GB for the master services, and if you want to run any additional CDAP programs, you will need more resources for those. This can be achieved by adding more YARN node managers to the cluster.
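
As a rough illustration of the gap that warning describes, here is a small Python sketch that reads the two numbers out of the YarnCheck line and computes the shortfall. The regular expression is derived only from the single line quoted above, so the wording may differ in other CDAP versions, and the function name `memory_shortfall_mb` is made up for this example.

```python
import re

# Pattern for the YarnCheck warning quoted above; the wording is taken from
# that single log line and is an assumption for any other CDAP version.
YARN_CHECK = re.compile(
    r"Services require (\d+) MB of memory but the cluster only has (\d+) MB"
)


def memory_shortfall_mb(log_line):
    """Return how many MB are missing, or None if the line does not match."""
    match = YARN_CHECK.search(log_line)
    if match is None:
        return None
    required, available = (int(group) for group in match.groups())
    # Never report a negative shortfall if the cluster has spare memory.
    return max(required - available, 0)


line = (
    "2016-12-21 16:57:06,614 - WARN  [main:c.c.c.m.s.YarnCheck@190] - "
    "Services require 4608 MB of memory but the cluster only has "
    "2750 MB of memory available."
)
print(memory_shortfall_mb(line))  # prints 1858
```

For the line in this thread the cluster is 1858 MB short for the master services alone, before any additional CDAP programs are started.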

I have opened a JIRA to give the user a proper error message and fail fast if there aren't enough cluster resources: https://issues.cask.co/browse/CDAP-7978

Thanks,
Sree

Stephen

Dec 21, 2016, 9:56:19 PM
to CDAP User
My lack of knowledge wasted my weekend. Thanks, Sreevatsan Raman.
It's running well now.