I have been running Druid 0.8.1 on HDP 2.3 (NameNode HA with a single YARN ResourceManager) without problems.
However, the same configuration fails on an HDP cluster that is set up with both NameNode HA and YARN ResourceManager HA.
This is the message I see in the indexer log:
2016-04-09T08:21:53,495 INFO [task-runner-0] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm1
2016-04-09T08:22:28,917 INFO [task-runner-0] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
2016-04-09T08:22:44,568 INFO [task-runner-0] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm1
2016-04-09T08:23:19,884 INFO [task-runner-0] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
2016-04-09T08:23:40,845 INFO [task-runner-0] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm1
2016-04-09T08:23:40,848 WARN [task-runner-0] org.apache.hadoop.io.retry.RetryInvocationHandler - Exception while invoking class org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication. Not retrying because failovers (30) exceeded maximum allowed (30)
java.net.ConnectException: Call From druidnode1001.local/10.193.64.154 to 0.0.0.0:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.GeneratedConstructorAccessor75.newInstance(Unknown Source) ~[?:?]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.7.0_79]
at java.lang.reflect.Constructor.newInstance(Constructor.java:526) ~[?:1.7.0_79]
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) ~[druid_build-assembly-0.1.jar:0.1-SNAPSHOT]
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) ~[druid_build-assembly-0.1.jar:0.1-SNAPSHOT]
at org.apache.hadoop.ipc.Client.call(Client.java:1410) ~[druid_build-assembly-0.1.jar:0.1-SNAPSHOT]
at org.apache.hadoop.ipc.Client.call(Client.java:1359) ~[druid_build-assembly-0.1.jar:0.1-SNAPSHOT]
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) ~[druid_build-assembly-0.1.jar:0.1-SNAPSHOT]
at com.sun.proxy.$Proxy194.getNewApplication(Unknown Source) ~[?:?]
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication(ApplicationClientProtocolPBClientImpl.java:167) ~[druid_build-assembly-0.1.jar:0.1-SNAPSHOT]
And this is the bin/run-druid script:
exec java `cat "$CONFDIR"/"$WHATAMI"/jvm.config | xargs` \
-Dhadoop.dfs.nameservices=nnha \
-Dhadoop.dfs.ha.namenodes.nnha=nn1,nn2 \
-Dhadoop.dfs.namenode.rpc-address.nnha.nn1=nn1001.local:8020 \
-Dhadoop.dfs.namenode.rpc-address.nnha.nn2=nn1002.local:8020 \
-Dhadoop.dfs.client.failover.proxy.provider.nnha=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider \
  # also tried this with the -Dhadoop.yarn.resourcemanager.* prefix
  -Dyarn.resourcemanager.ha.enabled=true \
-Dyarn.resourcemanager.ha.rm-ids=rm1,rm2 \
-Dyarn.resourcemanager.hostname.rm1=nn1001.local \
-Dyarn.resourcemanager.hostname.rm2=nn1002.local \
-Dhdp.version=2.4.0.0-169 \
-cp "$CONFDIR"/_common:"$CONFDIR"/"$WHATAMI":`ls "$WHEREAMI"/../dist/mz/*.jar | xargs | tr ' ' ':` \
`cat "$CONFDIR"/$WHATAMI/main.config | xargs`
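For comparison, these -D flags are meant to mirror the YARN HA settings in the cluster's yarn-site.xml, which (a minimal sketch, using the same rm-ids and hostnames as above) look like this:

```xml
<!-- YARN ResourceManager HA properties the run-druid -D flags mirror -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>nn1001.local</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>nn1002.local</value>
</property>
```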
Based on the message "0.0.0.0:8032 failed on connection exception", it seems the indexer service is unable to resolve the logical RM ids (rm1 and rm2), and is instead falling back to the stock Hadoop default for yarn.resourcemanager.address (0.0.0.0:8032).
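One thing I am considering trying (an assumption on my part, not something I have verified yet): since yarn.resourcemanager.address.&lt;rm-id&gt; is derived from ${yarn.resourcemanager.hostname}:8032 by default, pinning the per-RM client addresses explicitly in run-druid might stop the fallback to 0.0.0.0:8032, e.g.:

```shell
# Hypothetical workaround (untested): set the per-RM client addresses
# explicitly so the failover provider cannot fall back to 0.0.0.0:8032.
# The port must match yarn.resourcemanager.address in the cluster's
# yarn-site.xml (8050 is the HDP default; vanilla Hadoop uses 8032).
  -Dyarn.resourcemanager.address.rm1=nn1001.local:8050 \
  -Dyarn.resourcemanager.address.rm2=nn1002.local:8050 \
```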
All Hadoop configs are consistent across the nodes, and I verified YARN itself is working by running a test MapReduce job from the command line.
Any suggestions on how this issue can be resolved?
Thank you,
Bikrant