Druid batch indexing not working with Yarn ha


BiksN

Apr 9, 2016, 5:15:29 AM
to Druid User
I have been using Druid 0.8.1 on HDP 2.3 (with NameNode HA and a single YARN ResourceManager).

But the same config is failing on an HDP cluster configured with both NameNode HA and YARN HA.

This is the message I see in the Indexer log:

2016-04-09T08:21:53,495 INFO [task-runner-0] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm1
2016-04-09T08:22:28,917 INFO [task-runner-0] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
2016-04-09T08:22:44,568 INFO [task-runner-0] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm1
2016-04-09T08:23:19,884 INFO [task-runner-0] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
2016-04-09T08:23:40,845 INFO [task-runner-0] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm1
2016-04-09T08:23:40,848 WARN [task-runner-0] org.apache.hadoop.io.retry.RetryInvocationHandler - Exception while invoking class org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication. Not retrying because failovers (30) exceeded maximum allowed (30)
java.net.ConnectException: Call From druidnode1001.local/10.193.64.154 to 0.0.0.0:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.GeneratedConstructorAccessor75.newInstance(Unknown Source) ~[?:?]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.7.0_79]
at java.lang.reflect.Constructor.newInstance(Constructor.java:526) ~[?:1.7.0_79]
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) ~[druid_build-assembly-0.1.jar:0.1-SNAPSHOT]
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) ~[druid_build-assembly-0.1.jar:0.1-SNAPSHOT]
at org.apache.hadoop.ipc.Client.call(Client.java:1410) ~[druid_build-assembly-0.1.jar:0.1-SNAPSHOT]
at org.apache.hadoop.ipc.Client.call(Client.java:1359) ~[druid_build-assembly-0.1.jar:0.1-SNAPSHOT]
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) ~[druid_build-assembly-0.1.jar:0.1-SNAPSHOT]
at com.sun.proxy.$Proxy194.getNewApplication(Unknown Source) ~[?:?]
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication(ApplicationClientProtocolPBClientImpl.java:167) ~[druid_build-assembly-0.1.jar:0.1-SNAPSHOT]

And this is the bin/run-druid:
exec java `cat "$CONFDIR"/"$WHATAMI"/jvm.config | xargs` \
  -Dhadoop.dfs.nameservices=nnha \
  -Dhadoop.dfs.ha.namenodes.nnha=nn1,nn2 \
  -Dhadoop.dfs.namenode.rpc-address.nnha.nn1=nn1001.local:8020 \
  -Dhadoop.dfs.namenode.rpc-address.nnha.nn2=nn1002.local:8020 \
  -Dhadoop.dfs.client.failover.proxy.provider.nnha=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider \
  -Dyarn.resourcemanager.ha.enabled=true \
  -Dyarn.resourcemanager.ha.rm-ids=rm1,rm2 \
  -Dyarn.resourcemanager.hostname.rm1=nn1001.local \
  -Dyarn.resourcemanager.hostname.rm2=nn1002.local \
  -Dhdp.version=2.4.0.0-169 \
  -cp "$CONFDIR"/_common:"$CONFDIR"/"$WHATAMI":`ls "$WHEREAMI"/../dist/mz/*.jar | xargs | tr ' ' ':'` \
  `cat "$CONFDIR"/$WHATAMI/main.config | xargs`

(For the YARN properties I also tried the -Dhadoop.yarn.resourcemanager.. prefix.)
  
Based on the message "0.0.0.0:8032 failed on connection exception", it seems the Indexer service is unable to resolve the logical names (rm1 & rm2).
All Hadoop configs are consistent across the nodes, and I verified YARN itself works by running a test MR job from the command line.
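For context, 0.0.0.0:8032 is what a YARN client falls back to when it cannot derive yarn.resourcemanager.address for an rm-id from its configuration, which usually means the client JVM never saw a yarn-site.xml with the HA settings. A quick sanity check one could run on the indexer node (a sketch: the check_yarn_ha_conf helper is not part of Druid or Hadoop, and /etc/hadoop/conf is the HDP default path, an assumption here):

```shell
# check_yarn_ha_conf: report which ResourceManager HA keys are missing from
# the yarn-site.xml in a given Hadoop conf dir. The key names mirror the
# rm1/rm2 setup in this thread; the helper itself is only a sketch.
check_yarn_ha_conf() {
  conf_dir="$1"
  for key in yarn.resourcemanager.ha.enabled \
             yarn.resourcemanager.ha.rm-ids \
             yarn.resourcemanager.hostname.rm1 \
             yarn.resourcemanager.hostname.rm2; do
    # Print a line for every key the client config does not contain.
    grep -q "$key" "$conf_dir/yarn-site.xml" 2>/dev/null || echo "MISSING: $key"
  done
}

# Typical use on the Druid node (/etc/hadoop/conf is the HDP default):
check_yarn_ha_conf /etc/hadoop/conf
```

If any key prints as MISSING from the config dir the indexer actually has on its classpath, the failover proxy can never resolve rm1/rm2 to real hostnames.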

Any suggestions on how this issue can be resolved?

Thank you,
Bikrant

Fangjin Yang

Apr 15, 2016, 6:38:27 PM
to Druid User
We have no experience running Druid with YARN. If you manage to get things working, though, please share the details with us so we can update the docs.

Fokko Driesprong

Sep 19, 2016, 4:36:34 PM
to Druid User
Did you find a solution? Or did you get it working at least? Please let me know.

On Saturday, April 9, 2016 at 11:15:29 AM UTC+2, BiksN wrote:

rohit kochar

Sep 20, 2016, 2:08:23 AM
to Druid User
In my opinion this is definitely an issue with missing Hadoop config files in your setup.
I have previously used YARN successfully for running batch ingestions.
You can log in to one of the YARN machines, list the Hadoop config files (generally found under /etc/hadoop/*), and verify that those same files are present on the Hadoop classpath of the machine from which you run the batch ingestion.
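Concretely, one way to make those files visible to the indexer (a sketch; /etc/hadoop/conf is HDP's default client-config location and an assumption here) is to prepend the config directory to the -cp entry in bin/run-druid, so Hadoop's Configuration/YarnConfiguration classes load yarn-site.xml from the classpath instead of relying on -D flags:

```
# Sketch: put the Hadoop client config dir first on the Druid classpath.
# /etc/hadoop/conf is the HDP default; adjust to your cluster's layout.
HADOOP_CONF_DIR=/etc/hadoop/conf
exec java `cat "$CONFDIR"/"$WHATAMI"/jvm.config | xargs` \
  -Dhdp.version=2.4.0.0-169 \
  -cp "$HADOOP_CONF_DIR":"$CONFDIR"/_common:"$CONFDIR"/"$WHATAMI":`ls "$WHEREAMI"/../dist/mz/*.jar | xargs | tr ' ' ':'` \
  `cat "$CONFDIR"/$WHATAMI/main.config | xargs`
```

With the config dir on the classpath, the RM HA properties (ha.enabled, ha.rm-ids, the per-rm hostnames) come from the same yarn-site.xml the rest of the cluster uses, so the client no longer falls back to 0.0.0.0:8032.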