Running CLI hadoop-indexer on HDP 2.3.4 with HA enabled


Paul Algren

Aug 23, 2016, 9:06:42 AM
to Druid User
I've been having various difficulties getting the CLI hadoop-indexer to run against our HDP 2.3.4 cluster with HA enabled.

First, do you expect this to work at all? I've seen some indications via searches that it may not.

Second, if it's known to work, excellent; please share any details of importance.

FWIW, a quick summary of my issues:
      The bundled Hadoop 2.3.0 jars are not sufficient to work with HA YARN: the client just loops on failover retries.
      Using my HDP libraries instead causes various library-mismatch issues, depending on how they are supplied (classpath, Hadoop coordinates, or Druid jars built from source).

Nishant Bangarwa

Aug 23, 2016, 12:30:17 PM
to Druid User
Hi Paul,
http://druid.io/docs/latest/operations/other-hadoop.html has the instructions for making Druid work with custom versions of Hadoop.
Could you share more concrete details on the problem you are facing? The task logs and the stack trace for the exception you are getting would help.
FWIW, setting up the proper classpath and hadoopCoordinates should be enough to get Druid working with HDP 2.3.4.
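Concretely, those two knobs look roughly like this. A minimal sketch, with an illustrative conf path and spec filename (HDP 2.3.4 is based on Apache Hadoop 2.7.1, so that is the natural client version to name):

    # put the cluster's client configuration on the CLI classpath
    java -classpath "lib/*:/etc/hadoop/conf" io.druid.cli.Main index hadoop my-spec.json

and, inside my-spec.json, point Druid at the matching Hadoop client rather than the bundled 2.3.0 default:

    "hadoopDependencyCoordinates" : ["org.apache.hadoop:hadoop-client:2.7.1"]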

Cheers, 
Nishant 



Paul Algren

Aug 23, 2016, 12:55:46 PM
to Druid User
Thank you for the prompt response...  :)

I have used the "other-hadoop" page to set up quite a few experiments in hopes of a resolution.
I'll start with the simplest and provide the tracebacks I'm seeing.

First off, I'm using druid-0.9.1.1.

In the druid-0.9.1.1 folder on an edge node in my cluster, I run:
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Dhdp.version=2.3.4.0-3485 -classpath lib/*:/usr/hdp/2.3.4.0-3485/hadoop/conf io.druid.cli.Main index hadoop quickstart/ingest-indProd.json

...

2016-08-23T16:36:08,055 WARN [main] org.apache.hadoop.hdfs.BlockReaderLocal - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2016-08-23T16:36:08,065 WARN [main] org.apache.hadoop.hdfs.BlockReaderLocal - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2016-08-23T16:36:08,101 INFO [main] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
2016-08-23T16:36:44,244 INFO [main] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm1
2016-08-23T16:37:17,662 INFO [main] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
2016-08-23T16:37:59,704 INFO [main] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm1
...

It hangs here without launching any jobs. I assume this is related to the HA configuration in the config dir included on the classpath.
That dir of course carries the HA YARN settings, but I fear the bundled Hadoop 2.3.0 dependencies do not support HA YARN, which would explain the endless cycling between rm1 and rm2.

BTW, I have also tried adding the output of `hadoop classpath` to the classpath (and, alternatively, supplying the Hadoop dependencies).
In some cases I can get the jobs to launch, and then the MR application logs on the cluster show the Jackson version issue described under the CDH section of the other-hadoop page.
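Roughly, the `hadoop classpath` variant looked like this (the exact ordering varied between experiments):

    java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Dhdp.version=2.3.4.0-3485 \
      -classpath "lib/*:/usr/hdp/2.3.4.0-3485/hadoop/conf:$(hadoop classpath)" \
      io.druid.cli.Main index hadoop quickstart/ingest-indProd.json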

To work around the Jackson issue, as suggested there, I have added the following to my config.json:

    "tuningConfig" : {
      ...
      "jobProperties" : {
        "mapreduce.job.user.classpath.first": "true"
      }
    }
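My understanding is that mapreduce.job.user.classpath.first tells MapReduce to put the job's own jars (here, Druid's newer Jackson) ahead of the cluster's on the task classpath. The same page also mentions MapReduce's isolated classloader as an alternative knob for this kind of conflict; a sketch, assuming the same jobProperties block:

    "jobProperties" : {
      "mapreduce.job.classloader": "true"
    }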

Then I see (in the MR application log):
2016-08-23T11:51:31,731 ERROR [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster - Error starting MRAppMaster
java.lang.IllegalArgumentException: Invalid ContainerId: container_e3062_1471543406898_6795_01_000001
	at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182) ~[hadoop-yarn-common-2.3.0.jar:?]
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1355) [hadoop-mapreduce-client-app-2.3.0.jar:?]
Caused by: java.lang.NumberFormatException: For input string: "e3062"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:1.8.0_73]
	at java.lang.Long.parseLong(Long.java:589) ~[?:1.8.0_73]
	at java.lang.Long.parseLong(Long.java:631) ~[?:1.8.0_73]
	at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137) ~[hadoop-yarn-common-2.3.0.jar:?]
	at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177) ~[hadoop-yarn-common-2.3.0.jar:?]
	... 1 more
2016-08-23T11:51:31,826 INFO [main] org.apache.hadoop.util.ExitUtil - Exiting with status 1

I interpret this as an out-of-date jar being used on the cluster side: the "e3062" prefix appears to be the epoch that newer YARN embeds in container IDs, and the hadoop-yarn-common 2.3.0 parser predates that format.

Any thoughts?

Paul

Jonathan Wei

Aug 23, 2016, 7:52:44 PM
to druid...@googlegroups.com
Hi Paul,

The container ID issue there seems to be from Hadoop 2.3.0 being unable to parse the newer container ID format. Can you try:

1. pulling the Hadoop 2.7.1 dependencies (see the pull-deps sketch below), and

2. specifying Hadoop 2.7.1 in the hadoopDependencyCoordinates of the ingestion task, e.g.

"hadoopDependencyCoordinates": ["org.apache.hadoop:hadoop-client:2.7.1"]

- Jon


Paul Algren

Aug 24, 2016, 9:28:23 AM
to Druid User
Perfect. That worked for me. Thank you for your quick and kind response.