multiple hadoop client versions and EMR libs

Jelez Raditchkov

Apr 19, 2016, 1:47:51 PM
to genie
Hello, 

I am confused about Genie's dependency on Hadoop, since several properties in genie.properties refer to Hadoop:
com.netflix.genie.server.hadoop.home
com.netflix.genie.server.hadoop.s3cp.timeout
com.netflix.genie.server.job.manager.yarn.command.cp

I want to create an AMI with Genie preinstalled but independent of the Hadoop distribution. My questions are:
- Do I need the Hadoop client in the AMI, or can I set the cp command to something else, for example the AWS CLI, and decouple Genie from the Hadoop client?
- If I install Hadoop as an application, are there any caveats? Specifically, what is the best way to set the classpath and add extra libraries and application jars?
- If I use multiple versions of Hadoop (different EMR versions), are there any caveats when the Hadoop client is installed with the application?
- I want to use the instance IAM role to authenticate to AWS services (S3). What is the recommended way to implement this?
- If I copy /usr/share/aws/emr/ from EMR as part of the application, how can I put those libraries on the classpath ahead of the Hadoop client in the application, for example to use EMRFS?

Amit Sharma

Apr 21, 2016, 5:33:20 PM
to Jelez Raditchkov, genie
Hi Jelez,

Genie 1 and 2 make a lot of assumptions about Hadoop being present on the box where the Genie server runs, which is why you see those properties. Based on our understanding of your use case, Genie 3.0 looks much better equipped to handle it. I can try to answer some of your questions, but some of them will require hacks and experimentation, and we would rather not spend time on that with a new version so close to release. That said, if it's really critical for you to use Genie 2.0, we can help you out. What kind of timeline are you looking at? Do you already have Genie running?

- Do I need the Hadoop client in the AMI, or can I set the cp command to something else, for example the AWS CLI, and decouple Genie from the Hadoop client?
* Yes, it's possible to use a different cp command. Internally we also use a cp script, which uses IAM roles to download our configurations from S3.
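
A replacement cp script of the kind described above could look roughly like the sketch below. It is only an illustration, not the script used at Netflix: the function names are hypothetical, and it assumes Genie calls the configured cp command with a source and a destination as its two arguments. It relies on the AWS CLI's default credential lookup, which falls back to the EC2 instance profile (IAM role) when no keys are configured, so no credentials appear in the script.

```python
import subprocess


def build_copy_command(source, destination, recursive=False):
    """Return the argv list for an `aws s3 cp` invocation.

    Hypothetical helper: the name and signature are illustrative only.
    """
    command = ["aws", "s3", "cp", source, destination]
    if recursive:
        # `--recursive` copies every object under an S3 prefix or directory.
        command.append("--recursive")
    return command


def copy(source, destination, recursive=False):
    """Run the copy and return the AWS CLI's exit code.

    No access keys are passed: the CLI resolves credentials on its own and,
    on an EC2 instance, can use the role attached to the instance profile.
    """
    return subprocess.call(build_copy_command(source, destination, recursive))
```

A thin command-line wrapper would pass `sys.argv[1]` and `sys.argv[2]` to `copy` and exit with its return value, so that a failed download is reported back to the caller as a non-zero exit code.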

- If I install Hadoop as an application, are there any caveats? Specifically, what is the best way to set the classpath and add extra libraries and application jars?
* This is where the tight coupling between Genie 2 and Hadoop comes into play. The joblauncher script and the Job Manager classes set the environment variables that determine the classpath, namely HADOOP_HOME and HADOOP_CONF_DIR. If you set Hadoop up as an application, you might have to use the application's envProp file to move the files around and set the classpath.

- If I use multiple versions of Hadoop (different EMR versions), are there any caveats when the Hadoop client is installed with the application?
* If you get it working as an application, multiple versions should not be a problem.

- I want to use the instance IAM role to authenticate to AWS services (S3). What is the recommended way to implement this?
* I am not sure what is being authenticated here. If it's for file copying, see my answer to point 1 above. If it's for jobs, it depends on the cluster and job configurations.

- If I copy /usr/share/aws/emr/ from EMR as part of the application, how can I put those libraries on the classpath ahead of the Hadoop client in the application, for example to use EMRFS?
* As I mentioned in my reply to point 2, use the envProp file and copy the files around. Basically, all the Hadoop files from local and S3 sources need to end up in a single directory, and then the HADOOP_CONF_DIR and HADOOP_HOME environment variables have to be set. You might also need to muck with the joblauncher.sh script to get some of this working.
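
The staging step described above can be sketched as follows. This is a rough illustration in Python rather than anything Genie ships: the function names, the `conf` subdirectory, and the rule that the first copy of a file wins on a name collision are all assumptions, as is the idea that the envProp file accepts shell-style `export` lines.

```python
import shutil
from pathlib import Path


def stage_hadoop(sources, target):
    """Merge the contents of several source directories into one directory.

    Assumption for this sketch: when two sources contain the same relative
    path, the copy from the source listed first is kept.
    """
    target = Path(target)
    target.mkdir(parents=True, exist_ok=True)
    for source in sources:
        source = Path(source)
        for path in sorted(source.rglob("*")):
            if not path.is_file():
                continue
            dest = target / path.relative_to(source)
            if dest.exists():
                continue  # an earlier source already provided this file
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, dest)
    return target


def env_exports(hadoop_home, conf_subdir="conf"):
    """Return export lines for the two variables named in the reply.

    `conf_subdir` is hypothetical; use whatever layout the staged
    distribution actually has.
    """
    home = Path(hadoop_home)
    return [
        f"export HADOOP_HOME={home}",
        f"export HADOOP_CONF_DIR={home / conf_subdir}",
    ]
```

Listing the copied EMR directory before the stock Hadoop client in `sources` keeps the EMR copy of any file that exists in both. Whether that alone is enough to put EMRFS ahead on the classpath depends on how joblauncher.sh builds the classpath, which this sketch does not cover.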

Thanks,
Amit

