Hello,
I am confused about Genie's dependency on Hadoop, since several properties in genie.properties refer to Hadoop:
com.netflix.genie.server.hadoop.home
com.netflix.genie.server.hadoop.s3cp.timeout
com.netflix.genie.server.job.manager.yarn.command.cp
I want to create an AMI with Genie preinstalled but independent of the Hadoop distribution. My questions are:
- Do I need the Hadoop client in the AMI, or can I set the cp command to something else, for example the AWS CLI, and decouple Genie from the Hadoop client?
- If I install Hadoop as an application, are there any caveats? Specifically, what is the best way to set the classpath and extra libraries? For example, as application jars?
- If I use multiple versions of Hadoop (different EMR versions), are there any caveats when the Hadoop client is installed with the application?
- I want to use the instance IAM role to authenticate to AWS services (S3). What is the recommended way to implement this?
- If I copy /usr/share/aws/emr/ from EMR as part of the application, how can I add those files to the classpath ahead of the Hadoop client in the application? For example, to use EMRFS.
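
To make the first question concrete, this is roughly the change I have in mind in genie.properties. The value is hypothetical: I have not tested it, and I do not know whether Genie accepts a non-Hadoop command here or what argument format it expects.

```properties
# Hypothetical, untested value: replace the Hadoop-based copy command
# with the AWS CLI so the AMI does not need a Hadoop client just to copy files.
com.netflix.genie.server.job.manager.yarn.command.cp=aws s3 cp
```

If this works, I assume com.netflix.genie.server.hadoop.home and com.netflix.genie.server.hadoop.s3cp.timeout would still need some value, which is part of what I am asking.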