Hi all,
I have Camus running in EMR with the following properties file snippet:
etl.destination.path=/tmp/camus
etl.execution.base.path=/tmp/camus/exec
etl.execution.history.path=/tmp/camus/camus/exec/history
Then I'm running s3distcp to copy the data from the EMR instances into s3.
That part works fine.
What I'd like to do is skip the intermediate write to hdfs:///tmp and have Camus write directly to s3.
I've tried messing with the properties file but can't seem to get it working. I've added:
fs.defaultFS=s3*://bucket
fs.s3*.awsAccessKeyId=key
fs.s3*.awsSecretAccessKey=secret
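(The `s3*` above is my shorthand for the scheme variants I tried. Spelled out for s3n specifically — bucket name and keys are placeholders, and I'm not certain whether the etl.* paths also need full s3n:// URIs — the variant I'd expect to work looks like this:)

```properties
# Assumption: s3n:// filesystem; bucket and credentials are placeholders
fs.defaultFS=s3n://my-bucket
fs.s3n.awsAccessKeyId=MY_ACCESS_KEY
fs.s3n.awsSecretAccessKey=MY_SECRET_KEY

etl.destination.path=s3n://my-bucket/camus
etl.execution.base.path=s3n://my-bucket/camus/exec
etl.execution.history.path=s3n://my-bucket/camus/exec/history
```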
But I keep getting this error:
Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for
mapreduce.framework.name and the correspond server addresses.
Perhaps caused by this:
2015-04-22 17:36:34,708 INFO org.apache.hadoop.mapreduce.Cluster (main): Failed to use org.apache.hadoop.mapred.YarnClientProtocolProvider due to error: Error in instantiating YarnClient
Normally, I'd mess around with the hadoop conf files like core-site.xml, but since this is on EMR, that's a little cumbersome as I want to keep spinning up new instances.
My main questions are:
1. Is there something really obvious I'm missing about configuring camus.properties to write to s3?
2. Is there some automagic way I can override core-site.xml and the like in EMR without sshing onto the box and changing it myself?
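For context on question 2, the closest thing I've found is the configure-hadoop bootstrap action, which — if I'm reading the EMR docs right — can merge key=value pairs into core-site.xml at cluster launch. I haven't confirmed this actually avoids the YarnClient error; sketch below with placeholder bucket/keys and the rest of the create-cluster args omitted:

```shell
# Assumption: EMR AMI 2.x/3.x-style bootstrap action; -c is supposed to
# merge key=value pairs into core-site.xml when the cluster starts.
aws emr create-cluster \
  --ami-version 3.6.0 \
  --bootstrap-actions Path=s3://elasticmapreduce/bootstrap-actions/configure-hadoop,\
Args=["-c","fs.defaultFS=s3n://my-bucket",\
"-c","fs.s3n.awsAccessKeyId=MY_ACCESS_KEY",\
"-c","fs.s3n.awsSecretAccessKey=MY_SECRET_KEY"] \
  ... # remaining cluster options (instance types, counts, etc.)
```

If anyone knows whether this is the right mechanism, or whether newer EMR releases have a cleaner way, I'd appreciate a pointer.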
Cheers,
Kev