On Nov 2, 2012, at 3:10 PM, Vishal Goklani wrote:
> Hi Roy,
>
> How did you get mrjob to use Ubuntu for EMR? Would you mind posting your .mrjob.conf file?
I'm pretty much a novice at mrjob, so a lot of this was trial and error. I'm not actually sure our EMR hosts run Ubuntu, only that the apt-get commands work.
$ cat mrjob.conf
runners:
  emr:
    ami_version: 2.2.1
    aws_region: us-east-1
    aws_access_key_id: XXXX
    aws_secret_access_key: XXXX
    bootstrap_cmds:
    - sudo apt-get -y install python-virtualenv
    - sudo apt-get -y install mercurial
    - sudo apt-get -y install libcurl4-openssl-dev
    - sudo mkdir -p /home/songza/deploy
    - sudo chown -R hadoop.hadoop /home/songza
    - hg clone XXXX /home/songza/deploy/current
    - virtualenv /home/songza/env/python
    - /home/songza/env/python/bin/easy_install pip
    - /home/songza/env/python/bin/pip install -r /home/songza/deploy/current/deploy/python/emr-requirements.txt
    - echo 'source /home/songza/env/python/bin/activate' >> ~/.bashrc
    - echo 'export SONGZA_BASEDIR=/home/songza/deploy/current' >> ~/.bashrc
    - echo 'export PYTHONPATH=/home/songza/deploy/current' >> ~/.bashrc
    - (cd /home/songza/deploy/current; make aws)
    cmdenv:
      TZ: Etc/UTC
    ec2_key_pair: compute
    ec2_key_pair_file: /tmp/compute.pem
    ec2_instance_type: m2.4xlarge
    enable_emr_debugging: True
    num_ec2_instances: 1
    s3_log_uri: s3://songza.compute/tmp/logs/
    s3_scratch_uri: s3://songza.compute/tmp/
    ssh_tunnel_to_job_tracker: True
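
Since I'm only guessing at the distro from the fact that apt-get works, one way to take the guesswork out would be to probe for the package manager directly. This is a rough sketch, not something from our deployment: the function name detect_package_manager and the candidate list are made up for illustration, and shutil.which needs Python 3.3 or later, which may be newer than what the AMI ships.

```python
import shutil


def detect_package_manager(candidates=("apt-get", "yum")):
    """Return the first candidate found on PATH, or None if none is found.

    The candidate names are an assumption: apt-get for Debian/Ubuntu-style
    hosts, yum for Red Hat-style hosts.
    """
    for name in candidates:
        # shutil.which returns the full path to the executable, or None.
        if shutil.which(name) is not None:
            return name
    return None


if __name__ == "__main__":
    found = detect_package_manager()
    if found is None:
        print("no known package manager on PATH")
    else:
        print("package manager available: " + found)
```

Run on one of the EMR hosts (over ssh, say), it reports which installer the bootstrap_cmds can rely on, which settles the apt-get versus .rpm question without needing to know the distro name.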
> On Nov 2, 2012, at 2:32 PM, Roy Smith <r...@panix.com> wrote:
>
>> On Nov 2, 2012, at 2:26 PM, Shivkumar Shivaji wrote:
>>
>>> For binaries such as numpy, cython, pandas, and sklearn, it is probably quicker to find .rpm packages and install them in a bootstrap script. I think EMR uses a Red Hat-based Linux.
>>
>> We've been using apt-get in our EMR bootstrap scripts. Our EMR instances are Ubuntu. Not sure if that's the default, or if it just happens to be the AMI we configured.
>> --
>> Roy Smith
>> r...@panix.com
>>
>
--
Roy Smith
r...@panix.com