socket.gaierror: [Errno -2] Name or service not known and socket.gaierror: [Errno 11004] getaddrinfo failed

Vignesh Kalai

Sep 23, 2015, 12:01:48 PM
to mrjob
Hi all,

I am currently trying to learn mrjob and how to run it on AWS EMR, so please forgive me if this has already been asked (I searched in many places but did not find an answer), and sorry if it is a silly question.

This is my Python script:

from mrjob.job import MRJob

class MRWordFrequencyCount(MRJob):

    def mapper(self, _, line):
        yield "chars", len(line)
        yield "words", len(line.split())
        yield "lines", 1

    def reducer(self, key, values):
        yield key, sum(values)


if __name__ == '__main__':
    MRWordFrequencyCount.run()

When I run it in local mode, I get the expected result.

Command:

python sample.py input.txt

So I tried to run it on EMR by creating an mrjob.conf file, which looks like this:

runners:
  emr:
    aws_access_key_id:
    aws_secret_access_key:
    aws_region: us-west-2a
    ec2_key_pair: emr
    ec2_key_pair_file: ~/Desktop/emr.pem
    ec2_instance_type: m1.small
    num_ec2_instances: 5

  local:
    base_tmp_dir: /tmp

First attempt

Running it from my local Windows system:

    python check.py -r emr --conf-path ./mrjob.conf  word.txt

Note: the same error occurred when I kept the input in an S3 location and passed it as an argument.

I got this traceback:

Traceback (most recent call last):
  File "check.py", line 16, in <module>
    MRWordFrequencyCount.run()
  File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\job.py", line 461, in run
    mr_job.execute()
  File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\job.py", line 479, in execute
    super(MRJob, self).execute()
  File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\launch.py", line 153, in execute
    self.run_job()
  File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\launch.py", line 215, in run_job
    with self.make_runner() as runner:
  File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\job.py", line 502, in make_runner
    return super(MRJob, self).make_runner()
  File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\launch.py", line 168, in make_runner
    return EMRJobRunner(**self.emr_job_runner_kwargs())
  File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\emr.py", line 643, in __init__
    self._fix_s3_scratch_and_log_uri_opts()
  File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\emr.py", line 760, in _fix_s3_scratch_and_log_uri_opts
    self._set_s3_scratch_uri(s3_conn)
  File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\emr.py", line 776, in _set_s3_scratch_uri
    buckets = s3_conn.get_all_buckets()
  File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\retry.py", line 149, in call_and_maybe_retry
    return f(*args, **kwargs)
  File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\boto\s3\connection.py", line 436, in get_all_buckets
    response = self.make_request('GET', headers=headers)
  File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\boto\s3\connection.py", line 664, in make_request
    retry_handler=retry_handler
  File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\boto\connection.py", line 1070, in make_request
    retry_handler=retry_handler)
  File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\boto\connection.py", line 1029, in _mexe
    raise ex
socket.gaierror: [Errno 11004] getaddrinfo failed

When I tried to run it on an AWS EC2 instance, I got this error:

Traceback (most recent call last):
  File "check.py", line 16, in <module>
    MRWordFrequencyCount.run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 461, in run
    mr_job.execute()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 479, in execute
    super(MRJob, self).execute()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 153, in execute
    self.run_job()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 215, in run_job
    with self.make_runner() as runner:
  File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 502, in make_runner
    return super(MRJob, self).make_runner()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 168, in make_runner
    return EMRJobRunner(**self.emr_job_runner_kwargs())
  File "/usr/local/lib/python2.7/dist-packages/mrjob/emr.py", line 643, in __init__
    self._fix_s3_scratch_and_log_uri_opts()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/emr.py", line 760, in _fix_s3_scratch_and_log_uri_opts
    self._set_s3_scratch_uri(s3_conn)
  File "/usr/local/lib/python2.7/dist-packages/mrjob/emr.py", line 776, in _set_s3_scratch_uri
    buckets = s3_conn.get_all_buckets()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/retry.py", line 149, in call_and_maybe_retry
    return f(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/connection.py", line 436, in get_all_buckets
    response = self.make_request('GET', headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/connection.py", line 664, in make_request
    retry_handler=retry_handler
  File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 1071, in make_request
    retry_handler=retry_handler)
  File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 1030, in _mexe
    raise ex
socket.gaierror: [Errno -2] Name or service not known


I don't know what I am doing wrong.

Python version 2.7, mrjob version 0.4.5

David Marin

Sep 23, 2015, 2:31:22 PM
to mr...@googlegroups.com
Try using "us-west-2" rather than "us-west-2a". The latter isn't actually a region; it's an availability zone within a region. Since the region is used to form the hostname for connecting to S3, you're making mrjob try to connect to a host that doesn't exist.

-Dave
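
To illustrate the point above, here is a minimal sketch of telling an availability zone apart from a region before the value goes into aws_region. The helper names (split_availability_zone, assumed_s3_hostname) are hypothetical and not part of mrjob or boto, and the "s3-<region>.amazonaws.com" hostname pattern is an assumption used only to show why an extra letter yields a host that cannot be resolved.

```python
import re

# Availability zone names look like a region name plus one trailing letter,
# e.g. "us-west-2a" is a zone inside the region "us-west-2".
_AZ_PATTERN = re.compile(r"^([a-z]{2}(?:-[a-z]+)+-\d+)([a-z])$")


def split_availability_zone(name):
    """Return (region, zone_letter); zone_letter is None for a plain region."""
    match = _AZ_PATTERN.match(name)
    if match:
        return match.group(1), match.group(2)
    return name, None


def assumed_s3_hostname(region):
    """Build an S3 hostname from a region name.

    ASSUMPTION: this pattern is illustrative only; it is not copied from
    the mrjob or boto source.
    """
    return "s3-%s.amazonaws.com" % region
```

With these helpers, split_availability_zone("us-west-2a") returns ("us-west-2", "a"), so the value to put in mrjob.conf would be us-west-2. Feeding the unsplit zone name into assumed_s3_hostname gives "s3-us-west-2a.amazonaws.com", which is the kind of nonexistent hostname that would make getaddrinfo fail.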

Vignesh Kalai

Sep 25, 2015, 6:16:11 AM
to mrjob
Thanks, Dave. It worked :)