Issues with EC2 Roles and S3 connections


Chris Moyer

Jun 19, 2013, 9:28:14 AM
to boto-...@googlegroups.com
Hi all,

I'm using boto on EC2 with IAM roles, and every once in a while connecting to S3 fails with this traceback:

    s3 = boto.connect_s3()
  File "/usr/local/boto/boto/__init__.py", line 130, in connect_s3
    return S3Connection(aws_access_key_id, aws_secret_access_key, **kwargs)
  File "/usr/local/boto/boto/s3/connection.py", line 174, in __init__
    validate_certs=validate_certs)
  File "/usr/local/boto/boto/connection.py", line 540, in __init__
    aws_access_key_id,
  File "/usr/local/boto/boto/provider.py", line 178, in __init__
    self.get_credentials(access_key, secret_key)
  File "/usr/local/boto/boto/provider.py", line 274, in get_credentials
    self._populate_keys_from_metadata_server()
  File "/usr/local/boto/boto/provider.py", line 293, in _populate_keys_from_metadata_server
    self._access_key = security['AccessKeyId']
TypeError: string indices must be integers, not str

It looks like the metadata server occasionally returns invalid data. This works most of the time, but every once in a while I get this error, which causes my scripts to fail and leaves them unable to reconnect.

Has anyone else encountered this problem?

--
Chris Moyer

Jason Chan

Jun 19, 2013, 9:52:22 AM
to boto-...@googlegroups.com
We use roles pretty extensively and have not seen type errors or format problems. We have seen a few cases where an instance was given expired credentials, though (which causes a different kind of failure).


--
You received this message because you are subscribed to the Google Groups "boto-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to boto-users+...@googlegroups.com.
To post to this group, send email to boto-...@googlegroups.com.
Visit this group at http://groups.google.com/group/boto-users.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Chris Moyer

Jun 26, 2013, 10:17:08 AM
to boto-...@googlegroups.com
Ok, after a lot of digging, I finally figured out the problem. The issue is that occasionally the metadata server times out, and we get this log as well:

boto: ERROR Caught exception reading instance data

The problem is that the default behavior after just one exception is to give up and return an empty string. Adding this to my boto.cfg seems to fix the problem, since it typically doesn't error out more than a few times in a row (for me it seems like 3 errors and then it succeeds):

[Boto]
metadata_service_num_attempts = 10
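
As a rough illustration of how this setting takes effect, the sketch below parses the same [Boto] section with Python's standard configparser module and falls back to 1 when the option is absent. boto reads the file through its own Config class, so this is an analogy under that assumption rather than boto's code, and read_num_attempts is a hypothetical helper.

```python
import configparser


def read_num_attempts(cfg_text):
    """Return metadata_service_num_attempts from a boto.cfg-style string.

    Falls back to 1 when the section or option is missing, mirroring the
    default quoted in this thread.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.getint("Boto", "metadata_service_num_attempts", fallback=1)


if __name__ == "__main__":
    print(read_num_attempts("[Boto]\nmetadata_service_num_attempts = 10\n"))
    print(read_num_attempts("[Boto]\n"))
```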

It seems like this should be the default, and the function definition at least suggests that it is, but for some reason line 283 of boto/provider.py sets the default to 1:

boto/provider.py
283:        attempts = config.getint('Boto', 'metadata_service_num_attempts', 1)
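
To make the failure concrete: when the lookup gives up and hands back an empty string, indexing that string with a key name raises the TypeError seen in the traceback. The sketch below reproduces that and shows one possible defensive check. Both functions are hypothetical helpers written for illustration and are not part of boto.

```python
def fails_like_the_traceback(security):
    """Return True if indexing `security` by key name raises TypeError."""
    try:
        security["AccessKeyId"]
    except TypeError:
        # A str (including the empty string) cannot be indexed by a str key.
        return True
    except KeyError:
        # A dict without the key fails differently.
        return False
    return False


def extract_access_key(security):
    """Return the AccessKeyId from a credentials mapping.

    Raises a descriptive RuntimeError instead of an opaque TypeError when
    the lookup produced something other than a dict with the expected key.
    """
    if not isinstance(security, dict) or "AccessKeyId" not in security:
        raise RuntimeError("metadata service returned no usable credentials")
    return security["AccessKeyId"]
```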


Is there any reason this is set to 1 instead of something higher? A higher default would help keep this kind of critical error from frequently killing applications that don't override the setting.
--
Chris Moyer

Mitchell Garnaat

Jun 26, 2013, 10:27:42 AM
to boto-users
The reason it is set to 1 is that we use the same method to try to detect whether we have a role or not. So, if you are running boto on a machine that is not an EC2 instance, you don't really want boto hanging every time it starts up trying to contact a non-existent metadata server.

I really don't know of any other method we can use to determine whether we are on an EC2 instance or not. We added the ability to configure the retry strategy to handle the case you are experiencing, but if anyone has better ideas, we are open to suggestions!

Mitch

Mitchell Garnaat

Jun 26, 2013, 10:29:29 AM
to boto-users
Actually, what I meant is we use the same method to detect whether we are on an EC2 instance or not.  When we start up boto, it needs to find credentials.  One of the things it checks for is an IAM role but if you are not on an EC2 instance, we don't want to hang while we keep retrying the metadata service.

Mitch

Chris Moyer

Jun 26, 2013, 11:11:02 AM
to boto-...@googlegroups.com
Hmm, that makes a lot of sense. Perhaps there is a better way to avoid testing at all if we can detect we're not on an EC2 instance? As it stands now, if you're not on an EC2 instance it seems like that function would get called quite a bit. What if we instead check whether credentials are set in another way (boto.cfg, path variable, etc.), and if they are not set, we already know we must be grabbing them from a role, so we set the default to 10?

Basically, that would mean setting the default here from a variable somewhere in boto, which is set to 10 if credentials aren't available via some standard method. That would mean, though, that if you don't set your credentials at all, it would take a while to return that error.
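
The idea above could be sketched roughly as follows. This is a hypothetical helper, not existing boto code. It assumes the only static sources checked are the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables plus a flag saying whether boto.cfg holds credentials. The values 1 and 10 are taken from this thread.

```python
import os


def default_metadata_attempts(environ=None, config_has_credentials=False):
    """Pick a default retry count for the metadata service.

    If credentials are already available some other way, one attempt is
    plenty. Otherwise the role is the only source left, so retry more.
    """
    environ = os.environ if environ is None else environ
    has_env_credentials = bool(
        environ.get("AWS_ACCESS_KEY_ID") and environ.get("AWS_SECRET_ACCESS_KEY")
    )
    if has_env_credentials or config_has_credentials:
        return 1
    return 10
```

The cost is the one already noted: with no credentials configured anywhere, the higher count delays the eventual error.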

Does anyone on the EC2 team have any bright ideas?

Joshua Ma

Jul 8, 2015, 12:02:49 AM
to boto-...@googlegroups.com
It's 2 years later, but are people still running into this? We just recently got a bout of errors on one of our machines (stack trace below) and the 1.0 timeout is still there. Running boto 2.36.0. Does anybody have a workaround?

[2015-07-08 03:37:00,343: ERROR/MainProcess] Caught exception reading instance data
Traceback (most recent call last):
  File "/srv/env/local/lib/python2.7/site-packages/boto/utils.py", line 210, in retry_url
    r = opener.open(req, timeout=timeout)
  File "/srv/env/local/lib/python2.7/site-packages/newrelic-2.50.0.39/newrelic/hooks/external_urllib2.py", line 31, in _nr_wrapper_opener_director_open_
    return wrapped(*args, **kwargs)
  File "/usr/lib/python2.7/urllib2.py", line 400, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 418, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1207, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1177, in do_open
    raise URLError(err)
URLError: <urlopen error timed out>
[2015-07-08 03:37:00,357: ERROR/MainProcess] Unable to read instance data, giving up
[2015-07-08 03:37:00,363: ERROR/MainProcess] Task delete_snapshot[91c6e646-0fda-48ff-a57c-21765ade65dc] raised exception: TypeError('string indices must be integers, not str',)
Traceback (most recent call last):
  File "/srv/env/local/lib/python2.7/site-packages/celery/task/trace.py", line 233, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/srv/env/local/lib/python2.7/site-packages/newrelic-2.50.0.39/newrelic/hooks/application_celery.py", line 66, in wrapper
    return wrapped(*args, **kwargs)
  File "/srv/1436210980/benchling/taskq/__init__.py", line 78, in __call__
    return TaskBase.__call__(self, *args, **kwargs)
  File "/srv/env/local/lib/python2.7/site-packages/celery/task/trace.py", line 420, in __protected_call__
    return self.run(*args, **kwargs)
  File "/srv/1436210980/benchling/taskq/snapshots.py", line 141, in delete_snapshot
    conn = connection.S3Connection(host=current_app.config['S3_HOST'])
  File "/srv/env/local/lib/python2.7/site-packages/boto/s3/connection.py", line 190, in __init__
    validate_certs=validate_certs, profile_name=profile_name)
  File "/srv/env/local/lib/python2.7/site-packages/boto/connection.py", line 555, in __init__
    profile_name)
  File "/srv/env/local/lib/python2.7/site-packages/boto/provider.py", line 200, in __init__
    self.get_credentials(access_key, secret_key, security_token, profile_name)
  File "/srv/env/local/lib/python2.7/site-packages/boto/provider.py", line 376, in get_credentials
    self._populate_keys_from_metadata_server()
  File "/srv/env/local/lib/python2.7/site-packages/boto/provider.py", line 395, in _populate_keys_from_metadata_server
    self._access_key = security['AccessKeyId']
TypeError: string indices must be integers, not str
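
One possible application-level workaround, offered only as a sketch: wrap the connection call and retry when it raises the TypeError above. connect_with_retry and its parameters are hypothetical, and the connect argument stands in for any zero-argument function that builds the connection. None of this is a boto API, and it has not been tested against the failure in this thread.

```python
import time


def connect_with_retry(connect, attempts=5, delay=0.5, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    Only TypeError is retried, since that is what surfaces when the
    credential lookup comes back empty. The delay doubles after each
    failure, and the last failure is re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except TypeError:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= 2
```

Raising metadata_service_num_attempts in boto.cfg, as described earlier in the thread, addresses the same symptom without touching application code.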

kalai vanan

Jul 13, 2015, 8:47:08 AM
to boto-...@googlegroups.com
Hi Joshua,

I am hitting the same problem. Did you find a solution? I have spent a day searching for one.

Please let me know if you found a solution to this problem.

Thanks,
Kalaivanan.L

Joshua Ma

Jul 13, 2015, 3:42:02 PM
to boto-...@googlegroups.com
Nope - we just sat through the latency spikes and it eventually stopped. We're sort of at the mercy of AWS/S3 still - if anyone has seen this resolved with boto3, that'd be interesting to hear about as well...

- Josh

