example of how to use connection pool


Mike Miller

Aug 4, 2010, 3:16:38 PM
to boto-users
I'm neither a Python nor a boto expert, but I'm interested in getting
better throughput to S3 for small file uploads. Currently I'm getting
about 10-20 files/s uploaded (< 1 kB per file) using this simple code:

from boto.s3.connection import S3Connection
from boto.s3.key import Key

conn = S3Connection(AMAZON_ID, AMAZON_KEY)
bucket = conn.create_bucket(BUCKET)

# then in some loop like this
for i in someLoop:
    k = Key(bucket)
    k.key = <something>
    k.set_contents_from_string(<something else>)

If I comment out the set_contents_from_string call, I know that nothing
else in the script is limiting the rate, so the upload itself is the
bottleneck. Clearly I'm being super naive here, so some concrete questions:

1) Can someone point me to a good example of how to do this more
efficiently?
2) How do I enable http keep-alive in boto?
3) Does boto have a thread-pool or connection pool that can help with
this?

Thanks, Mike

Matt Billenstein

Aug 4, 2010, 6:31:48 PM
to boto-...@googlegroups.com
I don't really have a direct answer to your question, but you can use a
coroutine library that does non-blocking I/O to get better results,
essentially allowing multiple concurrent connections and uploads to S3.

Something like:

import eventlet
eventlet.monkey_patch()

from boto.s3.connection import S3Connection
from boto.s3.key import Key

conn = S3Connection(AMAZON_ID, AMAZON_KEY)
bucket = conn.create_bucket(BUCKET)

def upload(s):
    # can't reuse the conn/bucket from above here -- I've tried
    conn = S3Connection(AMAZON_ID, AMAZON_KEY)
    bucket = conn.get_bucket(BUCKET)

    k = Key(bucket)
    k.key = <something>

    # policy is a keyword argument here; passing 'public-read'
    # positionally would set headers instead
    k.set_contents_from_string(s, policy='public-read')

pile = eventlet.GreenPile(10)  # tweak concurrency to your liking
for i in someLoop:
    pile.spawn(upload, <something else>)

# block until all uploads complete
list(pile)


m


--
Matt Billenstein
ma...@vazor.com
http://www.vazor.com/

Mitchell Garnaat

Aug 4, 2010, 8:03:48 PM
to boto-...@googlegroups.com
Hi -

Matt's definitely on the right track. To get maximum throughput to any of the Amazon services, you really need to find a way to introduce some concurrency. You could try threads, but make sure that you create a separate S3Connection object for each thread (httplib.py is not threadsafe); you could use the wonderful multiprocessing library in Python; or you could use an async, non-blocking approach like the one Matt describes.
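
For instance, a minimal threaded sketch of that idea (not from the original post; AMAZON_ID, AMAZON_KEY, BUCKET, and an items iterable of (key, payload) pairs are placeholders):

import Queue
import threading

from boto.s3.connection import S3Connection
from boto.s3.key import Key

NUM_THREADS = 10  # illustrative; tune to taste

q = Queue.Queue()

def worker():
    # one S3Connection per thread -- httplib is not threadsafe
    conn = S3Connection(AMAZON_ID, AMAZON_KEY)
    bucket = conn.get_bucket(BUCKET)
    while True:
        name, data = q.get()
        k = Key(bucket)
        k.key = name
        k.set_contents_from_string(data)
        q.task_done()

for _ in range(NUM_THREADS):
    t = threading.Thread(target=worker)
    t.setDaemon(True)  # don't block interpreter exit
    t.start()

for name, data in items:
    q.put((name, data))

q.join()  # block until every queued upload completes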

boto does have a built-in HTTP connection pooling mechanism and it also should be doing HTTP keep-alive, if the server cooperates.

Mitch

Domingo Aguilera

Aug 16, 2010, 10:07:37 AM
to boto-users
Libraries like eventlet and gevent are very good approaches because
they are non-blocking and async but have a synchronous look. There are
some recent videos explaining both libraries from PyCon 2010 and
EuroPython 2010 (on blip.tv).

I posted this months ago, and although it is about using Google Storage
from boto, changing a couple of lines will make it work with S3:
http://gist.github.com/434053
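
A rough gevent equivalent of Matt's eventlet sketch might look like this (same credential and items placeholders as above; a sketch, not tested code):

from gevent import monkey
monkey.patch_all()

from gevent.pool import Pool

from boto.s3.connection import S3Connection
from boto.s3.key import Key

def upload(name, data):
    # same caveat as the eventlet version: fresh connection per greenlet
    conn = S3Connection(AMAZON_ID, AMAZON_KEY)
    bucket = conn.get_bucket(BUCKET)
    k = Key(bucket)
    k.key = name
    k.set_contents_from_string(data)

pool = Pool(10)  # concurrency; tweak to your liking
for name, data in items:
    pool.spawn(upload, name, data)
pool.join()  # wait for all greenlets to finish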