Uploading large (more than 5GB) files to S3


haizaar

Dec 12, 2011, 3:33:58 AM
to boto-users
Good day everyone!

I'm trying to upload a 50GB file to S3 using boto. I'm running the latest
boto from git master (as of yesterday) on Ubuntu 10.04 with Python 2.6.
I'm uploading from a us-east-1 m1.small instance to a US-based S3
bucket, and it always ends with either "Connection reset by peer" or
"Broken pipe". Uploading a 1GB file works just fine.

Here is my code:

import boto
from boto.s3.key import Key  # Key lives in boto.s3.key

s3 = boto.connect_s3(aws_access_key_id='key',
                     aws_secret_access_key='secret')
bucket = s3.get_bucket('test')
key = Key(bucket)
key.key = '50Gb'
f = open('/tmp/50Gb', 'rb')  # binary mode matters on some platforms
key.set_contents_from_file(f)

After some time I get either this exception:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "boto/s3/key.py", line 790, in set_contents_from_file
    self.send_file(fp, headers, cb, num_cb, query_args)
  File "boto/s3/key.py", line 583, in send_file
    query_args=query_args)
  File "boto/s3/connection.py", line 429, in make_request
    override_num_retries=override_num_retries)
  File "boto/connection.py", line 798, in make_request
    return self._mexe(http_request, sender, override_num_retries)
  File "boto/connection.py", line 764, in _mexe
    raise e
socket.error: [Errno 104] Connection reset by peer

OR this exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "boto/s3/key.py", line 790, in set_contents_from_file
    self.send_file(fp, headers, cb, num_cb, query_args)
  File "boto/s3/key.py", line 583, in send_file
    query_args=query_args)
  File "boto/s3/connection.py", line 429, in make_request
    override_num_retries=override_num_retries)
  File "boto/connection.py", line 798, in make_request
    return self._mexe(http_request, sender, override_num_retries)
  File "boto/connection.py", line 764, in _mexe
    raise e
socket.error: [Errno 32] Broken pipe

Can boto upload such large files to S3?

Thanks,
Zaar

Enis Afgan

Dec 12, 2011, 7:53:23 AM
to boto-users
If your file can be split, here is a great method that allows parallel
uploading of large files:
http://bcbio.wordpress.com/2011/04/10/parallel-upload-to-amazon-s3-with-python-boto-and-multiprocessing/

Might be useful...

Mitchell Garnaat

Dec 12, 2011, 8:28:04 AM
to boto-...@googlegroups.com
And just to be clear, the limit for a single upload to S3 is 5GB.  That's an S3 limitation, not a boto limitation.  If you have something bigger than 5GB, you need to break it into chunks and use the MultiPart upload feature of S3.  Here's a short blog post that describes the basic process:
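In outline, that multipart flow with boto's API looks roughly like this (a hedged sketch, untested against S3; the bucket name, key name, and file path are placeholders taken from the original post, and the 50MB part size is an arbitrary choice above S3's 5MB-per-part minimum):

```python
import math
import os

# 50MB parts; S3 requires every part except the last to be >= 5MB
PART_SIZE = 50 * 1024 * 1024

def num_parts(total_size, part_size=PART_SIZE):
    """Number of parts a file of total_size bytes splits into."""
    return int(math.ceil(total_size / float(part_size)))

def multipart_upload(bucket, keyname, filename, part_size=PART_SIZE):
    """Upload filename to keyname on bucket via S3 multipart upload."""
    total = os.path.getsize(filename)
    mp = bucket.initiate_multipart_upload(keyname)
    try:
        fp = open(filename, 'rb')
        try:
            for part_num in range(1, num_parts(total, part_size) + 1):
                bytes_left = total - (part_num - 1) * part_size
                # Reads the next chunk from fp's current position
                mp.upload_part_from_file(fp, part_num,
                                         size=min(part_size, bytes_left))
        finally:
            fp.close()
        mp.complete_upload()
    except Exception:
        mp.cancel_upload()  # abandoned parts still accrue storage charges
        raise

if __name__ == '__main__':
    import boto
    s3 = boto.connect_s3()  # credentials from the environment or boto config
    multipart_upload(s3.get_bucket('test'), '50Gb', '/tmp/50Gb')
```

Failed individual parts can simply be re-sent with the same part number before calling complete_upload(), which is what makes this approach resilient where a single 50GB PUT is not.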


Mitch



Matt Billenstein

Dec 13, 2011, 5:07:25 AM
to boto-...@googlegroups.com
On Mon, Dec 12, 2011 at 04:53:23AM -0800, Enis Afgan wrote:
> If your file can be split, here is a great method that allows parallel
> uploading of large files:
> http://bcbio.wordpress.com/2011/04/10/parallel-upload-to-amazon-s3-with-python-boto-and-multiprocessing/
>
> Might be useful...

I found that example a bit clunky -- splitting the file and creating chunks on
disk isn't really necessary, and multiprocessing is overkill since you're
I/O-bound, not CPU-bound...

Here is a script I hacked up last week that I think is better -- it uses gevent
to get I/O concurrency, but it could be replaced with eventlet if you don't
want to build a C extension. Even plain threads would be fine here, I believe.

https://gist.github.com/1471499
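For anyone who'd rather skip gevent entirely, the plain-threads variant of the same idea can be sketched like this (hypothetical helper names, untested against S3; each thread gets its own file handle so seeks don't collide, and the part math is the only part exercised here):

```python
import os
import threading

PART_SIZE = 50 * 1024 * 1024  # every part but the last must be >= 5MB

def part_ranges(total_size, part_size=PART_SIZE):
    """(part_num, offset, size) tuples covering total_size bytes."""
    ranges = []
    offset, part_num = 0, 1
    while offset < total_size:
        size = min(part_size, total_size - offset)
        ranges.append((part_num, offset, size))
        offset += size
        part_num += 1
    return ranges

def _upload_one(mp, filename, part_num, offset, size):
    # Private file handle per thread: concurrent seeks on a shared
    # handle would corrupt each other's read positions.
    fp = open(filename, 'rb')
    try:
        fp.seek(offset)
        mp.upload_part_from_file(fp, part_num, size=size)
    finally:
        fp.close()

def parallel_upload(bucket, keyname, filename):
    mp = bucket.initiate_multipart_upload(keyname)
    threads = [threading.Thread(target=_upload_one,
                                args=(mp, filename, num, off, size))
               for num, off, size in part_ranges(os.path.getsize(filename))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    mp.complete_upload()
```

For a 50GB file this spawns ~1024 threads, so in practice you'd cap concurrency with a worker pool or semaphore, but the structure is the same.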

m

--
Matt Billenstein
ma...@vazor.com
http://www.vazor.com/