Uploading a file to Amazon S3 with Tornado


Johann Diedrick

Dec 27, 2012, 6:02:04 PM
to python-...@googlegroups.com
Hello!

I'm coming with a question that I've done some research about, but I'm
still unclear about the best way to implement it. I'm looking to make a
form that allows a user to upload an image to an Amazon S3 account. One
solution suggests using boto to interface with Amazon S3. Is this
recommended? Can Tornado not do this out of the box?

Another solution recommended setting up an Nginx server (from Didip's
blog post here:
http://didipkerabat.com/post/2724838963/nginx-file-upload-and-tornado-framework
). At the end of the post, it is mentioned that this method can be used
to upload a file to S3. I've never done file upload ever on Tornado, so
I'm not sure how best to go about this.

Any suggestions? Should I first look into simply getting a file uploaded
to S3 through whatever means (boto?), and then see how to work this in
with Nginx (something I've never used), or is there another way to get
me off the ground easier?

Any advice is appreciated!

Thanks,
Johann

Johann Diedrick

Dec 29, 2012, 1:00:18 AM
to python-...@googlegroups.com
Hi Stuart-

Thank you very much! I will give this a try tomorrow hopefully and let
you know how it works.

Thanks again,
-Johann

Chris Allick

Dec 29, 2012, 1:03:23 AM
to python-...@googlegroups.com
I like using boto. You can accept the image via an XHR request or over websockets and upload it to S3 without writing to disk; it's very fast.

Shalabh Aggarwal

Dec 30, 2012, 1:50:58 AM
to python-...@googlegroups.com
+1 for boto.

It's reliable, fast, and easy to use.

Johann Diedrick

Dec 31, 2012, 1:02:14 PM
to python-...@googlegroups.com
Hello again-

Thanks for the boto recommendation. I'm giving it a try now and it seems
to interface well with Amazon S3. I'm trying to debug a (hopefully)
simple bug on my end though...

Most tutorials suggest using set_contents_from_filename() to pass your
file to your S3 bucket. Unfortunately when I try this I get this error:

Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/tornado-2.2-py2.7.egg/tornado/web.py", line 988, in _execute
    getattr(self, self.request.method.lower())(*args, **kwargs)
  File "doesare.py", line 313, in post
    k.set_contents_from_filename(imagename)
  File "/Library/Python/2.7/site-packages/boto/s3/key.py", line 1056, in set_contents_from_filename
    fp = open(filename, 'rb')
IOError: [Errno 2] No such file or directory: u'image.jpg'


Here is my ImageUploadHandler:

class ImageUploadHandler(tornado.web.RequestHandler):
    def get(self):
        self.render("imageupload.html")

    def post(self):
        image = self.request.files['image'][0]

        imagename = image['filename']
        conn = S3Connection(<public_key>, <secret_key>)
        bucket = conn.create_bucket('doesare_images')
        k = Key(bucket)
        k.key = imagename
        k.set_metadata("Content-Type", "image/jpeg")
        k.set_contents_from_filename(imagename)
        k.set_acl('public-read')

        self.write("file uploaded!")

Here is my code for the page where the POST request is made:

<html>
<head></head>
<body>
  <div class="form">
    <form method="POST" action="/imageupload" enctype="multipart/form-data">
      File: <input type="file" name="image">
      <br/>
      <input type="submit" value="Upload">
    </form>
  </div>
</body>
</html>


It seems the problem is self.request.files is getting the image, and I
can get the file name, but trying to pass the filename to
set_contents_from_filename isn't working. Am I missing something?

Somewhat related, I then tried another method using
set_contents_from_file and passing in the image object itself (I
think...). The POST request page remains the same, but the post handler
is a little different:

def post(self):
    image = self.request.files['image'][0]
    imagename = image['filename']
    conn = S3Connection(<public_key>, <secret_key>)
    bucket = conn.create_bucket('doesare_images')
    k = Key(bucket)
    k.key = imagename
    k.set_metadata("Content-Type", "image/jpeg")
    k.set_contents_from_file(StringIO(image))
    k.set_acl('public-read')
    self.write("file uploaded!")

This actually succeeds in putting a file in my S3 bucket, but the file
is corrupt. It has the right name and filetype, but it's more than twice
the original size of the image and I can't open it.

Does anyone know what I'm doing wrong in either method? Any help or
insight would be much appreciated!

Thanks so much in advance,
Johann

Chris Allick

Dec 31, 2012, 1:09:29 PM
to python-...@googlegroups.com
I think you want something that uploads from file data, not a filename, because ideally you're taking base64 or image POST data and sending it straight to S3, not writing to disk.

I'm away from computer but can send an example later

Didip Kerabat

Dec 31, 2012, 1:19:39 PM
to python-...@googlegroups.com
> image=self.request.files['image'][0]

The image object contains more than just the bytes. You should grab just the bytes: self.request.files['image'][0]['body']

Also, you don't need StringIO to upload raw bytes. You can use: k.set_contents_from_string()
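Putting both fixes together: the bytes live under the 'body' key of each upload dict, and they can go to S3 as-is. A minimal sketch (the extract_upload helper name is mine, and the boto side is shown only as comments):

```python
def extract_upload(files, field='image'):
    """Pull the filename and raw bytes out of Tornado's request.files
    structure: each upload is a dict with 'filename', 'body', and
    'content_type' keys."""
    info = files[field][0]
    return info['filename'], info['body']

# With the bytes in hand, the boto side becomes (sketch, untested;
# credentials and bucket name are placeholders):
#
#     imagename, body = extract_upload(self.request.files)
#     conn = S3Connection(public_key, secret_key)
#     bucket = conn.create_bucket('doesare_images')
#     k = Key(bucket)
#     k.key = imagename
#     k.set_metadata("Content-Type", "image/jpeg")
#     k.set_contents_from_string(body)  # raw bytes, no StringIO wrapper
#     k.set_acl('public-read')
```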

- Didip -

Johann Diedrick

Dec 31, 2012, 2:00:36 PM
to python-...@googlegroups.com
Hello!

Didip- Thank you! That worked perfectly. I really appreciate it :)

Chris- Thank you in advance. If you still have an example lying around
I'd love to see it, just as another reference as to how to accomplish this.

Thank you all very much!

-Johann

Hermann Yung

Sep 29, 2014, 1:49:33 PM
to python-...@googlegroups.com
Anyone got any ideas how to make it asynchronous?

jø wrote on Friday, December 28, 2012 at 7:02:04 AM (UTC+8):

Zhou

Sep 29, 2014, 11:05:35 PM
to python-...@googlegroups.com, hong.kon...@gmail.com

@Johann

If you want clients to upload files to S3 directly, you can sign a URL for the client to upload against. I use this approach in a project to let an app upload/delete/update files directly.

You can read:
Signing and Authenticating REST Requests
PUT Object (the S3 REST API for uploading)
How the boto library generates a signature

@Hermann Yung

You can use the S3 REST APIs directly (the SDKs Amazon provides for developers also use them); making an asynchronous HTTP request is very easy. (I use this approach with Tornado and Nginx.)
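As a rough stdlib-only illustration of the signed-URL idea: the server computes a time-limited URL and hands it to the client, which then talks to S3 directly. This sketch mirrors the legacy signature-version-2 scheme in use at the time (current S3 uses signature version 4); the bucket and key names are made up, and real code should prefer the official SDK:

```python
import hmac
import time
from hashlib import sha1
from base64 import b64encode
from urllib.parse import quote_plus

def presign_s3_url(access_key, secret_key, bucket, key,
                   method='GET', expires_in=3600):
    """Build a time-limited S3 URL using the legacy signature v2 scheme:
    sign "METHOD\\n\\n\\nEXPIRES\\n/bucket/key" with HMAC-SHA1."""
    expires = int(time.time()) + expires_in
    string_to_sign = '%s\n\n\n%d\n/%s/%s' % (method, expires, bucket, key)
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(), sha1).digest()
    signature = b64encode(digest).decode()
    return (
        'https://%s.s3.amazonaws.com/%s?AWSAccessKeyId=%s&Expires=%d&Signature=%s'
        % (bucket, key, access_key, expires, quote_plus(signature))
    )
```

The client simply GETs (or, with method='PUT', uploads against) that URL before it expires; the application server never touches the file bytes.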



Zhou

Sep 29, 2014, 11:19:03 PM
to python-...@googlegroups.com, hong.kon...@gmail.com

My code to sign an S3 request.

The sign_s3_request function is adapted from "S3 using python-requests".

import hmac

from hashlib import sha1
from base64 import encodebytes as encodestring
from email.utils import formatdate
from libs.utils import remove_blank  # project-specific helper, not shown here

def sign_s3upload_request(
    aws_access_key, aws_secret_key, bucket, name, mime_type
):
    return sign_s3_request(
        aws_access_key,
        aws_secret_key,
        bucket,
        name,
        {
            'Content-Type': mime_type,
            'x-amz-acl': 'bucket-owner-full-control'
        },
        'PUT'
    )

def sign_s3download_request(
    aws_access_key, aws_secret_key, bucket, name
):
    return sign_s3_request(
        aws_access_key,
        aws_secret_key,
        bucket,
        name,
        {
        },
        'GET'
    )

def sign_s3delete_request(
    aws_access_key, aws_secret_key, bucket, name
):
    return sign_s3_request(
        aws_access_key,
        aws_secret_key,
        bucket,
        name,
        {
            #"Content-Length": '0',
            #"Content-Type": 'text/plain'
        },
        'DELETE'
    )

def sign_s3_request(
    access_key, secret_key, bucket, objectkey,
    headers, method, **kwargs
):
    special_params = [
        'acl', 'location', 'logging', 'partNumber', 'policy', 'requestPayment',
        'torrent', 'versioning', 'versionId', 'versions', 'website', 'uploads',
        'uploadId', 'response-content-type', 'response-content-language',
        'response-expires', 'response-cache-control', 'delete', 'lifecycle',
        'response-content-disposition', 'response-content-encoding'
    ]

    interesting_headers = {
        'content-md5': '',
        'content-type': '',
        'x-amz-date': formatdate(
            timeval=None,
            localtime=False,
            usegmt=True
        )
    }

    # get canonical string
    for key in headers:
        key_lower = key.lower()
        if (
            headers[key] and
            (
                key_lower in interesting_headers.keys() or
                key_lower.startswith('x-amz-')
            )
        ):
            interesting_headers[key_lower] = headers[key].strip()

    canonical_string = '%s\n' % method
    for key in sorted(interesting_headers.keys()):
        val = interesting_headers[key]
        if not key.startswith('x-amz-'):
            canonical_string += '%s\n' % val

    canonical_string += '\n'
    for key in sorted(interesting_headers.keys()):
        val = interesting_headers[key]
        if key.startswith('x-amz-'):
            canonical_string += '%s:%s\n' % (key, val)

    canonical_string += '/%s' % bucket

    canonical_string += '/%s' % objectkey

    params_found = False
    for k, v in kwargs.items():
        if k in special_params:
            if params_found:
                canonical_string += '&%s' % k
            else:
                canonical_string += '?%s' % k
            params_found = True

    print(canonical_string)
    h = hmac.new(secret_key.encode(), canonical_string.encode(), digestmod=sha1)
    signature = encodestring(h.digest()).strip()

    url = 'https://%s.s3.amazonaws.com/%s' % (bucket, objectkey)

    for key in headers.keys():
        key_lower = key.lower()
        if (
            headers[key] and
            key_lower not in interesting_headers
        ):
            interesting_headers[key_lower] = headers[key].strip()

    interesting_headers['Authorization'] = 'AWS %s:%s' % (
        access_key, signature.decode())

    return remove_blank({
        'method': method,
        'headers': interesting_headers,
        'url': url,
    })
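For reference, here is a condensed view of what the signing code above computes for the Authorization header, plus how the returned dict could feed Tornado's async client (the aws_v2_authorization helper name is mine, and the Tornado usage is shown only as untested comments):

```python
import hmac
from hashlib import sha1
from base64 import encodebytes as encodestring

def aws_v2_authorization(access_key, secret_key, canonical_string):
    """The Authorization header value the code above produces:
    'AWS <access_key>:<base64(HMAC-SHA1(secret_key, canonical_string))>'."""
    h = hmac.new(secret_key.encode(), canonical_string.encode(), digestmod=sha1)
    signature = encodestring(h.digest()).strip().decode()
    return 'AWS %s:%s' % (access_key, signature)

# The dict sign_s3_request returns plugs straight into an async client,
# e.g. Tornado's (sketch, untested; names are placeholders):
#
#     from tornado.httpclient import AsyncHTTPClient, HTTPRequest
#
#     req = sign_s3upload_request(ACCESS, SECRET, 'mybucket',
#                                 'photo.jpg', 'image/jpeg')
#     response = await AsyncHTTPClient().fetch(
#         HTTPRequest(req['url'], method=req['method'],
#                     headers=req['headers'], body=body_bytes))
```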

Sash Nagarkar

Oct 6, 2014, 2:28:53 PM
to python-...@googlegroups.com, hong.kon...@gmail.com
Happened to stumble across this:

Zhou

Oct 10, 2014, 5:07:46 AM
to python-...@googlegroups.com, Hermann Yung
I think using the AWS RESTful APIs is the best way to integrate AWS services.

Akriti

May 7, 2015, 3:59:08 AM
to python-...@googlegroups.com, zhouqi...@gmail.com, hong.kon...@gmail.com
Hi,

Is this method also good for very large files (around 200GB or more)?