simplest possible background process for file upload

uday

Apr 12, 2013, 5:06:27 AM

to pylons-...@googlegroups.com

I have a Pyramid app which handles file uploads. I am able to upload the file to Amazon S3 as soon as the user uploads it, without saving it anywhere on the file system. The problem is that this takes more time than just storing the file locally. I just want a simple task that uploads the file to Amazon after the file upload response is sent, which would make the user experience a lot better. Is it necessary to use a message queue in this case, or would some other solution do the trick? I have not tried message queues before.

Whit Morriss

Apr 12, 2013, 10:07:43 AM

to <pylons-discuss@googlegroups.com>

I've seen folks use http://pythonhosted.org/APScheduler/ for this kind of thing to avoid having to run a separate worker process. Basically, you spawn a worker thread at app creation and queue stuff for it to handle. Not sure this is "simpler", but your failures and crashes will happen in the same process, which is convenient for debugging. You could also do something similar with multiprocessing.
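A minimal sketch of the "worker thread plus queue" idea, using only the standard library rather than APScheduler itself. The class name BackgroundUploader and the upload_func callable are hypothetical; in a real app upload_func would wrap whatever S3 client call you already use. Note that queued data lives in memory, so anything still waiting is lost if the process dies.

```python
import queue
import threading


class BackgroundUploader:
    """Runs upload jobs on one daemon thread inside the web process."""

    def __init__(self, upload_func):
        # upload_func is a hypothetical callable(key, data) that does the
        # slow work, e.g. pushing the bytes to S3.
        self._upload = upload_func
        self._jobs = queue.Queue()
        self.failures = []  # (key, exception) pairs, for later inspection
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, key, data):
        """Queue a job and return immediately, so the view can respond."""
        self._jobs.put((key, data))

    def _run(self):
        while True:
            key, data = self._jobs.get()
            try:
                self._upload(key, data)
            except Exception as exc:
                # Keep the worker alive and remember what went wrong.
                self.failures.append((key, exc))
            finally:
                self._jobs.task_done()

    def wait(self):
        """Block until every queued job has been handled (tests, shutdown)."""
        self._jobs.join()
```

The idea would be to create one BackgroundUploader at app creation and call submit() from the upload view just before returning the response.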

I feel like a recent post covered a bunch of these options, but I'm failing to google it up.

-w

d. "whit" morriss

Platform Codemonkey

wh...@surveymonkey.com

Jonathan Vanasco

Apr 12, 2013, 2:42:48 PM

to pylons-...@googlegroups.com

off the top of my head:

1. celery

2. fork a process to handle the uploads

3. register a cleanup handler

4. homegrown batch / daemon -- log the upload locally, then process the upload separately
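Option 4 could look roughly like the sketch below: the web request only writes the file into a local spool directory, and a separate daemon or cron job pushes whatever is ready. The function names, the ".part"/".ready" suffixes and upload_func are all assumptions made for illustration, not part of any library.

```python
import os
import uuid


def spool_upload(spool_dir, data):
    """Called inside the web request: save locally and return quickly."""
    name = uuid.uuid4().hex
    tmp_path = os.path.join(spool_dir, name + ".part")
    ready_path = os.path.join(spool_dir, name + ".ready")
    with open(tmp_path, "wb") as f:
        f.write(data)
    # Rename only after the write finished, so the daemon never picks up
    # a half-written file.
    os.replace(tmp_path, ready_path)
    return name


def process_spool(spool_dir, upload_func):
    """Called by a separate daemon or cron job: push ready files."""
    done = []
    for entry in sorted(os.listdir(spool_dir)):
        if not entry.endswith(".ready"):
            continue
        path = os.path.join(spool_dir, entry)
        name = entry[:-len(".ready")]
        with open(path, "rb") as f:
            # upload_func is a hypothetical callable(key, data),
            # e.g. a wrapper around your S3 client.
            upload_func(name, f.read())
        # Only reached if upload_func did not raise; otherwise the file
        # stays in the spool and is retried on the next run.
        os.remove(path)
        done.append(name)
    return done
```

Because the spool lives on disk, a crash of either process leaves the pending files in place for the next run.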

in my personal experience -- the main thing i'd watch out for is the book-keeping/accounting portion of it.

you want to ensure that:

1- you mark the upload as complete when it's complete

2- you mark the upload as failed when it's failed

3- you have some sort of check in place to handle crashes ( the process died during an upload, before it could handle a complete or fail )

4- you notify the user or your application as necessary
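A rough sketch of the first three points, using sqlite3 as the ledger. The table layout, the status names and the function names are assumptions for illustration; upload_func again stands in for the real S3 call. Rows left in the 'uploading' state past a cutoff are treated as crashed uploads.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS uploads (
    key TEXT PRIMARY KEY,
    status TEXT NOT NULL,      -- 'uploading', 'complete' or 'failed'
    started_at REAL NOT NULL
)
"""


def open_ledger(path):
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    db.commit()
    return db


def run_tracked_upload(db, key, data, upload_func, now=time.time):
    """Record the attempt first, then mark it complete or failed."""
    db.execute(
        "INSERT OR REPLACE INTO uploads VALUES (?, 'uploading', ?)",
        (key, now()),
    )
    db.commit()
    try:
        upload_func(key, data)  # hypothetical callable(key, data)
    except Exception:
        status = "failed"
    else:
        status = "complete"
    db.execute("UPDATE uploads SET status = ? WHERE key = ?", (status, key))
    db.commit()
    return status


def mark_stale_as_failed(db, max_age_seconds, now=time.time):
    """Crash check: anything 'uploading' for too long never finished."""
    cutoff = now() - max_age_seconds
    cur = db.execute(
        "UPDATE uploads SET status = 'failed' "
        "WHERE status = 'uploading' AND started_at < ?",
        (cutoff,),
    )
    db.commit()
    return cur.rowcount
```

Running mark_stale_as_failed periodically gives you a list of 'failed' rows to retry, clean out of the bucket, or report to the user.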

the record-keeping and transactional element of this is really important -- otherwise you can end up with an s3 bucket that has thousands of images ( which you're paying hosting for ) that will never be used. i learned that the hard way due to a bug in one of my unit tests!

IIRC, the approach I used was to have the uploading facility use a transactionless db handle for status recordkeeping ( i've added this file, i've deleted this file ), while the main application / daemon used transactions as normal.
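One way to picture the two handles, sketched with sqlite3 (the original setup may well have used a different database). Passing isolation_level=None puts a sqlite3 connection in autocommit mode, so each status write is saved as soon as it runs, while the default connection keeps normal transactions. The table names and the demo function are made up for illustration.

```python
import sqlite3


def open_handles(path):
    """Return (status_db, app_db) connections to the same SQLite file."""
    # isolation_level=None: autocommit, every statement is committed
    # as soon as it executes.
    status_db = sqlite3.connect(path, isolation_level=None)
    # Default behaviour: a transaction is opened implicitly before
    # INSERT/UPDATE/DELETE and needs an explicit commit().
    app_db = sqlite3.connect(path)
    return status_db, app_db


def demo(path):
    status_db, app_db = open_handles(path)
    status_db.execute("CREATE TABLE file_log (key TEXT, event TEXT)")
    status_db.execute("CREATE TABLE photos (key TEXT)")

    # Status record: saved immediately, no commit() call needed.
    status_db.execute(
        "INSERT INTO file_log VALUES (?, ?)", ("cat.jpg", "added")
    )

    # Application write: something goes wrong later in the request,
    # so the whole transaction is rolled back.
    app_db.execute("INSERT INTO photos VALUES (?)", ("cat.jpg",))
    app_db.rollback()

    # A fresh connection sees the status record but not the photo row.
    reader = sqlite3.connect(path)
    logged = reader.execute("SELECT COUNT(*) FROM file_log").fetchone()[0]
    photos = reader.execute("SELECT COUNT(*) FROM photos").fetchone()[0]
    for db in (status_db, app_db, reader):
        db.close()
    return logged, photos
```

The status log therefore still says a file was added even when the application's own transaction was rolled back, which is what lets you find and clean up orphaned files afterwards.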
