Re: Storing and deleting temporary uploads

Ben Lovell

Apr 22, 2013, 4:51:58 AM
to sina...@googlegroups.com
Hey James!

On 22 April 2013 09:21, James Abbott <abbo...@gmail.com> wrote:
> Hello list,
>
> let's say I'm building a service where a user uploads some files to a server. The server processes the files and presents the results in the view. So the app doesn't persist the uploaded files between users or sessions, it just crunches them.
>
> So, the questions:
>
> 1) How do I keep file uploads separate so that if two users upload at the same time, their uploads don't intermix?

Save them under a unique filename, perhaps one from SecureRandom.uuid, which generates something like "2d931510-d99f-494a-8c67-87feb05e1594". You can then store this token against your user record or session for tracking.
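
A minimal sketch of the unique-filename idea, using only the Ruby standard library. The `UPLOAD_DIR` location and the `store_upload` helper are assumptions made for illustration, not something from the thread:

```ruby
require "securerandom"
require "fileutils"
require "tmpdir"

# Hypothetical upload directory; adjust to wherever your app keeps uploads.
UPLOAD_DIR = File.join(Dir.tmpdir, "uploads")

# Copies the uploaded IO into UPLOAD_DIR under a random UUID and returns
# that UUID. The client's original filename is deliberately ignored, so two
# users uploading "data.csv" at the same time cannot overwrite each other.
def store_upload(io)
  FileUtils.mkdir_p(UPLOAD_DIR)
  token = SecureRandom.uuid
  File.open(File.join(UPLOAD_DIR, token), "wb") do |file|
    IO.copy_stream(io, file)
  end
  token
end
```

In a Sinatra route you would probably pass in the tempfile from the multipart params, e.g. `store_upload(params[:file][:tempfile])`, assuming the form field is named `file`.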
 

> 2) How do I decide when to delete the files from the uploads folder?

I'd run a nightly batch job to clear out the files older than a few days, or whichever interval better suits your use case.
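
A cleanup job like that could be sketched as below. The `purge_old_uploads` name and the idea of keying on the file's modification time are assumptions for illustration; the interval is whatever suits your use case:

```ruby
# Deletes regular files directly inside `dir` whose modification time is
# older than `max_age_seconds`. Returns the paths that were removed.
# `now` is injectable only to make the function easy to test.
def purge_old_uploads(dir, max_age_seconds, now: Time.now)
  cutoff = now - max_age_seconds
  stale = Dir.glob(File.join(dir, "*")).select do |path|
    File.file?(path) && File.mtime(path) < cutoff
  end
  stale.each { |path| File.delete(path) }
end
```

Called from a small script scheduled with cron, e.g. `purge_old_uploads("uploads", 3 * 24 * 60 * 60)`, this would clear anything older than three days (the directory name and the three-day threshold are placeholders).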
 

> I’m guessing I’ll have to scope the uploads to the user session, but have no experience doing this.

I assume your users are authenticated? You have a user record of some description? Once your upload is received you could store the generated ID somewhere in your user record - whether that's something like user#uploaded_files or similar.

As a general point: it's considered best practice to do any heavy lifting out-of-band. If possible, you should run a separate worker that watches for uploaded files and processes them outside of your web worker process. You could modify your UI to poll for completion of the job and present the result to your user when it has finished.

Cheers,
Ben
 

> Thanks!
>
> /James

--
You received this message because you are subscribed to the Google Groups "sinatrarb" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sinatrarb+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

James Abbott

Apr 22, 2013, 7:37:43 AM
to sina...@googlegroups.com

Hi Ben,

many thanks for addressing my question.
 
> I assume your users are authenticated? You have a user record of some description? Once your upload is received you could store the generated ID somewhere in your user record - whether that's something like user#uploaded_files or similar.

No, that's not the case (at least at this stage). I have no models / records of any kind - I'm envisioning this as a web page with an upload form that anyone can use so long as the files are in the right format. So if you uploaded a file, you'd see its output displayed on the same page.

 
> As a general point: it's considered best practice to do any heavy lifting out-of-band. If possible, you should run a separate worker that watches for uploaded files and processes them outside of your web worker process. You could modify your UI to poll for completion of the job and present the result to your user when it has finished.

Do you have suggestions for any reading material / tutorials that explain this, ideally through an example? Would EventMachine be suitable for this?

Cheers,
James




Ben Lovell

Apr 22, 2013, 8:23:49 AM
to sina...@googlegroups.com
Hey,

On 22 April 2013 12:37, James Abbott <abbo...@gmail.com> wrote:

> Hi Ben,
>
> many thanks for addressing my question.
>
>> I assume your users are authenticated? You have a user record of some description? Once your upload is received you could store the generated ID somewhere in your user record - whether that's something like user#uploaded_files or similar.
>
> No, that's not the case (at least at this stage). I have no models / records of any kind - I'm envisioning this as a web page with an upload form that anyone can use so long as the files are in the right format. So if you uploaded a file, you'd see its output displayed on the same page.


The simplest approach is to write a cookie containing a token that you can identify your user with, and subsequently reconcile against the file upload.
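
A rough sketch of the token-in-a-cookie idea. It works on any Hash-like object; in Sinatra that would presumably be the `session` helper after `enable :sessions`, which is cookie-backed by default. The `upload_token` helper and the `:upload_token` key are made-up names for illustration:

```ruby
require "securerandom"

# Returns the visitor's upload token, generating and remembering one on
# first use. `session` only needs to support [] and []=, so a plain Hash
# works here and a cookie-backed session works in a real app.
def upload_token(session)
  session[:upload_token] ||= SecureRandom.uuid
end
```

Storing each upload under this token (as the filename or as a directory name) gives you the link between an anonymous visitor and their files without needing a user model.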
 
 
>> As a general point: it's considered best practice to do any heavy lifting out-of-band. If possible, you should run a separate worker that watches for uploaded files and processes them outside of your web worker process. You could modify your UI to poll for completion of the job and present the result to your user when it has finished.
>
> Do you have suggestions for any reading material / tutorials that explain this, ideally through an example? Would EventMachine be suitable for this?

Take a look at railscasts [0] for some inspiration. You have a bunch of options, starting from a simple script that you schedule with cron (the whenever gem is great for this), all the way up to something such as Sidekiq, which is a heavyweight worker solution.


Cheers,
Ben
 

> Cheers,
> James




James Abbott

Apr 22, 2013, 9:39:10 AM
to sina...@googlegroups.com
Ben:

That's awesome, thanks! I'll take it from there.

/J.

Markus Prinz

Apr 23, 2013, 4:26:05 AM
to sina...@googlegroups.com

On 22.04.2013, at 13:37, James Abbott <abbo...@gmail.com> wrote:

>
> Hi Ben,
>
> many thanks for addressing my question.
>
>> I assume your users are authenticated? You have a user record of some description? Once your upload is received you could store the generated ID somewhere in your user record - whether that's something like user#uploaded_files or similar.
>
> No, that's not the case (at least at this stage). I have no models / records of any kind - I'm envisioning this as a web page with an upload form that anyone can use so long as the files are in the right format. So if you uploaded a file, you'd see its output displayed on the same page.

You should be aware that it'd be very easy to run a DoS against your server if someone uploads a large file and/or one that takes a long time to process. So while you don't need to have a DB for that, you need some way to store some data temporarily, like Redis.

>> As a general point: it's considered best practice to do any heavy lifting out-of-band. If possible, you should run a separate worker that watches for uploaded files and processes them outside of your web worker process. You could modify your UI to poll for completion of the job and present the result to your user when it has finished.
>
> Do you have suggestions for any reading material / tutorials that explain this, ideally through an example? Would EventMachine be suitable for this?

No, you usually use a queue for that (e.g. Redis and Resque, though there are many, many other solutions). Your web process adds a job with a payload to the queue (the payload can basically be anything; you probably want to put in the file name/path, and maybe some other info), and then one or more workers on the other end listen for incoming jobs and process them, putting the result somewhere a web process can later retrieve it.

That way, your web process will never take too long to process a single request, and your web page will stay responsive. You can then use something like long polling to have the client wait until the result is ready - this is something where EventMachine may come in handy.


So, the whole thing could look like this:

* User uploads a file to your web server
* Web process stores the file somewhere, and puts its name and other info into the worker queue. It then tells the browser to start waiting for the result.
* A background worker receives the job, and starts processing it.
* After it has finished processing, it puts the result into Redis using the key the web process gave it
* As the web browser keeps polling the server for the result, it sees that there's now data available in Redis under the given key
* Your web server returns that data
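
The steps above can be sketched in a single Ruby process. This is only an illustration of the flow: the in-memory `Queue` stands in for a real job queue such as Resque, the `RESULTS` hash stands in for Redis, and the worker is a thread rather than a separate process. All names, and the line-counting "processing" step, are placeholders:

```ruby
require "securerandom"

JOBS         = Queue.new # stand-in for a real job queue (e.g. Resque)
RESULTS      = {}        # stand-in for Redis: key => result
RESULTS_LOCK = Mutex.new

# Web side: enqueue the stored file and return the key the browser polls with.
def enqueue_upload(path)
  key = SecureRandom.uuid
  JOBS << { key: key, path: path }
  key
end

# Web side: what a polling endpoint would look up; nil means "not ready yet".
def fetch_result(key)
  RESULTS_LOCK.synchronize { RESULTS[key] }
end

# Worker side: pop jobs until the queue is closed and drained, storing each
# result under the key the web side generated.
def start_worker
  Thread.new do
    while (job = JOBS.pop)
      result = File.readlines(job[:path]).size # placeholder processing: count lines
      RESULTS_LOCK.synchronize { RESULTS[job[:key]] = result }
    end
  end
end
```

With a real queue the shape would stay the same: the web process only enqueues and looks up results, so a slow job never blocks a request.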


g, Markus

James Abbott

Apr 23, 2013, 10:19:14 AM
to sina...@googlegroups.com
Markus,

thanks for the detailed input, much appreciated. I don't expect the app to have very many concurrent users and the files it will operate on are very lightweight text files (CSV). Still, it's great to have a blueprint for what a robust architecture should look like. I'll probably follow it to scale the app once the prototype is out.

Best,
James
 

