Re: Uploading a large number of files to S3

Hans Hasselberg

May 9, 2013, 1:34:05 AM
to typh...@googlegroups.com

Hi Eric,

Yes, you are missing something:
https://github.com/typhoeus/typhoeus#handling-file-uploads. You don't need to
read the file. That should speed up your queuing and solve your problem.
Does that help?
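
Roughly, the approach the README describes is a multipart form POST along these lines (the endpoint and file name here are just placeholders):

require 'typhoeus'

# Hand Typhoeus the File object itself instead of its contents; the file is
# then streamed from disk as part of a multipart form POST.
request = Typhoeus::Request.new(
  "http://localhost:3000/posts",
  method: :post,
  body: { file: File.open("new_file.dat", "r") }
)
request.run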

On May 9, 2013 7:30 AM, "Eric Anderson" <er...@pixelwareinc.com> wrote:
I am trying to upload a large number of files to S3. I found that I can upload a single file via Typhoeus like this:

require 'aws-sdk'   # aws-sdk v1, which provides AWS::S3
require 'typhoeus'

s3  = AWS::S3.new.buckets['my-bucket']
o   = s3.objects['new_file.dat']
url = o.url_for(:write).to_s   # pre-signed PUT URL for the object
r   = Typhoeus::Request.new url, method: :put, body: open('new_file.dat').read   # reads the whole file into memory
h   = Typhoeus::Hydra.hydra
h.queue r
h.run

So obviously I can just put requests in a loop and have them upload in parallel. But I hit two problems:
  • The number of files I need to upload is large, so it takes a while to queue. It would be nice if it could start uploading even before I finished queuing. Can I call run first and then queue later, and it will just start sending them out ASAP?
  • This reads the whole file into memory, and it stays there until it is actually sent. For large files this means I quickly run out of memory. Is there any way I can set the body to an IO object and have it stream out?
Just want to make sure I'm not missing any functionality before I start looking at patching it to do this myself.

Eric

Hans Hasselberg

May 12, 2013, 6:40:17 PM
to typh...@googlegroups.com

On Fri, May 10, 2013 at 3:38 PM, Eric Anderson <er...@pixelwareinc.com> wrote:
On Thursday, May 9, 2013 1:34:05 AM UTC-4, Hans Hasselberg wrote:

Yes, you are missing something:
https://github.com/typhoeus/typhoeus#handling-file-uploads. You don't need to
read the file. That should speed up your queuing and solve your problem.
Does that help?

That is for a multipart MIME-encoded form. I am just doing a simple PUT with the file as the entire body of the request. I guess I could use the browser upload functionality of S3, but that complicates the process because I would have to do signatures and the like. Still, it may be my only option unless I want to fix Typhoeus myself.

You are right - thanks for pointing that out! Could you create an issue for Typhoeus as a reminder? I've created a gist for you which demonstrates how to do that with Ethon. It is a bit hacky, but it should work. Depending on how much of Typhoeus's functionality you need, it could be helpful: https://gist.github.com/i0rek/5565206. You will need the latest Ethon master from GitHub. Does that work for you?
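
The idea is roughly the following - just a sketch, not the exact gist, and it assumes the upload/infilesize setters and set_read_callback on current Ethon master accept an IO; presigned_put_url stands in for the URL you get from url_for(:write):

require 'ethon'

path = 'new_file.dat'
easy = Ethon::Easy.new(url: presigned_put_url)
easy.upload     = true                          # libcurl upload mode, i.e. a PUT
easy.infilesize = File.size(path)               # lets libcurl set the Content-Length
easy.set_read_callback(File.open(path, 'rb'))   # body is read in chunks, not slurped
easy.perform
puts easy.response_code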

Also, you didn't touch on the issue of starting the uploads before queuing has even completed. I hate the network just sitting there doing nothing while I am queuing. Can I call run and then call queue?

You have to queue at least one request. You can then call run and queue again. Queuing again has to happen in a new thread because run blocks. I don't think this is a good idea, though - I would need to make the hydra queue thread-safe first.
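
If you just want to keep the queue (and memory) bounded without reaching for threads, one workaround - my own sketch, not something Typhoeus provides for this out of the box - is to queue and run in batches; the bucket name, glob pattern, batch size and concurrency below are made up:

require 'aws-sdk'   # aws-sdk v1, which provides AWS::S3
require 'typhoeus'

bucket = AWS::S3.new.buckets['my-bucket']
hydra  = Typhoeus::Hydra.new(max_concurrency: 20)

Dir.glob('*.dat').each_slice(50) do |batch|
  batch.each do |path|
    url = bucket.objects[File.basename(path)].url_for(:write).to_s
    hydra.queue Typhoeus::Request.new(url, method: :put, body: File.read(path))
  end
  hydra.run   # uploads this batch in parallel before the next one is queued
end

It still reads each file into memory, but only a batch at a time, so memory use stays bounded.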

--
Hans