Re: Uploading a large number of files to S3

Hans Hasselberg

May 9, 2013, 1:34:05 AM
to typh...@googlegroups.com

Hi Eric,

Yes, you are missing something:
https://github.com/typhoeus/typhoeus#handling-file-uploads. You don't need to
read the file. That should speed up your queuing and solve your problem.
Does that help?
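
Roughly, the approach the README describes is a multipart form POST along these lines (the endpoint and file name here are just placeholders):

require 'typhoeus'

# Hand Typhoeus the File object itself instead of its contents; the file is
# then streamed from disk as part of a multipart form POST.
request = Typhoeus::Request.new(
  "http://localhost:3000/posts",
  method: :post,
  body: { file: File.open("new_file.dat", "r") }
)
request.run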

On May 9, 2013 7:30 AM, "Eric Anderson" <er...@pixelwareinc.com> wrote:
I am trying to upload a large number of files to S3. I found that I can upload a single file via Typhoeus like this:

require 'aws-sdk'   # aws-sdk v1, which provides AWS::S3
require 'typhoeus'

s3  = AWS::S3.new.buckets['my-bucket']
o   = s3.objects['new_file.dat']
url = o.url_for(:write).to_s   # pre-signed PUT URL for the object
r   = Typhoeus::Request.new url, method: :put, body: open('new_file.dat').read   # reads the whole file into memory
h   = Typhoeus::Hydra.hydra
h.queue r
h.run

So obviously I can just put requests in a loop and have them upload in parallel. But I hit two problems:
  • The number of files I need to upload is large, so it takes a while to queue. It would be nice if it could start uploading even before I finished queuing. Can I call run first and then queue later, and it will just start sending them out ASAP?
  • This reads the whole file into memory, and it stays there until it is actually sent. For large files this means I quickly run out of memory. Is there any way I can set the body to an IO object and have it stream out?
Just want to make sure I'm not missing any functionality before I start looking at patching it to do this myself.

Eric

Hans Hasselberg

May 12, 2013, 6:40:17 PM
to typh...@googlegroups.com

On Fri, May 10, 2013 at 3:38 PM, Eric Anderson <er...@pixelwareinc.com> wrote:
On Thursday, May 9, 2013 1:34:05 AM UTC-4, Hans Hasselberg wrote:

Yes, you are missing something:
https://github.com/typhoeus/typhoeus#handling-file-uploads. You don't need to
read the file. That should speed up your queuing and solve your problem.
Does that help?

That is for a multipart MIME-encoded form. I am just doing a simple PUT with the file as the entire body of the request. I guess I could use the browser upload functionality of S3, but that complicates the process because I would have to do signatures and the like. Still, it may be my only option unless I want to fix Typhoeus myself.

You are right - thanks for pointing that out! Could you create an issue for Typhoeus as a reminder? I've created a gist for you which demonstrates how to do that with Ethon. It is a bit hacky, but it should work. Depending on how much of Typhoeus's functionality you need, it could be helpful: https://gist.github.com/i0rek/5565206. You will need the latest Ethon master from GitHub. Does that work for you?
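
The idea is roughly the following - just a sketch, not the exact gist, and it assumes the upload/infilesize setters and set_read_callback on current Ethon master accept an IO; presigned_put_url stands in for the URL you get from url_for(:write):

require 'ethon'

path = 'new_file.dat'
easy = Ethon::Easy.new(url: presigned_put_url)
easy.upload     = true                          # libcurl upload mode, i.e. a PUT
easy.infilesize = File.size(path)               # lets libcurl set the Content-Length
easy.set_read_callback(File.open(path, 'rb'))   # body is read in chunks, not slurped
easy.perform
puts easy.response_code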

Also, you didn't touch on the issue of starting the uploads before queuing has even completed. I hate the network just sitting there doing nothing while I am queuing. Can I call run and then call queue?

You have to queue at least one request. You can then call run and queue again. Queuing again has to happen in a new thread because run blocks. I don't think this is a good idea, though - I would need to make the hydra queue thread-safe first.
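
If you just want to keep the queue (and memory) bounded without reaching for threads, one workaround - my own sketch, not something Typhoeus provides for this out of the box - is to queue and run in batches; the bucket name, glob pattern, batch size and concurrency below are made up:

require 'aws-sdk'   # aws-sdk v1, which provides AWS::S3
require 'typhoeus'

bucket = AWS::S3.new.buckets['my-bucket']
hydra  = Typhoeus::Hydra.new(max_concurrency: 20)

Dir.glob('*.dat').each_slice(50) do |batch|
  batch.each do |path|
    url = bucket.objects[File.basename(path)].url_for(:write).to_s
    hydra.queue Typhoeus::Request.new(url, method: :put, body: File.read(path))
  end
  hydra.run   # uploads this batch in parallel before the next one is queued
end

It still reads each file into memory, but only a batch at a time, so memory use stays bounded.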

--
Hans