Thin buffers response body and uses a ton of memory for large files


Kevin West

Jul 29, 2009, 9:40:22 PM
to thin-ruby
I am developing a service using Amazon S3 which could potentially
involve sending very large files down to the user. Due to the nature
of our design the files will reside on S3 in smaller chunks. The
ideal solution for sending these files is to stream the chunks in
order from S3 and send data to the response stream as we pull it. The
way I do this is to render the response text as a proc, something like
this:

render :status => options[:status], :text => proc { |response, output|
  stream_content { |chunk|
    output.write(chunk)
  }
}

When I do this with Thin, however, it evaluates the proc and buffers
the entire file in memory before sending it along to the web server.

We are currently using an Nginx/Mongrel setup, and it has the exact
same issue, but we were able to add a hack that essentially writes the
output data directly to the response socket, which lets Nginx buffer
the download to a file on its side. I would really like to switch to
Thin for its various performance benefits, as well as to avoid the
memory leaks we get from Mongrel's queueing system, but I can't figure
out a way to handle this properly in Thin.

After browsing through the source, I noticed that if output.body is
an object that defines an each method, Thin should send the chunks out
appropriately. I tried implementing this with something like the
following:

render :status => options[:status], :text => proc { |response, output|
  output.body = ResponseBody.new
}

class ResponseBody
  def each(&block)
    stream_content { |chunk|
      yield chunk
    }
  end
end

However, watching top while downloading a 350MB file, it still seems
to be loading everything into memory. As additional info, I am
currently using Nginx to serve static files and reverse-proxying
requests to my Thin servers for dynamic content.

Are there any suggestions for how to go about fixing this? I would
really love to use Thin, as my initial benchmarks show it outperforms
Mongrel by quite a bit, but this is a total blocker for me.

As a quick aside, I've noticed that Swiftiply is available as a
backend for Thin. I've looked for info, but I can't really find
anything that lists the pros and cons of using it. What potential
benefits are there?

Thanks.

David Zhao

Jul 30, 2009, 7:18:09 PM
to thin...@googlegroups.com
I was able to verify this behavior as well. For some reason, Thin
and/or EventMachine is buffering the output even when it's undesirable
to do so (i.e. when sending a file or binary data).

From the source, it looks like when response.each is used, the
response is sent to the underlying EventMachine connection without
buffering:

in connection.rb:

# Send the response
@response.each do |chunk|
  trace { chunk }
  send_data chunk
end

However, memory usage still grows by the size of the data I'm sending.
If anyone has any pointers, I'd really appreciate it.

Alexey Borzenkov

Jul 31, 2009, 1:42:21 AM
to thin...@googlegroups.com
On Fri, Jul 31, 2009 at 3:18 AM, David Zhao <david...@gmail.com> wrote:
> However, memory usage still grows by the size of data I'm sending. If
> any one has any pointers, I'd really appreciate it.

Could it be that the data is simply not GCed yet? Try GC.start or
ObjectSpace.garbage_collect and see if your memory usage shrinks.

James Tucker

Jul 31, 2009, 5:01:15 AM
to thin...@googlegroups.com

On 31 Jul 2009, at 00:18, David Zhao wrote:

> From the source, it looks like when response.each is used, the
> response is sent to the underlying EventMachine connection without
> buffering:
>
> in connection.rb:
> # Send the response
> @response.each do |chunk|
> trace { chunk }
> send_data chunk
> end

This doesn't release the reactor; until it does, it'll block and
buffer. You've got to release the reactor, schedule data
asynchronously, and/or use fast file-send infrastructure for such
things.
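
To make that concrete, here is a minimal sketch of the "schedule data
asynchronously" idea against plain EventMachine (not Thin's actual
internals; ChunkStreamer and its arguments are made up for illustration):

require 'eventmachine'

# Push one chunk per reactor tick instead of looping over the whole body
# at once, so EventMachine regains control between writes and can drain
# its outbound buffer.
module ChunkStreamer
  def self.stream(connection, body)   # body: anything responding to #each
    chunks = body.to_enum(:each)
    sender = proc do
      begin
        connection.send_data(chunks.next)
        EM.next_tick(&sender)         # come back for the next chunk
      rescue StopIteration
        connection.close_connection_after_writing
      end
    end
    EM.next_tick(&sender)
  end
end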

Kevin West

Jul 31, 2009, 6:21:40 PM
to thin-ruby
James, thanks for the response. Can you give me a little more info
on how I could go about implementing this from my Rails code? Am I
going to have to create any extensions to patch up the Thin
framework? I looked around a little bit and it seemed like next_tick
was the key to scheduling data to be sent asynchronously. Any more
info you may have would be helpful.

Thanks

Muhammad Ali

Jul 31, 2009, 9:49:43 PM
to thin-ruby
It is easy to fix this if you are on Ruby 1.9:

http://github.com/oldmoe/thin/tree/master

This is a slightly modified (and highly experimental) Thin for Ruby 1.9
that fixes this problem. It basically wraps the request processing in a
fiber and yields to give the reactor a chance to kick in (roughly as
sketched below).

I should be releasing it as a new backend for Thin (rather than
patching it) soon.
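
Roughly, the idea looks like this (a sketch of the approach, not the
actual code in that fork; connection and body are placeholders):

require 'fiber'
require 'eventmachine'

# Run the body iteration inside a fiber and yield back to the reactor
# after every chunk, so EventMachine can flush data as it is produced.
def stream_in_fiber(connection, body)
  fiber = Fiber.new do
    body.each do |chunk|
      connection.send_data(chunk)
      Fiber.yield                     # give the reactor a turn
    end
    connection.close_connection_after_writing
  end

  pump = proc do
    if fiber.alive?
      fiber.resume
      EM.next_tick(&pump)             # resume on the next tick
    end
  end
  EM.next_tick(&pump)
end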

regards

oldmoe

David Zhao

Jul 31, 2009, 10:14:54 PM
to thin...@googlegroups.com
Is there a way to do what you said for Ruby 1.8? My application isn't
ported to 1.9 yet, and I'd imagine it'll take a while before it's
1.9-ready.

I don't know the specifics of how Thin interacts with EventMachine;
any pointers are appreciated.

-David

Muhammad Ali

Jul 31, 2009, 10:34:14 PM
to thin-ruby
The real problem is in the Rack semantics, which tie Thin into a serial
loop that feeds the response body to the reactor but never yields to
it, so it can't start sending any data until the whole response has
been buffered. The only way to do this seamlessly is via Ruby 1.9
fibers, as explained in my RubyKaigi presentation: http://www.espace.com.eg/blog/2009/07/20/neverblock-at-ruby-kaigi/

If you really do send large response bodies, then you should consider
writing them to disk and letting Nginx serve them directly (see the
sketch below). It might sound horrible, but if this is not a frequent
use case, doing so is actually much better than letting your Thins
grow fat!
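
A rough sketch of that pattern in a Rails controller, assuming Nginx's
X-Accel-Redirect with an internal location (the /protected location,
paths, and parameter names are made up for illustration; stream_content
is the helper from the first post):

# Assumed nginx config:
#   location /protected/ {
#     internal;
#     alias /var/www/downloads/;
#   }

def download
  # Spool the chunks to a file under the internal alias
  path = "/var/www/downloads/#{params[:id]}.bin"
  File.open(path, 'wb') do |f|
    stream_content { |chunk| f.write(chunk) }
  end

  # Hand the actual transfer off to nginx
  response.headers['X-Accel-Redirect'] = "/protected/#{params[:id]}.bin"
  head :ok
end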

regards

oldmoe

igrigorik

Aug 2, 2009, 10:52:00 AM
to thin-ruby
Sendfile is your friend in this case. However, if the file is not
local / on disk, you can also try the new proxy method recently added
to EventMachine (check the docs), which allows you to stream from one
EventMachine connection directly to another (from an HTTP fetch to an
HTTP push, for example).
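
A standalone sketch of that EventMachine proxy feature, outside of
Thin (hosts, ports, and the request line are placeholders):

require 'eventmachine'

EM.run do
  # Two raw connections: one to the consumer, one to the data source.
  client = EM.connect('127.0.0.1', 8080)
  source = EM.connect('s3.amazonaws.com', 80)

  # Bytes arriving on `source` are forwarded straight to `client`
  # without accumulating in a Ruby-level buffer.
  EM.enable_proxy(source, client)

  source.send_data("GET /bucket/chunk-1 HTTP/1.0\r\nHost: s3.amazonaws.com\r\n\r\n")
end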

ig

Kevin West

Aug 4, 2009, 3:26:03 PM
to thin-ruby
Thanks for the info. I did some research on the enable_proxy method
and it looks like it could potentially do what I need. I'm having
trouble conceptualizing how it would work, however. On one side I have
the UnixConnection (using Unix sockets) that connects to Nginx, where
I want the response to be buffered to disk as it's sent out. On the
other end I have my Rails code, which connects to S3 and grabs the data
I need. Do I need to create an EventMachine connection that reads
data from S3 and immediately proxies it to the UnixConnection? Or is
there another connection in Thin that I'm missing that I should proxy
to the UnixConnection, such that the response data gets sent out
immediately instead of being buffered in EventMachine?

Anything you have to clarify would help. Thanks!