Not cleaning up tempfiles for multipart?

1,591 views
Skip to first unread message

Charles Oliver Nutter

unread,
Mar 5, 2010, 9:40:11 AM3/5/10
to rack-...@googlegroups.com
I'm investigating some reported tempfle bugs in JRuby, and it seems
like Rack may not be cleaning up tempfiles it uses.

The creation of the tempfile happens at utils.rb line 486:

if content_type || filename
>> body = Tempfile.new("RackMultipart")
body.binmode if body.respond_to?(:binmode)
end

The body is then written to (with the content of the request body),
rewound, put into the "data" structure, passed to normalize_params
(which inserts the "data" into the params hash), and that's it. The
params are then presumably used for the request, and nobody ever
explicitly cleans up the tempfile.

Am I looking in the wrong place?

If I'm correct, this is a bug. Tempfiles should not be relied upon to
clean themselves up in response to GC, since you don't know when GC
will fire...

- Charlie

Charles Oliver Nutter

unread,
Mar 5, 2010, 9:47:24 AM3/5/10
to rack-...@googlegroups.com
I'm also confused by this code in rewindable_input.rb:

class Tempfile < ::Tempfile
def _close
@tmpfile.close if @tmpfile
@data[1] = nil if @data
@tmpfile = nil
end
end

the original _close (in 1.8.7) does this, which should be functionally
equivalent:

def _close # :nodoc:
@tmpfile.close if @tmpfile
@tmpfile = nil
@data[1] = nil if @data
end

Is this patching a bug from 1.8.6?

FYI, in JRuby there is no _close, since it's a private impl method and
we have our own tempfile implementation.

- Charlie

Charles Oliver Nutter

unread,
Mar 5, 2010, 9:48:24 AM3/5/10
to rack-...@googlegroups.com
On Fri, Mar 5, 2010 at 8:47 AM, Charles Oliver Nutter
<hea...@headius.com> wrote:
> FYI, in JRuby there is no _close, since it's a private impl method and
> we have our own tempfile implementation.

Actually, I was wrong; we do have _close, but it behaves somewhat
differently than MRI.

Still seems like it's an internal method that shouldn't be monkeyed with...

- Charlie

Eric Wong

unread,
Mar 6, 2010, 2:55:48 AM3/6/10
to rack-...@googlegroups.com
Charles Oliver Nutter <hea...@headius.com> wrote:
> If I'm correct, this is a bug. Tempfiles should not be relied upon to
> clean themselves up in response to GC, since you don't know when GC
> will fire...

Why not? Tempfiles are objects, too. It's perfectly reasonable
to let GC clean them up like any other object.


I don't know why RewindableInput is monkeying with Tempfile internals,
though.

--
Eric Wong

Hongli Lai

unread,
Mar 6, 2010, 5:20:21 AM3/6/10
to Rack Development
On Mar 5, 3:47 pm, Charles Oliver Nutter <head...@headius.com> wrote:
> I'm also confused by this code in rewindable_input.rb:
>
>     class Tempfile < ::Tempfile
>       def _close
>         @tmpfile.close if @tmpfile
>         @data[1] = nil if @data
>         @tmpfile = nil
>       end
>     end
>
> the original _close (in 1.8.7) does this, which should be functionally
> equivalent:
>
>   def _close    # :nodoc:
>     @tmpfile.close if @tmpfile
>     @tmpfile = nil
>     @data[1] = nil if @data
>   end
>
> Is this patching a bug from 1.8.6?

Yes. The following code doesn't work on 1.8.6:

t = Tempfile.new('foo')
t.unlink
t.close # <--- crash, because of the early unlink

Hongli Lai

unread,
Mar 6, 2010, 5:25:11 AM3/6/10
to Rack Development
On Mar 6, 8:55 am, Eric Wong <normalper...@yhbt.net> wrote:

> Charles Oliver Nutter <head...@headius.com> wrote:
>
> > If I'm correct, this is a bug. Tempfiles should not be relied upon to
> > clean themselves up in response to GC, since you don't know when GC
> > will fire...
>
> Why not?  Tempfiles are objects, too.  It's perfectly reasonable
> to let GC clean them up like any other object.

Phusion Passenger terminates worker processes by calling exit!. This
does not call finalizers on MRI, causing temp files to be left behind
sometimes. There seems to be no way to call finalizers, even GC.start
doesn't work. I too would like to see Tempfiles being cleaned up
explicitly.

Other than that, I've seen system administrators who are confused by
the fact that such Tempfiles are not immediately cleaned up. Some
people who operate websites that handle a large number of concurrent
uploads are worried that they might run out of disk space because of
this.

Charles Oliver Nutter

unread,
Mar 7, 2010, 9:25:36 AM3/7/10
to rack-...@googlegroups.com
On Sat, Mar 6, 2010 at 4:20 AM, Hongli Lai <hon...@phusion.nl> wrote:
>> Is this patching a bug from 1.8.6?
>
> Yes. The following code doesn't work on 1.8.6:
>
> t = Tempfile.new('foo')
> t.unlink
> t.close   # <--- crash, because of the early unlink

Thanks, that at least explains one mystery.

- Charlie

Charles Oliver Nutter

unread,
Mar 7, 2010, 9:27:55 AM3/7/10
to rack-...@googlegroups.com

It's roundly considered by every programming language community in the
world to be *really* bad form to let GC clean up IO. There's tons of
reasons for this:

* GC may not run as soon as you like, causing you to hold IO resources
too long (and under heavy load, potentially use up all descriptors for
a process)
* On some systems, old objects get promoted to heaps that are only
rarely collected, resulting in them potentially never GCing (or only
GCing extremely rarely)
* There's no reason you *shouldn't* be able to close resources when
they're no longer used. In this case, close any files opened for a
give request once the request has been processed (async libraries
using those files should expect them to be closed and deal with that
accordingly).

Bug-finding tools for several languages flag this sort of behavior as
a high-priority bug. Languages like Go, C#, and Java have or are
adding features to help guarantee you never do this. It's pretty much
universally frowned upon.

When I tweeted about Rubyists leaving IO objects to be cleaned up or
closed by GC, *everyone* agreed that it's a bug in the code, and that
nobody should ever rely on GC to release File/IO resources.

What more can I say? :)

- Charlie

Charles Oliver Nutter

unread,
Mar 7, 2010, 9:34:25 AM3/7/10
to rack-...@googlegroups.com
On Sat, Mar 6, 2010 at 4:25 AM, Hongli Lai <hon...@phusion.nl> wrote:
>> Why not?  Tempfiles are objects, too.  It's perfectly reasonable
>> to let GC clean them up like any other object.
>
> Phusion Passenger terminates worker processes by calling exit!. This
> does not call finalizers on MRI, causing temp files to be left behind
> sometimes. There seems to be no way to call finalizers, even GC.start
> doesn't work. I too would like to see Tempfiles being cleaned up
> explicitly.

We did run into some difficulties proving that MRI cleans up old files
when it GCs, mostly because of quirks in the GC (constructing the
tempfile in a method body kept it alive even when it wasn't
referenced, but constructing it in a block invocation allowed it to
GC...MRI's GC baffles me sometimes). But we did confirm it...it's just
unreliable in certain cases (and obviously if you exit! it's totally
useless, since the process just disappears and leaves files behind).

All IO-related objects should be explicitly cleaned up, without
exception...including the unlink of a tempfile once it's no longer
needed.

> Other than that, I've seen system administrators who are confused by
> the fact that such Tempfiles are not immediately cleaned up. Some
> people who operate websites that handle a large number of concurrent
> uploads are worried that they might run out of disk space because of
> this.

This is what we ran into. Because of a bug in JRuby 1.4, our new
Tempfile impl was only unlinking them on exit (or when explicitly told
to do so, of course). A user eventually filled their temp space since
various Ruby libraries did not explicitly close! Tempfile resources,
including Rack (attachment_fu was the other one...have not looked into
that yet).

Yes, JRuby was incorrect to not unlink tempfiles on GC, but Rack (and
any other Ruby library dong the same) is broken if it does not clean
up tempfiles once they're no longer needed.

I'm certainly willing to help fix Rack, but I'm not sure where would
be the best place to close! the tempfile.

- Charlie

Eric Wong

unread,
Mar 7, 2010, 6:53:37 PM3/7/10
to rack-...@googlegroups.com
Hongli Lai <hon...@phusion.nl> wrote:
> On Mar 6, 8:55�am, Eric Wong <normalper...@yhbt.net> wrote:
> > Charles Oliver Nutter <head...@headius.com> wrote:
> >
> > > If I'm correct, this is a bug. Tempfiles should not be relied upon to
> > > clean themselves up in response to GC, since you don't know when GC
> > > will fire...
> >
> > Why not? �Tempfiles are objects, too. �It's perfectly reasonable
> > to let GC clean them up like any other object.
>
> Phusion Passenger terminates worker processes by calling exit!.

Always? Why?

> This does not call finalizers on MRI, causing temp files to be left behind
> sometimes. There seems to be no way to call finalizers, even GC.start
> doesn't work. I too would like to see Tempfiles being cleaned up
> explicitly.
>
> Other than that, I've seen system administrators who are confused by
> the fact that such Tempfiles are not immediately cleaned up. Some
> people who operate websites that handle a large number of concurrent
> uploads are worried that they might run out of disk space because of
> this.

Doesn't Passenger have the luxury of only supporting sane platforms
that let you unlink Tempfiles and continue to use them?

--
Eric Wong

Eric Wong

unread,
Mar 7, 2010, 7:18:34 PM3/7/10
to rack-...@googlegroups.com
Charles Oliver Nutter <hea...@headius.com> wrote:
> On Sat, Mar 6, 2010 at 1:55 AM, Eric Wong <normal...@yhbt.net> wrote:
> > Charles Oliver Nutter <hea...@headius.com> wrote:
> >> If I'm correct, this is a bug. Tempfiles should not be relied upon to
> >> clean themselves up in response to GC, since you don't know when GC
> >> will fire...
> >
> > Why not? �Tempfiles are objects, too. �It's perfectly reasonable
> > to let GC clean them up like any other object.
> >
> >
> > I don't know why RewindableInput is monkeying with Tempfile internals,
> > though.
>
> It's roundly considered by every programming language community in the
> world to be *really* bad form to let GC clean up IO. There's tons of
> reasons for this:

You're forgetting there are programming language communities that
consider GC to be bad form in the first place :)

Given that Ruby does have a GC, I'd prefer to be as lazy as possible and
let the runtime deal with it.

> * GC may not run as soon as you like, causing you to hold IO resources
> too long (and under heavy load, potentially use up all descriptors for
> a process)

MRI considers EMFILE/ENFILE to be analogous to ENOMEM, and would trigger
GC in that case.

> * On some systems, old objects get promoted to heaps that are only
> rarely collected, resulting in them potentially never GCing (or only
> GCing extremely rarely)

Those GCs could be made smarter about dealing with IO objects.

> * There's no reason you *shouldn't* be able to close resources when
> they're no longer used. In this case, close any files opened for a
> give request once the request has been processed (async libraries
> using those files should expect them to be closed and deal with that
> accordingly).

Of course being able to close IO objects explicitly is good. In some
cases of socket/pipe IPC, it's the _only_ way to signal end-of-input.
HTTP/1.0 and HTTP/0.9 don't work otherwise.

> Bug-finding tools for several languages flag this sort of behavior as
> a high-priority bug. Languages like Go, C#, and Java have or are
> adding features to help guarantee you never do this. It's pretty much
> universally frowned upon.

As I understand it, C# and Java have to deal with operating/file
systems that don't work with unlinked files. I guess JRuby is limited
to only being able to use things that work on those platforms, and can't
work with unlinked temporary files reliably, my condolences.

I haven't had a chance to look Go...

> When I tweeted about Rubyists leaving IO objects to be cleaned up or
> closed by GC, *everyone* agreed that it's a bug in the code, and that
> nobody should ever rely on GC to release File/IO resources.
>
> What more can I say? :)

*everyone* meaning your followers on Twitter? I suspect your followers
on Twitter are more inclined to agree with you :)

--
Eric Wong

Eric Wong

unread,
Mar 7, 2010, 7:22:17 PM3/7/10
to rack-...@googlegroups.com
Charles Oliver Nutter <hea...@headius.com> wrote:
> I'm certainly willing to help fix Rack, but I'm not sure where would
> be the best place to close! the tempfile.

I suspect it needs to be done a per-handler basis or inside the server.
rack/handler/fastcgi.rb and rack/handler/lsws.rb both create
RewindableInput objects and close them in the ensure block.

--
Eric Wong

Eric Wong

unread,
Mar 7, 2010, 8:12:04 PM3/7/10
to rack-...@googlegroups.com

I guess this deals with wonky multipart uploads that browsers still
generate these days[1]. Ugh, yeah, it's nasty...
The comment below in _call is very important.


1. Ensure all tempfiles created by Rack go into an array in env,
probably env["rack.tempfiles"]:

tempfile = Tempfile.new("foo")
(env["rack.tempfiles"] ||= []) << tempfile


2. Have a middleware wrap everything, including the response body:

class TempfileReaper < Struct.new(:app, :env, :body)

# --------------- in config.ru -----------
# use TempfileReaper
# run MyApp.new
def initialize(app)
super(app)
end

# we wrap the entire response body later on, so dup it for
# reentrancy. You can actually avoid the enforced dup as
# an optimization, probably...
def call(env)
dup._call(env)
end

# used when wrapping the response body
def each(&block)
body.each { |chunk| yield chunk }
end

# the Rack server should call this (when we're the body)
def close
tempfiles = env["rack.tempfiles"]
if tempfiles
tempfiles.each { |tmp| tmp.close! rescue nil }
end
end

# wrap the normal application call, saving env
def _call(env)
self.env = env

# XXX VERY IMPORTANT:
# you need to ensure env stays the same throughout the request,
# some middlewares overwrite/replace it instead of merge!-ing into it
status, headers, body = app.call(env)
self.body = body
[ status, headers, self ]
end
end


[1] - in my world of singing and dancing penguins, my clients use
"curl -T" to send proper PUT requests :)

--
Eric Wong

James Tucker

unread,
Mar 8, 2010, 5:05:33 AM3/8/10
to rack-...@googlegroups.com

On 6 Mar 2010, at 10:25, Hongli Lai wrote:

> On Mar 6, 8:55 am, Eric Wong <normalper...@yhbt.net> wrote:
>> Charles Oliver Nutter <head...@headius.com> wrote:
>>
>>> If I'm correct, this is a bug. Tempfiles should not be relied upon to
>>> clean themselves up in response to GC, since you don't know when GC
>>> will fire...
>>
>> Why not? Tempfiles are objects, too. It's perfectly reasonable
>> to let GC clean them up like any other object.
>
> Phusion Passenger terminates worker processes by calling exit!. This
> does not call finalizers on MRI, causing temp files to be left behind
> sometimes.

So use #exit. Speed isn't everything.

> There seems to be no way to call finalizers, even GC.start
> doesn't work. I too would like to see Tempfiles being cleaned up
> explicitly.

GC.start makes no promises of determinism, so expecting it to have any effect is wrong thinking, expecting calls over a longer time period to eventually do something, sure, but immediate, no.

The cleanup will have to be done by the handler stack or a wrapping middleware as Eric has pointed out.

> Other than that, I've seen system administrators who are confused by
> the fact that such Tempfiles are not immediately cleaned up. Some
> people who operate websites that handle a large number of concurrent
> uploads are worried that they might run out of disk space because of
> this.

There's no accounting for sysadmins who can't use lsof.

James Tucker

unread,
Mar 8, 2010, 5:07:27 AM3/8/10
to rack-...@googlegroups.com

I agree for debugging reasons. Finding some IO from 3 requests back really raises questions I'd rather not have to raise.

Hongli Lai

unread,
Mar 8, 2010, 6:26:48 AM3/8/10
to Rack Development
On Mar 8, 12:53 am, Eric Wong <normalper...@yhbt.net> wrote:
> Always?  Why?

To simply our code. Phusion Passenger forks-without-exec worker
processes from a parent. It does this on multiple levels. If I don't
call exit! then a SystemExit exception will propagate through the call
stack and I'd have to modify all the 'ensure' blocks in the parent to
take into account they're now being executed in a child process. It's
definitely possible to get rid of the exit! but it makes our code more
complicated.


> Doesn't Passenger have the luxury of only supporting sane platforms
> that let you unlink Tempfiles and continue to use them?

Yes but the Tempfile is created by Rack and on the Phusion Passenger
level we have no control over those tempfiles. Even if Rack
immediately unlinks those tempfiles, the actual disk space isn't
released until the file handles are closed. I believe Rack currently
does not even call #close on the Tempfiles.


James Tucker:


> So use #exit. Speed isn't everything.

It's not speed.

> There's no accounting for sysadmins who can't use lsof.

I disagree. Software is made to be used by users, whose skills might
vary significantly. Skilled sysadmins know how to use lsof and know
how to take care of problems, but not everybody is or can be a skilled
sysadmin. Server resources also vary greatly, and some people don't
and/or cannot have have a lot of disk space. I think software should
do the right thing even in the face of people who don't know how to
use lsof.

Charles's middleware looks simple enough. I approve.

Hongli Lai

unread,
Mar 8, 2010, 6:30:41 AM3/8/10
to Rack Development
On Mar 8, 12:26 pm, Hongli Lai <hon...@phusion.nl> wrote:
> Charles's middleware looks simple enough. I approve.

Sorry. I mean Eric's. :)

Another thing that you might want to keep in mind is that users might
File.rename() a certain Tempfile, in which case you will not want to
unlink it. For example suppose one uploads a large file to the web
server, I think there are web apps out there that will simply rename()
the tempfile to the storage directory and therefore avoid an expensive
copy operation.

Charles Oliver Nutter

unread,
Mar 8, 2010, 8:22:57 AM3/8/10
to rack-...@googlegroups.com
On Mon, Mar 8, 2010 at 5:26 AM, Hongli Lai <hon...@phusion.nl> wrote:
> I disagree. Software is made to be used by users, whose skills might
> vary significantly. Skilled sysadmins know how to use lsof and know
> how to take care of problems, but not everybody is or can be a skilled
> sysadmin. Server resources also vary greatly, and some people don't
> and/or cannot have have a lot of disk space. I think software should
> do the right thing even in the face of people who don't know how to
> use lsof.

And this may be the best reason for cleaning up non-memory resources
explicitly (like files and file descriptors) rather than expecting GC
to do it for you: that's how the rest of the world does it. It's
unexpected (and usually frowned upon) for a program to leave dangling
non-memory resources all over the place, even if they'll eventually
(hopefully? maybe?) get cleaned up when GC fires (if it does before
something else happens, like a hard exit).

I sympathize with the desire to be "lazy" and let GC do it all, but
that's simply not how it's done. Non-memory resources should always be
explicitly cleaned up.

- Charlie

Randy Fischer

unread,
Mar 8, 2010, 9:33:02 AM3/8/10
to rack-...@googlegroups.com
On Mon, Mar 8, 2010 at 6:30 AM, Hongli Lai <hon...@phusion.nl> wrote:

> Another thing that you might want to keep in mind is that users might
> File.rename() a certain Tempfile, in which case you will not want to
> unlink it. For example suppose one uploads a large file to the web
> server, I think there are web apps out there that will simply rename()
> the tempfile to the storage directory and therefore avoid an expensive
> copy operation.

Small nit here: if tempfile's close is unlinking the file, it doesn't matter if
something else has linked to it. It's the filesystem's responsibility, after
all, to garbage collect unlinked files.

-Randy Fischer

James Tucker

unread,
Mar 8, 2010, 9:42:41 AM3/8/10
to rack-...@googlegroups.com

On 8 Mar 2010, at 11:26, Hongli Lai wrote:

> On Mar 8, 12:53 am, Eric Wong <normalper...@yhbt.net> wrote:
>> Always? Why?
>
> To simply our code. Phusion Passenger forks-without-exec worker
> processes from a parent. It does this on multiple levels. If I don't
> call exit! then a SystemExit exception will propagate through the call
> stack and I'd have to modify all the 'ensure' blocks in the parent to
> take into account they're now being executed in a child process. It's
> definitely possible to get rid of the exit! but it makes our code more
> complicated.

I can see how that would be painful, totally.

>> Doesn't Passenger have the luxury of only supporting sane platforms
>> that let you unlink Tempfiles and continue to use them?
>
> Yes but the Tempfile is created by Rack and on the Phusion Passenger
> level we have no control over those tempfiles. Even if Rack
> immediately unlinks those tempfiles, the actual disk space isn't
> released until the file handles are closed. I believe Rack currently
> does not even call #close on the Tempfiles.

Correct.

Some servers do, for example, in thin: http://github.com/macournoyer/thin/blob/master/lib/thin/request.rb#L143

> I think software should
> do the right thing even in the face of people who don't know how to
> use lsof.

Just to be clear, I don't disagree at all that this needs to be fixed. I think handlers/servers should close any tempfiles they create, and the requirement for middleware should be the same.

Charles Oliver Nutter

unread,
Mar 8, 2010, 9:43:33 AM3/8/10
to rack-...@googlegroups.com

I don't think it should be specified behavior that a tempfile *ever*
exists on disk. The mechanism of temporary storage for a large
incoming post should be a black box. Consider systems that won't
actually have a writable filesystem, like GAE; there, the "tempfile"
data would be stored in memory or in BigTable.

Rack should not require that large multipart posts or anything else
require a filesystem, nor guarantee that they'll be on any filesystem
that exists.

- Charlie

James Tucker

unread,
Mar 8, 2010, 9:49:41 AM3/8/10
to rack-...@googlegroups.com

It's for this reason that I'd prefer to see this handled correctly by servers +/ handlers in general.

Middleware may use what it likes, but it takes responsibility for cleanup of whatever it creates.


>
> - Charlie

Hongli Lai

unread,
Mar 8, 2010, 12:24:54 PM3/8/10
to Rack Development
On Mar 8, 3:42 pm, James Tucker <jftuc...@gmail.com> wrote:
> Just to be clear, I don't disagree at all that this needs to be fixed. I think handlers/servers should close any tempfiles they create, and the requirement for middleware should be the same.

Glad we're on the same page. :) Phusion Passenger does clean up its
own tempfiles. As for the link you posted, I think that's for cleaning
up the rack.input buffer file, not multipart files.


Charles Nutter:


> I don't think it should be specified behavior that a tempfile *ever*
> exists on disk. The mechanism of temporary storage for a large
> incoming post should be a black box. Consider systems that won't
> actually have a writable filesystem, like GAE; there, the "tempfile"
> data would be stored in memory or in BigTable.

Agreed. However I think it should be *possible* to access the on-disk
file directly as an optimization for apps that need it, they just
musn't rely on it to be available. For example if your app deals with
2 GB file uploads then you really want to avoid that extra copy when
storing the file upload in a storage location on the same filesystem,
whenever possible.

Charles Oliver Nutter

unread,
Mar 9, 2010, 2:43:34 AM3/9/10
to rack-...@googlegroups.com
On Mon, Mar 8, 2010 at 11:24 AM, Hongli Lai <hon...@phusion.nl> wrote:
> Charles Nutter:
>> I don't think it should be specified behavior that a tempfile *ever*
>> exists on disk. The mechanism of temporary storage for a large
>> incoming post should be a black box. Consider systems that won't
>> actually have a writable filesystem, like GAE; there, the "tempfile"
>> data would be stored in memory or in BigTable.
>
> Agreed. However I think it should be *possible* to access the on-disk
> file directly as an optimization for apps that need it, they just
> musn't rely on it to be available. For example if your app deals with
> 2 GB file uploads then you really want to avoid that extra copy when
> storing the file upload in a storage location on the same filesystem,
> whenever possible.

Yeah, that's reasonable, so long as it's not specified that it will
always be available on disk. On systems where filesystem access *is*
allowed, we of course do basically the same thing as MRI+Rack.

- Charlie

Eric Wong

unread,
Mar 16, 2010, 10:37:24 PM3/16/10
to rack-...@googlegroups.com
Hongli Lai <hon...@phusion.nl> wrote:
> On Mar 8, 12:26�pm, Hongli Lai <hon...@phusion.nl> wrote:
> > Charles's middleware looks simple enough. I approve.
>
> Sorry. I mean Eric's. :)

Btw, if you guys like the middleware please don't hesitate to
make a proper patch + spec and get it into Rack.

I don't really care for it (not enough to even be credited :).

--
Eric Wong

Charles Oliver Nutter

unread,
Mar 17, 2010, 11:41:11 AM3/17/10
to rack-...@googlegroups.com
On Sun, Mar 7, 2010 at 7:12 PM, Eric Wong <normal...@yhbt.net> wrote:
> I guess this deals with wonky multipart uploads that browsers still
> generate these days[1].  Ugh, yeah, it's nasty...
> The comment below in _call is very important.
>
>
> 1.  Ensure all tempfiles created by Rack go into an array in env,
>    probably env["rack.tempfiles"]:
>
>    tempfile = Tempfile.new("foo")
>    (env["rack.tempfiles"] ||= []) << tempfile

I'm with you so far...

> 2.  Have a middleware wrap everything, including the response body:

I'm a bit new to the middleware thing. Does this mean everyone would
have to configure middleware on their setups to get this
tempfile-closing behavior? And your comment below...does that mean
there are situations where this wouldn't work?

Closing and removing tempfiles when they're no longer needed should be
the default behavior, not something you have to configure. It's a bug
to not close and remove them (or at least, a bug to keep creating new
ones and expecting GC to clean everything up eventually).

>    class TempfileReaper < Struct.new(:app, :env, :body)
>
>      # --------------- in config.ru -----------
>      # use TempfileReaper
>      # run MyApp.new
>      def initialize(app)
>        super(app)
>      end
>
>      # we wrap the entire response body later on, so dup it for
>      # reentrancy.  You can actually avoid the enforced dup as
>      # an optimization, probably...
>      def call(env)
>        dup._call(env)
>      end
>
>      # used when wrapping the response body
>      def each(&block)
>        body.each { |chunk| yield chunk }
>      end
>
>      # the Rack server should call this (when we're the body)
>      def close
>        tempfiles = env["rack.tempfiles"]
>        if tempfiles
>          tempfiles.each { |tmp| tmp.close! rescue nil }
>        end
>      end

By "the Rack server should call this" do you mean that if the server
doesn't call this, tempfiles Rack creates will not be cleaned up until
GC runs?

Shouldn't Rack itself be guaranteeing it doesn't leave garbage files around?

>      # wrap the normal application call, saving env
>      def _call(env)
>        self.env = env
>
>        # XXX VERY IMPORTANT:
>        # you need to ensure env stays the same throughout the request,
>        # some middlewares overwrite/replace it instead of merge!-ing into it
>        status, headers, body = app.call(env)
>        self.body = body
>        [ status, headers, self ]
>      end
>    end

Same question as above...can badly-written middleware now cause
tempfiles to linger?

- Charlie

Eric Wong

unread,
Mar 18, 2010, 5:54:09 AM3/18/10
to rack-...@googlegroups.com
Charles Oliver Nutter <hea...@headius.com> wrote:
> On Sun, Mar 7, 2010 at 7:12 PM, Eric Wong <normal...@yhbt.net> wrote:
> > I guess this deals with wonky multipart uploads that browsers still
> > generate these days[1]. �Ugh, yeah, it's nasty...
> > The comment below in _call is very important.
> >
> >
> > 1. �Ensure all tempfiles created by Rack go into an array in env,
> > � �probably env["rack.tempfiles"]:
> >
> > � �tempfile = Tempfile.new("foo")
> > � �(env["rack.tempfiles"] ||= []) << tempfile
>
> I'm with you so far...
>
> > 2. �Have a middleware wrap everything, including the response body:
>
> I'm a bit new to the middleware thing. Does this mean everyone would
> have to configure middleware on their setups to get this
> tempfile-closing behavior? And your comment below...does that mean
> there are situations where this wouldn't work?
>
> Closing and removing tempfiles when they're no longer needed should be
> the default behavior, not something you have to configure. It's a bug
> to not close and remove them (or at least, a bug to keep creating new
> ones and expecting GC to clean everything up eventually).

I would expect major frameworks to stick this into the default
middleware stack as a convenience, but some users want a more bare-bones
Rack stack can still opt out. I see it as needless overhead as the vast
majority of HTTP requests do not create tempfiles.

I don't make decisions for Rack, though.

<snip>

> > � � �# the Rack server should call this (when we're the body)


> > � � �def close
> > � � � �tempfiles = env["rack.tempfiles"]
> > � � � �if tempfiles
> > � � � � �tempfiles.each { |tmp| tmp.close! rescue nil }
> > � � � �end
> > � � �end
>
> By "the Rack server should call this" do you mean that if the server
> doesn't call this, tempfiles Rack creates will not be cleaned up until
> GC runs?

Yes, a Rack-compliant server should call body.close if it responds to
body.close at the end of the response cycle for every individual
request.

I put this in the wrapped body because body.each {} could be relying on
the tempfiles to generate the response. body.close is the absolute last
action for any Rack application.

> Shouldn't Rack itself be guaranteeing it doesn't leave garbage files around?
>
> > � � �# wrap the normal application call, saving env
> > � � �def _call(env)
> > � � � �self.env = env
> >
> > � � � �# XXX VERY IMPORTANT:
> > � � � �# you need to ensure env stays the same throughout the request,
> > � � � �# some middlewares overwrite/replace it instead of merge!-ing into it
> > � � � �status, headers, body = app.call(env)
> > � � � �self.body = body
> > � � � �[ status, headers, self ]
> > � � �end
> > � �end
>
> Same question as above...can badly-written middleware now cause
> tempfiles to linger?

Yes, it might be safer to do this before app.call above:

� env["rack.tempfiles"] ||= []

That way if env gets replaced down the stack, e.g. via:

app.call(env.merge("foo.hello" => "world"))
# env.merge! would be correct above

any use of env["rack.tempfiles"] will still point to the same array.
Well, almost...

Then again some bad middleware could do:
� env["rack.tempfiles"] += [ tmp_a, tmp_b ]

Instead of what they _should_ do:
� env["rack.tempfiles"].concat([ tmp_a, tmp_b ])

So yes, discourage future middleware authors from replacing
the rack.tempfiles array.

--
Eric Wong

Wojtek Kruszewski

unread,
Jan 6, 2014, 5:45:41 PM1/6/14
to rack-...@googlegroups.com
I see this is still an issue. Is there a recommended way to have multipart temp files cleaned after request?


On Thursday, March 18, 2010 10:54:09 AM UTC+1, Eric Wong wrote:
Charles Oliver Nutter <hea...@headius.com> wrote:
> On Sun, Mar 7, 2010 at 7:12 PM, Eric Wong <normal...@yhbt.net> wrote:
> > I guess this deals with wonky multipart uploads that browsers still
> > generate these days[1]. �Ugh, yeah, it's nasty...

> > The comment below in _call is very important.
> >
> >
> > 1. �Ensure all tempfiles created by Rack go into an array in env,
> > � �probably env["rack.tempfiles"]:
> >
> > � �tempfile = Tempfile.new("foo")
> > � �(env["rack.tempfiles"] ||= []) << tempfile

>
> I'm with you so far...
>
> > 2. �Have a middleware wrap everything, including the response body:

>
> I'm a bit new to the middleware thing. Does this mean everyone would
> have to configure middleware on their setups to get this
> tempfile-closing behavior? And your comment below...does that mean
> there are situations where this wouldn't work?
>
> Closing and removing tempfiles when they're no longer needed should be
> the default behavior, not something you have to configure. It's a bug
> to not close and remove them (or at least, a bug to keep creating new
> ones and expecting GC to clean everything up eventually).

I would expect major frameworks to stick this into the default
middleware stack as a convenience, but some users want a more bare-bones
Rack stack can still opt out.  I see it as needless overhead as the vast
majority of HTTP requests do not create tempfiles.

I don't make decisions for Rack, though.

<snip>

> > � � �# the Rack server should call this (when we're the body)
> > � � �def close
> > � � � �tempfiles = env["rack.tempfiles"]
> > � � � �if tempfiles
> > � � � � �tempfiles.each { |tmp| tmp.close! rescue nil }
> > � � � �end
> > � � �end


>
> By "the Rack server should call this" do you mean that if the server
> doesn't call this, tempfiles Rack creates will not be cleaned up until
> GC runs?

Yes, a Rack-compliant server should call body.close if it responds to
body.close at the end of the response cycle for every individual
request.

I put this in the wrapped body because body.each {} could be relying on
the tempfiles to generate the response.  body.close is the absolute last
action for any Rack application.

> Shouldn't Rack itself be guaranteeing it doesn't leave garbage files around?
>

> > � � �# wrap the normal application call, saving env
> > � � �def _call(env)
> > � � � �self.env = env
> >
> > � � � �# XXX VERY IMPORTANT:
> > � � � �# you need to ensure env stays the same throughout the request,
> > � � � �# some middlewares overwrite/replace it instead of merge!-ing into it
> > � � � �status, headers, body = app.call(env)
> > � � � �self.body = body
> > � � � �[ status, headers, self ]
> > � � �end
> > � �end


>
> Same question as above...can badly-written middleware now cause
> tempfiles to linger?

Yes, it might be safer to do this before app.call above:

�  env["rack.tempfiles"] ||= []

That way if env gets replaced down the stack, e.g. via:

   app.call(env.merge("foo.hello" => "world"))
   # env.merge! would be correct above

any use of env["rack.tempfiles"] will still point to the same array.
Well, almost...

Then again some bad middleware could do:

�  env["rack.tempfiles"] += [ tmp_a, tmp_b ]

Instead of what they _should_ do:

�  env["rack.tempfiles"].concat([ tmp_a, tmp_b ])

Eric Wong

unread,
Feb 11, 2014, 4:05:43 PM2/11/14
to rack-...@googlegroups.com
Wojtek Kruszewski <woj...@oxos.pl> wrote:
> I see this is still an issue. Is there a recommended way to have multipart
> temp files cleaned after request?

Not that I know of, unfortunately (maybe there is, and I was hoping
somebody from rack-core would answer, but that hasn't happened).

Lenny Marks

unread,
Mar 27, 2014, 5:40:35 PM3/27/14
to rack-...@googlegroups.com
I've submitted a pull request to address this: https://github.com/rack/rack/pull/671
Reply all
Reply to author
Forward
0 new messages