FileFields and file ownership

135 views
Skip to first unread message

Ole Laursen

unread,
Aug 11, 2009, 7:39:20 AM8/11/09
to Django developers
Hi!

There are a couple of bugs open/closed about what happens when you
upload a new file to a file field that already has a file:

http://code.djangoproject.com/ticket/11663
http://code.djangoproject.com/ticket/2983
http://code.djangoproject.com/ticket/4339

Progress is currently halted because a design decision is needed,
maybe the problem is conflicting visions of what FileField is.

First: what happens right now is that the old file is left behind. If
the new file has the same name as the old, it is mangled so both can
stay.

As has been pointed out in 2983, this is all else set aside a security
problem because the old file is essentially garbage that when left
behind makes you vulnerable to someone filling up the disk (say on
shared hosting with few resources) by uploading the same file over and
over. So even if you check file sizes, you're not safe.

Here are two ideas of what FileField is:

1) a convenient file pointer for facilitating the upload machinery
2) a field for storing a file, just like storing it directly in the
database except we put the data in the file system

I think Django is currently the first. It won't let you overwrite
files (insists on mangling), it doesn't clean up the garbage, it does
a sort of reference counting so when the object is deleted, it first
checks if other objects with the same field is pointing to the file
before whacking it.

The implication of the second idea is a one-to-one mapping between
fields and files that Django will do everything it can to maintain. I
think the difference amounts to: always delete old files, don't go
through the whole table upon delete since if you messed with the
pointers, you pay the price (consider a table unique constraint
instead), and perhaps be a bit more careful so it's possible to
reupload a file with the same name to the same object without hitting
the name mangling code.

The documentation just says it's "A file-upload field." I think a file-
upload field would be better served by the second idea because you
don't have to think about ownership - Django's got it. What do you
think?


Ole

Malcolm Tredinnick

unread,
Aug 11, 2009, 11:08:43 PM8/11/09
to django-d...@googlegroups.com
On Tue, 2009-08-11 at 04:39 -0700, Ole Laursen wrote:
> Hi!
>
> There are a couple of bugs open/closed about what happens when you
> upload a new file to a file field that already has a file:
>
> http://code.djangoproject.com/ticket/11663
> http://code.djangoproject.com/ticket/2983
> http://code.djangoproject.com/ticket/4339
>
> Progress is currently halted because a design decision is needed,
> maybe the problem is conflicting visions of what FileField is.
>
> First: what happens right now is that the old file is left behind. If
> the new file has the same name as the old, it is mangled so both can
> stay.
>
> As has been pointed out in 2983, this is all else set aside a security
> problem because the old file is essentially garbage

This is an assumption that isn't universally correct and it's
implications appear to be vastly under-appreciated. One of the reasons
we *don't* delete files automatically is that you cannot you know for
certain, given no other information, that no other process on the system
or elsewhere is still needing that file. You are implicitly assuming
that the file upload through Django means that Django instance is the
only thing using the file. Other Django installs could be using. Other
processes could be. It's impossible to tell.

So knowing when it is safe to delete files requires domain-specific
knowledge.

We need good ideas on addressing the denial of service via disk
exhaustion attack. But, for example, there are patches floating around
(SmileyChris created a new version recently) to allow a FileField to
optionally delete files after saving a replacement. I'm not sure if
Chris's patch saves and then deletes to handle errors correctly right
now, because I haven't reviewed it in that level of depth yet. That
would be something you might enable if you were allowing public file
uploads.

Or you might have a cronjob that periodically looks for and cleans
things up (an easy enough approach today) and the frequency with which
it runs -- or how it is triggered externally -- depends upon your
available resources.

Realise that uploading with the same filename isn't the only way to use
up disk space. The filename is a pretty random string to use as a save
thing and it takes about 35 seconds to write a script to generate
hundreds of thousands of filenames for the same file if different
filenames are the only impediment to DOS'ing a system.

Regards,
Malcolm

Jonas Pfeil

unread,
Aug 12, 2009, 10:32:10 AM8/12/09
to Django developers
Hi,

I just wanted to mention there still is ticket #7048 "Support clearing
FileFields with ModelForms" [1], which also deletes files, if
configured to do so. I don't think this is particularly evil, if off
by default (which it is not in the current patch, but that's easy to
change).

This was originally scheduled and written for 1.0 maybe we can at
least bring the FileField-clearing functionality into 1.2 ;)

Regards,

Jonas

[1] http://code.djangoproject.com/ticket/7048

On Aug 12, 5:08 am, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:

Ole Laursen

unread,
Aug 12, 2009, 5:23:51 PM8/12/09
to Django developers
On Aug 12, 5:08 am, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:
> This is an assumption that isn't universally correct and it's
> implications appear to be vastly under-appreciated. One of the reasons
> we *don't* delete files automatically is that you cannot you know for
> certain, given no other information, that no other process on the system
> or elsewhere is still needing that file. You are implicitly assuming
> that the file upload through Django means that Django instance is the
> only thing using the file. Other Django installs could be using. Other
> processes could be. It's impossible to tell.
>
> So knowing when it is safe to delete files requires domain-specific
> knowledge.

Yes, well, exactly. I agree. :)

What I suggest is that we define this, instead of making implicit
assumptions. Either to be some kind of elaboration of the current
approach in Django, or simply that FileField is owning the file.

If FileField owns the file, then the semantics are easy to document
(and implement). If something else is using the file, you'll know you
have to do something about it. In my experience, that does not happen
often, but maybe Django could help a little with some configurability
just in case.

In the end, doesn't it boil down to what's the reasonable semantics
for a field for storing uploaded files? Am I missing some important
use case?

> Realise that uploading with the same filename isn't the only way to use
> up disk space. The filename is a pretty random string to use as a save
> thing and it takes about 35 seconds to write a script to generate
> hundreds of thousands of filenames for the same file if different
> filenames are the only impediment to DOS'ing a system.

Sure, it's not specific to the naming. The use case I am thinking of
is a simple user profile where you can attach a photo of yourself. If
you code that with Django and ImageField today, you've got a disk
leak, which seems counter-intuitive when you read that it's a file-
upload field.

I must admit that I was also very surprised to see yesterday that
deleting an object with a FileField causes a full table scan.


Ole

PS: Oh, by the way I really enjoy reading your blog from time to
time. :) It makes me wish I was a better writer.

Chris Beaven

unread,
Aug 12, 2009, 7:35:39 PM8/12/09
to Django developers
On Aug 12, 3:08 pm, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:
> One of the reasons we *don't* delete files automatically [...snip]

Just a correction: Django currently *does* delete files automatically
when a model instance is deleted.

Ole Laursen

unread,
Aug 26, 2009, 7:27:10 AM8/26/09
to Django developers
On Aug 11, 1:39 pm, Ole Laursen <o...@iola.dk> wrote:
>   1) a convenient file pointer for facilitating the upload machinery
>   2) a field for storing a file, just like storing it directly in the
> database except we put the data in the file system

No conclusion?

Here are some options

1. Apply the patch in 11663 more or less as is. Pros: no API breakage.
Cons: all FileField objects are leaking unless you do something.
Deleting an object with a FileField causes full table scan (unless you
add an index yourself).

2. Apply the patch, but turn on orphan deletion by default. Pros:
FileField is leak-free out of the box. Cons: API break if you were
relying on the garbage, deleting still causes full table scan.

3. FileField owns the file it points to. Pros: no leak, simple code,
no table scans. Cons: API break if you were relying on two objects
pointing to the same file.

4. As 3, but add option to be more in line with current behaviour. So
API break, but it's easy to fix.


Ole

Léon Dignòn

unread,
Aug 29, 2009, 3:45:27 AM8/29/09
to Django developers
I would love to see this feature in Django as soon as possible :)
something like: avatar = ImageField(upload_to='avatars', blank=True,
replace_on_upload=True)

Also a Widget which adds a Checkbox to a ModelForm to clear the
ImageField, like RemovableFileField in ticket #7048, would be great!
> > Malcolm- Hide quoted text -
>
> - Show quoted text -

anentropic

unread,
Oct 21, 2009, 7:49:37 AM10/21/09
to Django developers
Just dropping by to add my support for adding this functionality. At
the moment it doesn't seem consistent that you automatically delete
the file when you delete a model (...what if something else was using
it?!) but don't do the same when uploading a new file to the same
field. It should be configurable.

The name collision-avoidance should only happen if there is a file of
the same name that doesn't belong to the currently edited model.
Reply all
Reply to author
Forward
0 new messages