[Django] #26562: Introduce Storage and FileSystemStorage alternate save behaviour

10 views
Skip to first unread message

Django

unread,
Apr 30, 2016, 4:26:47 AM4/30/16
to django-...@googlegroups.com
#26562: Introduce Storage and FileSystemStorage alternate save behaviour
-------------------------------------+-------------------------------------
Reporter: freshquiz | Owner: nobody
Type: New feature | Status: new
Component: File | Version: master
uploads/storage | Keywords: file FileField storage
Severity: Normal | overwrite filepath
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
I have read through many discussions about justifying why uploaded files
should not be deleted or overwritten, but I don't agree with them at all
(I can get into that if needed) and I think the current implementation of
`django.core.files.storage.Storage.get_available_name()` and
`django.core.files.storage.FileSystemStorage._save()` are quite
undesirable.

The behaviour I am referring to is the fact that the
`Storage.get_available_name()` method is so crucial to the whole storage
and `FileField` game and yet its implementation is susceptible to race
conditions and that it is the root cause of a dev not being granted total
control of the '''final''' file/storage path when saving a file using the
storage API.

Flow on from that, the fact that the `FileSystemStorage._save()` method
infinitely loops until a file path collision does not occur (to open a
file) and then uses low-level locking mechanisms to try and retain
ownership of the file for writing.

If the core devs are keen on retaining this behaviour, so be it.
However, can we please build upon Django's reputation of configurability
and allow for an alternative (and non-default, to appease backwards
compatibility) behaviour where `Storage.get_available_name()` effectively
does nothing (except maybe length truncation) and
`FileSystemStorage._save()` does not have the infinite loop and does not
have the low-level file lock.

My proposals for configurability are:
* Global setting in the settings file
* Static attribute of the `Storage` class, which can be overridden

--
Ticket URL: <https://code.djangoproject.com/ticket/26562>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Apr 30, 2016, 8:22:04 AM4/30/16
to django-...@googlegroups.com
#26562: Introduce Storage and FileSystemStorage alternate save behaviour
-------------------------------------+-------------------------------------
Reporter: freshquiz | Owner: nobody
Type: New feature | Status: new
Component: File | Version: master
uploads/storage |
Severity: Normal | Resolution:
Keywords: file FileField | Triage Stage: Accepted
storage overwrite filepath |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by aaugustin):

* needs_docs: => 0
* needs_better_patch: => 0
* needs_tests: => 0
* stage: Unreviewed => Accepted


Comment:

I don't see why a new configurability mechanism would be needed. There's
already [https://docs.djangoproject.com/en/1.9/ref/settings/#default-file-
storage DEFAULT_FILE_STORAGE] for this purpose.

Even if it's entirely doable to subclass `FileSystemStorage` to implement
this behavior and configure it in DEFAULT_FILE_STORAGE, I think Django
could provide such a storage backend natively e.g. as
`django.core.files.storage.OverwritingFileSystemStorage`. (Better names
welcome.)

`OverwritingFileSystemStorage` would likely be a superclass of
`FileSystemStorage` because it has fewer features.

I'm accepting the ticket but I'd like another core dev to confirm before
we move forwards with a patch.

--
Ticket URL: <https://code.djangoproject.com/ticket/26562#comment:1>

Django

unread,
Apr 30, 2016, 8:36:36 AM4/30/16
to django-...@googlegroups.com
#26562: Introduce Storage and FileSystemStorage alternate save behaviour
-------------------------------------+-------------------------------------
Reporter: freshquiz | Owner: nobody
Type: New feature | Status: new
Component: File | Version: master
uploads/storage |
Severity: Normal | Resolution:
Keywords: file FileField | Triage Stage: Accepted
storage overwrite filepath |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by claudep):

+1 from me.

--
Ticket URL: <https://code.djangoproject.com/ticket/26562#comment:2>

Django

unread,
Apr 30, 2016, 10:30:25 AM4/30/16
to django-...@googlegroups.com
#26562: Introduce Storage and FileSystemStorage alternate save behaviour
-------------------------------------+-------------------------------------
Reporter: freshquiz | Owner: nobody
Type: New feature | Status: new
Component: File | Version: master
uploads/storage |
Severity: Normal | Resolution:
Keywords: file FileField | Triage Stage: Accepted
storage overwrite filepath |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by freshquiz):

A "native" `OverwritingFileSystemStorage` class would be all well and good
for file-system storage backends, but then what about other kinds of
backends (e.g. key-value cloud stores like Amazon S3)?

Half of the problem (as far as I see it), would still exist in the
`Storage.get_available_name()` method.

Please refer to #26058 for another issue which relates to my concerns
about coupling `Storage` with `FileSystemStorage`.

--
Ticket URL: <https://code.djangoproject.com/ticket/26562#comment:3>

Django

unread,
Apr 30, 2016, 1:31:20 PM4/30/16
to django-...@googlegroups.com
#26562: Introduce Storage and FileSystemStorage alternate save behaviour
-------------------------------------+-------------------------------------
Reporter: freshquiz | Owner: nobody
Type: New feature | Status: new
Component: File | Version: master
uploads/storage |
Severity: Normal | Resolution:
Keywords: file FileField | Triage Stage: Accepted
storage overwrite filepath |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by aaugustin):

Ah, indeed, my proposal wouldn't solve the "change the behavior of
Storage.get_available_name()" part of the problem.

Perhaps we can make a backwards-incompatible change there and move the
"try to find an available name" logic in a mixin? This would make it
reasonably easy for projets that extend `Storage` and depend on that logic
to keep it.

--
Ticket URL: <https://code.djangoproject.com/ticket/26562#comment:4>

Django

unread,
Apr 30, 2016, 5:06:24 PM4/30/16
to django-...@googlegroups.com
#26562: Introduce Storage and FileSystemStorage alternate save behaviour
-------------------------------------+-------------------------------------
Reporter: freshquiz | Owner: nobody
Type: New feature | Status: new
Component: File | Version: master
uploads/storage |
Severity: Normal | Resolution:
Keywords: file FileField | Triage Stage: Accepted
storage overwrite filepath |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by timgraham):

Maybe I'm misunderstanding but I don't think changing the default to
overwriting files of the same name is acceptable. Allowing untrusted users
(or even trusted users, accidentally) to clobber existing files is
dangerous.

This isn't a concern if you're using a callable `FileField.upload_to` and
generating a file name instead of using the one provided by the user, but
we can't assume that.

The use case needs a bit more explanation for me to buy into the idea of
including another `Storage` class in Django.

--
Ticket URL: <https://code.djangoproject.com/ticket/26562#comment:5>

Django

unread,
Apr 30, 2016, 9:49:59 PM4/30/16
to django-...@googlegroups.com
#26562: Introduce Storage and FileSystemStorage alternate save behaviour
-------------------------------------+-------------------------------------
Reporter: freshquiz | Owner: nobody
Type: New feature | Status: new
Component: File | Version: master
uploads/storage |
Severity: Normal | Resolution:
Keywords: file FileField | Triage Stage: Accepted
storage overwrite filepath |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by freshquiz):

Tim, are you referring to changing the default behaviour of `Storage`?
If so, I never said to make this alternative behaviour default. I
suggested the current behaviour be kept default, but just to have this
alternative behaviour as a configurable option.

If you are saying this behaviour shouldn't exist in Django at all, I would
like justification as to why, please.

Your argument about allowing users to clobber existing files, it doesn't
make sense to me.

If you believe you need to be that protective, why does Django offer a
`delete()` method on models?
Why aren't you overly protective of existing database records?
The answer is, because it makes no sense to natively not allow the ability
for database records to be deleted.
Why should countless Django client devs have to duplicate the behaviour
themselves, just because you think they are too careless to be able to
prevent misuse?

That might be a bit of an extreme analogy, but the ability to make raw
queries in Django, is less extreme, but still a valid analogy.
Yes it has the potential to be more risky than using model queries in the
traditional way, but you still enable the clients to be able to utilise it
if they need it (albeit with a lot of warnings).
Why not the same here?

I understand your (and other core devs') reservations, but I don't agree
that they are justification for not allowing the behaviour to exist
natively, at all.

--
Ticket URL: <https://code.djangoproject.com/ticket/26562#comment:6>

Django

unread,
May 1, 2016, 6:42:54 AM5/1/16
to django-...@googlegroups.com
#26562: Introduce Storage and FileSystemStorage alternate save behaviour
-------------------------------------+-------------------------------------
Reporter: freshquiz | Owner: nobody
Type: New feature | Status: new
Component: File | Version: master
uploads/storage |
Severity: Normal | Resolution:
Keywords: file FileField | Triage Stage: Accepted
storage overwrite filepath |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by aaugustin):

> I suggested the current behaviour be kept default, but just to have this
alternative behaviour as a configurable option.

I don't like the idea of piling up configuration mechanisms. There's
already a setting for selecting the storage backend.

--
Ticket URL: <https://code.djangoproject.com/ticket/26562#comment:7>

Django

unread,
May 1, 2016, 7:55:51 AM5/1/16
to django-...@googlegroups.com
#26562: Introduce Storage and FileSystemStorage alternate save behaviour
-------------------------------------+-------------------------------------
Reporter: freshquiz | Owner: nobody
Type: New feature | Status: new
Component: File | Version: master
uploads/storage |
Severity: Normal | Resolution:
Keywords: file FileField | Triage Stage: Accepted
storage overwrite filepath |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by freshquiz):

I don't like the idea of piling up configuration mechanisms.

Fair enough.

There's already a setting for selecting the storage backend.

Yes, I am currently using this to implement the desired behaviour using
subclasses, but there are two problems with putting the responsibility on
the client:
* Code has to be duplicated from the `FileSystemStorage._save()` method,
in order to override it with the desired behaviour.
* These overriding subclasses will be duplicated by every other client who
wants the desired behaviour.

How do you propose introducing the behaviour natively as a non-default
option, without breaking backwards compability and without a configuration
option?

--
Ticket URL: <https://code.djangoproject.com/ticket/26562#comment:8>

Django

unread,
May 1, 2016, 8:03:01 AM5/1/16
to django-...@googlegroups.com
#26562: Introduce Storage and FileSystemStorage alternate save behaviour
-------------------------------------+-------------------------------------
Reporter: freshquiz | Owner: nobody
Type: New feature | Status: new
Component: File | Version: master
uploads/storage |
Severity: Normal | Resolution:
Keywords: file FileField | Triage Stage: Accepted
storage overwrite filepath |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by freshquiz):

Personally, I see the storage API as too coupled to traditional file
systems and (along with the `FileField` implementation) as too protective
in an undesirable and unpreditable way.

#26058 is step in the right direction (along with this ticket) in my
opinion, but I can see how the storage API and `FileField` have backed you
into a backwards compatibility corner.

I only opened this ticket for the sake of all current and future Django
clients, not because I can't achieve the desired behaviour myself.

--
Ticket URL: <https://code.djangoproject.com/ticket/26562#comment:9>

Django

unread,
May 1, 2016, 11:01:06 AM5/1/16
to django-...@googlegroups.com
#26562: Introduce Storage and FileSystemStorage alternate save behaviour
-------------------------------------+-------------------------------------
Reporter: freshquiz | Owner: nobody
Type: New feature | Status: new
Component: File | Version: master
uploads/storage |
Severity: Normal | Resolution:
Keywords: file FileField | Triage Stage: Accepted
storage overwrite filepath |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by timgraham):

The concern about clobbering files is this: if user A uploads a file named
"foo.txt" for model instance A, then user B could upload a file of the
same name for model instance B and it would overwrite the first file.''

Again, I'd ask if you could please elaborate on your use case so we can
better understand exactly what problem needs to be solved.

--
Ticket URL: <https://code.djangoproject.com/ticket/26562#comment:10>

Django

unread,
May 1, 2016, 3:35:22 PM5/1/16
to django-...@googlegroups.com
#26562: Introduce Storage and FileSystemStorage alternate save behaviour
-------------------------------------+-------------------------------------
Reporter: freshquiz | Owner: nobody
Type: New feature | Status: new
Component: File | Version: master
uploads/storage |
Severity: Normal | Resolution:
Keywords: file FileField | Triage Stage: Accepted
storage overwrite filepath |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by aaugustin):

Replying to [comment:8 freshquiz]:


> Yes, I am currently using this to implement the desired behaviour using
subclasses, but there are two problems with putting the responsibility on
the client:

We're debating different ideas here, let's not mix them.

I didn't suggest to let each project implement its own storage backend. I
originally accepted the ticket and I said it would be useful for Django to
provide the behavior you're describing natively.

I also said that this behavior should be selected with the current
configuration mechanism, the DEFAULT_FILE_STORAGE setting, not with an
additional setting.

In my opinion, it would be a net negative for Django to add an ad-hoc
configuration mechanism just for this feature. Irregular configuration
options are hard to discover and easy to misconfigure. I'd rather not have
the feature requested in this ticket than have it controlled with a new
setting. That's my opinion; of course others may disagree :-)

> How do you propose introducing the behaviour natively as a non-default
option, without breaking backwards compability and without a configuration
option?

There's a variety of ways to do this, the easiest being to add a
`BaseStorage` or `SimpleStorage` or `OverwritingStorage` or
`UnsafeStorage` class as a superclass of the current `Storage` class,
which would keep its behavior. I'm happy to leave that part of the design
to the person who writes the patch.

--
Ticket URL: <https://code.djangoproject.com/ticket/26562#comment:11>

Django

unread,
May 1, 2016, 7:08:47 PM5/1/16
to django-...@googlegroups.com
#26562: Introduce Storage and FileSystemStorage alternate save behaviour
-------------------------------------+-------------------------------------
Reporter: freshquiz | Owner: nobody
Type: New feature | Status: new
Component: File | Version: master
uploads/storage |
Severity: Normal | Resolution:
Keywords: file FileField | Triage Stage: Accepted
storage overwrite filepath |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by freshquiz):

Replying to [comment:10 timgraham]:


> The concern about clobbering files is this: if user A uploads a file
named "foo.txt" for model instance A, then user B could upload a file of
the same name for model instance B and it would overwrite the first
file.''

I don't understand why you see that as too dangerous/hard for the client
to be able to handle.
Again, I ask why Django provides `Model.delete()` and `raw()` if you think
the clients are so careless?

> Again, I'd ask if you could please elaborate on your use case so we can
better understand exactly what problem needs to be solved.

I'm sorry, I missed the part about you asking for my use case.

Yes, I am using a callable `upload_to`, which does allow me to partially
control the storage path/key, but the current behaviour of mangling the
final path/key (with random characters) prevents me from having total
control.

I use a combination of model name, field name and UUID to generate unique
storage paths/keys, via `upload_to`.
My use case goal is to ensure the following tuple does not result in
clobbering/overwriting:
`(model_name, field_name, UUID)`, where the UUID part is just a substitute
for a primary key.

I don't take into account file name for uniqueness, but that's just
because of my use case.
E.g. If there is a model instance with model '''ModelABC''' that has
`FileField` named '''xyz''' and gets a UUID of '''123''', regardless of
how many files are uploaded and what their end-user specified names are,
there will only ever be one file present in the storage system.

Please let me know if you need more info about my use case.

Maybe this use case is too esoteric to warrant a "native" new feature.
I just thought it would benefit others, given how many discussions and
clunky workarounds I came across while trying to cook up a neat
workaround/solution for myself.

--
Ticket URL: <https://code.djangoproject.com/ticket/26562#comment:12>

Django

unread,
May 1, 2016, 7:36:12 PM5/1/16
to django-...@googlegroups.com
#26562: Introduce Storage and FileSystemStorage alternate save behaviour
-------------------------------------+-------------------------------------
Reporter: freshquiz | Owner: nobody
Type: New feature | Status: new
Component: File | Version: master
uploads/storage |
Severity: Normal | Resolution:
Keywords: file FileField | Triage Stage: Accepted
storage overwrite filepath |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by timgraham):

My concern about file clobbering was related to Aymeric suggesting making
a backwards-incompatible change in comment:4 (again, maybe I mis
understood the ramifications of his proposal).

Anyway, I don't mind looking a patch that might make the use case easier.
In the interim, it looks like [https://github.com/un1t/django-cleanup
django-cleanup] might fit your needs.

--
Ticket URL: <https://code.djangoproject.com/ticket/26562#comment:13>

Django

unread,
May 1, 2016, 8:17:14 PM5/1/16
to django-...@googlegroups.com
#26562: Introduce Storage and FileSystemStorage alternate save behaviour
-------------------------------------+-------------------------------------
Reporter: freshquiz | Owner: nobody
Type: New feature | Status: new
Component: File | Version: master
uploads/storage |
Severity: Normal | Resolution:
Keywords: file FileField | Triage Stage: Accepted
storage overwrite filepath |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by freshquiz):

Replying to [comment:13 timgraham]:


> My concern about file clobbering was related to Aymeric suggesting
making a backwards-incompatible change in comment:4 (again, maybe I mis
understood the ramifications of his proposal).

I see...

> Anyway, I don't mind looking a patch that might make the use case
easier. In the interim, it looks like [https://github.com/un1t/django-

cleanup django-cleanup] might fit your needs.

That doesn't fit my needs unfortunately, because my use case warrants the
preservation of the original end-user specified file name, i.e. the
current file name mangling (to avoid overwrite) is undesirable.
Sorry, I forgot to mention this directly, even though I repeatedly said
that the current implementation does not ensure the client-dev has total
control over the final path/key.

So I guess for me it's two problems:
* Not having total control over the storage path/key
* Not being able to overwrite files
...where the latter is caused by the former.

I have a workaround (albeit I'm still finishing it off), which will also
take into account the fact that there is no `post_save()` method on model
`Field` sub-classes (meaning that files get saved into storage potentially
before a database save is about to fail; another undesirable trait of the
current implementation, to me), but it does not take into consideration
backwards compatibility (otherwise I would have submitted a patch as well
as the ticket).

--
Ticket URL: <https://code.djangoproject.com/ticket/26562#comment:14>

Reply all
Reply to author
Forward
0 new messages