Bug or feature: pre_save used on FileField/ImageField doesn't work as in Django 1.0.2

629 views
Skip to first unread message

Lior Gradstein

unread,
Apr 2, 2009, 8:59:14 AM4/2/09
to Django developers
I noticed an important change in behaviour between Django 1.0.2 and
Django 1.1+ (svn and beta versions).

If I set a signal that intercepts as pre_save and try to access a
FileField/ImageField's path, I don't get the same result. It seems the
'upload_to' attribute is not taken into account!
It seems to be set much later, compared to version 1.0.2. Is this a
feature or a bug?

Here's a very simple example to show the problem:

* settings.py (relevant parts only):

MEDIA_ROOT = '/tmp/static'
MEDIA_URL = '/static/'

* models.py:

from django.db import models
from django.db.models import signals

class MyModel(models.Model):
myfile = models.FileField(upload_to="uploads")


def intercept(sender, instance, signal, *args, **kwargs):
print instance.myfile
print instance.myfile.path
print instance.myfile.url

signals.pre_save.connect(intercept, sender=MyModel)

- In Django 1.0.2-final:

uploads/a__.flv
/tmp/static/uploads/a__.flv
/static/uploads/a__.flv

- In Django 1.1+ (1.1beta and latest SVN version 1.1 beta 1
SVN-10367):

a.flv
/tmp/static/a.flv
/static/a.flv

Which is really wrong.

Thanks,
Lior

Karen Tracey

unread,
Apr 2, 2009, 11:45:32 AM4/2/09
to django-d...@googlegroups.com
On Thu, Apr 2, 2009 at 8:59 AM, Lior Gradstein <lior.gr...@gmail.com> wrote:

I noticed an important change in behaviour between Django 1.0.2 and
Django 1.1+ (svn and beta versions).

If I set a signal that intercepts as pre_save and try to access a
FileField/ImageField's path, I don't get the same result. It seems the
'upload_to' attribute is not taken into account!
It seems to be set much later, compared to version 1.0.2. Is this a
feature or a bug?

I think it's a bug, and I'm pretty sure it was introduced by r9766.  The setting of the actual file name that will be used (which pulls in the upload_to path and possibly tacks on some underscores if the uploaded name conflicts with already-present file(s) in the upload directory) is done by django.db.models.fields.files.FieldFile save().  This used to be called fairly early on from the django.db.models.fields.files.FileField save_form_data() method but r9766 moved that to much later, namely a newly-introduced pre_save() routine in FileField.  Since (from a brief look at django.db.models.base.Model save_base()) the field pre_save routine is called after the pre_save signal is sent, the real file name has now not yet been set when the pre_save signal is sent.

There are a couple of still-open tickets for other side-effects of r9766 (#10249, #10300) but I don't believe either of the potential fixes for those would fix this issue, so this probably needs a ticket with the details you provided (an integrated testcase would be even better, though I don't know how much signal testing is done by the test suite).  Not sure what the right fix would be -- can just the setting of the name be pulled out of save() and done earlier by save_form_data()?  Possibly save() would have to also still ensure it's done, in cases where save_form_data() is not involved?  Can anyone with a better handle on r9766 comment on this (and #10249, #10300)?

Karen

Marty Alchin

unread,
Apr 2, 2009, 12:33:36 PM4/2/09
to django-d...@googlegroups.com
On Thu, Apr 2, 2009 at 11:45 AM, Karen Tracey <kmtr...@gmail.com> wrote:

> I think it's a bug, and I'm pretty sure it was introduced by r9766.  The
> setting of the actual file name that will be used (which pulls in the
> upload_to path and possibly tacks on some underscores if the uploaded name
> conflicts with already-present file(s) in the upload directory) is done by
> django.db.models.fields.files.FieldFile save().  This used to be called
> fairly early on from the django.db.models.fields.files.FileField
> save_form_data() method but r9766 moved that to much later, namely a
> newly-introduced pre_save() routine in FileField.  Since (from a brief look
> at django.db.models.base.Model save_base()) the field pre_save routine is
> called after the pre_save signal is sent, the real file name has now not yet
> been set when the pre_save signal is sent.

Ugh. Thanks, Karen, for looking into this and so many others that came
up after r9766. I've been fairly negligent on these, and it's starting
to cause some pretty big problems. r9766 may well mark the last time I
try to be clever. :(

> There are a couple of still-open tickets for other side-effects of r9766
> (#10249, #10300) but I don't believe either of the potential fixes for those
> would fix this issue, so this probably needs a ticket with the details you
> provided (an integrated testcase would be even better, though I don't know
> how much signal testing is done by the test suite).  Not sure what the right
> fix would be -- can just the setting of the name be pulled out of save() and
> done earlier by save_form_data()?  Possibly save() would have to also still
> ensure it's done, in cases where save_form_data() is not involved?  Can
> anyone with a better handle on r9766 comment on this (and #10249, #10300)?

For the record, #10044 (which was "fixed" by r9766) was entered as a
precursor to model validation. After some discussions with Honza, we
discovered that validating a model with a file would cause the file to
get saved to storage even if the model itself would never get stored
in the database. Our solution was to allow files to be assigned to
models outside the save() logic, and only store the file on disk when
the model actually hits the database.

Unfortunately, that proved problematic, because if you assigned an
UploadedFile to a model attribute, it didn't have the FieldFile
methods necessary to actually store it in the backend. I briefly
considered leaving that alone, and not relying on FieldFile when
finally saving the delayed file, but that created an inconsistent API.
After all, if you have a function that acts on models with files
attached, but you pass it a model with an *unsaved* file, it would
have a different set of methods than if the file had been saved.

Thus, r9766 includes the task of creating a new hybrid class whenever
a new file is assigned to a FileField attribute. This implementation
detail is the cause of #10249 and #10300 that you mentioned, but this
one stems from the simple fact that file saving is delayed as long as
possible. A related, thus-far-unreported (I think) issue comes up when
attempting to access width_field and height_field attributes on a
model prior to saving the new file.

I have a couple ideas to try out with regard to #10249 and #10300, so
feel free to hit me up on IRC if you'd like to discuss them, but I'm
not as certain how to approach this new one yet. There are certainly
band-aids that would allow the filename to be generated prior to
storing the file, but that wouldn't address the
width_field/height_field concern, nor any other potential future
issues that arise from delaying the file. I'd like to get to a point
where we're confident this sort of thing won't happen again, but if
the only option is to just keep putting out fires and they flare up,
it'll have to do.

As for addressing this one in particular, I think we'll have to add
something similar to the _committed flag, so we can identify whether
the filename has been processed or not. If it has, return it;
otherwise, generate it using upload_to when it's requested. The
trouble with just moving it up into save_form_data() is that if
someone assigns a file manually, save_form_data() won't get the chance
to generate a proper filename and we'll have a separate issue on our
hands. I hate having to add all these flags to determine what part of
the process we're in, but at a glance, that seems the most reasonable
approach.

-Gul

Karen Tracey

unread,
Apr 2, 2009, 1:22:20 PM4/2/09
to django-d...@googlegroups.com
I'm short on time right now so I just want to respond to one part of this that I just noticed and I'll come back later to digest the rest. Thanks for the background on the why to r9766. I knew it was model-validation related but didn't know what the specific issue was. It helps to have a clearer idea of the actual problem it's trying to solve.

Now the additional wrinkle I just noticed:

On Thu, Apr 2, 2009 at 12:33 PM, Marty Alchin <gulo...@gmail.com> wrote:
As for addressing this one in particular, I think we'll have to add
something similar to the _committed flag, so we can identify whether
the filename has been processed or not. If it has, return it;
otherwise, generate it using upload_to when it's requested. The
trouble with just moving it up into save_form_data() is that if
someone assigns a file manually, save_form_data() won't get the chance
to generate a proper filename and we'll have a separate issue on our
hands. I hate having to add all these flags to determine what part of
the process we're in, but at a glance, that seems the most reasonable
approach.

Upon further investigation, I think it's going to be pretty hard to disentangle setting the right file name from the actual save.  While the adding in of the upload_path part can be done whenever, the possible underscore additions are done, I now see, as part of save.  There's actually a loop in _save() that deals with multiple threads possibly picking the same name and colliding on save -- if it turns out the tentatively assigned name is already in use, get_available_name() is called again (adding at least one more underscore) to generate the next available name:

http://code.djangoproject.com/browser/django/trunk/django/core/files/storage.py#L150

Thus, as it is right now, the actual file name on disk isn't going to be really known for sure until the file is saved.  I need to head out the door in a few minutes and don't at the moment have any good idea for how to deal with this.  "Reserving" the name by actually creating the file on disk before save I think just puts things back to where they were pre-r9766: you'd wind up with possibly something (maybe just an empty file) saved on disk when in the end ultimately the model containing the file field was not saved, unless we do something clever to un-do the "reservation" at some (what?) point. Ugh. Maybe there's a better way I'm overlooking?

Finally one thought I'll throw out there -- since r9766 was needed for model validation, and that's not now going to be in 1.1, does r9766 need to stay in 1.1?  Personally I'd prefer to solve the issues now if it's going to be needed for model validation anyway in the future...but if it comes down to the wire and we don't have good answers for some of these side-effects, could we just un-do r9766?

Karen

Karen Tracey

unread,
Apr 6, 2009, 1:16:42 PM4/6/09
to django-d...@googlegroups.com
On Thu, Apr 2, 2009 at 12:33 PM, Marty Alchin <gulo...@gmail.com> wrote:
  
A related, thus-far-unreported (I think) issue comes up when
attempting to access width_field and height_field attributes on a
model prior to saving the new file.

Actually it looks like this was kinda reported, it just wasn't recognized as related to r9766:

http://code.djangoproject.com/ticket/10404

And the problem is slightly worse than just being unable to access the fields prior to saving: turns out if the fields are declared before the associated image field in the model, they aren't set properly at all.
 
We've now got unresolved:

#10249 - MRO exception on attempt to assign File to FileField
#10300 - Storage backend cannot get length of content
#10404 - Image width/height fields not set properly

plus the file name issue (pre-save signal handler does not get actual file name) that started this thread, which I don't believe has a ticket.

There are tentative patches for the first two (plus Marty you say you have some ideas how to fix them, but I'm not sure how those ideas differ from what's noted in the tickets).  #10404 seems fixable as well given a bit of thought.  But I really don't know what to do about the file name issue. 

We cannot know for sure what the file name is until it is saved to disk, as the save operation may tack on underscores when handling race conditions.  Thus we cannot delay file save to a field pre_save routine and report the guaranteed correct file name in a pre_save signal handler, since the latter runs before the former.  In fact, there may be code out there expecting to know the correct name much earlier than a pre-save signal.  For example:

form = ModelFormForModelContainingFileField(request.POST, request.FILES)
if form.is_valid():
    m = form.save(commit=False) # this actually saved the file to disk pre-9766
    # ... code that uses the FileField's assigned file name ...

works on 1.0.X but will run into trouble post-r9766 as the file won't be saved anymore as part of the form.save(commit=False).  I can easily cause some of our existing model_forms tests to fail due to reported file name differences by passing commit=False on form save.  Making the same commit=False additions pre-9766 does not cause the failures, as pre-r9766 the file was saved and the name was set during form save regardless of the commit argument.

I don't see how we can support knowing the actual  file name as early as we used to and simultaneously delay saving the file long enough so that model validation doesn't cause file data to be written to disk...though I don't know enough about the model validation implementation to understand exactly why it was triggering the write-to-disk, as in the code above it's happening during a form-specific save routine and a form wouldn't necessarily be involved in model validation.

I feel like I'm going around in circles thinking about this one -- is there a way out that someone else sees that I'm blind to?

Karen

Marty Alchin

unread,
Apr 6, 2009, 1:56:01 PM4/6/09
to django-d...@googlegroups.com
On Mon, Apr 6, 2009 at 1:16 PM, Karen Tracey <kmtr...@gmail.com> wrote:
> We cannot know for sure what the file name is until it is saved to disk, as
> the save operation may tack on underscores when handling race conditions.
> Thus we cannot delay file save to a field pre_save routine and report the
> guaranteed correct file name in a pre_save signal handler, since the latter
> runs before the former.

Yes, very true. I wrote my reply in a hurry to get you some more
information, and didn't really think all the way through that one.

> I feel like I'm going around in circles thinking about this one -- is there
> a way out that someone else sees that I'm blind to?

Welcome to my world! :( I spent a long time on issues like this prior
to getting the new file storage system in place at first, and
apparently skipped much of that step when doing r9766. At this point,
I think the most useful approach is to take a step back and just look
at the high-level options we have available. Trying to determine the
name in advance is a no-go, because that would just make the existing
known race conditions much more prominent. I see two options left:

1. Write something to disk (maybe the whole file, as it was before,
maybe just a placeholder, as the _save() does now) when the file is
first assigned, so that we have a full, proper filename early on.
Then, if the model doesn't get saved, somehow roll that back so we
don't leave stuff lingering on the filesystem. This is currently
something of a "then some magic happens" approach, since I'm not yet
sure if there's a reasonable, reliable way to roll back a file save.
But I'd like to keep an open mind, so there it is. This would also be
preferable if we can detect transaction rollbacks, because there is
also #6456 to consider.

2. Save the file all at once, as soon as possible, and blatantly
document this behavior when model validation goes in. Then, if people
need to validate a model that has a file on it, they have two options:
validate the model *before* saving the file to it, so the model is
known to be valid and can all be saved at once, or (if they need to
validate something in the file, such as its name or contents), add
their own code to delete the file if validation fails. Or, I suppose a
third option is to ignore the lingering files until they suck up too
much space, requiring a manual purge.

I obviously hate to go with number 2, but if we can't come up with
something solid, I think it's the better approach, at least for now.
Documenting unfortunate behavior is certainly preferable to coding
even more unfortunate behavior.

-Gul

Karen Tracey

unread,
Apr 6, 2009, 4:41:05 PM4/6/09
to django-d...@googlegroups.com
On Mon, Apr 6, 2009 at 1:56 PM, Marty Alchin <gulo...@gmail.com> wrote:

On Mon, Apr 6, 2009 at 1:16 PM, Karen Tracey <kmtr...@gmail.com> wrote:
> I feel like I'm going around in circles thinking about this one -- is there
> a way out that someone else sees that I'm blind to?

Welcome to my world! :( I spent a long time on issues like this prior
to getting the new file storage system in place at first, and
apparently skipped much of that step when doing r9766. At this point,
I think the most useful approach is to take a step back and just look
at the high-level options we have available. Trying to determine the
name in advance is a no-go, because that would just make the existing
known race conditions much more prominent. I see two options left:

1. Write something to disk (maybe the whole file, as it was before,
maybe just a placeholder, as the _save() does now) when the file is
first assigned, so that we have a full, proper filename early on.
Then, if the model doesn't get saved, somehow roll that back so we
don't leave stuff lingering on the filesystem. This is currently
something of a "then some magic happens" approach, since I'm not yet
sure if there's a reasonable, reliable way to roll back a file save.
But I'd like to keep an open mind, so there it is. This would also be
preferable if we can detect transaction rollbacks, because there is
also #6456 to consider.

I'm not seeing where exactly we have a _save() that writes just a placeholder?  The _save() in django.core.files.storage.FileSystemStorage writes the whole content, and I don't see any other _save as opposed to save() methods in the FileField area?
 

2. Save the file all at once, as soon as possible, and blatantly
document this behavior when model validation goes in. Then, if people
need to validate a model that has a file on it, they have two options:
validate the model *before* saving the file to it, so the model is
known to be valid and can all be saved at once, or (if they need to
validate something in the file, such as its name or contents), add
their own code to delete the file if validation fails. Or, I suppose a
third option is to ignore the lingering files until they suck up too
much space, requiring a manual purge.

I obviously hate to go with number 2, but if we can't come up with
something solid, I think it's the better approach, at least for now.
Documenting unfortunate behavior is certainly preferable to coding
even more unfortunate behavior.

What about:

3. Revert the removal of the FileField save_form_data override that was part of r9766.  It is that removal that is causing save() on a ModelForm to no longer save the file to disk.  If we restore it, names will get assigned for code using ModelForms just as they used to be. 

I am missing why the save_form_data override had to be removed in order to support the direct assignment of files to FileFields that is needed for model validation to work without the side-effect of model validation saving the file to disk.  I can see that it might be preferable, in general, to delay saving as long as possible so you don't wind up with files written to disk for models that are ultimately not saved to the DB.  But we didn't have that behavior in 1.0 -- model form save (even with commit=False), in 1.0.X, saves files to disk.  Trying to move the save later at this point, for the model form usage scenario, leads to backwards-compatibility problems.

It's kind of ugly to have one form of model creation/assignment ( via model forms) save data early and a 2nd (assignment of files to FileFields) save data late, so I can see we might not want to do this.  But I thought I'd throw it out there.

Reverting just that bit of r9766 also doesn't fix #10249 or #10300 which are resulting form the mixing in of FieldFile in the class bases.  It also doesn't fix #10404 in the case where someone uses the new assign-file-to-FileField without saving the field explicitly before the model.  So they'd still need fixing, but they seem more easily solved than this file name issue.

Karen

Lior Gradstein

unread,
Apr 28, 2009, 9:31:31 AM4/28/09
to Django developers
About the initial problem that started this thread, the associated bug
number is #10948.


On Apr 6, 10:41 pm, Karen Tracey <kmtra...@gmail.com> wrote:
> On Mon, Apr 6, 2009 at 1:56 PM, Marty Alchin <gulop...@gmail.com> wrote:
Reply all
Reply to author
Forward
0 new messages