I noticed an important change in behaviour between Django 1.0.2 and
Django 1.1+ (svn and beta versions).
If I set a signal that intercepts as pre_save and try to access a
FileField/ImageField's path, I don't get the same result. It seems the
'upload_to' attribute is not taken into account!
It seems to be set much later, compared to version 1.0.2. Is this a
feature or a bug?
> I think it's a bug, and I'm pretty sure it was introduced by r9766. The
> setting of the actual file name that will be used (which pulls in the
> upload_to path and possibly tacks on some underscores if the uploaded name
> conflicts with already-present file(s) in the upload directory) is done by
> django.db.models.fields.files.FieldFile save(). This used to be called
> fairly early on from the django.db.models.fields.files.FileField
> save_form_data() method but r9766 moved that to much later, namely a
> newly-introduced pre_save() routine in FileField. Since (from a brief look
> at django.db.models.base.Model save_base()) the field pre_save routine is
> called after the pre_save signal is sent, the real file name has now not yet
> been set when the pre_save signal is sent.
Ugh. Thanks, Karen, for looking into this and so many others that came
up after r9766. I've been fairly negligent on these, and it's starting
to cause some pretty big problems. r9766 may well mark the last time I
try to be clever. :(
> There are a couple of still-open tickets for other side-effects of r9766
> (#10249, #10300) but I don't believe either of the potential fixes for those
> would fix this issue, so this probably needs a ticket with the details you
> provided (an integrated testcase would be even better, though I don't know
> how much signal testing is done by the test suite). Not sure what the right
> fix would be -- can just the setting of the name be pulled out of save() and
> done earlier by save_form_data()? Possibly save() would have to also still
> ensure it's done, in cases where save_form_data() is not involved? Can
> anyone with a better handle on r9766 comment on this (and #10249, #10300)?
For the record, #10044 (which was "fixed" by r9766) was entered as a
precursor to model validation. After some discussions with Honza, we
discovered that validating a model with a file would cause the file to
get saved to storage even if the model itself would never get stored
in the database. Our solution was to allow files to be assigned to
models outside the save() logic, and only store the file on disk when
the model actually hits the database.
Unfortunately, that proved problematic, because if you assigned an
UploadedFile to a model attribute, it didn't have the FieldFile
methods necessary to actually store it in the backend. I briefly
considered leaving that alone, and not relying on FieldFile when
finally saving the delayed file, but that created an inconsistent API.
After all, if you have a function that acts on models with files
attached, but you pass it a model with an *unsaved* file, it would
have a different set of methods than if the file had been saved.
Thus, r9766 includes the task of creating a new hybrid class whenever
a new file is assigned to a FileField attribute. This implementation
detail is the cause of #10249 and #10300 that you mentioned, but this
one stems from the simple fact that file saving is delayed as long as
possible. A related, thus-far-unreported (I think) issue comes up when
attempting to access width_field and height_field attributes on a
model prior to saving the new file.
I have a couple ideas to try out with regard to #10249 and #10300, so
feel free to hit me up on IRC if you'd like to discuss them, but I'm
not as certain how to approach this new one yet. There are certainly
band-aids that would allow the filename to be generated prior to
storing the file, but that wouldn't address the
width_field/height_field concern, nor any other potential future
issues that arise from delaying the file. I'd like to get to a point
where we're confident this sort of thing won't happen again, but if
the only option is to just keep putting out fires and they flare up,
it'll have to do.
As for addressing this one in particular, I think we'll have to add
something similar to the _committed flag, so we can identify whether
the filename has been processed or not. If it has, return it;
otherwise, generate it using upload_to when it's requested. The
trouble with just moving it up into save_form_data() is that if
someone assigns a file manually, save_form_data() won't get the chance
to generate a proper filename and we'll have a separate issue on our
hands. I hate having to add all these flags to determine what part of
the process we're in, but at a glance, that seems the most reasonable
approach.
-Gul
As for addressing this one in particular, I think we'll have to add
something similar to the _committed flag, so we can identify whether
the filename has been processed or not. If it has, return it;
otherwise, generate it using upload_to when it's requested. The
trouble with just moving it up into save_form_data() is that if
someone assigns a file manually, save_form_data() won't get the chance
to generate a proper filename and we'll have a separate issue on our
hands. I hate having to add all these flags to determine what part of
the process we're in, but at a glance, that seems the most reasonable
approach.
A related, thus-far-unreported (I think) issue comes up when
attempting to access width_field and height_field attributes on a
model prior to saving the new file.
Yes, very true. I wrote my reply in a hurry to get you some more
information, and didn't really think all the way through that one.
> I feel like I'm going around in circles thinking about this one -- is there
> a way out that someone else sees that I'm blind to?
Welcome to my world! :( I spent a long time on issues like this prior
to getting the new file storage system in place at first, and
apparently skipped much of that step when doing r9766. At this point,
I think the most useful approach is to take a step back and just look
at the high-level options we have available. Trying to determine the
name in advance is a no-go, because that would just make the existing
known race conditions much more prominent. I see two options left:
1. Write something to disk (maybe the whole file, as it was before,
maybe just a placeholder, as the _save() does now) when the file is
first assigned, so that we have a full, proper filename early on.
Then, if the model doesn't get saved, somehow roll that back so we
don't leave stuff lingering on the filesystem. This is currently
something of a "then some magic happens" approach, since I'm not yet
sure if there's a reasonable, reliable way to roll back a file save.
But I'd like to keep an open mind, so there it is. This would also be
preferable if we can detect transaction rollbacks, because there is
also #6456 to consider.
2. Save the file all at once, as soon as possible, and blatantly
document this behavior when model validation goes in. Then, if people
need to validate a model that has a file on it, they have two options:
validate the model *before* saving the file to it, so the model is
known to be valid and can all be saved at once, or (if they need to
validate something in the file, such as its name or contents), add
their own code to delete the file if validation fails. Or, I suppose a
third option is to ignore the lingering files until they suck up too
much space, requiring a manual purge.
I obviously hate to go with number 2, but if we can't come up with
something solid, I think it's the better approach, at least for now.
Documenting unfortunate behavior is certainly preferable to coding
even more unfortunate behavior.
-Gul
On Mon, Apr 6, 2009 at 1:16 PM, Karen Tracey <kmtr...@gmail.com> wrote:Welcome to my world! :( I spent a long time on issues like this prior
> I feel like I'm going around in circles thinking about this one -- is there
> a way out that someone else sees that I'm blind to?
to getting the new file storage system in place at first, and
apparently skipped much of that step when doing r9766. At this point,
I think the most useful approach is to take a step back and just look
at the high-level options we have available. Trying to determine the
name in advance is a no-go, because that would just make the existing
known race conditions much more prominent. I see two options left:
1. Write something to disk (maybe the whole file, as it was before,
maybe just a placeholder, as the _save() does now) when the file is
first assigned, so that we have a full, proper filename early on.
Then, if the model doesn't get saved, somehow roll that back so we
don't leave stuff lingering on the filesystem. This is currently
something of a "then some magic happens" approach, since I'm not yet
sure if there's a reasonable, reliable way to roll back a file save.
But I'd like to keep an open mind, so there it is. This would also be
preferable if we can detect transaction rollbacks, because there is
also #6456 to consider.
2. Save the file all at once, as soon as possible, and blatantly
document this behavior when model validation goes in. Then, if people
need to validate a model that has a file on it, they have two options:
validate the model *before* saving the file to it, so the model is
known to be valid and can all be saved at once, or (if they need to
validate something in the file, such as its name or contents), add
their own code to delete the file if validation fails. Or, I suppose a
third option is to ignore the lingering files until they suck up too
much space, requiring a manual purge.
I obviously hate to go with number 2, but if we can't come up with
something solid, I think it's the better approach, at least for now.
Documenting unfortunate behavior is certainly preferable to coding
even more unfortunate behavior.