Well, it does also prevent the data loss from happening ;) This data
loss is not a hypothetical problem; we had bug reports from users
affected by it.
> Why not simply let the developer decide whether to enable or disable
> it with a boolean constructor parameter?
>
> My company sells multimedia web applications that normally handle over
> 10,000 files across various models.
> I am sorry to say it, but to me the idea of running a cron job to
> remove orphaned files does not seem to be practical. Shall I make a
> query for each file?
I don't see why that would be necessary. One query for each model
containing one or more FileFields is enough to build a list of the files
that ought to exist, and any file not in that list can presumably be
removed.
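To make that concrete, here's a rough sketch of the kind of cleanup I
have in mind -- the Document model, the 'attachment' field and the
'documents' directory are made-up names, and it assumes the default
FileSystemStorage under MEDIA_ROOT:

    import os
    from django.conf import settings
    from myapp.models import Document  # hypothetical model with a FileField

    # One query for the model: the set of file paths that ought to exist.
    referenced = set(
        Document.objects.exclude(attachment='')
                        .values_list('attachment', flat=True)
    )

    # Anything under the upload directory that isn't referenced is
    # presumably an orphan.
    orphans = []
    upload_dir = os.path.join(settings.MEDIA_ROOT, 'documents')
    for root, dirs, files in os.walk(upload_dir):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, settings.MEDIA_ROOT).replace(os.sep, '/')
            if rel not in referenced:
                orphans.append(path)
    # Review (or log) the list before actually calling os.remove() on it.

Repeat the first query for each model that has a FileField and you have
the full list of files to keep.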
> The rollback data-loss problem could have been solved by copying the
> file to a temporary file and restoring it if necessary.
Emulating the transactional behavior of a relational database is not
that trivial. We considered this approach carefully and decided that if
we tried to go down that road, we'd be continually finding and fixing
edge-case bugs in it, and any bug in it would be likely to be a
data-loss bug. Deleting files when we can't be sure it's the right thing
to do is a very dangerous business to be in.
> Am I the only one who would like to see the previous behaviour
> restored? Can we at least re-enable this feature from the file-field
> constructor?
If you want the previous behavior, it's not at all difficult to restore
it with a post-save signal handler. You can make your own trivial
subclass of FileField that attaches this post-save handler in the
contribute_to_class method: that's precisely what FileField used to do.
Carl
I'm sorry this caused a problem for you. Hopefully it's not *too* big
a deal: a FileField subclass along the lines Carl suggested should be
able to provide the same behavior as you've seen before with minimal
effort. I'd do something like: https://gist.github.com/889692.
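Roughly, the shape of it is a FileField subclass that hooks a signal
handler in contribute_to_class -- the sketch below is illustrative
rather than a copy of the gist:

    from django.db import models
    from django.db.models.signals import post_delete

    class DeletingFileField(models.FileField):
        """
        Restores the old delete-the-file-when-the-row-is-deleted
        behavior. Note that this reintroduces the data-loss risks
        discussed above (rolled-back transactions, files shared
        between rows).
        """
        def contribute_to_class(self, cls, name):
            super(DeletingFileField, self).contribute_to_class(cls, name)
            post_delete.connect(self.delete_file, sender=cls)

        def delete_file(self, sender, instance, **kwargs):
            fieldfile = getattr(instance, self.attname)
            if not fieldfile or not fieldfile.name:
                return
            # Don't remove a file that some other row still points at.
            still_used = sender._default_manager.filter(
                **{self.name: fieldfile.name}
            ).exclude(pk=instance.pk).exists()
            if not still_used:
                fieldfile.delete(save=False)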
But just for the record, we're pretty much always going to make calls
like this. Data loss is one of the few places I'm happy to break
backwards compatibility. When it comes down to unexpected data
retention versus unexpected data loss we're always going to try to err
on the side of retention. Too much data's easy to deal with: delete
some stuff. But lost data means a trip to the backups if you're lucky,
and a really bad day (or week, or month, ...) if you're not.
Jacob
This is correct -- but this is only a problem if you were affected by
the previous issue. That's not necessarily the case. This is a
situation where Django has to work for *every* case, but it's entirely
possible that one specific site may not be affected. We have to err on
the side of caution because we support everyone's usage, not just one
particular use pattern.
> Having a maintenance job delete files that are not listed will itself
> require serious maintenance.
> Suppose a developer adds a file field and forgets to update the
> maintenance script: all the files of that field will end up being
> deleted.
> Files which, because of a bad design, sit in the same folder as the
> files pointed to by the file field will be removed.
> Thumbnails and other files that happen not to be pointed to by a file
> field will be removed.
>
> And there will be more serious failures depending on the
> implementation.
> What I am trying to say is that removing orphaned files, even if done
> with a cron job, should be done by Django automatically and not by
> assuming that developers will take care of it.
>
> That said, I will start implementing such a maintenance job, and I am
> willing to share it, so maybe it could be included in a future release
> of Django.
If you can propose such a cleanup task, it's certainly worth
considering for inclusion into Django. There's precedent for Django to
include such cleanup tools -- for example, we include a cron task for
cleaning up stale session entries.
However, the session cron task is a 100% reliable solution that can be
implemented efficiently, and there's only one use pattern (you write
new session to the table, you delete old sessions from the table, and
that's it).
The problem with a FileField cleanup task as an idea is that the
original bug with FileField exists because there are many different
ways that FileFields can be used, and we have to support *all* of
them. At the very least, a cleanup task included as part of Django
core would need to work for a significant and easy to identify subset
of uses, and be able to self-identify the cases where it will or won't
work (e.g., as a validation step). The one outcome we *won't* allow is
the case where someone adds a cleanup cron task "because the docs told
me to", and file data gets lost as a result.
Yours,
Russ Magee %-)
Sounds good -- I look forward to seeing your code!
Jacob
On 03/29/2011 01:36 AM, Alex Kamedov wrote:
> I think cron jobs are overhead in many simple cases where the old
> behaviour was useful and simpler.
> Why don't you want to include DeletingFileField[1] in Django?
>
> [1] https://gist.github.com/889692
Because, as mentioned above, it is known to cause data loss in certain
situations (rolled-back transactions, overlapping upload-to
directories), and we are not very fond of including things in Django
that cause some Django users to lose their data. If you understand those
risks and want to use DeletingFileField in your projects, it's not hard
to do so.
Carl