You've been doing some great working tracking these issues down, so
your understanding of their current status is immensely helpful.
Especially since I've been slacking on these for quite some time and
could use the refresher.
> So far as I know there 4 open ticks remaining related to r9766. Three are
> regressions so I believe something really needs to be done about them before
> 1.1; one I think is just a bug in the new function. Personally I'd rather
> not revert the new function at this point (it's been available in the
> alpha/beta releases, it would be nasty to pull it at this point). However,
> reverting the part of r9766 that delayed saving the file to the backend
> until model save (in the case where a ModelForm is used to create a model)
> may be necessary to fix the last of the four problems, unless we want to
> document that one as a backwards-incompatible change.
To be clear, I would definitely rather not revert the change if these
can be fixed in a reasonable time frame. I just didn't want 1.1 to
block for ages while we sort this stuff out, given that all the
problems were caused by a new feature (that is, r9766 didn't actually
fix any bugs, the ticket reference was really just a formality).
> #10249: can't create consistent MRO (method resolution order) when assigning
> a File to a FileField. This is not a regression as I don't think you could
> do this at all before r9766. I added a patch to #10249 that fixes it, but it
> needs review from someone more knowledgeable about the implications of
> messing with class bases than me, particularly since I noticed it introduced
> another problem when combined with the patch for the next problem. There's
> a note about that in the last comment on #10249 with possible fixes for the
> subsequent problem. It's fixable, I just don't know what the right fix is,
> and I don't know what other side-effects there might be to this patch.
Yeah, this is the big one, in my opinion, because of the potential for
side effects. r9766 went in with too little thought on the subject
(entirely my fault, not passing blame), and I don't want to make that
same mistake twice. It's not a regression in and of itself, but the
feature is broken, all the same.
> (Marty also mentioned at one point he had some other ideas on how to fix
> this one but I have no idea what they are.)
I apologize for not writing my ideas down at the time, as you probably
would've been able to help refine them by now, but here's the gist of
what I remember. Right now, FieldFile is a direct subclass of File,
overriding some methods and adding others. What I'm planning to try is
to break that stuff out into some kind of FileFieldMixin that actually
provides and overrides methods, then have FieldFile inherit from File
and that new mixin. Then, in the code introduced in r9766, it would
just add the mixin to the class, hopefully avoiding any MRO
consistency problems.
It's still a bit more complicated than that, because #10300 below
highlights the problems with overriding methods on a base class that
may have behavior you weren't expecting. I think it's a good start,
though, and it's where I'll be focusing my efforts this weekend.
> #10300: custom storage backend can't get length of content to save. This
> one is a regression. It's also possibly easily fixed -- it's got a patch
> that works for the reporter of the problem, but it needs review. I also
> don't remember if I added a test, I don't think I did.
Yeah, this definitely needs review, but mostly to see if there are any
similar problems caused by any of the other overridden methods
provided by r9766. Tests are definitely essential, but this one should
be relatively easy to test.
> #10404: height_field and width_field for an ImageField don't always work.
> Specifically, they don't work if they are declared in the model before the
> associated ImageField. This is a regression, resulting from the 2nd part
> of r9766, the delaying of the save to storage. Possibly fixable without
> moving the save back to where it used to be, but I haven't investigated that
> at all due to the next problem.
I remember having a small bit of discussion on this in IRC a while
back, where someone suggested getting the image's width and height
immediately upon assignment. I'm not a big fan of that kind of side
effect when simply assigning to an attribute, but it's certainly less
strain than saving the entire file on assignment, which is essentially
what we were requiring before. I was just spelled .save(), which at
least made it clear that more was going on than simple assignment.
> #10788: actual file name not known until after model save. This was
> reported by someone who has been relying on knowing the actual file name
> saved during a pre-save signal handler, but it's actually a bit broader.
> It's also a regression. The actual file name can't be known until a file is
> saved to the backend because underscores might be added during that
> process. I don't know how to fix it other than by moving the save back to
> where it used to be. We could just 'reserve' the name by writing an empty
> file to start with, but something has to be written out if we're going to
> report the correct file name at the same point we used to. (I'm not sure if
> delaying the write of the full content would be of any use, I just recall it
> came up as a possibility during a conversation about this previously.)
Yeah, this is definitely tricky. I shudder every time I see the phrase
"race condition" because of all the file naming issues that came up
during the initial work on file storage. I admit I neglected to
consider pre_save while working that out. Shooting from the hip, I
wonder if we can rely on the order of signal handlers enough to
register our own pre_save handler than saves the file before any
custom handlers can get the chance to read it. This would move the
save slightly ahead in the process, but not nearly back to where it
was before, so the benefits of delaying should still be present. I
don't know if that's really an option yet, or if it's a good idea, but
I'd rather think out loud than let it be forgotten.
Now that I think about it, that could also solve #10404 on its own,
because the save (and thus, width/height determination) would occur
prior to any of the fields being processed.
> While moving the save back to where it used to be for the ModelForm case
> might help the last two, and fix the regressions, I don't believe it will
> fix the same problems for code using the new function. That is, we need to
> cover the case of code that assigns a File to a FileField and does not use a
> ModelForm. We need to make sure #10404 is fixed for that case as well, and
> figure out when, in that code path, the file name is guaranteed to be
> known. For the ModelForm case the name was correct once you called save()
> on the form (even if you passed commit=False to the save). For the
> non-ModelForm case, when are we going to know the file name? Not until
> after model save()? If so we'll have one code path (the ModelForm case)
> where a pre-save signal handler can know the right name, and one (the direct
> File assignment without using a ModelForm) where it cannot, which seems
> ugly.
When I mentioned reverting the behavior, I really meant all of it,
including the ability to assign to FileField attributes at all. I
agree with you here that it's very, very wrong to go half-way and have
an inconsistent flow. It's all or nothing, and a BDFL is on the side
of "all". :)
> That implies to me we might want to move the save to the backend
> (or, at least, reserving of the name) all the way up to the File assignment
> step? Which leaves a bit of a mess in the case where you assign a File to a
> FileField but then don't end up saving the model. Not sure what to do about
> that.
To be fair, we already have this problem, because if a transaction
gets rolled back for any reason, we have no handling to remove the
file. My bigger objection here is violating expectations by making an
attribute assignment do so much work. Somebody just playing around in
a shell, for instance, should be able to assign all they want without
such drastic side-effects.
> Unfortunately how much time I'll have over the next few days to help out
> here is really up in the air. I may have none, I may have a few hours here
> and there, or if a couple of crises get resolved I may have significant
> amounts of time. I just don't know yet.
You've already done a lot of legwork here, so even if you can't spare
any more time, you've provided a great base to work with. Thank you
very much for that.
-Gul
Good to not neglect those, indeed.
> The fixing of the problems caused by the changeset will be covered by that
> branch as well.
I'm a little worried about how easy you're making it sound. Either one
of us is thinking way too much on the subject or the other isn't
thinking enough. :)
> We'll report back some status when we get the
> existing testcases passing again :)
I can't be on IRC for about another 7 to 8 hours or so, when I expect
you guys will already be done for the day, so if you have any
questions about background or want to bounce ideas around, I'm
available via email, and should be able to respond fairly quickly. We
could use this thread if you want more input, but I'd rather not spam
all the devs with minute details.
-Gul
I was reading over the patch Alex mentioned in IRC (yay for
DjangoBot's logger!), and though it looks different than what I
intended, if it works, I'm happy (mostly).
>> #10300: custom storage backend can't get length of content to save.
> This *should* be fixed. I can't test it, no access to S3.
It should be reasonably simple to set up a dummy backend that exhibits
the same behavior with size, just to test it out at least.
>> #10404: height_field and width_field for an ImageField don't always work.
> This is not really related to that and I have a hard time
> understanding that part of the ORM, but it should be fixable.
It's not explicitly related to the MRO and method stuff you and Alex
have been working on, but it's definitely related to the r9766
discussion, since it's caused by the delayed saving. I have a clear
understanding of the problem, but I don't have time to write it up at
the moment, so I'll try to catch you guys on IRC if you're still up
when I get home.
>> #10788: actual file name not known until after model save.
> That's the problem with the new implementation. The final filename is
> decided when the file is actually saved. The cleaner way is to use
> the assigned file and directly read from it when it's still in the
> temporary location or memory.
That only solves the problem if you're trying to access the file's
contents, which I'll admit is the majority case. Other apps may need
the filename for other reasons, though, and it was available in 1.0,
so the current behavior is backwards-incompatible. Of course, all the
uses I can think of would actually be better as post_save handlers
anyway, but the fact is that compatibility was broken, so we should
see if we can restore it.
Thanks for all the work you guys have done on this. We're not out of
the woods yet, but we're much better off than we were.
-Gul
I've been thinking this one over a bit, while everybody else was
cleaning up the rest of my mess (thanks, guys!), and I have to agree
with you, at least to a point. I've only been able to come up with two
general uses for having the filename available at that point, and both
of them have a better approach available.
If you're using the filename to store it somewhere else, typically for
denormalization, it'd be better to do that post-save, since then you
know the record actually got saved in the database. Otherwise, you
might be trying to access the content of the file, which would be
better using the direct file access, which can often save yourself a
round-trip to the storage backend (which can be a big win for paid
storage solutions, like S3).
I won't completely discount the possibility that there's a legitimate
use case for having the final filename available in the pre-save hook,
but I just don't see it. I'll admit that it's a change to current
behavior, but I'm not sure there's a good way to get it working
reliably. I think it's possible to do it if we register our own
pre-save hook directly when the FileField is attached to the model
class, but even then, we can't guarantee that it's the first handler
to get registered, much less should we be relying on the ordering of
signal handlers anyway.
Unless somebody sees something I don't, I'm not sure there's a good
way to get it working, nor do I see much to be gained from doing so.
-Gul
Hi Armin,
Also on OS X you can set if filename is case-sensitive on per-volume
basis, when formatting, and usually it's case-insensitive. windows is
always case insensitive, linux is usually case-sensitive.
Please see http://en.wikipedia.org/wiki/Filename for more information on this.
This does matter when one filename has the same letters as another but
some letters have different case. This also can cause troubles when
moving existing DB and files from one filesystem to another.
--
Best regards, Yuri V. Baburov, ICQ# 99934676, Skype: yuri.baburov,
MSN: bu...@live.com
I find it really disturbing for someone that comes for the first time
and *thinks* that's it's possible (especially when it was possible in
1.0, and you'll get a lot of references on the web on 'tips' and
similar usage) to access the filenames in a pre_save signal.
The problem is that *if* you don't know that it's impossible now,
you'll try to access instance.myattribute.path (which contained the
correct path to the file in Django 1.0) and get a result, but a wrong
one! Isn't it possible to prevent access to the attribute or at least
set it to something that will not be interpreted as "strange"?