ImageFields using actual saved file for width/height

23 views
Skip to first unread message

vicvicvic

unread,
Aug 10, 2008, 10:04:48 AM8/10/08
to Django developers
Hello!

First off: I'm loving the FileStorageRefactor. It will clean up my
views immensely!

So far, I've used it to write a subclass of FileSystemStorage which
normalizes uploaded file names (e.g. UploadedFile.JPEG =>
UploadedFile.jpg and with the help of PIL: UploadedPngFile ->
UploadedPngFile.png). Very useful!

All fine and dandy, but here's a more problematic scenario: Resizing
uploaded images on upload. Simply put, I don't want users to save
images larger than 500 pixels in width, so I downsize any images that
are too big. I'm currently doing this in my CustomFileStorage._save
method, after having called super, and I manipulate the file after it
has been moved in place.

The problem is that ImageField does not look at the actual to update
(width|height)_fields, but checks the incoming file, which has not
been resized yet. Width and height don't get saved in the proper
manner! Is there any chance of changing this to make it look
(primarily) at the final file instead, or am I doing it wrong?

I've patched my ImageFieldFile.save (django/db/models/fields/files.py:
264) method to move the check below the call to super and instead use
getattr(self.instance, self.field.name).path in the call to
get_image_dimensions. It's a minor patch really.

It works, but relies on Image.open being able to use the path
obviously. A proper solution should perhaps TRY using that path, and
fall back to something else. (Non-filesystem storages might not have
usable paths?)

To summarize: I think it makes more sense for ImageFieldFile to update
width/height_fields using the actual file, not the one submitted by
the user/incoming source.

Jacob Kaplan-Moss

unread,
Aug 10, 2008, 11:04:49 AM8/10/08
to django-d...@googlegroups.com
On Sun, Aug 10, 2008 at 9:04 AM, vicvicvic <victor...@gmail.com> wrote:
> To summarize: I think it makes more sense for ImageFieldFile to update
> width/height_fields using the actual file, not the one submitted by
> the user/incoming source.

Yeah, this seems to make sense to me; can you file a ticket so we
don't lose track of this?

Jacob

vicvicvic

unread,
Aug 10, 2008, 11:42:49 AM8/10/08
to Django developers
Done: http://code.djangoproject.com/ticket/8208

I realized you can just use the File object directly, so the patch
should work just as well as the current implementation (ergo, don't
mind the path stuff I mentioned above).

On Aug 10, 5:04 pm, "Jacob Kaplan-Moss" <jacob.kaplanm...@gmail.com>
wrote:

Marty Alchin

unread,
Aug 10, 2008, 12:27:29 PM8/10/08
to django-d...@googlegroups.com
On Sun, Aug 10, 2008 at 11:42 AM, vicvicvic <victor...@gmail.com> wrote:
> Done: http://code.djangoproject.com/ticket/8208
>
> I realized you can just use the File object directly, so the patch
> should work just as well as the current implementation (ergo, don't
> mind the path stuff I mentioned above).

Thanks for catching this, but I'm not sure this is the best approach
yet. It certainly touches the least amount of code, but I think this
is a symptom of something a bit bigger. I think it'd be better if
Storage._save() returns a File object that can be used by
ImageFieldFile.save() to update those related fields. That way, if
you're using something S3, you don't have to make a separate call to
the service just to get a file you already had in memory a second ago.

That would introduce another requirement on Storage subclasses, but I
think it's worthwhile in the long run, since there are bound to be
more issues like this. I'll work on getting a patch up later tonight.
Good catch.

-Gul

Marty Alchin

unread,
Aug 10, 2008, 12:41:17 PM8/10/08
to django-d...@googlegroups.com
On Sun, Aug 10, 2008 at 12:27 PM, Marty Alchin <gulo...@gamemusic.org> wrote:
> That would introduce another requirement on Storage subclasses, but I
> think it's worthwhile in the long run, since there are bound to be
> more issues like this.

I should also note up-front that this would also introduce a
backwards-incompatibility. Right now, Storage.save() returns just the
name of the file, but in order to support this ticket without making
unnecessary trips to the storage backend, it would need to return the
full File object. The name is still accessible as an attribute of the
File, of course, but it's definitely an incompatibility. I still think
it's worth it, but I wanted to be clear in case somebody has reason to
object to it.

-Gul

Julien Phalip

unread,
Aug 10, 2008, 5:52:42 PM8/10/08
to Django developers
Hi,

I've opened a similar ticket yesterday.
http://code.djangoproject.com/ticket/8204

I think that the attached patch -- which needs improvements -- could
also help in this case.

Julien

On Aug 11, 2:41 am, "Marty Alchin" <gulop...@gamemusic.org> wrote:

vicvicvic

unread,
Aug 11, 2008, 7:21:06 AM8/11/08
to Django developers
Speaking as someone who hasn't worked much with Django, I think that
would make a lot of sense actually. Not only for the sake of width/
height but for supporting more restricted filesystems.

Take Flickr for example. It's possible to make a read-only
FlickrStorage, storing filenames as <flickr_username>/<photo_id> and
working stuff out from that, but if you want to let users upload
pictures and have them stored on Flickr, you cannot know the photo_id
until the file is actually up on Flickr, meaning that whatever name
`save` tells `_save` to use is essentially meaningless (or will be
used to set a Flickr photo title). `_save` would be the method
actually talking to the server and would get a response about where
the file got stored. It should be able to tell `save` where that is.

Not that Django should strive towards implementing FileStorages for
every kind of photo site, but this would certainly make it easier for
anyone interested to make it so.

On Aug 10, 6:41 pm, "Marty Alchin" <gulop...@gamemusic.org> wrote:

Marty Alchin

unread,
Aug 11, 2008, 2:28:59 PM8/11/08
to django-d...@googlegroups.com
Okay, I've been giving this a lot of thought, and [8036] gave me
renewed interest in the subject, since it touches on the API I was
planning to change to make this happen. I'm going to take a bit
different approach, though, than I originally planned, which I think
is better in the long run.

To me, Storage objects are all about storing and retrieving files.
That's it. Some things, like determining the file's name, are
certainly related to that task, so they're included, but others, such
as altering the file's content, aren't. The filename normalization
mentioned previously is a great task for get_valid_filename() to
perform. In addition, the Flickr example above is now possible after
[8036], by having FlickrStorage._save() return the actual name
(including photo ID) that Flickr uses to store the image. So that's
one thing out of the way.

On the other hand, this thread is more about a Storage object altering
the file's content (resizing an image) and expecting the width and
height on the model to be updated. Getting this to happen is a bit
troublesome, because it requires Storage to return a full File object
from both save() and _save(), which is neither pretty nor easy to use.
Instead, I'd like to go back to the idea of purpose.

The purpose of Storage is to store and retrieve files. The purpose of
File, on the other hand, is to manage the details of specific file
types. FieldFile is a customization of this, designed to work with
model fields. In this context, I think the appropriate place to do the
image resizing is in the FieldFile's save() method, *prior* to passing
it to the storage system in the first place. Something like this:

class NormalizedImageFile(ImageFieldFile):
def save(self, name, content, save=True):
# Use PIL or whatever to resize the file
super(NormalizedImage, self).save(name, resized_content, save)

class NormalizedImageField(ImageField):
attr_class = NormalizedImageFile

Then just use NormalizedImageField on your models instead of
ImageField, and it'll automatically handle the resizing for you and
update the related fields properly. If you want, you can even override
NormalizedImageField.__init__() and accept target dimensions to make
it even more reusable.

That seems like a better approach to me overall, and is possible
without any changes to trunk. Thoughts? If nobody has any objections
to that, I'll go ahead and close #8208, but I'll wait and see if I'm
smoking something at the moment.

-Gul

Julien Phalip

unread,
Aug 11, 2008, 7:14:40 PM8/11/08
to Django developers
Your approach seems like the cleanest and most consistent one.
Also, it might be good to add a section in the documentation with that
kind of examples. A lot of people use PIL to manipulate uploaded
images, and some examples in the doc would prevent many questions to
pop up on the forums. That would also give a good start for those
using other fancy file manipulation tools.

vicvicvic

unread,
Aug 12, 2008, 8:11:12 AM8/12/08
to Django developers
Did you mean [8306]?

But yes, you're probably right about a subclasses ImageFieldFile doing
that. Maybe it could be useful in the docs to specify the separation
of responsibilities between Storage and File? (And I guess that _save
is now expected to return the final name? Docs still say: "No return
value is expected.")

vicvicvic

unread,
Aug 12, 2008, 6:19:00 PM8/12/08
to Django developers
Or actually, I'd like to voice my support for returning a full, final
File object from save. Not only because I think it makes more sense,
but because it'd be more useful.

Fine, Storage._save shouldn't modify the file being saved, but when
using remote storage, is it not possible that the remote site would? I
believe Flickr saves a copy of your uploaded picture as-is, but
Facebook doesn't. Now, of course you could argue that anyone using
Facebook for storing images is just stupid... Point is: _save is
responsible for actually saving the file, so it's likely the most
efficient place for finding out about any changes that happened to the
file when it "got beyond your control" (=uploaded on Facebook).

Now, I completely understand if you feel that supporting this would be
stupid because it'd support a stupid idea (using storages which are
beyond you control in terms of final file is hardly great in terms of
reliability).

> Getting this to happen is a bit
> troublesome, because it requires Storage to return a full File object
> from both save() and _save(), which is neither pretty nor easy to use.

I'm sorry, but what's not easy nor pretty about File objects? I think
the interface is very slick!

Marty Alchin

unread,
Aug 12, 2008, 7:12:03 PM8/12/08
to django-d...@googlegroups.com
On Tue, Aug 12, 2008 at 6:19 PM, vicvicvic <victor...@gmail.com> wrote:
> Fine, Storage._save shouldn't modify the file being saved, but when
> using remote storage, is it not possible that the remote site would? I
> believe Flickr saves a copy of your uploaded picture as-is, but
> Facebook doesn't. Now, of course you could argue that anyone using
> Facebook for storing images is just stupid... Point is: _save is
> responsible for actually saving the file, so it's likely the most
> efficient place for finding out about any changes that happened to the
> file when it "got beyond your control" (=uploaded on Facebook).

In those cases, Storage._save() won't know what the final file looks
like anyway. The whole point of passing the File back up through the
chain is so that you could get the changed file without making an
extra call to the remote storage. In your Facebook example, that's
impossible, since you would have to call out to Facebook to get it
anyway. In that case, passing it back up wouldn't save you anything at
all.

> I'm sorry, but what's not easy nor pretty about File objects? I think
> the interface is very slick!

As one of the main guys behind it, I obviously think so too! :) But
after spending a couple hours trying to get the whole
pass-a-File-in-get-a-different-file-out thing working, trust me when I
say it's not as easy as it sounds.

-Gul

vicvicvic

unread,
Aug 12, 2008, 9:59:16 PM8/12/08
to Django developers


On Aug 13, 1:12 am, "Marty Alchin" <gulop...@gamemusic.org> wrote:
> In those cases, Storage._save() won't know what the final file looks
> like anyway. The whole point of passing the File back up through the
> chain is so that you could get the changed file without making an
> extra call to the remote storage. In your Facebook example, that's
> impossible, since you would have to call out to Facebook to get it
> anyway. In that case, passing it back up wouldn't save you anything at
> all.

No, it wouldn't save you anything performance wise, but it'd be _save
talking to Facebook (to re-fetch the file), which I think fits with
the whole "_save doing the dirty work of talking to the filesystem"
stuff. I'm not being familiar with Facebook's APIs but I think it
sends an XML-reply with /some/ information about a newly uploaded
photo to whatever called it (in this case, _save) which might be
useful. You can't get that reply outside _save!

> As one of the main guys behind it, I obviously think so too! :) But
> after spending a couple hours trying to get the whole
> pass-a-File-in-get-a-different-file-out thing working, trust me when I
> say it's not as easy as it sounds.

I'll take your word for it! Returning the name is probably quite
sufficient in any normal case-scenario.
Reply all
Reply to author
Forward
0 new messages