Re: [Django] #8307: ImageFile use of width_field and height_field is slow with remote storage backends (was: ImageFile doesn't use the width_field and height_field for cache)

81 views
Skip to first unread message

Django

unread,
Jan 13, 2012, 6:07:09 PM1/13/12
to django-...@googlegroups.com
#8307: ImageFile use of width_field and height_field is slow with remote storage
backends
-------------------------------------+-------------------------------------
Reporter: sebastian.serrano@… | Owner: jacob
Type: | Status: reopened
Cleanup/optimization | Version: SVN
Component: File | Resolution:
uploads/storage | Triage Stage: Accepted
Severity: Normal | Needs documentation: 1
Keywords: imagefile height | Patch needs improvement: 0
width | UI/UX: 0
Has patch: 1 |
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by alanjds):

* status: closed => reopened
* severity: => Normal
* cc: alanjds (added)
* type: => Cleanup/optimization
* easy: => 0
* ui_ux: => 0
* resolution: wontfix =>


Comment:

What about letting the storage to choose if is worth to touch the file
just to get its height/width?

This solves the problem for me. And it is a huge problem: My EC2 instance
with S3 backend wastes about 4sec more each pageview just to know the
width/height of 8 photos. And looks like the absent of this info harms not
my app.

Is this an acceptable solution?

{{{
def _get_image_dimensions(self):
if not hasattr(self, '_dimensions_cache'):
if getattr(self.storage, 'IGNORE_IMAGE_DIMENSIONS', False):
self._dimensions_cache = (0, 0) # Too costly to hit the file?
else:
close = self.closed
self.open()
self._dimensions_cache = get_image_dimensions(self,
close=close)
return self._dimensions_cache
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/8307#comment:11>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Dec 27, 2013, 11:05:05 AM12/27/13
to django-...@googlegroups.com
#8307: ImageFile use of width_field and height_field is slow with remote storage
backends
-------------------------------------+-------------------------------------
Reporter: sebastian.serrano@… | Owner: jacob
Type: | Status: new
Cleanup/optimization | Version: master

Component: File | Resolution:
uploads/storage | Triage Stage: Accepted
Severity: Normal | Needs documentation: 1
Keywords: imagefile height | Patch needs improvement: 0
width | UI/UX: 0
Has patch: 1 |
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by viciu):

* cc: wiktor@… (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/8307#comment:13>

Django

unread,
Feb 23, 2014, 9:12:53 AM2/23/14
to django-...@googlegroups.com
#8307: ImageFile use of width_field and height_field is slow with remote storage
backends
-------------------------------------+-------------------------------------
Reporter: sebastian.serrano@… | Owner: jacob
Type: | Status: new
Cleanup/optimization | Version: master
Component: File | Resolution:
uploads/storage | Triage Stage: Accepted
Severity: Normal | Needs documentation: 1
Keywords: imagefile height | Patch needs improvement: 0
width | UI/UX: 0
Has patch: 1 |
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by wdoekes):

* cc: walter+django@… (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/8307#comment:14>

Django

unread,
Jan 24, 2024, 4:12:56 PM1/24/24
to django-...@googlegroups.com
#8307: ImageFile use of width_field and height_field is slow with remote storage
backends
-------------------------------------+-------------------------------------
Reporter: sebastian.serrano@… | Owner: Jacob
Type: | Status: new
Cleanup/optimization |
Component: File | Version: dev
uploads/storage |
Severity: Normal | Resolution:
Keywords: imagefile height | Triage Stage: Accepted
width |
Has patch: 1 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Mariusz Felisiak):

#35139 was a duplicate.
--
Ticket URL: <https://code.djangoproject.com/ticket/8307#comment:15>

Django

unread,
Jan 24, 2024, 4:19:09 PM1/24/24
to django-...@googlegroups.com
#8307: ImageFile use of width_field and height_field is slow with remote storage
backends
-------------------------------------+-------------------------------------
Reporter: sebastian.serrano@… | Owner: Jacob
Type: | Status: new
Cleanup/optimization |
Component: File | Version: dev
uploads/storage |
Severity: Normal | Resolution:
Keywords: imagefile height | Triage Stage: Accepted
width |
Has patch: 1 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by john-parton):

Here's the contents from my duplicate issue

I have prepped a github repo here with the basic tests: https://github.com
/john-parton/django-image-field-extra-read

**Conditions for behavior**

You must have an ImageField with the `width_field` or `height_field`
arguments set.

**Description of current behavior**

When a model is saved, the image file is written out using the Storage
API, and then
in order for the width and height fields to be updated, the file is read
back out
and then the width and height are extracted from the image.

In the case the storage is local, the performance impact is probably
negligible,
unless the application is seriously IO constrained, however if the storage
is
remote, the performance impact is significant, and there can be other
impacts on
operations.

For instance, if using S3 as backing, more GET operations are performed
than
strictly necessary. This could be a few dollars a day of operational costs
if your
scale is small, but can be significant if egress charges are high.

As another example, CloudFlare Images rate limits requests. This
effectively cuts
the rate limit in half because every save operations requires an
additional GET.

**Proposed behavior**

The proposed behavior is to simple read the image file which is resident
in memory
without reading it back out from the storage backend.

**Possible breaking issues**

The vast majority of storage backends and use cases likely guarantee that
if you
write a file into storage, and then retrieve it, you will get the same
file back.

However, for some image-specific services, they will compress or crush
larger images.

For users who specifically have this use case, they may end up with the
`width_field`
and `height_field` not representing the actual size of the image in the
store, but
rather the size of the image at time of upload.

**Explanation of current behavior**

It looks like when a model's save() method is called, the field's
pre_save() method
is called which results in the descriptor for the field having its __get__
method
called and then immediately having its __set__ method called with a
similar value.

The effect is to coerce the value of the field to `ImageFieldFile` which
is a subclass
of `ImageFile`. The `ImageFieldFile` instance is assigned a property of
`file` which
is the wrapped original value.

The image field then saves and persists the data using the storage API,
and then the
wrapped file isn't referred to later to get the width and height. When the
width and
height are requested, the file is read back out of storage.

**Proposed fix**

No specific fix at this time.

**Mitigating breaking issues**

Considering how unusual this use case is, it may be sufficient to document
the change
in behavior and provide a code snippet to wire up a signal handler to do
the
additional read for those users who have the unusual storage backends and
actually
care about the width/height being what is on disk. This would also allow
users to
customize the behavior. For instance, maybe if the image is under a
certain resolution,
the storage provider guarantees they don't mangle the image. A user could
enshrine
that logic in the signal handler, so they could still get the performance
uplift where
appropriate.

**Summary**

It seems pretty unlikely that this is the intended behavior, and seems to
largely be a byproduct of heavy descriptor use and "magic" in the code.
I've tried mitigating this in my application code, but it requires me to
monkey patch some private methods, and I'm not even sure I got it right.
Any attempt to "fix" this at a level higher than Django's internals will
just result in an unmaintainable mess. Back when Django was new, I'm not
sure I would make this argument, but now storage is usually backed by some
API like S3, I think it makes sense to make avoiding an extra network
request the "sane default."
--
Ticket URL: <https://code.djangoproject.com/ticket/8307#comment:16>

Django

unread,
Jan 24, 2024, 4:55:43 PM1/24/24
to django-...@googlegroups.com
#8307: ImageFile use of width_field and height_field is slow with remote storage
backends
-------------------------------------+-------------------------------------
Reporter: sebastian.serrano@… | Owner: Jacob
Type: | Status: new
Cleanup/optimization |
Component: File | Version: dev
uploads/storage |
Severity: Normal | Resolution:
Keywords: imagefile height | Triage Stage: Accepted
width |
Has patch: 1 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by john-parton):

I've got a fix that I think should pretty much always work.

Like I alluded to earlier, it requires some deep knowledge of Django's
internals. You have to know exactly when to side-step a descriptor and
what the purpose of the descriptor is.

Basically, the FileField.save() method will call the descriptors __set__
method with the value being a string. That's required, because the storage
backend might return a different string than you originally fed in. I
believe the motivating reason for this was to avoid accidentally
overwriting files. The other function of the descriptor is to calculate
and store the width and height values if it's configured. However, the
descriptor's __set__ method had to of been already called to even get to
this point in the save flow. Because the __set__ method of the descriptor
basically only serves this one function and the function has been
satisfied earlier, I just side step the descriptor by manipulating the
instances __dict__ directly.

Here's the meat of the code: https://github.com/john-parton/django-image-
field-extra-read/blob/bugfix/proposed-fix/test_app/fields.py

Of course, I wouldn't want to make a change like this without some tests.
As I said, I believe the primary purpose of the descriptor use there is
update the string value of the file field, which I have made sure that
it's updated to a non-conflicting value on subsequent saves.

Here's a test case that I wrote which captures some of the spirit of what
I'm trying to do: https://github.com/john-parton/django-image-field-extra-
read/blob/bugfix/proposed-fix/test_app/tests.py

I'm happy to take feedback, or I can work on building a proper patch.
--
Ticket URL: <https://code.djangoproject.com/ticket/8307#comment:17>

Django

unread,
Jan 25, 2024, 11:34:19 AM1/25/24
to django-...@googlegroups.com
#8307: ImageFile use of width_field and height_field is slow with remote storage
backends
-------------------------------------+-------------------------------------
Reporter: sebastian.serrano@… | Owner: Jacob
Type: | Status: new
Cleanup/optimization |
Component: File | Version: dev
uploads/storage |
Severity: Normal | Resolution:
Keywords: imagefile height | Triage Stage: Accepted
width |
Has patch: 1 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Alan Justino da Silva):

Replying to [comment:17 john-parton]:


> I've got a fix that I think should pretty much always work.

> ...


> Basically, the FileField.save() method will call the descriptors __set__
method with the value being a string.

First, thanks for refreshing this 15yo issue, [comment:17 john-parton].

That said, I do not see how this solution fixes a template page rendering
~10 images from a remote storage as there is no `.save()` involved for
''using existing images'':

When I reopened the issue, my application needed only the URL of the
files, already stored in the database, thus a O(1) call to DB. That is
because I needed not to put height and width in the `<img />` HTML tag, as
the browser deals with whatever it fetches from the URL. If grabbing the
URL keeps forcing to `.get_image_dimentions()`, then this call becomes
O(10) calls to the storage, meaning O(10) HTTP HEAD calls to S3. That is
why I proposed ''the storage'' to decide if this should be done, or
cached, or faked.

Do you think that letting the storage proxy the image dimensions would be
another way to fix your issues as well?
--
Ticket URL: <https://code.djangoproject.com/ticket/8307#comment:18>

Django

unread,
Jan 27, 2024, 11:38:16 AM1/27/24
to django-...@googlegroups.com
#8307: ImageFile use of width_field and height_field is slow with remote storage
backends
-------------------------------------+-------------------------------------
Reporter: sebastian.serrano@… | Owner: Jacob
Type: | Status: new
Cleanup/optimization |
Component: File | Version: dev
uploads/storage |
Severity: Normal | Resolution:
Keywords: imagefile height | Triage Stage: Accepted
width |
Has patch: 1 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by john-parton):

Replying to [comment:18 Alan Justino da Silva]:

For existing images, if you have a `width_field` and `height_field` set,
then `.get_image_dimensions()` is not ordinarily called. Usually the
`width_field` and `height_field` are referred to. That's the point of
those fields.

My patch does nothing if your ImageField doesn't have a width_field or
height_field, and that's intentional. Adding the width_field and
height_field should already solve the behavior you're describing.

If you could provide an actual test case where you think the behavior
isn't correct, I'd really appreciate it.

Thanks.
--
Ticket URL: <https://code.djangoproject.com/ticket/8307#comment:19>

Django

unread,
Jan 29, 2024, 3:39:52 PM1/29/24
to django-...@googlegroups.com
#8307: ImageFile use of width_field and height_field is slow with remote storage
backends
-------------------------------------+-------------------------------------
Reporter: sebastian.serrano@… | Owner: Jacob
Type: | Status: new
Cleanup/optimization |
Component: File | Version: dev
uploads/storage |
Severity: Normal | Resolution:
Keywords: imagefile height | Triage Stage: Accepted
width |
Has patch: 1 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by john-parton):

I added some tests. Big test is that there's a model with a Storage
backend that raises an exception if you try and read from it. Test suite
appears to be passing. Added a small note about the old behavior in the
docs.
--
Ticket URL: <https://code.djangoproject.com/ticket/8307#comment:20>

Django

unread,
Jan 30, 2024, 8:40:18 AM1/30/24
to django-...@googlegroups.com
#8307: ImageFile use of width_field and height_field is slow with remote storage
backends
-------------------------------------+-------------------------------------
Reporter: sebastian.serrano@… | Owner: Jacob
Type: | Status: new
Cleanup/optimization |
Component: File | Version: dev
uploads/storage |
Severity: Normal | Resolution:
Keywords: imagefile height | Triage Stage: Accepted
width |
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by john-parton):

* needs_docs: 1 => 0

Comment:

I added a small amount of documentation. A 'version changed' describing
the old behavior and under which circumstances the new behavior might
affect a user.
--
Ticket URL: <https://code.djangoproject.com/ticket/8307#comment:21>

Django

unread,
Apr 15, 2024, 10:48:27 AM4/15/24
to django-...@googlegroups.com
#8307: ImageFile use of width_field and height_field is slow with remote storage
backends
-------------------------------------+-------------------------------------
Reporter: sebastian.serrano@… | Owner: Jacob
Type: | Status: new
Cleanup/optimization |
Component: File | Version: dev
uploads/storage |
Severity: Normal | Resolution:
Keywords: imagefile height | Triage Stage: Accepted
width |
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by john-parton):

I've created a different pull request with a different approach, which is
better than my other patch in at least two ways:

* It removes an extra container class I was using as a signal for when to
update
* It removes an extra keyword argument I had to add (returning the
function signature to as it was before the patch)

https://github.com/django/django/pull/18070

I think we need to come to consensus on whether this is a bug that we can
just patch, or if this change is significant enough that we need to
introduce a setting to opt-in to the behavior (most likely changing the
setting in a future release).
--
Ticket URL: <https://code.djangoproject.com/ticket/8307#comment:22>

Django

unread,
Apr 26, 2024, 4:35:37 AM4/26/24
to django-...@googlegroups.com
#8307: ImageFile use of width_field and height_field is slow with remote storage
backends
-------------------------------------+-------------------------------------
Reporter: sebastian.serrano@… | Owner: Jacob
Type: | Status: new
Cleanup/optimization |
Component: File | Version: dev
uploads/storage |
Severity: Normal | Resolution:
Keywords: imagefile height | Triage Stage: Accepted
width |
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Sarah Boyce):

* needs_better_patch: 0 => 1

--
Ticket URL: <https://code.djangoproject.com/ticket/8307#comment:23>

Django

unread,
May 21, 2024, 4:51:17 AM5/21/24
to django-...@googlegroups.com
#8307: ImageFile use of width_field and height_field is slow with remote storage
backends
-------------------------------------+-------------------------------------
Reporter: sebastian.serrano@… | Owner: Jacob
Type: | Status: closed
Cleanup/optimization |
Component: File | Version: dev
uploads/storage |
Severity: Normal | Resolution: fixed
Keywords: imagefile height | Triage Stage: Accepted
width |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Sarah Boyce):

* has_patch: 1 => 0
* needs_better_patch: 1 => 0
* resolution: => fixed
* status: new => closed

Comment:

Replying to [comment:18 Alan Justino da Silva]:
> That said, I do not see how this solution fixes a template page
rendering ~10 images from a remote storage as there is no `.save()`
involved for ''using existing images'':
>
> When I reopened the issue, my application needed only the URL of the
files, already stored in the database, thus a O(1) call to DB. That is
because I needed not to put height and width in the `<img />` HTML tag, as
the browser deals with whatever it fetches from the URL. If grabbing the
URL keeps forcing to `.get_image_dimentions()`, then this call becomes
O(10) calls to the storage, meaning O(10) HTTP HEAD calls to S3. That is
why I proposed ''the storage'' to decide if this should be done, or
cached, or faked.

Hi Alan, the issue you are describing was fixed in #34517.
I will close this issue and reopen #35139 as this is a separate concern.
--
Ticket URL: <https://code.djangoproject.com/ticket/8307#comment:24>

Django

unread,
May 21, 2024, 1:58:56 PM5/21/24
to django-...@googlegroups.com
#8307: ImageFile use of width_field and height_field is slow with remote storage
backends
-------------------------------------+-------------------------------------
Reporter: sebastian.serrano@… | Owner: Jacob
Type: | Status: closed
Cleanup/optimization |
Component: File | Version: dev
uploads/storage |
Severity: Normal | Resolution: fixed
Keywords: imagefile height | Triage Stage: Accepted
width |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by john-parton):

I think it's very important that they provide a minimal test.

Based on my research, I do not think the scenario they are describing is
actually a problem anymore, and had been fixed sometime in the last twelve
years by another change.

I could be wrong, of course, but without actual code to pore over, my gut
says it's fixed.

Replying to [comment:24 Sarah Boyce]:
> Replying to [comment:18 Alan Justino da Silva]:
> > That said, I do not see how this solution fixes a template page
rendering ~10 images from a remote storage as there is no `.save()`
involved for ''using existing images'':
> >
> > When I reopened the issue, my application needed only the URL of the
files, already stored in the database, thus a O(1) call to DB. That is
because I needed not to put height and width in the `<img />` HTML tag, as
the browser deals with whatever it fetches from the URL. If grabbing the
URL keeps forcing to `.get_image_dimentions()`, then this call becomes
O(10) calls to the storage, meaning O(10) HTTP HEAD calls to S3. That is
why I proposed ''the storage'' to decide if this should be done, or
cached, or faked.
>
> Hi Alan, the issue you are describing was fixed in #34517.
> I will close this issue and reopen #35139 as this is a separate concern.
--
Ticket URL: <https://code.djangoproject.com/ticket/8307#comment:25>
Reply all
Reply to author
Forward
0 new messages