ImageKit probing S3 excessively for existence of cache files


mi...@directangular.com

Feb 5, 2017, 12:46:10 AM
to Django ImageKit, David Keitel
Recently I've been noticing that a HUGE portion of my S3 bill is from API requests (as opposed to actual data transfer/storage).  For example, last month my S3 bill breakdown was:

PUT, COPY, POST, or LIST requests 851,391,339 Requests $4,682.65
GET and all other requests 1,388,299,727 Requests $610.85
first 50 TB / month of storage used 3,484.621 GB-Mo $90.60

The most interesting one is the PUT, COPY, POST, or LIST line.  ImageKit incurs a LIST request (by virtue of the underlying django-s3-storage library, in my case) each time it checks for the existence of a cache file.  I don't think there are any COPYs going on, and uploading images will be a POST (or a PUT, not quite sure).  I incurred a total of 850M+ requests in this category last month, but the total number of objects I have stored in S3 only increased by 25 million, and I'm not deleting anything, so the remaining ~825M requests must have been LIST requests.  I assume most (or all?) of those LIST requests were from ImageKit checking for the existence of cached files.
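
For what it's worth, here is the back-of-the-envelope arithmetic behind that estimate as a small Python sketch. The request count and dollar amount come straight from the bill above; the 25 million figure is my rough object-growth number, and the assumption that each new object cost about one upload request (with no COPYs) is exactly that, an assumption. The variable names are just for illustration.

```python
# Figures copied from the bill breakdown above.
write_class_requests = 851391339   # "PUT, COPY, POST, or LIST requests" line
write_class_cost = 4682.65         # dollars billed for that line
new_objects = 25000000             # rough growth in stored objects last month

# Assumption: roughly one PUT/POST per new object and no COPY requests,
# so everything else in this category is a LIST request.
estimated_list_requests = write_class_requests - new_objects

# Effective price per 1,000 requests, derived from the bill itself
# rather than from a price sheet.
price_per_thousand = write_class_cost / (write_class_requests / 1000.0)
estimated_list_cost = estimated_list_requests / 1000.0 * price_per_thousand

print("estimated LIST requests: {:,}".format(estimated_list_requests))
print("estimated LIST cost: ${:,.2f}".format(estimated_list_cost))
```

If those assumptions hold, nearly all of that line item is existence checks rather than uploads.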

I enabled logging on my S3 bucket and noticed that, indeed, there are a ton of duplicate LIST requests coming in for the same files in a relatively short time span:


... str-media [04/Feb/2017:14:05:31 +0000] 52.52.139.202 ... ... REST.GET.BUCKET - "GET /str-media/?delimiter=/&prefix=media/CACHE/images/invitem_images/IMG_7309_WMBam4N/28db3a4ea2bb49134756c53f31e9560e.JPG HTTP/1.1" 200 - 751 - 17 16 "-" "Boto/2.40.0 Python/2.7.6 Linux/3.13.0-106-generic" -
... str-media [04/Feb/2017:14:41:42 +0000] 52.52.158.136 ... ... REST.GET.BUCKET - "GET /str-media/?delimiter=/&prefix=media/CACHE/images/invitem_images/IMG_7309_WMBam4N/28db3a4ea2bb49134756c53f31e9560e.JPG HTTP/1.1" 200 - 751 - 17 16 "-" "Boto/2.40.0 Python/2.7.6 Linux/3.13.0-106-generic" -
... str-media [04/Feb/2017:15:35:56 +0000] 52.52.154.85 ... ... REST.GET.BUCKET - "GET /str-media/?delimiter=/&prefix=media/CACHE/images/invitem_images/IMG_7309_WMBam4N/28db3a4ea2bb49134756c53f31e9560e.JPG HTTP/1.1" 200 - 751 - 40 39 "-" "Boto/2.40.0 Python/2.7.6 Linux/3.13.0-106-generic" -
... str-media [04/Feb/2017:15:43:57 +0000] 52.52.158.136 ... ... REST.GET.BUCKET - "GET /str-media/?delimiter=/&prefix=media/CACHE/images/invitem_images/IMG_7309_WMBam4N/28db3a4ea2bb49134756c53f31e9560e.JPG HTTP/1.1" 200 - 751 - 15 14 "-" "Boto/2.40.0 Python/2.7.6 Linux/3.13.0-106-generic" -
... str-media [04/Feb/2017:16:00:42 +0000] 52.52.216.112 ... ... REST.GET.BUCKET - "GET /str-media/?delimiter=/&prefix=media/CACHE/images/invitem_images/IMG_7309_WMBam4N/28db3a4ea2bb49134756c53f31e9560e.JPG HTTP/1.1" 200 - 751 - 18 17 "-" "Boto/2.40.0 Python/2.7.6 Linux/3.13.0-107-generic" -
... str-media [04/Feb/2017:16:20:32 +0000] 52.9.95.192 ... ... REST.GET.BUCKET - "GET /str-media/?delimiter=/&prefix=media/CACHE/images/invitem_images/IMG_7309_WMBam4N/28db3a4ea2bb49134756c53f31e9560e.JPG HTTP/1.1" 200 - 751 - 13 12 "-" "Boto/2.40.0 Python/2.7.6 Linux/3.13.0-106-generic" -
... str-media [04/Feb/2017:16:42:56 +0000] 52.52.158.136 ... ... REST.GET.BUCKET - "GET /str-media/?delimiter=/&prefix=media/CACHE/images/invitem_images/IMG_7309_WMBam4N/28db3a4ea2bb49134756c53f31e9560e.JPG HTTP/1.1" 200 - 751 - 17 16 "-" "Boto/2.40.0 Python/2.7.6 Linux/3.13.0-106-generic" -


(REST.GET.BUCKET appears to correspond to LIST requests in the logs)
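
To get past eyeballing the logs, something like the following could tally how often each prefix gets listed. This is only a sketch: it assumes every LIST shows up as a REST.GET.BUCKET line with a `prefix=` query parameter, as in the excerpt above, and the function names are made up for illustration. Prefixes are counted exactly as they appear in the log, without URL-decoding.

```python
import re
from collections import Counter

# Matches a bucket-listing log line and captures the value of the
# "prefix" query parameter from the request URI.
LIST_RE = re.compile(r'REST\.GET\.BUCKET .*?[?&]prefix=([^&\s"]+)')


def count_list_requests(lines):
    """Return a Counter mapping each listed prefix to its number of LIST requests."""
    counts = Counter()
    for line in lines:
        match = LIST_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def repeated_prefixes(lines, min_count=2):
    """Return (prefix, count) pairs seen at least min_count times, most frequent first."""
    counts = count_list_requests(lines)
    return [(prefix, n) for prefix, n in counts.most_common() if n >= min_count]
```

Feeding it the downloaded log files (for example, `repeated_prefixes(open(path))` for each file) would show whether the duplicates are concentrated on a few hot images or spread across everything.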

I haven't changed IMAGEKIT_CACHE_BACKEND, which means it's using my default cache backend, which is memcached.  As I understand it, the only reason we should be seeing these duplicate existence checks is if the info about the cached files is being evicted from memcached before the next time that it's needed.  But looking at the logs above it just seems like those duplicate requests are too close together to be explained by eviction...  I could be totally wrong on that and haven't yet validated that theory.  I do see dozens of evictions per second in memcached, so maybe this really is what's happening...
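
One way I could validate the eviction theory is to sample memcached's own counters twice and compare. The sketch below speaks the plain-text `stats` command over a socket and reads the `evictions`, `get_hits`, and `get_misses` counters; the host, port, and helper names are placeholders I made up, and the numbers are server-wide, so they include every other key in the default cache, not just ImageKit's.

```python
import socket


def fetch_stats(host="127.0.0.1", port=11211, timeout=2.0):
    """Send the text-protocol 'stats' command and return the raw reply."""
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        sock.sendall(b"stats\r\n")
        data = b""
        # The reply is a series of STAT lines terminated by END.
        while not data.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    finally:
        sock.close()
    return data.decode("ascii", "replace")


def parse_stats(text):
    """Turn 'STAT name value' lines into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def eviction_rate(before, after, seconds):
    """Evictions per second between two parsed samples taken `seconds` apart."""
    return (int(after["evictions"]) - int(before["evictions"])) / float(seconds)


def hit_ratio(stats):
    """Fraction of GETs served from cache since the server started."""
    hits, misses = int(stats["get_hits"]), int(stats["get_misses"])
    total = hits + misses
    return hits / float(total) if total else 0.0
```

Taking `parse_stats(fetch_stats())` twice, some seconds apart, and passing both samples to `eviction_rate` would put a real number on "dozens of evictions per second"; a low `hit_ratio` alongside it would make the eviction explanation more plausible.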

Anyways, I was just hoping to get a second opinion on what might be going on here and ways to diagnose it before I take the somewhat drastic action of introducing another less ephemeral cache backend (e.g. redis) to my project.
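
For reference, the change I have in mind would look roughly like this in settings.py. It is an untested sketch: the `imagekit` alias name, the locations, and the use of the django-redis package are all assumptions, and the `default` entry is a stand-in rather than my actual config.

```python
# Sketch of settings.py: keep memcached as the default cache, but give
# ImageKit its own cache alias backed by redis.
CACHES = {
    "default": {
        # Stand-in for the existing memcached configuration.
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": "127.0.0.1:11211",
    },
    "imagekit": {
        # Assumes the django-redis package is installed.
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379/1",
    },
}

# Point ImageKit's cache-file state at the dedicated alias instead of "default".
IMAGEKIT_CACHE_BACKEND = "imagekit"
```

The idea would be that ImageKit's existence info no longer competes with everything else in memcached for space.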

Thanks in advance.