FetchFromCacheMiddleware.process_request() refactoring proposal

44 views
Skip to first unread message

Alexey Tsivunin

unread,
Mar 18, 2019, 5:30:32 PM3/18/19
to Django developers (Contributions to Django itself)
https://github.com/django/django/blob/418263c457636d3301f2068c47f09a0f42e15c52/django/middleware/cache.py#L127

I have an idea how to improve readability of this method. I've already implemented
it but before creating an issue and PR I decided to discuss it with community. Maybe
all of this doesn't really have any sense.

I propose to change this code:

136     # try and get the cached GET response
137     cache_key = get_cache_key(request, self.key_prefix, 'GET', cache=self.cache)
138     if cache_key is None:
139         request._cache_update_cache = True
140         return None  # No cache information available, need to rebuild.
141     response = self.cache.get(cache_key)
142     # if it wasn't found and we are looking for a HEAD, try looking just for that
143     if response is None and request.method == 'HEAD':
144         cache_key = get_cache_key(request, self.key_prefix, 'HEAD', cache=self.cache)
145         response = self.cache.get(cache_key)
146
147     if response is None:
148         request._cache_update_cache = True
149         return None # No cache information available, need to rebuild.



On this code:

136     vary_headers = get_cached_vary_headers(request)
137     if vary_headers is None:
138         # Vary headers for this request was not cached,
139         # so there is no cache for both GET and HEAD requests
140         # and cache should be updated
141         request._cache_update_cache = True
142         return None
143
144     cached_response = None
145
146     if request.method == 'GET':
147         cached_response = get_cached_response(request, vary_headers, 'GET', self.key_prefix, self.cache)
148
149     if request.method == 'HEAD':
150         # HEAD request can use GET and HEAD responses both
151         # because web servers should delete body when sending HEAD response
152         cached_response = get_cached_response(request, vary_headers, 'GET', self.key_prefix, self.cache) or \
153                           get_cached_response(request, vary_headers, 'HEAD', self.key_prefix, self.cache)
154
155     if cached_response is None:
156         request._cache_update_cache = True
157         return None  # No cache information available, need to rebuild.



Why? I will try to explain it by examples of questions that I had while reading
this code.

    Let's look at the line 140. What does get_cache_key() do? Let's suppose it generates
    key for given request and also takes into account the http method.
    If so, what does it mean when cache_key is None? In according to comment on 140 line,
    None means that cache doesn't have the response for this request and method.
    Then why we don't check if cache_key is None after 144 line?

    If None value of cache_key means that cache doesn't have the response for given request,
    therefore not-None value means that cache has response. If so, why do we check
    if response is None on 143 line?

I suspect the problem of this code is that it tries to hide one implementation
detail inside get_cache_key(). This detail is caching vary headers for the request.
get_cache_key(), at first, tries to get these headers and if there are none of them
it returns None that means there is no cache satisfying given request. But if
there are these headers get_cache_key() just generates key taking into
account request, method and headers. But generated key doesn't mean that
cache actually has something for this key.

I suspect that refactored code can answer all of these questions above, and looks
more readable, clear and explicit.

So what do you think about it?

Adam Johnson

unread,
Mar 18, 2019, 6:34:10 PM3/18/19
to django-d...@googlegroups.com
Hi

I appreciate the enthusiasm! Making Django's code more readable is certainly a good goal, but refactoring must be done very carefully as we don't want to avoid subtle bugs, especially in a function that's directly used by users.

This mailing list is generally not for low level code discussions, but for higher level design and directional decisions. You're better off making the change, testing it, and then opening a PR for refactoring like this. It's easier to comment on the code there, especially since it can be seen as code diffs, and more convincing to see it done with the test suite passing :) Normally one of the fellows will get back to you first on a PR - if you or they are unsure about the change, you can always post to the mailing list asking people to look at the PR.

As for your query: get_cache_key is documented as:

get_cache_key(request, key_prefix=None)[source]
Returns a cache key based on the request path. It can be used in the request phase because it pulls the list of headers to take into account from the global path registry and uses those to build a cache key to check against.

If there is no headerlist stored, the page needs to be rebuilt, so this function returns None.

As far as I can tell you're right that cache_key should really be checked for None on current line 144. The cache could lose the cached headerlist value between the two calls to get_cache_key - that said it wouldn't affect things too badly as all that would happen is the next request would see the response as not cached, even though it would have been.

As for the more major refactoring work to extract the cached vary headers, you'd need to modify django.utils.cache to make the internals of some of the functions public for this, since it is a public API. Any changes must be done carefully as users depend on that code. I'm not sure if the changes you're proposing would be compatible without deeper inspection, but also seeing a PR would help, as per reasons above :)

Hope that helps,

Adam



--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/056cecfe-979f-4662-988c-0378669c498c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Adam

Alexey Tsivunin

unread,
Mar 18, 2019, 7:08:52 PM3/18/19
to Django developers (Contributions to Django itself)
Thanks for your feedback! I'll make a PR in the coming days. I just needed to know does anyone else think this code needs refactoring or I'm doing useless work.

Adam Johnson

unread,
Mar 18, 2019, 7:26:34 PM3/18/19
to django-d...@googlegroups.com
Well, it might not be the most valuable thing to work on, but if it helps you get comfortable with the code base and the changes are low to zero risk, it’s not so bad. I think there might be a smaller set of changes that reduces the risk. 

On Mon, 18 Mar 2019 at 23:08, Alexey Tsivunin <zmos...@gmail.com> wrote:
Thanks for your feedback! I'll make a PR in the coming days. I just needed to know does anyone else think this code needs refactoring or I'm doing useless work.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.

For more options, visit https://groups.google.com/d/optout.
--
Adam
Reply all
Reply to author
Forward
0 new messages