ConditionalGetMiddleware MD5

Francisco Couzo

unread,

Sep 10, 2020, 9:11:09 AM9/10/20

to Django developers (Contributions to Django itself)

I think it would be a good idea to make ConditionalGetMiddleware use a hash function that's not as easy to find a collision as MD5, most probably SHA-256 or BLAKE2.

I don't see a problem with just changing it, it will just invalidate the old cache.

If there's an agreement on changing the hash function, I can make the PR.

Adam Johnson

unread,

Sep 10, 2020, 10:08:00 AM9/10/20

to django-d...@googlegroups.com

What would this protect against?

--

You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.

To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/ff591d46-97fc-43d6-9d1c-d0ba24d7b1a8n%40googlegroups.com.

--

Adam

Francisco Couzo

unread,

Sep 10, 2020, 1:17:08 PM9/10/20

to django-d...@googlegroups.com

User 1 uploads a file

User 2 downloads it, and caches it

User 1 uploads a new file to the same URL, with the same MD5 hash

User 2 will keep using the old file indefinitely

Sure, user 1 has to upload two files with the same hash on purpose

To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAMyDDM1Db5tRDNXQEMLuX6UdOmjhUq0_-Dr_9kK%3DB5AUqsXG%2Bg%40mail.gmail.com.

Taymon A. Beal

unread,

Sep 10, 2020, 9:14:26 PM9/10/20

to django-d...@googlegroups.com

That attack doesn't work with the recommended production setup because
Django doesn't serve uploaded files in that setup.

That being said, some users might be doing that anyway since setting
up production-worthy upload hosting is such a pain, and even if they
don't, they might have other views that somehow allow users to control
the response body. So I think this should be treated as a
not-super-severe-but-still-worth-fixing security issue.

What backwards-compatibility considerations exist? Do we consider it
normal for upgrading to a different Django version to bust users'
caches? I can't immediately think of any bad consequences of changing
the hash function, apart from that one. Busting users' caches doesn't
sound that terrible, given that, even if the hash function were
changed on every release (which of course it wouldn't be; SHA-2 has
been the most generally-recommended hash function for 15 years and
there are no signs that this will change), it would still only happen
once every eight months, and it's fairly rare for anything to be
cached that long in the first place, I think.

Taymon

> To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAHx8S1vpoP9txnZrV4fTfwRJBCF2q-QtjnYpw%2BwAgGKbg4V5yQ%40mail.gmail.com.

Francisco Couzo

unread,

Sep 11, 2020, 2:25:31 AM9/11/20

to django-d...@googlegroups.com

If changing ConditionalGetMiddleware to use SHA-256

It also might be a good to change it on FileBasedCache, _generate_cache_key, and generate_cache_header_key too

Also, _generate_cache_key is just blindly concatenating hashes, so ['foo', 'bar'] is equal to ['fo', 'obar'], I don't know it might be a problem, but it just doesn't looks right

To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAHRQ%3D85KUrrsv%3DeY7YW%3DXb%3DeD%2BdY9-eeeQQW0YuB5ckDmUgfEg%40mail.gmail.com.

Gert Burger

unread,

Sep 11, 2020, 3:41:51 AM9/11/20

to django-d...@googlegroups.com

I'm not so sure this is a problem (wrt to using md5 hash of response content for ETags and likely also for cache keys). The probability of a naturally occurring collision with MD5 is 1.47*10^-29 [1] so the risk of this scenario occurring by accident is extremely remote.

If we assume that User 2 is a malicious actor and can reliably generate collisions (which is not practically possible within typical cache durations), then User 1 is likely still no worse off. Maybe some security researchers can comment on the threat model in this case.

Lastly, the practical use of content hash based ETags are limited due to the processing overhead required for them (as implemented in ConditionalGetMiddleware). It might be useful for a low traffic/value deployments and then the risks are likely not a problem. Switching to a more cryptographically secure hashing algorithm will increase the overhead and provide little additional benefit to such installations. As mentioned earlier, production deployments will often use Apache or Nginx, and neither generate ETags using file content hashes.

[1] https://www.avira.com/en/blog/md5-the-broken-algorithm

To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAHx8S1tk%2BjjKgpkOdaKyMti8DeLiLLcO6GxjHJTguc47Q-P80g%40mail.gmail.com.

Florian Apolloner

unread,

Sep 14, 2020, 3:41:07 AM9/14/20

to Django developers (Contributions to Django itself)

Hi,

one thing to consider might be that md5 is usually disabled for FIPS enabled system (ie https://code.djangoproject.com/ticket/28401 ). So if we are changing something here we might also consider this.

Cheers,

Florian

Adam Johnson

unread,

Sep 14, 2020, 8:00:06 AM9/14/20

to django-d...@googlegroups.com

Yes there's a risk of causing collisions if a user can control the responses. But that doesn't mean there's really a security concern - since the user creating the collision *can already upload arbitrary content to the django site.* The collision wouldn't really do anything but cause a stale asset to appear - which is a known risk with conditional GET's to begin with.

Additionally, e-tag calculation needs to be *fast* since it is run on every response. So moving to a slower hashing algorithm like SHA would be a performance degradation. There are some other deliberately fast hashing algorithms out there, though I don't know if they carry any advantages in this situation over MD5.

For those concerned, we could make etag collisions slightly less easy to forge by including a salt. Perhaps this could be derived from SECRET_KEY or some other setting. I'm not sure SECRET_KEY is appropriate though... if we used that SECRET_KEY because then we'd be sending a payload of content, with etag=MD5(content + SECRET_KEY), which could be cracked to retrieve SECRET_KEY.

ConditionalGetMiddleware only sets the ETag header if it's not already set. So if you're interested, you can experiment with different ETag algorithms by adding a custom middleware below it in your MIDDLEWARE setting.

Before we change anything it would also be useful to see a summary of how other major servers and frameworks have settled on creating ETags for dynamic content - such as Ruby, Nginx, Apache, Nginx, CloudFlare. If there's any problem with collisions they'll likely have already solved it.

one thing to consider might be that md5 is usually disabled for FIPS enabled system

This is true, but if no one has complained, why "fix" it? As covered above one can always implement custom ETag header generation.

To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/bdc64ced-1f75-414a-9e44-1c7fa606f5f9n%40googlegroups.com.

--

Adam

Florian Apolloner

unread,

Sep 14, 2020, 9:26:35 AM9/14/20

to Django developers (Contributions to Django itself)

Hi Adam,

On Monday, September 14, 2020 at 2:00:06 PM UTC+2 Adam Johnson wrote:

one thing to consider might be that md5 is usually disabled for FIPS enabled system

This is true, but if no one has complained, why "fix" it? As covered above one can always implement custom ETag header generation.

Well there are open tickets, so in that sense someone has "complained". I also know that pulp tries/tried to work around md5 usage in Django etc… But with regard to fips the proper way is probably the new `useforsecurity` argument in python3.9.

The arguments about performance are all good and I think that if we change away from md5 it should probably something which is on par performance-wise.