BCrypt and PBKDF2 Password Hash Caching


Erik van Zijst

Nov 15, 2013, 1:42:52 PM
to django-d...@googlegroups.com
We run bitbucket.org and are upgrading from SHA1 to BCrypt hashes. We offer Basic Auth support which is used a lot. So much so that we can't handle the increased load from these more expensive hashes. This caused a recent self-inflicted DoS.

BCrypt and PBKDF2 are ~4-5 orders of magnitude slower than SHA1 (deliberately so of course), bringing them into the range of a hundred ms per hash. For a high volume site that's a rather steep price to pay. We would have to lower the number of rounds substantially, which would negate much of their strength.
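
A rough way to check that gap on your own hardware is sketched below. It is only an illustration: PBKDF2 from Python's standard library stands in for BCrypt (which is not in the standard library), and the salt, password and the 100,000 iteration count are assumptions, not Bitbucket's settings.

```python
import hashlib
import time

def time_call(fn, repeat=1):
    """Return the average seconds per call of fn over `repeat` runs."""
    start = time.perf_counter()
    for _ in range(repeat):
        fn()
    return (time.perf_counter() - start) / repeat

password = b"correct horse battery staple"
salt = b"example-salt"  # hypothetical fixed salt, used for timing only

# One plain SHA1 digest, averaged over many runs because it is so fast.
sha1_seconds = time_call(
    lambda: hashlib.sha1(salt + password).digest(), repeat=10_000
)

# PBKDF2-HMAC-SHA256; 100,000 iterations is an assumed work factor.
pbkdf2_seconds = time_call(
    lambda: hashlib.pbkdf2_hmac("sha256", password, salt, 100_000)
)

print(f"SHA1:   {sha1_seconds * 1e6:.2f} us per hash")
print(f"PBKDF2: {pbkdf2_seconds * 1e3:.2f} ms per hash")
print(f"ratio:  {pbkdf2_seconds / sha1_seconds:.0f}x")
```

The exact ratio depends on the machine and on the chosen work factor, so treat the printed numbers as indicative only.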

To make bcrypt scale, we wrote a hasher that stores user passwords and their hash results in Django's cache (Memcached in our case). To prevent plain text passwords from leaving the process, we SHA1 the values first. The code is here: https://github.com/django/django/pull/1918/files

How do people feel about this approach and should it be merged into Django? If not, then I can turn it into a library instead. Maybe at our size we're not in Django's sweet spot anymore. However, in their current version the recommended hashers are just not usable for us.

Cheers,
Erik

Marc Tamlyn

Nov 15, 2013, 2:27:57 PM
to django-d...@googlegroups.com

I would suggest that's the kind of thing which is unlikely to get merged, mainly for security reasons, as it would be easier to misconfigure than other things. It's also only useful or relevant for nonstandard large deployments such as yours.

That said, sounds an interesting solution and would make a good library. However I'm not knowledgeable enough to say if it is a good idea from a security perspective.

Marc

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/5f384586-2183-46b7-a6a2-9ffd14caa3b0%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Javier Guerra Giraldez

Nov 15, 2013, 2:41:43 PM
to django-d...@googlegroups.com
On Fri, Nov 15, 2013 at 2:27 PM, Marc Tamlyn <marc....@gmail.com> wrote:
> That said, sounds an interesting solution and would make a good library.
> However I'm not knowledgeable enough to say if it is a good idea from a
> security perspective.


imagine this scenario:

an attacker gets the user database and _a_single_one of these cache entries.
the passwords are bcrypt, but the salts are cleartext. the attacker
chooses _any_ user and calculates a password such that when
concatenated with that user's salt produces a collision [1] with the
single SHA1 cache key stolen.

in short, this library reduces the security from bcrypt to salted
SHA1, and reduces the data needed to attack any and all users to any
single cache entry.

hum.... i don't like it

[1]https://www.schneier.com/blog/archives/2005/02/sha1_broken.html


--
Javier

Erik van Zijst

Nov 19, 2013, 8:48:31 PM
to django-d...@googlegroups.com
You make a good point.

An obvious fix would seem to be to add the username to the cache key. This way users cannot "use" another user's cache entry.
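
One way to bind a cache key to a single account is sketched here. The helper name and the length-prefix encoding are assumptions for illustration, not what the pull request does.

```python
import hashlib

def cache_key(username: str, salt: str, password: str) -> str:
    """SHA1 cache key bound to one account (hypothetical helper)."""
    digest = hashlib.sha1()
    for part in (username, salt, password):
        data = part.encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
        digest.update(len(data).to_bytes(4, "big"))
        digest.update(data)
    return digest.hexdigest()
```

With the username mixed in, an entry computed for one account does not match the key derived for another account, even when salt and password are the same.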

Cheers,
Erik

Javier Guerra Giraldez

Nov 19, 2013, 9:10:26 PM
to django-d...@googlegroups.com
On Tue, Nov 19, 2013 at 8:48 PM, Erik van Zijst
<erik.va...@gmail.com> wrote:
> You make a good point.
>
> An obvious fix would seem to be to add the username to the cache key. This
> way users cannot "use" another user's cache entry.


right, that would fix it. (i guess, i'm no security expert)

but still you get only SHA1-level strength, when the whole idea was to
switch to stronger crypto. if in your case SHA1 is enough, you can
simply keep using it. if it's not enough, then you shouldn't be using
it.

of course, that's easy for me to say; i don't manage a big site like
yours, so the switch to PBKDF2 doesn't cost me a cent.

i wonder if siphash is strong enough for passwords...

--
Javier

Donald Stufft

Nov 19, 2013, 9:20:21 PM
to django-d...@googlegroups.com
Password hashing schemes are slow on purpose to prevent brute force.
Siphash wouldn't make sense because if you're switching for speed you
can just use any secure hash function.

bcrypt by default is much slower than PBKDF2 FWIW. You should tune the
work factor/iterations until it's fast enough that it doesn't negatively
impact your site but as slow as possible otherwise. The higher the work
factor/iterations the harder it is to brute force, but the more negative
impact each login has.

I would tune bcrypt or PBKDF2 down before I implemented this custom
scheme.
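
A minimal sketch of that kind of tuning, assuming PBKDF2 via Python's hashlib and a hypothetical per-hash time budget: it doubles the iteration count and keeps the largest count that still fits the budget on the current machine.

```python
import hashlib
import time

def largest_iterations_within(budget_seconds: float,
                              start: int = 1_000,
                              limit: int = 10_000_000) -> int:
    """Return the largest tried PBKDF2 iteration count whose single-hash
    time stays within budget_seconds. Counts are tried by doubling from
    `start`; if even `start` is too slow, `start` is returned."""
    best = start
    iterations = start
    while iterations <= limit:
        begin = time.perf_counter()
        hashlib.pbkdf2_hmac(
            "sha256", b"benchmark-password", b"benchmark-salt", iterations
        )
        if time.perf_counter() - begin > budget_seconds:
            break
        best = iterations
        iterations *= 2
    return best

# Example: allow roughly 20 ms of hashing per login attempt (assumed budget).
print(largest_iterations_within(0.020))
```

A single timing per step is noisy, so in practice you would repeat the measurement on production-like hardware before settling on a value.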

--
Donald Stufft
don...@stufft.io

Wim Lewis

Nov 19, 2013, 9:38:33 PM
to django-d...@googlegroups.com

On 19 Nov 2013, at 6:10 PM, Javier Guerra Giraldez wrote:
> but still you get only SHA1-level strength, when the whole idea was to
> switch to stronger crypto. if in your case SHA1 is enough, you can
> simply keep using it. if it's not enough, then you shouldn't be using
> it.

Well, it seems to me it's still an improvement over plain SHA1 password storage. If the attacker only has access to on-disk data (or backups, etc.), then you have BCrypt-level strength. If the attacker has access to memcached, then you only have SHA1-level strength, as you say.

I don't know what bitbucket's access pattern looks like, but how much less effective would this mixin be if you didn't use memcached (and just had an in-process, unshared password cache / memoized BCrypt)? If an attacker gains access to *that* cache, then they presumably also have access to the plaintext passwords coming from the users, so you haven't lost anything.
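
An in-process memoized hash could be as small as the sketch below. It assumes PBKDF2 as a stand-in for BCrypt and an arbitrary cache size; note that `functools.lru_cache` keeps the plain text arguments in process memory, which matches the trade-off described above.

```python
import functools
import hashlib

@functools.lru_cache(maxsize=10_000)  # assumed size; entries are per process
def slow_hash(password: str, salt: bytes) -> bytes:
    """Expensive hash, memoized in this process only.

    The plain text password is held in the cache as part of the lookup key,
    so nothing leaves the process but process memory is sensitive."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, 100_000
    )
```

Each worker process has its own cache, so the hit rate depends on how often the same user lands on the same process.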

Another idea would be to store PBKDF2-with-lower-work-factor(salt+user+password) entries in the cache instead of using SHA1(...). This would let you tune the amount of security you're giving up vs. performance.
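
As a sketch of that idea, the cache key could be derived with a deliberately cheap PBKDF2 call. The function name, the 1,000 iteration default and the NUL separator are assumptions for illustration.

```python
import hashlib

CACHE_KEY_ITERATIONS = 1_000  # assumed; far below the stored hash's count

def cache_key(username: str, salt: str, password: str,
              iterations: int = CACHE_KEY_ITERATIONS) -> str:
    """Cache key from low-work-factor PBKDF2 over salt, user and password."""
    # Salt and username together form the PBKDF2 salt. The NUL separator
    # assumes neither field contains a NUL byte.
    key_salt = salt.encode("utf-8") + b"\x00" + username.encode("utf-8")
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), key_salt, iterations
    ).hex()
```

Raising `iterations` makes a stolen cache entry more expensive to brute force, at the cost of more CPU time on every request, including cache hits.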


Erik van Zijst

Nov 20, 2013, 1:32:39 AM
to django-d...@googlegroups.com
On Tuesday, 19 November 2013 18:38:33 UTC-8, Wim Lewis wrote:
> On 19 Nov 2013, at 6:10 PM, Javier Guerra Giraldez wrote:
>> but still you get only SHA1-level strength, when the whole idea was to
>> switch to stronger crypto. if in your case SHA1 is enough, you can
>> simply keep using it. if it's not enough, then you shouldn't be using
>> it.
>
> Well, it seems to me it's still an improvement over plain SHA1 password storage. If the attacker only has access to on-disk data (or backups, etc.), then you have BCrypt-level strength. If the attacker has access to memcached, then you only have SHA1-level strength, as you say.

Exactly, that's the idea behind it. It's based on the assumption that persistent storage is more vulnerable than transient state. Memcached also only ever contains entries for active accounts and even those get purged after a while, so the "bounty" will only ever be a fraction of what's in the database.
 
> I don't know what bitbucket's access pattern looks like, but how much less effective would this mixin be if you didn't use memcached (and just had an in-process, unshared password cache / memoized BCrypt)? If an attacker gains access to *that* cache, then they presumably also have access to the plaintext passwords coming from the users, so you haven't lost anything.

Absolutely. If we could make it work with an in-process cache, we would have.

However, Bitbucket is distributed across many servers through stateless load balancing. This means that consecutive requests by a user typically end up on different servers. Worse still, we use fairly simple single-threaded, synchronous worker processes (gunicorn sync workers) and obtain parallelism through multiprocessing. Private, in-process caches would thus be very inefficient.
 
> Another idea would be to store PBKDF2-with-lower-work-factor(salt+user+password) entries in the cache instead of using SHA1(...). This would let you tune the amount of security you're giving up vs. performance.

I think that's a little bit of a red herring. PBKDF2 and BCrypt are roughly in the same category with regards to cost. In order to make that viable, we'd have to reduce the work factor substantially. I'm not saying there's no better alternative for the cache than SHA1, but through their very design PBKDF2 and BCrypt are unbelievably expensive and would have to be weakened dramatically.

Cheers,
Erik

Luke Plant

Nov 27, 2013, 6:28:43 AM
to django-d...@googlegroups.com
On 15/11/13 18:42, Erik van Zijst wrote:

> How do people feel about this approach and should it be merged into
> Django? If not, then I can turn it into a library instead. Maybe at our
> size we're not in Django's sweet spot anymore. However, in their current
> version the recommended hashers are just not usable for us.

From my point of view, this is definitely something for an external
library, not for Django itself. The additional complexity makes it much
harder to review from a security point of view, and easier to make
mistakes when deploying, and we want to avoid that. Also, many people
will not need the additional performance, and we don't want to make it
easy for people to use a less secure option just because they want a
really fast site or something.

It seems like this can work fine as external code, and so I can't see a
reason why this needs to be in Django itself.

Thanks,

Luke

--
"DO NOT DISTURB. I'm disturbed enough already."

Luke Plant || http://lukeplant.me.uk/