Methodology for increasing the number of PBKDF2 iterations


Tim Graham

Sep 20, 2015, 7:20:26 PM
to Django developers (Contributions to Django itself)
The latest guidance on increasing the number of PBKDF2 iterations for each release of Django was written by Alex in July 2014:

For each release... "Increase the default PBKDF2 iterations in django.contrib.auth.hashers.PBKDF2PasswordHasher by about 20% (pick a round number)."

He noted in that commit message, "The rate at which we've increased this has not been keeping up with hardware (and software) improvements, and we're now considerably behind where we should be. The delta between our performance and an optimized implementation's performance prevents us from improving that further, but hopefully once Python 2.7.8 and 3.4+ get into more hands we can more aggressively increase this number."

https://github.com/django/django/commit/6732566967888f2c12efee1146940c85c0154e60

Upon seeing a proposed 25% increase for 1.10 (to bring the iteration count to 30,000), Claude and Aymeric questioned this:

Aymeric: "I don't believe single-threaded execution gets 25% faster every 8 months with modern CPUs. Should we have a guideline about the duration of one call to the hasher on some reference platform?"
Claude: "Same question for me. I wouldn't blindly apply that 25% increase each time. It's good that we question that number at each release, but let's be smart enough to evaluate if the increase is justified or not."

Alex Gaynor

Sep 20, 2015, 7:26:10 PM
to django-d...@googlegroups.com
Unfortunately here is where we hit an asymmetry: single threaded performance of PBKDF2 _as realized in our pure Python implementation_ indeed does not improve by 25% every 8 months.

Unfortunately 24k iterations is behind where we'd want to be (~100k iterations, or a factor of 4, last I checked).

The only way to reconcile this is for more users to get Python 2.7.8 and 3.4+, where there's a faster implementation of PBKDF2, or to entirely switch to alternate algorithms such as bcrypt.

Alex

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/a13898dc-5f34-4d3a-83f4-88dff82bdfb8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
"I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
GPG Key fingerprint: 125F 5C67 DFE9 4084

Donald Stufft

Sep 20, 2015, 7:30:11 PM
to django-d...@googlegroups.com, Alex Gaynor
On September 20, 2015 at 7:26:09 PM, Alex Gaynor (alex....@gmail.com) wrote:
> Unfortunately 24k iterations is behind where we'd want to be
> (~100k iterations, or a factor of 4, last I checked).

If I remember, a key thing was we wanted the PBKDF2 iterations to be much
higher than they were because they hadn't kept up with improvements (or
been adjusted at all), but we didn't want to just jump from some low amount (20k?)
straight to 100k in one release. The 25% number was, if I recall, an attempt
to move us to that point over time, so it was purposely chosen to be faster
than CPU increases because if it was equal to that we'd never catch up to where
we should be.
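
As a rough illustration of the catch-up arithmetic described above, the sketch below counts how many releases a fixed percentage increase needs to take 24,000 iterations to 100,000. The function name is hypothetical, and the model assumes the 100,000 target stays fixed, which it would not if hardware keeps improving.

```python
def releases_to_reach(start, target, growth):
    """Count releases until start * growth**n first meets or exceeds target."""
    iterations = start
    releases = 0
    while iterations < target:
        iterations *= growth
        releases += 1
    return releases


if __name__ == "__main__":
    # Compare the 20% and 25% per-release increases mentioned in the thread.
    for growth in (1.20, 1.25):
        print(growth, releases_to_reach(24000, 100000, growth))
```

Under these assumptions a 25% step needs 7 releases and a 20% step needs 8, so any rate close to the pace of hardware improvement would leave the default permanently behind.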

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


Tim Graham

Sep 21, 2015, 10:30:29 AM
to Django developers (Contributions to Django itself), alex....@gmail.com
Django 1.8 is the last version to support Python 3.2 and 3.3, so I believe we could assume Python 2.7.8+ and 3.4+ as of Django 1.9. While we only *officially* support the latest release of each Python series, explicitly dropping support for Python < 2.7.8 might not be acceptable. However, it seems like it would resolve the asymmetry and allow us to increase the number of iterations more aggressively?

Collin Anderson

Sep 21, 2015, 10:56:00 AM
to django-d...@googlegroups.com, alex....@gmail.com
Is there an external library for Python < 2.7.8? I know we don't officially support the system version of python in RHEL/CentOS and Ubuntu, but I bet we could get away with requiring a dependency for those old versions of Python in new versions of Django.


Donald Stufft

Sep 21, 2015, 12:12:35 PM
to django-d...@googlegroups.com, Collin Anderson, alex....@gmail.com
On September 21, 2015 at 10:55:57 AM, Collin Anderson (cmawe...@gmail.com) wrote:
> Is there an external library for Python < 2.7.8? I know we don't officially
> support the system version of python in RHEL/CentOS and Ubuntu, but I bet
> we could get away with requiring a dependency for those old versions of
> Python in new versions of Django.
>


https://cryptography.io/en/latest/hazmat/primitives/key-derivation-functions/#cryptography.hazmat.primitives.kdf.pbkdf2.PBKDF2HMAC 

Josh Smeaton

Sep 22, 2015, 12:15:46 AM
to Django developers (Contributions to Django itself), cmawe...@gmail.com, alex....@gmail.com
Is the concern that 100,000 iterations is too slow on python < 2.7.8 but is acceptable on versions after that? If so, then we wouldn't be breaking < 2.7.8, we'd just be reducing the performance profile, right? We could call out such things in the release notes. 

Tim Graham

Sep 22, 2015, 1:18:46 PM
to Django developers (Contributions to Django itself), cmawe...@gmail.com, alex....@gmail.com
As I understand it, the problem with increasing the number of iterations on the slower hasher is that upgrading Django could effectively result in a DDoS attack as users' passwords are upgraded.

Some benchmarking suggests that the new algorithm results in a 3x speed up (100,000 iterations done 100 times is ~30 sec. on old Python 2.7's and ~10s with 2.7.8+).

An option could be to make the number of iterations dependent on the Python version?
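
A minimal sketch of what a version-dependent default might look like, assuming the version cutoffs quoted in this thread (2.7.8 and 3.4). The function names and both iteration constants are hypothetical placeholders, not values Django uses.

```python
import sys

# Hypothetical defaults, chosen only for illustration.
FAST_ITERATIONS = 100000  # interpreters with the faster PBKDF2 implementation
SLOW_ITERATIONS = 30000   # interpreters that fall back to the slower one


def has_fast_pbkdf2(version_info=sys.version_info):
    """Return True for Python 2.7.8+ or 3.4+, per the cutoffs in the thread."""
    version = tuple(version_info[:3])
    if version[0] == 2:
        return version >= (2, 7, 8)
    return version >= (3, 4, 0)


def default_iterations(version_info=sys.version_info):
    """Pick an iteration count based on the running interpreter version."""
    return FAST_ITERATIONS if has_fast_pbkdf2(version_info) else SLOW_ITERATIONS


if __name__ == "__main__":
    print(default_iterations())
```

One drawback of this design is that the same Django release would produce hashes of different strength depending on the interpreter it runs on.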

Christophe Pettus

Sep 22, 2015, 1:23:01 PM
to django-d...@googlegroups.com

On Sep 22, 2015, at 10:18 AM, Tim Graham <timog...@gmail.com> wrote:

> As I understand it, the problem with increasing the number of iterations on the slower hasher is that upgrading Django could effectively result in a DDoS attack as users' passwords are upgraded.

Is that correct? My understanding was that the passwords were only modified when changed. Given that it is a unidirectional hash, I'm not sure how they *would* be rehashed.

--
-- Christophe Pettus
x...@thebuild.com

Tim Graham

Sep 22, 2015, 1:27:34 PM
to Django developers (Contributions to Django itself)
We have access to the plain text password when the user logs in.

Christophe Pettus

Sep 22, 2015, 1:39:12 PM
to django-d...@googlegroups.com

On Sep 22, 2015, at 10:27 AM, Tim Graham <timog...@gmail.com> wrote:

> We have access to the plain text password when the user logs in.

Right, so we could *in theory* upgrade the user's password then if we wished (not clear if we want to). Even so, I don't think that would be a DDoS-attack level problem, since it's no worse than a user resetting their password.

Tim Graham

Sep 22, 2015, 2:43:05 PM
to Django developers (Contributions to Django itself)
Sorry, I explained poorly. We do upgrade passwords when the iteration count is increased https://docs.djangoproject.com/en/1.8/topics/auth/passwords/#password-upgrading

If we increase the iterations to <new iterations>, when a user logs in, we have to hash <current iterations> to check the password against the current hash plus <new iterations> to store the upgraded password. If pbkdf2 is slow, isn't it reasonable that this could cause a CPU spike on a high traffic site?
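
The cost described above can be sketched with the standard library alone. This is a simplified illustration, not Django's hasher code: the `encode` and `check_and_upgrade` helpers are hypothetical, the encoded format only loosely mirrors Django's `algorithm$iterations$salt$hash` layout, and the salt is reused on upgrade for brevity.

```python
import base64
import hashlib
import hmac


def encode(password, salt, iterations):
    """Hash a password and pack the parameters into one '$'-separated string."""
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt.encode("utf-8"), iterations
    )
    b64 = base64.b64encode(digest).decode("ascii")
    return "pbkdf2_sha256$%d$%s$%s" % (iterations, salt, b64)


def check_and_upgrade(password, encoded, target_iterations):
    """Verify a login attempt and re-hash if the stored count is too low.

    A successful login that triggers an upgrade pays for the stored
    iteration count (to verify) plus target_iterations (to re-hash).
    """
    _algorithm, iterations, salt, _hash = encoded.split("$", 3)
    iterations = int(iterations)
    candidate = encode(password, salt, iterations)
    if not hmac.compare_digest(candidate, encoded):
        return False, encoded
    if iterations < target_iterations:
        encoded = encode(password, salt, target_iterations)
    return True, encoded


if __name__ == "__main__":
    stored = encode("s3cret", "somesalt", 24000)
    ok, stored = check_and_upgrade("s3cret", stored, 30000)
    print(ok, stored.split("$")[1])
```

The extra work happens once per user, on the first successful login after the default changes; later logins only pay for the new iteration count.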

Alex Gaynor

Sep 22, 2015, 2:59:18 PM
to django-d...@googlegroups.com
Sure, but such a problem has nothing to do with password upgrades, it can already be triggered by registration, or even just logging in without a password upgrade.

Alex


Tim Graham

Sep 22, 2015, 3:21:57 PM
to Django developers (Contributions to Django itself)
Right -- the performance is only slightly worse during an upgrade than during other actions. I found the conversation I was thinking about. It's in the private security tracker (the patch to upgrade the iterations count was developed there), so I'll copy/paste the relevant bits here. The patch in question is https://github.com/django/django-private/commit/7d0d0dbf26a3c0d16e9c2b930fd6d7b89f215946

Paul McMillan: "I don't believe this is necessary"

Donald Stufft: "@PaulMcMillan Why? I think it's a good change and I think it's needed. Otherwise in X years users created with a 1.4 era hasher will have severely weaker password hashes."

P: "Upgrading hashes like the patch does causes double the login load after any upgrade. The hashing margin is already high, and having "my server's running really slow and everything broke after I upgraded" is not a good deployment experience. It will also cause people to say "hmm, maybe these new hashes are too many rounds, let's switch to something less secure".

@dstufft "severely weaker" is an overstatement.

Any security change which dramatically changes the load on a server is something we have to weigh extremely carefully. I don't think this is an acceptable tradeoff. The design goal was to have hashes gradually get upgraded as users create new passwords or change existing ones. This allows admins to see the load increasing gradually, and plan server resources accordingly. Walloping them with a wall of increased load immediately after upgrade isn't fair, and discourages users from upgrading."

D: "@PaulMcMillan I don't believe I agree with that. Using PBKDF2 is already a fraction of what you get with bcrypt so by that logic we shouldn't allow any use of bcrypt either. Notably passlib supports this upgrading of hash iterations (although they have a range of acceptable values).

Severely weaker isn't an overstatement. It takes 5 releases past whichever release increases the limit to 12000 to get roughly double the iterations from the 1.4 number of 10k. As far as I know Django is planning on a 6-month release schedule, so that means that for two and a half years any user who was created with a 1.4, 1.5, or 1.6 installation has a password hash half as strong as one created 2 years from now. There's no way for that user to *get* a stronger hash besides changing their password, which most users do not do unless forced to."

P: "`<shrug>` ok then.

As I've said before, anyone using this in large scale production is probably using weaker hashers anyway.

It's really a nasty surprise to have user logins take more than twice as much cpu when you bring the site back from any major upgrade, but I guess most people won't even notice that, since they aren't running at scale.

Too bad this isn't the sort of behavior a large deployment is likely to notice during pre-upgrade testing before it bites them in the ass during a real upgrade with users hammering the server trying to get back in after planned downtime."
------------------

At this point, I'm inclined to continue with the 20-25% iterations increase per release methodology we've been using unless someone wants to advocate for one of the other proposals. We only have 2 releases left (1.10 and 1.11) that will support Python 2. After dropping Python 2 and any chance that someone will be using the slower pbkdf2, we can reevaluate.

Aymeric Augustin

Sep 22, 2015, 5:13:53 PM
to django-d...@googlegroups.com
> On 22 sept. 2015, at 19:22, Christophe Pettus <x...@thebuild.com> wrote:
>
> Given that it is a unidirectional hash, I'm not sure how they *would* be rehashed.

If you have a password hashed at 15,000 rounds and want 20,000, you do 5,000 rounds on the current hash. (I don't know if Django does this.)

--
Aymeric.




Aymeric Augustin

Sep 22, 2015, 5:15:37 PM
to django-d...@googlegroups.com
On 22 sept. 2015, at 21:21, Tim Graham <timog...@gmail.com> wrote:

> At this point, I'm inclined to continue with the 20-25% iterations increase per release methodology we've been using unless someone wants to advocate for one of the other proposals.


I agree.

--
Aymeric.



Tim Graham

Jan 2, 2017, 9:50:26 AM
to Django developers (Contributions to Django itself)
Now that Python 2 is dropped in Django 2.0 and a faster implementation of pbkdf2 is guaranteed to be available (from what I understand), it's time to reevaluate our strategy for increasing the number of iterations each release.

I'm not sure how to evaluate potential performance issues from bumping the number of iterations too aggressively. I'd be happy to do some benchmarks but I'm not sure what will be meaningful.

The Python docs say, "As of 2013, at least 100,000 iterations of SHA-256 are suggested." [0]

Here are the number of iterations in recent versions of Django:
Django 1.8: 20000
Django 1.9: 24000
Django 1.10: 30000
Django 1.11: 36000

[0] https://docs.python.org/3/library/hashlib.html#hashlib.pbkdf2_hmac
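
To put the figures above in perspective, this small sketch computes the per-release increase and the remaining gap to the 100,000 iterations suggested by the Python docs. The variable and function names are illustrative only.

```python
# Default iteration counts per Django release, as listed above.
DEFAULT_ITERATIONS = [("1.8", 20000), ("1.9", 24000), ("1.10", 30000), ("1.11", 36000)]
SUGGESTED_MINIMUM = 100000  # figure quoted from the Python hashlib docs


def release_growth(history):
    """Return (version, percent increase over the previous release) pairs."""
    pairs = zip(history, history[1:])
    return [
        (new_version, (new_count - old_count) * 100 / old_count)
        for (_old_version, old_count), (new_version, new_count) in pairs
    ]


def gap_to_suggested(history, suggested=SUGGESTED_MINIMUM):
    """Return how many times larger the suggested count is than the latest default."""
    return suggested / history[-1][1]


if __name__ == "__main__":
    print(release_growth(DEFAULT_ITERATIONS))
    print(round(gap_to_suggested(DEFAULT_ITERATIONS), 2))
```

The increases work out to 20%, 25% and 20%, and the 1.11 default is still a factor of roughly 2.8 below the suggested figure.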

Martin Koistinen

Jan 3, 2017, 12:45:42 PM
to Django developers (Contributions to Django itself)
I think the best practice is to set the iterations as high as you can tolerate without adversely affecting the user experience as they log in. Iteration numbers as high as 200,000 for SHA-256 or even more are not unheard of these days. Without looking at an application's password expiration policies, there's really no "one size fits all" number here.

But, to be consistent with Django 1.x going forward, let's define 36,000 iterations as "acceptable performance" for a Python2 with Django 1.11 install on a typical piece of server hardware today (beginning of 2017). A useful benchmark would be to determine how many iterations would yield the same delay on a Py3 + Django 1.11 install on the same server.

This should probably serve as a baseline default number of iterations and, IMHO, there should probably be a reasonable amount of encouragement in the documentation to set the number of iterations to a value as high as the application can tolerate. Ideally, there could be some in-built benchmarking tools to make this easier for the admin.
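
A benchmarking helper of the kind suggested above could be sketched as follows. It assumes PBKDF2 cost scales linearly with the iteration count, so one timed probe can be extrapolated to a time budget. The function names, the probe size and the 50 ms budget are all hypothetical, and this is not an existing Django tool.

```python
import hashlib
import time


def time_pbkdf2(iterations, repeats=3):
    """Return the best wall-clock time in seconds for one PBKDF2-SHA256 call."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.pbkdf2_hmac("sha256", b"password", b"salt", iterations)
        best = min(best, time.perf_counter() - start)
    return best


def iterations_for_budget(target_seconds, probe_iterations=20000):
    """Estimate the iteration count that fits in target_seconds on this machine."""
    elapsed = time_pbkdf2(probe_iterations)
    return int(probe_iterations * target_seconds / elapsed)


if __name__ == "__main__":
    # How many iterations fit in a 50 ms budget on this machine?
    print(iterations_for_budget(0.05))
```

Running it on the production host, under the production interpreter, matters more than the exact budget chosen, since the thread shows the same iteration count can differ in cost by a factor of three or more.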

Adam Johnson

Jan 3, 2017, 1:14:58 PM
to django-d...@googlegroups.com
> But, to be consistent with Django 1.x going forward, let's define 36,000 iterations as "acceptable performance" for a Python2 with Django 1.11 install on a typical piece of server hardware today (beginning of 2017). A useful benchmark would be to determine how many iterations would yield the same delay on a Py3 + Django 1.11 install on the same server.

That sounds like a sensible benchmark to see where we are at current. I think Django should be aiming for 100k+ as default at least to match the Python docs though. Let's not forget that users can tweak it down as well as up if they do have problems with the execution time.





--
Adam

Tobias McNulty

Jan 3, 2017, 1:45:30 PM
to django-developers
On Tue, Jan 3, 2017 at 1:14 PM, Adam Johnson <m...@adamj.eu> wrote:
> > But, to be consistent with Django 1.x going forward, let's define 36,000 iterations as "acceptable performance" for a Python2 with Django 1.11 install on a typical piece of server hardware today (beginning of 2017). A useful benchmark would be to determine how many iterations would yield the same delay on a Py3 + Django 1.11 install on the same server.
>
> That sounds like a sensible benchmark to see where we are at current. I think Django should be aiming for 100k+ as default at least to match the Python docs though. Let's not forget that users can tweak it down as well as up if they do have problems with the execution time.

I agree; this seems like a strong argument for setting it appropriately high and documenting that it can be decreased if required, rather than setting low enough that it won't cause an issue on any hardware and documenting that developers should increase it (which seems less likely to happen, IMO).

If the Python docs are correct and the Python 3 pbkdf2_hmac implementation is 3x faster than the version in Python 2, 100,000 would seem to be a fairly unobjectionable (perhaps even a bare minimum) starting point, given we're starting with around 30,000-36,000 in current Django. That said, that recommendation is also ~4 years old at this point, so some benchmarks or at least further research may be in order...

Tobias
--

Tobias McNulty
Chief Executive Officer

tob...@caktusgroup.com
www.caktusgroup.com

Martin Koistinen

Jan 3, 2017, 7:56:35 PM
to Django developers (Contributions to Django itself)
Hmmmm, I just tried this using a simple management command to do some basic benchmarking of password hashing. I made this little package Py2/Py3 compatible. You can find it here: https://github.com/mkoistinen/hash_benchmark

(Just install it from the repo into an existing project, then add 'hash_benchmark' to your INSTALLED_APPS and you now have the management command `hash_benchmark`.)

I was expecting to see Py3 out-perform Py2 here by roughly 3X based on this thread. Instead, I see the opposite.

Python: 2.7.10 (default, Jul 13 2015, 12:05:58) [GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)]

Django: 1.9.7

Using cipher: "pbkdf2_sha256" with 100,000 iterations, verification takes, on average, 0.0955s


vs.

Python: 3.5.1 (v3.5.1:37a07cee5969, Dec  5 2015, 21:12:44) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]

Django: 1.10.3

Using cipher: "pbkdf2_sha256" with 100,000 iterations, verification takes, on average, 0.2751s


What am I missing here?

Tim Graham

Jan 3, 2017, 8:06:38 PM
to Django developers (Contributions to Django itself)
The PBKDF2 speed improvements are in Python 2.7.8 and 3.4+, so you'd need to use Python 2.7.7 or earlier to get the slower version.

Aymeric Augustin

Jan 4, 2017, 3:22:58 AM
to django-d...@googlegroups.com
Still, this benchmark shows Python 3.5 being 3 times slower than Python 2.7.

This is a surprisingly large regression for this time-sensitive function.

-- 
Aymeric.


Tobias McNulty

Jan 4, 2017, 11:33:17 AM
to django-developers
Here's an interesting tidbit from Alex Gaynor in 2014:


It's worth noting that, if I'm understanding this correctly, there are two slow versions of pbkdf2 we have to worry about -- the one bundled in Django (https://github.com/django/django/blob/6732566967888f2c12efee1146940c85c0154e60/django/utils/crypto.py#L142, which is used pre-2.7.8 and pre-3.4 and claims to be 5x slower) and the Python fallback for pbkdf2_hmac (which I suppose is used if OpenSSL is unavailable (?) and claims to be 3x slower).

Martin, is it possible your version of Python 3 is not linked against OpenSSL and hence is missing the fast version of pbkdf2_hmac? I haven't had a chance to try your benchmark yet, but in a quick test I don't see any difference between Python 3.5.2 and Python 2.7.12 on a Mac.
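
One way to test this guess on a given interpreter is sketched below. It relies on an assumption about CPython internals: that the fast implementation is exposed as `pbkdf2_hmac` on the private `_hashlib` module when the OpenSSL binding provides it. The helper name and the returned labels are hypothetical.

```python
def pbkdf2_backend():
    """Guess whether hashlib.pbkdf2_hmac is backed by OpenSSL.

    Assumption: CPython's private _hashlib module exposes pbkdf2_hmac only
    when the OpenSSL it was built against provides the primitive.
    """
    try:
        import _hashlib  # private CPython module; may be absent
    except ImportError:
        return "fallback"
    return "openssl" if hasattr(_hashlib, "pbkdf2_hmac") else "fallback"


def openssl_version():
    """Return the OpenSSL version string the ssl module reports, if any."""
    try:
        import ssl
    except ImportError:
        return None
    return ssl.OPENSSL_VERSION


if __name__ == "__main__":
    print(pbkdf2_backend(), openssl_version())
```

A result of "fallback" alongside a working `ssl` module would be consistent with an interpreter that can use pip but still lacks the fast PBKDF2 path.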

Tobias



Joey Wilhelm

Jan 4, 2017, 12:40:03 PM
to django-d...@googlegroups.com
FWIW, here are my own results from that benchmark (I ran each 5 times just to account for any other system activity):

Python: 2.7.12, Django: 1.10.4
Using cipher: "pbkdf2_sha256" with 100,000 iterations, verification takes, on average, 0.0884s
Using cipher: "pbkdf2_sha256" with 100,000 iterations, verification takes, on average, 0.0854s
Using cipher: "pbkdf2_sha256" with 100,000 iterations, verification takes, on average, 0.1034s
Using cipher: "pbkdf2_sha256" with 100,000 iterations, verification takes, on average, 0.1119s
Using cipher: "pbkdf2_sha256" with 100,000 iterations, verification takes, on average, 0.0949s

Python: 3.5.2, Django: 1.10.4
Using cipher: "pbkdf2_sha256" with 100,000 iterations, verification takes, on average, 0.0876s
Using cipher: "pbkdf2_sha256" with 100,000 iterations, verification takes, on average, 0.0857s
Using cipher: "pbkdf2_sha256" with 100,000 iterations, verification takes, on average, 0.0872s
Using cipher: "pbkdf2_sha256" with 100,000 iterations, verification takes, on average, 0.0847s
Using cipher: "pbkdf2_sha256" with 100,000 iterations, verification takes, on average, 0.0874s

Python: 3.6.0, Django: 1.10.4
Using cipher: "pbkdf2_sha256" with 100,000 iterations, verification takes, on average, 0.0861s
Using cipher: "pbkdf2_sha256" with 100,000 iterations, verification takes, on average, 0.0789s
Using cipher: "pbkdf2_sha256" with 100,000 iterations, verification takes, on average, 0.0803s
Using cipher: "pbkdf2_sha256" with 100,000 iterations, verification takes, on average, 0.0779s
Using cipher: "pbkdf2_sha256" with 100,000 iterations, verification takes, on average, 0.0815s

This appears to agree with Tobias' results; this is also on a Mac. I can toss in an older Python 2.7 as well if necessary or desired to see the slower implementation. But I think this shows that the speed difference between recent Python versions is close to negligible, aside from perhaps a very slight speedup in 3.6.

-Joey Wilhelm


Alex Gaynor

Jan 4, 2017, 12:42:47 PM
to django-d...@googlegroups.com
Python 2.7.12 will look the same as 3.5.x, they both have the optimized implementation. Only 2.7.X where X<8 will have the slow implementation.

If someone was motivated, they could look at the PyPI bigquery and see what versions of 2.7 people are using to install django.

Alex





--
"I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
GPG Key fingerprint: D1B3 ADC0 E023 8CA6

Adam Johnson

Jan 4, 2017, 12:52:43 PM
to django-d...@googlegroups.com

Joey Wilhelm

Jan 4, 2017, 1:05:47 PM
to django-d...@googlegroups.com
Okay, for good measure, here's with 2.7.7. And yeah, looks like almost 4x slower.

Python: 2.7.7, Django: 1.10.4
Using cipher: "pbkdf2_sha256" with 100,000 iterations, verification takes, on average, 0.3050s
Using cipher: "pbkdf2_sha256" with 100,000 iterations, verification takes, on average, 0.3096s
Using cipher: "pbkdf2_sha256" with 100,000 iterations, verification takes, on average, 0.3064s
Using cipher: "pbkdf2_sha256" with 100,000 iterations, verification takes, on average, 0.3162s
Using cipher: "pbkdf2_sha256" with 100,000 iterations, verification takes, on average, 0.3054s

I'm not sure what conclusions this can help with, but at least they are some solid-ish numbers to work with.
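
For a quick summary of the timings posted in this thread, the sketch below averages the five runs per interpreter and expresses the Python 2.7.7 result as a slowdown factor. The numbers are copied from the messages above; the variable and function names are illustrative.

```python
from statistics import mean

# Seconds per verification at 100,000 iterations, as reported in the thread.
TIMINGS = {
    "2.7.7": [0.3050, 0.3096, 0.3064, 0.3162, 0.3054],
    "2.7.12": [0.0884, 0.0854, 0.1034, 0.1119, 0.0949],
    "3.5.2": [0.0876, 0.0857, 0.0872, 0.0847, 0.0874],
    "3.6.0": [0.0861, 0.0789, 0.0803, 0.0779, 0.0815],
}


def slowdown_of(baseline, timings):
    """Return how many times slower `baseline` is than each other version."""
    base = mean(timings[baseline])
    return {
        version: base / mean(runs)
        for version, runs in timings.items()
        if version != baseline
    }


if __name__ == "__main__":
    for version, factor in sorted(slowdown_of("2.7.7", TIMINGS).items()):
        print("2.7.7 is %.2fx slower than %s" % (factor, version))
```

On these figures 2.7.7 comes out roughly 3.2x to 3.8x slower than the newer interpreters.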

-Joey Wilhelm

Martin Koistinen

Jan 4, 2017, 2:13:09 PM
to Django developers (Contributions to Django itself)
I think this is a pretty solid guess. Bear in mind this was a direct install from Python.org.

The important thing here is, this demonstrates that we cannot just assume that all Python 3 installs have a "fast" PBKDF2 implementation =/

On Wednesday, January 4, 2017 at 11:33:17 AM UTC-5, Tobias McNulty wrote:
... 

Alex Gaynor

Jan 4, 2017, 2:17:54 PM
to django-d...@googlegroups.com
If anyone is curious about the breakdown of versions, I used the following query:

SELECT
  REGEXP_EXTRACT(details.python, r"^([^\.]+\.[^\.]+\.[^\.]+)") as python_version,
  COUNT(*) as download_count,
FROM
  TABLE_DATE_RANGE(
    [the-psf:pypi.downloads],
    DATE_ADD(CURRENT_TIMESTAMP(), -2, "week"),
    CURRENT_TIMESTAMP()
  )
WHERE
  LOWER(file.project) = 'django'
GROUP BY
  python_version,
ORDER BY
  download_count DESC
LIMIT 100


And got these for the top 9:

Row  python_version  download_count
1    3.5.2           75888
2    2.7.12          65879
3    2.7.6           63925
4    null            56744
5    2.7.9           40378
6    2.7.10          25213
7    3.4.3           23223
8    2.7.13          20657
9    2.7.5           17256

(That's an HTML table, no clue what happens if you use a plaintext email client)

Alex


Martin Koistinen

Jan 5, 2017, 10:58:38 AM
to Django developers (Contributions to Django itself)
Slightly off-topic, this presents a really nice case for switching to Argon2 via argon2_cffi (supported in Django 1.10+). It's super fast (C-lib) and resistant to GPU/ASIC brute-forcing. So, whereas an attacker's 8-GPU hashing machine would probably have something on the order of 24,000X more hashing capability for SHA256 than a typical Django server, I estimate that the same hardware (8 GPUs) would only have about 20-30X more Argon2 hashing capability than a typical server. (Note, the anecdotal evidence across the internet supporting this is pretty thin.)
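
For anyone wanting to try this, a minimal settings sketch follows. It assumes the argon2_cffi package is installed and lists only a subset of the available hashers. Keeping the PBKDF2 entries after Argon2 lets existing hashes still verify, so they can be upgraded when users next log in.

```python
# settings.py fragment (sketch): the first entry is used for new hashes,
# the remaining entries are only used to verify existing ones.
PASSWORD_HASHERS = [
    "django.contrib.auth.hashers.Argon2PasswordHasher",
    "django.contrib.auth.hashers.PBKDF2PasswordHasher",
    "django.contrib.auth.hashers.PBKDF2SHA1PasswordHasher",
]
```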

Tim Graham

Jan 9, 2017, 7:39:18 PM
to Django developers (Contributions to Django itself)
About "we cannot just assume that all Python 3 installs have a "fast" PBKDF2 implementation" -- I'd expect very few if any Django users to be compiling their own Python and doing so without OpenSSL. I'm guessing that any operating system Python will have the OpenSSL bindings. Or is that a bad assumption?

Alex Gaynor

Jan 9, 2017, 7:47:19 PM
to django-d...@googlegroups.com
That's a correct assumption -- you won't be able to use pip without OpenSSL.

Alex


Martin Koistinen

Jan 9, 2017, 10:44:33 PM
to Django developers (Contributions to Django itself)
The Python 3.5 on my system was installed by the official Python installer, and is almost 3X slower than the Apple-built 2.7 install. I use pip all day long.

True, my MacBook is not a server, but it still serves to demonstrate the point that it is not a reasonable assumption that all 3.5 installs use OpenSSL libraries.

Tobias McNulty

Jan 10, 2017, 1:27:19 PM
to django-developers
IMO this doesn't change the argument that it would be best to default to the higher number of iterations (i.e., 100k or higher, given some time has passed since 2013), while noting in the documentation that individual projects have the ability to reduce it if need be (though perhaps recommending that they try first to find a faster Python). Other thoughts?


Tim Graham

Jan 11, 2017, 10:39:26 AM
to Django developers (Contributions to Django itself)
I agree. The question in my mind is how to pick an appropriate number of iterations so that we don't risk causing a DoS on (at least most) existing sites due to increased CPU usage. Or at least, can we offer some suggestions about how to tell if your site receives sufficient traffic that you might be impacted? Did anyone notice increased CPU usage in past upgrades?



Tobias McNulty

Jan 15, 2017, 5:45:02 PM
to django-developers
I'm not sure the DoS concern is really something that can be addressed here. Regardless of the number of iterations we choose, POSTing to the login form will always be a target, unless it's appropriately protected (i.e., with some combination of rate limiting, reCAPTCHA, and/or something at the network level). A run-of-the-mill cloud server that doesn't limit access to the Python app in some way is simply never going to be a match for a malicious person with a laptop, let alone a more sophisticated attack.

I created a tox.ini to run Martin's benchmark with multiple Django & Python versions. A couple notes:
  • I ran this several times on Circle CI using Ubuntu 12.04 with Python 2.7.7, 3.3.3, 3.4.3, and 3.5.0, and Ubuntu 14.04 with 2.7.12, 3.3.6, 3.4.4, and 3.5.2. To view the results, expand the "tox" section under the "Test" header.
  • All results are what one would expect: Python 2.7.7 and Python 3.3.x are ~3-4x slower than Python 2.7.8+ and Python 3.4+, and there are no inexplicably slow outliers, like the official Python 3.5.2 installer for OS X.
My local results are as follows:
  • Ubuntu 16.04 w/a Core i5 @ 3.50GHz:
    • 62-65ms for 100,000 iterations
    • 100-106ms for 165,000 iterations
  • Mac OS 10.12, Core i5 @ 2.7GHz:
    • 117-120ms for 100,000 iterations
    • 195-203ms for 165,000 iterations
I really don't know how we can pick a number that'll work for everyone, but I'm all for setting it high and allowing people to decrease the number of iterations or, better yet, switch to the hasher that the docs recommend everyone use anyway (Argon2). If we define 100-120ms as acceptable performance, 100k would seem reasonable based on the results above and posted elsewhere in this thread.
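For anyone who wants comparable numbers for their own machine, here is a minimal timing sketch. It is not Martin's benchmark: it assumes plain `hashlib.pbkdf2_hmac` with SHA-256, and the helper names `time_pbkdf2` and `iterations_for_budget` are made up for illustration. It also assumes the cost scales linearly with the iteration count.

```python
import hashlib
import os
import time


def time_pbkdf2(iterations, repeats=3):
    """Return the best wall-clock time (seconds) for one PBKDF2-HMAC-SHA256 call."""
    password = b"correct horse battery staple"  # arbitrary test input
    salt = os.urandom(16)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.pbkdf2_hmac("sha256", password, salt, iterations)
        best = min(best, time.perf_counter() - start)
    return best


def iterations_for_budget(target_seconds, probe_iterations=20_000):
    """Estimate how many iterations fit in target_seconds, assuming linear cost."""
    per_iteration = time_pbkdf2(probe_iterations) / probe_iterations
    return int(target_seconds / per_iteration)


if __name__ == "__main__":
    for n in (100_000, 165_000):
        print(f"{n:>7,} iterations: {time_pbkdf2(n) * 1000:.0f} ms")
    print("iterations that fit in 100 ms:", iterations_for_budget(0.100))
```

Taking the best of a few runs rather than the mean is a deliberate choice here, since it filters out noise from other processes on the machine.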

Martin, FWIW, I can confirm that the Python 3.5.2 installer from python.org demonstrates the same 3x slower behavior on my Mac that you saw. The Python 3.5.2 I installed from Homebrew does not, nor does the official python.org installer for Python 3.6. Based on the absence of any similar outliers in the above tests, however, I still think the conclusion here should be to fix the underlying Python build (if it's really creating a performance issue for you or anyone else), not hold back Django from bumping its default number of PBKDF2 iterations. Dropping Python 2.7 support still means we lose a large swath of definitely-slow PBKDF2 implementations: 24.4% of installs where the Python version was known were using 2.7.5 or 2.7.6 in the chart Alex posted.
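A quick way to see whether a given interpreter has the fast path is sketched below. It assumes CPython, where `hashlib` re-exports `pbkdf2_hmac` from the OpenSSL-backed `_hashlib` module when that function is available; the helper name `has_openssl_pbkdf2` is made up for illustration. Whether a missing fast path actually explains the slow 3.5.2 installer is a guess, not something confirmed in this thread.

```python
import hashlib
import sys


def has_openssl_pbkdf2():
    """Return True if hashlib.pbkdf2_hmac is the C function from _hashlib."""
    try:
        import _hashlib  # CPython's OpenSSL wrapper; missing if built without OpenSSL
    except ImportError:
        return False
    fast = getattr(_hashlib, "pbkdf2_hmac", None)
    # hashlib imports the same function object when the OpenSSL version exists.
    return fast is not None and getattr(hashlib, "pbkdf2_hmac", None) is fast


if __name__ == "__main__":
    print(sys.version)
    print("OpenSSL-backed pbkdf2_hmac:", has_openssl_pbkdf2())
```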

The point about switching Django's default to Argon2 is an intriguing one. In the event there are still a bunch of slow PBKDF2 implementations out there with Python 3.5+, one benefit of dramatically increasing PBKDF2 iterations is that it might push more people to Argon2. :-D On a more serious note, I'll reply separately to that thread to save this one for the original topic.

Tobias


Martin Koistinen

Jan 16, 2017, 12:55:25 PM
to Django developers (Contributions to Django itself)
Tobias,

Thanks for the comprehensive benchmarking and summary of the situation! I agree on all points, but I'd like to add that we should err on the side of high iterations, for the simple fact that most developers would sooner accept the risk of a DoS than the risk of compromised user accounts.

Also, if a developer is experienced/motivated enough to lower the hash iterations, s/he'll be more likely to also be experienced/motivated enough to put other controls in place to compensate.


Best,
- Martin

Tim Graham

Jan 18, 2017, 10:25:46 AM
to Django developers (Contributions to Django itself)
I increased the iterations to 100,000 on master (targeting Django 2.0). It would be nice to settle on a guideline for determining future increases.

Martin Koistinen

Jan 18, 2017, 12:32:55 PM
to Django developers (Contributions to Django itself)
Tim, I've sent you a model I've assembled recently for your review. I'll work towards making it more user-friendly (i.e., NOT in Apple Numbers format) and share it here for the whole community.

But for here and now, I would at the very least assume that the cost of a brute-force attack on password hashes falls over time at the rate implied by Moore's Law. It's a naive approach, but a fairly reasonable one. So, to compensate, we should be doubling the number of iterations every 18-24 months, or perhaps at a minimum, raising them by a factor of sqrt(2) annually, i.e. about +40% each year.

+40%/year assumes that the starting point of 100,000 is OK for Q1 2017 (this will not be true for every project) and that Moore's Law is evenly "applied" over time (it's not).
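To make the arithmetic concrete, here is a small sketch of the sqrt(2)-per-year rule. The function names are made up for illustration, and the 8-month release interval is an assumption taken from the figure mentioned earlier in the thread.

```python
import math


def projected_iterations(base, years, annual_growth=math.sqrt(2)):
    """Iteration count after `years` of compounding at `annual_growth` per year."""
    return round(base * annual_growth ** years)


def per_release_factor(annual_growth=math.sqrt(2), months_per_release=8):
    """Convert an annual growth factor into the equivalent per-release factor."""
    return annual_growth ** (months_per_release / 12)


def annualized_factor(release_growth=1.20, months_per_release=8):
    """Convert a per-release growth factor into the equivalent annual factor."""
    return release_growth ** (12 / months_per_release)


if __name__ == "__main__":
    for years in range(5):
        print(2017 + years, projected_iterations(100_000, years))
    print(f"sqrt(2)/year as a per-release bump: {per_release_factor() - 1:+.0%}")
    print(f"20%/release as an annual bump:      {annualized_factor() - 1:+.0%}")
```

Expressing both rules on the same time base makes it easier to compare the current "20% per release" guideline with the +40%/year floor.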

Martin Koistinen

Jan 19, 2017, 1:19:57 PM
to Django developers (Contributions to Django itself)
All, I've converted my worksheet into a Google Docs Sheet here: https://docs.google.com/spreadsheets/d/16_KdYAW03sb86-w_AFFnM79IaTWQ7Ugx4T0VMfGteTM/edit?usp=sharing

Note that it isn't really editable here. You'll need to make a copy into your own account or download into a local spreadsheet to tweak for your system and security policy.

Comments and suggestions are welcome and if appropriate, I'll make edits accordingly.

Martin Koistinen

Jan 24, 2017, 7:52:35 PM
to Django developers (Contributions to Django itself)
Updated the sheet with more recent GPU pricing.

Patryk Zawadzki

Jan 30, 2017, 3:28:28 AM
to Django developers (Contributions to Django itself)
On Monday, January 16, 2017 at 6:55:25 PM UTC+1, Martin Koistinen wrote:
Also, if a developer is experienced/motivated enough to lower the hash iterations, s/he'll be more likely to also be experienced/motivated enough to put other controls in place to compensate.

Or they just copy-pasted from an out-of-date response to a Stack Overflow question. I'm concerned that if someone is motivated enough to do it they may not get the memo to bump it by 40% every year even if they are experienced and competent.

Martin Koistinen

Jan 30, 2017, 2:09:56 PM
to Django developers (Contributions to Django itself)
IMPORTANT NOTICE: I've just made an important change to the Google Docs Sheet here: https://docs.google.com/spreadsheets/d/16_KdYAW03sb86-w_AFFnM79IaTWQ7Ugx4T0VMfGteTM/edit?usp=sharing

Realizing that most security policies make requirements such as "At least 1 character must be a numeral", etc. for other character classes, I've adjusted this sheet to take this into account along with the resulting reduction of password strength that comes with it. I do recognize that these symbol-requirements policies are there to force people to choose passwords that use a broader set of symbols which has the desired effect of raising password strength, but the actual, theoretical maximum entropy of the resulting passwords is significantly lowered as a result.

As a result, an 8-character password formed with at least 1 of each of these sets:
  • numerals (10);
  • lower-case letters (26);
  • upper-case letters (26);
  • and punctuation symbols (10-ish);
will offer at most 40.7 bits of entropy.
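One way to arrive at a figure of about 40.7 bits is sketched below. This is an assumption about how the sheet counts, not something stated in this post: four characters are each drawn from one required class, and the remaining four from the full 72-symbol alphabet. Note that this count fixes which position holds each required class, so it does not include the different arrangements of those positions. The unconstrained figure is shown for comparison.

```python
import math

# Class sizes as listed above; "10-ish" punctuation symbols is taken as exactly 10.
CLASS_SIZES = {"numerals": 10, "lowercase": 26, "uppercase": 26, "punctuation": 10}
ALPHABET = sum(CLASS_SIZES.values())  # 72 symbols in total
LENGTH = 8


def constrained_bits():
    """Bits when one character per class is forced and the rest are free."""
    required = math.prod(CLASS_SIZES.values())
    free = ALPHABET ** (LENGTH - len(CLASS_SIZES))
    return math.log2(required * free)


def unconstrained_bits():
    """Bits when all positions are drawn freely from the full alphabet."""
    return LENGTH * math.log2(ALPHABET)


if __name__ == "__main__":
    print(f"constrained:   {constrained_bits():.1f} bits")
    print(f"unconstrained: {unconstrained_bits():.1f} bits")
```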

Passwords of this strength, when used on a system that uses 30,000 iterations of PBKDF2, will be quickly and easily cracked by virtually any serious attacker. 100,000 iterations isn't really any better.
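A rough sketch of why the jump from 30,000 to 100,000 changes so little: the attacker's cost grows only linearly with the iteration count, which works out to fewer than two extra bits of effective strength. The attacker throughput below is a placeholder and not a measured figure; the function names are made up for illustration.

```python
import math

# Placeholder only: HMAC evaluations per second available to a hypothetical attacker.
HYPOTHETICAL_HMACS_PER_SECOND = 1e9


def expected_crack_seconds(entropy_bits, iterations,
                           hmacs_per_second=HYPOTHETICAL_HMACS_PER_SECOND):
    """Expected time to find one password, searching half the keyspace on average."""
    guesses_per_second = hmacs_per_second / iterations
    expected_guesses = 2 ** entropy_bits / 2
    return expected_guesses / guesses_per_second


def equivalent_extra_bits(old_iterations, new_iterations):
    """Extra bits of password entropy that the iteration increase is worth."""
    return math.log2(new_iterations / old_iterations)


if __name__ == "__main__":
    for n in (30_000, 100_000):
        days = expected_crack_seconds(40.7, n) / 86_400
        print(f"{n:>7,} iterations: {days:,.1f} days at the placeholder rate")
    print(f"extra bits gained: {equivalent_extra_bits(30_000, 100_000):.2f}")
```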

Martin Koistinen

Feb 12, 2017, 1:12:47 PM
to Django developers (Contributions to Django itself)
If anyone is still following this thread... =)

I've just updated the Google sheet above with significant changes. I was using the wrong values for PBKDF2-HMAC-SHA256 hash performance. I now have up-to-date hardware costs and new evidence in play. It's definitely worth having a look at the latest version. The upside is that PBKDF2 is significantly better than was previously calculated.

Enjoy!

Tim Graham

Sep 21, 2017, 2:39:38 PM
to Django developers (Contributions to Django itself)
It's time to decide how much to bump the iterations for Django 2.1 -- anyone care to make a proposal? My understanding is that we should revisit the current "bump by 20% each release" guideline in Django's release checklist. Django 2.0 uses 100,000 iterations.