Right now, Django's auth system pretty much uses sha1 hardwired in (literally, in the case of User.set_password) for the hash. For a discussion of why a general-purpose hash function is not the best idea in the world for password encryption, see:
http://codahale.com/how-to-safely-store-a-password/
I'd like to propose a backwards-compatible method of allowing different hash algorithms to be used, while not adding new dependencies on external libraries to the core.
1. Add a setting DEFAULT_PASSWORD_HASH. This contains the code for the algorithm to use; if it is absent, 'sha1' is assumed.
2. Add a setting PASSWORD_HASH_FUNCTIONS. This is a map of algorithm codes to callables; each callable takes the same parameters as auth.models.get_hexdigest and returns the hex digest of its parameters (the algorithm parameter to get_hexdigest is retained so that a single function can handle multiple algorithms). For example:
PASSWORD_HASH_FUNCTIONS = { 'bcrypt': 'myproject.myapp.bcrypt_hex_digest' }
3. auth.models.get_hexdigest is modified such that if the algorithm isn't one of the ones it knows about, it consults PASSWORD_HASH_FUNCTIONS and uses the matching function, if present. If there's no match, it fails as it does currently.
4. User.set_password() is modified to check the value of DEFAULT_PASSWORD_HASH and use that algorithm if specified; otherwise, it uses 'sha1' as it does now. (Optional: add the algorithm as a default parameter to User.set_password().)
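A minimal sketch of how steps 1 to 4 might fit together, written as standalone Python 3 rather than against Django itself. The names PASSWORD_HASH_FUNCTIONS and DEFAULT_PASSWORD_HASH come from the proposal; the `settings` dict, `demo_sha256_digest`, `make_password`, and the 'algorithm$salt$hexdigest' storage layout are assumptions made for illustration, and the registry holds callables directly instead of dotted paths to keep the example self-contained.

```python
import hashlib
import os


def demo_sha256_digest(algorithm, salt, raw_password):
    """Hypothetical plug-in hash taking the same parameters as get_hexdigest."""
    return hashlib.sha256((salt + raw_password).encode("utf-8")).hexdigest()


# Stand-in for the two proposed settings (assumption: a plain dict, not django.conf).
settings = {
    "DEFAULT_PASSWORD_HASH": "sha256_demo",
    "PASSWORD_HASH_FUNCTIONS": {"sha256_demo": demo_sha256_digest},
}


def get_hexdigest(algorithm, salt, raw_password):
    """Try the built-in algorithms first, then the registry, else fail."""
    data = (salt + raw_password).encode("utf-8")
    if algorithm == "md5":
        return hashlib.md5(data).hexdigest()
    if algorithm == "sha1":
        return hashlib.sha1(data).hexdigest()
    func = settings.get("PASSWORD_HASH_FUNCTIONS", {}).get(algorithm)
    if func is None:
        raise ValueError("Unknown password algorithm: %r" % algorithm)
    return func(algorithm, salt, raw_password)


def make_password(raw_password, algorithm=None):
    """What set_password() might store (assumed 'algorithm$salt$hexdigest' layout)."""
    algorithm = algorithm or settings.get("DEFAULT_PASSWORD_HASH", "sha1")
    salt = os.urandom(8).hex()
    digest = get_hexdigest(algorithm, salt, raw_password)
    return "%s$%s$%s" % (algorithm, salt, digest)
```

Because the algorithm code is stored alongside each hash, existing 'sha1' entries would keep verifying after the default changes.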
Comments?
--
-- Christophe Pettus
x...@thebuild.com
Completely leaving aside the potential benefit of allowing different
hash algorithms, I think the specific argument made by the author of
that article, along with their proposed solution of an "intentionally
slow" algorithm, is the wrong approach. Your application ends up just
as hobbled by such an algorithm as a potential attacker. If you're
choosing a slowdown factor based on your worst-case attacker, you're
likely to significantly impair the ability of a website running on
hardware that's not as fast, especially if you're authenticating users
all the time.
I think there are better potential solutions to concerns about
password cracking. Django already salts the hashes, which is
asymmetrical in a good way: it helps complicate brute force attacks
without slowing down Django's ability to test a given password.
Better password policies can also help; e.g., each additional letter
you require in your users' passwords exponentially increases the space
of passwords that need to be brute-forced. In cases where your
attacker doesn't have direct access to the database, you can greatly
slow them down by only allowing a certain amount of login attempts in
a given time period.
> Your application ends up just
> as hobbled by such an algorithm as a potential attacker.
Actually, no, the situations are really quite asymmetrical. In order to brute-force a password, an attacker has to be able to try many, many thousands of combinations per second. To log in a user, an application has to do it exactly once. A hash computation time of, say, 10ms is probably unnoticeable in a login situation, unless you have tens of thousands of users logging in per minute (and if this is the case, then you probably have other problems than the speed of your password hash algorithm). But that would pretty much slam the door down on any brute force attempt at a password recovery.
> Django already salts the hashes, which is
> asymmetrical in a good way: it helps complicate brute force attacks
> without slowing down Django's ability to test a given password.
A salt is of no benefit on a brute force attack; its function is to prevent dictionary attacks, which are a different animal.
And if you are willing to assume that no attacker can ever get access to your database, then you don't have to hash the password at all.
But, as you point out, that's a separate discussion from the value of pluggable encryption algorithms. There was a time when MD5 was the perfect answer; now, it's SHA-1. Different applications will have different needs as far as how they write the passwords to disk, and having an architecture to handle this seems like a good idea.
But how far are you willing to go in your assumption of the worst-case
computational ability of your attacker? Would tuning the hash to
(say) a 10ms delay for your web server's modest hardware translate
into a significant delay for an attacker with far more resources?
(This isn't a rhetorical question; I honestly don't know.)
> A salt is of no benefit on a brute force attack; its function is to prevent dictionary attacks, which are a different animal.
It does in fact slow down brute force attacks against multiple
encrypted passwords; each password with a different salt is within an
entirely different space that needs to be brute forced separately from
the other passwords.
> And if you are willing to assume that no attacker can ever get access to your database, then you don't have to hash the password at all.
Sure, but my point was that there are various walls you can throw up
against attackers to slow them down that don't involve slowing down
your hash algorithm.
> But, as you point out, that's a separate discussion from the value of pluggable encryption algorithms.
Right; I didn't mean to dissent from (or concur with) that proposal.
Let's do the math. The space of eight alphanumeric character passwords is 2.8e12. Even assuming you can cut two orders of magnitude off of that with good assumptions about the kind of passwords that people are picking, this means that the attacker has to run about 28 billion times more computations than you do. At 10ms per password, it would take them about 447.8 years to crack a single password, assuming hardware of equivalent speed.
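The arithmetic can be reproduced with a few lines of Python. This is a sketch under stated assumptions: a 36-character alphabet (case-insensitive letters plus digits, which is what gives 2.8e12 for eight characters), 10 ms per guess, and an attacker whose hardware is exactly as fast as the server's.

```python
ALPHABET_SIZE = 36          # assumption: case-insensitive letters plus digits
PASSWORD_LENGTH = 8
SECONDS_PER_GUESS = 0.010   # 10 ms per hash, as in the message above
SECONDS_PER_YEAR = 365.25 * 24 * 3600

keyspace = ALPHABET_SIZE ** PASSWORD_LENGTH   # about 2.8e12 candidates
reduced = keyspace / 100                      # two orders of magnitude fewer


def years_to_search(candidates, fraction=1.0):
    """Years needed to hash `fraction` of `candidates` at one guess per 10 ms."""
    return candidates * fraction * SECONDS_PER_GUESS / SECONDS_PER_YEAR


print("full space, exhaustive:    %.1f years" % years_to_search(keyspace))
print("full space, on average:    %.1f years" % years_to_search(keyspace, 0.5))
print("reduced space, exhaustive: %.1f years" % years_to_search(reduced))
```

Under these assumptions the three figures come out at roughly 894, 447 and 9 years. The average-case figure for the full space is the one closest to the number quoted above, so that number appears to leave out the two-orders-of-magnitude reduction.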
> It does in fact slow down brute force attacks against multiple
> encrypted passwords; each password with a different salt is within an
> entirely different space that needs to be brute forced separately from
> the other passwords.
Remember how a brute force attack works. Given a hash x, the attacker does:
hash('00000000' + salt) = x? No, then,
hash('00000001' + salt) = x? No, then,
...
The only benefit of the salt here is that it makes the string to be hashed a bit longer, but the benefit is linear, not exponential.
A dictionary attack works by consulting a precomputed set of passwords and their hashes, (pwd, hash(pwd)). The attacker then runs down the dictionary, comparing hashes; if they get a hit, they know the password. The salt defeats this by making the pwd -> hash(pwd) mapping incorrect.
I'm being slightly inaccurate here; what I'm describing above is a rainbow dictionary attack, rather than just a plain dictionary attack (which is a brute force attempt on the password over a limited range of input values). Anyway, a salt isn't helpful for a plain dictionary attack, either, for the same reason as a brute force attack.
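The distinction can be sketched in Python 3. The three-word password list, the salt value and the `sha1_hex` helper are made up for illustration; the password + salt ordering follows the pseudocode above.

```python
import hashlib


def sha1_hex(text):
    """Hex SHA-1 digest of a text string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Table of (hash -> password), computed once before any database is stolen.
common_passwords = ["123456", "password", "letmein"]
precomputed = {sha1_hex(pwd): pwd for pwd in common_passwords}

# Unsalted storage: the stolen hash is a direct key into the table.
unsalted_hash = sha1_hex("letmein")

# Salted storage: the table no longer matches, because it was built without the salt.
salt = "a1b2c"
salted_hash = sha1_hex("letmein" + salt)


def guess_with_salt(stored_hash, salt, candidates):
    """Without a usable table, hash every candidate together with the known salt."""
    for pwd in candidates:
        if sha1_hex(pwd + salt) == stored_hash:
            return pwd
    return None


print(precomputed.get(unsalted_hash))   # lookup succeeds
print(precomputed.get(salted_hash))     # lookup fails
print(guess_with_salt(salted_hash, salt, common_passwords))
```

The salted hash still falls to the per-candidate loop, which is the point being made: the salt removes the shortcut, not the brute-force search itself.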
Anyway, back to the discussion of the actual proposal. :)
The point is that I'm *not* assuming hardware of equivalent speed.
I'm assuming that a worst-case attacker has hardware significantly
faster than your webserver at their disposal, so I was curious if the
purported benefit still held in that case. Maybe it does; I don't
know.
>> It does in fact slow down brute force attacks against multiple
>> encrypted passwords; each password with a different salt is within an
>> entirely different space that needs to be brute forced separately from
>> the other passwords.
>
> Remember how a brute force attack works. Given a hash x, the attacker does:
>
> hash('00000000' + salt) = x? No, then,
> hash('00000001' + salt) = x? No, then,
> ...
>
> The only benefit of the salt here is that it makes the string to be hashed a bit longer, but the benefit is linear, not exponential.
I'm not arguing that a salt helps against brute-forcing a *single*
password (it doesn't), but it does in fact help against someone trying
to brute-force your entire password database (or any subset of more
than one password), since each password with a different salt lies
within an entirely different space that must be brute-forced
separately from the rest.
> Anyway, back to the discussion of the actual proposal. :)
Sure, I didn't mean to veer things too far off course here; even
assuming the bcrypt argument doesn't hold, it's entirely possible that
someone may want to easily plug in SHA512/SHA3/whatever into their
password encryption.
Well, yes, it does, for exactly the reason described: The application has to encode exactly one password; the attacker has to try billions in order to brute-force one. If you assume, say, one password per week is the slowest practical attack, and if it takes 10ms to hash one password, the attacker's hardware has to be about 46,654 times more powerful than your web server.
> I'm not arguing that a salt helps against brute-forcing a *single*
> password (it doesn't), but it does in fact help against someone trying
> to brute-force your entire password database (or any subset of more
> than one password), since each password with a different salt lies
> within an entirely different space that must be brute-forced
> separately from the rest.
I'm not sure what you mean by the "space"; I think you are thinking of a rainbow dictionary attack, where the hashes are precomputed; a salt does indeed help against (and probably blocks) that kind of attack. In the case of a straight brute-force attack or a standard dictionary attack without precomputing, the only benefit of the salt is that it makes computing each candidate hash take a bit longer, based on the length of the salt. It's a trivial amount of time.
Remember, it's extremely inexpensive to brute-force a single MD5 or SHA1 hash, and the salt does not make it appreciably more expensive. If a CUDA application can brute force 700 million MD5s per second, doubling the length is not really going to make it any more secure.
No, I'm not thinking of rainbow tables. The key word here is
*single*. As I said before, a salt *does* help against an attacker
trying to brute-force multiple passwords from your database, since he
can't simply test each brute-force result against all your passwords
at once; he has to start all over from scratch for every single
password that has a different salt. If he only cares about one
*particular* account, the salt doesn't help, no.
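A small Python 3 sketch of the multi-password argument, counting how many hashes an attacker computes in each case. The three-digit passwords, the salts and the function names are all hypothetical, chosen only to keep the search space tiny.

```python
import hashlib


def sha1_hex(text):
    """Hex SHA-1 digest of a text string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def crack_unsalted(stored_hashes, candidates):
    """Each candidate is hashed once and compared against every stored hash."""
    targets = set(stored_hashes)
    found, hashes_computed = {}, 0
    for pwd in candidates:
        digest = sha1_hex(pwd)
        hashes_computed += 1
        if digest in targets:
            found[digest] = pwd
    return found, hashes_computed


def crack_salted(stored, candidates):
    """Each (salt, hash) pair needs its own pass over the candidates."""
    found, hashes_computed = {}, 0
    for salt, digest in stored:
        for pwd in candidates:
            hashes_computed += 1
            if sha1_hex(pwd + salt) == digest:
                found[(salt, digest)] = pwd
                break
    return found, hashes_computed


candidates = ["%03d" % n for n in range(1000)]   # toy search space: 000..999
passwords = ["007", "123", "999"]
salts = ["aa", "bb", "cc"]
unsalted_db = [sha1_hex(p) for p in passwords]
salted_db = [(s, sha1_hex(p + s)) for p, s in zip(passwords, salts)]

found_unsalted, n_unsalted = crack_unsalted(unsalted_db, candidates)
found_salted, n_salted = crack_salted(salted_db, candidates)
print(n_unsalted, n_salted)
```

With distinct salts the work grows with the number of accounts, while the unsalted search is done once regardless of how many hashes were stolen.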
But regardless, I apologize for derailing this conversation so far off.
Even in your scenario, it only helps as much as the entropy in the password selection. If everyone has a unique password, it doesn't help at all (admittedly unlikely). Again, it's a linear benefit, but not an exponential one.
Right. So, about that proposal... :)
As far as the proposal goes, I think this is a perfectly reasonable
feature request (and you should open a ticket about it if one does not
already exist).
I'd favor a solution where your setting mapped the algo name to the
actual function used:
PASSWORD_HASH_FUNCTIONS = {
    'bcrypt': myproject.myapp.bcrypt_hexdigest,
    'sha1': django.utils.hashcompat.sha_constructor,
    # etc.
}
Then we could put the existing hash functions (sha1, md5, etc.) in
that setting as the default, and get rid of the algo-checking code
that currently lives in auth.models. When we do a password comparison,
we simply pull the hash name, lookup the function, and away we go.
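A rough Python 3 sketch of that lookup, outside Django. It assumes every registered function shares one hypothetical signature, (salt, raw_password) returning a hex digest, and that stored values look like 'algorithm$salt$hexdigest'; neither detail is settled in this thread.

```python
import hashlib
import hmac


def md5_hexdigest(salt, raw_password):
    return hashlib.md5((salt + raw_password).encode("utf-8")).hexdigest()


def sha1_hexdigest(salt, raw_password):
    return hashlib.sha1((salt + raw_password).encode("utf-8")).hexdigest()


# The existing algorithms live in the registry as defaults; projects add their own.
PASSWORD_HASH_FUNCTIONS = {
    "md5": md5_hexdigest,
    "sha1": sha1_hexdigest,
}


def check_password(raw_password, stored):
    """Pull the hash name from the stored value, look up the function, compare."""
    algorithm, salt, digest = stored.split("$")
    try:
        func = PASSWORD_HASH_FUNCTIONS[algorithm]
    except KeyError:
        raise ValueError("No hash function registered for %r" % algorithm)
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(func(salt, raw_password), digest)


stored = "sha1$ab12c$" + sha1_hexdigest("ab12c", "secret")
print(check_password("secret", stored))
```

With this shape there is no per-algorithm branching left in the comparison code; adding an algorithm means adding one dictionary entry.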
I don't think this will make it into 1.3, but it's a reasonable thing
to do and I think it would help improve all the special-case code that
currently lives in auth.models. The patch itself wouldn't be too hard,
and I'd be willing to write it myself if nobody else will.
-Paul
First comment is that Django already has a pluggable authentication
stack, which already allows for this - simply define a new auth
backend that tests the password in the manner you wish.
It doesn't allow for this with the default authenticator, but it is
doable. I have a django project with >100k users, and none of them
have a sha1 hash as their password.
Cheers
Tom
> First comment is that Django already has a pluggable authentication
> stack, which already allows for this - simply define a new auth
> backend that tests the password in the manner you wish.
My understanding of the pluggable authentication system is that it's
for situations where you need a totally different authentication
mechanism, such as LDAP. Simply replacing the crypto mechanism for the
default authentication system should not require developing a lot of
pieces. It is something that needs to be upgraded on an ongoing basis
for everyone. It's simply best practices.
The federal government already forbids use of SHA-1 after 2010.
> It doesn't allow for this with the default authenticator, but it is
> doable. I have a django project with >100k users, and none of them
> have a sha1 hash as their password.
I won't comment on the wisdom of this, but I'd not use it as an
example of why we don't need to provide flexibility to improve
security.
Chris
--
| Chris Petrilli
| petr...@amber.org
It doesn't 'require developing a lot of pieces'. Have you even tried
implementing this in the current stack?
At the moment, a typical setup has AUTHENTICATION_BACKENDS set to
('django.contrib.auth.backends.ModelBackend',). Changing how passwords
are tested simply requires a different backend, typically derived from
ModelBackend, that overrides the authenticate method.
Is that a lot of pieces, or one small one?
>
> The federal government already forbids use of SHA-1 after 2010.
>
>> It doesn't allow for this with the default authenticator, but it is
>> doable. I have a django project with >100k users, and none of them
>> have a sha1 hash as their password.
>
> I won't comment on the wisdom of this, but I'd not use it as an
> example of why we don't need to provide flexibility to improve
> security.
>
> Chris
Wow, that's a thing to say. Your federal government forbids SHA-1, I
don't use SHA-1, but you "won't comment on the wisdom of this"? Let's
try to keep it civil without casting FUD and aspersions around, eh.
We already have flexibility to implement security in any manner that
you can think of. I'm looking for the argument that says 'This current
flexibility is not enough, and we need to re-architect', and I
don't think that has been made.
Cheers
Tom
--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-d...@googlegroups.com.
To unsubscribe from this group, send email to django-develop...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
In a nutshell, if something requires python >= 2.5 or a lib for older
versions of Python, forget about adding it.
See, e.g., http://code.djangoproject.com/ticket/5600 which was closed as
a no-fix 3 years ago (full disclosure: I'm coh in that bug report).
There was also a discussion on this mailing list a few weeks ago about
increasing the salt length, but afaik it had no code-change as a result.
I apologize if I sound a bit grumpy, but I've spent the last 5 days
monkey-patching a local branch of the auth lib up to the latest in
security (SHA512, 128-bit salt, pre-stretching, pbkdf2, stronger random
token generation (salt, csrf, default-password)), and now it spreads into
other areas of the django-lib as well (currently SECRET_KEY in the
startproject script).
Of course I would very much welcome such a proposal, yet I just believe
the odds for it to happen are (very) low.
Cheers,
coh
The idea of a salt is only to make certain that two users who happen to
use the same password (123456, anyone?) don't end up with the same hash,
which makes pre-computation (password lists or rainbow tables)
infeasible. Yet given the short salts in Django, it's not really
unlikely that two users will share the same salt as well as the same
password. Also keep in mind that, due to the birthday paradox, a hash
with N bits only has odds of 1:2^(N/2) instead of 1:2^N for a collision
to occur.
Hope that clears up things a little bit :)
coh
That's a subject which comes up every few months, sadly.
In a nutshell, if something requires python >= 2.5 or a lib for older
versions of Python, forget about adding it.
Peter
[1]
http://stacksmashing.net/2010/11/15/cracking-in-the-cloud-amazons-new-ec2-gpu-instances/
Yet the patch that only increases the salt size, added not 24 hours
after that, still hasn't made its way into any release as far as I'm
aware.
Given the current 20-bit length (5 hex chars), salt-collisions will happen.
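How likely such collisions are can be estimated with the usual birthday-bound approximation. The sketch below assumes salts are drawn uniformly at random; the user counts are arbitrary examples, not figures from this thread.

```python
import math


def collision_probability(num_users, salt_bits):
    """Approximate chance that at least two of num_users share a salt."""
    space = 2 ** salt_bits
    return 1.0 - math.exp(-num_users * (num_users - 1) / (2.0 * space))


for users in (100, 1000, 10000, 100000):
    print("%6d users, 20-bit salt: %.3f" % (users, collision_probability(users, 20)))
```

Under that assumption a 20-bit salt gives a bit under a 40% chance of at least one shared salt at 1,000 users and near certainty by 10,000.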
On 02/11/2011 04:04 PM, Russell Keith-Magee wrote:
> If an idea is important enough, we will include compatibility options
> for older Python versions.
>> In a nutshell, if something requires python >= 2.5 or a lib for older
>> versions of Python, forget about adding it.
> That's not true at all.
> ... but to say that we won't do
> this at all is patently and demonstrably incorrect.
Sorry if it came across as too harsh --
> I apologize if I sound a bit grumpy, but I've spent the last 5 days
> monkey-patching a local branch of the auth lib...
Once again, I didn't mean to insult any dev (running a few projects
myself, so I know how much work it is) and I appreciate the work that is
done.
> Yours,
> Russell Keith-Magee
Cheers,
coh
Wasn't, but now is -- https://bitbucket.org/coh/django_sec_mod/ . It's
against svn rev 15488. Please note that backwards compatibility of any
sort is broken on purpose and that I haven't really tested the changes yet.
As for the salt length, I was actually referring to ticket 5600.
Have fun,
coh
http://www.f-secure.com/weblog/archives/00002095.html
And referenced from that, http://www.golubev.com/hashgpu.htm with the quote:
> Recovery speed on ATI HD 5970 peaks at 5600M/s MD5 hashes and 2300M/s SHA1 hashes.
That means, 2,300,000,000 SHA1 hashes per second.
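To put that rate in perspective, a short calculation of how long an exhaustive search of all eight-character passwords would take on one such card. The rates are the ones quoted above; the two alphabet sizes (36 and 62 characters) are assumptions for illustration.

```python
SHA1_PER_SECOND = 2300 * 10**6   # 2300M/s, from the quote above
MD5_PER_SECOND = 5600 * 10**6    # 5600M/s, from the quote above


def seconds_to_exhaust(alphabet_size, length, hashes_per_second):
    """Seconds needed to hash every password of the given length."""
    return alphabet_size ** length / hashes_per_second


for alphabet in (36, 62):
    seconds = seconds_to_exhaust(alphabet, 8, SHA1_PER_SECOND)
    print("%d-character alphabet, SHA1: %.1f minutes" % (alphabet, seconds / 60))
```

At the quoted SHA1 rate this comes to about 20 minutes for the 36-character alphabet and a little over a day for 62 characters.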
On 02/12/2011 02:02 PM, poswald wrote:
> I'm hoping this background material is useful and gets everyone on the
> same page.
fullack.
Cheers,
coh
As with Carl -- I'm only one core dev, and I'm not a crypto expert,
here's my take:
I agree that it's less than ideal for us to continue to use SHA1 given
its known inadequacies.
My concern with this approach is that it requires us to either
maintain or adopt an encryption algorithm. We're not just using
something from the standard library, we're taking responsibility for
the holes in a specific implementation. Even if we just adopt code
from an existing implementation, we are accepting responsibility for
finding and fixing any holes in that implementation. This is a
responsibility that can't be taken lightly, and I'm not completely
convinced that we should pick up that particular gauntlet.
For this reason alone, I could be convinced that a configuration item
may be called for here -- e.g., registering a user-crypto library that
the default User object will use, in the same way that you can
currently register serialization libraries.
However, that said:
> As for the broader configurability question, I'm just fine with
> requiring a custom auth backend, which really isn't that hard, as a
> condition for customizing password hashing. So I'm not particularly
> tempted by proposals to add a new setting for this. The hardcoded
> stuff in the User model does bug me, though; I'm interested in the
> proposal to make the User model delegate that to new methods on an
> authentication backend (with backwards-compatibility fallback for old
> auth backends that don't have the new methods).
One of the things that I want to tackle in the 1.4 timeframe is the
general problem of a 'pluggable' User model. Allowing for customizable
authentication schemes is one (of many) parts of this problem. Right
now, my focus is on getting the 1.3 release out the door; once the 1.4
feature phase starts, I'll have a lot more time to discuss this sort
of thing.
Yours,
Russ Magee %-)
New features are always applied to trunk, so if you're developing new
code, that's what you should be developing against.
Yours,
Russ Magee %-)
I've been desperately trying to get up to speed on this stuff over the
past few weeks. Crypto's very far from my strong suit, but I think I
know enough now to agree. It seems to me we need two things:
1. A new, updated default for Django's password hashing. PBKDF2,
perhaps, but whatever as long as it meets some basic requirements.
2. A mechanism to make swapping this hashing algorithm out easy(-ier).
Again, details don't matter, requirements do.
#1's a blocker for 1.4, I think, but if for some reason #2 can't be
figured out I think it's ok to punt there for a bit longer. Ideally
though they'd both go in at once.
Now, I want to make very explicit my requirements here since we've
gone 'round on this one a few times, so I'll lay out exactly what I'm
going to want to see to get on board with any proposal. So:
Requirements for a new password hash:
* As little crypto code in Django as possible. We're not security
experts, and we shouldn't try to be. Ideally it would be something that
leaves all of the dangerous parts to the stdlib. Perhaps we relax our
dependency policy (we need to some day, I think, but that's a bigger
argument maybe we shouldn't have now).
* Any code we distribute gets audited by people who know what they're
talking about.
* Those people have reputations sufficient to convince me (or other
core devs) that they know what they're doing. This is sorta a "who
watches the watchers" moment, but we can't just trust someone who says
they're a crypto expert; we have to believe them, too.
Requirements for pluggable hashing algorithms:
* The big one is cross-installation password compatibility. If I
upgrade from Python 2.4 to 2.7 my passwords have to keep working. If I
install django-bcrypt my old passwords have to keep working. If I then
decide to switch to pbkdf2 my bcrypt passwords have to keep working.
We already have an in-place upgrade mechanism for md5; we probably
need something similar as a generic thing.
* Failures need to be clear - I shouldn't get mysterious login
failures if I accidentally uninstall bcrypt (i.e. I should get a loud,
clear, failure quickly).
* We need an internal upgrade path that *we* can use when a few years
from now everyone starts complaining that PBKDF2 is fundamentally
flawed and that we're total idiots for clinging to it.
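One way the compatibility and upgrade requirements above could be met is to keep verifying old hashes and re-hash with the preferred algorithm on the next successful login. The Python 3 sketch below is only an illustration under assumptions: the HASHERS registry, the 'algorithm$salt$hexdigest' layout, the iteration count and all function names are hypothetical, and hashlib.pbkdf2_hmac stands in for whatever algorithm is eventually chosen.

```python
import hashlib
import hmac
import os

PBKDF2_ITERATIONS = 10000             # assumption: tune for the target hardware
PREFERRED_ALGORITHM = "pbkdf2_sha256"


def sha1_hasher(salt, raw_password):
    return hashlib.sha1((salt + raw_password).encode("utf-8")).hexdigest()


def pbkdf2_hasher(salt, raw_password):
    derived = hashlib.pbkdf2_hmac("sha256", raw_password.encode("utf-8"),
                                  salt.encode("utf-8"), PBKDF2_ITERATIONS)
    return derived.hex()


# Old algorithms stay registered so existing passwords keep working.
HASHERS = {"sha1": sha1_hasher, "pbkdf2_sha256": pbkdf2_hasher}


def encode(raw_password, algorithm=PREFERRED_ALGORITHM):
    """Build an 'algorithm$salt$hexdigest' value with a fresh random salt."""
    salt = os.urandom(16).hex()
    return "$".join([algorithm, salt, HASHERS[algorithm](salt, raw_password)])


def check_and_upgrade(raw_password, stored):
    """Return (is_valid, value_to_store), re-hashing valid old-style passwords."""
    algorithm, salt, digest = stored.split("$")
    if algorithm not in HASHERS:
        # Fail loudly instead of reporting a wrong password.
        raise LookupError("No hasher installed for %r" % algorithm)
    if not hmac.compare_digest(HASHERS[algorithm](salt, raw_password), digest):
        return False, stored
    if algorithm != PREFERRED_ALGORITHM:
        stored = encode(raw_password)   # in-place upgrade on successful login
    return True, stored
```

A caller would save the returned value whenever it differs from the stored one. Changing PREFERRED_ALGORITHM later would then migrate accounts gradually as users log in.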
[It occurs to me that, with the right mentor, this would make a
fantastic SoC project...]
Jacob