Hashlib docs:
http://docs.python.org/lib/module-hashlib.html
Both md5 and sha have deprecation warnings:
http://docs.python.org/lib/module-md5.html
http://docs.python.org/lib/module-sha.html
Using hashlib to generate SHA1 when it's available is something I
could get behind. Deprecating SHA1 hashes, not so much -- *every* hash
algorithm, inevitably, will have collisions. Think about it: you
have a fixed number of digits in the final hash and, hence, a fixed
number of distinct possible outputs. That number of outputs will
always be smaller than the number of possible inputs to the
algorithm, so there will always be at least some sets of inputs
which all yield the same hash.
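To make the counting argument concrete, here is a minimal sketch. The names tiny_hash and find_collision are made up for illustration. It cuts SHA-1 down to two hex digits (256 possible outputs) and feeds it 257 distinct inputs, so at least two of them have to share an output.

```python
import hashlib


def tiny_hash(data):
    """Keep only the first two hex digits of SHA-1: 256 possible outputs."""
    return hashlib.sha1(data).hexdigest()[:2]


def find_collision(n_inputs=257):
    """Hash n_inputs distinct byte strings and return the first collision.

    With 257 distinct inputs and only 256 possible outputs, a collision
    is guaranteed by the pigeonhole principle.
    """
    seen = {}  # truncated digest -> first input that produced it
    for i in range(n_inputs):
        data = str(i).encode("ascii")
        digest = tiny_hash(data)
        if digest in seen:
            return seen[digest], data, digest
        seen[digest] = data
    return None


collision = find_collision()
print(collision)
```

The same reasoning applies to the full 160-bit output. The number of outputs is just far too large to exhaust this way.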
And collisions, by themselves, don't make an algorithm useless for
what we want out of it, which is a roughly unique representation of a
password that isn't the password itself. Generating a collision
wouldn't mean you could log in as someone else, it'd mean you could
have two users with the same password hash, and that -- since auth
lookups start with the username and not the password hash, and require
a match on *both* columns -- doesn't cause a problem either. Just as
two users in a non-hashing system could both have the password
"secret123" without interfering with one another, two users in a
hashing system can have the same password hash without interfering
with one another.
--
"Bureaucrat Conrad, you are technically correct -- the best kind of correct."
Totally agree... the news about SHA-1 made me want to look up Python's
cryptographic functions as a curiosity. SHA-1 isn't going away any
time soon. A new hashing method won't even be picked by NIST until
2011.
It's still early, but I think adding support for hashlib is good,
especially since Python appears to be deprecating anything else.
As an extra option, could we also add support for sha-256 in
contrib.auth if folks would prefer that?
-Rob
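A rough sketch of what "use hashlib when it's available" could look like, with the algorithm name passed in so sha-256 becomes an option. The helper names new_hash and hexdigest are assumptions for illustration, not the actual contrib.auth code.

```python
try:
    import hashlib  # available from Python 2.5 on

    def new_hash(algorithm):
        # hashlib.new accepts names such as 'md5', 'sha1' and 'sha256'.
        return hashlib.new(algorithm)

except ImportError:
    # Older Pythons only ship the separate md5 and sha modules.
    import md5
    import sha

    _LEGACY = {"md5": md5.new, "sha1": sha.new}

    def new_hash(algorithm):
        try:
            return _LEGACY[algorithm]()
        except KeyError:
            raise ValueError("unsupported algorithm: %s" % algorithm)


def hexdigest(algorithm, salt, raw_password):
    """Hex digest of salt + raw_password using the named algorithm."""
    h = new_hash(algorithm)
    h.update((salt + raw_password).encode("utf-8"))
    return h.hexdigest()


print(len(hexdigest("sha1", "cb374", "turing")))    # 40 hex digits
print(len(hexdigest("sha256", "cb374", "turing")))  # 64 hex digits
```

Because the stored value already starts with the algorithm name, existing sha1 hashes could keep being checked with sha1 while new ones use something else.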
Yes. That's why they're called hashes. What's bad now is if you can
generate collisions faster than by brute force, which is exactly what
was happening. This is very different. Basically your hash doesn't
mean anything anymore if Joe Random Cracker can present you any data
he wants and your hash algorithm still says "Yes, correct".
> And collisions, by themselves, don't make an algorithm useless for
> what we want out of it, which is a roughly unique representation of a
> password that isn't the password itself. Generating a collision
> wouldn't mean you could log in as someone else, it'd mean you could
> have two users with the same password hash, and that -- since auth
> lookups start with the username and not the password hash, and require
> a match on *both* columns -- doesn't cause a problem either. Just as
> two users in a non-hashing system could both have the password
> "secret123" without interfering with one another, two users in a
> hashing system can have the same password hash without interfering
> with one another.
Which would be right if logging in required the actual password. But
with a broken hash algorithm, anything that generates the same hash
will do -- in other words, once you know the hash (poking at the db,
SQL injection, anything) you don't need the password. It's like
storing a clear text password, and you wouldn't argue that's a good
idea, no?
That said, the current situation with SHA-1 isn't that bad: there are
still enough bits left. But any algorithm with one successful attack
has historically been taken apart. Could happen again. Right now,
there is no real alternative; the larger SHAs are probably vulnerable
to the same attack vector and WHIRLPOOL's still young (but looks good
so far).
Regards,
Thomas (nitpicker)
Well, the important thing here is that in order to take over a user's
account by generating a hash collision, an attacker has to know *in
advance* the hash to generate the collision for. And if your attacker
has enough access to get that information out of your database, I
don't really see how choosing a different hash algorithm could help
you out -- if the attacker can retrieve password hashes, it's likely
she no longer needs to generate collisions in order to impersonate
people (and, since the DB entries contain the salt used to generate
the hash, a standard dictionary attack is likely to be a much more
efficient use of the attacker's resources if she does need to do
that).
Some general confusion about what's going on in contrib.auth.models...
There are two check_password functions in there: one in the global
namespace and one in the User class. User.check_password is there
mainly to check for an md5 password (by absence of a '$') and, if it
is an md5 password, it converts it to the sha1 password and passes
handling to the global check_password.
But set_password will only set a sha1 password. So why would the
global check_password need to check if the algo is 'md5' if
set_password could never use md5?
Could Django remove the BC check prior to 1.0 to clean this up? I
guess that would be bad for applications in active use with real
users, since the only way to migrate those to sha1 would be to know
the actual password.
Maybe I answered my own question. :)
Yes, file a bug so the idea is not forgotten. Patches are always
welcome.
> Some general confusion about what's going on in contrib.auth.models...
>
> There are two check_password functions in there: one in the global
> namespace and one in the User class. User.check_password is there
> mainly to check for an md5 password (by absence of a '$') and, if it
> is an md5 password, it converts it to the sha1 password and passes
> handling to the global check_password.
>
> But set_password will only set a sha1 password. So why would the
> global check_password need to check if the algo is 'md5' if
> set_password could never use md5?
Because Django used to use md5 hashes.
> Could Django remove the BC check prior to 1.0 to clean this up? I
> guess that would be bad for applications in active use with real
> users, since the only way to migrate those to sha1 would be to know
> the actual password.
Or a collision :)
Be careful to ensure backwards compatibility. Otherwise an
inconsequential Python upgrade (to 2.5) will mean all your previously
recorded passwords are now unusable. You need to at least be able to
check for SHA1-style hashes and use those if necessary no matter which
version of Python you are using.
Malcolm
Good point. I did a quick test and the SHA-1 hashes are equivalent...
Python 2.4.3 (#1, Nov 3 2006, 21:03:52)
[GCC 4.0.1 (Apple Computer, Inc. build 5247)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import random
>>> rand = str(random.random())
>>> rand
'0.55628289848'
>>> import sha
>>> salt = sha.new(rand).hexdigest()[:5]
>>> raw_pass = 'turing'
>>> hsh = sha.new(salt+raw_pass).hexdigest()
>>> '%s$%s$%s' % ('sha1', salt, hsh)
'sha1$cb374$bd6289a5f976888b532141483391c108656edfb5'
Python 2.5 (r25:51908, Nov 3 2006, 20:49:30)
[GCC 4.0.1 (Apple Computer, Inc. build 5247)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> rand = '0.55628289848'
>>> import hashlib
>>> salt = hashlib.sha1(rand).hexdigest()[:5]
>>> raw_pass = 'turing'
>>> hsh = hashlib.sha1(salt+raw_pass).hexdigest()
>>> '%s$%s$%s' % ('sha1', salt, hsh)
'sha1$cb374$bd6289a5f976888b532141483391c108656edfb5'
I also tried with Python 2.3
rhymes@groove ~ % python2.3
Python 2.3.5 (#1, Jan 13 2006, 20:13:11)
[GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> rand = '0.55628289848'
>>> import sha
>>> salt = sha.new(rand).hexdigest()[:5]
>>> raw_pass = 'turing'
>>> hsh = sha.new(salt+raw_pass).hexdigest()
>>> '%s$%s$%s' % ('sha1', salt, hsh)
'sha1$cb374$bd6289a5f976888b532141483391c108656edfb5'
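For anyone repeating this on Python 3, a hedged sketch of the same steps follows. hashlib only accepts bytes there, so the strings need encoding first. The function name encode_password is made up for illustration. The sessions above report 'sha1$cb374$bd6289a5f976888b532141483391c108656edfb5' for these inputs.

```python
import hashlib


def encode_password(raw_pass, rand):
    """Rebuild the 'sha1$salt$hash' string from the interpreter sessions."""
    # The salt is the first five hex digits of SHA-1 over the random string.
    salt = hashlib.sha1(rand.encode("ascii")).hexdigest()[:5]
    # The stored hash is SHA-1 over salt + raw password.
    hsh = hashlib.sha1((salt + raw_pass).encode("ascii")).hexdigest()
    return "%s$%s$%s" % ("sha1", salt, hsh)


encoded = encode_password("turing", "0.55628289848")
print(encoded)
```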
--
Lawrence, oluyede.org - neropercaso.it
"It is difficult to get a man to understand
something when his salary depends on not
understanding it" - Upton Sinclair
Django used to use MD5 hashes; that function is in there so that an
old installation which was using MD5 can be upgraded and get switched
over to SHA1 without needing to manually go through and reset
passwords.
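As a rough illustration of the upgrade path described in this thread, here is a sketch under stated assumptions. It is not the actual contrib.auth.models code: the User class is a bare stand-in, and get_hexdigest and the salt generation are hypothetical. A stored value with no '$' is treated as a plain md5 hex digest and is rewritten in sha1 form the first time the correct password is supplied.

```python
import hashlib
import os


def get_hexdigest(algo, salt, raw_password):
    """Hex digest of salt + raw_password for the two algorithms discussed."""
    data = (salt + raw_password).encode("utf-8")
    if algo == "md5":
        return hashlib.md5(data).hexdigest()
    if algo == "sha1":
        return hashlib.sha1(data).hexdigest()
    raise ValueError("unknown password hashing algorithm: %s" % algo)


def check_password(raw_password, enc_password):
    """Module-level check for values stored as 'algo$salt$hash'."""
    algo, salt, hsh = enc_password.split("$")
    return hsh == get_hexdigest(algo, salt, raw_password)


class User(object):
    """Bare stand-in for a user model (hypothetical, for illustration)."""

    def __init__(self, password):
        self.password = password

    def set_password(self, raw_password):
        # New passwords are always stored in sha1 form.
        salt = hashlib.sha1(os.urandom(16)).hexdigest()[:5]
        hsh = get_hexdigest("sha1", salt, raw_password)
        self.password = "%s$%s$%s" % ("sha1", salt, hsh)

    def check_password(self, raw_password):
        if "$" not in self.password:
            # Legacy value: unsalted md5 hex digest.
            is_correct = self.password == get_hexdigest("md5", "", raw_password)
            if is_correct:
                # The raw password is known at this point, so upgrade it.
                self.set_password(raw_password)
            return is_correct
        return check_password(raw_password, self.password)
```

The conversion can only happen inside check_password, because that is the one moment the raw password is available. A failed check leaves the legacy value as it was.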