Security - contrib.auth hashing

Craig Younkins

Jul 20, 2010, 11:41:58 AM

to django-d...@googlegroups.com

Please note this email does not include or indicate a specific, immediately viable flaw.

I'm doing a brief analysis of the contrib.auth system: http://www.pythonsecurity.org/wiki/django/#authentication . I have a couple of notes that I'd like to share with you.

I'm very glad you don't have MD5 as the default. SHA-1 (currently employed) is acceptable for now, but at this point there are theoretical attacks that can find collisions in time that is "within the realm of computational possibility". It is recommended that SHA-2 be used for new applications. See http://www.pythonsecurity.org/wiki/hashing/
The hashing scheme uses random.random(). The random module uses the deterministic Mersenne Twister algorithm to generate random numbers. This is fine for most purposes, but it is not suitable for cryptographic purposes. It is much better to create a random.SystemRandom instance to get random data from the OS that is suitable for cryptography.
The most concerning thing in the hashing algorithm is that a salt of only 5 hexadecimal characters is used. This is just over a million possible salts (20 bits). We'd really like to see something closer to our recommendation of 64 bits.
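
To make the salt arithmetic above concrete, here is a minimal sketch in Python 3. The helper name `make_salt` is hypothetical and is not Django's API; it only assumes that `random.SystemRandom` is available on the platform.

```python
import math
import random

# Salt space discussed above: 5 hexadecimal characters.
current_salts = 16 ** 5                  # 1,048,576 possible salts
current_bits = math.log2(current_salts)  # 20 bits


def make_salt(bits=64):
    """Return a hex salt holding `bits` bits from the OS entropy source.

    Hypothetical helper for illustration only, not Django's actual code.
    """
    rng = random.SystemRandom()
    hex_chars = bits // 4  # each hex character carries 4 bits
    return f"{rng.getrandbits(bits):0{hex_chars}x}"


salt = make_salt()  # 16 hex characters for a 64-bit salt
```

A 64-bit salt needs 16 hexadecimal characters, so the stored string grows by only 11 characters per user.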

Other tidbits:

Is there a measure to prevent users from having dollar signs in their passwords? This would mess up the concatenated string that is stored in the database.
You might consider hashing with multiple rounds. By applying the hash function many times, you essentially lengthen the hashing/password verification stage. Since users spend very little time in this stage, it will have minimal impact on them. Crackers spend nearly 100% of their time doing this, so it significantly slows them down. See http://www.pythonsecurity.org/wiki/hashing/#multiple-rounds
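
A naive sketch of the multiple-rounds idea, in Python 3. The function name `stretched_hash`, the choice of SHA-256, and the default of 1000 rounds are all assumptions made for illustration; nothing here is taken from Django's code.

```python
import hashlib


def stretched_hash(password, salt, rounds=1000):
    """Hash salt + password once, then re-hash the digest `rounds - 1` times.

    Hypothetical helper; the round count is an arbitrary illustrative value.
    """
    digest = hashlib.sha256((salt + password).encode("utf-8")).digest()
    for _ in range(rounds - 1):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


hashed = stretched_hash("correct horse", "a1b2c3d4e5f60718")
```

Each guess now costs an attacker `rounds` hash computations instead of one, while a single login pays that cost only once.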

Craig Younkins

Jacob Kaplan-Moss

Jul 20, 2010, 12:09:38 PM

to django-d...@googlegroups.com

Hey Craig --

Thanks for the notes - this is good stuff!

On Tue, Jul 20, 2010 at 8:41 AM, Craig Younkins <cyou...@gmail.com> wrote:
> I'm very glad you don't have MD5 as the default. SHA-1 (currently employed)
> is acceptable for now, but at this point there are theoretical attacks that
> can find collisions in time that is "within the realm of computational
> possibility". It is recommended that SHA-2 be used for new applications.
> See http://www.pythonsecurity.org/wiki/hashing/

Actually, if we're being picky, it'd probably be best to use a
high-cost hashing algorithm like bcrypt or scrypt. SHA (all flavors)
is designed to be fairly fast, so by using MD5 or SHA we're
essentially helping brute-force attacks take less time.

http://code.djangoproject.com/ticket/5600 and
http://code.djangoproject.com/ticket/5787 tracked this request; we
eventually determined the trade-off in supporting multiple versions
wasn't worth the extra feature. I might be convinced to change my mind
now, though, but only if there's a good answer to the
backwards-compatibility issues.

> The hashing scheme uses random.random(). The random module uses the
> deterministic Mersenne Twister algorithm to generate random numbers. This is
> fine for most purposes, but it is not suitable for cryptographic purposes.
> It is much better to create a random.SystemRandom instance to get random
> data from the OS that is suitable for cryptography.

The problem with SystemRandom is buried in the docs: it's "not
available on all systems."
(http://docs.python.org/library/random.html#random.SystemRandom). I'm
open to a solution here, but we'd need to be very careful to determine
if SystemRandom is available.

> The most concerning thing in the hashing algorithm is that a salt of only 5
> hexadecimal characters is used. This is just over a million possible salts
> (20 bits). We'd really like to see something closer to our recommendation of
> 64 bits.

I'm not sure why we're only using 5 characters. Anyone remember?

Could you open a ticket to track this issue?

> Is there a measure to prevent users from having dollar signs in their
> passwords? This would mess up the concatenated string that is stored in the
> database.

Unless I'm really dense, I think it doesn't matter -- we hash
passwords before they get stored, so we can't "mess up" the stored
string: it's always just [0-9A-F]. Right?
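
A small sketch of why a dollar sign in the password is harmless, assuming the stored string has the `algorithm$salt$hash` layout implied above. The helper name `encode_password` is hypothetical, and the exact way salt and password are combined is an assumption.

```python
import hashlib


def encode_password(raw_password, salt, algo="sha1"):
    """Build an 'algo$salt$hexdigest' string (hypothetical helper)."""
    digest = hashlib.new(algo, (salt + raw_password).encode("utf-8")).hexdigest()
    return "$".join([algo, salt, digest])


# The raw password contains dollar signs, but only its hex digest is stored.
stored = encode_password("pa$$word", "a1b2c")
algo, salt, digest = stored.split("$")
```

Because `hexdigest()` only emits hexadecimal digits, the stored string always contains exactly two `$` separators no matter what the user types.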

> You might consider hashing with multiple rounds. By applying the hash
> function many times, you essentially lengthen the hashing/password
> verification stage. Since users spend very little time in this stage, it
> will have minimal impact on them. Crackers spend nearly 100% of their time
> doing this, so it significantly slows them down.
> See http://www.pythonsecurity.org/wiki/hashing/#multiple-rounds

Yup -- or, as said above, use s/bcrypt. I *would* like to revisit slow
hashing algorithms -- maybe if we can't make s/bcrypt be a good option
we could switch to multiple rounds.

Jacob

Craig Younkins

Jul 20, 2010, 2:23:52 PM

to django-d...@googlegroups.com

On Tue, Jul 20, 2010 at 12:09 PM, Jacob Kaplan-Moss <ja...@jacobian.org> wrote:

> On Tue, Jul 20, 2010 at 8:41 AM, Craig Younkins <cyou...@gmail.com> wrote:
> > I'm very glad you don't have MD5 as the default. SHA-1 (currently employed)
> > is acceptable for now, but at this point there are theoretical attacks that
> > can find collisions in time that is "within the realm of computational
> > possibility". It is recommended that SHA-2 be used for new applications.
> > See http://www.pythonsecurity.org/wiki/hashing/
>
> Actually, if we're being picky, it'd probably be best to use a
> high-cost hashing algorithm like bcrypt or scrypt. SHA (all flavors)
> is designed to be fairly fast, so by using MD5 or SHA we're
> essentially helping brute-force attacks take less time.
>
> http://code.djangoproject.com/ticket/5600 and
> http://code.djangoproject.com/ticket/5787 tracked this request; we
> eventually determined the trade-off in supporting multiple versions
> wasn't worth the extra feature. I might be convinced to change my mind
> now, though, but only if there's a good answer to the
> backwards-compatibility issues.

Maybe. The issue in my mind with bcrypt and scrypt is that they are not validated by NIST or NSA, unlike SHA-2. Blowfish was examined by NIST for the AES competition but to my knowledge its use for hashing has not been. SHA-2 was developed by NSA and is recommended by NIST (http://csrc.nist.gov/groups/ST/toolkit/secure_hashing.html).

That being said, I'm asking the opinion of a few other folks at OWASP and trying to get a consensus of 1 sentence to summarize how passwords should be stored. In my mind, this sentence should be "Use a SHA-2 algorithm with a 64-bit random salt and 1000 iterations," but this statement is my own and does not necessarily reflect the views of OWASP. I'll post here with developments.
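
One way to read that sentence in code, as a hedged sketch: PBKDF2-HMAC-SHA256 from Python 3's standard library with an 8-byte (64-bit) salt from `os.urandom` and 1000 iterations. PBKDF2 is swapped in here as one standard construction of this shape; the sentence above does not name it, and the helper `derive_key` is hypothetical.

```python
import hashlib
import os


def derive_key(password, salt=None, iterations=1000):
    """Return (salt, derived_key) using PBKDF2-HMAC-SHA256.

    Hypothetical helper; PBKDF2 is an assumption, not named in the thread.
    """
    if salt is None:
        salt = os.urandom(8)  # 8 bytes = 64 bits of random salt
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, key


salt, key = derive_key("correct horse")
```

Verification repeats the derivation with the stored salt and compares the result with the stored key.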

> > The hashing scheme uses random.random(). The random module uses the
> > deterministic Mersenne Twister algorithm to generate random numbers. This is
> > fine for most purposes, but it is not suitable for cryptographic purposes.
> > It is much better to create a random.SystemRandom instance to get random
> > data from the OS that is suitable for cryptography.
>
> The problem with SystemRandom is buried in the docs: it's "not
> available on all systems."
> (http://docs.python.org/library/random.html#random.SystemRandom). I'm
> open to a solution here, but we'd need to be very careful to determine
> if SystemRandom is available.

SystemRandom should be available on Linux, Solaris, Mac OS X, NetBSD, OpenBSD, Tru64 UNIX 5.1B, AIX 5.2, and HP-UX 11i v2, and at least Windows 2000 on up. It's unclear to me if CryptGenRandom was in the API for 95 or 98.

In any case, this is to generate the salt. There is no reason I can think of why we can't default to SystemRandom and fall back to regular random module methods if it raises NotImplementedError.
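
The fallback described above could look roughly like this in Python 3. The helper name `get_rng` is hypothetical; the sketch relies on the documented behaviour that the OS source raises `NotImplementedError` when it is unavailable, and it probes with one call because the error surfaces on use, not on construction.

```python
import random


def get_rng():
    """Prefer the OS-backed generator; fall back to the Mersenne Twister.

    Hypothetical helper for illustration, not Django's actual code.
    """
    try:
        rng = random.SystemRandom()
        rng.random()  # probe: reads from the OS source, may raise
        return rng
    except NotImplementedError:
        return random.Random()  # deterministic PRNG seeded by the interpreter


rng = get_rng()
salt = "%016x" % rng.getrandbits(64)  # 64-bit salt as 16 hex characters
```

Callers use the returned object the same way in both cases, since `SystemRandom` is a subclass of `random.Random`.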

> > The most concerning thing in the hashing algorithm is that a salt of only 5
> > hexadecimal characters is used. This is just over a million possible salts
> > (20 bits). We'd really like to see something closer to our recommendation of
> > 64 bits.
>
> I'm not sure why we're only using 5 characters. Anyone remember?
>
> Could you open a ticket to track this issue?

http://code.djangoproject.com/ticket/13969

> > Is there a measure to prevent users from having dollar signs in their
> > passwords? This would mess up the concatenated string that is stored in the
> > database.
>
> Unless I'm really dense, I think it doesn't matter -- we hash
> passwords before they get stored, so we can't "mess up" the stored
> string: it's always just [0-9A-F]. Right?

You're right! I'm the one that's dense and looking too quickly. :-)

Bret W

Dec 14, 2010, 3:11:20 PM

to django-d...@googlegroups.com

On Tuesday, July 20, 2010 2:23:52 PM UTC-4, Craig Younkins wrote:

> Maybe. The issue in my mind with bcrypt and scrypt is that they are not validated by NIST or NSA, unlike SHA-2. Blowfish was examined by NIST for the AES competition but to my knowledge its use for hashing has not been. SHA-2 was developed by NSA and is recommended by NIST (http://csrc.nist.gov/groups/ST/toolkit/secure_hashing.html).
>
> That being said, I'm asking the opinion of a few other folks at OWASP and trying to get a consensus of 1 sentence to summarize how passwords should be stored. In my mind, this sentence should be "Use a SHA-2 algorithm with a 64-bit random salt and 1000 iterations," but this statement is my own and does not necessarily reflect the views of OWASP. I'll post here with developments.

I wanted to follow up on this discussion to see if any further thought had been given to using bcrypt.

With the recent Gawker hacking incident, there has been another round of discussion happening regarding best practices for securely storing credentials, and from the discussions I've seen at Hacker News, those in the know are still recommending bcrypt.

I am not a security researcher or a cryptographer, so I don't have much to add to this conversation, other than that I want to be sure that Django's following the best practices put forth by security professionals.

Backward compatibility is definitely an issue to be addressed, and it's not in the scope of this message to do so, but I would like to say that some changes are worth ugly fixes. It's obvious that it's not possible to rehash passwords that application developers don't have, so it seems likely that there's going to need to be a hack to support an old and a new hashing scheme for a couple of versions of Django. I believe most developers would be accepting of a little interim cruft if it meant a more secure product in the long term.
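
As a rough sketch of what supporting an old and a new scheme side by side might look like in Python 3: the stored string's algorithm prefix selects the verifier, and a successful login with the old scheme yields a re-encoded string. The upgrade-on-login step, the scheme labels, and every name below are assumptions for illustration, not Django's API.

```python
import hashlib
import hmac


def _legacy_sha1(salt, raw):
    return hashlib.sha1((salt + raw).encode("utf-8")).hexdigest()


def _pbkdf2_sha256(salt, raw):
    key = hashlib.pbkdf2_hmac("sha256", raw.encode("utf-8"), salt.encode("utf-8"), 1000)
    return key.hex()


# Hypothetical registry: algorithm prefix -> hashing function.
HASHERS = {"sha1": _legacy_sha1, "pbkdf2_sha256": _pbkdf2_sha256}
PREFERRED = "pbkdf2_sha256"


def encode(raw_password, salt, algo=PREFERRED):
    return "$".join([algo, salt, HASHERS[algo](salt, raw_password)])


def check_password(raw_password, stored):
    """Return (is_valid, upgraded), where upgraded is a new string or None."""
    algo, salt, digest = stored.split("$")
    expected = HASHERS[algo](salt, raw_password)
    if not hmac.compare_digest(expected, digest):
        return False, None
    if algo != PREFERRED:
        # The raw password is only available at login, so upgrade here.
        # For brevity this reuses the old salt; a real upgrade would
        # presumably generate a fresh, longer one.
        return True, encode(raw_password, salt)
    return True, None
```

The caller would save the upgraded string whenever one is returned, so old hashes disappear gradually as users log in.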

While we're on the subject of security, have the security-related pieces of Django ever undergone a security audit? I remember Simon W. asking for a code review of his signed-cookie implementation (https://groups.google.com/forum/#topic/django-developers/KX6LIgBvfzo), and I now see that Jacob didn't feel that a security audit was worthwhile, given what the DSF can afford and the implications for peer review. If contrib.auth hasn't been reviewed by a security expert, I'd like to suggest that someone investigate the possibility of having it reviewed. Security, and specifically cryptography, is one area of computing that requires tons of expertise. Even with many eyeballs, it's hard to be certain that a salient detail wasn't overlooked. That being said, I'm not close to the framework development process, and I don't know what's been done in the past or who's been consulted.

Bret
