Thanks for the notes - this is good stuff!
On Tue, Jul 20, 2010 at 8:41 AM, Craig Younkins <cyou...@gmail.com> wrote:
> I'm very glad you don't have MD5 as the default. SHA-1 (currently employed)
> is acceptable for now, but at this point there are theoretical attacks that
> can find collisions in time that is "within the realm of computational
> possibility". It is recommended that SHA-2 be used for new applications.
> See http://www.pythonsecurity.org/wiki/hashing/
Actually, if we're being picky, it'd probably be best to use a
high-cost hashing algorithm like bcrypt or scrypt. SHA (all flavors)
is designed to be fairly fast, so by using MD5 or SHA we're
essentially helping brute-force attacks take less time.
http://code.djangoproject.com/ticket/5600 and
http://code.djangoproject.com/ticket/5787 tracked this request; we
eventually decided the feature wasn't worth the cost of supporting
multiple versions. I could be convinced to change my mind now, but
only if there's a good answer to the backwards-compatibility issues.
> The hashing scheme uses random.random(). The random module uses the
> deterministic Mersenne Twister algorithm to generate random numbers. This is
> fine for most purposes, but it is not suitable for cryptographic purposes.
> It is much better to create a random.SystemRandom instance to get random
> data from the OS that is suitable for cryptography.
The problem with SystemRandom is buried in the docs: it's "not
available on all systems."
(http://docs.python.org/library/random.html#random.SystemRandom). I'm
open to a solution here, but we'd need to be very careful to determine
if SystemRandom is available.
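
A rough sketch of what such an availability check might look like, assuming
it is enough to probe SystemRandom once and fall back to the default
generator when the OS source is missing. The names get_random_source and
make_salt are hypothetical, not existing Django functions.

```python
import random


def get_random_source():
    """Return a SystemRandom instance, or None if the OS has no source."""
    try:
        rng = random.SystemRandom()
        rng.random()  # force a read from the OS source; this may raise
        return rng
    except NotImplementedError:
        return None


def make_salt(length=16):
    """Build a hex salt, preferring OS randomness when it is available."""
    rng = get_random_source()
    if rng is None:
        # Fallback (assumption): the module-level Mersenne Twister,
        # which is NOT suitable for cryptographic purposes.
        rng = random
    return "".join(rng.choice("0123456789abcdef") for _ in range(length))
```

The fallback branch keeps the current behaviour on platforms without an OS
source, so nothing would break there; it just would not get any better.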
> The most concerning thing in the hashing algorithm is that a salt of only 5
> hexadecimal characters is used. This is just over a million possible salts
> (20 bits). We'd really like to see something closer to our recommendation of
> 64 bits.
I'm not sure why we're only using 5 characters. Anyone remember?
Could you open a ticket to track this issue?
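
To make the numbers concrete, here is a small sketch comparing the current
salt space with a 64-bit one. It assumes os.urandom is available on the
platform; make_salt_64 is a hypothetical helper name.

```python
import binascii
import os

# 5 hex characters carry 4 bits each: 16**5 == 2**20 possible salts.
old_salt_space = 16 ** 5

# 16 hex characters carry 64 bits: 16**16 == 2**64 possible salts.
new_salt_space = 16 ** 16


def make_salt_64():
    """Return 64 bits of OS randomness as 16 hex characters."""
    return binascii.hexlify(os.urandom(8)).decode("ascii")
```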
> Is there a measure to prevent users from having dollar signs in their
> passwords? This would mess up the concatenated string that is stored in the
> database.
Unless I'm really dense, I think it doesn't matter -- we hash
passwords before they get stored, so we can't "mess up" the stored
string: it's always just [0-9A-F]. Right?
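
A quick sketch of why the separator is safe, assuming the stored string is
laid out as algorithm$salt$hash and that the hash is a hex digest of the
salt plus the raw password. encode_password is a hypothetical stand-in, not
the actual Django function. Note that hexdigest() returns lowercase hex
digits.

```python
import hashlib


def encode_password(raw_password, salt, algo="sha1"):
    """Return 'algo$salt$hexdigest'; the raw password is never stored."""
    data = (salt + raw_password).encode("utf-8")
    digest = hashlib.new(algo, data).hexdigest()
    return "%s$%s$%s" % (algo, salt, digest)


# A password full of dollar signs still yields exactly three fields,
# because only its hex digest ends up in the stored string.
stored = encode_password("pa$$word", "a1b2c")
algo, salt, digest = stored.split("$")
```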
> You might consider hashing with multiple rounds. By applying the hash
> function many times, you essentially lengthen the hashing/password
> verification stage. Since users spend very little time in this stage, it
> will have minimal impact on them. Crackers spend nearly 100% of their time
> doing this, so it significantly slows them down.
> See http://www.pythonsecurity.org/wiki/hashing/#multiple-rounds
Yup -- or, as said above, use bcrypt or scrypt. I *would* like to
revisit slow hashing algorithms -- if we can't make bcrypt or scrypt a
good option, maybe we could switch to multiple rounds.
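
For illustration only, a naive version of "multiple rounds" could look like
the sketch below. The function name, the choice of SHA-256 and the default of
1000 rounds are all assumptions; a vetted construction such as PBKDF2 would
generally be preferable to a hand-rolled loop.

```python
import hashlib


def stretched_hash(password, salt, rounds=1000):
    """Apply SHA-256 repeatedly so each guess costs `rounds` hash calls."""
    digest = (salt + password).encode("utf-8")
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()
```

A legitimate login pays the cost once per attempt, while a brute-force run
pays it for every candidate password.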
Jacob
Maybe. The issue in my mind with bcrypt and scrypt is that they are not
validated by NIST or NSA, unlike SHA-2. Blowfish's successor, Twofish, was
examined by NIST for the AES competition, but to my knowledge the use of
Blowfish for hashing has not been. SHA-2 was developed by NSA and is
recommended by NIST
(http://csrc.nist.gov/groups/ST/toolkit/secure_hashing.html).

That being said, I'm asking the opinion of a few other folks at OWASP and
trying to reach a consensus on one sentence that summarizes how passwords
should be stored. In my mind, this sentence should be "Use a SHA-2 algorithm
with a 64-bit random salt and 1000 iterations," but this statement is my own
and does not necessarily reflect the views of OWASP. I'll post here with
developments.
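
One possible reading of that sentence, sketched with PBKDF2-HMAC-SHA256 from
the standard library. PBKDF2 is swapped in here as one standard way to
iterate a SHA-2 hash; it is not named in the recommendation above. The
storage layout and the names hash_password and check_password are
hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 1000  # iteration count taken from the sentence above
SALT_BYTES = 8     # 8 bytes == 64 bits


def hash_password(password, salt=None, iterations=ITERATIONS):
    """Return 'pbkdf2_sha256$iterations$salt_hex$hash_hex' (assumed layout)."""
    if salt is None:
        salt = os.urandom(SALT_BYTES)
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return "pbkdf2_sha256$%d$%s$%s" % (iterations, salt.hex(), derived.hex())


def check_password(password, stored):
    """Recompute the hash from the stored parameters and compare."""
    _algo, iterations, salt_hex, hash_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256",
        password.encode("utf-8"),
        bytes.fromhex(salt_hex),
        int(iterations),
    )
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(candidate.hex(), hash_hex)
```

Storing the iteration count next to the salt means the count could be raised
later without invalidating existing hashes, which may help with the
backwards-compatibility concern.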