Is this secure?

mk

unread,

Feb 23, 2010, 9:36:02 AM2/23/10

to pytho...@python.org

Hello,

I need to generate passwords and I think that pseudo-random generator is
not good enough, frankly. So I wrote this function:

import struct

def gen_rand_string():
fileobj = open('/dev/urandom','rb')
rstr = fileobj.read(4)
rnum = struct.unpack('L',rstr)[0]
rstr = '%i' % rnum
rnuml = []
while len(rstr) >= 2:
c = rstr[:2]
try:
num = int(c)
rnuml.append(num)
except ValueError:
pass
rstr = rstr[2:]
rnuml = map(lambda x: 97+x/4, rnuml)
rnumc = map(chr, rnuml)
return ''.join(rnumc)

if __name__ == "__main__":
print gen_rand_string()

(yes I know that this way generated string will not contain 'z' because
99/4 + 97 = 121 which is 'y')

The question is: is this secure? That is, can the string generated this
way be considered truly random? (I abstract from not-quite-perfect
nature of /dev/urandom at the moment; I can always switch to /dev/random
which is better)

Regards,
mk

Paul Rubin

unread,

Feb 23, 2010, 2:19:59 PM2/23/10

to

mk <mrk...@gmail.com> writes:
> I need to generate passwords and I think that pseudo-random generator

> is not good enough, frankly. So I wrote this function:...

> The question is: is this secure? That is, can the string generated
> this way be considered truly random? (I abstract from
> not-quite-perfect nature of /dev/urandom at the moment; I can always
> switch to /dev/random which is better)

urandom is fine and the entropy loss from the numeric conversions and
eliminating 'z' in that code before you get letters out is not too bad.
The code is pretty ugly. The main problem is you end up with a password
that's usually 5 letters but sometimes just 4 or fewer. Passwords that
short are vulnerable to dictionary attacks. Longer passwords made from
random letters are difficult to remember.

I find it's most practical to use a few random words (chosen from a word
list like /usr/dict/words) rather than random letters. Words are easier
to remember and type.

You might look at the site www.diceware.com for an approach to this,
which you can implement with a program. The docs there are pretty
thoughtful and may help you understand the relevant issues.

mk

unread,

Feb 23, 2010, 2:59:23 PM2/23/10

to

On Feb 23, 7:19 pm, Paul Rubin <no.em...@nospam.invalid> wrote:

> The code is pretty ugly. The main problem is you end up with a password
> that's usually 5 letters but sometimes just 4 or fewer.

Well I didn't write the whole thing here, in actual use I'd write a
loop repeating the function until I have enough characters and then
I'd select a substring of specified length.

Anything else in the code that is ugly and I should correct?

> You might look at the sitewww.diceware.comfor an approach to this,

> which you can implement with a program. The docs there are pretty
> thoughtful and may help you understand the relevant issues.

Thanks. But I would also be grateful for indicating what is wrong/ugly
in my code.

Regards,
mk

Robert Kern

unread,

Feb 23, 2010, 3:04:02 PM2/23/10

to pytho...@python.org

On 2010-02-23 13:19 PM, Paul Rubin wrote:

> I find it's most practical to use a few random words (chosen from a word
> list like /usr/dict/words) rather than random letters. Words are easier
> to remember and type.
>
> You might look at the site www.diceware.com for an approach to this,
> which you can implement with a program. The docs there are pretty
> thoughtful and may help you understand the relevant issues.

I like RFC 1751 for this:

http://gitweb.pycrypto.org/?p=crypto/pycrypto-2.x.git;a=blob;f=lib/Crypto/Util/RFC1751.py;h=1c98a212c22066adabfee521b495eeb4f9d7232b;hb=HEAD

Shortened URL:

http://tr.im/Pv9B

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Robert Kern

unread,

Feb 23, 2010, 3:49:25 PM2/23/10

to pytho...@python.org

On 2010-02-23 13:59 PM, mk wrote:
> On Feb 23, 7:19 pm, Paul Rubin<no.em...@nospam.invalid> wrote:
>
>> The code is pretty ugly. The main problem is you end up with a password
>> that's usually 5 letters but sometimes just 4 or fewer.
>
> Well I didn't write the whole thing here, in actual use I'd write a
> loop repeating the function until I have enough characters and then
> I'd select a substring of specified length.
>
> Anything else in the code that is ugly and I should correct?

I would recommend using random.SystemRandom.choice() on a sequence of acceptable
characters. E.g. (untested)

import random
import string

characters = string.letters + string.digits + '~!@#$%^&*()-+=,;./\?><|'
# ... or whatever.

def gen_rand_string(length):
prng = random.SystemRandom()
chars = []
for i in range(length):
chars.append(prng.choice(characters))
return ''.join(chars)

Lawrence D'Oliveiro

unread,

Feb 23, 2010, 6:18:19 PM2/23/10

to

In message <mailman.110.1266935...@python.org>, mk wrote:

> I need to generate passwords and I think that pseudo-random generator is
> not good enough, frankly. So I wrote this function:

Much simpler:

import subprocess

data, _ = subprocess.Popen \
(
args = ("pwgen", "-nc"),
stdout = subprocess.PIPE
).communicate()
print data

Steven D'Aprano

unread,

Feb 23, 2010, 9:07:42 PM2/23/10

to

On Tue, 23 Feb 2010 15:36:02 +0100, mk wrote:

> Hello,
>
> I need to generate passwords and I think that pseudo-random generator is
> not good enough, frankly. So I wrote this function:

[snip]

> (yes I know that this way generated string will not contain 'z' because
> 99/4 + 97 = 121 which is 'y')

You're worried about the security of the PRNG but then generate a TWO to
FIVE character lowercase password with no digits, punctuation or the
letter 'z'? That's priceless!

Python's PRNG is not suitable for producing cryptographically strong
streams of random bytes, but it is perfectly strong enough for generating
good passwords.

> The question is: is this secure?

No.

You are wasting your time trying to fix something which isn't a problem,
and introducing a much bigger problem instead. You are MUCH MUCH MUCH
better off with a six or ten character password taken from upper and
lowercase letters, plus digits, plus punctuation, than a four digit
password taken from lowercase letters only. Even if the first case has
some subtle statistical deviation from uniformity, and the second is
"truly random" (whatever that means), it doesn't matter.

Nobody is going to crack your password because the password generator is
0.01% more likely to generate a "G" than a "q". But they *will* brute-
force your password if you have a four digit password taken from a-y only.

> That is, can the string generated this
> way be considered truly random?

Define truly random.

--
Steven

Steven D'Aprano

unread,

Feb 23, 2010, 9:19:31 PM2/23/10

to

On Tue, 23 Feb 2010 11:19:59 -0800, Paul Rubin wrote:

> mk <mrk...@gmail.com> writes:
>> I need to generate passwords and I think that pseudo-random generator
>> is not good enough, frankly. So I wrote this function:... The question
>> is: is this secure? That is, can the string generated this way be
>> considered truly random? (I abstract from not-quite-perfect nature of
>> /dev/urandom at the moment; I can always switch to /dev/random which is
>> better)
>
> urandom is fine and the entropy loss from the numeric conversions and
> eliminating 'z' in that code before you get letters out is not too bad.

What?

You're going from a possible alphabet of 62 (excluding punctuation) or 94
(inc punctuation available on an American keyboard) distinct letters down
to 25, and you say that's "not too bad"?

Paul, if you were anyone else, I'd be sneering uncontrollably about now,
but you're not clueless about cryptography, so what have I missed? Why is
reducing the number of distinct letters by more than 50% anything but a
disaster? This makes the task of brute-forcing the password exponentially
easier.

Add the fact that the passwords are so short (as little as two characters
in my tests) and this is about as far from secure as it is possible to be.

--
Steven

Steven D'Aprano

unread,

Feb 23, 2010, 9:40:13 PM2/23/10

to

On Tue, 23 Feb 2010 15:36:02 +0100, mk wrote:

> The question is: is this secure? That is, can the string generated this
> way be considered truly random?

Putting aside the philosophical question of what "truly random" means, I
presume you mean that the letters are uniformly distributed. The answer
to that is, they don't like uniformly distributed.

This isn't a sophisticated statistical test, it's the equivalent of a
back-of-the-envelope calculation: I generated 100,000 random strings with
your code, and counted how often each letter appears:

If the letters are uniformly distributed, you would expect all the
numbers to be quite close, but instead they range from 15063 to 25679:

{'a': 15063, 'c': 20105, 'b': 15100, 'e': 25465, 'd': 25458, 'g': 25597,
'f': 25589, 'i': 25045, 'h': 25679, 'k': 22945, 'j': 25531, 'm': 16187,
'l': 16252, 'o': 16076, 'n': 16012, 'q': 16069, 'p': 16119, 's': 16088,
'r': 16087, 'u': 15951, 't': 16081, 'w': 16236, 'v': 15893, 'y': 15834,
'x': 15956}

Eye-balling it, it looks vaguely two-humped, one hump around 15-16K, the
second around 22-25K. Sure enough, here's a quick-and-dirty graph:

a | ***********************************
b | ***********************************
c | ***********************************************
d | ***********************************************************
e | ***********************************************************
f | ************************************************************
g | ************************************************************
h | ************************************************************
i | ***********************************************************
j | ************************************************************
k | ******************************************************
l | **************************************
m | **************************************
n | *************************************
o | **************************************
p | **************************************
q | **************************************
r | **************************************
s | **************************************
t | **************************************
u | *************************************
v | *************************************
w | **************************************
x | *************************************
y | *************************************

The mean of the counts is 19056.72, and the mean deviation is 3992.28.
While none of this is statistically sophisticated, it does indicate to me
that your function is nowhere even close to uniform. It has a very strong
bias.

--
Steven

Steven D'Aprano

unread,

Feb 23, 2010, 9:43:05 PM2/23/10

to

On Wed, 24 Feb 2010 02:40:13 +0000, Steven D'Aprano wrote:

> On Tue, 23 Feb 2010 15:36:02 +0100, mk wrote:
>
>> The question is: is this secure? That is, can the string generated this
>> way be considered truly random?
>
> Putting aside the philosophical question of what "truly random" means, I
> presume you mean that the letters are uniformly distributed. The answer
> to that is, they don't like uniformly distributed.

Er, they don't *look* uniformly distributed.

(Of course, being random, perhaps they are and I just got unlucky.)

--
Steven

Paul Rubin

unread,

Feb 23, 2010, 9:39:53 PM2/23/10

to

Steven D'Aprano <ste...@REMOVE.THIS.cybersource.com.au> writes:
> Paul, if you were anyone else, I'd be sneering uncontrollably about now,
> but you're not clueless about cryptography, so what have I missed? Why is
> reducing the number of distinct letters by more than 50% anything but a
> disaster? This makes the task of brute-forcing the password exponentially
> easier.

Reducing the number of distinct letters by 50% decreases the entropy per
character by 1 bit. That stuff about mixing letters and digits and
funny symbols just makes the password a worse nuisance to remember and
type, for a small gain in entropy that you can compute and make up for.
The main thing you have to make sure is that the min-entropy is
sufficient for your purposes, and it's generally more convenient to do
that by making the password a little bit longer than by imposing
contortions on the person typing it. Ross Anderson's "Security
Engineering" chapter about passwords is worth reading too:

http://www.cl.cam.ac.uk/~rja14/Papers/SE-03.pdf

When I mentioned entropy loss to the OP though, I mostly meant loss from
getting rid of the letter z. The (binary) Shannon entropy of the
uniform probability distribution on 26 letters is 4.7004397 bits; on 25
letters, it's 4.6438561 bits. The difference isn't enough to give an
attacker that much advantage.

I like the diceware approach to passphrase generation and I've been
using it for years. www.diceware.com explains it in detail and the docs
there are quite well-thought-out and informative. Keep in mind that the
entropy needed for an online password (attacker has to make a server
query for every guess, and hopefully gets locked out after n wrong
tries) and an offline one (attacker has something like a hash of the
password and can run a completely offline search) are different.

Here is a program that I use sometimes:

from math import log
dictfile = '/usr/share/dict/words'

def genrandom(nbytes):
with open('/dev/urandom') as f:
return int(f.read(nbytes).encode('hex'), 16)

def main():
wordlist = list(x.strip() for x in open(dictfile) if len(x) < 7)
nwords = len(wordlist)
print "%d words, entropy=%.3f bits/word"% (
nwords, log(nwords, 2))
print '-'.join(wordlist[genrandom(10)%nwords] for i in xrange(6))

main()

Paul Rubin

unread,

Feb 23, 2010, 9:41:59 PM2/23/10

to

Steven D'Aprano <ste...@REMOVE.THIS.cybersource.com.au> writes:

> Putting aside the philosophical question of what "truly random" means, I
> presume you mean that the letters are uniformly distributed. The answer
> to that is, they don't like uniformly distributed.

That is a good point, the way those letters are generated (through the
decimal conversion stuff), they won't be all that uniform.

Paul Rubin

unread,

Feb 23, 2010, 9:50:52 PM2/23/10

to

mk <mrk...@gmail.com> writes:
>> You might look at the sitewww.diceware.comfor an approach to this,
>> which you can implement with a program. The docs there are pretty
>> thoughtful and may help you understand the relevant issues.
>
> Thanks. But I would also be grateful for indicating what is wrong/ugly
> in my code.

The stuff about converting 4 random bytes to a decimal string and then
peeling off 2 digits at a time is pretty awful, and notice that since
2**32 is 4294967296, in the cases where you get 10 digits, the first
2-digit pair is never higher than 42. There are also some effects on
the lower digits. The total entropy loss probably isn't fatal but as
described, it's ugly.

I'd write your code something like this:

nletters = 5

def randomword(n):

with open('/dev/urandom') as f:

return ''.join([chr(ord('a')+ord(c)%26) for c in f.read(n)])

print randomword(nletters)

I wouldn't rely on a 5 letter combination for a high security
application, but it might be ok for some low security purposes. Two
random 5-letter combinations separated by a hyphen will be much better,
and is probably easier to type than a solid block of 10 letters.

Robert Kern

unread,

Feb 23, 2010, 10:09:36 PM2/23/10

to pytho...@python.org

You'd have to be very, *very* unlucky to get a sample of that size so far from
uniformly distributed if the generating process actually were uniform.

Of course, uniformity isn't really necessary. You just need enough entropy in
the distribution (amongst other things like protection of the seed from being
known or guessed). A skewed distribution of characters is perfectly fine
provided that you had enough characters in the password to meet the desired
entropy requirement. A skewed distribution does require more characters to meet
a specified entropy requirement than a uniform distribution, of course.

That said, for a naive strategy like "pick an independent random character,
repeat", you should just use a uniform distribution. It makes the analysis
easier. Worthwhile generators that give skewed distributions usually do so for a
good reason, like generating pronounceable passwords.

Lie Ryan

unread,

Feb 23, 2010, 10:41:26 PM2/23/10

to

If an attacker knows the that the random number generator have an
extreme skew and he knows the distribution of the letters, how much
advantage would it give the attacker? My initial guess is that the more
skewed the letters are, the better the advantage, since an attacker
using brute-force can write his program to prefer the most likely letters?

Paul Rubin

unread,

Feb 23, 2010, 10:51:53 PM2/23/10

to

Lie Ryan <lie....@gmail.com> writes:
> If an attacker knows the that the random number generator have an
> extreme skew and he knows the distribution of the letters, how much
> advantage would it give the attacker? My initial guess is that the more
> skewed the letters are, the better the advantage, since an attacker
> using brute-force can write his program to prefer the most likely letters?

A useable (conservative) estimate is that the attacker's workload is 1/p
where p is the probability of the most likely password. That basically
says the password strength can be measured by the min-entropy.
Cryptographers often use that approach. If you want to be more precise,
you can do a conditional probability calculation assuming the attacker
works down the list of possible passwords in order of decreasing
probability, stopping when they hit the right one.

More generally still, passwords regardless of their entropy content are
a sucky way to encapsulate cryptographic secrets. We keep using them
because every alternative has drawbacks of its own.

Lawrence D'Oliveiro

unread,

Feb 23, 2010, 11:48:59 PM2/23/10

to

In message <7xwry39...@ruckus.brouhaha.com>, Paul Rubin wrote:

> More generally still, passwords regardless of their entropy content are
> a sucky way to encapsulate cryptographic secrets.

They’re a shared secret. How else would you represent a shared secret, if
not with a shared secret?

Steven D'Aprano

unread,

Feb 24, 2010, 2:11:01 AM2/24/10

to

On Tue, 23 Feb 2010 18:39:53 -0800, Paul Rubin wrote:

> Steven D'Aprano <ste...@REMOVE.THIS.cybersource.com.au> writes:
>> Paul, if you were anyone else, I'd be sneering uncontrollably about
>> now, but you're not clueless about cryptography, so what have I missed?
>> Why is reducing the number of distinct letters by more than 50%
>> anything but a disaster? This makes the task of brute-forcing the
>> password exponentially easier.
>
> Reducing the number of distinct letters by 50% decreases the entropy per
> character by 1 bit.

You say that as if 1 bit of entropy isn't much :)

Given a random six character password taken out of an alphabet of 52
characters, it takes over nine billion attempts to brute force it.
Reducing the alphabet by 50% cuts that down to less than 200 million. To
make up for that loss of 1 bit of entropy, you need two extra characters
in your password.

> That stuff about mixing letters and digits and
> funny symbols just makes the password a worse nuisance to remember and
> type, for a small gain in entropy that you can compute and make up for.

Well, everybody has their own ways of remembering passwords, and I'd much
prefer to remember an eight character password with "funny symbols" that
I chose myself, than a six character password with nothing but letters
that was chosen for me.

Of course, I flatter myself that I know how to choose good passwords, and
I hate remembering long random strings even from a reduced alphabet (e.g.
I hate memorizing eight digit phone numbers, and am completely incapable
of remembering ten digit mobile phone numbers).

--
Steven

Paul Rubin

unread,

Feb 24, 2010, 2:58:30 AM2/24/10

to

Steven D'Aprano <ste...@REMOVE.THIS.cybersource.com.au> writes:
> Given a random six character password taken out of an alphabet of 52
> characters, it takes over nine billion attempts to brute force it.
> Reducing the alphabet by 50% cuts that down to less than 200 million. To
> make up for that loss of 1 bit of entropy, you need two extra characters
> in your password.

One extra character comes pretty close (within 1.3 bits). Even two
extra chars is probably (subjective) easier for a user to deal with than
a completely random mixture of upper/lower case. You don't get the
extra bit per character if that distribution is anything other than
random, of course.

For something like a web password (each guess takes a server hit), where
the resource guarded is not very valuable, 5 chars is probably enough
for most purposes. For something like an encryption key subject to
offline attacks, 6 mixed-case characters will barely slow a real
attacker down.

As before, my suggestion is still diceware. I've used random
alphanumerics in the past but they're too big a hassle, they have to be
written down, etc.

And of course, if you're doing something serious, use a hardware token.

mk

unread,

Feb 24, 2010, 12:23:17 PM2/24/10

to pytho...@python.org

On 2010-02-24 03:50, Paul Rubin wrote:
> The stuff about converting 4 random bytes to a decimal string and then
> peeling off 2 digits at a time is pretty awful, and notice that since
> 2**32 is 4294967296, in the cases where you get 10 digits, the first
> 2-digit pair is never higher than 42.

Yikes! I didn't think about that. This is probably where (some part of)
probability skewing comes from.

Anyway, the passwords for authorized users will be copied and pasted
from email into in the application GUI which will remember it for them,
so they will not have to remember and type them in. So I have little in
the way of limitations of password length - even though in *some* cases
somebody might have to (or be ignorant enough) to retype the password
instead of pasting it in.

In that case the "diceware" approach is not necessary, even though I
will certainly remember this approach for a case when users will have to
remember & type the passwords in.

The main application will access the data using HTTP (probably), so the
main point is that an attacker is not able to guess passwords using
brute force.

Using A-z with 10-char password seems to provide 3 orders of magnitude
more combinations than a-z:

>>> 57 ** 10
362033331456891249L
>>> 25 ** 10
95367431640625L

Even though I'm not sure it is worth it, assuming 1000 brute-force
guesses per second (which over the web would amount pretty much to DOS),
this would take # days:

>>> 57 ** 10 / (1000 * 3600 * 24)
4190200595L
>>> 25 ** 10 / (1000 * 3600 * 24)
1103789L

Even then I'm not getting completely uniform distribution for some reason:

d 39411
l 39376
f 39288
a 39275
s 39225
r 39172
p 39159
t 39073
k 39071
u 39064
e 39005
o 39005
n 38995
j 38993
h 38975
q 38958
c 38938
b 38906
g 38894
i 38847
m 38819
v 38712
z 35321
y 35228
w 35189
x 35075

Code:

import operator

def gen_rand_word(n):

with open('/dev/urandom') as f:

return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n)])

def count_chars(chardict, word):
for c in word:
try:
chardict[c] += 1
except KeyError:
chardict[c] = 0

if __name__ == "__main__":
chardict = {}
for i in range(100000):
w = gen_rand_word(10)
count_chars(chardict, w)
counts = list(chardict.items())
counts.sort(key = operator.itemgetter(1), reverse = True)
for char, count in counts:
print char, count

> I'd write your code something like this:
>
> nletters = 5
>
> def randomword(n):
> with open('/dev/urandom') as f:
> return ''.join([chr(ord('a')+ord(c)%26) for c in f.read(n)])
>
> print randomword(nletters)

Aw shucks when will I learn to do the stuff in 3 lines well instead of
20, poorly. :-/

Regards,
mk

Michael Rudolf

unread,

Feb 24, 2010, 12:56:03 PM2/24/10

to

The reason is 256 % 26 != 0
256 mod 26 equals 22, thus your code is hitting a-v about 10% (256/26 is
approx. 10) more often than w-z. You might want to skip the values 0-22
to achieve a truly uniform distribution.

FYI: Electronic Cash PINs in europe (dont know about the rest of the
world) were computed the same way (random hexdigit and just mod it when
it's too large) leading to a high probability that your first digit was
a 1 :)

Regards,
Michael

Steve Holden

unread,

Feb 24, 2010, 12:59:53 PM2/24/10

to pytho...@python.org

mk wrote:
> On 2010-02-24 03:50, Paul Rubin wrote:

>> The stuff about converting 4 random bytes to a decimal string and then
>> peeling off 2 digits at a time is pretty awful, and notice that since
>> 2**32 is 4294967296, in the cases where you get 10 digits, the first
>> 2-digit pair is never higher than 42.
>

> with open('/dev/urandom') as f:

> return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n)])
>

> def count_chars(chardict, word):
> for c in word:
> try:
> chardict[c] += 1
> except KeyError:
> chardict[c] = 0
>
> if __name__ == "__main__":
> chardict = {}
> for i in range(100000):
> w = gen_rand_word(10)
> count_chars(chardict, w)
> counts = list(chardict.items())
> counts.sort(key = operator.itemgetter(1), reverse = True)
> for char, count in counts:
> print char, count
>

>> I'd write your code something like this:
>>
>> nletters = 5
>>
>> def randomword(n):
>> with open('/dev/urandom') as f:
>> return ''.join([chr(ord('a')+ord(c)%26) for c in f.read(n)])
>>
>> print randomword(nletters)
>

> Aw shucks when will I learn to do the stuff in 3 lines well instead of
> 20, poorly. :-/
>

When you've got as much experience as Paul?

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
PyCon is coming! Atlanta, Feb 2010 http://us.pycon.org/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS: http://holdenweb.eventbrite.com/

mk

unread,

Feb 24, 2010, 1:13:28 PM2/24/10

to pytho...@python.org

On 2010-02-24 18:59, Steve Holden wrote:

>> Aw shucks when will I learn to do the stuff in 3 lines well instead of
>> 20, poorly. :-/
>>
> When you've got as much experience as Paul?

And how much experience does Paul have? (this is mostly not a facile
question)

For my part, my more serious effort (on and off) with programming in
Python is under a year.

Regards,
mk

mk

unread,

Feb 24, 2010, 1:35:11 PM2/24/10

to pytho...@python.org

On 2010-02-24 18:56, Michael Rudolf wrote:

> The reason is 256 % 26 != 0
> 256 mod 26 equals 22, thus your code is hitting a-v about 10% (256/26 is
> approx. 10) more often than w-z.

<Barbie voice>writing secure code is hard...

I'm going to switch to PHP: Python world wouldn't lose much, but PHP
would gain a lot.

> You might want to skip the values 0-22
> to achieve a truly uniform distribution.

Hmm perhaps you meant to skip values over 256 - 22 ? Bc I'm getting this
(reduced the run to 1000 generated strings):

def gen_rand_word(n):
with open('/dev/urandom') as f:
return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n)

if ord(x) > 22])

z 3609
b 3608
s 3567
e 3559
j 3556
r 3555
g 3548
p 3540
m 3538
q 3532
h 3528
y 3526
v 3524
i 3500
x 3496
c 3488
k 3488
l 3487
u 3487
a 3469
o 3465
d 3455
t 3439
f 3437
n 3417
w 3175

While with this:

def gen_rand_word(n):
with open('/dev/urandom') as f:
return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n)

if ord(x) < 235])

a 3852
w 3630
s 3623
v 3582
y 3569
p 3568
c 3558
k 3558
b 3556
r 3553
x 3546
m 3534
n 3522
o 3515
h 3510
d 3505
u 3487
t 3486
i 3482
f 3477
e 3474
g 3460
q 3453
l 3437
z 3386
j 3382

1. I'm systematically getting 'a' outlier: have no idea why for now.

2. This is somewhat better (except 'a') but still not uniform.

> FYI: Electronic Cash PINs in europe (dont know about the rest of the
> world) were computed the same way (random hexdigit and just mod it when
> it's too large) leading to a high probability that your first digit was
> a 1 :)

Schadenfreude is deriving joy from others' misfortunes; what is the
German word, if any, for deriving solace from others' misfortunes? ;-)

Regards,
mk

Robert Kern

unread,

Feb 24, 2010, 2:01:03 PM2/24/10

to pytho...@python.org

On 2010-02-24 12:35 PM, mk wrote:

> While with this:

>
> def gen_rand_word(n):
> with open('/dev/urandom') as f:

> return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n) if ord(x)
> < 235])
>
> a 3852

...

> 1. I'm systematically getting 'a' outlier: have no idea why for now.
>
> 2. This is somewhat better (except 'a') but still not uniform.

I will repeat my advice to just use random.SystemRandom.choice() instead of
trying to interpret the bytes from /dev/urandom directly.

mk

unread,

Feb 24, 2010, 2:09:59 PM2/24/10

to pytho...@python.org

On 2010-02-24 20:01, Robert Kern wrote:
> I will repeat my advice to just use random.SystemRandom.choice() instead
> of trying to interpret the bytes from /dev/urandom directly.

Oh I hear you -- for production use I would (will) certainly consider
this. However, now I'm interested in the problem itself: why is the damn
distribution not uniform?

Regards,
mk

Robert Kern

unread,

Feb 24, 2010, 2:19:23 PM2/24/10

to pytho...@python.org

You want "< 234", not "< 235". (234 % 26 == 0), so you get some extra 'a's.

mk

unread,

Feb 24, 2010, 2:16:24 PM2/24/10

to pytho...@python.org

On 2010-02-24 20:01, Robert Kern wrote:
> I will repeat my advice to just use random.SystemRandom.choice() instead
> of trying to interpret the bytes from /dev/urandom directly.

Out of curiosity:

def gen_rand_string(length):
prng = random.SystemRandom()
chars = []
for i in range(length):

chars.append(prng.choice('abcdefghijklmnopqrstuvwxyz'))
return ''.join(chars)

if __name__ == "__main__":
chardict = {}

for i in range(10000):
## w = gen_rand_word(10)
w = gen_rand_string(10)

count_chars(chardict, w)
counts = list(chardict.items())
counts.sort(key = operator.itemgetter(1), reverse = True)
for char, count in counts:
print char, count

s 3966
d 3912
g 3909
h 3905
a 3901
u 3900
q 3891
m 3888
k 3884
b 3878
x 3875
v 3867
w 3864
y 3851
l 3825
z 3821
c 3819
e 3819
r 3816
n 3808
o 3797
f 3795
t 3784
p 3765
j 3730
i 3704

Better, although still not perfect.

Regards,
mk

Paul Rubin

unread,

Feb 24, 2010, 2:09:43 PM2/24/10

to

Robert Kern <rober...@gmail.com> writes:
> I will repeat my advice to just use random.SystemRandom.choice()
> instead of trying to interpret the bytes from /dev/urandom directly.

SystemRandom is something pretty new so I wasn't aware of it. But
yeah, if I were thinking more clearly I would have suggested os.urandom
instead of opening /dev/urandom.

Robert Kern

unread,

Feb 24, 2010, 2:36:29 PM2/24/10

to pytho...@python.org

On 2010-02-24 13:16 PM, mk wrote:

> On 2010-02-24 20:01, Robert Kern wrote:

>> I will repeat my advice to just use random.SystemRandom.choice() instead
>> of trying to interpret the bytes from /dev/urandom directly.
>

> Out of curiosity:
>
> def gen_rand_string(length):
> prng = random.SystemRandom()
> chars = []
> for i in range(length):
> chars.append(prng.choice('abcdefghijklmnopqrstuvwxyz'))
> return ''.join(chars)
>

> if __name__ == "__main__":
> chardict = {}

> for i in range(10000):
> ## w = gen_rand_word(10)
> w = gen_rand_string(10)

> count_chars(chardict, w)
> counts = list(chardict.items())
> counts.sort(key = operator.itemgetter(1), reverse = True)
> for char, count in counts:
> print char, count
>
>

This distribution is well within expectations.

Paul Rubin

unread,

Feb 24, 2010, 2:31:51 PM2/24/10

to

mk <mrk...@gmail.com> writes:
> So I have little in the way of limitations of password length ...>

> The main application will access the data using HTTP (probably), so
> the main point is that an attacker is not able to guess passwords
> using brute force.

If it's HTTP instead of HTTPS and you're sending the password in the
clear, then a serious attacker can simply eavesdrop the connection and
pick up the password. Again, if the application is a web forum or
something like that, the security requirements probably aren't terribly
high. If it's (say) a financial application with potentially motivated
attackers, you've got to be a lot more careful than I think you're being
right now, and you should really get security specialists involved.

> Using A-z with 10-char password seems to provide 3 orders of magnitude
> more combinations than a-z:

Yes, 2**10 = 1024 so (57/25)**10 is a little more than that.

> Even then I'm not getting completely uniform distribution for some reason:

Exact equality of the counts would be surprising and a sign that
something was wrong with the generation process. It would be like
flipping a coin 10000 times and getting exactly 5000 heads. The
binomial distribution tells you that the number should be close to 5000,
but that it's unlikely to be -exactly- 5000.

Also, as Michael Rudolf mentioned, getting a letter by taking n%26 where
n is drawn uniformly from [0..255] doesn't give a uniform distribution
because 256 is not a multiple of 26. I had thought about making an
adjustment for that when I posted, but it didn't seem worth cluttering
up the code. Uniformity for its own sake doesn't gain you anything;
what matters is entropy. If you compute the entropy difference between
the slightly nonuniform distribution and a uniform one, it's very small.

To get a more uniform distribution I usually just take a larger n,
rather than conditionalizing the draws. For example, in the
diceware-like code I posted, I read 10 random bytes (giving a uniform
random number on [0..2**80]) from urandom for each word. That is still
not perfectly uniform, but it's closer to the point where the difference
would be very hard to detect.

> Aw shucks when will I learn to do the stuff in 3 lines well instead of
> 20, poorly. :-/

Well, that's partly a matter of practice, but I'll mention one way I
simplified the code, which was by reading more bytes from /dev/urandom
than was really necessary. I read one byte for each random letter
(i.e. throwing away about 3 random bits for each letter) while you tried
to encode the urandom data cleverly and map 4 random bytes to 5
alphabetic letters. /dev/urandom uses a cryptographic PRNG and it's
pretty fast, so reading a few extra bytes from it to simplify your code
doesn't really cost you anything.

Michael Rudolf

unread,

Feb 24, 2010, 2:30:40 PM2/24/10

to

Am 24.02.2010 19:35, schrieb mk:
> On 2010-02-24 18:56, Michael Rudolf wrote:
>
>> The reason is 256 % 26 != 0
>> 256 mod 26 equals 22, thus your code is hitting a-v about 10% (256/26 is
>> approx. 10) more often than w-z.
>
> <Barbie voice>writing secure code is hard...

So true. That's why one should stick to standard libs when it comes to
crypto or security in general. It's just to easy to mess it up. Just ask
Debian about whether touching OpenSSL was a good idea ;)

>> You might want to skip the values 0-22
>> to achieve a truly uniform distribution.
> Hmm perhaps you meant to skip values over 256 - 22 ?

That's the same thing as x mod y equals x+N*y mod y for every natural N.

> def gen_rand_word(n):
> with open('/dev/urandom') as f:
> return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n) if ord(x)
> > 22])

Off-by-one-error: you're skipping len(range(22))==23 hits.
OK, I just see that I wrote misleading 0-22 while I meant range(22).

> While with this:
> def gen_rand_word(n):
> with open('/dev/urandom') as f:
> return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n) if ord(x)
> < 235])

Same off-by-one.

>> FYI: Electronic Cash PINs in europe (dont know about the rest of the
>> world) were computed the same way (random hexdigit and just mod it when
>> it's too large) leading to a high probability that your first digit was
>> a 1 :)
> Schadenfreude is deriving joy from others' misfortunes; what is the
> German word, if any, for deriving solace from others' misfortunes? ;-)

Well - "Schadenfreude" *is* in fact a german word :)
"Schaden" is the event or result of misfortune, "Freude" is joy.

Well, I really think that you should use repeated Random.choice on an
alphabet.
Or Random.Systemrandom.choice if you don't trust the PRNG.

Regards,
Michael

mk

unread,

Feb 24, 2010, 3:02:07 PM2/24/10

to pytho...@python.org

On 2010-02-24 20:19, Robert Kern wrote:

> On 2010-02-24 13:09 PM, mk wrote:
>> On 2010-02-24 20:01, Robert Kern wrote:
>>> I will repeat my advice to just use random.SystemRandom.choice() instead
>>> of trying to interpret the bytes from /dev/urandom directly.
>>

>> Oh I hear you -- for production use I would (will) certainly consider
>> this. However, now I'm interested in the problem itself: why is the damn
>> distribution not uniform?
>
> You want "< 234", not "< 235". (234 % 26 == 0), so you get some extra 'a's.

Right, this explains the 'a' outlier. Fixed. But still:

import operator
import os
import random
import math

def rand_str_custom(n):
s = os.urandom(n)
return ''.join([chr(ord('a') + ord(x) % 26) for x in s if ord(x) <
234])

def count_chars(chardict, word):
for c in word:
try:
chardict[c] += 1
except KeyError:
chardict[c] = 0

def rand_str_SystemRandom_seeding(length):
seed = os.urandom(32)
random.seed(seed)

prng = random.SystemRandom()
chars = []
for i in range(length):
chars.append(prng.choice('abcdefghijklmnopqrstuvwxyz'))
return ''.join(chars)

def rand_str_SystemRandom_noseeding(length):

prng = random.SystemRandom()
chars = []
for i in range(length):
chars.append(prng.choice('abcdefghijklmnopqrstuvwxyz'))
return ''.join(chars)

def sd(x):
sd.sum += x
sd.sum2 += x*x
sd.n += 1.0
sum, sum2, n = sd.sum, sd.sum2, sd.n
return math.sqrt(sum2/n - sum*sum/n/n)

def gen_rand_with_fun(fun):
print fun.__name__
chardict = {}
for i in range(10000):
w = fun(10)

count_chars(chardict, w)
counts = list(chardict.items())
counts.sort(key = operator.itemgetter(1), reverse = True)

nums = [c[1] for c in counts]
sd.sum = sd.sum2 = sd.n = 0
mean = (1.0*sum(nums))/len(nums)
stddev = map(sd, nums)[-1]
print 'mean', mean, 'std dev', stddev

for char, count in counts:

print char, count, '%.2f' % ((count - mean)/stddev), 'std devs
away from mean'

if __name__ == "__main__":
gen_rand_with_fun(rand_str_SystemRandom_seeding)
print
gen_rand_with_fun(rand_str_SystemRandom_noseeding)
print
gen_rand_with_fun(rand_str_custom)

rand_str_SystemRandom_seeding
mean 3845.15384615 std dev 46.2016419186
l 3926 1.75 std devs away from mean
y 3916 1.53 std devs away from mean
d 3909 1.38 std devs away from mean
a 3898 1.14 std devs away from mean
p 3898 1.14 std devs away from mean
c 3889 0.95 std devs away from mean
u 3884 0.84 std devs away from mean
j 3873 0.60 std devs away from mean
n 3873 0.60 std devs away from mean
w 3866 0.45 std devs away from mean
x 3863 0.39 std devs away from mean
r 3855 0.21 std devs away from mean
m 3852 0.15 std devs away from mean
b 3841 -0.09 std devs away from mean
t 3835 -0.22 std devs away from mean
o 3829 -0.35 std devs away from mean
k 3827 -0.39 std devs away from mean
i 3821 -0.52 std devs away from mean
s 3812 -0.72 std devs away from mean
q 3806 -0.85 std devs away from mean
v 3803 -0.91 std devs away from mean
g 3799 -1.00 std devs away from mean
h 3793 -1.13 std devs away from mean
e 3782 -1.37 std devs away from mean
f 3766 -1.71 std devs away from mean
z 3758 -1.89 std devs away from mean

rand_str_SystemRandom_noseeding
mean 3845.15384615 std dev 55.670522726
i 3961 2.08 std devs away from mean
r 3911 1.18 std devs away from mean
e 3910 1.16 std devs away from mean
m 3905 1.08 std devs away from mean
a 3901 1.00 std devs away from mean
u 3893 0.86 std devs away from mean
t 3882 0.66 std devs away from mean
w 3872 0.48 std devs away from mean
s 3870 0.45 std devs away from mean
c 3868 0.41 std devs away from mean
n 3866 0.37 std devs away from mean
q 3865 0.36 std devs away from mean
k 3863 0.32 std devs away from mean
y 3848 0.05 std devs away from mean
j 3836 -0.16 std devs away from mean
v 3830 -0.27 std devs away from mean
f 3829 -0.29 std devs away from mean
z 3829 -0.29 std devs away from mean
g 3827 -0.33 std devs away from mean
l 3818 -0.49 std devs away from mean
b 3803 -0.76 std devs away from mean
d 3803 -0.76 std devs away from mean
p 3756 -1.60 std devs away from mean
x 3755 -1.62 std devs away from mean
h 3744 -1.82 std devs away from mean
o 3729 -2.09 std devs away from mean

rand_str_custom
mean 3517.15384615 std dev 40.7541336343
i 3586 1.69 std devs away from mean
a 3578 1.49 std devs away from mean
e 3575 1.42 std devs away from mean
m 3570 1.30 std devs away from mean
q 3562 1.10 std devs away from mean
c 3555 0.93 std devs away from mean
g 3552 0.86 std devs away from mean
w 3542 0.61 std devs away from mean
p 3536 0.46 std devs away from mean
x 3533 0.39 std devs away from mean
s 3528 0.27 std devs away from mean
o 3524 0.17 std devs away from mean
d 3516 -0.03 std devs away from mean
t 3515 -0.05 std devs away from mean
h 3511 -0.15 std devs away from mean
v 3502 -0.37 std devs away from mean
z 3502 -0.37 std devs away from mean
b 3500 -0.42 std devs away from mean
f 3496 -0.52 std devs away from mean
u 3492 -0.62 std devs away from mean
l 3486 -0.76 std devs away from mean
r 3478 -0.96 std devs away from mean
n 3476 -1.01 std devs away from mean
j 3451 -1.62 std devs away from mean
k 3450 -1.65 std devs away from mean
y 3430 -2.14 std devs away from mean

It would appear that SystemRandom().choice is indeed best (in terms of
how much the counts stray from mean in std devs), but only after seeding
it with os.urandom.

Regards,
mk

mk

unread,

Feb 24, 2010, 3:06:00 PM2/24/10

to pytho...@python.org

On 2010-02-24 20:30, Michael Rudolf wrote:
>>> The reason is 256 % 26 != 0
>>> 256 mod 26 equals 22, thus your code is hitting a-v about 10% (256/26 is
>>> approx. 10) more often than w-z.
>>
>> <Barbie voice>writing secure code is hard...
>
> So true. That's why one should stick to standard libs when it comes to
> crypto or security in general. It's just to easy to mess it up. Just ask
> Debian about whether touching OpenSSL was a good idea ;)

That was brain-dead hiccup, for crying out loud how could they do smth
so stupid.

>> def gen_rand_word(n):
>> with open('/dev/urandom') as f:
>> return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n) if ord(x)
>> > 22])
>
> Off-by-one-error: you're skipping len(range(22))==23 hits.

Argh, it's late here.

> Well, I really think that you should use repeated Random.choice on an
> alphabet.
> Or Random.Systemrandom.choice if you don't trust the PRNG.

I just posted a comparison with calculating std deviations for various
methods - using os.urandom, SystemRandom.choice with seeding and without
seeding.

They all seem to have slightly different distributions.

Regards,
mk

Paul Rubin

unread,

Feb 24, 2010, 3:09:54 PM2/24/10

to

mk <mrk...@gmail.com> writes:
> def rand_str_custom(n):
> s = os.urandom(n)
> return ''.join([chr(ord('a') + ord(x) % 26) for x in s if ord(x) < 234])

Note that simply throws away some of the chars. You have to replace
them, not throw them away.

> rand_str_SystemRandom_seeding
> mean 3845.15384615 std dev 46.2016419186
> l 3926 1.75 std devs away from mean
> y 3916 1.53 std devs away from mean

...

What do you think you're measuring here? Yes, if you're doing 1000's of
draws from a distribution, you'd expect a few of them to be 1.75 sigma
from the mean. Since there are 26 letters, you'd expect a multinomial
distribution which you can test for with the multinomial test or some
approximation from the article:

http://en.wikipedia.org/wiki/Multinomial_test

I wish I knew more statistics than I do, since there is probably some
more familiar statistical test (e.g. the T-test) that you can use as the
number of trials gets large, since each bin of the multinomial
distribution should eventually start to look like a normal distribution
due to the central limit theorem. Others here know a lot more about
this stuff than I do, and can probably give better advice.

Anyway though, the output of os.urandom should be extremely hard to
distinguish from real randomness (that's the whole point of a
cryptographic PRNG).

Robert Kern

unread,

Feb 24, 2010, 3:20:37 PM2/24/10

to pytho...@python.org

On 2010-02-24 14:02 PM, mk wrote:

> It would appear that SystemRandom().choice is indeed best (in terms of
> how much the counts stray from mean in std devs), but only after seeding
> it with os.urandom.

Calling random.seed() does not affect SystemRandom() whatsoever. You are getting
perfectly acceptable distributions for all three variants.

Michael Rudolf

unread,

Feb 24, 2010, 4:00:19 PM2/24/10

to

Am 24.02.2010 21:06, schrieb mk:
>
> I just posted a comparison with calculating std deviations for various
> methods - using os.urandom, SystemRandom.choice with seeding and without
> seeding.

I saw them

> They all seem to have slightly different distributions.

No they don't. Just run those tests again and you will see that you
cannot put them in any order or behaviour. They are all correct now,
except that you cannot seed SystemRandom, as it is *not* a PRNG (at
least here, it is a wrapper for /dev/random)

Regards,
Michael

Steven D'Aprano

unread,

Feb 24, 2010, 8:07:31 PM2/24/10

to

On Wed, 24 Feb 2010 18:23:17 +0100, mk wrote:

> Anyway, the passwords for authorized users will be copied and pasted
> from email into in the application GUI which will remember it for them,
> so they will not have to remember and type them in.

So to break your application's security model, all somebody has to do is
use their PC and they have full access to their account?

Or get hold of the copy and paste buffer?

Or the application's config files?

> So I have little in
> the way of limitations of password length - even though in *some* cases
> somebody might have to (or be ignorant enough) to retype the password
> instead of pasting it in.

Or your users might be sensible enough to not trust a role-your-own
security model, and prefer to memorize the password than to trust that
nobody will get access to their PC.

> The main application will access the data using HTTP (probably), so the
> main point is that an attacker is not able to guess passwords using
> brute force.

And why would they bother doing that when they can sniff the wire and get
the passwords in plain text? You should assume your attackers are
*smarter* than you, not trust them to be foolish.

--
Steven

Paul Rubin

unread,

Feb 24, 2010, 8:31:31 PM2/24/10

to

mk <mrk...@gmail.com> writes:
> Anyway, the passwords for authorized users will be copied and pasted
> from email into in the application GUI which will remember it for
> them, so they will not have to remember and type them in.

It occurs to me that you don't even need to mess with letters in that
case:

password = os.urandom(5).encode('hex')

will generate a string of 10 hex digits that you can give to the user.
(That is for Python 2.x but I think it might be broken in Python 3).

It might be helpful if you could say what your application does, or
anyway give an idea of what its actual security requirements are.
Generating and emailing someone a random password is a fairly standard
method for (e.g.) web forums to verify that the person has supplied a
working email address, basically as a first level spam filter. Your
scheme is probably ok for that. If you're doing something with more
demanding security requirements, then as mentioned before, there is a
whole lot of stuff you have to pay attention to, and focusing narrowly
on password generation isn't that useful.

mk

unread,

Feb 25, 2010, 9:05:56 AM2/25/10

to pytho...@python.org

On 2010-02-25 02:07, Steven D'Aprano wrote:
> On Wed, 24 Feb 2010 18:23:17 +0100, mk wrote:
>
>> Anyway, the passwords for authorized users will be copied and pasted
>> from email into in the application GUI which will remember it for them,
>> so they will not have to remember and type them in.
>
> So to break your application's security model, all somebody has to do is
> use their PC and they have full access to their account?
>
> Or get hold of the copy and paste buffer?
>
> Or the application's config files?

Yes. There's no way around this, short of forcing them to use hardware
key, which is an overkill for this application.

>> So I have little in
>> the way of limitations of password length - even though in *some* cases
>> somebody might have to (or be ignorant enough) to retype the password
>> instead of pasting it in.

> Or your users might be sensible enough to not trust a role-your-own
> security model, and prefer to memorize the password than to trust that
> nobody will get access to their PC.

The app is not that critical, it's about quarterly subscription to the
service, and the users will be able to reset the password anyway. If it
were that critical, I'd use the hardware keys; if hardware keys are not
used, once somebody gains an (unconstrained) access to the user's PC,
there's not much that app developer can do. I've read somewhere a
warning from PuTTY developer that even though the key is (normally)
protected by the passphrase, losing even an encrypted key is quite
likely to lead to its compromise. There's even some software for that on
the net:

http://www.neophob.com/serendipity/index.php?/archives/127-PuTTY-Private-Key-cracker.html

>> The main application will access the data using HTTP (probably), so the
>> main point is that an attacker is not able to guess passwords using
>> brute force.

> And why would they bother doing that when they can sniff the wire and get
> the passwords in plain text? You should assume your attackers are
> *smarter* than you, not trust them to be foolish.

I should have written HTTPS.

Regards,
mk

mk

unread,

Feb 25, 2010, 10:03:47 AM2/25/10

to pytho...@python.org

On 2010-02-25 02:31, Paul Rubin wrote:
> It might be helpful if you could say what your application does, or
> anyway give an idea of what its actual security requirements are.
> Generating and emailing someone a random password is a fairly standard
> method for (e.g.) web forums to verify that the person has supplied a
> working email address, basically as a first level spam filter. Your
> scheme is probably ok for that. If you're doing something with more
> demanding security requirements, then as mentioned before, there is a
> whole lot of stuff you have to pay attention to, and focusing narrowly
> on password generation isn't that useful.

OK some data:

1. I'm going to probably use HTTPS (I meant HTTP over SSL, but wrote
HTTP instead of being precise)

2. The app will have GUI and it will be locally installed; it's not
going to be web app, it will just be online in the sense of downloading
data frequently from the net.

3. I can't disclose the details on what the app will be doing, but it's
not going to be terribly security-critical, medium-level at most - what
happens in the app itself is not _directly_ related to money.

4. The app will be based on subscription model, costing $10-$20 per
month. It's not really doing online banking or smth like that.

5. The worst thing that can happen when security of some account is
compromised is that the user will have to reset the password, resending
it to predefined email (I don't really see the way of organizing it in a
different manner).

I'm thinking about optionally passphrase-securing the password saved in
GUI. In that case 'diceware' approach would be helpful.

I certainly do not want to focus narrowly on password generation: the
entire security model will have to be worked out, but the project didn't
get to that stage yet, it's all still in planning stages. I just wanted
to have this one part (password generation) researched before I get to
other stages so I don't have to implement this later in haste and do
smth wrong.

Regards,
mk

Robert Kern

unread,

Feb 25, 2010, 11:32:20 AM2/25/10

to pytho...@python.org

On 2010-02-25 09:03 AM, mk wrote:

> 2. The app will have GUI and it will be locally installed; it's not
> going to be web app, it will just be online in the sense of downloading
> data frequently from the net.

If you are storing the password instead of making your user remember it, most
platforms have some kind of keychain secure password storage. I recommend
reading up on the APIs available on your targeted platforms.

Steven D'Aprano

unread,

Feb 25, 2010, 12:15:50 PM2/25/10

to

On Thu, 25 Feb 2010 15:05:56 +0100, mk wrote:

> On 2010-02-25 02:07, Steven D'Aprano wrote:
>> On Wed, 24 Feb 2010 18:23:17 +0100, mk wrote:
>>
>>> Anyway, the passwords for authorized users will be copied and pasted
>>> from email into in the application GUI which will remember it for
>>> them, so they will not have to remember and type them in.
>>
>> So to break your application's security model, all somebody has to do
>> is use their PC and they have full access to their account?
>>
>> Or get hold of the copy and paste buffer?
>>
>> Or the application's config files?
>
> Yes. There's no way around this, short of forcing them to use hardware
> key, which is an overkill for this application.

Of course there is. Why don't you find out how applications with real
security work, instead of making up amateur insecure schemes or worrying
about insignificant deviations from uniformity in your password generator?

You can't get hold of a user's login password in Linux or Windows by
grabbing the copy-and-paste buffer, or by looking in the password file.
No hardware key required.

Today, you say that your application only needs weak security because the
value of the accounts are low. (If they're that low, why do you need a
password at all?) But tomorrow, your application will be bigger, better,
new and improved, with remote logins over the Internet and much more
value -- and it will still be using the same crappy weak security that it
has now, I guarantee it.

If you are storing the password, instead of a hash, you fail.

If you are storing a hash without a salt, you fail.

Yes, an awful lot of software do these things. They shouldn't, even for
supposed "low value passwords".

http://blog.moertel.com/articles/2006/12/15/never-store-passwords-in-a-database

http://www.codinghorror.com/blog/2007/09/youre-probably-storing-passwords-incorrectly.html

>> Or your users might be sensible enough to not trust a role-your-own
>> security model, and prefer to memorize the password than to trust that
>> nobody will get access to their PC.
>
> The app is not that critical, it's about quarterly subscription to the
> service, and the users will be able to reset the password anyway.

And when users realise that they don't need to buy a subscription, they
just need to steal a password from somebody else, what does that do to
your business model?

--
Steven

Peter Pearson

unread,

Feb 26, 2010, 3:07:52 PM2/26/10

to

What would be perfect? Surely one shouldn't be happy if all the
tallies come out exactly equal: that would be a blatant indication
of something very nonrandom going on.

The tallies given above give a chi-squared value smack in the
middle of the range expected for random sampling of a uniform
distribution (p = 0.505). So the chi-squared metric of
goodness-of-fit to a unifom distribution says you're doing fine.

--
To email me, substitute nowhere->spamcop, invalid->net.

Peter Pearson

unread,

Feb 26, 2010, 3:17:43 PM2/26/10

to

On Wed, 24 Feb 2010 21:02:07 +0100, mk <mrk...@gmail.com> wrote:
[snip]

Chi2 = 14.43, 25 d.f., prob = 0.046362.
The observed distribution is SIGNIFICANTLY CLOSER
to the uniform distribution than reasonable by chance.

Chi2 = 20.96, 25 d.f., prob = 0.304944.
The observed distribution is not significantly different
from the uniform distribution.

Chi2 = 12.28, 25 d.f., prob = 0.015815.
The observed distribution is SIGNIFICANTLY CLOSER
to the uniform distribution than reasonable by chance.

> It would appear that SystemRandom().choice is indeed best (in terms of
> how much the counts stray from mean in std devs), but only after seeding
> it with os.urandom.

I don't see any reason to worry about any of the three, except
perhaps that the first and last are surprisingly uniform.

Aahz

unread,

Feb 28, 2010, 2:28:39 AM2/28/10

to

In article <mailman.247.1267115...@python.org>,

Robert Kern <rober...@gmail.com> wrote:
>
>If you are storing the password instead of making your user remember
>it, most platforms have some kind of keychain secure password
>storage. I recommend reading up on the APIs available on your targeted
>platforms.

Are you sure? I haven't done a lot of research, but my impression was
that Windows didn't have anything built in.
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

"Many customs in this life persist because they ease friction and promote
productivity as a result of universal agreement, and whether they are
precisely the optimal choices is much less important." --Henry Spencer

Paul Rubin

unread,

Feb 28, 2010, 3:43:10 AM2/28/10

to

aa...@pythoncraft.com (Aahz) writes:
> Are you sure? I haven't done a lot of research, but my impression was
> that Windows didn't have anything built in.

I don't know much about the windows but there is the CAPI and then
there is all the TCPA (i.e. DRM) stuff. Maybe it can be used somehow.

Robert Kern

unread,

Mar 2, 2010, 11:37:55 AM3/2/10

to pytho...@python.org

On 2010-02-28 01:28 AM, Aahz wrote:
> In article<mailman.247.1267115...@python.org>,
> Robert Kern<rober...@gmail.com> wrote:
>>
>> If you are storing the password instead of making your user remember
>> it, most platforms have some kind of keychain secure password
>> storage. I recommend reading up on the APIs available on your targeted
>> platforms.
>
> Are you sure? I haven't done a lot of research, but my impression was
> that Windows didn't have anything built in.

You're right, not built-in, but Windows does provide enough crypto services for
a cross-platform Python implementation to be built:

http://pypi.python.org/pypi/keyring

Aahz

unread,

Mar 2, 2010, 5:59:14 PM3/2/10

to

In article <mailman.120.12675480...@python.org>,

Robert Kern <rober...@gmail.com> wrote:
>On 2010-02-28 01:28 AM, Aahz wrote:
>> In article<mailman.247.1267115...@python.org>,
>> Robert Kern<rober...@gmail.com> wrote:
>>>
>>> If you are storing the password instead of making your user remember
>>> it, most platforms have some kind of keychain secure password
>>> storage. I recommend reading up on the APIs available on your targeted
>>> platforms.
>>
>> Are you sure? I haven't done a lot of research, but my impression was
>> that Windows didn't have anything built in.
>
>You're right, not built-in, but Windows does provide enough crypto
>services for a cross-platform Python implementation to be built:
>
> http://pypi.python.org/pypi/keyring

Thanks you! That's a big help!

Lie Ryan

unread,

Mar 2, 2010, 10:51:52 PM3/2/10

to

I give you this:

import itertools
def gen():
valid_chars = 'abcdefghijklmnopqrstuvwxyz'
for char in itertools.repeat(valid_chars):
yield char

gen = gen()
def gen_rand_string(length):
chars = (next(gen) for i in range(length))
return ''.join(chars)

since it gives me a perfect distribution of letters, it must be a very
secure random password generation scheme.

Michael Rudolf

unread,

Mar 3, 2010, 10:15:27 AM3/3/10

to

Am 03.03.2010 04:51, schrieb Lie Ryan:
> import itertools
> def gen():
> valid_chars = 'abcdefghijklmnopqrstuvwxyz'
> for char in itertools.repeat(valid_chars):
> yield char
>
> gen = gen()
> def gen_rand_string(length):
> chars = (next(gen) for i in range(length))
> return ''.join(chars)
>
> since it gives me a perfect distribution of letters,

It does not. Only if not (length(valid_chars) % length)

Regards,
Michael