hhhhhh-YYYYMMDDhhmmss-rrrrrrrr.pdf
where rrr... is a 128-bit random number (encoded as base62). The intent of the
random part is to prevent recipients of one file from being able to guess the
names of others.
The process was originally a CGI script, which meant each random number was
produced like this:

# pid is the process id; dur is 4 bytes read from /dev/urandom
random.seed(long(time.time()*someprimeint) | (pid << 64) | (dur << 32))
rrr = random.getrandbits(128)
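Spelled out as a self-contained sketch (Python 3 here, so int() stands in for the
original long(); SOMEPRIMEINT and the struct unpacking are placeholder details not
shown above):

import os
import random
import struct
import time

SOMEPRIMEINT = 999999937                       # placeholder; the real prime isn't shown
pid = os.getpid()                              # process id
dur = struct.unpack("<I", os.urandom(4))[0]    # 4 bytes from /dev/urandom as an int

# seed the deterministic Mersenne Twister from time, pid and dur,
# then pull 128 bits for the filename, as in the scheme above
random.seed(int(time.time() * SOMEPRIMEINT) | (pid << 64) | (dur << 32))
rrr = random.getrandbits(128)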
Is this algorithm safe? Is it safe if the process is switched to FastCGI and the
initialization is only carried out once, and then, say, 50 rrr values are generated?
--
Robin Becker
I guess what I'm asking is whether any sequence that's using random to generate
random numbers is predictable if enough samples are drawn. In this case, assuming
that FastCGI is being used, can I observe a sequence of generated numbers and
work out the state of the generator? If that is possible, then the sequence
becomes deterministic and such a scheme is useless. If I use CGI, then we're
re-initializing the sequence for each number, hopefully using some other
unrelated randomness.
The uuid module apparently uses machine internals etc. to try to produce
randomness, but urandom and similar can block, so they are probably not entirely
suitable.
--
Robin Becker
How much randomness do you actually have in this scheme? The PID is probably
difficult for an attacker to know, but it's allocated roughly monotonically with
a known wrap-around value. The time is probably roughly known, so it also
contributes less than its full bits to the randomness. Only dur is really
unpredictable. So you have something somewhat above 4 bytes of randomness in
your seed - perhaps 8 or 10. That's much less than even the fairly small 16
bytes of "randomness" you expose in the filename.
The random module is entirely deterministic, so once the seed is known the
value you produce is known too.
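For example, reseeding with the same value reproduces the same output exactly:

import random

random.seed(12345)
first = random.getrandbits(128)

random.seed(12345)                         # same seed ...
assert random.getrandbits(128) == first    # ... same "random" value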
Is 10 bytes enough to thwart your attackers? Hard to say - what does an attack
look like?
If you want the full 16 bytes of unpredictability, why don't you just read 16
bytes from /dev/urandom and forget about all the other stuff?
Jean-Paul
/dev/urandom does not block; that's the point of it as compared to /dev/random.
Jean-Paul
My mistake, I thought it was the other way round. On FreeBSD they're the same
anyway, which is what we test on.
--
Robin Becker
I'm sure you're right about the limited amount of entropy in the initial state,
but how much state can there be in the PRNG?
>
> The random module is entirely deterministic, so once the seed is known the
> value you produce is known too.
>
> Is 10 bytes enough to thwart your attackers? Hard to say - what does an
> attack look like?
An attacker could try to gain information from seeing others' results by
guessing the filename.
An attack would consist of generating a sample file via a web query, which might
take 1 or 2 seconds; the sequence number could then be seen and, if the state
were established, future filenames could be guessed while FastCGI is in operation.
In a CGI-type scheme such an attack would require searching over the pid space,
the time space and some random bits from the OS.
I'm not sure such an attack is realistic given the size of the space even in the
initial seed.
>
> If you want the full 16 bytes of unpredictability, why don't you just read 16
> bytes from /dev/urandom and forget about all the other stuff?
>
> Jean-Paul
I have a vague memory that the original author felt that entropy might run out,
or something like that, so always reading from /dev/urandom was not a good idea.
FreeBSD re-uses the entropy, but the end target is Solaris so I'm not really
sure about the details of /dev/urandom there.
--
Robin Becker
If there is enough entropy to begin with, then /dev/urandom should be
cryptographically strong. The main danger is just after the system
boots and there has not yet been much entropy gathered from physical
events.
> FreeBSD re-uses the entropy, but the end target is Solaris so I'm not
> really sure about the details of /dev/urandom.
No idea about Solaris. Another area of danger these days is virtual
hosts, since their I/O may be completely simulated. They are not
certified for payment card processing, mostly for that reason.
> I guess what I'm asking is whether any sequence that's using random to
> generate random numbers is predictable if enough samples are drawn.
Apparently so. random.random is *not* 'cryptographically secure'.
https://secure.wikimedia.org/wikipedia/en/wiki/Cryptographically_secure_pseudorandom_number_generator
One of Python's crypto wrapper modules (sorry, forget which one) was recently
modified to expose the crypto RNG functions in the wrapped C library. It should
be mentioned in What's New for 3.3. You might be able to get at the same
functions with ctypes.
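If you do go the ctypes route, something along these lines should work as a
sketch (it assumes an OpenSSL libcrypto can be located; find_library may return
None otherwise):

import ctypes
import ctypes.util

# call OpenSSL's RAND_bytes directly; it returns 1 on success
libcrypto = ctypes.CDLL(ctypes.util.find_library("crypto"))
buf = ctypes.create_string_buffer(16)
if libcrypto.RAND_bytes(buf, 16) != 1:
    raise RuntimeError("RAND_bytes failed")
rrr = buf.raw    # 16 cryptographically strong bytes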
--
Terry Jan Reedy
The advice you got about just using urandom seems to be the best
you're likely to get. Given how few values you have to pull out of
random.random to reconstruct its state, the progress that's been made
in the last few years on similar hidden state problems, and the
limited amount of entropy you're feeding it in the first place, I'd
probably stay away from this method. And besides,
import base64, os

# adds random junk to the filename - should make it hard to guess
rrr = os.urandom(16)
fname += base64.b64encode(rrr)
has to be easier to read and reason about than the process above.
Geremy Condra
Don't use b64 output in a filename -- it can have slashes in it! :-(
Simplest is to use old-fashioned hexadecimal for stuff like that, unless the
number of chars is a significant problem. Go for a more complicated encoding if
you must.
You can use base64.urlsafe_b64encode, or specify a custom altchars
argument that doesn't include '/'.
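For instance (a sketch; the report- prefix and the stripping of '=' padding are
just illustrative choices):

import base64
import os

rrr = os.urandom(16)

# urlsafe_b64encode swaps '+' and '/' for '-' and '_', so the result is filename-safe
token = base64.urlsafe_b64encode(rrr).decode("ascii").rstrip("=")

# or keep b64encode and supply two substitute characters of your own
token_alt = base64.b64encode(rrr, altchars=b".-").decode("ascii").rstrip("=")

fname = "report-" + token + ".pdf"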
Definitely don't use base64 filenames on a case-insensitive
filesystem, though. :-)
>> If you want the full 16 bytes of unpredictability, why don't you just read
>> 16 bytes from /dev/urandom and forget about all the other stuff?
>
> I have a vague memory that the original author felt that entropy might
> run out or something like that so reading from /dev/urandom always was
> not a good idea.
The problem with /dev/urandom is that it shares the same entropy pool as
/dev/random, so you're "stealing" entropy which may be needed for tasks
which really need it (e.g. generating SSL/TLS keys).
Personally, I'd take whatever "cheap" entropy I can get and hash it.
If you're going to read from /dev/urandom, limit it to a few bytes per
minute, not per request.
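A minimal sketch of that idea (purely illustrative; cheap_token and the choice of
inputs are assumptions, not anything from the thread - it stretches a small,
rarely refreshed dose of OS entropy with hashlib instead of reading 16 fresh
bytes per request):

import hashlib
import os
import time

_pool = os.urandom(32)   # a little "real" entropy, read once rather than per request
_counter = 0

def cheap_token():
    """Hash a few cheap, mostly guessable inputs together with the small pool."""
    global _counter
    _counter += 1
    material = b"|".join([
        _pool,
        repr(time.time()).encode(),
        str(os.getpid()).encode(),
        str(_counter).encode(),
    ])
    return hashlib.sha256(material).hexdigest()[:32]   # 128 bits, hex-encoded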
Eeesh, that completely slipped my mind. Thanks for pointing it out.
Geremy Condra
PyCrypto has a strong pseudorandom number generator, too.
If you mean the one at pycrypto.org, that page now says:
Random number generation
Do not use RandomPool to generate random numbers. Use Crypto.Random
instead. RandomPool is deprecated and will be removed in a future
release. See this thread to find out why.
Crypto.Random just uses system randomness, which is the right thing to do. It
then runs it through a distiller (Fortuna), which seems a little bit silly to
me, but harmless.
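Usage is then a one-liner or two (a sketch, assuming PyCrypto is installed):

from Crypto import Random

rng = Random.new()    # Fortuna generator, seeded and reseeded from the OS
rrr = rng.read(16)    # 16 unpredictable bytes for the filename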
The most thorough analysis of Linux's /dev/*random that I know of is
here:
http://www.pinkas.net/PAPERS/gpr06.pdf
It says random and urandom use separate pools, though both are fed from the
same primary pool. The point is that reading from one should not interfere with
the other.
There have been interminable threads on sci.crypt about /dev/random vs
/dev/urandom and the sane conclusion seems to be that if there's enough
entropy in the system, /dev/urandom is not really worse than
/dev/random. Any type of userspace randomness collection is probably
worse than either. If you're doing typical low-to-medium security stuff
on traditional PC hardware (i.e. not virtualized, not something like
OpenWRT which has few real entropy sources), /dev/urandom is fine. If
you don't believe in its cryptographic PRNG, you shouldn't believe in
OpenSSL either.
If you're doing high security stuff (server-side financial apps, etc.)
then /dev/urandom and /dev/random are both considered not good enough,
and your PC is probably leaking keys, so you should be using dedicated
crypto hardware.
> Personally, I'd take whatever "cheap" entropy I can get and hash it.
> If you're going to read from /dev/urandom, limit it to a few bytes per
> minute, not per request.
That's really not going to help you.
On a related note, keyczar just got bitten by this.
> Crypto.Random just uses system randomness, which is the right thing to
> do. It then goes and runs them through a distiller (Fortuna), which
> seems a little bit silly to me, but harmless.
IIRC this is mostly to help deal with the possibility of running on
older Windows machines, where the cryptographic random number service
was of very poor quality.
Geremy Condra
>> Personally, I'd take whatever "cheap" entropy I can get and hash it.
>> If you're going to read from /dev/urandom, limit it to a few bytes per
>> minute, not per request.
>
> That's really not going to help you.
In what way?
If I need security, I'll use /dev/random or /dev/urandom. If I don't, I'll
save the real entropy for something which needs it.
Issues with observability of entropy sources (mainly the use of network
traffic as an entropy source) are overblown. The staff of a co-location
facility have physical access, and anyone further out doesn't see enough
of the traffic for it to do them any good.
Predicting an entropy-hashing RNG based upon a fraction of the entropy
and a fraction of the output is a theoretical attack which is only
relevant to entities who have far easier options available to them.
I just mean that if /dev/urandom has enough internal state then within
practical bounds, its output is effectively random no matter how much
you read from it. Did you look at the paper I linked? "Saving" the
"real entropy" isn't feasible since the maximum capacity of the two
"real" entropy pools is 4096 bits each. They will both fill pretty
quickly on an active system. Reading /dev/urandom will empty the
primary pool but /dev/random is fed by the secondary pool, which
receives entropy from both the primary pool and physical sources. If
you read too fast from /dev/urandom, the worst that happens (if I
understand correctly) is that the rate you can read from /dev/random is
cut in half and it will block more often. If that's a serious issue for
your application, you should probably rethink your approach and get an
HSM.
> we have been using base62 ie 0-9A-Za-z just to reduce the name length.
That is ugly to compute. Maybe better to use radix 32 - 0..9a..v - which is
case-insensitive.
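A sketch of such an encoder (to_radix32 is an illustrative helper, not existing
code; the digits are 0-9 followed by a-v, so names survive case-folding):

import os

ALPHABET = "0123456789abcdefghijklmnopqrstuv"   # 32 symbols, 0-9 then a-v

def to_radix32(n):
    """Encode a non-negative integer using digits 0-9a-v."""
    if n == 0:
        return "0"
    digits = []
    while n:
        n, r = divmod(n, 32)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

# a 128-bit value needs at most 26 radix-32 digits (ceil(128/5) == 26)
rrr = to_radix32(int.from_bytes(os.urandom(16), "big"))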
Thomas