hhhhhh-YYYYMMDDhhmmss-rrrrrrrr.pdf
where rrr... is a 128-bit random number (encoded as base62). The intent of the
random part is to prevent recipients of one file from being able to guess the
names of others.
The process was originally a CGI script, which meant each random number was
produced like this:

# pid is the process id; dur is 4 bytes read from /dev/urandom
random.seed(long(time.time()*someprimeint) | (pid << 64) | (dur << 32))
rrr = random.getrandbits(128)
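Spelled out as a self-contained sketch (Python 3 here, so int() stands in for the
original long(); SOMEPRIMEINT and the struct unpacking are placeholder details not
shown above):

import os
import random
import struct
import time

SOMEPRIMEINT = 999999937                       # placeholder; the real prime isn't shown
pid = os.getpid()                              # process id
dur = struct.unpack("<I", os.urandom(4))[0]    # 4 bytes from /dev/urandom as an int

# seed the deterministic Mersenne Twister from time, pid and dur,
# then pull 128 bits for the filename, as in the scheme above
random.seed(int(time.time() * SOMEPRIMEINT) | (pid << 64) | (dur << 32))
rrr = random.getrandbits(128)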
Is this algorithm safe? Is it safe if the process is switched to FastCGI and the
initialization is only carried out once, and then, say, 50 rrr values are generated?
--
Robin Becker
I guess what I'm asking is whether any sequence that's using random to generate
random numbers is predictable if enough samples are drawn. In this case, assuming
that FastCGI is being used, can I observe a sequence of generated numbers and
work out the state of the generator? If that is possible, then the sequence
becomes deterministic and such a scheme is useless. If I use CGI, then we're
re-initializing the sequence for each number, hopefully using some other
unrelated randomness.
The uuid module apparently uses machine internals etc. to try to produce
randomness, but urandom and similar can block, so they are probably not entirely
suitable.
--
Robin Becker
How much randomness do you actually have in this scheme? The PID is probably
difficult for an attacker to know, but it's allocated roughly monotonically with
a known wrap-around value. The time is probably roughly known, so it also
contributes less than its full bits to the randomness. Only dur is really
unpredictable. So you have something somewhat above 4 bytes of randomness in
your seed - perhaps 8 or 10. That's much less than even the fairly small 16
bytes of "randomness" you expose in the filename.
The random module is entirely deterministic, so once the seed is known the
value you produce is known too.
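For example, reseeding with the same value reproduces the same output exactly:

import random

random.seed(12345)
first = random.getrandbits(128)

random.seed(12345)                         # same seed ...
assert random.getrandbits(128) == first    # ... same "random" value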
Is 10 bytes enough to thwart your attackers? Hard to say - what does an attack
look like?
If you want the full 16 bytes of unpredictability, why don't you just read 16
bytes from /dev/urandom and forget about all the other stuff?
Jean-Paul
/dev/urandom does not block; that's the point of it as compared to /dev/random.
Jean-Paul
My mistake, I thought it was the other way round. On FreeBSD they're the same
anyway, which is what we test on.
--
Robin Becker
I'm sure you're right about the limited amount of entropy in the initial state,
but how much state can there be in the PRNG?
>
> The random module is entirely deterministic, so once the seed is known the
> value you produce is known too.
>
> Is 10 bytes enough to thwart your attackers? Hard to say - what does an
> attack look like?
An attacker could try to gain information from seeing others' results by
guessing the filename.
An attack would consist of generating a sample file via a web query, which might
take 1 or 2 seconds; the sequence number could then be seen and, if the state
were established, future filenames could be guessed while FastCGI is in operation.
In a CGI-type scheme such an attack would require searching over the pid space,
the time space and some random bits from the OS.
I'm not sure such an attack is realistic given the size of the space even in the
initial seed.
>
> If you want the full 16 bytes of unpredictability, why don't you just read 16
> bytes from /dev/urandom and forget about all the other stuff?
>
> Jean-Paul
I have a vague memory that the original author felt that entropy might run out,
or something like that, so always reading from /dev/urandom was not a good idea.
FreeBSD re-uses the entropy, but the end target is Solaris so I'm not really
sure about the details of /dev/urandom there.
--
Robin Becker
If there is enough entropy to begin with, then /dev/urandom should be
cryptographically strong. The main danger is just after the system
boots and there has not yet been much entropy gathered from physical
events.
> FreeBSD re-uses the entropy, but the end target is Solaris so I'm not
> really sure about the details of /dev/urandom.
No idea about Solaris. Another area of danger these days is virtual
hosts, since their I/O may be completely simulated. They are not
certified for payment card processing, mostly for that reason.
> I guess what I'm asking is whether any sequence that's using random to
> generate random numbers is predictable if enough samples are drawn.
Apparently so. random.random is *not* 'cryptographically secure'.
https://secure.wikimedia.org/wikipedia/en/wiki/Cryptographically_secure_pseudorandom_number_generator
One of Python's crypto wrapper modules (sorry, forget which one) was recently
modified to expose the crypto RNG functions in the wrapped C library. It should
be mentioned in What's New for 3.3. You might be able to get at the same
functions with ctypes.
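If you do go the ctypes route, something along these lines should work as a
sketch (it assumes an OpenSSL libcrypto can be located; find_library may return
None otherwise):

import ctypes
import ctypes.util

# call OpenSSL's RAND_bytes directly; it returns 1 on success
libcrypto = ctypes.CDLL(ctypes.util.find_library("crypto"))
buf = ctypes.create_string_buffer(16)
if libcrypto.RAND_bytes(buf, 16) != 1:
    raise RuntimeError("RAND_bytes failed")
rrr = buf.raw    # 16 cryptographically strong bytes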
--
Terry Jan Reedy
The advice you got about just using urandom seems to be the best
you're likely to get. Given how few values you have to pull out of
random.random to reconstruct its state, the progress that's been made
in the last few years on similar hidden state problems, and the
limited amount of entropy you're feeding it in the first place, I'd
probably stay away from this method. And besides,
import base64, os

# adds random junk to the filename - should make it hard to guess
rrr = os.urandom(16)
fname += base64.b64encode(rrr)
has to be easier to read and reason about than the process above.
Geremy Condra
Don't use b64 output in a filename -- it can have slashes in it! :-(
Simplest is to use old-fashioned hexadecimal for stuff like that, unless the
number of chars is a significant problem. Go for a more complicated encoding if
you must.
You can use base64.urlsafe_b64encode, or specify a custom altchars
argument that doesn't include '/'.
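For instance (a sketch; the report- prefix and the stripping of '=' padding are
just illustrative choices):

import base64
import os

rrr = os.urandom(16)

# urlsafe_b64encode swaps '+' and '/' for '-' and '_', so the result is filename-safe
token = base64.urlsafe_b64encode(rrr).decode("ascii").rstrip("=")

# or keep b64encode and supply two substitute characters of your own
token_alt = base64.b64encode(rrr, altchars=b".-").decode("ascii").rstrip("=")

fname = "report-" + token + ".pdf"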
Definitely don't use base64 filenames on a case-insensitive
filesystem, though. :-)
>> If you want the full 16 bytes of unpredictability, why don't you just read
>> 16 bytes from /dev/urandom and forget about all the other stuff?
>
> I have a vague memory that the original author felt that entropy might
> run out or something like that so reading from /dev/urandom always was
> not a good idea.
The problem with /dev/urandom is that it shares the same entropy pool as
/dev/random, so you're "stealing" entropy which may be needed for tasks
which really need it (e.g. generating SSL/TLS keys).
Personally, I'd take whatever "cheap" entropy I can get and hash it.
If you're going to read from /dev/urandom, limit it to a few bytes per
minute, not per request.
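A minimal sketch of that idea (purely illustrative; cheap_token and the choice of
inputs are assumptions, not anything from the thread - it stretches a small,
rarely refreshed dose of OS entropy with hashlib instead of reading 16 fresh
bytes per request):

import hashlib
import os
import time

_pool = os.urandom(32)   # a little "real" entropy, read once rather than per request
_counter = 0

def cheap_token():
    """Hash a few cheap, mostly guessable inputs together with the small pool."""
    global _counter
    _counter += 1
    material = b"|".join([
        _pool,
        repr(time.time()).encode(),
        str(os.getpid()).encode(),
        str(_counter).encode(),
    ])
    return hashlib.sha256(material).hexdigest()[:32]   # 128 bits, hex-encoded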
Eeesh, that completely slipped my mind. Thanks for pointing it out.
Geremy Condra
PyCrypto has a strong pseudorandom number generator, too.
If you mean the one at pycrypto.org, that page now says:
Random number generation
Do not use RandomPool to generate random numbers. Use Crypto.Random
instead. RandomPool is deprecated and will be removed in a future
release. See this thread to find out why.
Crypto.Random just uses system randomness, which is the right thing to do. It
then runs it through a distiller (Fortuna), which seems a little bit silly to
me, but harmless.
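Usage is then a one-liner or two (a sketch, assuming PyCrypto is installed):

from Crypto import Random

rng = Random.new()    # Fortuna generator, seeded and reseeded from the OS
rrr = rng.read(16)    # 16 unpredictable bytes for the filename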
The most thorough analysis of Linux's /dev/*random that I know of is
here:
http://www.pinkas.net/PAPERS/gpr06.pdf
It says random and urandom use separate pools, though both are fed from the
same primary pool. The point is that reading from one should not interfere with
the other.
There have been interminable threads on sci.crypt about /dev/random vs
/dev/urandom and the sane conclusion seems to be that if there's enough
entropy in the system, /dev/urandom is not really worse than
/dev/random. Any type of userspace randomness collection is probably
worse than either. If you're doing typical low-to-medium security stuff
on traditional PC hardware (i.e. not virtualized, not something like
OpenWRT which has few real entropy sources), /dev/urandom is fine. If
you don't believe in its cryptographic PRNG, you shouldn't believe in
OpenSSL either.
If you're doing high security stuff (server-side financial apps, etc.)
then /dev/urandom and /dev/random are both considered not good enough,
and your PC is probably leaking keys, so you should be using dedicated
crypto hardware.
> Personally, I'd take whatever "cheap" entropy I can get and hash it.
> If you're going to read from /dev/urandom, limit it to a few bytes per
> minute, not per request.
That's really not going to help you.
On a related note, keyczar just got bitten by this.
> Crypto.Random just uses system randomness, which is the right thing to
> do. It then goes and runs them through a distiller (Fortuna), which
> seems a little bit silly to me, but harmless.
IIRC this is mostly to help deal with the possibility of running on
older Windows machines, where the cryptographic random number service
was of very poor quality.
Geremy Condra
>> Personally, I'd take whatever "cheap" entropy I can get and hash it.
>> If you're going to read from /dev/urandom, limit it to a few bytes per
>> minute, not per request.
>
> That's really not going to help you.
In what way?
If I need security, I'll use /dev/random or /dev/urandom. If I don't, I'll
save the real entropy for something which needs it.
Issues with observability of entropy sources (mainly the use of network
traffic as an entropy source) are overblown. The staff of a co-location
facility have physical access, and anyone further out doesn't see enough
of the traffic for it to do them any good.
Predicting an entropy-hashing RNG based upon a fraction of the entropy
and a fraction of the output is a theoretical attack which is only
relevant to entities who have far easier options available to them.
I just mean that if /dev/urandom has enough internal state then within
practical bounds, its output is effectively random no matter how much
you read from it. Did you look at the paper I linked? "Saving" the
"real entropy" isn't feasible since the maximum capacity of the two
"real" entropy pools is 4096 bits each. They will both fill pretty
quickly on an active system. Reading /dev/urandom will empty the
primary pool but /dev/random is fed by the secondary pool, which
receives entropy from both the primary pool and physical sources. If
you read too fast from /dev/urandom, the worst that happens (if I
understand correctly) is that the rate you can read from /dev/random is
cut in half and it will block more often. If that's a serious issue for
your application, you should probably rethink your approach and get an
HSM.
> we have been using base62 ie 0-9A-Za-z just to reduce the name length.
That is ugly to compute. Maybe better to use radix 32 - 0..9a..v - which is
case-insensitive.
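A sketch of such an encoder (to_radix32 is an illustrative helper, not existing
code; the digits are 0-9 followed by a-v, so names survive case-folding):

import os

ALPHABET = "0123456789abcdefghijklmnopqrstuv"   # 32 symbols, 0-9 then a-v

def to_radix32(n):
    """Encode a non-negative integer using digits 0-9a-v."""
    if n == 0:
        return "0"
    digits = []
    while n:
        n, r = divmod(n, 32)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

# a 128-bit value needs at most 26 radix-32 digits (ceil(128/5) == 26)
rrr = to_radix32(int.from_bytes(os.urandom(16), "big"))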
Thomas