Re: Random numbers in web2py

Massimo Di Pierro

Jun 2, 2011, 8:43:07 AM
to David Wagner, cyou...@gmail.com, Massimo e Claudia Di Pierro, web2py-developers
Hello David,

According to this:
http://tools.ietf.org/html/rfc4122.html

"The version 4 UUID is meant for generating UUIDs from truly-random or pseudo-random numbers."

The source code in the python uuid.py module for uuid4 is this:

  bytes = [chr(random.randrange(256)) for i in range(16)]
  return UUID(bytes=bytes, version=4)

which in fact uses 16 random bytes, i.e. 16 x 8 = 128 random bits (of course assuming pseudo-random numbers are random).

We used this function but ran into one problem. When users were cloning VMs behind a load balancer, the cloned machines were generating the same session cookies for different users, thus creating conflicts. We fixed this with the following, now implemented in trunk:

def rotate(i):
    a = random.randrange(256)
    b = (node_id >> 4*i) % 256
    c = (milliseconds >> 4*i) % 256
    return (a + b + c) % 256

def web2py_uuid():
    web2py_uuid_locker.acquire()
    try:
        bytes = [chr(rotate(i)) for i in range(16)]
        return str(uuid.UUID(bytes=bytes, version=4))
    finally:
        web2py_uuid_locker.release()

I.e. we byte-rotate each of the 16 random bytes (adding a fixed offset modulo 256) using the same values of node_id and milliseconds, determined once outside the function. That means that for a fixed node_id and milliseconds this function has the same entropy as the Python implementation of UUID4.
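
To make the entropy claim concrete, here is a minimal sketch (Python 3, with made-up values standing in for node_id and milliseconds). It only illustrates that adding a fixed per-position offset modulo 256 is a one-to-one mapping: it neither adds nor removes entropy, and anyone who knows the offsets can undo it. The helper names offset, mask and unmask are hypothetical and not part of web2py.

```python
# Hypothetical startup constants; the real values would come from
# uuid.getnode() and the server start time.
node_id = 0x001A2B3C4D5E
milliseconds = 1307022187000

def offset(i):
    """Fixed offset added to byte i, mirroring rotate() above."""
    b = (node_id >> 4 * i) % 256
    c = (milliseconds >> 4 * i) % 256
    return (b + c) % 256

def mask(raw):
    """Apply the fixed offsets to 16 raw byte values."""
    return [(a + offset(i)) % 256 for i, a in enumerate(raw)]

def unmask(masked):
    """Undo mask() by subtracting the same offsets."""
    return [(m - offset(i)) % 256 for i, m in enumerate(masked)]

raw = list(range(16))
# Knowing the offsets is enough to recover the raw values.
assert unmask(mask(raw)) == raw
# At every position, the 256 possible inputs map to 256 distinct outputs.
assert all(len({(a + offset(i)) % 256 for a in range(256)}) == 256
           for i in range(16))
```

So the question of whether the result is guessable rests entirely on the quality of the underlying random bytes.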

If you think we can improve on this, please feel free to suggest a solution. Or if you can show an example of an attack, that would also be useful. I cannot think of any other solution.

Massimo




On Jun 2, 2011, at 12:08 AM, David Wagner wrote:

Hi,

I was looking at web2py, and in particular, at how web2py generates random
numbers for things like session cookies.  I was wondering if you could say
anything about the security of the way web2py generates random numbers.
In particular, based on what I saw, I'm concerned about whether it has
sufficient entropy in its source of randomness.

Based on the description at pythonsecurity.org [1], it appears that
session cookies and other values that need to be unpredictable and
unguessable are generated with web2py_uuid(), which in turn generates
its random value using two sources of randomness:

  * the current time, down to the millisecond, and
  * Python's random module

This seems consistent with my quick glance at the code [2].

However, neither of those sources of randomness is very good for security
purposes.  The web server reveals the current time down to the second to
anyone who asks, so from an adversary's point of view, there are only
1000 possibilities for the time in milliseconds.  The random number
module is not designed for cryptographic purposes (it uses Mersenne
twister, and the documentation warns it is not cryptographically secure
and must not be used for cryptographic purposes), so it is not a good
basis for generating numbers that need to be unguessable/unpredictable
to an adversary.  For instance, after observing some outputs from the
random module (say, observing multiple session cookie values), I expect
it will be possible to deduce the state of the random module and predict
all its future outputs.
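
As an illustration of the state-recovery concern, the following sketch (Python 3, CPython's random module) clones a Mersenne Twister generator from 624 consecutive 32-bit outputs. It is deliberately simplified: it assumes the attacker sees full 32-bit outputs, whereas web2py only exposes values derived from random.randrange(256), so a real attack would need more observations and more work. The function names are hypothetical.

```python
import random

def undo_right_shift_xor(y, shift):
    """Invert y = x ^ (x >> shift) for a 32-bit x."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x

def undo_left_shift_xor(y, shift, mask):
    """Invert y = x ^ ((x << shift) & mask) for a 32-bit x."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x

def untemper(y):
    """Recover one internal state word from one 32-bit output."""
    y = undo_right_shift_xor(y, 18)
    y = undo_left_shift_xor(y, 15, 0xEFC60000)
    y = undo_left_shift_xor(y, 7, 0x9D2C5680)
    y = undo_right_shift_xor(y, 11)
    return y

def clone_generator(outputs):
    """Build a random.Random that continues the observed output stream."""
    assert len(outputs) == 624
    # 624 state words plus the position index; 624 forces a regeneration
    # of the state on the next call.
    state = tuple(untemper(y) for y in outputs) + (624,)
    clone = random.Random()
    clone.setstate((3, state, None))
    return clone

victim = random.Random()  # stands in for the server's generator
observed = [victim.getrandbits(32) for _ in range(624)]
attacker = clone_generator(observed)
predicted = [attacker.getrandbits(32) for _ in range(5)]
assert predicted == [victim.getrandbits(32) for _ in range(5)]
```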

As a result, it looks to me like session cookies in web2py (and
other values that need to be unguessable) do not have enough entropy.
This makes me worry that perhaps they can be guessed by a knowledgeable
adversary.  If so, it sounds like this might allow session stealing
attacks, authentication bypass, and other bad things to happen.

I also verified that a number of other places in web2py seem to use
web2py_uuid() in contexts that (it would seem) need an unpredictable
value [3].

Standard security practice would say that you should not use this
kind of design in situations where you need random numbers that cannot
be guessed by an adversary: e.g., this is not a good way to generate
session cookies.

This seems bad, at least on the surface -- so I suspect I must be
misinterpreting something.  Am I missing something?  Or is this indeed
a security flaw in the design and implementation of web2py?

I hope I'm not wasting your time due to my ignorance.  I'm clueless about
web2py, so my analysis might well be totally bogus.  But I figured it was
probably better to check, than to say nothing.  I apologize in advance
if I've wasted your time on nothing.

Regards,
-- David



[1] http://www.pythonsecurity.org/wiki/web2py/#session-management
[2] https://code.google.com/p/web2py/source/browse/gluon/utils.py?name=#76
[3] https://www.google.com/codesearch?q=web2py_uuid+package%3Ahttp%3A%2F%2Fweb2py\.googlecode\.com&origq=web2py_uuid&btnG=Search+Trunk

Massimo Di Pierro

Jun 2, 2011, 11:41:06 AM
to Craig Younkins, David Wagner, Massimo e Claudia Di Pierro, web2py-developers
I agree. We are not using a custom entropy source.

The web2py_uuid function is the same binary sequence as python uuid4. The bits are exactly the same because the algorithm is the same, except that they have a fixed permutation that is always the same on the same system and thus does not change the entropy.

Are you saying that instead of random.random() we should use random.SystemRandom()?

I have no strong objection to the change but I am not convinced it is better. While only the web2py process has access to random.random() (Mersenne Twister), other processes on the same machine have access to random.SystemRandom() and that may constitute an issue.

Massimo

On Jun 2, 2011, at 10:33 AM, Craig Younkins wrote:

Whoa whoa whoa. Do not try to write your own entropy source. The most secure solution is to punt to the OS because it has much better sources of entropy than what is available to our program. Use random.SystemRandom (not the default).

In particular, the default random module uses the Mersenne Twister, a deterministic PRNG. It's crucial to NOT use a deterministic PRNG because if an adversary gains enough information, they could predict new random numbers and guess session keys.

See http://www.pythonsecurity.org/wiki/random/ for more information.
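
A minimal sketch of what "punt to the OS" could look like, written for Python 3. The function names are hypothetical; os.urandom, random.SystemRandom and uuid.UUID are standard library calls.

```python
import os
import random
import uuid

def session_uuid():
    """Version-4 UUID string built from 16 bytes of OS randomness."""
    return str(uuid.UUID(bytes=os.urandom(16), version=4))

_sysrand = random.SystemRandom()  # draws from the OS, not Mersenne Twister

def session_uuid_sysrandom():
    """Same idea via random.SystemRandom."""
    return str(uuid.UUID(int=_sysrand.getrandbits(128), version=4))
```

Both variants leave seeding and state management to the operating system, so there is no in-process generator state for an observer to reconstruct.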

Craig Younkins

Massimo Di Pierro

Jun 2, 2011, 12:37:48 PM
to David Wagner, cyou...@gmail.com, Massimo e Claudia Di Pierro, web2py-developers
I hear you. I am happy to follow your suggestion and use /dev/urandom if present. I still need to do a fixed bit permutation that depends on the machine IP and the time when the server starts (the same permutation for all UUIDs issued by the same machine). That is because people clone the VM and the clones have the same /dev/urandom. We ran into this problem before.

So this is my proposal:

node_id = uuid.getnode()
milliseconds = int(time.time() * 1e3)
fixed = (0.5**48) * ((node_id + milliseconds) % 2**48)

def web2py_uuid():
    bytes = list(os.urandom(16))  # 16 random bytes, as a mutable list
    random.shuffle(bytes, lambda: fixed)  # always the same permutation
    return str(uuid.UUID(bytes=''.join(bytes), version=4))

What do you think?

Massimo

On Jun 2, 2011, at 11:04 AM, David Wagner wrote:

> But random.randrange() is neither random nor cryptographically
> pseudorandom. You could generate a million bytes from it, and its output
> would still be predictable. Python's random module should not be used
> for this purpose.
>
> The kind of attack would be to connect to a web2py server a bunch of
> times, observe a bunch of session cookies, which lets you see a bunch
> of consecutive outputs from random.randrange(256), and then use that
> to predict the next outputs from random.randrange. My point is that
> Python's random module is not designed to prevent this, its documentation
> specifically warns of this fact, and the PRNG it uses is known to be
> predictable if you can see outputs from it. Once you can predict its
> output, you can then predict other likely session cookies that may have
> been handed out to other users, and then try to log in as them.
>
> My recommendation: This strikes me as a medium-severity flaw that should
> be treated as a security vulnerability and should be handled through
> whatever process you have for handling security problems.
>
> How to improve it: The improvement is to use a cryptographic-strength
> PRNG. For instance, you could read 16 bytes from /dev/urandom and use
> that as the session cookie. Or you could read 16 bytes from /dev/urandom
> and use it as the seed for an internal crypto-strength PRNG, e.g.,
> AES in CTR mode or somesuch. There are many choices. Today all modern
> operating systems that I am aware of provide some way to get access to
> crypto-quality randomness, that you can use as a seed for an internal
> crypto PRNG.
>
> Here are some example resources on this:
>
> http://security.stackexchange.com/questions/2202/lessons-learned-and-misconceptions-regarding-encryption-and-cryptology/2211#2211
> http://www.cs.berkeley.edu/~daw/papers/ddj-netscape.html
>
> -- David



Massimo Di Pierro

Jun 2, 2011, 12:41:01 PM
to web2py-d...@googlegroups.com, David Wagner, cyou...@gmail.com, Massimo e Claudia Di Pierro
Correction...

> random.shuffle(bytes,lambda:fixed) # always the same

is not sufficiently sensitive to the number of milliseconds.... would still create conflicts on cloned VMs.

> --
> mail from:GoogleGroups "web2py-developers" mailing list
> make speech: web2py-d...@googlegroups.com
> unsubscribe: web2py-develop...@googlegroups.com
> details : http://groups.google.com/group/web2py-developers
> the project: http://code.google.com/p/web2py/
> official : http://www.web2py.com/

Massimo Di Pierro

Jun 2, 2011, 12:54:49 PM
to Massimo Di Pierro, web2py-d...@googlegroups.com, David Wagner, cyou...@gmail.com, Massimo e Claudia Di Pierro
Other proposal:

import time, uuid, random, os

### compute fixed signature

node_id = uuid.getnode()
milliseconds = int(time.time() * 1e3)

fixed = (node_id + milliseconds) % 2**48

def web2py_uuid():
    ## use /dev/urandom
    bytes = os.urandom(16)
    ## convert bytes to long int
    v = sum(ord(c) << (i * 16) for i, c in enumerate(bytes))
    ## binary XOR with fixed
    v = v ^ fixed
    ## convert back to bytes
    bytes = ''.join(chr((v >> (i*16)) % 256) for i in range(16))
    ## format as uuid
    return str(uuid.UUID(bytes=bytes, version=4))

Perhaps there is a more efficient way to do the XORing.
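
One possibly more compact way to do the XOR, sketched for Python 3 (where int.from_bytes is available): treat the 16 bytes as a single integer, XOR once, and let uuid.UUID build the result from the integer. The helper name xor_uuid and the function web2py_uuid_sketch are hypothetical, and this is a sketch of the same idea rather than the committed code.

```python
import os
import time
import uuid

node_id = uuid.getnode()
milliseconds = int(time.time() * 1e3)
fixed = (node_id + milliseconds) % 2**48

def xor_uuid(raw, fixed_value):
    """XOR 16 raw bytes with an integer constant; format as a UUID4 string."""
    v = int.from_bytes(raw, "big") ^ fixed_value
    return str(uuid.UUID(int=v, version=4))

def web2py_uuid_sketch():
    """Draw 16 bytes from the OS and XOR them with the startup constant."""
    return xor_uuid(os.urandom(16), fixed)
```

Note that the XOR only touches the low 48 bits of the 128-bit value, and uuid.UUID then overwrites the version and variant bits.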

Massimo Di Pierro

Jun 2, 2011, 1:33:08 PM
to Craig Younkins, web2py-d...@googlegroups.com, David Wagner, Massimo e Claudia Di Pierro
Let's also keep in mind that Windows does not have /dev/urandom and this has to work on all supported architectures. web2py does not include OpenSSL; it uses the Python ssl module.

On Jun 2, 2011, at 12:30 PM, Craig Younkins wrote:

This issue extends far beyond the use of python or any web framework. The issue is that the entropy pool is cloned with the VM. 

Trace the entropy through UUID generation. /dev/urandom is the only valid source of entropy because time is not (it's quite deterministic). No amount of bit twiddling or shuffling changes that.

The issue is the entropy pool. Administrators _must_ reseed the PRNG with a valid source of entropy. This could be clearing the pool and waiting 5 minutes (probably not ideal), or seeding it from another source. Probably the best solution is to reseed it using a long-running machine like the one used to clone the VM, or by pulling entropy from a networked device.

Here are a number of ideas: http://adrianotto.com/page/6/

Craig Younkins

Massimo Di Pierro

Jun 2, 2011, 3:17:35 PM
to David Wagner, Massimo e Claudia Di Pierro, web2py-developers, Craig Younkins
I really appreciate your help on this matter. This is a proposal which I
believe follows your advice:

import time, uuid, random, os

### compute constant ctokens


node_id = uuid.getnode()
milliseconds = int(time.time() * 1e3)

ctokens = [((node_id + milliseconds) >> ((i%6)*8)) % 256 for i in range(16)]

def web2py_uuid():
    ## use /dev/urandom
    try:
        bytes = os.urandom(16)
    except NotImplementedError:
        bytes = [chr(random.randrange(256)) for i in range(16)]
    ## xor bytes with constant ctokens
    bytes = ''.join(chr(ord(c) ^ ctokens[i]) for i, c in enumerate(bytes))
    return str(uuid.UUID(bytes=bytes, version=4))

Basically, if /dev/urandom is available, then except for this line:
    bytes = ''.join(chr(ord(c) ^ ctokens[i]) for i, c in enumerate(bytes))
this is exactly what you suggest.

That one line simply does a XOR with a constant string (ctokens) and
therefore it does not change the entropy or the predictability of the
string.
It just prevents two machines with the same /dev/urandom streams
from generating the same numbers.

Massimo


On Jun 2, 2011, at 2:05 PM, David Wagner wrote:

>> The web2py_uuid function is the same binary sequence as python uuid4.
>> The bits are exactly the same because the algorithm is the same, except
>> that they have a fixed permutation that is always the same on the same
>> system and thus does not change the entropy.
>

> I understand. However, this method is not suitable for cryptographic use,
> nor for generating session cookies, nor any of the other uses in web2py
> that require the output of web2py_uuid to be unpredictable/unguessable
> to an adversary.


>
>> Are you saying that instead of random.random() we should use
>> random.SystemRandom()?
>

> Yes.


>
>> I have no strong objection to the change but I am not convinced it is
>> better. While only the web2py process has access to random.random()
>> (Mersenne Twister) other processes on the same machine have access to
>> random.SystemRandom() and that may constitute an issue.
>

> Thanks for talking through this. Here's my take:
>
> random.random() (Mersenne Twister) is unsuitable for these purposes.
> An adversary who connects to the web server multiple times can
> obtain multiple session cookies, and thereby observe the output of
> random.random() multiple times. The adversary doesn't have to have a
> process on the same machine. An adversary who observes the output of
> random.random() can infer its internal seed, and then predict all future
> (or past) outputs of random.random(). This is bad.
>
> random.SystemRandom() is suitable for these purposes. It is
> unpredictable/unguessable. There's no issue with other processes
> on the same machine having access to random.SystemRandom(). If
> process A calls random.SystemRandom(), and process B calls
> random.SystemRandom(), they get independent pseudorandom numbers;
> even if process A is malicious and calls random.SystemRandom() many
> times, it won't help process A predict the pseudorandom numbers
> that process B sees.
>
> random.SystemRandom() is better than random.random(). Using
> random.random() for these purposes is a real flaw.
>
> You probably will need to test whether random.SystemRandom() is available
> on all platforms that web2py supports. Rather than falling back to
> random.random() if SystemRandom is not supported, I would suggest having
> web2py fail with an error (assuming it is better to not operate than to
> operate insecurely).
>
> This stuff matters. Flaws in random number generation have caused
> catastrophic security holes in many security-sensitive applications.
> It looks to me like web2py has a similar problem.
>
> -- David
>
> P.S. I see what you mean about the issue with VMs. That is indeed a
> tricky one. I would suggest that a good way to deal with this might
> be to take a cryptographic hash of 128 bits from random.SystemRandom()
> together with some other de-duplication values that you think will be
> different in different VMs (e.g., time of day).
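
A rough sketch of the hashing idea from the P.S. above, in Python 3. It hashes 16 bytes of OS randomness together with per-machine values and keeps the first 16 bytes of the digest. SHA-256, the 8-byte encodings and the function names are illustrative assumptions, not something specified in the thread.

```python
import hashlib
import os
import time
import uuid

def hashed_uuid(raw, node_id, start_ms):
    """Hash OS randomness together with per-machine values; keep 16 bytes."""
    material = raw + node_id.to_bytes(8, "big") + start_ms.to_bytes(8, "big")
    digest = hashlib.sha256(material).digest()
    return str(uuid.UUID(bytes=digest[:16], version=4))

# De-duplication values, computed once at startup.
NODE_ID = uuid.getnode()
START_MS = int(time.time() * 1e3)

def web2py_uuid_sketch():
    """Session id from fresh OS randomness plus the startup values."""
    return hashed_uuid(os.urandom(16), NODE_ID, START_MS)
```

Unlike a XOR with a constant, the hash output for one machine gives no simple relation to the output another machine would produce from the same random bytes.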

Massimo Di Pierro

Jun 2, 2011, 4:51:55 PM
to David Wagner, Massimo e Claudia Di Pierro, web2py-developers, Craig Younkins
About 2. I will commit this for now and add documentation. That should
not be the final word on the topic. If we can brainstorm a better
solution for machines that do not support os.urandom() that would be
great. Honestly I do not know how common these machines are. Mind that
we can detect this at startup and display a warning (we just have to
make sure the warning does not scare new users too much into thinking
this is a web2py weakness when all other frameworks will have the same
issue).
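
A possible shape for the startup check, sketched in Python 3. The logger name, the function names and the strict flag are assumptions for illustration; the only library behavior relied on is that os.urandom() raises NotImplementedError when no randomness source is found.

```python
import logging
import os

logger = logging.getLogger("web2py")  # logger name is an assumption

def urandom_available():
    """Return True if os.urandom() works on this platform."""
    try:
        os.urandom(1)
        return True
    except NotImplementedError:
        return False

def check_entropy_source(strict=False):
    """Warn, or fail when strict=True, if no OS randomness source exists."""
    if urandom_available():
        return True
    if strict:
        raise RuntimeError("no cryptographically secure random source found")
    logger.warning("os.urandom() is unavailable; "
                   "falling back to Python's random module")
    return False
```

The strict flag corresponds to the alternative of refusing to run rather than operating insecurely.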

I also agree with Craig that copying the entropy stream is not a good
idea but it happens. This is a problem that goes beyond web2py. People
get VM images from various places, install the software and they are
up and running. I do not know how to solve this problem but I am open
to suggestions.

massimo

> Thanks for the concrete proposal. Here is my analysis:
>
> 1. On platforms that support os.urandom() and don't have VM
> duplication issues, this is great. It is secure.
>
> 2. On platforms that don't support os.urandom(), I'd expect this to
> be insecure. I don't know how prevalent/widespread those platforms
> may be. One good test would be to ask yourself "Would I be comfortable
> writing in the documentation that web2py is likely silently insecure
> on platforms that don't support os.urandom?". If the answer is yes,
> then document it and move on. If the answer is no, then I suppose we
> could try to brainstorm about this case to see if we can do any better.
>
> 3. On platforms that support os.urandom() but where you have to worry
> about VM duplication issues, your code seems likely to avoid duplication
> of session ids.
>
> There might be a remaining risk related to cross-VM guessing of session
> ids, but I'm not sure how serious it is. It might be a non-issue in
> practice. Suppose that the same web2py-based VM image is started up
> on two machines. Suppose an attacker connects to the first machine,
> and gets a session id in a session cookie. Then there may be a risk
> that the attacker can guess a valid session id for the second machine.
> If the attacker can guess the xor of the ctoken on the first machine
> and the ctoken on the second machine, and if os.urandom() produces the
> same sequence of outputs on the two machines, then the attacker might
> be able to convert a valid session cookie from one machine into a valid
> session cookie (for a different session) on the other machine.
>
> I don't know how to evaluate the likelihood of this being a problem
> in practice. I don't know how MAC addresses are assigned on VMs or
> whether they are easily guessable. I don't know for how long os.urandom()
> produces the same stream of outputs, if you start up the same image twice.
> Guessing the xor of the times is probably easy (there are only about 1000
> possibilities for that). Nonetheless, there are so many variables that
> I don't know, that I can't say whether this is a real risk in practice
> or not. It might be purely theoretical. Or it might be something that
> an adversary could just barely exploit.
>
> (Incidentally, just changing random to random.SystemRandom() in the
> existing code has the same issue; I failed to notice it the first time
> you proposed it.)
>
> One possible stance would be to document that this is a possible risk of
> unknown probability.
>
> Another possible stance is to revise the code to eliminate this possible
> risk. One solution would be to compute a cryptographic hash (e.g.,
> SHA256, SHA1, MD5) of the concatenation of 16 bytes from os.urandom()
> and the ctoken. With that approach, this possible risk would go away.
>
> I don't know whether it's worth worrying about this possible risk.
> As a security person, I tend towards "eliminate all risks that I can't
> prove are safe", i.e., "err on the side of security", but maybe I go
> overboard sometimes.
>
>
> Overall, it's not an unreasonable design.
>
> Thanks so much for your attention to this.
>
> Regards,
> -- David
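
The cross-VM scenario described above can be simulated in a few lines of Python 3. Everything here is made up for illustration: the MAC addresses, the start times, and the assumption that both clones return the same os.urandom() bytes. UUID formatting is omitted to keep the relation visible.

```python
import hashlib

def make_ctokens(node_id, milliseconds):
    """Constant tokens as in the proposal: a 48-bit sum spread over 16 bytes."""
    return [((node_id + milliseconds) >> ((i % 6) * 8)) % 256
            for i in range(16)]

def xor_id(raw, ctokens):
    """Session id bytes under the XOR scheme."""
    return bytes(b ^ t for b, t in zip(raw, ctokens))

def hash_id(raw, ctokens):
    """Session id bytes under the hashing alternative (SHA-256 assumed)."""
    return hashlib.sha256(raw + bytes(ctokens)).digest()[:16]

# Two clones with hypothetical MAC addresses and start times.
ct_a = make_ctokens(0x001122334455, 1307040000123)
ct_b = make_ctokens(0x001122334456, 1307040000789)
shared_raw = bytes(range(16))  # stands in for identical os.urandom(16) output

# The value the attacker has to guess: xor of the two machines' ctokens.
delta = bytes(a ^ b for a, b in zip(ct_a, ct_b))

id_a = xor_id(shared_raw, ct_a)
id_b = xor_id(shared_raw, ct_b)
forged = bytes(x ^ d for x, d in zip(id_a, delta))
assert forged == id_b  # XOR scheme: an id from A converts into an id for B

# With hashing, the same trick does not carry over.
forged_hash = bytes(x ^ d for x, d in zip(hash_id(shared_raw, ct_a), delta))
assert forged_hash != hash_id(shared_raw, ct_b)
```

Whether the precondition (identical urandom streams on both clones) holds in practice is exactly the open question raised in the message above.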

Massimo Di Pierro

Jun 2, 2011, 5:08:56 PM
to web2py-d...@googlegroups.com, Craig Younkins, David Wagner, Massimo e Claudia Di Pierro
Perhaps the message should be something like:

"web2py could not find a cryptographically secure random stream on your system and will be using the default Python random number generator"

Massimo

Massimo Di Pierro

Jun 2, 2011, 9:50:29 PM6/2/11
to David Wagner, Craig Younkins, Massimo e Claudia Di Pierro, web2py-d...@googlegroups.com
That unfortunately can be misinterpreted. People will read it as a web2py-specific problem and move on to another framework that is less secure and less honest about it.

On Jun 2, 2011, at 6:13 PM, David Wagner wrote:

>> perhaps the message should be something like:
>>
>> "web2py could not find a cryptographically secure random stream on
>> your system and will be using the default Python random number
>> generator"
>
> I'm not sure if web2py users will be able to understand the implications
> of this message. I might suggest writing something that explains the
> impact on the web2py application. For instance, "web2py could not find
> a secure way to generate session cookies on your system and will fall
> back to an insecure session management method."
>
> -- David

Massimo Di Pierro

Jun 2, 2011, 11:20:56 PM6/2/11
to David Wagner, Craig Younkins, Massimo e Claudia Di Pierro, web2py-d...@googlegroups.com
So this is what we now have in trunk:

### compute constant ctokens
def initialize_urandom():
    """
    This function and web2py_uuid follow from the following discussion:
    http://groups.google.com/group/web2py-developers/browse_thread/thread/7fd5789a7da3f09

    At startup web2py computes a unique ID that identifies the machine by adding
    uuid.getnode() + int(time.time() * 1e3)

    This is a 48-bit number. It converts the number into 16 8-bit tokens.
    It uses this unique ID to initialize the entropy source ('/dev/urandom') or to seed random.

    If os.urandom() is not supported, it falls back to using random and issues a warning.
    """
    node_id = uuid.getnode()
    milliseconds = int(time.time() * 1e3)

    ctokens = [((node_id + milliseconds) >> ((i%6)*8)) % 256 for i in range(16)]

    try:
        os.urandom(1)
        if os.path.exists('/dev/urandom'):
            open('/dev/urandom','wb').write(''.join(chr(t) for t in ctokens))
    except NotImplementedError:
        random.seed(node_id + milliseconds)
        logging.warn(
"""Cryptographically secure session management is not possible on your system because
your system does not provide a cryptographically secure entropy source.
This is not specific to web2py. Consider deploying on a different Operating System.""")
    return ctokens
ctokens = initialize_urandom()

def web2py_uuid():
    """
    This function follows from the following discussion:
    http://groups.google.com/group/web2py-developers/browse_thread/thread/7fd5789a7da3f09

    It works like uuid.uuid4 except that it tries to use os.urandom() if possible
    and it XORs the output with the tokens uniquely associated with this machine.
    """
    try:
        bytes = os.urandom(16) # use /dev/urandom if possible
    except NotImplementedError:
        bytes = [chr(random.randrange(256)) for i in range(16)]
    ## xor bytes with constant ctokens
    bytes = ''.join(chr(ord(c) ^ ctokens[i]) for i,c in enumerate(bytes))
    return str(uuid.UUID(bytes=bytes, version=4))
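
The code above is Python 2 (it relies on chr/ord over byte strings). For readers on Python 3, a rough sketch of the same XOR approach might look like the following. This is an illustration only, not what is in trunk: the function names make_ctokens and uuid_from_random are made up here, the inputs are passed as parameters so the behaviour is easy to check, and the fallback to random and the write to /dev/urandom are left out.

```python
import os
import time
import uuid

def make_ctokens(node_id, milliseconds):
    """Split the low 48 bits of node_id + milliseconds into 16 byte-sized tokens."""
    seed = node_id + milliseconds
    # i % 6 cycles through the six low bytes, so the tokens repeat with period 6.
    return [(seed >> ((i % 6) * 8)) % 256 for i in range(16)]

def uuid_from_random(random_bytes, ctokens):
    """XOR 16 random bytes with the ctokens and format the result as a version 4 UUID."""
    mixed = bytes(b ^ t for b, t in zip(random_bytes, ctokens))
    return str(uuid.UUID(bytes=mixed, version=4))

# Usage: compute the tokens once at startup, then mix them into every new ID.
ctokens = make_ctokens(uuid.getnode(), int(time.time() * 1e3))
session_id = uuid_from_random(os.urandom(16), ctokens)
```

Note that because of the i % 6 indexing, only six distinct token bytes are produced, so tokens 6 to 15 repeat tokens 0 to 5.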


On Jun 2, 2011, at 9:53 PM, David Wagner wrote:

>> That unfortunately can be misinterpreted. People will read it as a
>> web2py specific problem and move on to another framework which is less
>> secure and less honest about it.
>
> OK, I'll try another possibility:
>
> "Your system does not support a cryptographically secure random
> stream. Secure session management is not possible on your system."
>
> or even
>
> "Your system does not support a cryptographically secure random
> stream. Secure session management is not possible on your system.
> This is not specific to web2py. If security is important, consider
> deploying on a different OS."
>
> My goal would be to ensure that users understand what the message means.
>
> Regards,
> -- David
