There was a problem with uuid. People doing load balancing using
cloned machines have experienced uuid conflicts. I also discovered the
python uuid function is not thread safe.
I created a web2py_uuid which is the same as uuid4() but
1) rotates each binary char randomly based on the MAC address of the
machine and the time it is called in nanoseconds.
2) locks web2py to avoid threading conflicts with prng.
I know, it is never a good idea to create one's own uuid function but,
given the situation, I still think this is better than what we have.
Moreover, the source is identical to uuid4(); only the random number
generator takes into account MAC and time, in a way that can only
increase its period (actually makes it aperiodic).
Massimo
+
+web2py_uuid_locker = thread.allocate_lock()
+node_id = uuid.getnode()
+nanoseconds = int(time.time() * 1e9)
+
+def rotate(i):
+    a = random.randrange(256)
+    b = (node_id >> 4*i) % 256
+    c = (nanoseconds >> 4*i) % 256
+    return (a + b + c) % 256
+
+def web2py_uuid():
+    web2py_uuid_locker.acquire()
+    try:
+        bytes = [chr(rotate(i)) for i in range(16)]
+        return str(uuid.UUID(bytes=bytes, version=4))
+    finally:
+        web2py_uuid_locker.release()
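For anyone reading the patch on Python 3, here is a rough sketch of the same idea, not the code that went into web2py. The names `sketch_uuid`, `_mix`, `_lock`, `_node_id` and `_start_ns` are made up for illustration. As in the patch, the node id and the timestamp are sampled once at import, so within one process they act as fixed per-byte offsets on top of the random byte.

```python
import random
import threading
import time
import uuid

_lock = threading.Lock()
_node_id = uuid.getnode()            # hardware address, or a random fallback
_start_ns = int(time.time() * 1e9)   # sampled once at import, as in the patch


def _mix(i, rand_byte):
    """Add 8-bit windows of the node id and timestamp to one random byte."""
    b = (_node_id >> 4 * i) % 256
    c = (_start_ns >> 4 * i) % 256
    return (rand_byte + b + c) % 256


def sketch_uuid():
    """Hypothetical Python 3 port of the patch's web2py_uuid()."""
    with _lock:  # serialize access to the shared PRNG, as the patch does
        raw = bytes(_mix(i, random.randrange(256)) for i in range(16))
    # version=4 makes UUID() overwrite the version and variant bits
    return str(uuid.UUID(bytes=raw, version=4))
```

Calling `sketch_uuid()` returns a 36-character string in the usual UUID layout.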
What's the point of the "* 1e9"? It's not adding any entropy.
Massimo
> Good catch. I took that line from the actual uuid code. I think it should be 1e3.
Why not 1, and call it microseconds?
I multiplied by 1000 and called it milliseconds. Is that better?
Massimo
> time(...)
> time() -> floating point number
> Return the current time in seconds since the Epoch.
>
> I multiplied by 1000 and called it milliseconds. Is that better?
Sure. I missed the floating point part, sorry.
Maybe nanoseconds is good after all. The only problem I see with it is that different systems have different timebases available, and I wouldn't trust VMs, especially, to give us good high-res timestamps.
So we'd want to assume that even a nominally nanosecond-resolution timestamp could be identical from one call to the next. In that case, where is the entropy coming from?
> On Feb 15, 2010, at 10:21 AM, Massimo Di Pierro wrote:
>
>> I multiplied by 1000 and called it milliseconds. Is that better?
>
> So we'd want to assume that even a nominally nanosecond-resolution
> timestamp could be identical from one call to the next. In that
> case, where is the entropy coming from?
I agree. Moreover it takes longer than 1 nanosecond to serve a page.
>
> On Feb 15, 2010, at 12:32 PM, Jonathan Lundell wrote:
>
>> So we'd want to assume that even a nominally nanosecond-resolution timestamp could be identical from one call to the next. In that case, where is the entropy coming from?
>
> I agree. Moreover it takes longer than 1 nanosecond to serve a page.
It does. But just because the nominal resolution is a nanosecond (or whatever) doesn't mean that the actual resolution is that fine. I think you need to assume that it's possible to have the same timestamp from one call to the next.
Here's an example from OS X; notice the 2nd & 3rd values:
Python 2.6.4 (r264:75821M, Oct 27 2009, 19:48:32)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> time.time()
1266259647.299962
>>> time.time()
1266259650.139703
>>> def x():
...     a = time.time()
...     b = time.time()
...     c = time.time()
...     return (a,b,c)
...
>>> x()
(1266259698.8739901, 1266259698.873991, 1266259698.873991)
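To check this on another machine, something along these lines could be used. It is only a sketch: `count_repeated_timestamps` is a made-up helper, and the count it returns depends on the OS, the clock source and whether the code runs inside a VM.

```python
import time


def count_repeated_timestamps(samples=100000):
    """Count back-to-back time.time() calls that return the same value."""
    repeats = 0
    previous = time.time()
    for _ in range(samples):
        current = time.time()
        if current == previous:
            repeats += 1
        previous = current
    return repeats


if __name__ == "__main__":
    print(count_repeated_timestamps())
```

Any nonzero result means the timestamp alone cannot distinguish two consecutive calls on that system.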
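I agree, but mind that if web2py_uuid is called within the same app,
even in different threads, this conflict cannot happen because of the
role of the PRNG.
The only case would be if two machines with the same MAC address call the
web2py_uuid function while in the same state (same PRNG) less than
1 millisecond apart. I think that is practically impossible. I am not
worried.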
Massimo
> I agree but mind that if web2py_uuid is called within the same app, even if in different threads, this conflict cannot happen because of the role of the PRNG.
>
> The only case would be if two machines with the same MAC address call the web2py_uuid function while in the same state (same PRNG) less than 1 millisecond apart. I think that is practically impossible. I am not worried.
OK. I was worried that the PRNG state might not survive across exec calls. If it does, then I'm fine with it.
> On Feb 15, 2010, at 10:54 AM, Massimo Di Pierro wrote:
>
>> The only case would be if two machines with the same MAC address call the web2py_uuid function while in the same state (same PRNG) less than 1 millisecond apart. I think that is practically impossible. I am not worried.
>
> OK. I was worried that the PRNG state might not survive across exec calls. If it does, then I'm fine with it.
Also, I see that if urandom is present, we should get a good seed:
    def seed(self, a=None):
        """Initialize internal state from hashable object.
        None or no argument seeds from current time or from an operating
        system specific randomness source if available.
        If a is not None or an int or long, hash(a) is used instead.
        """
        if a is None:
            try:
                a = long(_hexlify(_urandom(16)), 16)
            except NotImplementedError:
                import time
                a = long(time.time() * 256) # use fractional seconds
        super(Random, self).seed(a)
        self.gauss_next = None
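The "same state (same PRNG)" case discussed above can be illustrated with a small Python 3 sketch. It assumes nothing about web2py itself; `make_seed` and `first_values` are made-up names. `make_seed` mirrors the fallback logic of the seed() source quoted above, and `first_values` shows that two generators started from the same seed produce the same values.

```python
import os
import random
import time


def make_seed():
    """Prefer 16 bytes from the OS; fall back to the clock if unavailable."""
    try:
        return int.from_bytes(os.urandom(16), "big")
    except NotImplementedError:
        return int(time.time() * 256)  # use fractional seconds


def first_values(seed, n=5):
    """Return the first n byte-sized draws from a PRNG started at `seed`."""
    rng = random.Random(seed)
    return [rng.randrange(256) for _ in range(n)]
```

Here `first_values(42) == first_values(42)` holds, which is the situation two machines would be in if they ever started from an identical seed. Seeds taken from `make_seed()` come from the OS randomness source when it is available.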