clj-uuid: time-based uuid now 350% faster than java.util.UUID/randomUUID

359 views
Skip to first unread message

danl...@gmail.com

unread,
Mar 1, 2015, 7:35:16 PM3/1/15
to clo...@googlegroups.com
Ok, for anyone following my adventures optimizing clj-uuid, I've gotten another substantial win.   
Check it out:    http://danlentz.github.io/clj-uuid

#'uuid/v1:                    443 nanoseconds
#'java.util.UUID/randomUUID: 2012 nanoseconds

Also, the test suite has much greater coverage with individual tests for each v1, v,3, v4, v5 uuid version.
And, finally, there more interesting notes about some of the esoteric details of the node-id representation
and the way it is calculated to disambiguate it from any legal 802 MAC hardware address.  And more.... ;)


======================================================================

user> (criterium/bench (uuid/v1))

Evaluation count: 139356300 in 60 samples of 2322605 calls.
Execution time mean: 443.707611 ns


user> (criterium/bench (java.util.UUID/randomUUID))

Evaluation count : 30850980 in 60 samples of 514183 calls.
Execution time mean : 2.012861 µs


danl...@gmail.com

unread,
Mar 1, 2015, 10:36:22 PM3/1/15
to clo...@googlegroups.com
Dammit, I knew as soon as I went and posted something with numbers I'd find some low hanging fruit I overlooked.  So now, 10x as fast.

#'uuid/v1:                    201 nanoseconds
#'java.util.UUID/randomUUID: 2012 nanoseconds


clj-uuid> (criterium.core/bench (uuid/v1))


Evaluation count :        302673360 in 60 samples of 5044556 calls.

Execution time mean : 201.153073 ns

Colin Yates

unread,
Mar 2, 2015, 4:11:42 AM3/2/15
to clo...@googlegroups.com
Note - at least in chrome on OS X the link is broken as it terminates at the hype (http://danlentz.github.io/clj-).

Great work despite the chrome breaking project name ;).

danl...@gmail.com

unread,
Mar 2, 2015, 9:05:55 AM3/2/15
to clo...@googlegroups.com
At first I was gonna go with "Dan's Miraculous UUID-o-Matic: computes a UUID before light can travel 200 feet in a vacuum" but I got tired of typing the namespace.

Thanks for the kind words, though.
Dan

Jacob Strength

unread,
Mar 3, 2015, 2:45:38 PM3/3/15
to clo...@googlegroups.com
That is pretty amazing, I'll have to remember this library next time I need to use UUID's.
Also I think you meant 450% faster.

danl...@gmail.com

unread,
Mar 3, 2015, 3:06:35 PM3/3/15
to clo...@googlegroups.com
I invoked "engineer's privilege" to quote the number very conservatively.  In one of the later episodes, didn't Scotty confide that the Enterprise actually went up to warp 11, he just never told anyone? :)

Actually, you are right -- I noticed it after I hit "send".  As you can tell, the relative performance number has been a little bit of a moving target as things improved quickly and I grabbed the number from a little too far back in the REPL. 

Thanks Jacob.  I really do hope this is a library that people will find modestly useful.

danl...@gmail.com

unread,
Mar 3, 2015, 3:08:47 PM3/3/15
to clo...@googlegroups.com
PS.  We are now TEN TIMES faster, so it is a lot easier to compute that percentage:

#'uuid/v1:                    201 nanoseconds
#'java.util.UUID/randomUUID: 2012 nanoseconds


Best,
Dan

Lucas Bradstreet

unread,
Mar 3, 2015, 3:11:09 PM3/3/15
to clo...@googlegroups.com
Hi,

Nice work!

I wanted to clarify something: it seems to me that the v1 uuids clj-uuid generate are not exactly equivalent or comparable to the uuids generated by java.util/randomUUID? It appears that V1 uuids are time (and MAC) based and don't use a cryptographic random number generator, so the use cases are different and the speed difference seems like more of a trade off. 

If I'm completely off base then please just say so. 

Thanks for the good work. Even with the above limitations I can see myself using the faster version in the future. 

Lucas
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Colin Yates

unread,
Mar 3, 2015, 3:11:23 PM3/3/15
to clo...@googlegroups.com

Are v1 as "unique" as randomUUID()?

Colin Yates

unread,
Mar 3, 2015, 3:13:24 PM3/3/15
to clo...@googlegroups.com

Ha - the irony of you and I posting a message about uniqueness at pretty much the same time :).

Lucas Bradstreet

unread,
Mar 3, 2015, 3:24:42 PM3/3/15
to clo...@googlegroups.com
I've been thinking about this for a bit and our posts just happen to be within a minute of each other! A hash bucket is filling up somewhere, I just know it!

danl...@gmail.com

unread,
Mar 3, 2015, 3:27:37 PM3/3/15
to clo...@googlegroups.com
v1 UUID's are deterministically unique across both space and time.   Random UUID's are random.  There is an (infinitesimally small chance of a duplicate).   To my knowledge, the reason random uuid's exist and are part of the RFC is that they are bone simple to implement, whereas v1 and v3/v5 involve all kinds of non-obvious supporting facilities such as a monotonic clock or message-digest algorithm.  

V1 UUID's are uniquified in three different ways: a unique 6 byte node id is computed (the details are interesting, but refer to the source for more).  It is combined with a thread-safe, concurrent monotonic timestamp that is guaranteed to generate always increasing values regardless of clock precision or degree of concurrency.  Finally, a 2-byte "clock-seq" value is randomly seeded at load-time and incorporated into the calculation.  So, even if the system is restarted on the exact same node with the hardware clock for some reason adjusted backwards and it is attempted to generate another UUID at the exact "same" millisecond as one which had been generated prior, there is only a 1 in 65536 chance of issuing a duplicate.  

We do not use the MAC address actually.  clj-uuid/node.clj at master · danlentz/clj-uuid has more detail, but this topic is specifically addressed by the RFC.  the following excerpt describes the node address calculation:

;;   "A better solution is to obtain a 47-bit cryptographic quality random
;;   number and use it as the low 47-bits of the Node-ID, with the least
;;   significant bit of the first octet of the Node-ID set to one.  This
;;   bit is the unicast/multicast bit, which will never be set in IEEE 802
;;   addresses obtained from network cards.  Hence, there can never be a
;;   conflict between UUID's generated by machines with and without network
;;   cards."
;;    . . .
;;    . . .
;;   "In addition, items such as the computer's name and the name of the
;;   operating system, while not strictly speaking random, will help
;;   differentiate the results from those obtained by other systems...
;;   ... A generic approach... IS TO ACCUMULATE AS MANY SOURCES AS POSSIBLE
;;   INTO A BUFFER, USE A MESSAGE DIGEST SUCH AS MD5 OR SHA1, TAKE AN
;;   ARBITRARY 6 BYTES FROM THE HASH VALUE, AND SET THE MULTICAST BIT
;;   AS DESCRIBED ABOVE."
;;
;;     -- [RFC4122:4.5 "Node IDs that do not Identify the Host"]

If there are any good reasons why one should pay 10 times the cost to generate a UUID randomly, I'd like to hear them.

Colin Yates

unread,
Mar 3, 2015, 3:39:34 PM3/3/15
to clo...@googlegroups.com

Great - thanks Dan.

danl...@gmail.com

unread,
Mar 3, 2015, 10:32:32 PM3/3/15
to clo...@googlegroups.com
Lucas, if there are any good reasons why one should pay 10 times the cost to generate a UUID randomly, I'd like to hear them.

Thank you very much for your kind words and please see my prior reply to Colin.

Best,
Dan

Lucas Bradstreet

unread,
Mar 4, 2015, 12:33:34 AM3/4/15
to clo...@googlegroups.com
Hi Dan,

The hash bucket sentence was only a joke about us posting the same
question at the same time :).

I don't have any particular thoughts on whether there's a reason to
use the truly random UUIDs over v1 UUIDs, I just wanted to check
whether the methods were equivalent so I know if I should evaluate
whether v1 UUIDs are appropriate for my use cases. I guess one reason
you might use randomUUID is if you need your UUID to be unguessable to
attackers that might know your MAC address.

Thanks for your reply, it was illuminating.

Lucas

danl...@gmail.com

unread,
Mar 4, 2015, 7:07:53 AM3/4/15
to clo...@googlegroups.com
My pleasure! Just to be clear, clj-uuid does NOT use the hardware MAC address and does NOT pose any security concern.

Dan Lentz

unread,
Mar 4, 2015, 8:30:34 AM3/4/15
to clo...@googlegroups.com
It is a fair point about guessability, though, in the sense that you might be able to mount a brute force attack to guess a time based UUID, but it would not be easy.  You would need to guess an effectively random 59 bit number (the low order bits) and then step through all possible time stamps millisecond by millisecond (the high order bits).  If you had some means to know the specific range of time to search (and a very fast test to determine when you had guessed correctly) you would certainly improve your odds vs guessing a purely random UUID.  

Best,
Dan


On Wednesday, March 4, 2015, danl...@gmail.com <danl...@gmail.com> wrote:
My pleasure!  Just to be clear, clj-uuid does NOT use the hardware MAC address and does NOT pose any security concern.

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to a topic in the Google Groups "Clojure" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/clojure/D8l-qRqCyOw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to clojure+u...@googlegroups.com.

danl...@gmail.com

unread,
Mar 4, 2015, 8:40:46 AM3/4/15
to clo...@googlegroups.com
Also, if someone were given another time based UUID to use as a basis of comparison, they could eliminate 47 more bits of randomness to guess at. So, I think you make a good point that I think will be worthwhile to mention in my documentation. Thank you.

Lucas Bradstreet

unread,
Mar 4, 2015, 11:17:30 AM3/4/15
to clo...@googlegroups.com
Thanks for the extra analysis. My feeling was that it would be possible, but I wasn't sure.

Luckily my current use cases don't depend on keeping UUIDs secret, but I was still wondering if there was a trade off. A mention in the docs seems worthwhile.

Cheers

> On 4 Mar 2015, at 21:40, "danl...@gmail.com" <danl...@gmail.com> wrote:
>
> Also, if someone were given another time based UUID to use as a basis of comparison, they could eliminate 47 more bits of randomness to guess at. So, I think you make a good point that I think will be worthwhile to mention in my documentation. Thank you.
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.

danl...@gmail.com

unread,
Mar 4, 2015, 12:28:33 PM3/4/15
to clo...@googlegroups.com
Yep.  I can't overstate how useful feedback from this group has been to find new ways to optimize the code and to try to make the documentation more clear and useful.
Reply all
Reply to author
Forward
0 new messages