|Guaranteeing globally unique TimeUUID's in a high throughput distributed system||Josh Dzielak||3/16/13 2:24 PM|
I have a system where a client sends me arbitrary JSON events containing a timestamp at millisecond resolution. The timestamp is used to generate column names of type TimeUUIDType.
The problem I run into is this - if I client sends me 2 events with the same timestamp, the TimeUUID that gets generated for each is the same, and we get 1 insert and 1 update instead of 2 inserts. I might be running many processes (in my case Storm supervisors) on the same node, so the machine-specific part of the UUID doesn't help.
I have noticed how the Cassandra UUIDGen class lets you work around this. It has a 'createTimeSafe' method that adds extra precision to the timestamp such that you can actually get up to 10k unique UUID's for the same millisecond. That works pretty good for a single process (although it's still possible to go over 10k, it's unlikely in our actual production scenario). It does make searches at boundary conditions a little unpredictable – 'equal' may or may not work depending on whether extra ns intervals were added – but I can live with that.)
However, this still leaves vulnerability across a distributed system. If 2 events arrive in 2 processes at the exact same millisecond, one will overwrite the other. If events keep flowing to each process evenly over the course of the millisecond, we'll be left with roughly half the events we should have. To work around this, I add a distinct 'component id' to my row keys that roughly equates to a Storm worker or a JVM process I can cheaply synchronize.
The real problem is that this trick of adding ns intervals only works when you are generating timestamps from the current time (or any time that's always increasing). As I mentioned before, my client might be providing a past or future timestamp, and I have to find a way to make sure each one is unique.
For example, a client might send me 10k events with the same millisecond timestamp today, and 10k again tomorrow. Using the standard Java library stuff to generate UUID's, I'd end up with only 1 event stored, not 20,000. The warning in UUIDGen.getTimeUUIDBytes is clear about this.
Adapting the ns-adding 'trick' to this problem requires synchronized external state (i.e. storing that the current ns interval for millisecond 12330982383 is 1234, etc) - definitely a non-starter.
So, my dear, and far more seasoned Cassandra users, do you have any suggestions for me?
Should I drop TimeUUID altogether and just make column names a combination of millisecond and a big enough random part to be safe? e.g. '1363467790212-a6c334fefda'. Would I be able to run proper slice queries if I did this? What other problems might crop up? (It seems too easy :)
Or should I just create a normal random UUID for every event as the column key and create the non-unique index by time in some other way?
Would appreciate any thoughts, suggestions, and off-the-wall ideas!
PS- I assume this could be a problem in any system (not just Cassandra) where you want to use 'time' as a unique index yet might have multiple records for the same time. So any solutions from other realms could be useful too.
VP Engineering • Keen IO
Twitter • @dzello
Mobile • 773-540-5264
|Re: Guaranteeing globally unique TimeUUID's in a high throughput distributed system||Ariel Weisberg||3/16/13 2:31 PM|
This has been solved a couple of times, and always pretty much the same way. Encode the id of the worker generating the id into the timestamp, and as you mentioned, maintain a counter for each millisecond.
|Re: Guaranteeing globally unique TimeUUID's in a high throughput distributed system||Josh Dzielak||3/16/13 2:39 PM|
Thanks Ariel. That works in the case where the timestamp is always increasing, i.e. the monotonically increasing clock.
The problem for me is that the timestamps can be provided by the client, and they may be in the past or the future (I only generate the timestamp using the current time if no other timestamp was provided). So the counter method isn't guaranteed to work unless I store a separate counter for every separate millisecond I see. So k entries in a map where k is unique millisecond and v is where the counter is at for that millisecond. I feel like that could get unwieldily. And it still wouldn't get me out of the 10,000 unique events per 1 ms cap (maybe there's another way to handle that).
Will definitely check out those links though, appreciate it.
|Re: Guaranteeing globally unique TimeUUID's in a high throughput distributed system||Philip O'Toole||3/16/13 2:40 PM|
It is a mistake, IMHO, to use the timestamp contained within the event
to generate the time-based UUID. While it will work, it suffers from
exactly the problem you describe. Instead, use the clock of the host
system to generate the timestamp. In otherwords, the event timestamp
may be different from the timestamp in the UUID. In fact, it *will* be
different, if the rate gets fast enough (since the 100ns period clock
used to generate time-based UUIDs may not be fine-grained enough, and
the UUID timestamp will increase as explained by RFC4122).
|Re: Guaranteeing globally unique TimeUUID's in a high throughput distributed system||Philip O'Toole||3/16/13 2:47 PM|
On Sat, Mar 16, 2013 at 2:31 PM, Ariel Weisberg <ar...@weisberg.ws> wrote:RFC4122 describes the clock sequence field, which is 14-bits. I have
used that in the past and left the other fields as per the RFC. If you
have more than one thread assigning the UUIDs within a process, you
may also need a thread-specific number. I have achieved this in the
past as follows:
- Assign each process its own number within a, say, 6 bit space.
- Assign each UUID-generating thread within the process to its number
within the remaining 8-bit space (this restricts you to 256 of these
threads, which may or may not be an issue for you).
- Set the clock sequence to these 14 bits. Combined with the timestamp
of the host system, and the MAC address, this should give truly unique
IDs across your system.
This does require some synchronization during processes and threads at
assignment, but you should only have to take that hit at start up.
After that, each can run without regards for the others.
|Re: Guaranteeing globally unique TimeUUID's in a high throughput distributed system||Philip O'Toole||3/16/13 2:49 PM|
On Sat, Mar 16, 2013 at 2:40 PM, Philip O'Toole <phi...@loggly.com> wrote:It is important that your systems run NTP, so time doesn't go
backwards. You could end up re-using IDs.
|Re: Guaranteeing globally unique TimeUUID's in a high throughput distributed system||Josh Dzielak||3/16/13 2:50 PM|
Thanks Philip. I see where you are coming from; that'd be much simpler and avoid these bumps.
The only downside is that I'd have to separately maintain an index of event timestamps that reflected when they happened according to the client. That way when the client asks for 'events last Wednesday' I give them the right answer even if the events were recorded in Cassandra today. I think it's at least worth weighing against the other solution.
|Re: Guaranteeing globally unique TimeUUID's in a high throughput distributed system||Philip O'Toole||3/16/13 2:56 PM|
On Sat, Mar 16, 2013 at 2:50 PM, Josh Dzielak <jo...@keen.io> wrote:Way ahead of you. Use wide-rows, and use the UUID to create a
composite column key. like so:
This guarantees a unique ID for *every* event.
And use the "event_time % (some interval you choose)" as your row key
(many events will then have this as their row key). This makes it easy
to find the events within a given range by performing the modulo math
on the requested time range (you must choose the interval as part of
your design, and stick with it). You do not need a secondary index.
|Re: Guaranteeing globally unique TimeUUID's in a high throughput distributed system||Josh Dzielak||3/16/13 3:21 PM|
Ahh right on. I'm already using wide rows with a similar row key heuristic (basically YYYYMMDDHH, pulled from the event_time). So I think I'm good there but hadn't thought about using a mod instead - any in-practice advantages to that?
Excited to try composite columns for this- sounds ideal. Had a similar idea of concatenating a UUID onto the event time manually but this looks the right, non-janky way to do that.
Would you just use a type 4 UUID then, since the range slicing/querying will be on the event_time part? Or are there advantages to still using a time UUID with the thread/process uniqueness tricks you mentioned?
|Re: Guaranteeing globally unique TimeUUID's in a high throughput distributed system||Philip O'Toole||3/16/13 3:29 PM|
On Sat, Mar 16, 2013 at 3:21 PM, Josh Dzielak <jo...@keen.io> wrote:Not really, it's just they way I learned to do it.
By using a timebased UUID (i.e. type 1) in the composite key, you can
tell Cassandra to sort by the UUID part of the key, and then have your
events, in that row, sorted by the reception time of your system (be
sure to tell Cassandra it's a timebased UUID). This may or may not
matter to you, but I think it useful to know during debug that event A
was received before event B (and for some applications, it is even
more important than that). Correct sequencing is not *guaranteed*
across processes due to clock drift, but it will often be close
enough, and it will be exactly right for events received in the same
A type-4 UUID (random) will give you no such ordering within the row.