From everything I have read here, I see that we cannot assume event_id will be unique. How, then, can I ensure correct joins in Redshift on my custom tables, where it is suggested that I join on root_id = event_id?
An important point: the reason we have moved to generating event IDs in the trackers is so that both types of duplicate are detectable.
Dealing with natural/endogenous duplicates is not hugely difficult - a simple lookup of previously-seen event IDs will suffice. Dealing with synthetic/exogenous duplicates is much more complex - the best solution currently is, as Christophe and Grzegorz say, to use appropriate queries or de-dupe using SQL.
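To make the "simple lookup" for natural duplicates concrete, here is a minimal Python sketch. It is only an illustration under assumptions: events are plain dicts with an "event_id" key, and the function name drop_seen_event_ids is hypothetical, not part of any Snowplow component. The lookup is an in-memory set, so a real pipeline would need a persistent store instead.

```python
from typing import Iterable, Iterator


def drop_seen_event_ids(events: Iterable[dict]) -> Iterator[dict]:
    """Yield each event whose event_id has not been seen before.

    Hypothetical helper: the 'lookup' is an in-memory set of event IDs.
    """
    seen = set()
    for event in events:
        event_id = event["event_id"]
        if event_id in seen:
            # Same ID as an earlier event: treat it as a natural duplicate.
            continue
        seen.add(event_id)
        yield event


# Illustrative data only: the third event repeats the first one's ID.
events = [
    {"event_id": "a1", "page": "/home"},
    {"event_id": "b2", "page": "/pricing"},
    {"event_id": "a1", "page": "/home"},
]
deduped = list(drop_seen_event_ids(events))
```

Note that this keeps the first event for each ID and discards the rest, so it is only appropriate when events sharing an ID really are copies of the same event. It does not address the synthetic case, where the query-side handling described above is still needed.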
Note that the Elasticsearch sink for the Kinesis flow takes a "last event wins" approach to duplicates: each event is upserted into the ES index using the event_id as the key, so later dupes will overwrite earlier ones.
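The "last event wins" behaviour can be pictured with a small Python sketch. This is not the sink's actual code: a plain dict stands in for the ES store, the function name upsert_all is hypothetical, and events are assumed to be dicts with an "event_id" key.

```python
def upsert_all(events):
    """Upsert events into a dict keyed by event_id (stand-in for the ES store)."""
    index = {}
    for event in events:
        # A later event with the same event_id overwrites the earlier one.
        index[event["event_id"]] = event
    return index


# Illustrative data only: two events share the ID "a1".
events = [
    {"event_id": "a1", "page": "/home"},
    {"event_id": "b2", "page": "/pricing"},
    {"event_id": "a1", "page": "/checkout"},
]
index = upsert_all(events)
```

Here index ends up with one entry per event_id, and the entry for "a1" is the later "/checkout" event. The earlier event with that ID is gone, whether or not it was a true copy.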
Hope this helps,
Alex
Changing the UUID type won't get rid of either type of duplicate.
A