ReplacingMergeTree or JOIN


Joe DeRobertis

Oct 12, 2021, 11:46:19 AM
to ClickHouse
Hey all,

I'm starting a POC to build a data warehouse that pulls together event/alert/incident data from a variety of security products.

I've worked through creating a common schema and now I'm trying to decide how best to store this data in ClickHouse.

I am planning to flatten all the data, but also make use of the Nested data type to store several 1-to-many relationships that I need to model.

My biggest question/concern is how best to handle changes to the data. Consider an incident that comes in today and is closed by an analyst 3 days from now. I want to be able to report on the incident on that first day with a status of New, but I also want to "update" the incident when the analyst closes it.

So I was planning to use ReplacingMergeTree. However, since I only have a small number of mutable attributes AND I'm not sure how often they will be used in reporting, I'm considering splitting my data into two tables: one for immutable data and another for mutable data. The immutable table won't require any special max aggregation. The mutable table would use ReplacingMergeTree. This way I ONLY have to use the max trick to get the latest version when one of the mutable attributes is selected, but I'd then have to JOIN those two tables...
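
To make the two-table idea concrete, here is a rough sketch in plain Python (not ClickHouse code) of the logic a query would need to reproduce. The incident IDs, column names, and the integer `version` column are all made up for illustration. It assumes each change to a mutable attribute is appended as a new row with a higher version, and that the "max trick" means picking the row with the highest version per incident.

```python
from datetime import datetime

# Hypothetical immutable table: one row per incident, written once.
incidents_immutable = {
    "INC-1": {"source": "edr", "created_at": datetime(2021, 10, 12)},
    "INC-2": {"source": "siem", "created_at": datetime(2021, 10, 12)},
}

# Hypothetical mutable table: every change is appended as a new row
# (incident_id, version, status). Until old rows are merged away,
# several versions of the same incident can coexist.
incident_state_rows = [
    ("INC-1", 1, "New"),
    ("INC-2", 1, "New"),
    ("INC-1", 2, "Closed"),  # analyst closes INC-1 three days later
]


def latest_state(rows):
    """Keep only the highest-version status per incident (the "max trick")."""
    latest = {}
    for incident_id, version, status in rows:
        current = latest.get(incident_id)
        if current is None or version > current[0]:
            latest[incident_id] = (version, status)
    return {incident_id: status for incident_id, (_, status) in latest.items()}


def report(immutable, state_rows):
    """Join the immutable rows with the latest mutable state per incident."""
    status_by_id = latest_state(state_rows)
    return [
        {"incident_id": incident_id, **attrs, "status": status_by_id.get(incident_id)}
        for incident_id, attrs in sorted(immutable.items())
    ]


rows = report(incidents_immutable, incident_state_rows)
```

A report that only touches immutable columns would skip `latest_state` and the join entirely, which is the saving I'm hoping for. A report that includes `status` pays for both the deduplication and the join.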

Any thoughts on this?