Good let's go deeper. But all further minor improvements / ideas for next new MV.
1.
PARTITION BY toYYYYMMDD(day)
ORDER BY (day, resp_h, orig_h, resp_p)
your table is partitioned by day so all parts have only one/single value of a day
BTW PARTITION BY toYYYYMMDD(day) === PARTITION BY (day)
So you can use
PARTITION BY (day)
ORDER BY (resp_h, orig_h, resp_p)
it's slightly save memory because primary key is shorter (w/o day column)
and queries (day =
'2020-08-31' ) are still as fast as they could be because of partition pruning.
2.
You use implicit internal storage for MV. MV creates a table by itself and name the table as .inner.$mvname
You can create the storage explicitly
create table zeek_conn_mv_resp_h_tuples (
day Date,
orig_h String,
resp_h String,
resp_p UInt16,
proto Enum8(\'icmp\' = 1, \'tcp\' = 6, \'udp\' = 17)
conn_state Enum8(\'OTH\' = 0, \'REJ\' = 1, \'RSTO\' = 2,
\'RSTOS0\' = 3, \'RSTR\' = 4, \'RSTRH\' = 5, \'S0\' = 6, \'S1\' = 7,
\'S2\' = 8, \'S3\' = 9, \'SF\' = 10, \'SH\' = 11, \'SHR\' = 12),
conn_counts UInt64 )
SummingMergeTree()
PARTITION BY (day)
ORDER BY (resp_h, orig_h, resp_p, proto, conn_state);
and point MV to store own data in this table (in this case you can name MV as you wish and don't need to define an engine because table already exists).
CREATE MATERIALIZED VIEW zeek_conn_mv_resp_h_tuples_mv to zeek_conn_mv_resp_h_tuples
AS SELECT day, orig_h, resp_h, resp_p, proto, conn_state, count() AS conn_counts
FROM zeek_conn
GROUP BY (day, orig_h, resp_h, resp_p, proto, conn_state)
3. you can create MV with a primary key that differs from the sorting key. It also saves memory. And improves performance in some cases.
For example you never use proto, conn_state in where conditions and probably they are already unusable for index search because they behind high-cardinality columns resp_h, orig_h, resp_p
So you can create your table as
SummingMergeTree()
PARTITION BY (day)
PRIMARY KEY (resp_h, orig_h, resp_p)
ORDER BY (resp_h, orig_h, resp_p, proto, conn_state)
In this case primary_index will contain only columns (resp_h, orig_h, resp_p)
but the merge process will collapse rows over (resp_h, orig_h, resp_p, proto, conn_state) (summing result will be the same).