I have created an example:
insert into default.missing_custs
(startOf15Min, customerId)
values
('2019-10-15 07:30:00', 'A')
('2019-10-15 07:30:00', 'B')
('2019-10-15 07:45:00', 'A')
('2019-10-15 07:45:00', 'C')
('2019-10-15 08:00:00', 'A')
('2019-10-15 08:00:00', 'C')
('2019-10-15 08:00:00', 'D')
('2019-10-15 08:15:00', 'A')
select *
from (
select startOf15MinNew startOf15Min
, groupUniqArrayMerge(prevUniqCustomerIds) prevUniqCustomerIds
, groupUniqArrayMerge(currUniqCustomerIds) currUniqCustomerIds
, hasAll(currUniqCustomerIds, prevUniqCustomerIds) hasAllPrevInCurr
, hasAny(currUniqCustomerIds, prevUniqCustomerIds) hasAnyPrevInCurr
, arrayUniq(prevUniqCustomerIds) prevUniqCustIds
, arrayUniq(currUniqCustomerIds) currUniqCustIds
from (
SELECT startOf15Min startOf15MinNew
,groupUniqArrayState(customerId) prevUniqCustomerIds
FROM default.missing_custs
where startOf15MinDay = today()
GROUP BY startOf15Min
ORDER BY startOf15Min
)
any left join (
SELECT subtractMinutes(startOf15Min, 15) startOf15MinNew
,groupUniqArrayState(customerId) currUniqCustomerIds
FROM default.missing_custs
where startOf15MinDay = today()
GROUP BY startOf15Min
ORDER BY startOf15Min
) using startOf15MinNew
group by startOf15Min
)
order by startOf15Min

For the first row I need an info how many prevUniqCustomerIds doesn't exit in currUniqCustomerIds, which is 1 (B doesn't exists in [A, C].
I also need all elements that doesn't exist in curr, which is for the 1st record B
These two information for second row would be 0 (all previous [A, C] exit in curr [D, A, C]), element that doesn't exit is null.
For third row, from [D, A, C] only A exists in curr, so count of not existing is 2 for D and C, and elements that are missing are [D, C]
Table missing_custs is huge.
I am looking for a way to aggregate everything as fast as possible.
Regards.