Analysing sequence of events

662 views
Skip to first unread message

Maxim Fridental

unread,
Aug 29, 2016, 5:03:07 AM8/29/16
to ClickHouse
Hi there,

given CREATE TABLE Actions (ID, UserID, ActionDate, ActionType, ActionParameter), how would you count the number of actions of type 2 happened after the action of type 1 grouped by ActionParameter?

In Vertica, we're using the following query:

select ActionParameter, sum(num_action2) CountAction2 from (
select action1.id, min(ActionParameter) ActionParameter, ifnull(count(action2.id), 0) num_action2
from Actions action1
left outer join Actions action2 on action2.ActionType=2 and action1.UserID = action2.UserID and action2.ActionDate > action1.ActionDate
where action1.ActionType=1
and not exists (select 1 from Actions z where z.ActionType=1 and z.UserID = action1.UserID
 and z.ActionDate > action1.ActionDate and z.ActionDate < action2.ActionDate)
group by action1.id
) k
group by ActionParameter
order by ActionParameter;

We cannot translate this query to Clickhouse trivially, because correlated subqueries are not supported.

The goal of the query is to study the influence of ActionParameter values of action of type 1 to the number of following actions of type 2.

Thanks,
Maxim

man...@gmail.com

unread,
Aug 29, 2016, 7:46:07 PM8/29/16
to ClickHouse
Hi.

1. You may use sequenceMatch and sequenceCount functions, introduced exactly for this purpose.

2. When you need to check if there exist action of type 1 before action of type 2, without respect to number of such pairs per dimension, you may write
GROUP BY dimensions... HAVING minIf(ActionDate, ActionType = 1) <= maxIf(ActionDate, ActionType = 2)
which is more optimal, but sequences like 1 1 2 1 2 will be count only once.

Maxim Fridental

unread,
Aug 30, 2016, 3:23:39 AM8/30/16
to ClickHouse
Thanks for the answer. I've spent two hours yesterday trying to use sequenceMatch to accomplish this task, and gave up.  If the user A has the action stream 1 2 2 2 1 2 and user B has action stream 1 2 2, how would you produce the result

  A  3
  A  1
  B  2

using sequenceMatch or sequenceCount? Action type 1 is a search result page, and action type 2 are clicks in the found results, so that action 1 can be followed by any number of action 2.
 
 

man...@gmail.com

unread,
Aug 30, 2016, 8:26:59 PM8/30/16
to ClickHouse
sequenceMatch, sequenceCount functions are not suitable for interpreting sequence

1 2 2 2 1 2

as 3 and 1

(number of consecutive events of type 2 after event of type 1, for each of such patterns)

Because it only allows to match non-overlapping patterns. When start of sequence (1 2) is matched, function continue with rest of sequence (2 2 1 2).

So, this is not implemented.
Usually we use preprocessing of data by separate program and insert into table pre-calculated values.

If you look at source code of sequenceMatch function, maybe you could implement desired behaviour by adding another function.
It works as follows: all events of interested types are inserted into array, and after all, patterns are matched.
Reply all
Reply to author
Forward
0 new messages