Real world use case of CollapsingMergeTree

S M

unread,

Aug 4, 2016, 1:37:21 AM8/4/16

to ClickHouse

Hello,

I fully understand purpose of aggregating, replacing, summing merge trees. I still do not understand purpose of collapsing merge tree, and how it is superior to, for example: replacing merge tree (to ensure only one row exists). Could someone please elaborate with a use case?

For example, we have table:
EventTime, session_id, page_id
user with session_id 1 visits page_id 2
we insert row into table
same session_id visits page_id 5
insert row in table

In this case, if we wanted to calculate number of unique sessions in period, it is clear how to use aggregating merge tree. If we want simple calculation of number of pages visited per session. What is purpose of collapsing merge tree?

Thank you!

man...@gmail.com

unread,

Aug 4, 2016, 9:26:03 PM8/4/16

to ClickHouse

By the way, we are using mostly MergeTree and CollapsingMergeTree, in contrary to others.

Let you have some program, that maintains state of continuously changing data,
that is stored not in ClickHouse: in memory of that program, in embedded storage, in key-value storage...
For example, that program may store and update user sessions for last 3 days.
Important, that this program did not do any SELECTs from ClickHouse, it use its own storage.

You want to do analytics on user sessions, not on last 3 days, but for all history.
For analytics, you want to store data in ClickHouse.
How to store continuously updating data in ClickHouse?

Solution is following. For each update of internal state of this program (for each update of user sessions),
program knows old state (before update, if any) and new (updated) state of changed sessions.
Let write to ClickHouse "log of updates": up to two rows for each updated session: old values and new values.
And let have additional Sign column, that will differentiate between them: -1 for old values and 1 for new values.

Easy to see, that log of updates is immutable. We also call it 'incremental log' internally.
Also, we could calculate some analytics directly from incremental log.

Example: calculate average session length and number of unique visitors for each first search phrase in session.
Let we have session with id = 123, and its first search phrase was sometimes changed from 'Google' to 'Yandex' and session length from 300 seconds to 600 seconds.
We have following content of incremental log:

Sign SessionID SearchPhrase SessionDuration UserID
1     123       Google         300         12345
-1     123       Google         300         12345
1     123       Yandex         600         12345

Calculate report with following query:

SELECT SearchPhrase, sum(SessionDuration * Sign) / sum(Sign) AS avg_duration, uniq(UserID) AS visitors FROM log GROUP BY SearchPhrase HAVING sum(Sign) > 0

With correct result:

Yandex 600 1

Some aggregates (count, sum, avg) could be always rewritten and calculated from incremental log. Some (uniq) could be calculated if objects was not deleted. Some (min, max) could not be calculated at all.

Incremental log could be stored in ordinary MergeTree tables. But it will be much larger than needed, because we could 'collapse' consecutive updates.
CollapsingMergeTree doing exactly that, saving space. Note that report result is invariant relative to collapsing.

Drawbacks of incremental log are:

1. Additional write amplification: better to use when average number of updates of each object is single digit.
2. Better not to use with long arrays, that get larger during updates, because you get O(n^2) amount of written array values.
3. Data artifacts (larger session depth for example) if first chunks of data was lost: higher dependency of cleannes of history of updates.
4. If data is inconsistent, you could get negative values of what is non-negative naturally (negative session depth is funny).

Advantages:

At least, you can do realtime analytics on constantly updating data without loss of performance.

It is very imporant that program, who updates data, need another storage to maintain such updates. It don't do any reads from ClickHouse, because it won't be viable. It only writes logs of updates to ClickHouse, and then, ClickHouse is used for analytics.

S M

unread,

Aug 8, 2016, 3:27:32 AM8/8/16

to ClickHouse

Hello,

Thank you for your detailed response.

If the state is kept outside clickhouse, why can't we just use ReplacingMergeTree with a higher version, maybe even natural versioning? This will reduce the amount of write amplification? With CollapsingMergeTree, you have to maintain "old state" + "new state"
For example:

Sign SessionID SearchPhrase SessionDuration UserID
1     123       Google         300         12345
-1     123       Google         300         12345
1     123       Yandex         600         12345

Instead, with ReplacingMergeTree:
TS SessionID SearchPhrase SessionDuration UserID

1 123 Google 300 12345

2 123 Yandex 600 12345

With appropriate keys?

I understand ReplacingMergeTree is not as optimized as CollapsingMergeTree, but it seems it would be easier to optimize ReplacingMergeTree for this use case?

Thanks!

man...@gmail.com

unread,

Aug 8, 2016, 4:51:48 AM8/8/16

to ClickHouse

With ReplacingMergeTree you cannot calculate aggregates by simply rewriting it multiplying by Sign.

(You must use either subquery with argMax for each field and GROUP BY unique key; or FINAL: both are very expensive.)

Dmitriy Gladkih

unread,

Aug 8, 2016, 9:09:42 AM8/8/16

to ClickHouse

Добрый день!

Connected to ClickHouse server version 1.1.53988.

Создал таблицу:

CREATE TABLE IF NOT EXISTS test.test0 (
sign Int8,
event_date Date DEFAULT toDate(event_time),
event_time DateTime DEFAULT now(),
session_id UInt32,
search_phrase String,
session_duration UInt32,
user_id UInt64
) ENGINE = CollapsingMergeTree(event_date, (event_date, session_id), 8192, sign)
;

Вставил данные:

INSERT INTO test.test0 (sign, session_id, search_phrase, session_duration, user_id) VALUES
( 1, 123, 'Google', 300, 12345),
(-1, 123, 'Google', 300, 12345),
( 1, 123, 'Yandex', 600, 12345)
;

Выборка несхлопнутых данных:

SELECT *
FROM test.test0

┌─sign─┬─event_date─┬──────────event_time─┬─session_id─┬─search_phrase─┬─session_duration─┬─user_id─┐
│ 1 │ 2016-08-07 │ 2016-08-07 08:21:53 │ 123 │ Google │ 300 │ 12345 │
│ -1 │ 2016-08-07 │ 2016-08-07 08:21:53 │ 123 │ Google │ 300 │ 12345 │
│ 1 │ 2016-08-07 │ 2016-08-07 08:21:53 │ 123 │ Yandex │ 600 │ 12345 │
└──────┴────────────┴─────────────────────┴────────────┴───────────────┴──────────────────┴─────────┘

Выборка схлопнутых данных:

SELECT *
FROM test.test0
FINAL

┌─sign─┬─event_date─┬──────────event_time─┬─session_id─┬─search_phrase─┬─session_duration─┬─user_id─┐
│ 1 │ 2016-08-07 │ 2016-08-07 08:21:53 │ 123 │ Google │ 300 │ 12345 │
│ 1 │ 2016-08-07 │ 2016-08-07 08:21:53 │ 123 │ Yandex │ 600 │ 12345 │
└──────┴────────────┴─────────────────────┴────────────┴───────────────┴──────────────────┴─────────┘

По документации и по примеру выше должна была остаться только последняя строка. Где моя ошибка?

--
С уважением, Дмитрий

man...@gmail.com

unread,

Aug 8, 2016, 3:19:20 PM8/8/16

to ClickHouse

Good day.

For FINAL to work, "updates" must be inserted in different parts.

INSERT INTO test.test0 (sign, session_id, search_phrase, session_duration, user_id) VALUES

( 1, 123, 'Google', 300, 12345);

INSERT INTO test.test0 (sign, session_id, search_phrase, session_duration, user_id) VALUES

(-1, 123, 'Google', 300, 12345),
( 1, 123, 'Yandex', 600, 12345);

Which is natural in real usage.

Dmitriy Gladkih

unread,

Aug 9, 2016, 5:01:17 AM8/9/16

to ClickHouse

Отдельными вставками / обновлениями работает согласно документации.

Интуитивно, да, изменения вставляются парами: старое-новое значения. Но инфраструктура нашего приложения такова, что ряд серверов регистрируют сравнительно большой поток событий, дальше события уходят пачками на ETL сервера, и ETL сервера выполняют вставку в ClickHouse также порциями по 32К записей (или с заданной периодичностью). И возможны вставки вида:

INSERT INTO test.test0 (sign, session_id, search_phrase, session_duration, user_id) VALUES

( 1, 123, 'Google', 300, 12345),

(-1, 123, 'Google', 300, 12345),
( 1, 123, 'Yandex', 600, 12345)
;

вместо:

INSERT INTO test.test0 (sign, session_id, search_phrase, session_duration, user_id) VALUES
( 1, 123, 'Google', 300, 12345)
;

INSERT INTO test.test0 (sign, session_id, search_phrase, session_duration, user_id) VALUES
(-1, 123, 'Google', 300, 12345),
( 1, 123, 'Yandex', 600, 12345)
;

Таким образом, в нашем случае ETL должны выполнять локальное схлопывание, чтобы не было более 1 пары записей с одинаковым первичным ключом + sign? Будем пробовать в памяти приложения (ETL) схлопывать записи, и только потом писать их в ClickHouse.

Спасибо за ценное замечание!

man...@gmail.com

unread,

Aug 9, 2016, 2:44:38 PM8/9/16

to ClickHouse

Таким образом, в нашем случае ETL должны выполнять локальное схлопывание, чтобы не было более 1 пары записей с одинаковым первичным ключом + sign? Будем пробовать в памяти приложения (ETL) схлопывать записи, и только потом писать их в ClickHouse.

Да.

Игорь Str

unread,

Aug 10, 2016, 9:21:17 AM8/10/16

to ClickHouse

Посоветуйте по паттерну работы со статичными и динамичными полями в CollapsingMergeTree.

Дано: таблица со 100 колонками, 80 из которых имеют статичные значения, например, referrer, url-адрес страницы, user_id , user_agent, и 20 - значения, которые могут меняться/ счетчики, например, время проведенное на сайте, максимальная величина прокрутки страницы и т.д.

В случае одной (общей с движком CollapsingMergeTree) таблицы и добавления нового (более свежего) события, требуется перечислять при вставке все колонки: и статичные, и динамичные.

Иначе, статичные значения обнуляются.

Как правильно хранить статичные и динамичные колонки: статичные в обычной MergeTree, а динамичные в CollapsingMergeTree с последующими JOIN-ами при запросах?

Или в одной (общей) таблице, но с постоянным дублированием значений всех колонок?

Какой вариант предпочитает Яндекс?

Заранее большое спасибо!

man...@gmail.com

unread,

Aug 10, 2016, 5:03:23 PM8/10/16

to ClickHouse

В одной общей таблице. Это дешевле чем JOIN.
В Яндексе именно так. В таблице сессий большинство столбцов не изменяются.

Dmitriy Gladkih

unread,

Nov 9, 2016, 6:08:53 AM11/9/16

to ClickHouse

Алексей, добрый день!

Как можно объяснить следующее поведение при вычислении среднего значения параметра в CollapsingMergeTree:

SELECT
avgIf(page_time_on_page, counter_version != '0.0.23') AS TimeOnPage_seconds_OldVersion,
avgIf(page_time_on_page, counter_version = '0.0.23') AS TimeOnPage_seconds_NewVersion
FROM test.hits
FINAL

┌─TimeOnPage_seconds_OldVersion─┬─TimeOnPage_seconds_NewVersion─┐
│ 2001.5091547059221 │ 42.375877511051506 │
└───────────────────────────────┴───────────────────────────────┘

SELECT
sumIf(page_time_on_page * sign, counter_version != '0.0.23') / sumIf(sign, counter_version != '0.0.23') AS TimeOnPage_seconds_OldVersion,
sumIf(page_time_on_page * sign, counter_version = '0.0.23') / sumIf(sign, counter_version = '0.0.23') AS TimeOnPage_seconds_NewVersion
FROM test.hits
HAVING sum(sign) > 0

┌─TimeOnPage_seconds_OldVersion─┬─TimeOnPage_seconds_NewVersion─┐
│ 2165.5359425851475 │ 42.51628765920429 │
└───────────────────────────────┴───────────────────────────────┘

Какие данные корректны?
Спасибо!

man...@gmail.com

unread,

Nov 10, 2016, 3:39:11 PM11/10/16

to ClickHouse

Приведите минимальный набор данных, на котором есть различие.

В этой теме были примеры, в каких случаях FINAL может работать не так, как использование sign в запросе. Надо сравнить.

Кирилл Агатов

unread,

Aug 3, 2017, 8:13:47 AM8/3/17

to ClickHouse

Получается, что для каждой сущности нужно в каком-то промежуточном хранилище хранить старое (вставленное в CH) и новое состояния, чтобы отправлять "обновления" сущности в таблицу с движком CollapsingMergeTree?

Reply all

Reply to author

Forward