CollapsingMergeTree OPTIMIZE FINAL versus select * from table final

598 views
Skip to first unread message

Ryan Pfenninger

unread,
May 16, 2018, 3:55:51 PM5/16/18
to ClickHouse
I am trying to understand what I'm seeing.  When I run:

optimize table tablename partition 'YYYYMM' final

on a collapsingmergetree table, the table still contains entries that have both +1 and -1 offsets that should have been removed from the table.

When I run a select query and use the final keyword after the table name, the rows do not appear in the query, but if i I do not use the final keyword, the only way to eliminate the rows is a group by having sum(sign) > 0.

Can anyone explain to me why the optimize final does not merge out all the offsetting records, yet the final keyword modifier in a select query knows how to do just that?

Denis Zhuravlev

unread,
May 16, 2018, 8:11:35 PM5/16/18
to ClickHouse
Optimize does not guarantee collapsing or replacing and does not work across partitions (because merge process works only with one partition).
"from final" is exactly the opposite, guaranteeing  the proper result including across partitions but works very slow.


create table test.testcp (p date, v Int32, sign Int8) Engine = CollapsingMergeTree(p, v, 8192, sign)

insert into test.testcp values ('2018-05-16', 33, 1);
insert into test.testcp values ('2018-05-01', 33, -1);
insert into test.testcp values ('2018-04-16', 34, 1);
insert into test.testcp values ('2018-05-16', 34, -1);

optimize table test.testcp partition 201805 final;
optimize table test.testcp partition 201804 final;

select * from test.testcp

2018-05-16    34    -1    <----- 201805 partition
2018-04-16    34    1     <------ 201804 partition


As far as I know for better performance sum(sign) evaluations are recommended
https://clickhouse.yandex/docs/en/table_engines/collapsingmergetree/
...write 'sum(Sign)' instead of 'count()'. To calculate the sum of something, write 'sum(Sign * x)' instead of 'sum(x)', and so on....

Ryan Pfenninger

unread,
May 16, 2018, 11:29:59 PM5/16/18
to Denis Zhuravlev, ClickHouse
Hi Denis,

Here's a slightly modified version that does not work and is similar to my table structure.  The big difference is that i have included the date field as part of the key.  I would expect all rows to be deleted but none are.

create table testcp (p date, v Int32, sign Int8) Engine = CollapsingMergeTree(p, (p, v), 8192, sign)

insert into testcp values ('2018-05-16', 33, 1);
insert into testcp values ('2018-05-16', 33, -1);
insert into testcp values ('2018-04-16', 34, 1);
insert into testcp values ('2018-04-16', 34, -1);

optimize table testcp partition 201805 final;
optimize table testcp partition 201804 final;

select * from testcp

--
You received this message because you are subscribed to a topic in the Google Groups "ClickHouse" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/clickhouse/A5WZKR1H0jU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to clickhouse+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/clickhouse/f1e2a088-d375-4af2-b7f2-14b0704e7a28%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Ryan Pfenninger

unread,
May 16, 2018, 11:47:32 PM5/16/18
to Denis Zhuravlev, ClickHouse
Actually here's a simpler version.  I'm not sure why this doesn't remove all the rows?

create table testcp (p date, v Int32, sign Int8) Engine = CollapsingMergeTree(p, v, 8192, sign)

insert into testcp values ('2018-05-16', 33, 1);
insert into testcp values ('2018-05-16', 33, -1);
insert into testcp values ('2018-04-16', 34, 1);
insert into testcp values ('2018-04-16', 34, -1);

optimize table testcp partition 201805 final;
optimize table testcp partition 201804 final;

select * from testcp
On Wed, May 16, 2018 at 8:11 PM, Denis Zhuravlev <denis.z...@gmail.com> wrote:

--
You received this message because you are subscribed to a topic in the Google Groups "ClickHouse" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/clickhouse/A5WZKR1H0jU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to clickhouse+unsubscribe@googlegroups.com.

tatiana....@revjet.com

unread,
Jun 20, 2018, 1:27:14 PM6/20/18
to ClickHouse
I think I saw somewhere in the docs that 'optimize' does not remove the last rows in partition, so that the partition would not become empty
To unsubscribe from this group and all its topics, send an email to clickhouse+...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages