Probably a bug for CollapsingMergeTree


Alps Wang

Jan 20, 2017, 11:50:28 AM
to ClickHouse
Hi there,

We are doing a POC with ClickHouse, using a CollapsingMergeTree table, and loaded around 3.8 billion records into one table:

 ENGINE = CollapsingMergeTree(date, (rdr, dr, oid, aid, bid, lid, date, id, rank), 8192, sign)

and according to the documentation:
When merging, each group of consecutive identical primary key values (columns for sorting data) is reduced to no more than one row with the column value 'sign_column = 1' (the "positive row").

If I understand this engine correctly: when we insert 2 records with the same primary key (index), both with sign = 1, then after running the OPTIMIZE TABLE command only the last record inserted should stay in the table, and the previous record should be deleted automatically.

But in fact we are still seeing duplicate records (with the same primary key). I know the primary key index is not UNIQUE, but with a CollapsingMergeTree table we should be able to get rid of duplicate records. Am I right? Please advise.
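For context, CollapsingMergeTree does not deduplicate on insert: rows collapse only when a background merge happens to combine the parts containing them, and only within groups of identical sorting-key values. The intended write pattern cancels an old row explicitly with sign = -1. A minimal sketch, assuming a table like the abc table from the follow-up message where keyword_name alone is the sorting key (the exact key is an assumption):

```sql
-- Replace the state for key '123' the CollapsingMergeTree way:
INSERT INTO abc VALUES ('123', '2016-01-04', 1);   -- original state
INSERT INTO abc VALUES ('123', '2016-01-04', -1);  -- cancelling row: same key, sign = -1
INSERT INTO abc VALUES ('123', '2016-01-05', 1);   -- new state
-- Until a merge actually combines these parts, SELECT can still
-- return all three rows; collapsing happens at merge time, not insert time.
```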


Thanks

Alps Wang

Jan 20, 2017, 11:54:34 AM
to ClickHouse
By the way, the CollapsingMergeTree table works well with a small table:

:) select * from abc;
┌─keyword_name─┬─ranking_date─┬─sign─┐
│ 123          │   2016-01-04 │    1 │
│ abc          │   2016-01-02 │    1 │
│ def          │   2016-01-03 │    1 │
└──────────────┴──────────────┴──────┘
┌─keyword_name─┬─ranking_date─┬─sign─┐
│ abc          │   0000-00-00 │    1 │
└──────────────┴──────────────┴──────┘

4 rows in set. Elapsed: 0.003 sec.

:)
:)  insert into abc values ('123', '2016-01-05',1);
:) select * from abc;
┌─keyword_name─┬─ranking_date─┬─sign─┐
│ 123          │   2016-01-04 │    1 │
│ abc          │   2016-01-02 │    1 │
│ def          │   2016-01-03 │    1 │
└──────────────┴──────────────┴──────┘
┌─keyword_name─┬─ranking_date─┬─sign─┐
│ abc          │   0000-00-00 │    1 │
└──────────────┴──────────────┴──────┘
┌─keyword_name─┬─ranking_date─┬─sign─┐
│ 123          │   2016-01-05 │    1 │
└──────────────┴──────────────┴──────┘

5 rows in set. Elapsed: 0.004 sec.

:) optimize table abc;
:) select * from abc;
┌─keyword_name─┬─ranking_date─┬─sign─┐
│ abc          │   0000-00-00 │    1 │
└──────────────┴──────────────┴──────┘
┌─keyword_name─┬─ranking_date─┬─sign─┐
│ 123          │   2016-01-05 │    1 │
│ abc          │   2016-01-02 │    1 │
│ def          │   2016-01-03 │    1 │
└──────────────┴──────────────┴──────┘

4 rows in set. Elapsed: 0.004 sec.


==
So we have 2 suspects right now:
1. Is the table too large (3.8 billion rows)?
2. Is the key too long?

Please help.

Thanks

man...@gmail.com

Jan 22, 2017, 9:05:44 AM
to ClickHouse
Hi.

OPTIMIZE TABLE performs only a single merging pass, and it does not always select all parts to merge.
There is a threshold for the maximum total size of merged parts; by default it is 100 GB.

To merge all parts within a partition, you can run OPTIMIZE TABLE ... PARTITION YYYYMM FINAL.

Note that OPTIMIZE ... FINAL will perform a "merge" even if the partition consists of a single part: it will rewrite that part.
(This was intended for materializing recently added columns with DEFAULT expressions.)
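For the sample data shown earlier, forcing such a merge would look something like this (a sketch; 201601 is an assumed partition ID in the old-style YYYYMM month format, matching the January 2016 rows in the abc table):

```sql
-- Merge everything in the 2016-01 partition into a single part,
-- rewriting the part even if it is already alone in the partition.
OPTIMIZE TABLE abc PARTITION 201601 FINAL;
```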

Usually you should not rely on fully merged data, as full merging is very expensive. It is better to write queries under the assumption that the data has not been fully merged.
See examples in previous topics: https://groups.google.com/forum/#!searchin/clickhouse/sum%28Sign%29|sort:relevance
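The pattern those topics describe is to aggregate with sum(sign) so that cancelled rows net out at query time, before any physical merge. A sketch against the abc table from this thread (max(ranking_date) is an assumption about which value you want per key):

```sql
SELECT
    keyword_name,
    max(ranking_date) AS ranking_date  -- assumed: latest date per key is wanted
FROM abc
GROUP BY keyword_name
HAVING sum(sign) > 0  -- keys whose +1/-1 rows fully cancel net to 0 and are dropped
```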

Alps Wang

Jan 22, 2017, 7:44:36 PM
to ClickHouse
> There is a threshold for the maximum total size of merged parts; by default it is 100 GB.
Is it configurable? Could you please let me know the setting name?

Thanks

man...@gmail.com

Jan 26, 2017, 3:05:26 PM
to ClickHouse
In config.xml, write

<merge_tree>
    <max_bytes_to_merge_at_max_space_in_pool>100000000000</max_bytes_to_merge_at_max_space_in_pool>
</merge_tree>

with the desired value (the example shows the default of 100 GB, in bytes).

Alps Wang

Jan 26, 2017, 5:18:10 PM
to ClickHouse
Thank you!