Distributed table with ReplicatedMergeTree - wrong counts


mystic m

Sep 7, 2018, 4:11:38 AM9/7/18
to ClickHouse

I have created a 5-node cluster with 3 replicas per shard.

Next I created a ReplicatedMergeTree local table on all the nodes and a Distributed table on top of these replicated tables, then inserted data into the Distributed table from a log table whose record count is 2337420.


When I run count(*) on my Distributed table, I get a total count across all the replicas of 7012260.
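Notably, the two counts differ by exactly the configured replication factor:

```sql
-- 2337420 rows in the source log table, 7012260 via the Distributed table:
SELECT 7012260 / 2337420;  -- = 3, the number of replicas per shard
```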


I have configured replication as below.

<Clickhouse_10shards_3replica>
  <shard>
    <replica>
      <host>host1</host>
      <port>9020</port>
    </replica>
    <replica>
      <host>host2</host>
      <port>9020</port>
    </replica>
    <replica>
      <host>host3</host>
      <port>9020</port>
    </replica>
  </shard>

  ...

</Clickhouse_10shards_3replica>
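For reference, when the local tables are ReplicatedMergeTree, the usual guidance is to set internal_replication to true in each shard, so the Distributed table writes to only one replica and lets ReplicatedMergeTree do the copying; with it left at the default (false), every replica receives its own write. A sketch of the same shard block with it enabled (hosts and port copied from above; whether this alone explains the count is not confirmed here):

```xml
<Clickhouse_10shards_3replica>
  <shard>
    <internal_replication>true</internal_replication>
    <replica>
      <host>host1</host>
      <port>9020</port>
    </replica>
    <replica>
      <host>host2</host>
      <port>9020</port>
    </replica>
    <replica>
      <host>host3</host>
      <port>9020</port>
    </replica>
  </shard>
</Clickhouse_10shards_3replica>
```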


The ReplicatedMergeTree DDL is as below:
create table replica_table () ENGINE = ReplicatedMergeTree('/clickhouse/Clickhouse_10shards_3replica/online/replica_table/{shard}', 'host1-01', end_date, (id), 8192)
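For comparison, the documented pattern parameterizes the replica name with the {replica} macro, so the same statement can run unchanged on every node; a sketch, assuming {shard} and {replica} are defined in each node's macros config, with the column list elided as in the statement above:

```sql
-- Sketch: {shard} and {replica} are assumed to be defined per node in macros.xml.
CREATE TABLE replica_table ()
ENGINE = ReplicatedMergeTree(
    '/clickhouse/Clickhouse_10shards_3replica/online/replica_table/{shard}',
    '{replica}',
    end_date, (id), 8192)
```

Hardcoding 'host1-01' as the replica name means every node registers under the same replica identity in ZooKeeper, which is worth double-checking.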


The Distributed table is as below:
CREATE TABLE replica_tableD AS replica_table ENGINE = Distributed(Clickhouse_10shards_3replica, online, replica_table, rand())


Let me know if I am missing something, or what is the best way to replicate a distributed table?

Denis Zhuravlev

Sep 7, 2018, 9:53:57 AM9/7/18
to ClickHouse

mystic m

Sep 10, 2018, 2:15:09 AM9/10/18
to ClickHouse
Thanks for the post, it is helpful, but it seems replication is not at a production-ready stage: if the replication strategy needs to change, rebalancing the data to the new strategy will apparently be an entirely manual process.

Denis Zhuravlev

Sep 10, 2018, 8:51:35 AM9/10/18
to ClickHouse
>replication is not at production ready stage
Of course replication is at a production-ready stage. But you've asked about an unsupported mode.
5 shards * 3 replicas = 15 nodes === supported
x shards * 3 replicas = 5 nodes  === unsupported (and it will not be supported; AFAIK it's not on the roadmap)



mystic m

Sep 11, 2018, 4:33:03 AM9/11/18
to ClickHouse
Thanks Denis, we evaluated ClickHouse for one of our use cases and found the performance to be great for the (PB-scale) volumes of data we are dealing with.
But to take the evaluation to the next step we want to add fault tolerance / availability via replication.

In our evaluation we used a Distributed view on top of MergeTree shards on a 10-node cluster. To add replication capability we started exploring ReplicatedMergeTree.
To use Distributed + ReplicatedMergeTree correctly, it looks like your blog's guideline of creating shards with different names is the way to go, since the Distributed engine doesn't use ZooKeeper to distinguish between replica shards and considers them all part of the same table. (Correct me if my understanding is wrong here.)
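A sketch of that "different names" guideline as I understand it: with circular replication, each node hosts local tables for more than one shard, and each shard's table needs its own ZooKeeper path so the replicas don't mix. Table and path names here are illustrative, not taken from the blog:

```sql
-- On one node hosting replicas of both shard 1 and shard 2 (illustrative names):
CREATE TABLE online.replica_table_s1 ()
ENGINE = ReplicatedMergeTree('/clickhouse/online/replica_table/s1', '{replica}', end_date, (id), 8192)

CREATE TABLE online.replica_table_s2 ()
ENGINE = ReplicatedMergeTree('/clickhouse/online/replica_table/s2', '{replica}', end_date, (id), 8192)
```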

With cross-segmented or circular replication, if I start with a cluster of 5 nodes and RF = 3, and some time later plan to scale it to maybe 8-10 nodes, the replica share per node may become imbalanced: old nodes already hold 2 replicas, so some may end up holding 3, or the new nodes can be configured to hold replicas of the newly added shards only.

I see one caveat with the second approach: if all the newly added nodes are in the same rack, an entire-rack failure would make our queries start failing. A very unusual scenario, but I just want to identify whether there is a better way of enhancing fault tolerance.

Denis Zhuravlev

Sep 11, 2018, 11:20:23 AM9/11/18
to ClickHouse
The only supported way:
start with 10 nodes = 5 shards * 2 replicas (covered by Distributed tables, in a cluster with internal_replication=true)
later you can add 5 nodes to go to 3 replicas.
later you can expand to more shards * 3 replicas (it will be unbalanced, but who cares).
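The starting layout above might look like this in remote_servers (a sketch; cluster and host names are illustrative):

```xml
<the_cluster>
  <shard>
    <internal_replication>true</internal_replication>
    <replica><host>node01</host><port>9000</port></replica>
    <replica><host>node02</host><port>9000</port></replica>
  </shard>
  <!-- ...four more shards like the above, covering node03..node10... -->
</the_cluster>
```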