Using Clickhouse on Cluster


Raj Malhotra

May 28, 2017, 3:02:47 PM
to ClickHouse
Hi everyone,

  I have just begun using ClickHouse on a cluster and have a few questions.
  I have 2 shards and two replicas, with my config file as follows:

<remote_servers>
    <HouseCluster>
        <shard>
            <!-- Optional. Shard weight when writing data. By default, 1. -->
            <weight>1</weight>
            <!-- Optional. Whether to write data to just one of the replicas. By default, false - write data to all of the replicas. -->
            <internal_replication>false</internal_replication>
            <replica>
                <host>server1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>server2</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>server3</host>
                <port>9000</port>
            </replica>
        </shard>
        <shard>
            <!-- Optional. Shard weight when writing data. By default, 1. -->
            <weight>1</weight>
            <!-- Optional. Whether to write data to just one of the replicas. By default, false - write data to all of the replicas. -->
            <internal_replication>false</internal_replication>
            <replica>
                <host>server1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>server2</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>server3</host>
                <port>9000</port>
            </replica>
        </shard>
    </HouseCluster>
</remote_servers>


To understand how ClickHouse works on a cluster, I created an identical table on each server and a Distributed table on one of the servers. However, when I insert a record into the Distributed table, two copies of it show up in the Distributed table instead of one. Why is that?
I am also confused about replicas: if I want to use ZooKeeper, is it necessary to define macros for each server? In my case, every server holds a replica of each shard.
What would my macros look like in that case, given that each server belongs to two shards?
Thanks.

Raj Malhotra

May 29, 2017, 5:39:17 PM
to ClickHouse
I found out what the problem was. When a record is inserted into a Distributed table, one shard is chosen based on the sharding key given when the table was created.
Because internal_replication is set to false, the record is then written to every replica in that shard, so each record goes to all three servers: server1, server2 and server3.

When we query, ClickHouse reads from both shards, picking one replica from each. But here both shards hold exactly the same data, so every record is returned twice.
Moral: do not put the same server in two shards, or the same records can come back multiple times when querying (if that is not a problem for your use case, then it's fine).
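
A minimal sketch of a layout that follows this rule, for illustration only. It assumes a hypothetical fourth host, server4, which is not part of my setup, so that each of the two shards gets its own pair of replicas and no server appears in more than one shard:

```xml
<remote_servers>
    <HouseCluster>
        <shard>
            <weight>1</weight>
            <internal_replication>false</internal_replication>
            <!-- Shard 1 lives only on server1 and server2. -->
            <replica>
                <host>server1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>server2</host>
                <port>9000</port>
            </replica>
        </shard>
        <shard>
            <weight>1</weight>
            <internal_replication>false</internal_replication>
            <!-- Shard 2 lives only on server3 and server4 (server4 is hypothetical). -->
            <replica>
                <host>server3</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>server4</host>
                <port>9000</port>
            </replica>
        </shard>
    </HouseCluster>
</remote_servers>
```

With a layout like this, each server belongs to exactly one shard, so the per-server macros for ZooKeeper-backed Replicated tables can stay simple. The fragment below is what server1 might carry; the values 01 and server1 are illustrative names I picked, and they are substituted for {shard} and {replica} in the table's ZooKeeper path and replica name. If the local tables are Replicated ones, internal_replication would normally be set to true so that the Distributed table writes to one replica and replication handles the rest.

```xml
<macros>
    <!-- Illustrative values for server1; each server gets its own shard/replica pair. -->
    <shard>01</shard>
    <replica>server1</replica>
</macros>
```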
Thanks 

Raj

sunil sunny

Aug 8, 2018, 1:24:44 AM
to ClickHouse
Hi Raj,

I am facing the same problem. How can I solve the duplicates issue described above? Can you please help me?

