Any documents for clickhouse cluster up?


Qiang Wang

unread,
Jun 16, 2016, 8:51:38 AM6/16/16
to ClickHouse
Any example about cluster setup?

Ivan Žužak

unread,
Jun 17, 2016, 6:58:30 PM6/17/16
to ClickHouse
First of all, great kudos for open-sourcing ClickHouse.

It seems that we need to use a Distributed table in order to use the clustering feature of ClickHouse. The problem with that is that we need to write the node IPs to an XML file and restart the cluster (which basically means downtime) in order to add or remove nodes. This is a real problem for anyone who wants to use ClickHouse in production. How do you scale your cluster at Yandex? Could you please give us some hints?

If I understand correctly, it's a replacement for Kafka-style queues. We stream the data (one CREATE TABLE per second, per server) to TinyLog tables from API services and eventually move the data to MergeTree tables. That way, we can actually read real-time data from the TinyLog tables. Since they won't have tens of millions of rows, reading data from those tables won't be a problem. It actually simplifies real-time big-data workflows; the only thing we need is to handle the transition of data from TinyLog tables to MergeTree tables via a client application. This seems like a good deal, and I will try it within a week.
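The transition the message describes can be sketched in ClickHouse SQL. Table names, columns, and the MergeTree parameters below are illustrative assumptions, not taken from the thread:

```sql
-- Hypothetical schema for illustration.
-- Real-time buffer table, written to by the API services:
CREATE TABLE events_buffer
(
    EventDate Date,
    EventTime DateTime,
    UserID UInt64
) ENGINE = TinyLog;

-- Long-term storage table:
CREATE TABLE events
(
    EventDate Date,
    EventTime DateTime,
    UserID UInt64
) ENGINE = MergeTree(EventDate, (EventDate, UserID), 8192);

-- The client application periodically moves accumulated rows:
INSERT INTO events SELECT * FROM events_buffer;
```

After a successful transfer, the client application would drop or truncate the buffer table and create a fresh one for the next interval.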

man...@gmail.com

unread,
Jun 18, 2016, 12:16:57 PM6/18/16
to ClickHouse

problem with that is we need to write the node IPs to a XML file
You could use domain names instead. They are resolved just once and cached until server restart.
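A minimal sketch of what this looks like in the server configuration, using hostnames rather than IPs (the cluster and host names here are made up for illustration):

```xml
<remote_servers>
    <example_cluster>
        <shard>
            <replica>
                <host>ch-node-1.example.com</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>ch-node-2.example.com</host>
                <port>9000</port>
            </replica>
        </shard>
    </example_cluster>
</remote_servers>
```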


and restart the cluster (which basically means down-time) in order to add / remove nodes from the cluster.
We don't restart the whole cluster at once: first we restart half of the replicas, then the other half.
Since the client application tries to connect to any available node, no downtime occurs.


How do you scale your cluster in Yandex, could you please give some hints to us?
We add servers for the new shard and create tables on them. Then we update the cluster configuration on each server (we use Salt to simplify this task). Then we restart the servers, one location at a time. We also start writing data to the new shard with some weight.
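The "some weight" part corresponds to the shard's weight in the cluster configuration, which controls what fraction of newly written data a Distributed table routes to it. A hedged sketch of a newly added shard entry (names are illustrative):

```xml
<!-- New shard appended to the existing cluster definition.
     Writes are distributed proportionally to <weight>, so the
     weight can be tuned while the new shard catches up. -->
<shard>
    <weight>1</weight>
    <replica>
        <host>ch-new-node-1.example.com</host>
        <port>9000</port>
    </replica>
</shard>
```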
 
The configuration of distributed tables will be inconsistent for a short time. That is not a problem, because the new shard contains no data at that point.


If I understand correctly, it's a replacement for Kafka style queues. We stream the data (one CREATE TABLE per-second-server) to TinyLog table from API services and eventually move the data to MergeTree tables.

In many cases there is no need to replace Kafka. You have mini-batches of data anyway, and you INSERT them into MergeTree.
Using ClickHouse for intermediate data is a more complex case (although we do use it that way).
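The mini-batch pattern mentioned above can be sketched as follows; the table name, columns, and values are assumptions for illustration only:

```sql
-- Illustrative target table:
CREATE TABLE hits
(
    EventDate Date,
    EventTime DateTime,
    URL String
) ENGINE = MergeTree(EventDate, (EventDate, EventTime), 8192);

-- One INSERT per accumulated batch (e.g. once per second),
-- rather than one INSERT per row:
INSERT INTO hits (EventDate, EventTime, URL) VALUES
    ('2016-06-18', '2016-06-18 12:00:00', '/index'),
    ('2016-06-18', '2016-06-18 12:00:01', '/search');
```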