First of all, great kudos for open-sourcing ClickHouse.
It seems that we need to use a Distributed table in order to use the clustering feature of ClickHouse. The problem with that is that we need to write the node IPs to an XML file and restart the cluster (which basically means downtime) in order to add or remove nodes from the cluster. This is a real problem for everyone who wants to use ClickHouse in production. How do you scale your cluster at Yandex? Could you please give us some hints?
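For context, here is a minimal sketch of the kind of configuration I mean — a `remote_servers` section in the server config defining the cluster topology, and a Distributed table created on top of it. The cluster, database, and table names (`my_cluster`, `default`, `hits_local`) are just placeholders for illustration:

```xml
<!-- config.xml: hypothetical two-shard cluster; editing the host list
     is the step that currently seems to require a restart -->
<remote_servers>
    <my_cluster>
        <shard>
            <replica>
                <host>10.0.0.1</host>
                <port>9000</port>
            </replica>
        </shard>
        <shard>
            <replica>
                <host>10.0.0.2</host>
                <port>9000</port>
            </replica>
        </shard>
    </my_cluster>
</remote_servers>
```

```sql
-- A Distributed table routing queries over the cluster defined above;
-- rand() is one possible sharding key
CREATE TABLE hits_all AS hits_local
ENGINE = Distributed(my_cluster, default, hits_local, rand());
```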
If I understand correctly, it's a replacement for Kafka-style queues. We stream the data (one CREATE TABLE per second per server) to TinyLog tables from the API services and eventually move the data to MergeTree tables. That way, we can actually read the real-time data from the TinyLog tables. Since they won't have tens of millions of rows, reading data from those tables won't be a problem. It actually simplifies real-time big data workflows; the only thing we need is to handle the transition of data from TinyLog tables to MergeTree tables via a client application. This seems like a good deal, and I will try to use them within a week.
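A rough sketch of the buffer pattern I have in mind, with hypothetical table names and schema (`events`, `events_buffer_*`), and using MergeTree's `ORDER BY` syntax — the exact DDL would depend on the ClickHouse version:

```sql
-- Small short-lived buffer table that API services write into;
-- TinyLog has no indexes, but scanning a small table is cheap
CREATE TABLE events_buffer_001 (ts DateTime, user_id UInt64, payload String)
ENGINE = TinyLog;

-- Long-term analytical storage
CREATE TABLE events (ts DateTime, user_id UInt64, payload String)
ENGINE = MergeTree
ORDER BY (user_id, ts);

-- The client application periodically flushes a closed buffer
-- into MergeTree and drops it
INSERT INTO events SELECT * FROM events_buffer_001;
DROP TABLE events_buffer_001;
```

Real-time reads would then combine `events` with whatever buffer tables are still open, e.g. via a UNION ALL in the query layer.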