ClickHouse roadmap


asmun...@gmail.com

Jun 25, 2016, 6:12:02 PM
to ClickHouse
Hello,

Currently, setting up ClickHouse for a distributed environment is not easy. In order to create a distributed table, we need to create the actual table on every node in the cluster explicitly, and the table definition must be the same on all nodes. Although INSERT INTO a distributed table works, the documentation suggests that we should consider INSERTing data into the individual tables because it is more efficient and flexible. Also, ALTER TABLE on a distributed table doesn't seem to work. All these problems make it hard for us to think of ClickHouse as a distributed database, since we usually need to deal with each node in the cluster in order to perform an operation.
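To make the manual workflow above concrete, here is a minimal Python sketch that only builds the per-node statements; it does not connect to anything. The node names, the cluster name my_cluster, the table names, and the columns are all hypothetical. The DDL uses the MergeTree and Distributed engine syntax as I understand it from the documentation of the time, so treat it as an assumption to check against your version.

```python
# Hypothetical node list; in a real setup this mirrors the cluster
# definition in each server's configuration file.
NODES = ["ch-node-1", "ch-node-2", "ch-node-3"]

# The local table must be created with the same definition on every node.
LOCAL_DDL = (
    "CREATE TABLE default.hits_local "
    "(EventDate Date, UserID UInt64) "
    "ENGINE = MergeTree(EventDate, (UserID, EventDate), 8192)"
)

# The Distributed table stores no data itself; it points at the local tables.
DISTRIBUTED_DDL = (
    "CREATE TABLE default.hits_all AS default.hits_local "
    "ENGINE = Distributed(my_cluster, default, hits_local, rand())"
)


def plan_statements(nodes):
    """Return (node, statement) pairs: local tables first, then Distributed tables."""
    plan = [(node, LOCAL_DDL) for node in nodes]
    plan += [(node, DISTRIBUTED_DDL) for node in nodes]
    return plan


for node, statement in plan_statements(NODES):
    print(f"{node}: {statement}")
```

Every statement still has to be run against each node separately, which is the per-node work this post is complaining about.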

The other problem is that scaling the cluster is not easy. Adding a node to the cluster requires a configuration file change on all nodes, and the ClickHouse process needs to be restarted in order to be able to use the new node.

Also, I'm not sure how we can scale down the cluster (remove a node) without losing data. Maybe a ReplicatedMergeTree table could recover the data, but it's not clear which configuration we should use in order to create such a ClickHouse cluster.

Given that the use case for columnar databases in practice mostly involves a distributed environment and replication, I think it should be easy and straightforward to create distributed tables that have failure recovery (via replication). I understand that it may not be easy, and that you already have a ClickHouse setup at Yandex and solve these problems at the application level, but it's hard for us to do that and it's an important barrier to adopting ClickHouse.

I wonder if you have any plans to solve these problems and create a feature-complete distributed database. You may also have other priorities, such as extending the SQL syntax or improving the performance of ClickHouse (it's already great, BTW), so it would be great if you could share your roadmap for ClickHouse. I have seen that there is a "cloud databases" feature in development which might solve these problems, but I couldn't find any documentation about it.

Thanks for open-sourcing ClickHouse BTW!

man...@gmail.com

Jun 26, 2016, 2:33:10 AM
to ClickHouse
Hello.

Most of the roadmap is kept secret, even inside the company.
Only a few plans are public:

1. Cloud databases.
It is intended for managing intermediate data.
It is not a replacement for Distributed tables.

2. SQL dialect compatibility. Open-sourcing of the JDBC driver. Development of an ODBC driver.
An attempt at integration with MicroStrategy, Tableau, or Pentaho.


For ALTERing Distributed tables, there is a task to implement "Distributed DDL".
In short, it would allow writing ALTER TABLE ... CASCADE to alter the local tables across the whole cluster.
The task has no timeline.

It would be relatively easy to implement updating the cluster configuration without a restart.
We have no task for that right now.

We have never faced the need to scale down a cluster.
We recently added 'resharding' functionality that could be used for that.

It is very easy to add or remove replicas.
You could move a replica between physical servers by adding a new replica (CREATE TABLE) on the new server and removing the old one (DROP TABLE) from the old server.
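As a rough illustration of that two-step move, here is a minimal Python sketch that lists the statements in order; it does not execute them. The server names, table name, ZooKeeper path, and columns are hypothetical, and the ReplicatedMergeTree arguments are an assumption about the engine syntax. Waiting for the new replica to catch up before the DROP is also my assumption, not something stated above.

```python
# Hypothetical DDL for the replicated table. '{replica}' is assumed to be
# filled in from each server's macros configuration.
CREATE_REPLICA_DDL = (
    "CREATE TABLE default.hits_local (EventDate Date, UserID UInt64) "
    "ENGINE = ReplicatedMergeTree('/clickhouse/tables/01/hits_local', '{replica}', "
    "EventDate, (UserID, EventDate), 8192)"
)


def move_replica_plan(old_server, new_server, table="default.hits_local"):
    """Return the ordered (server, statement) steps for moving one replica."""
    return [
        # Step 1: create the table on the new server so it joins as a replica.
        (new_server, CREATE_REPLICA_DDL),
        # Step 2: drop the old replica. Assumption: run this only after the
        # new replica has finished fetching the existing data.
        (old_server, f"DROP TABLE {table}"),
    ]


for server, statement in move_replica_plan("old-host", "new-host"):
    print(f"{server}: {statement}")
```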


Currently we have no plans to implement fully automatic resharding, but we have some thoughts about it.
The most difficult part is implementing an automatic sharding strategy that is compatible with the quite complex, domain-specific sharding that we use in practice for some of our projects. If we could not test it in production on large clusters, the implementation would be of low quality.

Stepan Semiokhin

Jul 4, 2016, 9:03:01 AM
to ClickHouse
Attempt for integration with Microstrategy or Tableau or Pentaho.

It would be soooo nice to have integration with Pentaho. Is it a distant prospect, or something that we can expect to see within the next half-year?

man...@gmail.com

Jul 4, 2016, 4:20:59 PM
to ClickHouse
We are already working in this direction.
I hope it will happen within the next half-year, though I am not completely sure.

man...@gmail.com

Jul 18, 2016, 3:49:18 PM
to ClickHouse
2. SQL dialect compatibility. Open-sourcing of JDBC driver.

The JDBC driver has been open-sourced: https://github.com/yandex/clickhouse-jdbc
It is of limited use so far. For example, Pentaho will not work out of the box due to SQL dialect incompatibilities.

Max Khon

Feb 2, 2017, 1:32:34 AM
to ClickHouse
Hello!
I wonder if there is any progress with Pentaho integration.

Thanks!

Max
 