Ingestion times went from amazing to really bad


Oscar Campos

Nov 16, 2016, 5:50:50 AM
to ClickHouse
Hi guys.

Our configuration is as follows:

4 x powerful machines (nodes) with the following topology:

1 Layer -> 2 Shards -> 2 Replicas per shard

1 x less powerful machine that ingests CSV files into the cluster every 2 minutes

3 x ZooKeeper nodes

All of these nodes are connected to each other over a 10GB link.

We ingest around 10 million rows every two minutes. When we started testing the system, ingestion times were really good, between 9 and 14 seconds. Now they are really bad, over 2 minutes, so we are building up a backlog. We have two tables with 58 fields each. Each table currently holds 25,807,220,822 records in total (counted through the distributed table; the data is of course sharded).
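To put those figures in perspective, here is a rough back-of-the-envelope sketch of the rates involved. The 130-second case is a made-up stand-in for "over 2 minutes", not a measured value.

```python
# Back-of-the-envelope ingestion rates from the figures in this post.
ROWS_PER_BATCH = 10_000_000   # approximate rows arriving per batch
BATCH_INTERVAL_S = 120        # a new batch arrives every two minutes


def rows_per_second(rows: int, seconds: float) -> float:
    """Average ingestion rate for one batch."""
    return rows / seconds


def keeps_up(ingest_seconds: float, interval_seconds: float = BATCH_INTERVAL_S) -> bool:
    """True if a batch finishes before the next one arrives (no backlog)."""
    return ingest_seconds < interval_seconds


# Minimum sustained rate needed to avoid a backlog.
required_rate = rows_per_second(ROWS_PER_BATCH, BATCH_INTERVAL_S)

# Rates observed early on, when a batch took 9 to 14 seconds.
early_best = rows_per_second(ROWS_PER_BATCH, 9)
early_worst = rows_per_second(ROWS_PER_BATCH, 14)

print(f"required: {required_rate:,.0f} rows/s")
print(f"early on: {early_worst:,.0f} to {early_best:,.0f} rows/s")
print("14 s batch keeps up:", keeps_up(14))
# 130 s is a hypothetical example of a batch taking longer than the interval.
print("130 s batch keeps up:", keeps_up(130))
```

In other words, the cluster only has to sustain a small fraction of the rate it managed at the start, which suggests the slowdown is not a raw throughput limit.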

It looks like the problem is that merges are being delayed because there are too many parts to merge. We currently have the following number of parts per node:

Layer1-shard1-replica1: 59
Layer1-shard1-replica2: 478

Layer1-shard2-replica1: 58
Layer1-shard2-replica2: 343
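For anyone wanting to collect the same numbers, counts like these can be read from the `system.parts` table on each node. Below is a minimal sketch, assuming the HTTP interface is reachable on its default port 8123 without authentication; the helper names are only illustrative and the exact query used for the figures above may have differed.

```python
import urllib.request

# Counts active data parts per table on whichever node answers the query.
ACTIVE_PARTS_QUERY = (
    "SELECT database, table, count() AS parts "
    "FROM system.parts WHERE active "
    "GROUP BY database, table ORDER BY parts DESC "
    "FORMAT TabSeparated"
)


def parse_part_counts(tsv: str) -> dict:
    """Turn 'database<TAB>table<TAB>parts' lines into {(database, table): parts}."""
    counts = {}
    for line in tsv.splitlines():
        if not line.strip():
            continue  # skip blank lines such as a trailing newline
        database, table, parts = line.split("\t")
        counts[(database, table)] = int(parts)
    return counts


def fetch_part_counts(host: str, port: int = 8123, timeout: float = 10.0) -> dict:
    """POST the query to one node's HTTP interface (assumed open, no auth)."""
    request = urllib.request.Request(
        f"http://{host}:{port}/", data=ACTIVE_PARTS_QUERY.encode("utf-8")
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_part_counts(response.read().decode("utf-8"))
```

Calling `fetch_part_counts` once per replica and comparing the results gives the per-node breakdown.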

I also see lots of warnings on three of the nodes about ZooKeeper not being able to delete some parts because of "no node".

I have mainly three questions:

1) Any idea why replica 2 in both shards has far more parts than replica 1?
2) Is there any documentation or guide on how to tune the ClickHouse server for better performance? Our machines are almost idle; for example, I have never seen CPU usage above 2% (40 cores and 256GB RAM per node).
3) How can I find out what in the system is causing this drop in ingestion performance?

Thank you.

man...@gmail.com

Dec 5, 2016, 8:22:04 PM
to ClickHouse
Hello.

There was an issue in previous versions: after a high throughput of INSERTs into non-replicated MergeTree tables, parts could go unmerged for a long time.
Now it is fixed.

S M

Dec 7, 2016, 8:26:10 PM
to ClickHouse
Hello,

Which version of ClickHouse was this issue fixed in?

man...@gmail.com

Dec 8, 2016, 5:08:40 PM
to ClickHouse
Starting from 1.1.54074 (two weeks ago).
I suggest installing the latest version: 1.1.54083.
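A quick way to check whether a given server is at or past the fixed release is to compare version strings numerically. This is only a sketch: it assumes versions are dotted integers like the ones quoted in this thread, and the function names are made up for illustration.

```python
def parse_version(text: str) -> tuple:
    """Split a dotted version such as '1.1.54074' into a tuple of ints."""
    return tuple(int(piece) for piece in text.strip().split("."))


# First release containing the merge fix, as stated in this thread.
FIXED_IN = parse_version("1.1.54074")


def has_merge_fix(installed: str) -> bool:
    """True if the installed version is the fixed release or newer."""
    return parse_version(installed) >= FIXED_IN


print(has_merge_fix("1.1.54083"))
```

Comparing tuples of integers avoids the trap of comparing the strings directly, which would sort digit by digit.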

Itai Shirav

Dec 10, 2016, 2:53:30 AM
to ClickHouse
Hi,

I think this shows there's a real need for a more organized procedure for version releases, including at least:
1. A clearly-written change log
2. An announcement in the Google Group

ClickHouse is no longer an internal project. Other people and companies are using it in production, and they need to know which version is stable, what the known bugs are, when they get fixed, whether they should upgrade, etc.

This is very important for your users, please don't neglect it.

Thanks!

Alps Wang

Dec 12, 2016, 1:49:11 PM
to ClickHouse
Do we have documentation on how to do a safe upgrade?

Thanks

man...@gmail.com

Dec 12, 2016, 4:01:52 PM
to ClickHouse
To upgrade, just install the new package and restart clickhouse-server.
To upgrade a cluster without downtime, install the new package everywhere, then restart half of the replicas, then the other half.
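As a rough illustration of the "half, then the other half" order, the sketch below splits each shard's replicas into two restart batches so that every shard keeps at least one replica running. The per-shard split is an assumption about what "half of the replicas" means here, and the host names are hypothetical.

```python
def restart_batches(cluster: dict) -> list:
    """Map of shard name -> replica hosts, split into two restart batches.

    Each shard contributes the first half of its replicas to batch one and
    the rest to batch two, so no shard is fully down at any point.
    """
    first, second = [], []
    for shard, replicas in sorted(cluster.items()):
        if len(replicas) < 2:
            raise ValueError(f"shard {shard!r} has no spare replica to stay up")
        half = len(replicas) // 2
        first.extend(replicas[:half])
        second.extend(replicas[half:])
    return [first, second]


# Hypothetical host names for a 2-shard, 2-replica layout.
cluster = {
    "shard1": ["s1-r1.example", "s1-r2.example"],
    "shard2": ["s2-r1.example", "s2-r2.example"],
}

for number, batch in enumerate(restart_batches(cluster), start=1):
    print(f"batch {number}: restart clickhouse-server on {', '.join(batch)}")
```

The second batch should only be restarted once the first batch is back up and serving.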


> 1. A clearly-written change log
> 2. An announcement in the Google Group

Difficult, but we will try to adapt to that.