Hello,
We are using clickhouse for some months now, not yet in production though as we are still in validation stage.
We were almost good to go until few days ago where we faced a very wierd issue...
We have a clickhouse clusters composed of 7 nodes. When we insert data on a specific node and only this single one x.x.x.12 it says ok but no data can be retrieved from the cluster nor this node. We agree with eventual consistency but we are many days after the insert.
Furthermore, we have many tables and this problem occurs only on a few of them on this specific node.
We may destroy this node and create a new empty one then it will be replicated but we are not sure it will save the day nor it is the best solution.
We would greatly appreciate that clickhouse returns some failure to the client application or having a way to detect such issue cluster-wise and above all, a way to fix it as the cluster is now unusable.
Here is an example insert and select queries. Our setup is described below then followed by clickhouse server logs that might be of interest.
We use Distributed on top of ReplicatedMergeTree engines. We have 7 replicas each containing 3 nodes. We roll over instances in the cluster to create replica sets (ie: 123,234,345,...).
# docker run -it yandex/clickhouse-client --host x.x.x.12
ClickHouse client version 1.1.54236.
Connecting to x.x.x.12:9000.
Connected to ClickHouse server version 1.1.54236.
:) INSERT INTO mtm_trade_count_global VALUES('7bf1b76d-43c8-4bfb-bf00-784835a6c464', -9099749510027196988, '2017-11-06');
INSERT INTO mtm_trade_count_global VALUES
Ok.
1 rows in set. Elapsed: 0.006 sec.
:) select * from mtm_trade_count_global where job_id = '7bf1b76d-43c8-4bfb-bf00-784835a6c464' and scenario = -9099749510027196988;
SELECT *
FROM mtm_trade_count_global
WHERE (job_id = '7bf1b76d-43c8-4bfb-bf00-784835a6c464') AND (scenario = -9099749510027196988)
Ok.
0 rows in set. Elapsed: 0.013 sec. Processed 217.96 thousand rows, 3.40 MB (16.77 million rows/s., 261.52 MB/s.)
Configuration of one of the table:
#!/bin/bash
echo "Creation of the Replicated tables"
echo "Server 1 : x.x.x..10"
echo "CREATE DATABASE IF NOT EXISTS replica_1" | curl -XPOST
http://x.x.x..10:8123/ -d @-
echo "CREATE TABLE IF NOT EXISTS
replica_1.mtm_trade_count(job_id String, scenario Int64, date Date)
ENGINE = ReplicatedMergeTree(
'/clickhouse/tables/01/mtm_trade_count',
'node1',
date,
job_id,
8192
)" | curl -XPOST '
http://x.x.x..10:8123/' -d @-
echo "CREATE DATABASE IF NOT EXISTS replica_7" | curl -XPOST
http://x.x.x..10:8123/ -d @-
echo "CREATE TABLE IF NOT EXISTS
replica_7.mtm_trade_count(job_id String, scenario Int64, date Date)
ENGINE = ReplicatedMergeTree(
'/clickhouse/tables/07/mtm_trade_count',
'node1',
date,
job_id,
8192
)" | curl -XPOST '
http://x.x.x..10:8123/' -d @-
echo "CREATE DATABASE IF NOT EXISTS replica_6" | curl -XPOST
http://x.x.x..10:8123/ -d @-
echo "CREATE TABLE IF NOT EXISTS
replica_6.mtm_trade_count(job_id String, scenario Int64, date Date)
ENGINE = ReplicatedMergeTree(
'/clickhouse/tables/06/mtm_trade_count',
'node1',
date,
job_id,
8192
)" | curl -XPOST '
http://x.x.x..10:8123/' -d @-
echo "Server 2 : x.x.x..11"
echo "CREATE DATABASE IF NOT EXISTS replica_2" | curl -XPOST
http://x.x.x..11:8123/ -d @-
echo "CREATE TABLE IF NOT EXISTS
replica_2.mtm_trade_count(job_id String, scenario Int64, date Date)
ENGINE = ReplicatedMergeTree(
'/clickhouse/tables/02/mtm_trade_count',
'node2',
date,
job_id,
8192
)" | curl -XPOST '
http://x.x.x..11:8123/' -d @-
echo "CREATE DATABASE IF NOT EXISTS replica_1" | curl -XPOST
http://x.x.x..11:8123/ -d @-
echo "CREATE TABLE IF NOT EXISTS
replica_1.mtm_trade_count(job_id String, scenario Int64, date Date)
ENGINE = ReplicatedMergeTree(
'/clickhouse/tables/01/mtm_trade_count',
'node2',
date,
job_id,
8192
)" | curl -XPOST '
http://x.x.x..11:8123/' -d @-
echo "CREATE DATABASE IF NOT EXISTS replica_7" | curl -XPOST
http://x.x.x..11:8123/ -d @-
echo "CREATE TABLE IF NOT EXISTS
replica_7.mtm_trade_count(job_id String, scenario Int64, date Date)
ENGINE = ReplicatedMergeTree(
'/clickhouse/tables/07/mtm_trade_count',
'node2',
date,
job_id,
8192
)" | curl -XPOST '
http://x.x.x..11:8123/' -d @-
echo "Server 3 : x.x.x..12"
echo "CREATE DATABASE IF NOT EXISTS replica_3" | curl -XPOST
http://x.x.x..12:8123/ -d @-
echo "CREATE TABLE IF NOT EXISTS
replica_3.mtm_trade_count(job_id String, scenario Int64, date Date)
ENGINE = ReplicatedMergeTree(
'/clickhouse/tables/03/mtm_trade_count',
'node3',
date,
job_id,
8192
)" | curl -XPOST '
http://x.x.x..12:8123/' -d @-
echo "CREATE DATABASE IF NOT EXISTS replica_2" | curl -XPOST
http://x.x.x..12:8123/ -d @-
echo "CREATE TABLE IF NOT EXISTS
replica_2.mtm_trade_count(job_id String, scenario Int64, date Date)
ENGINE = ReplicatedMergeTree(
'/clickhouse/tables/02/mtm_trade_count',
'node3',
date,
job_id,
8192
)" | curl -XPOST '
http://x.x.x..12:8123/' -d @-
echo "CREATE DATABASE IF NOT EXISTS replica_1" | curl -XPOST
http://x.x.x..12:8123/ -d @-
echo "CREATE TABLE IF NOT EXISTS
replica_1.mtm_trade_count(job_id String, scenario Int64, date Date)
ENGINE = ReplicatedMergeTree(
'/clickhouse/tables/01/mtm_trade_count',
'node3',
date,
job_id,
8192
)" | curl -XPOST '
http://x.x.x..12:8123/' -d @-
echo "Server 4 : x.x.x..13"
echo "CREATE DATABASE IF NOT EXISTS replica_4" | curl -XPOST
http://x.x.x..13:8123/ -d @-
echo "CREATE TABLE IF NOT EXISTS
replica_4.mtm_trade_count(job_id String, scenario Int64, date Date)
ENGINE = ReplicatedMergeTree(
'/clickhouse/tables/04/mtm_trade_count',
'node4',
date,
job_id,
8192
)" | curl -XPOST '
http://x.x.x..13:8123/' -d @-
echo "CREATE DATABASE IF NOT EXISTS replica_3" | curl -XPOST
http://x.x.x..13:8123/ -d @-
echo "CREATE TABLE IF NOT EXISTS
replica_3.mtm_trade_count(job_id String, scenario Int64, date Date)
ENGINE = ReplicatedMergeTree(
'/clickhouse/tables/03/mtm_trade_count',
'node4',
date,
job_id,
8192
)" | curl -XPOST '
http://x.x.x..13:8123/' -d @-
echo "CREATE DATABASE IF NOT EXISTS replica_2" | curl -XPOST
http://x.x.x..13:8123/ -d @-
echo "CREATE TABLE IF NOT EXISTS
replica_2.mtm_trade_count(job_id String, scenario Int64, date Date)
ENGINE = ReplicatedMergeTree(
'/clickhouse/tables/02/mtm_trade_count',
'node4',
date,
job_id,
8192
)" | curl -XPOST '
http://x.x.x..13:8123/' -d @-
echo "Server 5 : x.x.x..14"
echo "CREATE DATABASE IF NOT EXISTS replica_5" | curl -XPOST
http://x.x.x..14:8123/ -d @-
echo "CREATE TABLE IF NOT EXISTS
replica_5.mtm_trade_count(job_id String, scenario Int64, date Date)
ENGINE = ReplicatedMergeTree(
'/clickhouse/tables/05/mtm_trade_count',
'node5',
date,
job_id,
8192
)" | curl -XPOST '
http://x.x.x..14:8123/' -d @-
echo "CREATE DATABASE IF NOT EXISTS replica_4" | curl -XPOST
http://x.x.x..14:8123/ -d @-
echo "CREATE TABLE IF NOT EXISTS
replica_4.mtm_trade_count(job_id String, scenario Int64, date Date)
ENGINE = ReplicatedMergeTree(
'/clickhouse/tables/04/mtm_trade_count',
'node5',
date,
job_id,
8192
)" | curl -XPOST '
http://x.x.x..14:8123/' -d @-
echo "CREATE DATABASE IF NOT EXISTS replica_3" | curl -XPOST
http://x.x.x..14:8123/ -d @-
echo "CREATE TABLE IF NOT EXISTS
replica_3.mtm_trade_count(job_id String, scenario Int64, date Date)
ENGINE = ReplicatedMergeTree(
'/clickhouse/tables/03/mtm_trade_count',
'node5',
date,
job_id,
8192
)" | curl -XPOST '
http://x.x.x..14:8123/' -d @-
echo "Server 6 : x.x.x..15"
echo "CREATE DATABASE IF NOT EXISTS replica_6" | curl -XPOST
http://x.x.x..15:8123/ -d @-
echo "CREATE TABLE IF NOT EXISTS
replica_6.mtm_trade_count(job_id String, scenario Int64, date Date)
ENGINE = ReplicatedMergeTree(
'/clickhouse/tables/06/mtm_trade_count',
'node6',
date,
job_id,
8192
)" | curl -XPOST '
http://x.x.x..15:8123/' -d @-
echo "CREATE DATABASE IF NOT EXISTS replica_5" | curl -XPOST
http://x.x.x..15:8123/ -d @-
echo "CREATE TABLE IF NOT EXISTS
replica_5.mtm_trade_count(job_id String, scenario Int64, date Date)
ENGINE = ReplicatedMergeTree(
'/clickhouse/tables/05/mtm_trade_count',
'node6',
date,
job_id,
8192
)" | curl -XPOST '
http://x.x.x..15:8123/' -d @-
echo "CREATE DATABASE IF NOT EXISTS replica_4" | curl -XPOST
http://x.x.x..15:8123/ -d @-
echo "CREATE TABLE IF NOT EXISTS
replica_4.mtm_trade_count(job_id String, scenario Int64, date Date)
ENGINE = ReplicatedMergeTree(
'/clickhouse/tables/04/mtm_trade_count',
'node6',
date,
job_id,
8192
)" | curl -XPOST '
http://x.x.x..15:8123/' -d @-
echo "Server 7 : x.x.x..17"
echo "CREATE DATABASE IF NOT EXISTS replica_7" | curl -XPOST
http://x.x.x..17:8123/ -d @-
echo "CREATE TABLE IF NOT EXISTS
replica_7.mtm_trade_count(job_id String, scenario Int64, date Date)
ENGINE = ReplicatedMergeTree(
'/clickhouse/tables/07/mtm_trade_count',
'node7',
date,
job_id,
8192
)" | curl -XPOST '
http://x.x.x..17:8123/' -d @-
echo "CREATE DATABASE IF NOT EXISTS replica_6" | curl -XPOST
http://x.x.x..17:8123/ -d @-
echo "CREATE TABLE IF NOT EXISTS
replica_6.mtm_trade_count(job_id String, scenario Int64, date Date)
ENGINE = ReplicatedMergeTree(
'/clickhouse/tables/06/mtm_trade_count',
'node7',
date,
job_id,
8192
)" | curl -XPOST '
http://x.x.x..17:8123/' -d @-
echo "CREATE DATABASE IF NOT EXISTS replica_5" | curl -XPOST
http://x.x.x..17:8123/ -d @-
echo "CREATE TABLE IF NOT EXISTS
replica_5.mtm_trade_count(job_id String, scenario Int64, date Date)
ENGINE = ReplicatedMergeTree(
'/clickhouse/tables/05/mtm_trade_count',
'node7',
date,
job_id,
8192
)" | curl -XPOST '
http://x.x.x..17:8123/' -d @-
echo "Creation of the Distributed tables"
echo "Server 1 : x.x.x..10"
echo "CREATE TABLE IF NOT EXISTS mtm_trade_count_global(job_id String, scenario Int64, date Date)
ENGINE =
Distributed(
clickHouseCluster,
'',
mtm_trade_count,
intHash64(scenario)
)" | curl -XPOST '
http://x.x.x..10:8123/' -d @-
echo "Server 2 : x.x.x..11"
echo "CREATE TABLE IF NOT EXISTS mtm_trade_count_global(job_id String, scenario Int64, date Date)
ENGINE =
Distributed(
clickHouseCluster,
'',
mtm_trade_count,
intHash64(scenario)
)" | curl -XPOST '
http://x.x.x..11:8123/' -d @-
echo "Server 3 : x.x.x..12"
echo "CREATE TABLE IF NOT EXISTS mtm_trade_count_global(job_id String, scenario Int64, date Date)
ENGINE =
Distributed(
clickHouseCluster,
'',
mtm_trade_count,
intHash64(scenario)
)" | curl -XPOST '
http://x.x.x..12:8123/' -d @-
echo "Server 4 : x.x.x..13"
echo "CREATE TABLE IF NOT EXISTS mtm_trade_count_global(job_id String, scenario Int64, date Date)
ENGINE =
Distributed(
clickHouseCluster,
'',
mtm_trade_count,
intHash64(scenario)
)" | curl -XPOST '
http://x.x.x..13:8123/' -d @-
echo "Server 5 : x.x.x..14"
echo "CREATE TABLE IF NOT EXISTS mtm_trade_count_global(job_id String, scenario Int64, date Date)
ENGINE =
Distributed(
clickHouseCluster,
'',
mtm_trade_count,
intHash64(scenario)
)" | curl -XPOST '
http://x.x.x..14:8123/' -d @-
echo "Server 6 : x.x.x..15"
echo "CREATE TABLE IF NOT EXISTS mtm_trade_count_global(job_id String, scenario Int64, date Date)
ENGINE =
Distributed(
clickHouseCluster,
'',
mtm_trade_count,
intHash64(scenario)
)" | curl -XPOST '
http://x.x.x..15:8123/' -d @-
echo "Server 7 : x.x.x..17"
echo "CREATE TABLE IF NOT EXISTS mtm_trade_count_global(job_id String, scenario Int64, date Date)
ENGINE =
Distributed(
clickHouseCluster,
'',
mtm_trade_count,
intHash64(scenario)
)" | curl -XPOST '
http://x.x.x..17:8123/' -d @-
Certainly some related logs in clickhouse:
We see these error logs. Notice we did not change any rules in the network and other nodes seem to work fine.
2017.11.06 21:13:56.462541 [ 10153 ] <Error> executeQuery: Code: 209, e.displayText() = DB::NetException: Timeout exceeded while reading from socket ([::ffff:x.x.x.12]:59366), e.what() = DB::NetException (from [::ffff:x.x.x.12]:59366) (in query: INSERT INTO mtm_trade VALUES), Stack trace:
2017.11.06 21:08:56.455780 [ 10164 ] <Error> ServerErrorHandler: Code: 210, e.displayText() = DB::NetException: Connection reset by peer: while reading from socket ([::ffff:x.x.x.12]:56390), e.what() = DB::NetException, Stack trace:
On many tables on the buggy node we observed this king of log too:
2017.11.06 21:06:13.207536 [ 5277 ] <Error> hedge_xva_trajectory_global.Distributed.DirectoryMonitor: Code: 32, e.displayText() = DB::Exception: Attempt to read after eof, e.what() = DB::Exception, Stack trace: