Insert ok on distributed table but unable to retrieve any data

1,367 views
Skip to first unread message

Yohann Bredoux

unread,
Nov 7, 2017, 6:06:19 AM11/7/17
to ClickHouse
Hello,

We are using clickhouse for some months now, not yet in production though as we are still in validation stage.
We were almost good to go until few days ago where we faced a very wierd issue...

We have a clickhouse clusters composed of 7 nodes. When we insert data on a specific node and only this single one x.x.x.12 it says ok but no data can be retrieved from the cluster nor this node. We agree with eventual consistency but we are many days after the insert.

Furthermore, we have many tables and this problem occurs only on a few of them on this specific node.

We may destroy this node and create a new empty one then it will be replicated but we are not sure it will save the day nor it is the best solution.

We would greatly appreciate that clickhouse returns some failure to the client application or having a way to detect such issue cluster-wise and above all, a way to fix it as the cluster is now unusable.

Here is an example insert and select queries. Our setup is described below then followed by clickhouse server logs that might be of interest.

We use Distributed on top of ReplicatedMergeTree engines. We have 7 replicas each containing 3 nodes. We roll over instances in the cluster to create replica sets (ie: 123,234,345,...).

# docker run -it yandex/clickhouse-client --host x.x.x.12

ClickHouse client version 1.1.54236.

Connecting to x.x.x.12:9000.

Connected to ClickHouse server version 1.1.54236.

:) INSERT INTO mtm_trade_count_global VALUES('7bf1b76d-43c8-4bfb-bf00-784835a6c464', -9099749510027196988, '2017-11-06');

INSERT INTO mtm_trade_count_global VALUES

Ok.

1 rows in set. Elapsed: 0.006 sec.

:) select * from mtm_trade_count_global where job_id = '7bf1b76d-43c8-4bfb-bf00-784835a6c464' and scenario = -9099749510027196988;

SELECT *
FROM mtm_trade_count_global
WHERE (job_id = '7bf1b76d-43c8-4bfb-bf00-784835a6c464') AND (scenario = -9099749510027196988)

Ok.

0 rows in set. Elapsed: 0.013 sec. Processed 217.96 thousand rows, 3.40 MB (16.77 million rows/s., 261.52 MB/s.)


Configuration of one of the table:

#!/bin/bash

echo "Creation of the Replicated tables"


echo "Server 1 : x.x.x..10"


echo "CREATE DATABASE IF NOT EXISTS replica_1" | curl -XPOST http://x.x.x..10:8123/ -d @-


echo "CREATE TABLE IF NOT EXISTS
      replica_1.mtm_trade_count(job_id String, scenario Int64, date Date)
      ENGINE = ReplicatedMergeTree(
          '/clickhouse/tables/01/mtm_trade_count',
          'node1',
          date,
          job_id,
          8192
        )" | curl -XPOST 'http://x.x.x..10:8123/' -d @-

echo "CREATE DATABASE IF NOT EXISTS replica_7" | curl -XPOST http://x.x.x..10:8123/ -d @-


echo "CREATE TABLE IF NOT EXISTS
      replica_7.mtm_trade_count(job_id String, scenario Int64, date Date)
      ENGINE = ReplicatedMergeTree(
          '/clickhouse/tables/07/mtm_trade_count',
          'node1',
          date,
          job_id,
          8192
        )" | curl -XPOST 'http://x.x.x..10:8123/' -d @-

echo "CREATE DATABASE IF NOT EXISTS replica_6" | curl -XPOST http://x.x.x..10:8123/ -d @-


echo "CREATE TABLE IF NOT EXISTS
      replica_6.mtm_trade_count(job_id String, scenario Int64, date Date)
      ENGINE = ReplicatedMergeTree(
          '/clickhouse/tables/06/mtm_trade_count',
          'node1',
          date,
          job_id,
          8192
        )" | curl -XPOST 'http://x.x.x..10:8123/' -d @-
echo "Server 2 : x.x.x..11"


echo "CREATE DATABASE IF NOT EXISTS replica_2" | curl -XPOST http://x.x.x..11:8123/ -d @-


echo "CREATE TABLE IF NOT EXISTS
      replica_2.mtm_trade_count(job_id String, scenario Int64, date Date)
      ENGINE = ReplicatedMergeTree(
          '/clickhouse/tables/02/mtm_trade_count',
          'node2',
          date,
          job_id,
          8192
        )" | curl -XPOST 'http://x.x.x..11:8123/' -d @-

echo "CREATE DATABASE IF NOT EXISTS replica_1" | curl -XPOST http://x.x.x..11:8123/ -d @-


echo "CREATE TABLE IF NOT EXISTS
      replica_1.mtm_trade_count(job_id String, scenario Int64, date Date)
      ENGINE = ReplicatedMergeTree(
          '/clickhouse/tables/01/mtm_trade_count',
          'node2',
          date,
          job_id,
          8192
        )" | curl -XPOST 'http://x.x.x..11:8123/' -d @-

echo "CREATE DATABASE IF NOT EXISTS replica_7" | curl -XPOST http://x.x.x..11:8123/ -d @-


echo "CREATE TABLE IF NOT EXISTS
      replica_7.mtm_trade_count(job_id String, scenario Int64, date Date)
      ENGINE = ReplicatedMergeTree(
          '/clickhouse/tables/07/mtm_trade_count',
          'node2',
          date,
          job_id,
          8192
        )" | curl -XPOST 'http://x.x.x..11:8123/' -d @-
echo "Server 3 : x.x.x..12"


echo "CREATE DATABASE IF NOT EXISTS replica_3" | curl -XPOST http://x.x.x..12:8123/ -d @-


echo "CREATE TABLE IF NOT EXISTS
      replica_3.mtm_trade_count(job_id String, scenario Int64, date Date)
      ENGINE = ReplicatedMergeTree(
          '/clickhouse/tables/03/mtm_trade_count',
          'node3',
          date,
          job_id,
          8192
        )" | curl -XPOST 'http://x.x.x..12:8123/' -d @-

echo "CREATE DATABASE IF NOT EXISTS replica_2" | curl -XPOST http://x.x.x..12:8123/ -d @-


echo "CREATE TABLE IF NOT EXISTS
      replica_2.mtm_trade_count(job_id String, scenario Int64, date Date)
      ENGINE = ReplicatedMergeTree(
          '/clickhouse/tables/02/mtm_trade_count',
          'node3',
          date,
          job_id,
          8192
        )" | curl -XPOST 'http://x.x.x..12:8123/' -d @-

echo "CREATE DATABASE IF NOT EXISTS replica_1" | curl -XPOST http://x.x.x..12:8123/ -d @-


echo "CREATE TABLE IF NOT EXISTS
      replica_1.mtm_trade_count(job_id String, scenario Int64, date Date)
      ENGINE = ReplicatedMergeTree(
          '/clickhouse/tables/01/mtm_trade_count',
          'node3',
          date,
          job_id,
          8192
        )" | curl -XPOST 'http://x.x.x..12:8123/' -d @-
echo "Server 4 : x.x.x..13"


echo "CREATE DATABASE IF NOT EXISTS replica_4" | curl -XPOST http://x.x.x..13:8123/ -d @-


echo "CREATE TABLE IF NOT EXISTS
      replica_4.mtm_trade_count(job_id String, scenario Int64, date Date)
      ENGINE = ReplicatedMergeTree(
          '/clickhouse/tables/04/mtm_trade_count',
          'node4',
          date,
          job_id,
          8192
        )" | curl -XPOST 'http://x.x.x..13:8123/' -d @-

echo "CREATE DATABASE IF NOT EXISTS replica_3" | curl -XPOST http://x.x.x..13:8123/ -d @-


echo "CREATE TABLE IF NOT EXISTS
      replica_3.mtm_trade_count(job_id String, scenario Int64, date Date)
      ENGINE = ReplicatedMergeTree(
          '/clickhouse/tables/03/mtm_trade_count',
          'node4',
          date,
          job_id,
          8192
        )" | curl -XPOST 'http://x.x.x..13:8123/' -d @-

echo "CREATE DATABASE IF NOT EXISTS replica_2" | curl -XPOST http://x.x.x..13:8123/ -d @-


echo "CREATE TABLE IF NOT EXISTS
      replica_2.mtm_trade_count(job_id String, scenario Int64, date Date)
      ENGINE = ReplicatedMergeTree(
          '/clickhouse/tables/02/mtm_trade_count',
          'node4',
          date,
          job_id,
          8192
        )" | curl -XPOST 'http://x.x.x..13:8123/' -d @-
echo "Server 5 : x.x.x..14"


echo "CREATE DATABASE IF NOT EXISTS replica_5" | curl -XPOST http://x.x.x..14:8123/ -d @-


echo "CREATE TABLE IF NOT EXISTS
      replica_5.mtm_trade_count(job_id String, scenario Int64, date Date)
      ENGINE = ReplicatedMergeTree(
          '/clickhouse/tables/05/mtm_trade_count',
          'node5',
          date,
          job_id,
          8192
        )" | curl -XPOST 'http://x.x.x..14:8123/' -d @-

echo "CREATE DATABASE IF NOT EXISTS replica_4" | curl -XPOST http://x.x.x..14:8123/ -d @-


echo "CREATE TABLE IF NOT EXISTS
      replica_4.mtm_trade_count(job_id String, scenario Int64, date Date)
      ENGINE = ReplicatedMergeTree(
          '/clickhouse/tables/04/mtm_trade_count',
          'node5',
          date,
          job_id,
          8192
        )" | curl -XPOST 'http://x.x.x..14:8123/' -d @-

echo "CREATE DATABASE IF NOT EXISTS replica_3" | curl -XPOST http://x.x.x..14:8123/ -d @-


echo "CREATE TABLE IF NOT EXISTS
      replica_3.mtm_trade_count(job_id String, scenario Int64, date Date)
      ENGINE = ReplicatedMergeTree(
          '/clickhouse/tables/03/mtm_trade_count',
          'node5',
          date,
          job_id,
          8192
        )" | curl -XPOST 'http://x.x.x..14:8123/' -d @-
echo "Server 6 : x.x.x..15"


echo "CREATE DATABASE IF NOT EXISTS replica_6" | curl -XPOST http://x.x.x..15:8123/ -d @-


echo "CREATE TABLE IF NOT EXISTS
      replica_6.mtm_trade_count(job_id String, scenario Int64, date Date)
      ENGINE = ReplicatedMergeTree(
          '/clickhouse/tables/06/mtm_trade_count',
          'node6',
          date,
          job_id,
          8192
        )" | curl -XPOST 'http://x.x.x..15:8123/' -d @-

echo "CREATE DATABASE IF NOT EXISTS replica_5" | curl -XPOST http://x.x.x..15:8123/ -d @-


echo "CREATE TABLE IF NOT EXISTS
      replica_5.mtm_trade_count(job_id String, scenario Int64, date Date)
      ENGINE = ReplicatedMergeTree(
          '/clickhouse/tables/05/mtm_trade_count',
          'node6',
          date,
          job_id,
          8192
        )" | curl -XPOST 'http://x.x.x..15:8123/' -d @-

echo "CREATE DATABASE IF NOT EXISTS replica_4" | curl -XPOST http://x.x.x..15:8123/ -d @-


echo "CREATE TABLE IF NOT EXISTS
      replica_4.mtm_trade_count(job_id String, scenario Int64, date Date)
      ENGINE = ReplicatedMergeTree(
          '/clickhouse/tables/04/mtm_trade_count',
          'node6',
          date,
          job_id,
          8192
        )" | curl -XPOST 'http://x.x.x..15:8123/' -d @-
echo "Server 7 : x.x.x..17"


echo "CREATE DATABASE IF NOT EXISTS replica_7" | curl -XPOST http://x.x.x..17:8123/ -d @-


echo "CREATE TABLE IF NOT EXISTS
      replica_7.mtm_trade_count(job_id String, scenario Int64, date Date)
      ENGINE = ReplicatedMergeTree(
          '/clickhouse/tables/07/mtm_trade_count',
          'node7',
          date,
          job_id,
          8192
        )" | curl -XPOST 'http://x.x.x..17:8123/' -d @-

echo "CREATE DATABASE IF NOT EXISTS replica_6" | curl -XPOST http://x.x.x..17:8123/ -d @-


echo "CREATE TABLE IF NOT EXISTS
      replica_6.mtm_trade_count(job_id String, scenario Int64, date Date)
      ENGINE = ReplicatedMergeTree(
          '/clickhouse/tables/06/mtm_trade_count',
          'node7',
          date,
          job_id,
          8192
        )" | curl -XPOST 'http://x.x.x..17:8123/' -d @-

echo "CREATE DATABASE IF NOT EXISTS replica_5" | curl -XPOST http://x.x.x..17:8123/ -d @-


echo "CREATE TABLE IF NOT EXISTS
      replica_5.mtm_trade_count(job_id String, scenario Int64, date Date)
      ENGINE = ReplicatedMergeTree(
          '/clickhouse/tables/05/mtm_trade_count',
          'node7',
          date,
          job_id,
          8192
        )" | curl -XPOST 'http://x.x.x..17:8123/' -d @-

echo "Creation of the Distributed tables"

echo "Server 1 : x.x.x..10"
echo "CREATE TABLE IF NOT EXISTS mtm_trade_count_global(job_id String, scenario Int64, date Date)
      ENGINE =
        Distributed(
          clickHouseCluster,
          '',
          mtm_trade_count,
          intHash64(scenario)
        )" | curl -XPOST 'http://x.x.x..10:8123/' -d @-

echo "Server 2 : x.x.x..11"
echo "CREATE TABLE IF NOT EXISTS mtm_trade_count_global(job_id String, scenario Int64, date Date)
      ENGINE =
        Distributed(
          clickHouseCluster,
          '',
          mtm_trade_count,
          intHash64(scenario)
        )" | curl -XPOST 'http://x.x.x..11:8123/' -d @-

echo "Server 3 : x.x.x..12"
echo "CREATE TABLE IF NOT EXISTS mtm_trade_count_global(job_id String, scenario Int64, date Date)
      ENGINE =
        Distributed(
          clickHouseCluster,
          '',
          mtm_trade_count,
          intHash64(scenario)
        )" | curl -XPOST 'http://x.x.x..12:8123/' -d @-

echo "Server 4 : x.x.x..13"
echo "CREATE TABLE IF NOT EXISTS mtm_trade_count_global(job_id String, scenario Int64, date Date)
      ENGINE =
        Distributed(
          clickHouseCluster,
          '',
          mtm_trade_count,
          intHash64(scenario)
        )" | curl -XPOST 'http://x.x.x..13:8123/' -d @-

echo "Server 5 : x.x.x..14"
echo "CREATE TABLE IF NOT EXISTS mtm_trade_count_global(job_id String, scenario Int64, date Date)
      ENGINE =
        Distributed(
          clickHouseCluster,
          '',
          mtm_trade_count,
          intHash64(scenario)
        )" | curl -XPOST 'http://x.x.x..14:8123/' -d @-

echo "Server 6 : x.x.x..15"
echo "CREATE TABLE IF NOT EXISTS mtm_trade_count_global(job_id String, scenario Int64, date Date)
      ENGINE =
        Distributed(
          clickHouseCluster,
          '',
          mtm_trade_count,
          intHash64(scenario)
        )" | curl -XPOST 'http://x.x.x..15:8123/' -d @-

echo "Server 7 : x.x.x..17"
echo "CREATE TABLE IF NOT EXISTS mtm_trade_count_global(job_id String, scenario Int64, date Date)
      ENGINE =
        Distributed(
          clickHouseCluster,
          '',
          mtm_trade_count,
          intHash64(scenario)
        )" | curl -XPOST 'http://x.x.x..17:8123/' -d @-


Certainly some related logs in clickhouse:

We see these error logs. Notice we did not change any rules in the network and other nodes seem to work fine.

2017.11.06 21:13:56.462541 [ 10153 ] <Error> executeQuery: Code: 209, e.displayText() = DB::NetException: Timeout exceeded while reading from socket ([::ffff:x.x.x.12]:59366), e.what() = DB::NetException (from [::ffff:x.x.x.12]:59366) (in query: INSERT INTO mtm_trade VALUES), Stack trace:

2017.11.06 21:08:56.455780 [ 10164 ] <Error> ServerErrorHandler: Code: 210, e.displayText() = DB::NetException: Connection reset by peer: while reading from socket ([::ffff:x.x.x.12]:56390), e.what() = DB::NetException, Stack trace:

On many tables on the buggy node we observed this king of log too:

2017.11.06 21:06:13.207536 [ 5277 ] <Error> hedge_xva_trajectory_global.Distributed.DirectoryMonitor: Code: 32, e.displayText() = DB::Exception: Attempt to read after eof, e.what() = DB::Exception, Stack trace:

David Pérez Comendador

unread,
Jun 7, 2018, 11:40:11 AM6/7/18
to ClickHouse
Any news?¿

Yj H

unread,
Jun 8, 2018, 1:26:20 AM6/8/18
to ClickHouse

No idea about root cause.


Some suggestions:

1. Why are you having seven replica of the same data? Isn't two or three replicas enough against data loss ?


2. Can we simplify things by not using ReplicatedMergeTree ? And then see if the problem persists, so that we will know if it s a ReplicatedMergeTree related issue.

    

    Can use MergeTree and RAID 10 for data replication, instead of ReplicatedMergeTree in production ?

    The simpler the architecture, the less problems.


3. After inserting a batch of data onto the problematic node, check if part size is increased in system.parts, and if the part directory in the file system is increased.

    So that we will know if we are having an insertion-time problem or a query-time problem.


    If data is persisted on disk, try to copy it onto another node, and see if the migrated partition can be queried on a different node.


These suggestions do not hit the root cause directly but hopefully it gives some hint.

Reply all
Reply to author
Forward
0 new messages