Merge table issue

554 views
Skip to first unread message

Jordi Vilaseca Corderroure

unread,
Mar 31, 2017, 4:34:16 AM3/31/17
to ClickHouse
Hello everyone,

I have an issue after updating clickhouse to the version v1.1.54190-stable. I have a Merge table that reads from multiple tables. Before updating clickhouse, when a query was performed the obtained results were the same as if the query was performed to a table alone (For example a select using a limit 10 returned only 10 values). But after updating, I have found out that the obtained results are not the same, it returns the results for each one of the tables separately (In the same example that before, now it is returning 10 values for each one of the tables that the Merge table was including), for each entry the field _table also appears even though I have not explicitly added it to the returned columns.

I don't know if this is the desired behaviour from now on or it is a bug. In case of being the desired behaviour, there is some way to get the results as before?

Thanks!

Vitaliy Lyudvichenko

unread,
Mar 31, 2017, 12:30:27 PM3/31/17
to ClickHouse
We updated Merge table engine in December-January (~v1.1.54165). 
Now it reads underlying tables using multiple threads.

Previously it read tables sequentially using single thread, so results ordering was deterministic, but execution speed was slow.
You can run clickhouse-client with --max_threads=1 settings to keep ordering.
Note that MergeTree engine also don't provides the results ordering.

But it is strange that _table field is injected. It should be injected only if it is explicitly specified in query.
I cannot reproduce the problem.
Could you provide example of a query that injects _table filed and show create table for related tables?

пятница, 31 марта 2017 г., 11:34:16 UTC+3 пользователь Jordi Vilaseca Corderroure написал:

Jordi Vilaseca Corderroure

unread,
Apr 11, 2017, 6:07:26 AM4/11/17
to ClickHouse
Thank you for your response!

After some research more I have found out something strange. I have multiple databases, each one of them with multiple replicated tables and distributed tables, and on top of all I have a merge table that includes all the distributed tables. I give you an example of the schema:

-- Create database
CREATE DATABASE IF NOT EXISTS test;

-- Create replicated tables
CREATE TABLE IF NOT EXISTS test.r_1 (date Date, first Int32, second Int32) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/test.r_1', '{replica}', date, (first), 8192);
CREATE TABLE IF NOT EXISTS test.r_2 (date Date, first Int32, second Int32) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/test.r_2', '{replica}', date, (first), 8192);

-- Create distributed tables
CREATE TABLE IF NOT EXISTS test.d_1 (date Date, first Int32, second Int32) ENGINE = Distributed('cluster', test, r_1, rand());
CREATE TABLE IF NOT EXISTS test.d_2 (date Date, first Int32, second Int32) ENGINE = Distributed('cluster', test, r_2, rand());

-- Create Merge tables
CREATE TABLE IF NOT EXISTS test.distributed_all (date Date, first Int32, second Int32) ENGINE = Merge(test, '^d_*');
CREATE TABLE IF NOT EXISTS test.replicated_all (date Date, first Int32, second Int32) ENGINE = Merge(test, '^r_*');

-- Insert data
INSERT INTO test.d_1 VALUES ('2017-04-11', 1, 2);
INSERT INTO test.d_2 VALUES ('2017-04-11', 3, 4);

If a count query is performed on top of the merge table that aims to the replicated tables, it returns the desired result:
Query
====
SELECT COUNT() FROM test.replicated_all FORMAT CSV
Result
=====
2

But, if the same query is performed on top of the merge table that aims to the distributed table it returns the results for each one of the tables.
Query
====
SELECT COUNT() FROM test.distributed_all FORMAT CSV
Result
=====
"d_1",1
"d_2",1

This was tested using the last stable version (1.1.54198)


El divendres, 31 març de 2017 18:30:27 UTC+2, Vitaliy Lyudvichenko va escriure:

Vitaliy Lyudvichenko

unread,
Aug 18, 2017, 12:19:55 PM8/18/17
to ClickHouse
This should be fixed in 1.1.54276 version

вторник, 11 апреля 2017 г., 13:07:26 UTC+3 пользователь Jordi Vilaseca Corderroure написал:

Jordi Vilaseca Corderroure

unread,
Aug 28, 2017, 10:33:03 AM8/28/17
to ClickHouse
Wow! Thank you very match!

El divendres, 18 agost de 2017 18:19:55 UTC+2, Vitaliy Lyudvichenko va escriure:
Reply all
Reply to author
Forward
0 new messages