About views query performance

22 views
Skip to first unread message

Tiago Krebs

unread,
Apr 29, 2021, 8:57:21 AMApr 29
to ClickHouse
Hi all,

I've been struggling to understand one slow query compared between one view and its original replicated table. If someone could help it would be much appreciated. Consider this table has 8 billion rows:

CREATE TABLE access_logs
(
    `ts` DateTime,
    `source` String,
    `upstream_status` Array(Int16)
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/access_logs', '{replica}')
PARTITION BY toYYYYMM(ts)
PRIMARY KEY (ts)
ORDER BY (ts)

To solve a legacy internal UI query builder problem I can't read arrays (something very nasty with the connector that I don't want to touch at this moment). As for the first value of upstream_status is enough for this system I came up with a simple view to solving this.

CREATE VIEW access_logs_v
(
    `ts` DateTime,
    `source` String,
    `upstream_status` Int16
) AS
SELECT
    ts,
    source,
    upstream_status[1] AS upstream_status
FROM access_logs

Here is the problem. A simple query over the original table is fast as always, even when I get only the first index of that array.

SELECT ts, source, upstream_status[1] FROM access_logs ORDER BY ts DESC LIMIT 100;
100 rows in set. Elapsed: 0.121 sec. Processed 896.65 thousand rows, 25.12 MB (7.39 million rows/s., 207.17 MB/s.)

But using my view, the response is too much slower.

SELECT ts, source, upstream_status FROM access_logs_v ORDER BY ts DESC LIMIT 100;
100 rows in set. Elapsed: 17.161 sec. Processed 8.17 billion rows, 228.89 GB (476.14 million rows/s., 13.34 GB/s.)

I can guess of course that the huge difference in the processed rows is the cause and this is maybe related to the distributed characteristics from my original table or simply the way of views work. But I'm not understanding:

- Exactly why my view is behaving in that way, doing this full scan on the original table (I wasn't able to find anything on the docs about it).
- that is a better way to achieve what I need through this view or another kind of object?

Regards,

Denis Zhuravlev

unread,
Apr 30, 2021, 11:50:58 PMApr 30
to ClickHouse
Order by pushdown is not implemented.

CREATE TABLE access_logs
(
    `ts` DateTime,
    `source` String,
    `upstream_status` Array(Int16)
)
ENGINE = MergeTree

PARTITION BY toYYYYMM(ts)
PRIMARY KEY (ts)
ORDER BY (ts)

insert into access_logs select now(), 'x', [1,2,3,4,5] from numbers(10000000);

select * from (SELECT * FROM access_logs) ORDER BY ts limit 1;
Processed 10.00 million rows

You can use ALIAS column:
alter table access_logs add column upstream_status_1 Int16 alias upstream_status[1];
SELECT * FROM access_logs  ORDER BY ts limit 1;
Processed 524.29 thousand row
Reply all
Reply to author
Forward
0 new messages