Select* with searching two fields is slow

821 views
Skip to first unread message

Jing Ning

unread,
May 23, 2017, 4:15:54 PM5/23/17
to ClickHouse
Hi,  Clickhouse became very slow when executing the following query:

select * from tbl_event where dest_ip = 'xxx.xxx.xxx.xxx' and source_ip = 'xxx.xxx.xxx.xxx'


The scan rates dropped to thousands rows/Second.  This is a search of two records from 1.5B records and each record includes 80 fields.  It won't be slow if we only search one field or we only select a few fields.  I know 'Select*' is normally slow for Columnar database, but why it only became much slower when searching more than one field?  Anything we can do to make it faster?


Thanks a lot!

Jing

Alex Zatelepin

unread,
May 30, 2017, 3:05:23 PM5/30/17
to ClickHouse
Hi Jing!


This way only the dest_ip and source_ip columns will be read for the whole table. Other columns will be read only if the PREWHERE expression is true in the corresponding blocks. Thus it will be much more efficient for your type of queries.

ClickHouse tries to do this automatically but the heuristic is not always producing the optimal result.

Jing Ning

unread,
May 31, 2017, 9:49:30 AM5/31/17
to ClickHouse
Alex,  'PREWHERE' works great! Now I can get the query results in several seconds. Is 'PREWHERE' always doing better than 'WHERE'?  Should I set the setting of 'optimize_move_to_prewhere' to 1?

Thanks a lot!
Jing

Alex Zatelepin

unread,
May 31, 2017, 1:43:00 PM5/31/17
to ClickHouse
optimize_move_to_prewhere is turned on by default. But it just turns on the optimization pass which decides whether to actually move the conditions to PREWHERE based on some heuristics.

Moving everything to PREWHERE is not always beneficial - e.g. if there are two conditions, one a very selective condition on a small-sized column and other on a large column, it is beneficial to move only the first condition to PREWHERE (to avoid reading the large column for majority of blocks). But it is almost always beneficial to move something to PREWHERE.

In your case I guess the optimizer was not aggressive enough and pushed only one of the conditions (BTW you can check it by searching for messages of the form MergeTreeWhereOptimizer: condition XXX moved to PREWHERE in the logs), but because conditions are independent and both very selective, it makes sense to move both of them to PREWHERE.
Reply all
Reply to author
Forward
0 new messages