Memory limit (for query) exceeded


Abdeali Kothari

May 17, 2018, 6:21:13 AM5/17/18
to ClickHouse
I have a query where I need to do a 3-table join, and each of the tables has tens of millions of records.
When I use ClickHouse for this, the query looks something like:


SELECT
         sum(col1) AS sumCol1,
         sum(col2) AS sumCol2,
         sum(col3) AS sumCol3,
         ...,
         count(*) AS total,
         group1,
         group2
FROM (
    SELECT *, col10, col11, ...
    FROM (SELECT col1, col2, col3, col4, col5, ... FROM table1)
    ALL INNER JOIN table2 USING commonCol
)
ALL INNER JOIN table3 USING commonCol
WHERE ... some conditions ...
GROUP BY group1, group2
ORDER BY group1, group2

On doing this, clickhouse tells me:
     DB::Exception: Memory limit (for query) exceeded: would use 9.33 GiB (attempt to allocate chunk of 67108864 bytes), maximum: 9.31 GiB

I worked around this for now by raising the limit in users.xml from 10000000000 bytes to 12000000000 bytes, since I had a bit of spare RAM on the machine.

These tables will keep growing, though, and soon I won't be able to do that again.

What can I do in such cases?
 - Can I ask ClickHouse to spill the temporary data to disk?
 - Is there a more efficient way to do this kind of operation?
 - Are there any configs I should keep in mind?

I am using: ClickHouse server version 1.1.54380 with only 1 machine.
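For reference, a minimal sketch of the workaround described above, done per session instead of editing users.xml (the 12 GB value is just the one from this thread; whether a per-query SETTINGS clause is available may depend on the server version):

```sql
-- Raise the limit for the current session only:
SET max_memory_usage = 12000000000;

-- Or attach it to a single query:
SELECT ...
FROM ...
SETTINGS max_memory_usage = 12000000000;
```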

Evan Hanks

May 18, 2018, 1:49:24 AM5/18/18
to ClickHouse
Take a look here and here first. I've had similar issues that were resolved by a mixture of using external group/sort and splitting deeply nested join/enrichment queries into materialized views. There are also tools like SAMPLE and such.

Another idea I haven't tested myself might be to create a view and select a few columns at a time (depending on the nature of your data). I'm assuming this would read fewer columns and it looks like you're mostly doing sums on columns. 

If this isn't ad-hoc work, I'd investigate restructuring to eliminate the joins if possible (e.g. with dictionaries), since it looks like you aren't aggregating on them based on your example. You could also try some of the MergeTree variants that offer advanced aggregation features, e.g. AggregatingMergeTree.
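A sketch of the external group/sort suggestion, assuming these settings exist on your server version (the thresholds here are illustrative, roughly half the 9.31 GiB limit from the error message):

```sql
-- Spill GROUP BY and ORDER BY state to disk once it crosses a threshold,
-- instead of failing with "Memory limit (for query) exceeded":
SET max_bytes_before_external_group_by = 5000000000;  -- ~5 GB
SET max_bytes_before_external_sort     = 5000000000;  -- ~5 GB

-- ...then run the big join/aggregation query as before.
```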

Abdeali Kothari

May 18, 2018, 2:58:24 AM5/18/18
to ClickHouse
Thanks for the reply :)

I tried setting both max_bytes_before_external_sort and max_bytes_before_external_group_by to half my max_memory_usage, but I still hit this issue.

I tried doing a "create materialized view innertbl engine=Log as <inner-join>" and using that view in the outer join - but that did not work either.

Notes:
I am actually doing a SELECT on all columns, because the WHERE condition is dynamically generated, and hence I need all the columns from all the tables. I currently have no way to know which filters apply to which columns (it's just a text blob), so I cannot prune the columns up front.
This is one of many queries I run, and all the others work fine, so restructuring the tables may not be the best idea for now.


Update: materialized views did the trick. I had not realized that a TEMPORARY TABLE in ClickHouse can only use the Memory engine.

Evan Hanks

May 20, 2018, 1:29:29 PM5/20/18
to ClickHouse
Ah, so you were able to fix the issue by using materialized views? Keep in mind you need to specify POPULATE if you want existing data loaded into the view ;). Nearly all my problems in ClickHouse have boiled down to expensive joins where denormalizing the data would have worked, though, so perhaps look into reducing the joins long-term.
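To make the POPULATE point concrete, a hypothetical sketch of the workaround described earlier in the thread (table and column names are placeholders from the original query, not a real schema):

```sql
-- Materialize the inner join once so the outer aggregation reads a flat
-- table instead of re-joining; POPULATE backfills rows that already exist.
CREATE MATERIALIZED VIEW innertbl
ENGINE = Log
POPULATE
AS SELECT t1.*, t2.col10, t2.col11
   FROM table1 AS t1
   ALL INNER JOIN table2 AS t2 USING commonCol;
```

Without POPULATE, the view only receives rows inserted into table1 after it is created.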

And that's interesting. You could also look into using a Merge table/view, perhaps, if you need to support dynamically reading from all the columns. I'm not sure whether your data would support logically splitting into multiple tables/materialized views, though.

Perhaps it would be helpful in these sorts of situations to have a profiler for the query steps, similar to query plans in other DBMSes. I'm not sure how hard this would be to implement given ClickHouse's architecture. Even a flag on the query (like FINAL) to print out the steps of the query with their timings/resource usage would be great. Maybe a hint to the devs? ;)