I have a query that needs a 3-table join, and each of the tables has tens of millions of records.
In ClickHouse I am doing something like this:
```sql
SELECT
    sum(col1) AS sumCol1,
    sum(col2) AS sumCol2,
    sum(col3) AS sumCol3,
    ...,
    count(*) AS total,
    group1,
    group2
FROM
(
    SELECT *, col10, col11, ...
    FROM (SELECT col1, col2, col3, col4, col5, ... FROM table1)
    ALL INNER JOIN table2 USING commonCol
)
ALL INNER JOIN table3 USING commonCol
WHERE ... some conditions ...
GROUP BY group1, group2
ORDER BY group1, group2
```
When I run this, ClickHouse tells me:

```
DB::Exception: Memory limit (for query) exceeded: would use 9.33 GiB (attempt to allocate chunk of 67108864 bytes), maximum: 9.31 GiB
```
For now I worked around this by raising the `max_memory_usage` limit in users.xml from 10000000000 bytes to 12000000000 bytes, since I had a bit of spare RAM on the machine.
These tables will keep growing, though, so I won't be able to do that again.
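For reference, this is roughly the block I edited (the profile name and surrounding structure may differ per install; mine matches the default users.xml):

```xml
<yandex>
    <profiles>
        <default>
            <!-- per-query memory cap: raised from 10000000000 (~9.31 GiB) to 12000000000 bytes -->
            <max_memory_usage>12000000000</max_memory_usage>
        </default>
    </profiles>
</yandex>
```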
What can I do in cases like this?
- Can I ask ClickHouse to spill temporary data to disk instead of keeping it all in RAM (see the settings sketch below)?
- Is there a more efficient way to structure this kind of query (see the query sketch below)?
- Are there any other configs I should keep in mind?
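On the first point, the only thing I have found so far are the external-aggregation settings below. This is a sketch of what I would try, assuming they behave as documented (they let GROUP BY and ORDER BY spill to disk, and I am not sure they help with the JOIN itself):

```sql
-- Allow GROUP BY / ORDER BY to start spilling to disk at ~5 GB
-- (the docs suggest roughly half of max_memory_usage) instead of
-- failing once the per-query memory limit is hit.
SET max_bytes_before_external_group_by = 5000000000;
SET max_bytes_before_external_sort = 5000000000;
```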
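On the second point, my understanding is that ClickHouse materializes the right-hand side of a join in memory, so the one restructuring I can think of is to push the WHERE conditions down and keep only the needed columns before joining. A sketch with placeholder names (commonCol, someDate, grp and the colN columns are made up):

```sql
SELECT
    grp,
    sum(col1) AS sumCol1,
    sum(col10) AS sumCol10,
    count() AS total
FROM
(
    -- push filters down and keep only what the outer query uses
    SELECT commonCol, col1, grp
    FROM table1
    WHERE someDate >= '2018-01-01'
)
ALL INNER JOIN
(
    -- smaller right-hand side => smaller in-memory hash table
    SELECT commonCol, col10
    FROM table2
    WHERE someDate >= '2018-01-01'
) USING commonCol
GROUP BY grp
ORDER BY grp
```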
I am using ClickHouse server version 1.1.54380 on a single machine.