Попробовал поизменять max_block_size, 8192 действительно ускорило запрос, спасибо.
Параметр group_by_two_level_threshold на скорость не влияет никак.
Поведение при нагрузке воспроизводится, то есть: каждый раз, когда я сильно нагружаю базу с этим запросом, начинает валится ошибка с кодом 159, а после остановки "подачи нагрузки" и рассасывания всех запросов (по факту, я так понимаю, они все просто падают), каждый запрос начинает выполняться долго. Единственный спососб снова заставить его работать быстро - рестарт кликхауса на всех нодах.
В clickhouse.log:
......
2016.10.26 16:56:01.074 [ 634 ] <Error> executeQuery: Code: 159, e.displayText() = DB::Exception: Timeout exceeded: elapsed 48.864352586 seconds, maximum: 40, e.what() = DB::Exception (from ::ffff:10.255.4.121) (in query: select ... limit 1000 FORMAT TabSeparatedWithNamesAndTypes), Stack trace:
0. clickhouse-server(StackTrace::StackTrace()+0x16) [0xfea856]
1. clickhouse-server(DB::Exception::Exception(std::string const&, int)+0x1f) [0xfadbbf]
2. clickhouse-server(DB::IProfilingBlockInputStream::checkLimits()+0x54a) [0x11e709a]
3. clickhouse-server(DB::IProfilingBlockInputStream::read()+0x1d6) [0x11e8456]
4. clickhouse-server(DB::ParallelInputsProcessor<DB::UnionBlockInputStream<(DB::StreamUnionMode)0>::Handler, (DB::StreamUnionMode)0>::loop(unsigned long)+0x10b) [0x133219b]
5. clickhouse-server(DB::ParallelInputsProcessor<DB::UnionBlockInputStream<(DB::StreamUnionMode)0>::Handler, (DB::StreamUnionMode)0>::thread(MemoryTracker*, unsigned long)+0x56) [0x1332a66]
6. clickhouse-server(execute_native_thread_routine+0x20) [0x32ecd00]
7. /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184) [0x7f8973217184]
8. /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f897283237d]
2016.10.26 16:56:01.074 [ 634 ] <Trace> UnionBlockInputStream: Waiting for threads to finish
2016.10.26 16:56:01.202 [ 1402 ] <Trace> RemoteBlockInputStream: (control002:9000, 10.233.10.43) Cancelling query
2016.10.26 16:56:01.497 [ 1402 ] <Trace> RemoteBlockInputStream: (control003:9000, 10.233.10.44) Cancelling query
2016.10.26 16:56:01.567 [ 1403 ] <Trace> Aggregator: Aggregation method: concat
2016.10.26 16:56:01.619 [ 1404 ] <Trace> Aggregator: Aggregation method: concat
2016.10.26 16:56:01.631 [ 1402 ] <Trace> RemoteBlockInputStream: (control004:9000, 10.233.10.45) Cancelling query
2016.10.26 16:56:01.670 [ 1405 ] <Trace> Aggregator: Aggregation method: concat
2016.10.26 16:56:02.002 [ 1402 ] <Trace> RemoteBlockInputStream: (control005:9000, 10.233.10.46) Cancelling query
2016.10.26 16:56:02.314 [ 1406 ] <Trace> Aggregator: Aggregation method: concat
2016.10.26 16:56:02.338 [ 1407 ] <Trace> Aggregator: Aggregation method: concat
2016.10.26 16:56:02.427 [ 1402 ] <Trace> RemoteBlockInputStream: (control006:9000, 10.233.10.47) Cancelling query
2016.10.26 16:56:02.427 [ 1402 ] <Trace> RemoteBlockInputStream: (control007:9000, 10.233.10.48) Cancelling query
.....
В clickhouse.err.log тоже самое:
.....
2016.10.26 16:58:54.778 [ 1800 ] <Error> HTTPHandler: Code: 159, e.displayText() = DB::Exception: Timeout exceeded: elapsed 137.316989926 seconds, maximum: 40, e.what() = DB::Exception, Stack trace:
0. clickhouse-server(StackTrace::StackTrace()+0x16) [0xfea856]
1. clickhouse-server(DB::Exception::Exception(std::string const&, int)+0x1f) [0xfadbbf]
2. clickhouse-server(DB::IProfilingBlockInputStream::checkLimits()+0x54a) [0x11e709a]
3. clickhouse-server(DB::IProfilingBlockInputStream::read()+0x1d6) [0x11e8456]
4. clickhouse-server(DB::ParallelInputsProcessor<DB::UnionBlockInputStream<(DB::StreamUnionMode)0>::Handler, (DB::StreamUnionMode)0>::loop(unsigned long)+0x10b) [0x133219b]
5. clickhouse-server(DB::ParallelInputsProcessor<DB::UnionBlockInputStream<(DB::StreamUnionMode)0>::Handler, (DB::StreamUnionMode)0>::thread(MemoryTracker*, unsigned long)+0x56) [0x1332a66]
6. clickhouse-server(execute_native_thread_routine+0x20) [0x32ecd00]
7. /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184) [0x7f8973217184]
8. /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f897283237d]
2016.10.26 16:58:54.784 [ 2201 ] <Trace> RemoteBlockInputStream: (control007:9000, 10.233.10.48) Cancelling query
2016.10.26 16:58:54.810 [ 2034 ] <Trace> RemoteBlockInputStream: (control003:9000, 10.233.10.44) Cancelling query
2016.10.26 16:58:54.827 [ 2034 ] <Trace> RemoteBlockInputStream: (control004:9000, 10.233.10.45) Cancelling query
2016.10.26 16:58:54.847 [ 2201 ] <Trace> RemoteBlockInputStream: (control008:9000, 10.233.10.49) Cancelling query
2016.10.26 16:58:54.847 [ 2201 ] <Trace> RemoteBlockInputStream: (control009:9000, 10.233.10.50) Cancelling query
2016.10.26 16:58:54.884 [ 2201 ] <Trace> RemoteBlockInputStream: (control010:9000, 10.233.10.51) Cancelling query
2016.10.26 16:58:54.884 [ 2187 ] <Trace> UnionBlockInputStream: Waited for threads to finish
2016.10.26 16:58:54.889 [ 2187 ] <Debug> MemoryTracker: Peak memory usage (for query): 672.90 MiB.
2016.10.26 16:58:54.890 [ 2034 ] <Trace> RemoteBlockInputStream: (control005:9000, 10.233.10.46) Cancelling query
2016.10.26 16:58:54.895 [ 2187 ] <Error> HTTPHandler: Code: 159, e.displayText() = DB::Exception: Timeout exceeded: elapsed 117.036843666 seconds, maximum: 40, e.what() = DB::Exception, Stack trace:
0. clickhouse-server(StackTrace::StackTrace()+0x16) [0xfea856]
1. clickhouse-server(DB::Exception::Exception(std::string const&, int)+0x1f) [0xfadbbf]
2. clickhouse-server(DB::IProfilingBlockInputStream::checkLimits()+0x54a) [0x11e709a]
3. clickhouse-server(DB::IProfilingBlockInputStream::read()+0x1d6) [0x11e8456]
4. clickhouse-server(DB::ParallelInputsProcessor<DB::UnionBlockInputStream<(DB::StreamUnionMode)0>::Handler, (DB::StreamUnionMode)0>::loop(unsigned long)+0x10b) [0x133219b]
5. clickhouse-server(DB::ParallelInputsProcessor<DB::UnionBlockInputStream<(DB::StreamUnionMode)0>::Handler, (DB::StreamUnionMode)0>::thread(MemoryTracker*, unsigned long)+0x56) [0x1332a66]
6. clickhouse-server(execute_native_thread_routine+0x20) [0x32ecd00]
7. /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184) [0x7f8973217184]
8. /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f897283237d]
......