How does the MergeTree index work?

weijie tong

Sep 27, 2016, 11:33:37 AM

to ClickHouse

While reading the MergeTree code, I found that all column data and index data are written to files through HashingWriteBuffer. So what is the structure of the index file? How does the index file relate to the actual column data files?

man...@gmail.com

Sep 27, 2016, 9:52:11 PM

to ClickHouse

The index is maintained in the following files:

1. primary.idx.
This file consists of the primary key values for every Nth row, where N is 'index_granularity'.
The values are stored contiguously; it is just an array.
This data is always loaded in memory.

For how it is written, see MergedBlockOutputStream.h, after
for (size_t i = index_offset; i < rows; i += storage.index_granularity)

For how it is stored in memory, see struct MergeTreeDataPart.
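
As a rough illustration of the "every Nth row" idea, here is a minimal sketch. It is not the ClickHouse implementation: buildSparseIndex is a hypothetical helper, the key is assumed to be a single 64-bit integer column that is already sorted, and the index_offset from the loop above is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not ClickHouse code): keep the key of every
// `granularity`-th row of an already sorted key column.
std::vector<std::uint64_t> buildSparseIndex(
    const std::vector<std::uint64_t> & sorted_keys,
    std::size_t granularity)
{
    assert(granularity > 0);

    std::vector<std::uint64_t> index;
    for (std::size_t i = 0; i < sorted_keys.size(); i += granularity)
        index.push_back(sorted_keys[i]);  // first key of each group of rows
    return index;
}
```

With 10 sorted keys and a granularity of 4, the sketch keeps the keys of rows 0, 4 and 8, so the index has 3 entries.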

2. Column.mrk files, one for each column.
These files (we call them "marks") contain the offset in the data file (Column.bin) for each Nth row.
An offset is a pair: the offset of the compressed block within the compressed file, and the offset within the decompressed block.

This data is not always loaded in memory, but it is cached. See MarkCache.
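
To make the pair of offsets concrete, here is a small sketch under simplifying assumptions that do not come from the ClickHouse source: values have a fixed width, every decompressed block holds the same number of rows, and the compressed size of each block is known. The names Mark and buildMarks are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical layout of one mark: where the compressed block starts in
// Column.bin, and where the row starts inside the decompressed block.
struct Mark
{
    std::uint64_t offset_in_compressed_file;
    std::uint64_t offset_in_decompressed_block;
};

// Assumptions for illustration only: fixed-width values of `value_size`
// bytes, `rows_per_block` rows in each decompressed block, and
// compressed_block_sizes[i] is the on-disk size of block i.
std::vector<Mark> buildMarks(
    std::uint64_t total_rows,
    std::uint64_t granularity,
    std::uint64_t rows_per_block,
    std::uint64_t value_size,
    const std::vector<std::uint64_t> & compressed_block_sizes)
{
    assert(granularity > 0 && rows_per_block > 0);

    // Prefix sums: file offset at which each compressed block begins.
    std::vector<std::uint64_t> block_start(compressed_block_sizes.size(), 0);
    for (std::size_t i = 1; i < block_start.size(); ++i)
        block_start[i] = block_start[i - 1] + compressed_block_sizes[i - 1];

    std::vector<Mark> marks;
    for (std::uint64_t row = 0; row < total_rows; row += granularity)
    {
        const std::uint64_t block = row / rows_per_block;
        assert(block < block_start.size());
        marks.push_back({block_start[block], (row % rows_per_block) * value_size});
    }
    return marks;
}
```

A reader that wants the rows starting at mark k would seek to offset_in_compressed_file, decompress that block, and skip offset_in_decompressed_block bytes.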

The index makes it possible to locate ranges of the primary key in the data, possibly with an overhead of up to 'index_granularity' rows at the start and at the end of each range.
The index is sparse, so it is very small.
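
The lookup can be sketched with two binary searches over the in-memory array. This is only an illustration, not the logic of the ClickHouse source: selectGranules is a hypothetical function, the key is assumed to be a single 64-bit integer, and the condition is assumed to be a closed range [lo, hi].

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: `index` holds the key of the first row of each group
// of index_granularity rows. Returns the half-open range [begin, end) of
// groups that may contain keys in [lo, hi].
std::pair<std::size_t, std::size_t> selectGranules(
    const std::vector<std::uint64_t> & index,
    std::uint64_t lo,
    std::uint64_t hi)
{
    assert(lo <= hi);

    // The group just before the first index entry >= lo may still contain lo,
    // because only the first key of each group is stored.
    const auto first_ge_lo = std::lower_bound(index.begin(), index.end(), lo);
    std::size_t begin = static_cast<std::size_t>(first_ge_lo - index.begin());
    if (begin > 0)
        --begin;

    // Groups whose first key is already greater than hi cannot match.
    const auto first_gt_hi = std::upper_bound(index.begin(), index.end(), hi);
    const std::size_t end = static_cast<std::size_t>(first_gt_hi - index.begin());

    return {begin, end};
}
```

The extra group kept at the start, and the partly matching group at the end, are where the overhead of up to 'index_granularity' rows comes from.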

weijie tong

Sep 27, 2016, 10:59:19 PM

to ClickHouse

Does ClickHouse work like a time-series database in the MergeTree storage engine scenario? Since the date column must appear among the primary key columns, and the date range appears in the storage path, I guess the query workflow is as follows: when a query with a condition on the date field arrives, ClickHouse uses the date field to locate the matching index files, and then uses the index array in each index file to determine whether the data part may contain matching rows. If the primary index matches the query, it goes on to use the column marks to read the specific columns. Please review my understanding, thanks.

On Wednesday, September 28, 2016 at 9:52:11 AM UTC+8, man...@gmail.com wrote:

man...@gmail.com

Sep 27, 2016, 11:42:42 PM

to ClickHouse

The date field (partition key) doesn't have to be in the primary key.

A query works as follows:
- conditions on the date field select a subset of parts (directories) (1);
- conditions on the primary key select sub-ranges of data within those parts (2).

The primary key can be independent of the date field.

1. Look at MergeTreeDataSelectExecutor::read method.
2. Look at PKCondition::mayBeTrueInRange method.
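
Step (1) can be pictured as a simple overlap test between the date range of the query and the date range covered by each part. The sketch below is an assumption-laden illustration, not the code in MergeTreeDataSelectExecutor::read: Part and selectParts are hypothetical names, and dates are encoded as YYYYMMDD integers purely for readability.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of a part (directory): its name and the minimum
// and maximum date of the rows it holds, encoded as YYYYMMDD.
struct Part
{
    std::string name;
    std::uint32_t min_date;
    std::uint32_t max_date;
};

// Keep only the parts whose date range overlaps the queried range [from, to].
std::vector<std::string> selectParts(
    const std::vector<Part> & parts,
    std::uint32_t from,
    std::uint32_t to)
{
    std::vector<std::string> selected;
    for (const Part & part : parts)
        if (part.min_date <= to && part.max_date >= from)
            selected.push_back(part.name);
    return selected;
}
```

Step (2) would then run inside each selected part, using the conditions on the primary key to narrow the read down to sub-ranges of rows.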
