ClickHouse on xfs file system


kriticar

Nov 21, 2019, 4:28:34 AM
to ClickHouse
Hi,

We have one ClickHouse installation on an xfs file system (don't ask why).
Because of the large number of small files, xfs seems to be struggling.
For example, checking the size of the directory tree under the ClickHouse data folder with du -h --max-depth=1 takes more than an hour or never finishes.
We found that a single table's directory under clickhouse/data contains 60k+ folders (each containing files).

How can we reduce the number of files on the file system? With OPTIMIZE?
What is the purpose of the records in system.parts with active=0 and remove_time populated? Why doesn't OPTIMIZE remove them?
How do we keep the file system and ClickHouse healthy?

Regards.

Denis Zhuravlev

Nov 21, 2019, 9:21:26 AM
to ClickHouse
Not related to XFS.

The only reason is tiny partitions (PARTITION BY startOf15Min) --> many tiny partitions --> many parts (folders) --> many files
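
To make the chain above concrete, here is a rough back-of-the-envelope sketch. The numbers of parts per partition and columns are hypothetical, and the per-part file counts (two files per column plus a handful of metadata files) are assumptions for illustration only; the real figures depend on the ClickHouse version and the table.

```python
def estimate_part_files(partitions, parts_per_partition, columns,
                        files_per_column=2, metadata_files=6):
    """Rough count of part directories and files for one table.

    files_per_column and metadata_files are illustrative assumptions,
    not figures taken from this thread.
    """
    directories = partitions * parts_per_partition
    files = directories * (columns * files_per_column + metadata_files)
    return directories, files


# 96 fifteen-minute partitions per day vs. 1 daily partition,
# with a hypothetical 10 parts per partition and 50 columns.
print(estimate_part_files(96, 10, 50))  # (960, 101760)
print(estimate_part_files(1, 10, 50))   # (10, 1060)
```

The point is only the multiplication: every extra partition multiplies the number of part directories, and every part directory carries its own set of files.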

Denis Zhuravlev

Nov 21, 2019, 9:24:44 AM
to ClickHouse
>What is the purpose of the active=0 and remove_time populated records in system.parts? Why optimize doesn't remove them?
CH removes them after 8 minutes.
The reason is that CH does not use fsync (for performance), so the 8-minute delay protects against data loss on an unexpected server (OS) reboot.
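
A minimal sketch of that rule, assuming the 8 minutes corresponds to the MergeTree setting old_parts_lifetime with its default of 480 seconds, and that rows of (name, active, remove_time) have already been read from system.parts. The part names and timestamps below are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumption: the 8-minute delay is the MergeTree setting
# old_parts_lifetime, default 480 seconds.
OLD_PARTS_LIFETIME = timedelta(seconds=480)


def removable_parts(rows, now, lifetime=OLD_PARTS_LIFETIME):
    """Return names of inactive parts whose grace period has elapsed.

    rows: iterable of (name, active, remove_time) as read from system.parts.
    """
    return [
        name
        for name, active, remove_time in rows
        if not active and now - remove_time >= lifetime
    ]


# Hypothetical rows: one active part and two inactive ones.
now = datetime(2019, 11, 21, 12, 0, 0)
rows = [
    ("20191121_0_5_1", 1, datetime(1970, 1, 1)),
    ("20191121_0_0_0", 0, datetime(2019, 11, 21, 11, 50, 0)),
    ("20191121_1_1_0", 0, datetime(2019, 11, 21, 11, 55, 0)),
]
print(removable_parts(rows, now))  # ['20191121_0_0_0']
```

So rows with active=0 are expected to linger for a few minutes after a merge; they do not need an OPTIMIZE to go away.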

On Thursday, 21 November 2019 05:28:34 UTC-4, kriticar wrote:

kriticar

Nov 22, 2019, 7:27:59 AM
to ClickHouse
in which it is stated:
"
We did two things to mitigate the issue: Switch from xfs to ext4, the xfs nodes experienced strange performance degradation under load where overall load went up 8x and disk i/o almost stalled. It was also impossible to recover from replicas on xfs due to connection breaks, while it progressed from nodes on ext4. Change the affected node configuration to only two replicas per shard, so it is forced to recover from only one replica. This works even with xfs nodes, but makes recovery tedious.
"

So xfs could be the problem. The purpose of the 15-minute partitions was to keep a single day of data (96 partitions max) and to drop the oldest partition every 15 minutes with TTL. Anyway, since that wasn't a good approach, I changed the partitioning to daily partitions, still keeping one day with TTL.

I am really confused about removing unused partitions.
If I don't touch anything, I have too many partitions.

If I optimize the table (with no partition and no final), it optimizes some, but not all, partitions.

If I optimize the table with final (with no partition), it is too expensive in terms of resources.

Now I am using a script that runs this select on each node:

select database, replace(table, '.inner.', '') table, partition, count() parts_cnt
from remote('ngasp01', system, parts)
where engine like '%MergeTree'
and partition != 'tuple()'
and partition not like toString(today()) || '%'
group by database, table, partition
having parts_cnt > 1


and then optimize partition by partition.
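
The "optimize partition by partition" step could be sketched roughly as below. This is not the actual script: it assumes the select is extended to also return the partition_id column of system.parts so that PARTITION ID can be used, the database and table names are hypothetical, and no escaping of unusual identifiers is attempted.

```python
def build_optimize_statements(rows, final=True):
    """Build one OPTIMIZE statement per (database, table, partition_id) row."""
    suffix = " FINAL" if final else ""
    statements = []
    for database, table, partition_id in rows:
        statements.append(
            f"OPTIMIZE TABLE `{database}`.`{table}` "
            f"PARTITION ID '{partition_id}'{suffix}"
        )
    return statements


# Hypothetical result rows from the select above.
rows = [("default", "events", "20191121"), ("default", "events", "20191120")]
for statement in build_optimize_statements(rows):
    print(statement)
```

Each generated statement would then be sent to the node that owns the partition, one at a time, so that only one forced merge runs per table at any moment.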

It works, but some partitions are still not optimized.
At the moment I have a table that is optimized on all 10 nodes. On 4 nodes (and their replicas) I have a single part.
On one node (and its replica) I cannot get a single part for yesterday's data (no new data is arriving for yesterday), even if I run optimize table ... partition .... final.
Whatever I do, I have 11 active parts on these two nodes.

┌─partition──┬─name─────────────────────┬─active─┬─bytes_on_disk─┐
│ 2019-11-21 │ 20191121_0_18761_26      │      1 │             7 │
│ 2019-11-21 │ 20191121_18762_18990_18  │      1 │       2019052 │
│ 2019-11-21 │ 20191121_18991_18997_1   │      1 │          5524 │
│ 2019-11-21 │ 20191121_18998_18998_0   │      1 │          2748 │
│ 2019-11-21 │ 20191121_19004_19004_0   │      1 │          2742 │
│ 2019-11-21 │ 20191121_19005_19005_0   │      1 │          2747 │
│ 2019-11-21 │ 20191121_19009_19009_0   │      1 │          2738 │
│ 2019-11-21 │ 20191121_19010_19010_0   │      1 │          3194 │
│ 2019-11-21 │ 20191121_19011_19011_0   │      1 │          2740 │
│ 2019-11-21 │ 20191121_19012_25605_244 │      1 │     788897472 │
│ 2019-11-21 │ 20191121_25606_33945_222 │      1 │             7 │
└────────────┴──────────────────────────┴────────┴───────────────┘
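
For reading the listing above: the part names follow the pattern partitionID_minBlock_maxBlock_level, so level 0 marks a part that has never been merged. A small sketch that splits the names and lists the block-number ranges not covered by any active part, assuming all names belong to one partition; whether those gaps explain the unmerged parts is not established in this thread.

```python
from typing import NamedTuple, Optional


class PartName(NamedTuple):
    partition_id: str
    min_block: int
    max_block: int
    level: int
    mutation: Optional[int] = None  # present only in five-field names


def parse_part_name(name):
    """Split a name like 20191121_0_18761_26 into its fields."""
    fields = name.split("_")
    if len(fields) not in (4, 5):
        raise ValueError(f"unexpected part name: {name!r}")
    numbers = [int(field) for field in fields[1:]]
    return PartName(fields[0], *numbers)


def block_gaps(names):
    """Return (first, last) block ranges not covered by any listed part."""
    parts = sorted((parse_part_name(n) for n in names),
                   key=lambda p: p.min_block)
    gaps = []
    for prev, cur in zip(parts, parts[1:]):
        if cur.min_block > prev.max_block + 1:
            gaps.append((prev.max_block + 1, cur.min_block - 1))
    return gaps


names = [
    "20191121_0_18761_26", "20191121_18762_18990_18",
    "20191121_18991_18997_1", "20191121_18998_18998_0",
    "20191121_19004_19004_0", "20191121_19005_19005_0",
    "20191121_19009_19009_0", "20191121_19010_19010_0",
    "20191121_19011_19011_0", "20191121_19012_25605_244",
    "20191121_25606_33945_222",
]
print(block_gaps(names))  # [(18999, 19003), (19006, 19008)]
```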



How do you keep parts/file system in good condition?

Regards.

Denis Zhuravlev

Nov 22, 2019, 11:02:56 AM
to ClickHouse
>So xfs could be the problem.
Only if your company name is Cloudflare.

>I am really confused with removing unused partitions.
You don't need to do anything with inactive partitions.

>If I don't touch anything, I have too many partitions.
So with daily partitioning, are you experiencing "too many parts" or too many partitions? What is "too many partitions"?