in which it is stated:
"
We did two things to mitigate the issue:
Switch from xfs to ext4, the xfs nodes experienced strange performance degradation under load where overall load went up 8x and disk i/o almost stalled. It was also impossible to recover from replicas on xfs due to connection breaks, while it progressed from nodes on ext4.
Change the affected node configuration to only two replicas per shard, so it is forced to recover from only one replica. This works even with xfs nodes, but makes recovery tedious.
"
So xfs could be the problem. The purpose of the 15-minute partitions was to keep a single day of data (96 partitions max) and to drop the oldest partition every 15 minutes with TTL. Since that was not a good approach anyway, I changed the partitioning to daily partitions, still keeping one day of data with TTL.
I am really confused about removing unused partitions.
If I don't touch anything, I end up with too many partitions.
If I optimize the table (without partition and without final), it optimizes some partitions, but not all of them.
If I optimize the table (without partition) with final, it is too expensive in terms of resources.
Now I am using a script that runs this select on each node:
select database, replace(table, '.inner.', '') table, partition, count() parts_cnt
from remote('ngasp01', system, parts)
where engine like '%MergeTree'
and partition != 'tuple()'
and partition not like toString(today()) || '%'
group by database
, table
, partition
having parts_cnt > 1
and then optimizes partition by partition.
It works, but some partitions are still not optimized.
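For illustration, the "optimize partition by partition" step could look roughly like the sketch below. It only builds the statements from the rows returned by the select above and does not connect to any server. The function name `build_optimize_statements` is hypothetical, and the sketch assumes a single-value partition key (such as a date) that can be passed as a quoted literal.

```python
from typing import Iterable, List, Tuple

# One row of the select above: (database, table, partition, parts_cnt)
Row = Tuple[str, str, str, int]


def build_optimize_statements(rows: Iterable[Row], final: bool = True) -> List[str]:
    """Build one OPTIMIZE statement per partition that still has several parts.

    Hypothetical helper: assumes the partition value is a single literal
    (e.g. a date like 2019-11-21), not a tuple expression.
    """
    statements = []
    for database, table, partition, parts_cnt in rows:
        if parts_cnt <= 1:
            # Already a single part, nothing to merge.
            continue
        escaped = partition.replace("'", "\\'")
        stmt = f"OPTIMIZE TABLE `{database}`.`{table}` PARTITION '{escaped}'"
        if final:
            stmt += " FINAL"
        statements.append(stmt)
    return statements
```

Running the generated statements one at a time keeps the load bounded to a single partition per merge, which is the reason for not using a table-wide optimize ... final.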
At the moment I have a table that has been optimized on all 10 nodes. On 4 nodes (and their replicas) I have a single part.
On one node (and its replica) I cannot get a single part for yesterday's data (no new data is arriving for yesterday), even if I run optimize table ... partition .... final.
Whatever I do, I still have 11 active parts on these two nodes:
┌─partition──┬─name─────────────────────┬─active─┬─bytes_on_disk─┐
│ 2019-11-21 │ 20191121_0_18761_26      │      1 │             7 │
│ 2019-11-21 │ 20191121_18762_18990_18  │      1 │       2019052 │
│ 2019-11-21 │ 20191121_18991_18997_1   │      1 │          5524 │
│ 2019-11-21 │ 20191121_18998_18998_0   │      1 │          2748 │
│ 2019-11-21 │ 20191121_19004_19004_0   │      1 │          2742 │
│ 2019-11-21 │ 20191121_19005_19005_0   │      1 │          2747 │
│ 2019-11-21 │ 20191121_19009_19009_0   │      1 │          2738 │
│ 2019-11-21 │ 20191121_19010_19010_0   │      1 │          3194 │
│ 2019-11-21 │ 20191121_19011_19011_0   │      1 │          2740 │
│ 2019-11-21 │ 20191121_19012_25605_244 │      1 │     788897472 │
│ 2019-11-21 │ 20191121_25606_33945_222 │      1 │             7 │
└────────────┴──────────────────────────┴────────┴───────────────┘
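The part names above follow the pattern partition_id, min_block, max_block, level. As a small sketch for reading them, the helper below splits each name and lists the block-number ranges that no active part covers. The names `parse_part_name` and `block_gaps` are hypothetical, and the sketch assumes partition ids without underscores, as in this table. Whether such gaps have anything to do with the parts that refuse to merge is not something this establishes. It only makes the numbers easier to inspect.

```python
from typing import List, NamedTuple, Tuple


class PartName(NamedTuple):
    partition_id: str
    min_block: int
    max_block: int
    level: int


def parse_part_name(name: str) -> PartName:
    """Split '<partition_id>_<min_block>_<max_block>_<level>' into fields.

    Assumes the partition id contains no underscore; any extra trailing
    field (such as a mutation version) is ignored.
    """
    fields = name.split("_")
    if len(fields) < 4:
        raise ValueError(f"unexpected part name: {name!r}")
    return PartName(fields[0], int(fields[1]), int(fields[2]), int(fields[3]))


def block_gaps(names: List[str]) -> List[Tuple[int, int]]:
    """Return (first, last) block ranges not covered by any listed part."""
    parts = sorted((parse_part_name(n) for n in names), key=lambda p: p.min_block)
    gaps = []
    for prev, nxt in zip(parts, parts[1:]):
        if nxt.min_block > prev.max_block + 1:
            gaps.append((prev.max_block + 1, nxt.min_block - 1))
    return gaps


PART_NAMES = [
    "20191121_0_18761_26", "20191121_18762_18990_18", "20191121_18991_18997_1",
    "20191121_18998_18998_0", "20191121_19004_19004_0", "20191121_19005_19005_0",
    "20191121_19009_19009_0", "20191121_19010_19010_0", "20191121_19011_19011_0",
    "20191121_19012_25605_244", "20191121_25606_33945_222",
]

print(block_gaps(PART_NAMES))  # [(18999, 19003), (19006, 19008)]
```

For the 11 parts listed, the uncovered ranges are blocks 18999 to 19003 and 19006 to 19008, right between the small level-0 parts.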
How do you keep parts/file system in good condition?
Regards.