Hi.
We have an AKS cluster on Azure that is monitored by an in-cluster Prometheus.
The Prometheus pod has persistent storage configured with Azure Disk as the storage backend (as described
here).
Some time ago we decided to switch to Azure Files as the storage backend (a basic description can be found
here).
After some initial tests I decided to roll it out and migrate the data from the previous instance (by copying the old data directories).
So I started the new instance, copied some of the old data into the /data directory, restarted Prometheus, and everything was fine: newly scraped data appeared on the dashboards alongside the migrated data.
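For reference, the migration step was essentially copying whole TSDB block directories (the ULID-named folders) from the old data directory into the new one. A minimal sketch of what I did, written as a script (the paths and the "skip incomplete blocks" check are my own assumptions, not part of any official migration procedure):

```python
import re
import shutil
from pathlib import Path

# TSDB block directory names are 26-character ULIDs (Crockford Base32).
ULID_RE = re.compile(r"^[0-9A-HJKMNP-TV-Z]{26}$")

def copy_blocks(src: str, dst: str) -> list[str]:
    """Copy complete TSDB block directories from src to dst.

    A block is copied only if it has a meta.json (i.e. it looks complete)
    and does not already exist at the destination. Returns copied ULIDs.
    """
    copied = []
    dst_path = Path(dst)
    dst_path.mkdir(parents=True, exist_ok=True)
    for block in sorted(Path(src).iterdir()):
        if not block.is_dir() or not ULID_RE.match(block.name):
            continue  # skip wal/, lock files, etc.
        if not (block / "meta.json").exists():
            continue  # incomplete block, don't copy
        target = dst_path / block.name
        if target.exists():
            continue  # already migrated
        shutil.copytree(block, target)
        copied.append(block.name)
    return copied
```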
Then I started migrating another batch of data.
Everything was working fine, but about 1.5 hours after the start I noticed that the server was creating a new data folder every minute:
drwxr-xr-x 2 nobody nogroup 0 Feb 26 13:00 01E20SAAAK4CC27H07QH3ZXN3C
drwxr-xr-x 2 nobody nogroup 0 Feb 26 13:00 01E20SAQWA1KVBCNDBRZ5KTVQ1
drwxr-xr-x 2 nobody nogroup 0 Feb 26 13:00 01E20SAYDGXC12DSXA7066NJ03
drwxr-xr-x 2 nobody nogroup 0 Feb 26 13:00 01E20SB6G921Y6T0ZFJTKNWDNX
drwxr-xr-x 2 nobody nogroup 0 Feb 26 13:00 01E20SBDYENAF1NE7YRDHHY34K
drwxr-xr-x 2 nobody nogroup 0 Feb 26 13:00 01E20SBP0VW47HWK00EREXW15A
drwxr-xr-x 2 nobody nogroup 0 Feb 26 13:00 01E20SBY3EN5JS0K32X9WTNXC2
drwxr-xr-x 2 nobody nogroup 0 Feb 26 13:01 01E20SC70QMKGF0QSSN25PESF0
drwxr-xr-x 2 nobody nogroup 0 Feb 26 13:01 01E20SCHR65MQRMPG58XG8JMT9
drwxr-xr-x 2 nobody nogroup 0 Feb 26 13:01 01E20SCVD861GEFYHR5695EQNH
drwxr-xr-x 2 nobody nogroup 0 Feb 26 13:01 01E20SD8C78SG806AN5X22DPHS
drwxr-xr-x 2 nobody nogroup 0 Feb 26 13:01 01E20SDK67WV2X82K8RNWAM1FD
drwxr-xr-x 2 nobody nogroup 0 Feb 26 13:01 01E20SDY6R4XVK6RJQA2PZH16H
drwxr-xr-x 2 nobody nogroup 0 Feb 26 13:02 01E20SE9X5ZG2WPV6E9RZ88JPK
drwxr-xr-x 2 nobody nogroup 0 Feb 26 13:02 01E20SEQG0T38AZK4G8SNZEAEG
drwxr-xr-x 2 nobody nogroup 0 Feb 26 13:02 01E20SF9AR8TJZE372ZRP06JYS
drwxr-xr-x 2 nobody nogroup 0 Feb 26 13:03 01E10T77TYT8YBJ7RH3R1CE88Z
drwxr-xr-x 2 nobody nogroup 0 Feb 26 13:03 01E20SG2NKH38JM55SS8Y3N0M5
And in the logs I saw something like this:
level=error ts=2020-02-26T13:01:58.895708548Z caller=db.go:341 component=tsdb msg="compaction failed" err="reload blocks: invalid block sequence: block time ranges overlap: [mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s, blocks: 12]: <ulid: 01E20SBDYENAF1NE7YRDHHY34K, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SAQWA1KVBCNDBRZ5KTVQ1, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SAYDGXC12DSXA7066NJ03, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SB6G921Y6T0ZFJTKNWDNX, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SAAAK4CC27H07QH3ZXN3C, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SBP0VW47HWK00EREXW15A, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SBY3EN5JS0K32X9WTNXC2, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SC70QMKGF0QSSN25PESF0, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SCHR65MQRMPG58XG8JMT9, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SCVD861GEFYHR5695EQNH, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SD8C78SG806AN5X22DPHS, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SDK67WV2X82K8RNWAM1FD, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>" level=info ts=2020-02-26T13:02:02.134855312Z caller=compact.go:443 component=tsdb msg="write block" mint=1582711200000 maxt=1582718400000 ulid=01E20SDY6R4XVK6RJQA2PZH16H
...
(shortened for readability)
After some time this log entry repeats, but with an ever higher block count:
level=info ts=2020-02-26T14:02:24.079677532Z caller=compact.go:443 component=tsdb msg="write block" mint=1582711200000 maxt=1582718400000 ulid=01E20WWEN4Y1FW535M5VN1WZCZ
level=error ts=2020-02-26T14:03:40.144058489Z caller=db.go:341 component=tsdb msg="compaction failed" err="reload blocks: invalid block sequence: block time ranges overlap: [mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s, blocks: 95]: <ulid: 01E20TSHHXVCCEQZ7RMR1F0N8H, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SAQWA1KVBCNDBRZ5KTVQ1, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SAYDGXC12DSXA7066NJ03, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SB6G921Y6
T0ZFJTKNWDNX, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SBDYENAF1NE7YRDHHY34K, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SBP0VW47HWK00EREXW15A, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20TYZV0Y8FMM3WT1DY6FRKB, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SC70QMKGF0QSSN25PESF0, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SCHR65MQRMPG58XG8JMT9, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SCVD861GEFYHR5695EQNH, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SD8C78SG806AN5X22DPHS, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SDK67WV2X82K8RNWAM1FD, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SDY6R4XVK6RJQA2PZH16H, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SE9X5ZG2WPV6E9RZ88JPK, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SEQG0T38AZK4G8SNZEAEG, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SF9AR8TJZE372ZRP06JYS, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SG2NKH38JM55SS8Y3N0M5, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SGDREP1SQNRH738259AC5, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SGR6W3VCM0MH8QB6VTCPS, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SH2D4WAD4FM2DC6ZH0TAM, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SHEBKHMV1P7KP7B89FRYM, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SHWT3C6WA3TG09SB80H45, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SJ9NXN1M0AXS18R6X5CK2, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SJQ3Z3KD60YN6WQ805DNH, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SK3F800SK1MKN19PGGETZ, mint: 1582711200000, maxt: 
1582718400000, range: 2h0m0s>, <ulid: 01E20SKM5AM6FB00248HPRZ1W4, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SM1Z00J5S07ZMCP5V3JR1, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SMF3F3PEGJ8244CBBG784, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SMXWGCF7J2ZW510TE5QVW, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SND70FW39613K843ZVH4Z, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SNVH2ACCPASRWT8H5RZH5, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SPARSEFP8E9EX9461NE4M, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SPT29R98NC852BMQTNZ1Y, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SQAKQE8T41XS5Q70049FG, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SQV17F1F4PBJYSJ9TXVJZ, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SRCYKKG5G210SX9RC4RGA, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SAAAK4CC27H07QH3ZXN3C, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SSG76YSCXA7K9Q75CWDJA, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20ST49KD8029JGAH7Y090EX, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20STV260VAPK05DY5DES9PV, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SVGFTJPGA2YENCGW75QTF, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SX2EEP791A6JJMFMSFRDG, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20SZFX1AG9NYBNHV22XAVQP, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20T0683AYR4J7PSC8YTMBFR, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20T0TA05FFDNYMJ0W33PFGV, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20T1DQGFZ0SA3FW60H6TY2D, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 
01E20T231MT7CTDQRHR7KR3A4Z, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20T2VBWQR4D2T7C11V4HCGB, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20T3H789XEW22454B17KHBF, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20T46199V8E0WRWQ0GN3J69, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20T4X0PXDM79TCW22A1ANJN, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20T5K9DGF3TS09EYW5B3E9Q, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20T6A0PFCT3JJS86MBQNNYS, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20T71QNK9WBGT5QFDWMGYY2, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20T7TEM7Y1TNKQYV8VPJW9H, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20T8H90FC39T0G7JZG8E86Y, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20T99XASQ8MQTNVQRRPG9XZ, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20TA5JRJZZVP5AXJWTKCYHD, mint: 1582711200000, maxt: 1582718400000, range: 2h0m0s>, <ulid: 01E20TCS5H585B02X2SZCRFN9W, mint: 1582711200000, maxt:
...
(shortened for readability, but this is much longer than the first entry)
When I run the Prometheus server without copying any data, everything is fine. The copied data definitely don't overlap; they are from the beginning of this month.
The timestamps and the increasing number of blocks suggest that the compaction loop creates an additional folder with data, fails to delete the old one, then creates yet another folder with compacted data... and so on.
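The overlap the error message complains about can be confirmed directly from each block's meta.json, which records the block's minTime and maxTime. A small sketch I used to list overlapping blocks (assuming the standard meta.json fields shown in the error output above):

```python
import json
from pathlib import Path

def find_overlaps(data_dir: str) -> list[tuple[str, str]]:
    """Return pairs of adjacent block ULIDs whose time ranges overlap."""
    blocks = []
    for meta_file in Path(data_dir).glob("*/meta.json"):
        meta = json.loads(meta_file.read_text())
        blocks.append((meta["ulid"], meta["minTime"], meta["maxTime"]))
    blocks.sort(key=lambda b: b[1])  # sort by minTime
    overlaps = []
    for (u1, min1, max1), (u2, min2, max2) in zip(blocks, blocks[1:]):
        if min2 < max1:  # next block starts before this one ends
            overlaps.append((u1, u2))
    return overlaps
```

Running this against /data reports exactly the duplicated 2h ranges from the log.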
What can be the cause of this problem?