Data getting corrupted: missing meta.json

Guru SD

Feb 18, 2020, 10:16:44 PM
to Prometheus Users
Hello,

We have a clustered Prometheus setup, but with a common storage disk. However, we are unable to view data after a few hours because the data gets corrupted and meta.json goes missing. I found that a lot of people are facing a similar issue, but apart from deleting the data I did not find any other fix.

Is there a fix for this? Please help.

Using the latest version of Prometheus.

Thanks in Advance,
Guru..
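The symptom described in this post can be checked on disk. In Prometheus 2.x, each persisted block is a subdirectory of the data directory named with a 26-character ULID, and it normally contains a meta.json file. The Python sketch below lists block directories that lack one. The ULID regex and the helper name `blocks_missing_meta` are assumptions for illustration; the function only reports, it does not repair anything.

```python
import os
import re

# Assumption: block directories are named with a 26-character ULID
# (Crockford base32, uppercase, no I/L/O/U). Other entries such as
# "wal", "chunks_head" or the "lock" file do not match and are skipped.
ULID_RE = re.compile(r"^[0-9A-HJKMNP-TV-Z]{26}$")


def blocks_missing_meta(data_dir):
    """Return sorted names of ULID-named subdirectories of data_dir without meta.json."""
    missing = []
    for name in os.listdir(data_dir):
        path = os.path.join(data_dir, name)
        if os.path.isdir(path) and ULID_RE.match(name):
            if not os.path.isfile(os.path.join(path, "meta.json")):
                missing.append(name)
    return sorted(missing)
```

Call it with the directory that `--storage.tsdb.path` points at, ideally while Prometheus is stopped so that blocks are not being created or compacted during the scan.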

Brian Candler

Feb 19, 2020, 3:30:20 AM
to Prometheus Users
On Wednesday, 19 February 2020 03:16:44 UTC, Guru SD wrote:
> We have a clustered Prometheus setup but with a common storage disk.

Please can you be more specific about what you mean by "common storage disk" and how it is configured.

Do you have multiple Prometheus servers accessing the same storage backend via NFS? Are they configured with the same --storage.tsdb.path on the same NFS server?!

Guru SD

Feb 20, 2020, 7:40:59 AM
to Prometheus Users
Yes, we have two Prometheus servers accessing the same storage backend. We're not sure whether it is via NFS. They are configured to use the same storage.tsdb.path, which is located on one of the two servers.

The servers are Azure VMs running on Ubuntu. Using the latest version of Prometheus.

Stuart Clark

Feb 20, 2020, 8:06:45 AM
to Guru SD, Prometheus Users
If you have more than one copy of Prometheus running against the same data directory you will cause corruption.

They need to be using separate directories and the use of NFS is generally not recommended.
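Why two servers on one data directory cause corruption: each Prometheus process assumes it is the only writer, and each compacts and deletes blocks independently. Prometheus 2.x normally creates a lock file in its data directory to stop a second instance on the same host, but advisory locks of this kind may not be enforced across hosts on a network share such as NFS or SMB. The sketch below shows the single-writer idea with `fcntl.flock` (Unix only); the class name `DataDirLock` is hypothetical and this is not Prometheus's implementation.

```python
import fcntl
import os


class DataDirLock:
    """Hold an exclusive, non-blocking flock on <data_dir>/lock (illustrative sketch)."""

    def __init__(self, data_dir):
        self.path = os.path.join(data_dir, "lock")
        self.fd = None

    def acquire(self):
        """Return True if this holder got the lock, False if someone else holds it."""
        fd = os.open(self.path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            # LOCK_NB makes a second holder fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            os.close(fd)
            return False
        self.fd = fd
        return True

    def release(self):
        """Drop the lock and close the descriptor, if held."""
        if self.fd is not None:
            fcntl.flock(self.fd, fcntl.LOCK_UN)
            os.close(self.fd)
            self.fd = None
```

The second caller fails fast rather than silently sharing the directory. On a local filesystem that is enough; on a remote mount, whether another host sees the lock depends on the filesystem and client, which is one more reason to give each server its own local directory.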

Guru SD

Feb 20, 2020, 9:58:51 PM
to Prometheus Users
Thanks for confirming, we suspected this was the problem. Can you please help with which filesystem to use? We are running Ubuntu on Azure VMs. Should it be ext4?

Robin Pharaoh

Apr 3, 2020, 2:36:08 PM
to Prometheus Users
Was this ever resolved? We are hitting the exact same issue right now.

We have a single instance of Prometheus.
We are using Azure File Share with a volume claim.

Christian Hoffmann

Apr 3, 2020, 2:56:22 PM
to Robin Pharaoh, Prometheus Users
Hi Robin,

On 4/3/20 8:36 PM, Robin Pharaoh wrote:
> Was this ever resolved? We are hitting the exact same issue right now.
>
> We have a single instance of Prometheus
> We are using Azure File Share with a volume claim
I don't have any Azure knowledge, but I assume this is an SMB-based mount?

If so, I guess this setup is not supported, as I don't think SMB meets all
the POSIX requirements that Prometheus relies on.

Citing the docs [1]:
"Non POSIX compliant filesystems are not supported by Prometheus's local
storage, corruptions may happen, without possibility to recover. NFS is
only potentially POSIX, most implementations are not."
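One reason POSIX semantics matter here: Prometheus's TSDB writes a block's meta.json roughly by writing a temporary file, syncing it, and renaming it into place. That relies on rename being atomic and fsync being honored; a filesystem that does not guarantee these can leave a block without a usable meta.json after a crash or interruption. The Python function below illustrates that general write-then-rename pattern; the name `write_json_atomically` is hypothetical and this is not Prometheus's actual code.

```python
import json
import os


def write_json_atomically(path, obj):
    """Write obj as JSON so readers see either the old file or the complete new one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(obj, f)
        f.flush()
        os.fsync(f.fileno())  # make the file contents durable before renaming
    os.replace(tmp, path)  # atomic within one directory on POSIX filesystems
    # Sync the directory so the rename itself survives a crash.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

On a local ext4 or xfs volume each step has well-defined behaviour; on a share that only approximates POSIX, any of them may not, which matches the warning in the docs quoted above.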

I have never seen such corruption issues when running on xfs. I guess
your best bet would be block-based storage and a standard *nix
filesystem such as xfs or ext4.
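To confirm what a data directory actually sits on, the filesystem type can be read from the Linux mount table. An Azure File Share mounted over SMB usually appears with type `cifs`, whereas an attached disk formatted with ext4 or xfs shows that type. The sketch below picks the longest mount point that contains a given path; `fs_type_for` is a hypothetical helper, it assumes the Linux `/proc/mounts` format, and it ignores escaped characters (such as `\040` for spaces) in mount paths.

```python
import os


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the mount containing path, or None.

    mounts_text is the contents of /proc/mounts; the longest matching
    mount point wins, just as the kernel resolves nested mounts.
    """
    path = os.path.realpath(path)
    best_mnt, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mnt, fstype = fields[1], fields[2]
        # The mount covers path if it is path itself or a directory prefix of it.
        if path == mnt or path.startswith(mnt.rstrip("/") + "/"):
            if len(mnt) > len(best_mnt):
                best_mnt, best_type = mnt, fstype
    return best_type
```

Usage: pass the `--storage.tsdb.path` value and the text of `/proc/mounts`; a result such as `nfs`, `nfs4` or `cifs` suggests the data directory is on a network filesystem.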

Kind regards,
Christian

[1] https://prometheus.io/docs/prometheus/latest/storage/

Robin Pharaoh

Apr 3, 2020, 3:09:31 PM
to Prometheus Users
Hi Christian,

This does appear to be the issue.

In the Azure ecosystem, Azure Disks should be POSIX-compliant. Would switching to those resolve the issue, or would we still have the same problems as with NFS?

Reference at the very bottom of this link:

And if that would also be an issue, do you know whether we would see similar problems using Thanos to manage storage on Azure instead of PVCs?