Consistent backup of TSDB data

Doug Meredith

Jan 31, 2025, 9:46:12 PM
to Prometheus Users
I'm running Prometheus on a VM, and system backups are done as live backups of the VM: a snapshot of the VM is taken, and that snapshot is then backed up.

Do I need to shut Prometheus down while the snapshot is taken in order to ensure consistent backups?

Ben Kochie

Jan 31, 2025, 9:48:36 PM
to Doug Meredith, Prometheus Users
That doesn't really follow best practice for backups, but in order to get a consistent dataset you should call the snapshot API.
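
As a rough illustration of what calling the snapshot API can look like, here is a minimal Python sketch. It assumes Prometheus was started with `--web.enable-admin-api` (the admin endpoints are disabled otherwise); the base URL and data directory passed in are placeholders for your own setup, and the helper names are made up for this example.

```python
import json
import urllib.request
from pathlib import Path


def snapshot_url(base_url: str, skip_head: bool = False) -> str:
    """Build the URL of the TSDB snapshot admin endpoint."""
    url = base_url.rstrip("/") + "/api/v1/admin/tsdb/snapshot"
    if skip_head:
        # skip_head=true leaves out data that is still only in the head block.
        url += "?skip_head=true"
    return url


def snapshot_dir(response_body: str, data_dir: str) -> Path:
    """Turn the API's JSON reply into the on-disk snapshot directory."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"snapshot failed: {payload}")
    # Snapshots are written to <data-dir>/snapshots/<name>.
    return Path(data_dir) / "snapshots" / payload["data"]["name"]


def take_snapshot(base_url: str, data_dir: str) -> Path:
    """POST to the snapshot endpoint and return the new snapshot's directory."""
    request = urllib.request.Request(snapshot_url(base_url), method="POST")
    with urllib.request.urlopen(request, timeout=300) as response:
        return snapshot_dir(response.read().decode("utf-8"), data_dir)
```

Calling `take_snapshot("http://localhost:9090", "/var/lib/prometheus")` (both values are assumptions) would return the directory to copy off the machine. The snapshot is a self-contained copy of the TSDB blocks, so it can be backed up while Prometheus keeps running.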


--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/prometheus-users/676143b5-33c2-456c-8f85-35e94ce6a64cn%40googlegroups.com.

Doug Meredith

Feb 1, 2025, 8:23:51 AM
to Prometheus Users
Thanks for the reply, Ben. I looked at the link you sent, but if I understand it correctly, this command creates a TSDB snapshot, probably for extraction or copy purposes. Some sort of quiesce command is what would normally be used for the VM backup. I could always shut down Prometheus, but I'd rather not, if I don't have to.

Incidentally, how are full VM backups inconsistent with best practices?

Ben Kochie

Feb 1, 2025, 8:53:16 AM
to Doug Meredith, Prometheus Users
The accepted best practice is to automate the creation of VMs via infrastructure as code and configuration management, and then back up only the necessary data.

This has several advantages:
* You don't back up standard system files and binaries, which reduces the cost of your backup storage.
* You don't carry forward corruption in system files, which would otherwise require bisecting backup snapshots to find where the corruption started. Instead you can simply wipe the system and restore the data.
* Your system is more reproducible, as your restore tests fully exercise the creation of the VM from scratch.
* Upgrades are easier: you can use your automated provisioning to create a new instance with whatever new software loadout you want, then restore backups and test.

This has been a standard recommendation for at least 10-15 years. 
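
To make "back up only the necessary data" concrete for Prometheus, one possible approach is to archive a TSDB snapshot directory instead of the whole VM. The sketch below is only an illustration: the function name and the paths in the usage note are hypothetical, and it assumes a snapshot directory already exists under the Prometheus data directory.

```python
import tarfile
from pathlib import Path


def archive_snapshot(snapshot_dir: Path, backup_dir: Path) -> Path:
    """Pack one snapshot directory into <backup_dir>/<snapshot name>.tar.gz."""
    if not snapshot_dir.is_dir():
        raise FileNotFoundError(f"no snapshot directory at {snapshot_dir}")
    backup_dir.mkdir(parents=True, exist_ok=True)
    archive = backup_dir / f"{snapshot_dir.name}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths in the archive relative to the snapshot name,
        # so the archive can be unpacked into any data directory on restore.
        tar.add(snapshot_dir, arcname=snapshot_dir.name)
    return archive
```

For example, `archive_snapshot(Path("/var/lib/prometheus/snapshots/<name>"), Path("/mnt/backup"))` would produce a single file that can be shipped to backup storage; the Prometheus configuration and rule files would need to be backed up separately, or kept in version control.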

Doug Meredith

Feb 1, 2025, 9:42:30 AM
to Prometheus Users
I understand that you're trying to help, and I appreciate the effort, but you seem to be taking a very narrow view of "best practices". While what you are describing may well be the widely accepted best practice for a certain class of infrastructure, not everyone is playing in that league. 