--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-developers+unsub...@googlegroups.com.
To post to this group, send email to prometheus-developers@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/388fa107-6a69-4584-81b1-79528bc9f1b2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Basically, if the storage gets corrupted in some non-anticipated way, there's nothing that we can easily do, and since it's not a replicated storage system, we also can't just restore it from some other replica. Finished blocks are mostly immutable, except for when they get compacted into bigger blocks.
The WAL is only there so that recent sample data isn't lost every time the server crashes.
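To make the WAL point concrete, here is a minimal sketch of the general write-ahead-log idea: append each record durably before accepting it, then replay the log after a restart. This is a toy, not Prometheus's actual WAL format or code; the class name `TinyWAL`, the JSON-lines encoding, and the `demo` helper are all made up for illustration.

```python
import json
import os
import tempfile


class TinyWAL:
    """Toy write-ahead log: one JSON record per line, fsynced on every append."""

    def __init__(self, path):
        self.path = path

    def append(self, series, timestamp, value):
        # Persist the sample to the log before it is considered accepted.
        record = {"series": series, "t": timestamp, "v": value}
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self):
        # Rebuild in-memory state after a restart; stop at the first torn record.
        samples = []
        if not os.path.exists(self.path):
            return samples
        with open(self.path, "r", encoding="utf-8") as f:
            for line in f:
                try:
                    samples.append(json.loads(line))
                except json.JSONDecodeError:
                    break  # partial write from a crash; earlier records are kept
        return samples


def demo():
    with tempfile.TemporaryDirectory() as d:
        wal = TinyWAL(os.path.join(d, "wal.log"))
        wal.append("up", 1000, 1.0)
        wal.append("up", 2000, 0.0)
        # Simulate a crash in the middle of a write: a truncated trailing record.
        with open(wal.path, "a", encoding="utf-8") as f:
            f.write('{"series": "up", "t": 30')
        # A fresh instance stands in for the restarted server.
        return TinyWAL(wal.path).replay()
```

Calling `demo()` recovers the two complete samples and drops the torn one. A log like this only protects recent writes against a process crash; it does nothing for corruption of data that is already on disk.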
On Mon, Jan 22, 2018 at 11:11 PM, Peter Zaitsev <p...@percona.com> wrote:
Hi,
Reading about the Prometheus 2.0 TSDB, I wonder what the expected durability of the TSDB really is by design (I recognize that crash recovery bugs are to be expected in new code).
On one side it states: "It is secured against crashes by a write-ahead-log (WAL) that can be replayed when the Prometheus server restarts after a crash."
On the other: "If your local storage becomes corrupted for whatever reason, your best bet is to shut down Prometheus and remove the entire storage directory. However, you can also try removing individual block directories to resolve the problem. This means losing a time window of around two hours worth of data per block directory. Again, Prometheus's local storage is not meant as durable long-term storage."
Is it just that there might be some bugs (time will tell), or are there known conditions in which storage will become corrupted?
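For the "removing individual block directories" advice quoted above, it helps to know which time window each block covers before deleting anything. The sketch below assumes the 2.x on-disk layout, where each block directory holds a `meta.json` with `minTime` and `maxTime` in milliseconds since the epoch; the function name `list_blocks` is made up for illustration, and the script only reads, it never deletes.

```python
import json
import os
from datetime import datetime, timezone


def list_blocks(data_dir):
    """Return (directory name, start, end) for each subdirectory with a meta.json."""
    blocks = []
    for name in sorted(os.listdir(data_dir)):
        meta_path = os.path.join(data_dir, name, "meta.json")
        if not os.path.isfile(meta_path):
            continue  # skips wal/ and anything else that is not a block
        try:
            with open(meta_path, encoding="utf-8") as f:
                meta = json.load(f)
            # Assumption: minTime/maxTime are millisecond Unix timestamps.
            start = datetime.fromtimestamp(meta["minTime"] / 1000, tz=timezone.utc)
            end = datetime.fromtimestamp(meta["maxTime"] / 1000, tz=timezone.utc)
        except (ValueError, KeyError, TypeError):
            # Unreadable metadata: report the block without a time range.
            start = end = None
        blocks.append((name, start, end))
    return blocks
```

A block that comes back with no time range has unreadable metadata, which is one hint about where to look first. Run this only against a stopped server or a copy of the data directory.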
Hi,
So if I understand what you're saying, unless there are bugs in the code or there is corruption at the filesystem or device level, the storage should be reliable?
I would point out that many database engines have the same kind of data loss in case of file system corruption. Some of them have repair tools or emergency data extraction tools, others may not, and really the only universal recovery process is to recover from backup, which Prometheus supports too, I assume.
Given the immutable blocks, are there better recommendations we can give for full or partial recovery? Like deleting the WAL, or the corrupt block? Does the log give enough information to identify the problematic bits?
/MR
On Wed, Jan 24, 2018 at 12:04 AM, Peter Zaitsev <p...@percona.com> wrote:
> Hi,
> So if I understand what you're saying, unless there are bugs in the code or there is corruption at the filesystem or device level, the storage should be reliable?
> I would point out that many database engines have the same kind of data loss in case of file system corruption. Some of them have repair tools or emergency data extraction tools, others may not, and really the only universal recovery process is to recover from backup, which Prometheus supports too, I assume.
Yeah, you can do consistent snapshots in 2.x now.
However, when we talk about data durability, we usually think about local-only storage vs. clustered and replicated storage systems, which Prometheus is not (and by design, shouldn't be). That's the main differentiation we want to make with those points.
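As a rough illustration of the snapshot route mentioned above: in 2.x a snapshot can be requested through the TSDB admin API with a POST to `/api/v1/admin/tsdb/snapshot`, provided the server was started with `--web.enable-admin-api`. The sketch below assumes that endpoint and its optional `skip_head` parameter behave as documented for your version; the helper names and the `http://localhost:9090` address in the usage note are placeholders.

```python
import json
import urllib.parse
import urllib.request


def snapshot_request(base_url, skip_head=False):
    """Build the POST request for the TSDB snapshot admin endpoint."""
    query = urllib.parse.urlencode({"skip_head": str(skip_head).lower()})
    url = base_url.rstrip("/") + "/api/v1/admin/tsdb/snapshot?" + query
    return urllib.request.Request(url, method="POST")


def take_snapshot(base_url, skip_head=False, timeout=30):
    """Trigger a snapshot and return the name of the snapshot directory."""
    req = snapshot_request(base_url, skip_head)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # Assumption: the response carries "status" and data.name as in the API docs.
    if body.get("status") != "success":
        raise RuntimeError("snapshot failed: %r" % (body,))
    return body["data"]["name"]
```

Something like `take_snapshot("http://localhost:9090")` should return a directory name that then appears under `snapshots/` in the data directory, ready to be copied elsewhere as a backup.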
The only "gap" I see is that Prometheus is missing some sort of long-term transaction log that can be replayed to achieve point-in-time recovery, such as the binlog in MySQL.
On Tue, Jan 23, 2018 at 4:29 AM, Julius Volz <juliu...@gmail.com> wrote:
> Basically if the storage gets corrupted in some non-anticipated way, there's nothing that we can easily do, and since it's not a replicated storage system, we also can't just restore it from some other replica. Finished blocks are mostly immutable, except for when they get compacted into bigger blocks.
> The WAL is only for not losing recent sample data every time in the face of server crashes.