Thanks Brian. I failed to mention that the setup runs under a cluster orchestrator (e.g. Kubernetes) which can re-spawn a failed instance, and that it is also possible to dynamically change the configuration file to convert the non-scraping Prometheus instance into a scraping one. Bottom line: my environment does NOT allow me to run two Prometheus instances that are both in scraping mode.
Nevertheless, in an HA setup the problem of re-syncing a replacement Prometheus instance (launched after one has failed) with old time-series data still remains. The problem is the same even if the HA configuration is made up of two "scraping" Prometheus instances. There has to be a mechanism to re-sync the replacement instance so that it can obtain the historical time-series data from a point in time before the replacement was started. Otherwise, the HA setup is good only until the first failure: while it is possible to launch a new instance after one has failed, without a resync capability you end up with one instance holding a full time series and the other holding only a partial one.

-Atul
On Tuesday, March 20, 2018 at 8:56:59 PM UTC-7, Atul Goel wrote:
I am trying to figure out if there is a way to scrape historical time-series data from an existing Prometheus server. The use case is to create an HA configuration:
1. Start with 2 Prometheus instances (say A and B).
2. A is configured in scraping mode, while B is configured as a federation destination of A.
3. B fails.
4. A new instance, say C, is started, again as a federation destination of A.
5. While C will be able to collect all metrics from a point in time after it started, it does not get the historical metrics that are present on A.
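For reference, configuring B as a federation destination of A is typically done by scraping A's `/federate` endpoint. A minimal sketch of B's scrape config — the hostname and the `match[]` expression are placeholders, not taken from the thread:

```yaml
# Sketch of B's prometheus.yml: pull all series from A via federation.
scrape_configs:
  - job_name: 'federate'
    honor_labels: true          # keep the original labels from A
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job=~".+"}'         # placeholder: select every job on A
    static_configs:
      - targets:
        - 'prometheus-a:9090'   # placeholder hostname for instance A
```

Note that federation only transfers the *current* value of each selected series on every scrape, which is exactly why C misses everything from before it started.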
--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-users+unsubscribe@googlegroups.com.
To post to this group, send email to prometheus-users@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/4cc6ea62-50ff-4809-935d-260fa77f18b7%40googlegroups.com.
On 21 March 2018 at 15:57, <goel...@gmail.com> wrote:
> Bottom line: my environment does NOT allow me to run two Prometheus instances that are both in scraping mode.

You can't avoid a SPOF in such a setup. I'd suggest using a Kubernetes volume with a DaemonSet so the restarted instance has the old data.

Brian
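Brian's suggestion — keep the TSDB on a volume that survives pod restarts — could look roughly like this fragment of the Prometheus pod spec. It assumes a pre-existing PersistentVolumeClaim; all names and the image tag are illustrative:

```yaml
# Sketch: mount the TSDB directory from a PersistentVolumeClaim so a
# re-spawned Prometheus pod sees the old data. Names are placeholders.
spec:
  volumes:
    - name: prometheus-data
      persistentVolumeClaim:
        claimName: prometheus-data-pvc   # placeholder PVC name
  containers:
    - name: prometheus
      image: prom/prometheus
      volumeMounts:
        - name: prometheus-data
          mountPath: /prometheus         # default TSDB path
```

With this, the replacement instance starts from whatever had already been written to disk, which leads directly to the in-memory-data question discussed below.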
Thanks guys. Just to confirm, the suggestion is to rely on highly available shared clustered storage for the time-series database; that way the restarted Prometheus instance will indeed have the historical time series. However, it's not clear how to handle the time-series data that was still in memory and had not yet been flushed to disk on the failed Prometheus instance. Based on the default checkpoint interval, I guess this could be up to five minutes' worth of samples. So are we saying that we are still exposed to a checkpoint-interval's worth of data loss, unless there is a way to plug this hole by querying the non-failed Prometheus instance?
A cursory look at Thanos appears promising.
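For context, Thanos typically runs as a sidecar container alongside Prometheus, reading the same TSDB directory and uploading completed blocks to object storage. A rough sketch of what that might look like in the pod spec — the image tag and flags are illustrative and not verified against a specific Thanos release:

```yaml
# Hypothetical sidecar container added to the Prometheus pod spec.
- name: thanos-sidecar
  image: improbable/thanos:latest        # placeholder image/tag
  args:
    - sidecar
    - --tsdb.path=/prometheus            # same volume Prometheus writes to
    - --prometheus.url=http://localhost:9090
  volumeMounts:
    - name: prometheus-data
      mountPath: /prometheus
```

The sidecar only uploads blocks after Prometheus has compacted them, so by itself it does not close the window on recently ingested, not-yet-persisted samples either.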
Apologies; I guess I should have been even clearer. The configuration is as follows:
1) Two Prometheus instances: one in scraping mode ("A"), and the other as a federation destination ("B").
2) If "A" fails, then "B" is converted to scraping mode and hence becomes the primary. Recovery involves starting a new Prometheus instance "C" and making it a federation destination of "B".
3) If instead "B" had failed, then "A" is still the primary, and recovery involves starting a new instance "C" as a federation destination of "A".

The problem is getting "C" to have the historical time-series data from before it was started. Based on your suggestion, if both "A" and "B" were each using highly available remote storage, then when "C" is spawned it could be pointed at the remote-storage instance that the failed instance had been using. My question is that doing the above still doesn't plug the hole for the checkpoint-interval's worth of time-series data that did not get flushed to disk. By the way, as far as I understand, this problem exists even if there were always two instances "A" and "B", each configured in scraping mode.
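One way to reduce the dependence on the failed instance's local disk is Prometheus's `remote_write` feature, which streams samples to an external store as they are ingested rather than waiting for local blocks to be flushed. A minimal sketch — the endpoint URL is a placeholder:

```yaml
# Sketch: both A and B stream samples to durable remote storage.
remote_write:
  - url: 'http://remote-storage.example:9201/write'   # placeholder endpoint
```

This narrows the loss window to whatever is still buffered for sending at the moment of failure, rather than a full checkpoint interval, though it does not eliminate it entirely.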
Thanks Brian. Could you please also confirm whether:
a) There is indeed a hole where time-series data that had not yet been flushed to disk would be lost in the event of a failure, even with a storage volume that is kept across restarts.
b) If so, whether there is a way to plug this hole.