What is the right way to do HA in Prometheus?

mdou...@gmail.com

Jun 14, 2017, 1:05:39 PM
to Prometheus Users
Obviously, you don't want to lose data when a single Prometheus host goes offline, and it seems like the logical approach is to spin up two Prometheus boxes scraping the same targets, and federating those up to a global server for visualization and alerting. What is the right way to dedup those time series, assuming that I don't care which of the redundant servers I get the time series from, just as long as I get them?

--Matt

Brian Brazil

Jun 14, 2017, 1:10:52 PM
to mdou...@gmail.com, Prometheus Users
On 14 June 2017 at 18:05, <mdou...@gmail.com> wrote:
> Obviously, you don't want to lose data when a single Prometheus host goes offline, and it seems like the logical approach is to spin up two Prometheus boxes scraping the same targets, and federating those up to a global server for visualization and alerting.

Federation doesn't help you here; use the main Prometheus servers themselves for visualisation and alerting. Adding another level reduces reliability.

> What is the right way to dedup those time series, assuming that I don't care which of the redundant servers I get the time series from, just as long as I get them?

Don't; just talk to a Prometheus server that works.

It's not generally possible to dedupe; see https://www.robustperception.io/monitoring-without-consensus/

Matt Doughty

Jun 14, 2017, 1:32:33 PM
to Brian Brazil, Prometheus Users
Oh right, I'm not sure what I was thinking. I can just have redundant stacks with graphing on each stack, and the only issue is duplicate alerts, which Alertmanager should handle; there is also work being done to make Alertmanager HA.
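
For illustration, here is a minimal Python sketch of why redundant stacks need not produce duplicate notifications: Alertmanager identifies an alert by its label set, so two Prometheus servers that send alerts with identical labels end up describing one alert. The function names and sample alerts are hypothetical, and this only models the idea; it is not Alertmanager's code. It also assumes both servers attach exactly the same labels (for example, no replica-specific label reaches Alertmanager).

```python
def fingerprint(labels):
    """Identity of an alert: its label set, independent of key order."""
    return tuple(sorted(labels.items()))


def dedupe(alerts):
    """Collapse alerts with identical label sets, keeping first-seen order."""
    seen = {}
    for alert in alerts:
        seen.setdefault(fingerprint(alert["labels"]), alert)
    return list(seen.values())


# Hypothetical alerts: the same rule firing on two redundant Prometheus servers.
from_stack_a = {
    "labels": {"alertname": "InstanceDown", "instance": "web-1:9100"},
    "source": "prom-a",
}
from_stack_b = {
    "labels": {"instance": "web-1:9100", "alertname": "InstanceDown"},
    "source": "prom-b",
}
other = {
    "labels": {"alertname": "InstanceDown", "instance": "web-2:9100"},
    "source": "prom-b",
}

unique = dedupe([from_stack_a, from_stack_b, other])
print(len(unique))
```

If the two stacks added different labels to the same alert, the fingerprints would differ and both copies would survive, which is why the label sets need to match.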

--Matt 

Ben Kochie

Jun 14, 2017, 1:39:16 PM
to Matt Doughty, Brian Brazil, Prometheus Users
At GitLab, we have an HAProxy setup in failover mode that switches to a backup server if the main one is down.
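
A rough Python sketch of the failover behaviour described here: requests go to the main server, and the backup is used only while the main one fails its health check. The server URLs and the `pick_server` helper are made up for illustration, and the default probe assumes the servers expose Prometheus's `/-/healthy` endpoint. A real HAProxy setup would express this in its own configuration rather than in code.

```python
import urllib.request


def is_healthy(base_url, timeout=2.0):
    """Return True if the server answers its health endpoint with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + "/-/healthy", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection errors, timeouts, and HTTP error responses.
        return False


def pick_server(servers, probe=is_healthy):
    """Return the first healthy server in priority order, or None if all are down."""
    for url in servers:
        if probe(url):
            return url
    return None


# Hypothetical addresses: main server first, backup second.
SERVERS = ["http://prom-main:9090", "http://prom-backup:9090"]

# Simulate the main server being down without touching the network.
down = {"http://prom-main:9090"}
print(pick_server(SERVERS, probe=lambda url: url not in down))
```

The probe is passed in as a parameter so the selection logic can be exercised without a live server; in practice the default `is_healthy` would be used.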


Matt Doughty

Jun 14, 2017, 1:51:44 PM
to Ben Kochie, Brian Brazil, Prometheus Users
Yeah, that would work, but I probably wouldn't be able to set that up given my constraints, and the dual stacks work fine as long as alerts get deduped at routing.

I think the dual stack will work, and I'm not sure I want to add HAProxy into the mix. I don't much care if both stacks don't have exactly the same view of the world, just as long as I can prevent duplicate alerts being sent out from Alertmanager.

--Matt