aggregating metrics across remote systems with spotty connectivity?

James Peverill

Aug 22, 2020, 2:25:26 AM
to Prometheus Users
I have an application where I am trying to aggregate metrics from hosts that have spotty connectivity. I want to be logging locally when the hosts are periodically offline, then sync all the metrics to a central server once they connect again. Another complication is that these remote hosts sometimes get rebooted. In the ideal case the remote hosts would delete data locally after some amount of time, allowing plenty of time for it all to be synced to the centralized server.

It sounds like I could run Prometheus on all my remote hosts, with appropriate retention rules, and then have my central Prometheus server pull from them all periodically via remote_read. Will I be able to get ALL data to the central server that way? All the hosts are connected over a VPN so there are no firewall issues.

Does this sound like an appropriate use of Prometheus?

Thank you!

Ben Kochie

Aug 22, 2020, 2:40:52 AM
to James Peverill, Prometheus Users
Yes, the only good option to retain data in remote nodes like this is to have Prometheus running in the remote location. If it's one-off nodes, you'll need a localhost Prometheus. The good news is, this is relatively efficient, as the minimum footprint of Prometheus is pretty small.

As for how to get the data home, this sounds like a use case for remote write or Thanos sidecars. With remote write, the remote systems stream the data up to a central service. Delivery is tracked via the write-ahead log, so you won't have any data loss. With the sidecar, blocks of data are uploaded every 2 hours, and a reverse connection is required for reading recent data. The nice thing about remote write here is that no VPN is necessary; it can stream securely via HTTPS.
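
To make the store-and-forward idea behind remote write concrete, here is a toy Python model. It is only a sketch and not Prometheus code: the names `StoreAndForward` and `FlakyLink` are hypothetical, and the in-memory queue merely stands in for the write-ahead log. It shows samples being kept locally while the link is down and then shipped in order, with their original timestamps, once the link returns.

```python
from collections import deque
from itertools import islice
from typing import Callable, Deque, List, Tuple

Sample = Tuple[float, str, float]  # (timestamp, metric name, value)


class FlakyLink:
    """Hypothetical stand-in for the network path to the central server."""

    def __init__(self) -> None:
        self.up = False
        self.received: List[Sample] = []

    def send(self, batch: List[Sample]) -> bool:
        # Reject the whole batch while the link is down.
        if not self.up:
            return False
        self.received.extend(batch)
        return True


class StoreAndForward:
    """Toy sender: buffer samples locally, ship them in order when possible."""

    def __init__(self, send: Callable[[List[Sample]], bool], batch_size: int = 2) -> None:
        self._send = send
        self._batch_size = batch_size
        self._pending: Deque[Sample] = deque()

    def record(self, ts: float, name: str, value: float) -> None:
        # A sample keeps the time it was taken, not the time it is shipped.
        self._pending.append((ts, name, value))

    def flush(self) -> int:
        """Ship pending batches until one fails; return the number of samples sent."""
        sent = 0
        while self._pending:
            batch = list(islice(self._pending, self._batch_size))
            if not self._send(batch):
                break  # link is down: keep everything for the next attempt
            for _ in batch:
                self._pending.popleft()
            sent += len(batch)
        return sent

    @property
    def backlog(self) -> int:
        return len(self._pending)


link = FlakyLink()
agent = StoreAndForward(link.send)
for t in range(5):
    agent.record(ts=float(t), name="temp_celsius", value=20.0 + t)
agent.flush()   # link is down, so nothing leaves the host
link.up = True
agent.flush()   # backlog drains in order once the link is back
```

In this model the only way to lose data is for the local buffer itself to be discarded before the link comes back, which is why the retention of the local log matters for long outages.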

I recommend Cortex or Thanos Remote Write server to receive the data from the remote host Prometheus servers.

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/29c35e2a-58f7-48b7-ad99-28e890eacc26n%40googlegroups.com.

Brian Candler

Aug 22, 2020, 1:43:37 PM
to Prometheus Users
Another option to look at for the remote write target is Victoria Metrics. I like this because it's a trivial single-binary install. You can point multiple Prometheus instances at it with the remote_write protocol (as well as various other write protocols and formats), and you can query it directly using a superset of PromQL.

Laurent Dumont

Aug 22, 2020, 6:23:27 PM
to Brian Candler, Prometheus Users
I wonder if you could use a Push Gateway that all the spotty hosts would write to. They could write the metrics locally to a file and keep trying to push them once they have connectivity to the central Prometheus.

Brian Candler

Aug 23, 2020, 3:40:29 AM
to Prometheus Users
No, you definitely *cannot* use pushgateway for this.

Pushgateway is only for recording the *last* metric written.  It cannot record a history of measurements; neither can prometheus import historical measurements (by scraping or any other means).
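
The "last value only" behaviour can be illustrated with a toy Python model. This is a sketch under stated assumptions and not the Pushgateway's actual code: `LastValueGateway` and its methods are hypothetical names, and the single slot per (job, metric) pair is a simplification of how pushed metrics are grouped.

```python
from typing import Dict, Tuple


class LastValueGateway:
    """Toy push-then-scrape cache: one slot per (job, metric), overwritten on push."""

    def __init__(self) -> None:
        self._slots: Dict[Tuple[str, str], float] = {}

    def push(self, job: str, metric: str, value: float) -> None:
        # The previous value for this key is discarded, not appended to a history.
        self._slots[(job, metric)] = value

    def scrape(self) -> Dict[Tuple[str, str], float]:
        # A scrape sees only whatever is currently in each slot.
        return dict(self._slots)


gateway = LastValueGateway()
# A host comes back online and replays three readings it buffered while offline.
for reading in (20.0, 21.0, 22.0):
    gateway.push("host-a", "temp_celsius", reading)

snapshot = gateway.scrape()  # holds a single entry for host-a, the final reading
```

However many buffered readings a host replays, the next scrape observes one value per series, so the measurements taken during the outage never reach the central server.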

James Peverill

Aug 28, 2020, 2:18:26 AM
to Prometheus Users

Is data that is kept in the WAL considered "historical" in this context? I was reading that the WAL drops data after ~2 hours. The way our system works, hosts may be disconnected for a while during logging. Typically it is only 30-60 minutes, but it can be longer. Is the 2 hour age limit changeable?

For our application ideally we'd like to be able to ingest and/or import past data to have it all in one place for analysis. It seems this isn't possible with Prometheus currently. This is more of a nice to have than a requirement, but we do have several years of accumulated logs in other formats. Converting/ingesting these is a separate project vs live logging, but one we are considering in order to get it all in one place.

Given the limitations around ingestion of historical data, maybe remote write to Victoria Metrics would work better for us. I'll also check out Cortex and Thanos Remote Write server. Victoria Metrics seems like a simple install to play around with the storage end at least.

Appreciate the input here folks!


James




Brian Candler

Aug 29, 2020, 5:46:47 PM
to Prometheus Users
 

James Peverill wrote:
Given the limitations around ingestion of historical data, maybe remote write to Victoria Metrics would work better for us.


Yes, VictoriaMetrics works well for this application.