Allowing Prometheus to discover scrape targets over HTTP


avai...@gmail.com

Oct 10, 2017, 7:05:55 PM
to Prometheus Developers
Hello All,

I am a happy user of Prometheus, but I wanted to describe a pain point I've experienced.

Currently we have many physical machines running MySQL processes inside Docker containers. Whenever a new cluster is created, it is allocated containers and deployed to some subset of the physical machines.

At that point our management process SSHes into our Prometheus nodes and drops some files (using file-based discovery) which Prometheus watches. In addition, whenever new hosts are added we drop a file so that Prometheus will monitor the node_exporter and cAdvisor processes.

One thing that would make managing Prometheus simpler in this setting is if we could simply provide Prometheus with an HTTP endpoint from which it could discover scrape targets in some JSON format.

For example, the only line we'd provide in our Prometheus config would be 'http://node-info.prod.company.com'. Prometheus would poll this endpoint, add any new targets/labels, and purge any old ones. This would greatly simplify managing the location of files on Prometheus' filesystem.
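
To illustrate the kind of payload I have in mind (this just mirrors the JSON that file_sd already accepts; the hostnames, ports and labels below are made up), the endpoint could return something like:

    [
      {
        "targets": ["mysql-1.prod.company.com:9104", "mysql-2.prod.company.com:9104"],
        "labels": {"job": "mysql", "cluster": "orders", "role": "master"}
      },
      {
        "targets": ["node-17.prod.company.com:9100"],
        "labels": {"job": "node"}
      }
    ]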

Given that there are already many exotic methods for configuring scrape targets (file_sd, ec2, openstack, nerve, gce, kubernetes, ...), I was a bit surprised that a simple HTTP-based mechanism isn't available.

Curious about your thoughts on this.

Thanks,

Anil

Callum Styan

Oct 10, 2017, 8:41:59 PM
to avai...@gmail.com, Prometheus Developers
Maybe you can provide some more info about what you want this new HTTP discovery to solve, as I'm not sure I entirely understand what the issue is.

If it's simply that dropping the file for file_sd is manual, you can automate that via config management or with a small service that grabs the file from somewhere for you and saves it where Prometheus expects to find it. Otherwise, why not just use real service discovery via something like Consul?
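
For reference, a minimal Consul-based scrape config might look roughly like this (the Consul address and service name are made-up placeholders):

    scrape_configs:
      - job_name: 'mysql'
        consul_sd_configs:
          # Placeholder: point this at your Consul agent/server.
          - server: 'consul.prod.company.com:8500'
            # Placeholder: whatever service name the exporters register under.
            services: ['mysqld-exporter']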

If the issue is something else, then hopefully you can elaborate and we can discuss from there :)



Julius Volz

Oct 10, 2017, 9:22:51 PM
to Callum Styan, avai...@gmail.com, Prometheus Developers
Yeah, basically what Callum said. If you're going to do network-based discovery, there are already numerous mechanisms that Prometheus supports which you could use.


Anil Vaitla

Oct 10, 2017, 9:31:43 PM
to Prometheus Developers
Currently our system is the source of truth for where processes run and what needs monitoring (it also knows which nodes are masters/replicas and has other useful information for labeling metrics). With the file-drop mechanism we periodically need to poll Prometheus' filesystem to see what it is currently monitoring. If there is a discrepancy between that and our source of truth, we sync Prometheus' filesystem to reflect what it should be. Certainly this is not too challenging, but I wonder if we could do better.

The pain point I want to address is having to manage two states of the world and ensure they stay in sync. Currently we maintain both our source of truth and a model of Prometheus' filesystem. The model I had in mind, which would free us from managing Prometheus' state, is that Prometheus just pulls what we want it to monitor every few minutes and manages that information itself.

I'll look into the service discovery mechanisms, but we also want to weigh the benefits of a new component against the costs of managing one.

Thanks for your time.

Matthias Rampke

Oct 11, 2017, 3:51:56 AM
to Anil Vaitla, Prometheus Developers

Can you run processes alongside Prometheus? The pattern I would choose here is to run a sidecar dedicated to reconciliation on the Prometheus server node, and let it do whatever it needs to – this could be as simple as running curl with conditional GET in a bash loop. This would ensure that the state is always in sync (after a round or two of the loop).
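
Roughly along these lines (untested sketch; the URL, output path and polling cadence are placeholders):

    #!/usr/bin/env bash
    # Sketch of a reconciliation sidecar: poll the source of truth and
    # atomically replace the file_sd target file whenever it changes.
    URL="http://node-info.prod.company.com/targets.json"
    OUT="/etc/prometheus/targets/node-info.json"

    while true; do
      # -z sends If-Modified-Since based on the existing file's mtime, so the
      # body is only re-downloaded when the upstream data actually changed.
      # (On the very first run, before $OUT exists, curl fetches unconditionally.)
      if [ -e "$OUT" ]; then
        curl -fsS -z "$OUT" -o "$OUT.tmp" "$URL"
      else
        curl -fsS -o "$OUT.tmp" "$URL"
      fi
      # Only install the new file if we actually received a non-empty body
      # (a 304 Not Modified or a failed request leaves nothing useful behind).
      if [ -s "$OUT.tmp" ]; then
        mv "$OUT.tmp" "$OUT"   # atomic replace; Prometheus re-reads file_sd files on change
      else
        rm -f "$OUT.tmp"
      fi
      sleep 60
    done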

/MR



Anil Vaitla

Oct 11, 2017, 1:54:52 PM
to Prometheus Developers
Yes, that's what we are currently doing, and it works just fine. I was thinking that not having to manage this process at all could be a place for improvement, but it may not be too big a problem for others. Thanks for the feedback, all!

Anil Vaitla

Oct 11, 2017, 2:18:31 PM
to Prometheus Developers
Out of curiosity, is there a place where you've collected the number of users of each SD mechanism and the pain points they might have with them?

If not, what do you think about me sending out a Google survey to the Prometheus users mailing list, asking which SD mechanisms they use and which pain points they have run into in their environments?

This would certainly not collect data from all Prometheus users, of course, and possibly no data at all, but I think it could be valuable input for roadmap development.

I think it would also be a great way to share the solutions people have come up with.

Just a thought.

Callum Styan

Oct 11, 2017, 8:51:22 PM
to Matthias Rampke, Anil Vaitla, Prometheus Developers
Agreed. No need to check whether the file_sd file on the Prometheus node is in sync or not; just periodically copy it from your source of truth anyway.

