Dynamically configure job endpoints

joa...@dynamic-design.se

Sep 10, 2014, 5:49:23 PM
to prometheus...@googlegroups.com
Hi and thanks for a great product!

We are currently investigating using Prometheus for telemetry, since we (mostly) use a Go stack. We run microservices that are, for example:
- Scaled up and down
- Tagged as canaries/qa/prod
- Jumping between hosts (CoreOS)
- Continuously being built and released
...you get it, the actual IPs and hosts of service endpoints change a lot. I guess this is the case for SoundCloud as well, but reading through the docs I could not find any reference to updating/adding/removing job endpoints at runtime?

All our services are registered via etcd, so finding them is not the problem. But something doesn't feel right about restarting the Prometheus server every time an endpoint comes or goes.

Thoughts?

Brian Brazil

Sep 10, 2014, 7:17:23 PM
to joa...@dynamic-design.se, prometheus-developers
The support is there in the code; the only currently implemented dynamic mechanism is DNS SRV records. I, at least, would be happy to see generic support for etcd-like addressing, or even just local file-based target lists, added.
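
For context, the DNS support means Prometheus periodically re-resolves an SRV record and scrapes every host/port it returns. The records you'd publish look roughly like this (all names below are made up for illustration):

    ; _service._proto.name           TTL class SRV priority weight port target
    _userservice._tcp.example.org.   300 IN    SRV 0        0      8080 host1.example.org.
    _userservice._tcp.example.org.   300 IN    SRV 0        0      8080 host2.example.org.

Each answer becomes one scrape target, so adding or removing records grows or shrinks the job without touching the Prometheus config.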

Brian
 


rebecka...@gmail.com

Sep 11, 2014, 5:32:37 AM
to prometheus...@googlegroups.com, joa...@dynamic-design.se
Thanks for the quick reply!

I guess I could just hook up something like https://github.com/skynetservices/skydns to resolve DNS for my etcd entries, then! Could you please point me to the configuration for DNS-backed jobs in Prometheus? Sorry if I'm being stupid, but I can't figure it out from the docs or from reading the code...
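
In case it helps anyone else going this route: once SkyDNS (or any other DNS server) is answering for the etcd-registered services, you can sanity-check what Prometheus would see with a plain SRV lookup, e.g. (service and domain names here are just placeholders):

    dig +short SRV _userservice._tcp.skydns.local
    # should print lines like: 0 0 8080 host1.skydns.local.

If that returns the endpoints you expect, the remaining piece is pointing a Prometheus job at that SRV name.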

joa...@dynamic-design.se

Sep 11, 2014, 5:47:55 AM
to prometheus...@googlegroups.com, joa...@dynamic-design.se, rebecka...@gmail.com
Wow, incredible to see that my wife just got interested in programming and ops... No, sorry, I accidentally posted that from her account :P

Björn Rabenstein

Sep 11, 2014, 6:40:37 AM
to rebecka...@gmail.com, prometheus-developers, joa...@dynamic-design.se
On Thu, Sep 11, 2014 at 11:32 AM, <rebecka...@gmail.com> wrote:
>
> I guess I could just hook up something like https://github.com/skynetservices/skydns to resolve DNS for my etcd entries then! Could you please point me to the configuration for DNS backed jobs in prometheus? Sorry if I'm being stupid but I can't figure it out from the docs, nor from reading the code...
>
>> The support is there in code, the only current dynamic support is via DNS SRV records. I at least would be happy to see generic support for etcd like addressing or even just local file-based added.

Yeah, documentation is still sketchy... (Will be fixed before 1.0... ;)

The best source of canonical information about the configuration is
currently the protobuf definition:
https://github.com/prometheus/prometheus/blob/master/config/config.proto
Check out "sd_name".
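
To give a rough idea (field names other than sd_name are from memory, so double-check them against config.proto), a DNS-discovered job in the ASCII-protobuf config looks something like:

    job {
      name: "userservice"
      sd_name: "_userservice._tcp.example.org"  # SRV name to resolve
      sd_refresh_interval: "30s"                # how often to re-resolve
      metrics_path: "/metrics"
    }

Prometheus then re-resolves the SRV name periodically and scrapes whatever endpoints it returns, instead of a static target_group list.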

Providing alternative service discovery mechanisms might be one of the
things we should discuss at the upcoming PromCon...

--
Björn Rabenstein, Engineer
http://soundcloud.com/brabenstein

SoundCloud Ltd. | Rheinsberger Str. 76/77, 10115 Berlin, Germany
Managing Director: Alexander Ljung | Incorporated in England & Wales
with Company No. 6343600 | Local Branch Office | AG Charlottenburg |
HRB 110657B

Julius Volz

Sep 11, 2014, 9:10:16 PM
to joa...@dynamic-design.se, prometheus-developers
Hey Joakim,

Thanks for your interest in Prometheus! Just out of curiosity (since we haven't talked much about Prometheus yet), may I ask how you stumbled over it?

With regard to your question, Björn has probably already given the most useful answer. For completeness' sake, there's also the "/api/targets&job=<foo>" API endpoint, which allows PUTting a list of JSON objects of type TargetGroup (https://github.com/prometheus/prometheus/blob/master/web/api/targets.go#L26-L29) to the server to dynamically replace a job's endpoints at runtime. However, it's not clear whether we will keep supporting this as an HTTP API feature forever.
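
For illustration, a call to that endpoint would look roughly like this (the exact URL parameter syntax and JSON field names may differ slightly from my sketch, so check the linked targets.go):

    curl -X PUT -H 'Content-Type: application/json' \
      --data '[{"baseLabels": {"group": "canary"},
                "endpoints": ["http://10.0.1.5:8080/metrics",
                              "http://10.0.1.6:8080/metrics"]}]' \
      'http://localhost:9090/api/targets?job=userservice'

It replaces the whole target list for the given job in one shot, which is why a small registry-driven sidecar could keep it in sync with etcd, but as said above, this API may not stay around forever.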

I think it's a good idea to add more support for dynamic targets in the future; we just need to do it carefully and make it as generally applicable as possible, so that we don't end up with a big bag of different mechanisms to support directly. It would also be cool to make the config reloadable at runtime, so that these kinds of changes no longer require a restart at all.

One more thing: since you said that your targets change very rapidly, are you aware that this means a new set of timeseries every time a target changes? (The old ones become stale and new ones appear, since the target's HTTP metrics endpoint is added as the "instance" label of each timeseries.) This is probably totally fine, but just be aware that once you reach multiple millions of timeseries (stale ones + current ones), you'll probably run into bottlenecks somewhere in the storage.
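
Concretely (metric name and addresses are invented for the example), a counter scraped from one target is identified by its full label set, including "instance":

    http_requests_total{job="userservice", instance="http://10.0.0.5:8080/metrics"}

When that service instance moves to a new host or port, the old series goes stale and a fresh one appears:

    http_requests_total{job="userservice", instance="http://10.0.0.9:8080/metrics"}

Both keep existing in storage for a while, which is where the "stale ones + current ones" count comes from.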

Cheers,
Julius

joa...@dynamic-design.se

Sep 13, 2014, 6:18:50 PM
to prometheus...@googlegroups.com, joa...@dynamic-design.se
For reference, since I accidentally didn't CC the mailing list:

Of course: I listened to the GopherCon talk by Peter Bourgon, and he mentioned it. I've also stumbled across it multiple times just searching Google and GitHub for a Go metrics solution. But it was kind of a big step to actually start trying it out, since the documentation is kind of sparse. Then again, I guess you haven't really "gone public" with it yet :)

Yeah, regarding the new timeseries, I guess that's just the way it is. They are not changing that rapidly... But are you saying that if I run 3 instances of userservice-v.1.1 and then spin up 3 instances of userservice-v.1.2 and (on success) kill the old v.1.1 jobs, I get new timeseries for my 1.2 instances (if I tag them with 1.2)?

I'm not sure what the desired outcome is, but of course I want to be able to compare the metrics from 1.1 to 1.2. How do you keep track of different versions? I guess you deploy new versions of services pretty often as well?

Julius Volz

Sep 15, 2014, 6:11:38 AM
to Joakim Gustin, prometheus-developers
Ah, Peter's brief mention at GopherCon, yes :)

So the general rule about timeseries is that a single timeseries is identified by a unique set of key=value pairs (labels). If any label changes, gets removed, or gets added, a new timeseries is born. So if you e.g. track the version in a label, then you will have a different set of timeseries for each version (which enables comparisons and so on between versions). That's totally fine, and we're doing the same with our internal cluster deployment system. Well, actually, our applications usually don't export version information themselves, but the cluster scheduler which runs them exports metadata like CPU usage, memory usage, etc., and labels this by revision, environment, instance ID, and so on. But it's also possible to put version information directly into your application's own /metrics exports.

Just try to keep the total number of timeseries tracked by a single Prometheus server under a couple of million :) For example, if you have a target that exports 1000 different timeseries and you deploy it 1000 times a day with a new version, you'll create 1000 x 1000 = 1 million timeseries in total after a day. Of course, that is a bit of an extreme example.
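
To make that concrete for the userservice example from earlier (metric name and label values are made up): after a rollout you would have series like

    http_requests_total{job="userservice", version="1.1", instance="..."}
    http_requests_total{job="userservice", version="1.2", instance="..."}

and you can then graph or aggregate them side by side over the "version" label, e.g. summing request rates per version in the expression language (function names from memory, double-check against the current expression language):

    sum(rate(http_requests_total{job="userservice"}[5m])) by (version)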

Also, I just wanted to make sure you've discovered the wiki at https://github.com/prometheus/prometheus/wiki/_pages. I guess I should write a new page, "Life of a Timeseries", there soon :)

Cheers,
Julius
