service discovery


ruben....@gmail.com

Jan 21, 2019, 4:36:13 AM
to Prometheus Developers
hi,

i wasn't really aware of the new SD moratorium. and so i went and did this:

https://github.com/prometheus/prometheus/pull/5113

my question is now - what is the general plan going forward? i mean, the landscape keeps changing, and new things will appear, while old things disappear. there is, for example, another discussion on docker-swarm. i also believe that it will disappear from new deployments mid-term, but as of now, it is still popular.

as for interfacing with eureka - well, the consul adapter doesn't really cut it, since it doesn't really label as expected, and sometimes instances that were long dead kept showing up as a target. i now deployed my version of the eureka discovery to a test environment, and it works like a charm, including deregistering things super quickly.

also, it makes for a very nice experience if you can tell people to simply configure one thing (the eureka URL) and have everything work as expected from there on, with all metadata available.

the question is one of balance, of course - how much code do you need to maintain, and what's the benefit from it.

from where i stand now (and that's me using zuul and eureka all over the place) it does make sense to include this one. but also, i would think docker-swarm (or maybe plain docker?) could make sense. and so, i would really appreciate if there was any wriggle room with the moratorium.

regards

.rm

Ben Kochie

Jan 21, 2019, 5:29:04 AM
to ruben....@gmail.com, Prometheus Developers
I'm in favor of ending the moratorium, instead having a set of requirements for new additions.

Some requirements I can think of:
* Automated tests, end-to-end testing.
* Some description of the popularity of the system. Will more than 1% of users find this useful, will it increase adoption?
* Some commitment from the maintainers of the system to support the integration.

These don't have to be hard-fact driven decisions. We can simply take a maintainers vote[0] to include a proposed integration. IMO it just looks bad that we have an unending moratorium on improving one of our most important core features, dynamic discovery.

For example, I think Docker Swarm clearly meets the popularity threshold, but we've had no interest from Docker the company to support us.



--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-devel...@googlegroups.com.
To post to this group, send email to prometheus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/009d4bbd-4d54-4caa-a1aa-2a60d277eb55%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ruben....@gmail.com

Jan 21, 2019, 5:35:06 AM
to Prometheus Developers

simon also pointed me to several external solutions to this issue, especially the file based discovery. this is unsatisfactory, since it causes additional (and unnecessary) pain in containerized environments, i.e. there have to be shared volumes (pain) or additional custom processes running in the container (even more pain).

although i would appreciate eureka discovery being made available, i wouldn't insist on it, if we had EITHER an HTTP poller OR a rest api to update targets (the former one being preferred).

Ben Kochie

Jan 21, 2019, 5:43:04 AM
to ruben....@gmail.com, Prometheus Developers
One proposal I've made several times in the past is that we support HTTP URLs in `file_sd_configs`. This way the Prometheus server could simply GET from a URL that returns json/yaml compatible with file_sd_configs.
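
A minimal sketch of what the serving side of this proposal might look like, assuming the endpoint returns the same JSON list of target groups that file_sd reads from disk (objects with `targets` and optional `labels`). The `/targets` path, the port, and the sample hosts are made up for illustration; this is not an existing Prometheus feature.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Same shape file_sd reads from disk: a list of target groups.
# The hosts and labels are placeholders.
TARGET_GROUPS = [
    {
        "targets": ["app-1.example.internal:8080", "app-2.example.internal:8080"],
        "labels": {"env": "test"},
    },
]


def render_body(groups):
    """Serialize target groups to the JSON bytes the endpoint returns."""
    return json.dumps(groups).encode("utf-8")


class TargetsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/targets":  # assumed path, chosen for this sketch
            self.send_error(404)
            return
        body = render_body(TARGET_GROUPS)
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8000):
    """Run the endpoint until interrupted."""
    HTTPServer((host, port), TargetsHandler).serve_forever()
```

Calling `serve()` starts the endpoint; anything able to produce this JSON could sit behind the URL, with no shared volume or sidecar process involved.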


Brian Brazil

Jan 21, 2019, 5:44:29 AM
to Ben Kochie, ruben....@gmail.com, Prometheus Developers
On Mon, 21 Jan 2019 at 10:29, Ben Kochie <sup...@gmail.com> wrote:
> I'm in favor of ending the moratorium, instead having a set of requirements for new additions.
>
> Some requirements I can think of:
> * Automated tests, end-to-end testing.

We still lack this for the existing SDs, which was one of the things on the list for ending the moratorium.

> * Some description of the popularity of the system. Will more than 1% of users find this useful, will it increase adoption?
> * Some commitment from the maintainers of the system to support the integration.

The problem is that creators of previous SDs offered that, and then disappeared. I believe the only integration that has this currently is Azure (and I don't know if that's official).

Brian
 



Ben Kochie

Jan 21, 2019, 6:50:45 AM
to Brian Brazil, ruben....@gmail.com, Prometheus Developers
On Mon, Jan 21, 2019 at 11:44 AM Brian Brazil <brian....@robustperception.io> wrote:
> On Mon, 21 Jan 2019 at 10:29, Ben Kochie <sup...@gmail.com> wrote:
>> I'm in favor of ending the moratorium, instead having a set of requirements for new additions.
>>
>> Some requirements I can think of:
>> * Automated tests, end-to-end testing.
>
> We still lack this for the existing SDs, which was one of the things on the list for ending the moratorium.

That doesn't make sense at all. If a new feature comes in and is well tested, it only improves the existing codebase.

>> * Some description of the popularity of the system. Will more than 1% of users find this useful, will it increase adoption?
>> * Some commitment from the maintainers of the system to support the integration.
>
> The problem is that creators of previous SDs offered that, and then disappeared. I believe the only integration that has this currently is Azure (and I don't know if that's official).

This doesn't mean it's a bad requirement to approve a new feature.

ruben....@gmail.com

Jan 21, 2019, 7:15:09 AM
to Prometheus Developers
On Monday, January 21, 2019 at 11:43:04 AM UTC+1, Ben Kochie wrote:
> One proposal I've made several times in the past is that we support HTTP URLs in `file_sd_configs`. This way the Prometheus server could simply GET from a URL that returns json/yaml compatible with file_sd_configs.

this was actually one of my earlier thoughts. this would make it rather easy to do something on the other end, without adding _that_ much code on the prometheus side.

ruben....@gmail.com

Jan 21, 2019, 7:28:08 AM
to Prometheus Developers

> Some requirements I can think of:
> * Automated tests, end-to-end testing.
>
>
> We still lack this for the existing SDs, which was one of the things on the list for ending the moratorium. 
>  

i agree that it is a good approach to clean up the existing stuff before adding to it. what i find a bit daunting (personally) is to provide this for the code i added - especially given the fact that i only actually added a handful of lines.

nonetheless, i would like to help here, but go test harnesses aren't exactly my home turf.

> The problem is that creators of previous SDs offered that, and then disappeared. I believe the only integration that has this currently is Azure (and I don't know if that's official).

see above - i am not sure what i actually have to do. and this makes it difficult for me to get to the point where i feel confident enough to do this in the approximately 2 hours that it would take someone with more experience.

>
> For example, I think Docker Swarm clearly meets the popularity threshold, but we've had no interest from Docker the company to support us.
>

the go SDK at

https://docs.docker.com/develop/sdk/examples/

looks pretty manageable. i assume anyone could whip up at least local docker support in minutes.



Krasimir Georgiev

Jan 21, 2019, 7:47:23 AM
to ruben....@gmail.com, Prometheus Developers
I hate to say this, but I also agree that until we have proper e2e tests for all SD providers and a long term commitment from a Prometheus maintainer to fix bugs and update a given SD provider, adding a new one is not a good idea.

And you can always use the custom SD which is very easy to implement.
https://github.com/prometheus/prometheus/tree/master/documentation/examples/custom-sd
If the current custom SD doesn't cover a good number of use cases, we can think about how to improve it.

Krasi Georgiev
Senior Software Engineer
Prometheus Team

ruben....@gmail.com

Jan 21, 2019, 8:02:26 AM
to Prometheus Developers
On Monday, January 21, 2019 at 1:47:23 PM UTC+1, Krasi Georgiev wrote:
> I hate to say this, but I also agree that until we have proper e2e tests for all SD providers and a long term commitment from a Prometheus maintainer to fix bugs and update a given SD provider, adding a new one is not a good idea.

i am as torn as you are - i do understand the reasoning behind this, but at the same time, this is only a valid argument if support for various service discoveries is going to be phased out and possibly moved to "external" services (i.e. separate adapters translating "$whatever" into target groups). then, the moratorium makes sense.

if the overall strategy is to keep supporting these, then the focus should be on improving test coverage of the existing SDs as well as possibly improving generic support for testing them. as i said, i am willing to help, but i'm too stupid to do anything useful right off the bat, and i guess this happens to other people too. the way i see it, this is the gap that needs to be closed, and where any effort to make this more accessible would eventually be beneficial for everyone.

> And you can always use the custom SD which is very easy to implement.
> https://github.com/prometheus/prometheus/tree/master/documentation/examples/custom-sd
> If the current custom SD doesn't cover a good number of use cases, we can think about how to improve it.
>

as i mentioned above - what would solve this issue would be supporting http urls in the custom SD. anything file based doesn't work as nicely for containerised environments (which is exactly the use case where dynamic discovery becomes extremely important), since you have to share a filesystem OR run more processes inside a container.

.rm


Krasimir Georgiev

Jan 21, 2019, 9:05:49 AM
to ruben....@gmail.com, Prometheus Developers



On Jan 21 2019, at 3:02 pm, ruben....@gmail.com wrote:

> On Monday, January 21, 2019 at 1:47:23 PM UTC+1, Krasi Georgiev wrote:
>> I hate to say this, but I also agree that until we have proper e2e tests for all SD providers and a long term commitment from a Prometheus maintainer to fix bugs and update a given SD provider, adding a new one is not a good idea.
>
> i am as torn as you are - i do understand the reasoning behind this, but at the same time, this is only a valid argument if support for various service discoveries is going to be phased out and possibly moved to "external" services (i.e. separate adapters translating "$whatever" into target groups). then, the moratorium makes sense.
>
> if the overall strategy is to keep supporting these, then the focus should be on improving test coverage of the existing SDs as well as possibly improving generic support for testing them. as i said, i am willing to help, but i'm too stupid to do anything useful right off the bat, and i guess this happens to other people too. the way i see it, this is the gap that needs to be closed, and where any effort to make this more accessible would eventually be beneficial for everyone.

this calls for another GSOC mentor :) and adding it as a project in

>> And you can always use the custom SD which is very easy to implement.
>> If the current custom SD doesn't cover a good number of use cases, we can think about how to improve it.
>
> as i mentioned above - what would solve this issue would be supporting http urls in the custom SD. anything file based doesn't work as nicely for containerised environments (which is exactly the use case where dynamic discovery becomes extremely important), since you have to share a filesystem OR run more processes inside a container.
>
> .rm

This has come up a few times now so I would definitely vote +1 for adding this.






Callum Styan

Jan 22, 2019, 1:57:51 PM
to Krasimir Georgiev, ruben....@gmail.com, Prometheus Developers
I would still like to take on more SD work.

Brian, is any of the testing setup that Connor built early last year still available?

Brian Brazil

Jan 22, 2019, 2:00:33 PM
to Callum Styan, Krasimir Georgiev, ruben....@gmail.com, Prometheus Developers
On Tue, 22 Jan 2019 at 18:57, Callum Styan <callu...@gmail.com> wrote:
> I would still like to take on more SD work.
>
> Brian, is any of the testing setup that Connor built early last year still available?

I don't believe any of that is still around, I'm afraid.

Brian
 


Julius Volz

Feb 9, 2019, 5:27:51 PM
to Brian Brazil, Callum Styan, Krasimir Georgiev, ruben....@gmail.com, Prometheus Developers
The CNCF doesn't normally sponsor feature work for policy reasons, but I wonder if they would be able to make an exception for testing/stability work like this? I think if someone actually offered plenty of money to throw at this problem, it would all be solvable. It's just that people have trouble maintaining more untested code in their free time.

daniel.s...@gmail.com

Feb 27, 2019, 3:04:27 PM
to Prometheus Developers
On Monday, January 21, 2019 at 11:43:04 AM UTC+1, Ben Kochie wrote:
> One proposal I've made several times in the past is that we support HTTP URLs in `file_sd_configs`. This way the Prometheus server could simply GET from a URL that returns json/yaml compatible with file_sd_configs.

I would also be in favour of something like that. Writing out files to disk feels clumsy to me, especially considering the original information probably lives in a database.

Last year I actually ended up implementing a very minimal, fake Consul API as an adapter to a proprietary database containing hosts to scrape. Consul seemed like the simplest API to implement, and it works like a charm.

I always thought it would make sense for Prometheus to support an HTTP-based "Prometheus SD", which was sufficiently well-documented that people could easily implement such services, talking to whatever proprietary stuff they have on the backend - pretty similar to what was decided for adding new notification methods to Alertmanager (via HTTP hooks).
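
As a rough illustration of such an adapter, the sketch below turns rows from a host inventory into the target-group structure that file_sd uses. The `hosts` table, its columns, and the `team` label are hypothetical stand-ins for whatever the proprietary backend holds; an in-memory SQLite database plays that role here.

```python
import sqlite3


def load_target_groups(conn):
    """Turn rows of (address, port, team) into file_sd-style target groups."""
    groups = {}
    rows = conn.execute(
        "SELECT address, port, team FROM hosts ORDER BY team, address"
    )
    for address, port, team in rows:
        # One target group per team, so all its targets share the same labels.
        group = groups.setdefault(team, {"targets": [], "labels": {"team": team}})
        group["targets"].append(f"{address}:{port}")
    return list(groups.values())


def demo_connection():
    """Build an in-memory database standing in for the proprietary one."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hosts (address TEXT, port INTEGER, team TEXT)")
    conn.executemany(
        "INSERT INTO hosts VALUES (?, ?, ?)",
        [
            ("db-1.example.internal", 9104, "storage"),
            ("web-1.example.internal", 9100, "frontend"),
            ("web-2.example.internal", 9100, "frontend"),
        ],
    )
    return conn
```

`load_target_groups(demo_connection())` yields one group per team; serializing that list with `json.dumps` gives a document in the shape file_sd already understands.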

Julien Pivotto

Mar 10, 2019, 6:55:55 PM
to Prometheus Developers

On Wednesday, February 27, 2019 at 9:04:27 PM UTC+1, daniel.s...@gmail.com wrote:
> On Monday, January 21, 2019 at 11:43:04 AM UTC+1, Ben Kochie wrote:
>> One proposal I've made several times in the past is that we support HTTP URLs in `file_sd_configs`. This way the Prometheus server could simply GET from a URL that returns json/yaml compatible with file_sd_configs.
>
> I would also be in favour of something like that. Writing out files to disk feels clumsy to me, especially considering the original information probably lives in a database.


I would like to start working on this if there is a consensus.

That would be called http_sd and would reuse the http client we have in common.

I would however like to get a "go" before I work on that.

Regards,

Julius Volz

Mar 11, 2019, 7:24:10 AM
to Julien Pivotto, Prometheus Developers
So far the consensus (or rather strong feeling of a few, especially Brian) has been to only have one generic mechanism for doing a certain thing (whether it's SD with file_sd, or AM with webhook notifier). But we could have a more official debate about it and possibly even bring it to a vote. It's a bit on the edge for me.

In case we added HTTP support, would we poll periodically or do a long-poll watch (like Kubernetes SD does)? The latter would give us more responsive results and less SD overhead, but would be more complex to implement on both sides (and would need some framing around the target groups file format). That's the downside of doing it over HTTP vs. local file watches.


Simon Pasquier

Mar 11, 2019, 5:36:38 PM
to Julius Volz, Julien Pivotto, Prometheus Developers
On Mon, Mar 11, 2019 at 12:24 PM Julius Volz <juliu...@gmail.com> wrote:
>
> So far the consensus (or rather strong feeling of a few, especially Brian) has been to only have one generic mechanism for doing a certain thing (whether it's SD with file_sd, or AM with webhook notifier). But we could have a more official debate about it and possibly even bring it to a vote. It's a bit on the edge for me.
>
> In case we added HTTP support, would we poll periodically or do a long-poll watch (like Kubernetes SD does)? The latter would give us more responsive results and less SD overhead, but would be more complex to implement on both sides (and would need some framing around the target groups file format). That's the downside of doing it over HTTP vs. local file watches.

I second this. Before jumping right to the implementation, it would be
good to have a better understanding of what an HTTP SD would look
like.


Brian Brazil

Mar 11, 2019, 5:39:18 PM
to Simon Pasquier, Julius Volz, Julien Pivotto, Prometheus Developers
On Mon, 11 Mar 2019 at 21:36, Simon Pasquier <spas...@redhat.com> wrote:
> On Mon, Mar 11, 2019 at 12:24 PM Julius Volz <juliu...@gmail.com> wrote:
>>
>> So far the consensus (or rather strong feeling of a few, especially Brian) has been to only have one generic mechanism for doing a certain thing (whether it's SD with file_sd, or AM with webhook notifier). But we could have a more official debate about it and possibly even bring it to a vote. It's a bit on the edge for me.
>>
>> In case we added HTTP support, would we poll periodically or do a long-poll watch (like Kubernetes SD does)? The latter would give us more responsive results and less SD overhead, but would be more complex to implement on both sides (and would need some framing around the target groups file format). That's the downside of doing it over HTTP vs. local file watches.
>
> I second this. Before jumping right to the implementation, it would be
> good to have a better understanding of what an HTTP SD would look
> like.

If it's a generic integration, it needs to be as simple as possible which means regular polling. The goal would be to make it as easy to integrate as possible, not to invent a brand new SD - which there's enough of in the world already.

Brian
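
To make the "regular polling" option concrete, here is a minimal client-side sketch: fetch on a fixed interval, validate the payload, and act only when the target groups differ from the previous round. It assumes the endpoint returns file_sd-style JSON; the function names and the keep-last-good-result behaviour on errors are choices made for this sketch, not a description of how Prometheus does or would implement it.

```python
import json
import time
import urllib.request


def fetch_target_groups(url, timeout=10.0):
    """GET the URL and decode a JSON list of target groups."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def validate(groups):
    """Reject anything that is not a list of {"targets": [...], "labels": {...}}."""
    if not isinstance(groups, list):
        raise ValueError("expected a list of target groups")
    for group in groups:
        if not isinstance(group, dict):
            raise ValueError("each target group must be an object")
        targets = group.get("targets")
        if not isinstance(targets, list) or not all(isinstance(t, str) for t in targets):
            raise ValueError("'targets' must be a list of strings")
        if not isinstance(group.get("labels", {}), dict):
            raise ValueError("'labels' must be an object")
    return groups


def poll(fetch, on_change, interval, rounds, sleep=time.sleep):
    """Call fetch() every `interval` seconds; report only when the result differs."""
    last = None
    for i in range(rounds):
        try:
            current = validate(fetch())
        except (OSError, ValueError):
            current = last  # keep the previous targets on a failed refresh
        if current != last:
            on_change(current)
            last = current
        if i + 1 < rounds:
            sleep(interval)
```

For example, `poll(lambda: fetch_target_groups(url), print, interval=30, rounds=10)` would refresh ten times, thirty seconds apart. The other side only has to answer a plain GET with a JSON document.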

 


Julius Volz

Mar 12, 2019, 5:44:50 AM
to Brian Brazil, Simon Pasquier, Julien Pivotto, Prometheus Developers
On Mon, Mar 11, 2019 at 10:39 PM Brian Brazil <brian....@robustperception.io> wrote:
> On Mon, 11 Mar 2019 at 21:36, Simon Pasquier <spas...@redhat.com> wrote:
>> On Mon, Mar 11, 2019 at 12:24 PM Julius Volz <juliu...@gmail.com> wrote:
>>>
>>> So far the consensus (or rather strong feeling of a few, especially Brian) has been to only have one generic mechanism for doing a certain thing (whether it's SD with file_sd, or AM with webhook notifier). But we could have a more official debate about it and possibly even bring it to a vote. It's a bit on the edge for me.
>>>
>>> In case we added HTTP support, would we poll periodically or do a long-poll watch (like Kubernetes SD does)? The latter would give us more responsive results and less SD overhead, but would be more complex to implement on both sides (and would need some framing around the target groups file format). That's the downside of doing it over HTTP vs. local file watches.
>>
>> I second this. Before jumping right to the implementation, it would be
>> good to have a better understanding of what an HTTP SD would look
>> like.
>
> If it's a generic integration, it needs to be as simple as possible which means regular polling. The goal would be to make it as easy to integrate as possible, not to invent a brand new SD - which there's enough of in the world already.

The other watch-based generic HTTP SDs that are in the world are not tailored around Prometheus and its labeled data model though, so you have to do things like munging Consul tags into labels. I'm just saying it would be very tempting to produce something that's native to Prometheus *and* watch-based for efficiency and responsiveness. But I concede that it wouldn't be that useful in practice if it's too hard to implement on the other side. A Go library would help a bit with that, but would require using Go.

So I probably agree with you, but at least would like to see this point explored a bit further.

rm

Mar 12, 2019, 10:23:54 AM
to Julius Volz, Brian Brazil, Prometheus Developers

> The other watch-based generic HTTP SDs that are in the world are not
> tailored around Prometheus and its labeled data model though, so you
> have to do things like munging Consul tags into labels. I'm just
> saying it would be very tempting to produce something that's native
> to Prometheus *and* watch-based for efficiency and responsiveness.
> But I concede that it wouldn't be that useful in practice if it's too
> hard to implement on the other side. A Go library would help a bit
> with that, but would require using Go.
>
> So I probably agree with you, but at least would like to see this
> point explored a bit further.

hi,

my main concern with filewatchers is that they're not particularly
cloud-friendly. a vanilla "http-watcher" that expects sd information in
prometheus format would have huge advantages here.

currently, the closest thing for our use case (where the actual SD is
eureka) is kind of clumsy, with eureka-consul, then consul-prometheus.
on the other hand, implementing a eureka->prometheus translation,
adding an endpoint to the eureka server, and then polling this seems
much easier.

in any case, this kind of approach would make sd integration a lot more
accessible - and pretty much any adaptor to $SD_OF_CHOICE can be
implemented rather quickly.

so - a big +1 from me for anything HTTP-based - and expecting something
along the lines of what the current file watcher expects also
makes sense, so another +1 for that.

long-polling could simply be left up to the opposite side ... as in
"don't respond until interval X has passed OR a change is observed". the
prometheus poller would only have to be able to live with long
timeouts. websockets don't really seem to be an option, since they would
require a lot more complexity than a braindead long-poll HTTP endpoint,
where all you would have to configure is a URL.

.rm
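
The server-side rule "don't respond until interval X has passed OR a change is observed" could be sketched as below. `TargetStore` and its method names are invented for this example; a request handler would call `wait_for_update` before writing its response.

```python
import threading


class TargetStore:
    """Holds the current target groups and lets request handlers wait for updates."""

    def __init__(self, groups):
        self._groups = groups
        self._updates = 0
        self._cond = threading.Condition()

    def update(self, groups):
        """Replace the target groups and wake every waiting request."""
        with self._cond:
            self._groups = groups
            self._updates += 1
            self._cond.notify_all()

    def wait_for_update(self, max_wait):
        """Block until update() is called or max_wait seconds pass, then return the groups."""
        with self._cond:
            seen = self._updates
            self._cond.wait_for(lambda: self._updates != seen, timeout=max_wait)
            return self._groups
```

In this form every request waits, including the first one a client ever makes, so a client with no state yet may sit idle for up to `max_wait` seconds before it learns about any targets.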





Matthias Rampke

Mar 12, 2019, 10:43:10 AM
to rm, Julius Volz, Brian Brazil, Prometheus Developers
So Prometheus would poll on some short interval, and it's up to the SD
provider to reply immediately or only after a while?

I like that – it covers both use cases without additional
configuration, we only need to document that "respond only after there
is a change" is a valid thing to do, as is "respond immediately with
the same data". I expect the way we handle SD updates would already do
the right thing in either case?

/MR

Julien Pivotto

Mar 12, 2019, 10:46:30 AM
to Matthias Rampke, rm, Julius Volz, Brian Brazil, Prometheus Developers
On 12 Mar 14:42, 'Matthias Rampke' via Prometheus Developers wrote:
> So Prometheus would poll on some short interval, and it's up to the SD
> provider to reply immediately or only after a while?
>
> I like that – it covers both use cases without additional
> configuration, we only need to document that "respond only after there
> is a change" is a valid thing to do, as is "respond immediately with
> the same data". I expect the way we handle SD updates would already do
> the right thing in either case?
>

We could also look at ETag header to see if there are changes or not.

--
(o- Julien Pivotto
//\ Open-Source Consultant
V_/_ Inuits - https://www.inuits.eu

rm

Mar 12, 2019, 10:54:59 AM
to Julien Pivotto, Prometheus Developers

> We could also look at ETag header to see if there are changes or not.

... or just MD5 the body thing and compare?

i don't think etags are really helpful here. it's easy to figure out a
change (i would assume this is already implemented for files), and the
overhead is in repeated connections more than in repeated answers.

also: if you go through the devs in your organization: how many of them
are able to deal with etags correctly? :)

.rm
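
The "MD5 the body and compare" idea amounts to something like the following sketch. `BodyChangeDetector` is a made-up name, and the digest is used only to spot changes, not for anything security related.

```python
import hashlib


class BodyChangeDetector:
    """Remembers a digest of the last response body seen."""

    def __init__(self):
        self._last_digest = None

    def changed(self, body: bytes) -> bool:
        """Return True if body differs from the previous one (or is the first)."""
        digest = hashlib.md5(body).hexdigest()
        if digest == self._last_digest:
            return False
        self._last_digest = digest
        return True
```

Because the comparison works on raw bytes, a response whose keys or targets are merely reordered counts as a change. That causes an extra refresh but no wrong result.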

Brian Brazil

Mar 12, 2019, 11:17:12 AM
to Matthias Rampke, rm, Julius Volz, Prometheus Developers
On Tue, 12 Mar 2019 at 14:43, Matthias Rampke <m...@soundcloud.com> wrote:
> So Prometheus would poll on some short interval, and it's up to the SD
> provider to reply immediately or only after a while?
>
> I like that – it covers both use cases without additional
> configuration, we only need to document that "respond only after there
> is a change" is a valid thing to do, as is "respond immediately with
> the same data". I expect the way we handle SD updates would already do
> the right thing in either case?

That's not safe in general, what happens if Prometheus restarts?

Brian

rm

unread,
Mar 12, 2019, 11:19:08 AM3/12/19
to Brian Brazil, Matthias Rampke, Prometheus Developers
On Tue, 2019-03-12 at 15:16 +0000, Brian Brazil wrote:
> That's not safe in general, what happens if Prometheus restarts?
>
> Brian

do you mean connections left open? or the time it takes until it gets
its initial config? i am not 100% clear what your concern is.

.rm

Matthias Rampke

unread,
Mar 12, 2019, 11:51:14 AM3/12/19
to rm, Brian Brazil, Prometheus Developers
Oh, I see – Prometheus needs to get the current state "immediately"
when it restarts, so there would need to be a way to tell the server
that just this once we don't want a long-poll. Hmm. Is there a
standard way of signaling this?

/MR

Julius Volz

unread,
Mar 12, 2019, 12:11:57 PM3/12/19
to Matthias Rampke, rm, Brian Brazil, Prometheus Developers
You could add a revision arg to the request to determine if anything changed since the last requested revision.


Brian Brazil

unread,
Mar 12, 2019, 12:25:22 PM3/12/19
to Julius Volz, Matthias Rampke, rm, Prometheus Developers
On Tue, 12 Mar 2019 at 16:11, Julius Volz <juliu...@gmail.com> wrote:
> You could add a revision arg to the request to determine if anything changed since the last requested revision.

That's getting into "inventing an SD" territory.

Brian

Julien Pivotto

unread,
Mar 12, 2019, 12:25:35 PM3/12/19
to Julius Volz, Matthias Rampke, rm, Brian Brazil, Prometheus Developers
On 12 Mar 17:11, Julius Volz wrote:
> You could add a revision arg to the request to determine if anything
> changed since the last requested revision.
>

That is what the HTTP ETag header is for.

rm

unread,
Mar 12, 2019, 12:38:33 PM3/12/19
to Julius Volz, Matthias Rampke, Brian Brazil, Prometheus Developers
On Tue, 2019-03-12 at 17:11 +0100, Julius Volz wrote:
> You could add a revision arg to the request to determine if anything
> changed since the last requested revision.


well - the standard way would be "if-modified-since", right?

and then you would have either

200 + body
if modified

OR

304 not modified
if not

... if you combine this with "1970-01-01" then bob's your proverbial
uncle.
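A rough sketch of the 200 / 304 exchange on the SD provider side, assuming the provider is written in Go and knows when its target list last changed. http.ServeContent from the standard library evaluates If-Modified-Since on its own; sdHandler, statusFor and the /targets path are made-up names for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// sdHandler serves a fixed target list together with its modification time.
// http.ServeContent compares modTime with If-Modified-Since and answers
// 304 Not Modified without a body when the client is up to date.
func sdHandler(body []byte, modTime time.Time) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		http.ServeContent(w, r, "targets.json", modTime, bytes.NewReader(body))
	})
}

// statusFor sends an in-process GET with the given If-Modified-Since
// value and returns the resulting status code.
func statusFor(h http.Handler, since time.Time) int {
	req := httptest.NewRequest(http.MethodGet, "/targets", nil)
	req.Header.Set("If-Modified-Since", since.UTC().Format(http.TimeFormat))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

// demoStatuses returns the status for a "1970-01-01" request (fresh start)
// and for a request that already has the current version.
func demoStatuses() (int, int) {
	modTime := time.Date(2019, 3, 12, 10, 0, 0, 0, time.UTC)
	h := sdHandler([]byte(`[{"targets":["10.0.0.1:9100"]}]`), modTime)
	return statusFor(h, time.Unix(0, 0)), statusFor(h, modTime)
}

func main() {
	fresh, upToDate := demoStatuses()
	fmt.Println("epoch request:", fresh)
	fmt.Println("up-to-date request:", upToDate)
}
```

http.ServeContent also honors If-None-Match when the handler sets an ETag response header, so the same handler shape would cover the ETag variant discussed earlier in the thread.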

also, i think it would be possible (and not too much of a stretch) to
specify that the other side should either reply immediately regardless
of whether or not a change happens (i.e. fast-poll), OR implement
additional logic so that it can deal with "if-modified-since" request
headers OR etags OR a request header like "x-long-poll: 0..N" to
specify a timeout (0 for immediate, N for long-poll), then after N
seconds ALWAYS return the current status.

which one it is, i am not really all that passionate about, i think
those would all be equally good. i think "x-long-poll" would be the
most fool-proof one, though.
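To make the "x-long-poll" variant concrete, here is a sketch of what an SD provider could do with such a header, assuming N is given in whole seconds. The header does not exist in any standard or in Prometheus; the provider type and everything else in the block are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"sync"
	"time"
)

// provider holds the current target list and wakes waiting requests on change.
type provider struct {
	mu      sync.Mutex
	body    []byte
	changed chan struct{} // closed and replaced on every update
}

func newProvider(body []byte) *provider {
	return &provider{body: body, changed: make(chan struct{})}
}

// Update stores a new target list and releases all pending long-polls.
func (p *provider) Update(body []byte) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.body = body
	close(p.changed)
	p.changed = make(chan struct{})
}

// ServeHTTP waits up to X-Long-Poll seconds for a change (0, missing or
// invalid: no wait) and then always returns the current state.
func (p *provider) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	p.mu.Lock()
	changed := p.changed
	p.mu.Unlock()

	secs, _ := strconv.Atoi(r.Header.Get("X-Long-Poll"))
	if secs > 0 {
		select {
		case <-changed: // something changed while we were waiting
		case <-time.After(time.Duration(secs) * time.Second):
		case <-r.Context().Done(): // client went away
			return
		}
	}

	p.mu.Lock()
	body := p.body
	p.mu.Unlock()
	w.Header().Set("Content-Type", "application/json")
	w.Write(body)
}

// poll calls the handler in-process with the given X-Long-Poll value.
func poll(p *provider, longPoll string) string {
	req := httptest.NewRequest(http.MethodGet, "/targets", nil)
	req.Header.Set("X-Long-Poll", longPoll)
	rec := httptest.NewRecorder()
	p.ServeHTTP(rec, req)
	return rec.Body.String()
}

// demo does one immediate poll and one long-poll that is cut short by an update.
func demo() (string, string) {
	p := newProvider([]byte("v1"))
	first := poll(p, "0") // "I need this now"
	go func() {
		time.Sleep(50 * time.Millisecond)
		p.Update([]byte("v2"))
	}()
	second := poll(p, "30") // "take your time"; returns once v2 arrives
	return first, second
}

func main() {
	first, second := demo()
	fmt.Println(first, second)
}
```

One limitation of this shape: a change that lands between two requests is only noticed after the full timeout, because the request carries no revision. That gap is what If-Modified-Since, ETags or a revision argument would close.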

.rm





Julius Volz

unread,
Mar 12, 2019, 1:06:08 PM3/12/19
to rm, Matthias Rampke, Brian Brazil, Prometheus Developers
Ah right, that's basically how ETags work.

And yes, it would make sense to do fast-poll by default and optionally support some kind of long-poll time / version header.

Julius Volz

unread,
Mar 12, 2019, 1:07:27 PM3/12/19
to rm, Matthias Rampke, Brian Brazil, Prometheus Developers
And if the long-poll variant is optional, then one may as well start out with an SD implementation on both sides that only does fast (normal) polling and we could decide whether we want the long-polling features later, and how they should work.

Matthias Rampke

unread,
Mar 12, 2019, 1:26:34 PM3/12/19
to Julius Volz, rm, Brian Brazil, Prometheus Developers
Agree – and we could see whether a long-polling mechanism is needed at all. It seems like we would want regular polling anyway, and then we can see if there is still a need for long-polls?

/MR

Ruben Malchow

unread,
Mar 12, 2019, 1:26:43 PM3/12/19
to Julius Volz, Matthias Rampke, Brian Brazil, Prometheus Developers
Yes. That absolutely makes sense. And content-wise I'd guess we could simply have the exact same stuff that the file watcher is expecting. This kind of construct is probably just a handful of lines away.

Short poll + optional long poll is also nice. Can you think of a reason why we would need the SD side to signal support for long poll? To me it looks as if it would be enough to say either

"I need this now" (i.e. x-long-poll: 0)

or

"Take your time" (i.e. x-long-poll: N)

Either way, I would expect the current state after N seconds, so checking for modifications should be done before the config is actually reloaded. There, ETags would be one way, but I still think simply calculating a hash is more robust, and also less hassle, i.e. implementing it once on the Prometheus side and not over and over again in every watch adapter.

.rm

Matthias Rampke

unread,
Mar 12, 2019, 1:49:19 PM3/12/19
to Ruben Malchow, Julius Volz, Brian Brazil, Prometheus Developers
How is idempotency implemented on file SD?


ruben malchow

unread,
Mar 12, 2019, 3:18:18 PM3/12/19
to Matthias Rampke, Julius Volz, Brian Brazil, Prometheus Developers


quick look at:


https://github.com/prometheus/prometheus/blob/master/discovery/file/file.go

yields the code below - which looks as if it completely relies on
change events received from fsnotify. in refresh(), it also tracks
"disappeared" files (i.e. no longer found in the last update) and sends
empty updates. so unless changes are tracked further up, we definitely
need to add some sort of status with a short-poll mechanism. md5 the
response body and compare that with the previous one? or do we need a
more thorough comparison?

.rm

-------------------- SNIP --------------------

    case event := <-d.watcher.Events:
        // fsnotify sometimes sends a bunch of events without name or operation.
        // It's unclear what they are and why they are sent - filter them out.
        if len(event.Name) == 0 {
            break
        }
        // Everything but a chmod requires rereading.
        if event.Op^fsnotify.Chmod == 0 {
            break
        }
        // Changes to a file can spawn various sequences of events with
        // different combinations of operations. For all practical purposes
        // this is inaccurate.
        // The most reliable solution is to reload everything if anything happens.
        d.refresh(ctx, ch)

-------------------- SNIP --------------------
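On the "more thorough comparison" question: a sketch of what comparing parsed content instead of raw bytes could look like, assuming the HTTP response uses the same JSON layout as file SD (a list of objects with "targets" and "labels"). The targetGroup struct and sameGroups function are written for this example only and are not the types Prometheus uses internally.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// targetGroup mirrors the JSON layout of file-based SD:
// a list of targets plus an optional label set.
type targetGroup struct {
	Targets []string          `json:"targets"`
	Labels  map[string]string `json:"labels"`
}

// sameGroups reports whether two response bodies describe the same
// target groups, ignoring whitespace and JSON key order.
func sameGroups(a, b []byte) (bool, error) {
	var ga, gb []targetGroup
	if err := json.Unmarshal(a, &ga); err != nil {
		return false, err
	}
	if err := json.Unmarshal(b, &gb); err != nil {
		return false, err
	}
	return reflect.DeepEqual(ga, gb), nil
}

// demo compares a body with a reformatted copy and with a changed copy.
func demo() (bool, bool) {
	a := []byte(`[{"targets":["10.0.0.1:9100"],"labels":{"env":"test","job":"node"}}]`)
	b := []byte(`[ {"labels": {"job":"node","env":"test"}, "targets": ["10.0.0.1:9100"]} ]`)
	c := []byte(`[{"targets":["10.0.0.2:9100"],"labels":{"env":"test","job":"node"}}]`)
	sameAB, _ := sameGroups(a, b)
	sameAC, _ := sameGroups(a, c)
	return sameAB, sameAC
}

func main() {
	sameAB, sameAC := demo()
	fmt.Println("reformatted copy equal:", sameAB)
	fmt.Println("changed copy equal:", sameAC)
}
```

A plain hash of the body would treat the reformatted copy as a change; this comparison does not. The order of groups and of targets within a group still matters here, so a provider that shuffles them would need sorting first.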

Daniel Swarbrick

unread,
Mar 12, 2019, 4:20:45 PM3/12/19
to prometheus...@googlegroups.com

Take a look at Consul's blocking queries: (https://www.consul.io/api/index.html#blocking-queries)

Endpoints that support blocking queries return an HTTP header named X-Consul-Index. This is a unique identifier representing the current state of the requested resource.

On subsequent requests for this resource, the client can set the index query string parameter to the value of X-Consul-Index, indicating that the client wishes to wait for any changes subsequent to that index.

When this is provided, the HTTP request will "hang" until a change in the system occurs, or the maximum timeout is reached.

In addition to index, endpoints that support blocking will also honor a wait parameter specifying a maximum duration for the blocking request.
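For illustration, a sketch of the client side of such a blocking query, run against a small in-process stand-in server. Only the index and wait parameters and the X-Consul-Index header come from the description above; fakeCatalog is NOT Consul, and its behaviour (including the immediate answer for index 0) is an assumption made for this example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strconv"
	"sync"
	"time"
)

// fakeCatalog is a stand-in for a blocking-query endpoint; it is not Consul.
type fakeCatalog struct {
	mu      sync.Mutex
	index   uint64
	body    string
	changed chan struct{} // closed and replaced on every change
}

// set stores new content and bumps the index, waking blocked requests.
func (f *fakeCatalog) set(body string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.index++
	f.body = body
	close(f.changed)
	f.changed = make(chan struct{})
}

func (f *fakeCatalog) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	q := r.URL.Query()
	seen, _ := strconv.ParseUint(q.Get("index"), 10, 64)
	wait, _ := time.ParseDuration(q.Get("wait"))

	f.mu.Lock()
	index, changed := f.index, f.changed
	f.mu.Unlock()
	if seen == index { // client is up to date: hang until change or timeout
		select {
		case <-changed:
		case <-time.After(wait):
		case <-r.Context().Done():
			return
		}
	}
	f.mu.Lock()
	defer f.mu.Unlock()
	w.Header().Set("X-Consul-Index", strconv.FormatUint(f.index, 10))
	io.WriteString(w, f.body)
}

// blockingGet asks for changes after lastIndex and returns body and new index.
func blockingGet(base string, lastIndex uint64, wait time.Duration) (string, uint64, error) {
	q := url.Values{"index": {strconv.FormatUint(lastIndex, 10)}, "wait": {wait.String()}}
	resp, err := http.Get(base + "?" + q.Encode())
	if err != nil {
		return "", 0, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", 0, err
	}
	index, err := strconv.ParseUint(resp.Header.Get("X-Consul-Index"), 10, 64)
	return string(body), index, err
}

// demo fetches the initial state, then blocks until the content changes.
func demo() (string, string, error) {
	f := &fakeCatalog{index: 1, body: "v1", changed: make(chan struct{})}
	srv := httptest.NewServer(f)
	defer srv.Close()

	first, idx, err := blockingGet(srv.URL, 0, 5*time.Second) // unknown index: no waiting
	if err != nil {
		return "", "", err
	}
	go func() {
		time.Sleep(50 * time.Millisecond)
		f.set("v2")
	}()
	second, _, err := blockingGet(srv.URL, idx, 5*time.Second) // hangs until v2
	return first, second, err
}

func main() {
	fmt.Println(demo())
}
```

Because the client sends the last index it saw, a restarted client that starts from 0 gets the current state right away, which is the restart concern raised earlier in the thread.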
