This issue also affects other languages that are used in a prefork model. There's a multiprocess branch for the Python client too, but it seems like better discovery mechanisms would solve the issue while keeping clients simple. Has there been any effort in that direction?
Currently the only way I have found to monitor workers behind a preforking gunicorn/uwsgi proxy is to add startup code in the worker that tries to bind to each port in a range until it finds one that is free, and to scrape the worker via that port. It seems really clumsy. But monitoring each worker separately is more likely to help detect issues, so I'd rather stick to that approach than use a registry shared across processes.
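For reference, the port-probing workaround described above could look roughly like this. It is only a sketch: the function name, the port range and the loopback address are made up for illustration, and the actual metrics HTTP handler is left out.

```python
import socket


def bind_first_free_port(start, end, host="127.0.0.1"):
    """Bind to the first free port in [start, end] and return (socket, port).

    Hypothetical helper: each forked worker would call this at startup and
    then serve its metrics on the returned socket.
    """
    for port in range(start, end + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            # Port is taken (for example by a sibling worker): try the next one.
            sock.close()
            continue
        sock.listen(5)
        return sock, port
    raise RuntimeError("no free port in range %d-%d" % (start, end))
```

Prometheus then has to be told to scrape every port in the range, which is part of why this feels clumsy.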
Another option would be to have some kind of machine-wide authority (or user-wide) that is discoverable easily by the workers (e.g. it runs on a fixed port on localhost, or it provides a unix socket at a predetermined path) and allow workers to self-announce. One easy way to do this would be to have that central authority tell the worker "please export on port 12345" and tell Prometheus "a XYZ worker is now exporting on localhost:12345". I'm not sure if that's even possible to do in a portable way without possible races on port grabbing.
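One possible way around the port race, sketched here under assumptions that are not from this thread: instead of being handed a port, the worker binds to port 0 so the kernel picks a free one, and then announces the result by writing a target file in the JSON format read by Prometheus file-based service discovery. The function names, the directory layout and the `worker_type` label are hypothetical.

```python
import json
import os
import socket
import tempfile


def bind_ephemeral(host="127.0.0.1"):
    """Let the kernel choose a free port, so two workers can never collide."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))
    sock.listen(5)
    return sock, sock.getsockname()[1]


def announce_worker(sd_dir, port, labels):
    """Write a file_sd style target file for this worker and return its path."""
    target = [{"targets": ["localhost:%d" % port], "labels": labels}]
    path = os.path.join(sd_dir, "worker-%d.json" % os.getpid())
    # Write to a temporary file first and rename it into place, so a reader
    # never sees a half-written JSON document.
    fd, tmp = tempfile.mkstemp(dir=sd_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(target, f)
    os.replace(tmp, path)
    return path
```

Something still has to delete the file when the worker exits, which this sketch does not handle.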
Does anyone have a better solution?
--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-devel...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
On Wed, Nov 11, 2015, 12:14 AM Jeffery Utter <jeff...@sadclown.net> wrote:
> There is an outstanding issue on the GitHub issues list for the Ruby client library about support for multi-process (forking) servers. The issue can be found here: https://github.com/prometheus/client_ruby/issues/9 .
> I have a potential fix for this here: https://github.com/jeffutter/prometheus-client_ruby/tree/multi-worker . However, since it is a non-trivial change I wanted to get some feedback, and the contributing guidelines said to bring that up here.
> Right now there are plenty of broken tests, since I re-architected a little. If it seems like a generally good solution I'll fix things up and get it ready for a pull request.
> Thanks,
> Jeff
I'm not sure using PStore is the right thing to do here. If I understand it correctly, this will incur disk I/O for pretty much any metrics increment. This won't be sustainable in a high throughput environment. Additionally, you'll clutter TMPDIR with all those files.
At the very least, the persistence/multi-process support should be optional, off by default, and equipped with a warning (and the means) to make sure these go to an in-memory filesystem.
/MR
Matthias Rampke
Engineer
SoundCloud Ltd. | Rheinsberger Str. 76/77, 10115 Berlin, Germany | +49 173 6395215
Managing Director: Alexander Ljung | Incorporated in England & Wales with Company No. 6343600 | Local Branch Office | AG Charlottenburg | HRB 110657B
Yeah, it will incur I/O for all metrics; however, most (if not all) modern Linux distros use tmpfs for /tmp, which is in-memory and thus won't require disk writes.
With my current implementation the PStore support is optional and disabled by default. The implementation still defaults to in-memory hashes.
I agree there should be some warning (perhaps a mention in the readme is enough?).
I would love to find a purely in-memory solution that works for pre-forking workers. However, as far as I can tell the only option is the 'raindrops' gem, which would require all counters (all possible labels) to be known up-front, before the server forks.
I can see a couple of options that would make sense and be pluggable into any Python/Ruby framework:
1. Use shared memory + a simple binary format.
2. Use a long running helper process and send metrics to it à la statsd.
Depending on your definition of "network stack", unix domain sockets might be an option on Linux/Unix. However, Windows would require a totally different transport in that case. Perhaps something around named pipes.
There's complexity associated with doing any sort of IPC, but assuming you want to deal with subprocesses, then you have to accept IPC, and if you are doing IPC then your choices are pretty much: shared memory, filesystems, pipes, or sockets. Pick your poison ; ).
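To make option 2 a bit more concrete, here is a rough sketch of a statsd-style helper. The `name:value` wire format and the class and function names are invented for illustration and are not the real statsd protocol. A real helper would listen on a unix domain datagram socket at a well-known path and expose the totals to Prometheus; here only the parsing and summing are shown.

```python
import socket
from collections import defaultdict


def parse_increment(datagram):
    """Parse a datagram such as b'http_requests_total:1' into (name, value)."""
    name, _, value = datagram.decode("utf-8").rpartition(":")
    return name, float(value)


class Aggregator:
    """Runs in the long-lived helper process and sums increments from workers."""

    def __init__(self):
        self.counters = defaultdict(float)

    def handle(self, datagram):
        name, value = parse_increment(datagram)
        self.counters[name] += value


def make_channel():
    """Return (worker_end, helper_end) connected unix datagram sockets.

    socketpair() stands in for a socket bound at a predetermined path; it is
    only available on Unix-like systems.
    """
    return socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
```

A worker would call `worker_end.send(b"http_requests_total:1")` per event, and the helper would loop over `helper_end.recv(1024)` feeding each datagram to `Aggregator.handle`.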
As far as mmapped files go in Ruby, there is one library that was last updated 6 years ago and doesn't work on Ruby 2.1. Unfortunately I don't know if that will be a good solution. Also, as far as I know there are no viable shared memory solutions for Ruby. Searching for shared memory between workers in Ruby only turns up relatively complex IPC systems like DRb or cod.
I still don't think a file-based solution on an in-memory filesystem will be terrible.
From what I gather (and I am far from an expert on this topic), mmap only works on *nix systems and would not work on Windows.
Moreover, it basically gives you a shared block of memory that is entirely un-managed? It would require writing a C library to use this memory and provide some sort of mutex/lock-like functionality? The benefit is that it is about the fastest way to share anything between processes?
I am not opposed to this solution; however, it is beyond my experience and probably not something I have time to learn about deeply enough to implement something useful. Are there other potential solutions that might not be too daunting? Is it reasonable to provide multiple adapters now: one memory/hash backed, one backed by some sort of file (PStore or otherwise), and maybe a Redis one?
I realize disk or network I/O may not be optimal for some very high-throughput applications, but in a typical Ruby app you are already doing plenty of disk/network I/O. I imagine the overhead would be negligible until there is a better solution.
On Thu, Nov 12, 2015 at 3:52 AM, Jeffery Utter <jeff...@sadclown.net> wrote:
> From what I gather ( and I am far from an expert on this topic ). mmap only works on *nix systems, and would not work on windows.
There's an equivalent on Windows, but we should rely on some library that takes care of all this for us.
We don't need to actively share data if there's a file per process. Only at scrape time do we access all the files.
It's not just high throughput we need to worry about; latency is also an issue. If, for example, every metric change involved a disk seek, that would limit you to only a handful of metrics, whereas we want users to be free to use hundreds to thousands of metrics without a second thought.
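A minimal sketch of the file-per-process idea, with assumptions stated up front: each process keeps a single counter in its own small mmapped file, never calls fsync, and the scrape handler reads every file. The file naming scheme, the one-float64-per-file layout and the class and function names are all invented here. It also relies on the OS keeping shared mappings and ordinary file reads coherent, as Linux does.

```python
import glob
import mmap
import os
import struct

_VALUE = struct.Struct("<d")  # assumed layout: one little-endian float64 per file


class MmapCounter:
    """A counter whose value lives in a per-process mmapped file."""

    def __init__(self, directory, name, pid=None):
        pid = os.getpid() if pid is None else pid
        self.path = os.path.join(directory, "%s.%d.bin" % (name, pid))
        fd = os.open(self.path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            if os.fstat(fd).st_size < _VALUE.size:
                os.write(fd, _VALUE.pack(0.0))  # initialise a new file to zero
            self._map = mmap.mmap(fd, _VALUE.size)
        finally:
            os.close(fd)  # the mapping keeps its own reference to the file

    def inc(self, amount=1.0):
        # A plain memory write: no system call and no fsync per increment.
        (current,) = _VALUE.unpack_from(self._map, 0)
        _VALUE.pack_into(self._map, 0, current + amount)


def collect(directory, name):
    """At scrape time, read every process's file and return {pid: value}."""
    values = {}
    for path in glob.glob(os.path.join(directory, name + ".*.bin")):
        pid = int(path.rsplit(".", 2)[1])
        with open(path, "rb") as f:
            (value,) = _VALUE.unpack(f.read(_VALUE.size))
        values[pid] = value
    return values
```

Since only the owning process writes to its file, no lock is needed for the increment in a single-threaded worker; a threaded worker would need one around `inc`.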
Another solution I'm looking into is using something like 0mq or nanomsg. I have just started looking into nanomsg, but its docs propose:
"Zero-Copy
While ZeroMQ offers a "zero-copy" API, it's not true zero-copy. Rather it's "zero-copy till the message gets to the kernel boundary". From that point on data is copied as with standard TCP. nanomsg, on the other hand, aims at supporting true zero-copy mechanisms such as RDMA (CPU bypass, direct memory-to-memory copying) and shmem (transfer of data between processes on the same box by using shared memory). The API entry points for zero-copy messaging are nn_allocmsg and nn_freemsg functions in combination with NN_MSG option passed to send/recv functions."
Sounds like that could alleviate some of the concern around latency from hitting the kernel?
On Thu, Nov 12, 2015 at 4:49 AM, Brian Brazil <brian....@robustperception.io> wrote:
> On Thu, Nov 12, 2015 at 3:52 AM, Jeffery Utter <jeff...@sadclown.net> wrote:
>> From what I gather ( and I am far from an expert on this topic ). mmap only works on *nix systems, and would not work on windows.
> There's an equivalent on Windows, but we should rely on some library that takes care of all this for us.
Yeah.. but someone has to write this library :)
> We don't need to actively share data if there's a file per process. Only at scrape time do access all the files.
By 'files' you mean chunks of memory allocated by mmap? So the parent would allocate chunks of memory for each worker, and pass those down to the workers. The workers would write their stats to this memory. At scrape time, whichever worker gets the request would read from the memory allocated to each of the workers and aggregate them?
I have been working on another approach to this. I have a rough draft over here: https://github.com/jeffutter/prometheus-client_ruby/tree/aggregate_stats
The idea this time is that there are no modifications to the stats stored in memory per thread. When the middlewares are called with the persist: true option, at the end of each rack request they persist their current state to a file on disk scoped by their parent pid and their own pid.
When the exporter middleware gets hit, it reads all of the files for its children and exports them as individually tagged stats as such:
# TYPE http_requests_total counter
# HELP http_requests_total A counter of the total number of HTTP requests made.
http_requests_total{method="get",host="127.0.0.1:5000",path="/",code="200",pid="13336"} 489
http_requests_total{method="get",host="127.0.0.1:5000",path="/",code="200",pid="13337"} 526
http_requests_total{method="get",host="127.0.0.1:5000",path="/",code="200",pid="13338"} 485
http_requests_total{method="get",host="127.0.0.1:5000",path="/",code="200",pid="13339"} 500
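The flow can be sketched in a few lines. To be clear, this is not the code from the branch, just an illustration in Python with invented names: the JSON file format, the `<parent pid>-<pid>.json` naming and the pre-rendered label strings are all assumptions.

```python
import glob
import json
import os


def persist_state(directory, parent_pid, pid, counters):
    """Write one worker's counters to <parent_pid>-<pid>.json.

    `counters` maps an already rendered label string to a value.
    """
    path = os.path.join(directory, "%d-%d.json" % (parent_pid, pid))
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(counters, f)
    os.replace(tmp, path)  # atomic swap, so the exporter never reads a partial file


def export(directory, parent_pid, metric):
    """Read every child's file and render one sample line per worker."""
    lines = []
    pattern = os.path.join(directory, "%d-*.json" % parent_pid)
    for path in sorted(glob.glob(pattern)):
        pid = os.path.basename(path)[:-len(".json")].split("-")[1]
        with open(path) as f:
            counters = json.load(f)
        for labels, value in sorted(counters.items()):
            lines.append('%s{%s,pid="%s"} %d' % (metric, labels, pid, value))
    return "\n".join(lines)
```

Each worker would call `persist_state` at the end of a request, and whichever worker serves the scrape would call `export`.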
I think this yields a much better solution than my previous approach for a few reasons:
I would also prefer not to expose the pid directly: either aggregate at exposition time, or use a labelling scheme that keeps the label value consistent over time (not sure what such a scheme might be).
My concern is two-fold. For one, per-thread exposition blows up the number of metrics and data points considerably. Additionally, if the labels keep changing, that constantly produces new time series, assuming the processes are not extremely long-lived.
We restart our Ruby procs a lot, and Prometheus is reasonably good at short lived time series but not *that* good.
Apart from that I think this is a good compromise between simplicity and I/O. To reduce I/O even further, what would you think about a minimum write-out interval? For an app serving many short requests it might be better to serve 100ms old metrics than to write to disk several thousand times a second.
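A minimum write-out interval could be as small as the following sketch. The class name, the 100 ms default and the injectable clock are assumptions made for illustration, and `write` stands for whatever function persists the state.

```python
import time


class ThrottledWriter:
    """Calls `write(state)` at most once per `min_interval` seconds."""

    def __init__(self, write, min_interval=0.1, clock=time.monotonic):
        self._write = write
        self._min_interval = min_interval
        self._clock = clock
        self._last = None  # time of the last successful write, if any

    def maybe_write(self, state):
        now = self._clock()
        if self._last is not None and now - self._last < self._min_interval:
            return False  # too soon: skip, the exported metrics stay slightly stale
        self._write(state)
        self._last = now
        return True
```

One caveat with this naive version: the last few increments before a worker goes idle or exits are only written if something triggers another `maybe_write`, so a final flush on shutdown would be needed.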
/mr
Yes, summaries are inherently not aggregatable (but histograms are). For these cases a per-process label is unavoidable.
So it should really use histograms then … but for now, how about keeping per-thread summaries but aggregating counters?
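Aggregating counters at exposition time amounts to summing the samples that differ only in their pid label. A small sketch, where the function name and the (labels, value) sample shape are made up:

```python
from collections import defaultdict


def aggregate_counters(samples):
    """Sum counter samples that differ only in their `pid` label.

    `samples` is an iterable of (labels_dict, value) pairs. The result maps a
    sorted tuple of the remaining (label, value) pairs to the summed total.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        key = tuple(sorted((k, v) for k, v in labels.items() if k != "pid"))
        totals[key] += value
    return dict(totals)
```

With the four per-pid samples shown earlier in the thread (489, 526, 485 and 500) this yields a single series with value 2000. Summaries would be left out of this step and keep their per-process label.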
I'd like to hear more opinions though, maybe I'm just wrong and many metrics is the way to go all the way. My concerns are definitely practical, not fundamental, and can be scaled around.
/mr
On Dec 5, 2015 11:33 PM, "Brian Brazil"
<brian....@robustperception.io> wrote:
>
> I thought this was an approach that didn't hit disk at every write? Any mmap approach not calling fsync/fdatasync should be okay on this front.
You're right, without sync it will just work. Yay for POSIX semantics!
Thinking about it more, having some metrics per-process and some
preaggregated would be quite inconsistent and confusing. Jeffery – I
think your approach is sound.
Something I'm still not sure how to handle is the ever-changing PIDs …
on one hand, it's the only sane identifier I see at exposition time;
but I'd be interested in any approach to relabel it down to a fixed
set at ingestion time…
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/c9162c58-0e84-4e42-b6a6-fb144b51cd1d%40googlegroups.com.