Requirements / Best Practices to use Prometheus Metrics for Serverless environments


Bartłomiej Płotka

Jun 15, 2021, 2:59:52 PM
to Prometheus Developers
Hi All,

Prometheus has seen the fashion shifting from on-premise to clouds, monoliths to microservices, virtual machines to containers etc. Prometheus has proven to be successful for users in all those scenarios. Let's now talk about FaaS/Serverless. (Let's leave other buzzwords - blockchain/AI for later 🙈).

I would love to start a discussion around the usage of Prometheus Metrics on Serverless environments. I wonder if, from the Prometheus dev point of view, we can implement/integrate anything better, document or explain more etc. (: 

In this thread, I am specifically looking for: 

* Existing best practices for using Prometheus for gathering metrics from Serverless/FaaS platforms and functions
* Specific gaps and limitations users might have in these scenarios
* Existing success stories?
* Ideas for improvements.

Action Item: Feel free to respond if you have any input on those!

Past discussions:
* Suggestion to use event aggregation proxy
* Pushgateway improvements for serverless cases

My thoughts:

IMO a FaaS function should be like a function in any other full-fledged application/pod. You programmatically increment a common metric for your aggregated view (e.g. the overall number of errors).

Trying to switch to a push model for this case sounds like an unnecessary complication because, in the end, those functions are running in a common, longer-lived context (e.g. the FaaS runtime). This runtime should provide programmatic APIs for custom metrics, just as in a normal app where your function has local variables (e.g. *prometheus.CounterVec) to use.
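
A minimal sketch of that idea, using only the Python standard library. The `CounterVec` class here is a hypothetical stand-in written for illustration, not the API of any real client library, and the metric and function names are made up. It shows a counter that lives in the long-lived runtime, a function that only increments it, and a render step that a /metrics endpoint could serve.

```python
import threading
from collections import defaultdict


class CounterVec:
    """Tiny stand-in for a labelled counter (hypothetical, not a real client API)."""

    def __init__(self, name, help_text, label_names):
        self.name = name
        self.help_text = help_text
        self.label_names = tuple(label_names)
        self._values = defaultdict(float)
        self._lock = threading.Lock()

    def inc(self, *label_values, amount=1.0):
        if len(label_values) != len(self.label_names):
            raise ValueError("label count mismatch")
        with self._lock:
            self._values[label_values] += amount

    def render(self):
        """Return the counter in the Prometheus text exposition format.

        Label values are not escaped here; a real implementation must do that.
        """
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        with self._lock:
            for label_values, value in sorted(self._values.items()):
                labels = ",".join(
                    f'{k}="{v}"' for k, v in zip(self.label_names, label_values)
                )
                lines.append(f"{self.name}{{{labels}}} {value}")
        return "\n".join(lines) + "\n"


# Created once by the long-lived runtime, not once per invocation.
ERRORS = CounterVec("function_errors_total", "Errors per function.", ["function"])


def handler(event):
    """A FaaS-style function: it only increments the shared counter."""
    if event.get("fail"):
        ERRORS.inc("resize_image")
        return "error"
    return "ok"
```

The function body never touches the network; whatever serves /metrics in the runtime would call `ERRORS.render()` when Prometheus scrapes it.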

In fact, this is what AWS Lambda allows and there are exporters to get that data into Prometheus. 

We see users attempting to switch to the push model. I just wonder if for FaaS functions this really makes sense. 

If you init the TCP connection and use remote write, OpenMetrics (OM) push, the Pushgateway API, or OTel/OpenCensus to push metrics, you take an enormous latency hit to spin up a new TCP connection just for that. This might already be too slow for FaaS. If you do this asynchronously on the FaaS platform, you need to care about discovery, backoffs, persistent buffers, auth, and all the pains of the push model, plus some aggregation proxy like Pushgateway, an aggregation gateway, or the OTel collector to get this data to Prometheus (BTW this is what Knative is recommending). Equally, one could just expose those metrics on a /metrics endpoint and drop all of this complexity (or run an exporter if the FaaS is in the cloud, like Lambda or Google Cloud Run).

I think the main problem appears if those FaaS runtimes are short-lived workloads that automatically spin up only to run some functions (batch jobs). In some way, this is then a problem of short-lived jobs and the design of those workloads.

For those short-lived jobs, we again see users try to use the push model. I think there is room to either streamline those initiatives OR propose an alternative. A quick idea, yolo... why not kill the job after the first successful scrape (detecting usage of the /metrics path)?
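
A rough sketch of what "exit after the first scrape" could look like, kept free of any HTTP server so it stays self-contained. `ScrapeGate` and `run_job` are hypothetical names introduced here. Note the assumption baked in: the gate is set when the response body is produced, which is weaker than knowing that Prometheus received and stored it, and an HA pair of Prometheus servers would each need their own scrape.

```python
import threading


class ScrapeGate:
    """Tracks whether the metrics payload has been handed to a scraper."""

    def __init__(self, render):
        self._render = render  # callable returning the exposition text
        self.scraped = threading.Event()

    def handle(self, path):
        """Return (status, body) for a request path and mark the first scrape."""
        if path != "/metrics":
            return 404, ""
        body = self._render()
        # Set once the body is produced, not once delivery is confirmed.
        self.scraped.set()
        return 200, body


def run_job(work, gate, max_wait_seconds):
    """Run the batch work, then linger until scraped or until the deadline.

    Returns True if a scrape was seen, False if the deadline expired first.
    """
    work()
    return gate.scraped.wait(timeout=max_wait_seconds)
```

The `max_wait_seconds` deadline matters: without it, a job whose target is never discovered would never exit.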

Kind Regards,
Bartek Płotka (@bwplotka)

Bjoern Rabenstein

Jun 18, 2021, 6:16:58 PM
to Bartłomiej Płotka, Prometheus Developers
On 15.06.21 20:59, Bartłomiej Płotka wrote:
>
> Let's now talk about FaaS/Serverless.

Excellent! That's my 2nd favorite topic after histograms. (And while I
provably talked about histograms as my favorite topic since early
2015, I have only started to talk about FaaS/Serverless as an
important gap to fill in the Prometheus story since 2018.)

I think "true FaaS" means that the function calls are
lightweight. The additional overhead of sending anything over the
network defeats that purpose. So similar to what has been said
before, and what Bartek has already nicely worked out, I think the
metrics have to be managed by the FaaS runtime, in the same path as
billing is managed.

And that's, of course, what cloud providers are doing, and it's also a
formidable way of locking their customers into their own metrics and
monitoring system.

And that's in turn precisely where I think Prometheus can use its
weight. Prometheus has already proven that cloud providers can
essentially not get away with ignoring it, and even halfhearted
integrations won't be enough. With more or less native Prometheus
support by cloud providers, it might actually just require a small
step to come to some convention on how to collect and present FaaS
metrics in a "Promethean" way. If all cloud providers do it the same
way, the lock-in is gone.

I think it would be very valuable to study what OpenFaaS has already
done: https://docs.openfaas.com/architecture/metrics/

In the simplest case, we could just say: Please, dear cloud providers,
please expose exactly the same metrics for general benefit. If there
is anything to improve with the OpenFaaS approach, I'm sure they will
be delighted to get help. (Spontaneously, I'm missing a way to define
custom metrics, e.g. how many records a function call has processed.)


> * Suggestion to use event aggregation proxy
> <https://github.com/weaveworks/prom-aggregation-gateway>
> * Pushgateway improvements
> <https://groups.google.com/g/prometheus-users/c/sm5qOrsVY80/m/nSfbzHd9AgAJ> for
> serverless cases

Despite all of what I said above, I think there _are_ quite a few users
of FaaS who have fairly heavy-weight function calls. For them, pushing
counter increments etc. via the network might actually be more
convenient than funneling metrics through the FaaS runtime. This is
then just another use-case of the "distributed counter" idea, which
the Pushgateway quite prominently is not catering for. As discussed
in the thread linked above and at countless other places, I strongly
recommend to not shoehorn the Pushgateway into this use-case, but
create a separate project for it, which would be designed from the
beginning for this use-case. Perhaps
weaveworks/prom-aggregation-gateway is just that. I haven't studied it
in detail yet. In a way, we need "statsd done right". Again, I would
suggest looking at what others have already done. For example, there are
tons of statsd users out there. What have they done in recent years
to overcome the known shortcomings? Perhaps statsd instrumentation and
the Prometheus statsd exporter just need a bit of development in that
direction to become a viable solution.
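
To make the "distributed counter" idea concrete, here is a minimal sketch of the aggregation step, under the assumption that each function invocation pushes a delta rather than an absolute value. `IncrementAggregator` is a hypothetical name, not part of the Pushgateway or of prom-aggregation-gateway. The point it illustrates is that pushes for the same series are summed, whereas the Pushgateway replaces previously pushed values.

```python
import threading
from collections import defaultdict


class IncrementAggregator:
    """Sums counter deltas pushed by many short-lived function invocations."""

    def __init__(self):
        self._totals = defaultdict(float)
        self._lock = threading.Lock()

    def push(self, name, labels, delta):
        """Accept one increment; labels is a dict of label name to value."""
        if delta < 0:
            raise ValueError("counters can only go up")
        key = (name, tuple(sorted(labels.items())))
        with self._lock:
            self._totals[key] += delta

    def render(self):
        """Expose the running totals in the Prometheus text format.

        Label values are not escaped here; a real implementation must do that.
        """
        with self._lock:
            items = sorted(self._totals.items())
        lines = []
        for (name, labels), total in items:
            label_str = ",".join(f'{k}="{v}"' for k, v in labels)
            lines.append(f"{name}{{{label_str}}} {total}")
        return "\n".join(lines) + "\n"
```

The totals live only in memory in this sketch, so a restart of the aggregator looks like a counter reset to Prometheus.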

> I think the main problem appears if those FaaS runtimes are short-lived
> workloads that automatically spin up only to run some functions (batch
> jobs). In some way, this is then a problem of short-lived jobs and the
> design of those workloads.
>
> For those short-lived jobs, we again see users try to use the push model.
> I think there is room to either streamline those initiatives OR propose
> an alternative. A quick idea, yolo... why not kill the job after the
> first successful scrape (detecting usage of the /metrics path)?

Ugh, that doesn't sound right. I think this problem should be solved
within the FaaS runtime in the way they prefer. Cloud providers need
billing in any case (they want to make money after all), so they have
already solved reliable metrics collection for that. They just need to
hook in a simple exporter to present Prometheus metrics. See how
OpenFaaS has done it. Knative seems to have gone down the OTel path,
but that could be seen as an implementation detail. If they in the end
expose a /metrics endpoint with the desired metrics for Prometheus to
scrape, all is good. It's just a terribly overengineered exporter,
effectively. (o;

--
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

Tobias Schmidt

Jun 22, 2021, 5:26:14 AM
to Bjoern Rabenstein, Bartłomiej Płotka, Prometheus Developers
Thanks for bringing up this topic Bartek and your great insights Björn!

I think it's a great idea to open the discussion with the big cloud providers about an open runtime integration for metrics. Maybe they're more open about this than I expect. My fear is that this won't really lead to any substantial improvement, as the vendor lock-in seems to be quite desired judging from my personal experience with cloud providers.


On Jun 18, 2021, Bjoern Rabenstein wrote:
> > * Suggestion to use event aggregation proxy
> > <https://github.com/weaveworks/prom-aggregation-gateway>
> > * Pushgateway improvements
> > <https://groups.google.com/g/prometheus-users/c/sm5qOrsVY80/m/nSfbzHd9AgAJ> for
> > serverless cases
>
> Despite all of what I said above, I think there _are_ quite a few users
> of FaaS who have fairly heavy-weight function calls. For them, pushing
> counter increments etc. via the network might actually be more
> convenient than funneling metrics through the FaaS runtime. This is
> then just another use-case of the "distributed counter" idea, which
> the Pushgateway quite prominently is not catering for. As discussed
> in the thread linked above and at countless other places, I strongly
> recommend to not shoehorn the Pushgateway into this use-case, but
> create a separate project for it, which would be designed from the
> beginning for this use-case. Perhaps
> weaveworks/prom-aggregation-gateway is just that. I haven't studied it
> in detail yet. In a way, we need "statsd done right". Again, I would
> suggest looking at what others have already done. For example, there are
> tons of statsd users out there. What have they done in recent years
> to overcome the known shortcomings? Perhaps statsd instrumentation and
> the Prometheus statsd exporter just need a bit of development in that
> direction to become a viable solution.

First of all, I wonder if there is really any difference in terms of heavy-weight/light-weight classification of serverless / FaaS in contrast to traditional deployment styles. Personally the reason I chose a serverless runtime (GCP Cloud Run) for my application layer is just in order to focus on business feature development. The runtime manages container lifecycles and we're only paying for the time containers serve traffic. I could deploy the exact same Docker container outside a serverless environment as well.
My needs are still the same though: I want to instrument the various aspects of the service and its many endpoints, both with common request related metrics as well as custom metrics. The problem I face is the fundamental mismatch of Prometheus' pull architecture and the serverless runtime which doesn't even allow me to see individual container instances.

The StatsD / push-over-network approach has some serious latency impact as you both highlighted already. Additionally, it requires the deployment of a service with an external TCP API which would need to be protected from public access as well (might be easy depending on the serverless runtime provider).

Last night I was wondering if there are any other common interfaces available in serverless environments and noticed that all products by AWS (Lambda) and GCP (Functions, Run) at least provide the option to handle log streams, sometimes even log files on disk. I'm currently thinking about experimenting with an approach where containers log metrics to stdout / some file, get picked up by the serverless runtime and written to some log stream. Another service "loggateway" (or otherwise named) would then stream the logs, aggregate them and either expose them on the common /metrics endpoint or push them with remote write right away to a Prometheus instance hosted somewhere (like Grafana Cloud).
My hopes are that the latency impact of logging a dozen metrics per request should be negligible, especially compared to TCP pushing. There are a lot of open questions about the log format, how to handle metric metadata (without logging it all the time), and HA deployment of the log aggregation service. Furthermore, this approach requires some support from the client libraries (I think only the Ruby client supports custom data stores).
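
One possible shape for this, as a sketch only. The `METRIC ` prefix, the JSON layout, and both function names are assumptions made for illustration; the log format is exactly one of the open questions above. The function side prints one line per increment, and the hypothetical "loggateway" sums whatever it finds in the stream while ignoring ordinary log output.

```python
import json
from collections import defaultdict

PREFIX = "METRIC "  # hypothetical marker separating metric lines from normal logs


def format_metric_line(name, labels, delta):
    """Build one log line carrying a counter increment."""
    record = {"name": name, "labels": labels, "delta": delta}
    return PREFIX + json.dumps(record, sort_keys=True)


def aggregate_log_lines(lines):
    """Sum the increments found in a log stream, ignoring everything else."""
    totals = defaultdict(float)
    for line in lines:
        if not line.startswith(PREFIX):
            continue  # ordinary application log output
        try:
            record = json.loads(line[len(PREFIX):])
            key = (record["name"], tuple(sorted(record["labels"].items())))
            delta = float(record["delta"])
        except (ValueError, KeyError, AttributeError, TypeError):
            continue  # malformed metric line: skip rather than crash the gateway
        totals[key] += delta
    return dict(totals)
```

Inside a function, emitting a metric would then be a single `print(format_metric_line("http_requests_total", {"code": "200"}, 1))`, with no network call on the request path.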

Besides the implementation details, one major downside would be the pollution of the common log stream if the runtime provider doesn't support separate log streams (AWS Lambda only supports stdout/stderr I think). Anything else I'm missing which would make this idea infeasible?
 

--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-devel...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/20210618221656.GS3670%40jahnn.

Richard Hartmann

Jun 22, 2021, 6:09:56 AM
to Bartłomiej Płotka, Prometheus Developers
This has come up in the context of OM, OTel, and TAG Observability. My
own thinking largely mirrors beorn's & grobie's: In a perfect world
the orchestration layer has all the information and interfaces
required and billing knows about the required datapaths. NB:
Monitoring usually has higher speed and lower reliability requirements
than billing. Still, for doability, lock-in, convenience, and velocity
reasons, it's enticing to bypass the ideal solution and do something
that works-ish now. If someone incurs ~100% overhead for monitoring
lightweight functions but gets their job done, they are still
getting their job done and can optimize later if they so choose.

Pushing might appear hamfisted here, and arguably is, but it's largely
under the control of the dev; as such, they can do it with less
coordination. This might get us near to using the Prometheus Agent as
a Collector to reduce latency and blast radius. Far from ideal, but...

An in-between would be what grobie said: To speak in Prometheus terms,
the orchestrator is node_exporter, the serverless functions write out
something which the textfile collector can ingest.
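
As an illustration of the "write out something the textfile collector can ingest" half of that analogy, here is a small sketch. node_exporter's textfile collector reads `*.prom` files from a configured directory; the function name, directory, and metric names below are made up. The write goes through a temporary file plus rename so a reader never sees a partial file.

```python
import os
import tempfile


def write_prom_file(directory, job_name, samples):
    """Atomically write samples as <job_name>.prom for a textfile-style collector.

    samples maps a metric name to a numeric value.
    """
    body = "".join(f"{name} {value}\n" for name, value in sorted(samples.items()))
    final_path = os.path.join(directory, f"{job_name}.prom")
    # Write to a temporary file in the same directory, then rename, so a
    # concurrent reader never sees a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(body)
        # mkstemp creates the file as owner-only; make it readable by the collector.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path
```

Each call overwrites the previous file for that job name, so this is last-write-wins; summing across invocations would still need an aggregation step somewhere.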

OpenMetrics deliberately supports push, but this approach creates
issues with `up` and staleness handling. OTel is currently facing
similar issues, maybe there's room for cooperation. Also see
https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#supporting-target-metadata-in-both-push-based-and-pull-based-systems
and https://docs.google.com/document/d/1hn-u6WKLHxIsqYT1_u6eh94lyQeXrFaAouMshJcQFXs/edit#heading=h.e4p9f543e7i2


I strongly believe that we should be particular about the wire format;
in a future in which orchestrators have a collector component, it
would be nice to be able to simply expose the metrics for pulling or
use PRW code and wire format.


Best,
Richard

Tobias Schmidt

Jun 22, 2021, 10:32:28 AM
to Richard Hartmann, Bartłomiej Płotka, Prometheus Developers
On Tue, Jun 22, 2021 at 12:09 PM Richard Hartmann <richih.ma...@gmail.com> wrote:
> This has come up in the context of OM, OTel, and TAG Observability. My
> own thinking largely mirrors beorn's & grobie's: In a perfect world
> the orchestration layer has all the information and interfaces
> required and billing knows about the required datapaths. NB:
> Monitoring usually has higher speed and lower reliability requirements
> than billing. Still, for doability, lock-in, convenience, and velocity
> reasons, it's enticing to bypass the ideal solution and do something
> that works-ish now. If someone incurs ~100% overhead for monitoring
> lightweight functions but gets their job done, they are still
> getting their job done and can optimize later if they so choose.
>
> Pushing might appear hamfisted here, and arguably is, but it's largely
> under the control of the dev; as such, they can do it with less
> coordination. This might get us near to using the Prometheus Agent as
> a Collector to reduce latency and blast radius. Far from ideal, but...
>
> An in-between would be what grobie said: To speak in Prometheus terms,
> the orchestrator is node_exporter, the serverless functions write out
> something which the textfile collector can ingest.

There is not much overlap between the node_exporter and the functionality needed here. It would need something which can read common log streams from major cloud providers / serverless runtimes, aggregate the logs, and then expose them. Only the last part is somewhat available in the node_exporter and the rest doesn't really make sense there. Google's mtail would be a bit closer conceptually, but as we have full control over the clients and wire format there is no need for a full-fledged log parsing engine, and the cloud provider log reading part is still missing.


Bjoern Rabenstein

Jun 24, 2021, 5:09:11 PM
to Tobias Schmidt, Bartłomiej Płotka, Prometheus Developers
On 22.06.21 11:26, Tobias Schmidt wrote:
>
> Last night I was wondering if there are any other common interfaces
> available in serverless environments and noticed that all products by AWS
> (Lambda) and GCP (Functions, Run) at least provide the option to handle log
> streams, sometimes even log files on disk. I'm currently thinking about
> experimenting with an approach where containers log metrics to stdout /
> some file, get picked up by the serverless runtime and written to some log
> stream. Another service "loggateway" (or otherwise named) would then stream
> the logs, aggregate them and either expose them on the common /metrics
> endpoint or push them with remote write right away to a Prometheus instance
> hosted somewhere (like Grafana Cloud).

Perhaps I'm missing something, but isn't that
https://github.com/google/mtail ?

Rob Skillington

Jun 25, 2021, 12:11:28 AM
to Bjoern Rabenstein, Bartłomiej Płotka, Prometheus Developers, Tobias Schmidt
With respect to OpenMetrics push, we had something very similar at $prevco: it pushed something that looked very similar to the protobuf payload of OpenMetrics (but was a Thrift snapshot of an aggregated set of in-process metrics) and was used by short-running tasks (Jenkins, Flink jobs, etc.).

I definitely agree it’s not ideal, and ideally the platform provider can supply a collection point. (There is something for Jenkins, a plug-in that can do this, but custom metrics are very hard, nigh impossible, to make work with it. And this is a non-cloud-provider environment where it’s actually possible to make this work; just no one has made it seamless.)

I agree with Richi that something that could push to a Prometheus Agent-like target that supports OpenMetrics push could be a good middle ground with the right support / guidelines:
- A way to specify multiple Prometheus Agent targets and quickly fail over from one to another if one is not responding within $X ms (you could imagine a 5ms budget for each with a maximum of 3 tried, introducing at worst 15ms of overhead when all are down in 3 local availability zones, but in general this is a disaster case)
- Deduplication ability so that a retried push is not double counted; this might mean timestamping the metrics (so if a record is written twice, only the first is kept, etc.)
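
The two guidelines above can be sketched as follows. Everything here is hypothetical: the function and class names, the `send` callback signature, and the choice of (series, timestamp) as the deduplication key are assumptions for illustration, not an existing Prometheus Agent API. With the defaults, the worst case is 3 attempts at 5ms each, matching the 15ms figure.

```python
def push_with_failover(targets, payload, send, budget_seconds=0.005, max_tries=3):
    """Try targets in order; return the one that accepted the payload, or None.

    send(target, payload, timeout) is a caller-supplied function that returns
    True on success and returns False or raises on failure or timeout.
    """
    for target in targets[:max_tries]:
        try:
            if send(target, payload, budget_seconds):
                return target
        except Exception:
            pass  # treat any error like a timeout and move to the next target
    return None


class FirstWriteWins:
    """Receiver-side deduplication keyed on (series, timestamp)."""

    def __init__(self):
        # Grows without bound in this sketch; a real receiver would expire old keys.
        self._seen = {}

    def accept(self, series, timestamp_ms, value):
        """Store the sample unless this series/timestamp was already written."""
        key = (series, timestamp_ms)
        if key in self._seen:
            return False  # retried push: keep the first record
        self._seen[key] = value
        return True
```

Because the client cannot tell a lost response from a lost request, a retry after a timeout may reach a target that already stored the sample, which is why the timestamp-based deduplication sits on the receiving side.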

I think that, similar to the Pushgateway, it should generally be a last-resort kind of option and have clear limitations, so that pull remains the clear choice for anything but these environments.

Is there any interest in discussing this on a call some time?

Rob


Bartłomiej Płotka

Nov 16, 2021, 2:54:49 AM
to Rob Skillington, Bjoern Rabenstein, Prometheus Developers, Tobias Schmidt
Hi All,

I would love to resurrect this thread. I think we are missing a good Pushgateway-like product that would ideally live in Prometheus (repo/binary, or something we can recommend) and convert events to metrics in a cheap way. Because this is what it is when we talk about short-lived containers and serverless functions. What's the latest, Rob? I would be interested in a call on this if that is still on the table. (:

I think we have some new options on the table, like supporting OTel metrics as such a potentially high-cardinality event push, given there are more and more clients for that API. Potentially the OTel collector can work as such a "push gateway" proxy, but at this point it's extremely generic, so we might want to consider something more focused/efficient/easier to maintain. Let's see (: The other problem is that OTel metrics is yet another protocol. Users might want to use the Pushgateway API, remote write, or logs/traces, as per @Tobias Schmidt's idea:

> Another service "loggateway" (or otherwise named) would then stream the logs, aggregate them and either expose them on the common /metrics endpoint or push them with remote write right away to a Prometheus instance hosted somewhere (like Grafana Cloud).

Kind Regards,
Bartek Płotka (@bwplotka)

Rob Skillington

6:41 AM
to Bartłomiej Płotka, Bjoern Rabenstein, Prometheus Developers, Tobias Schmidt
FWIW we have been experimenting with users pushing OpenMetrics protobuf payloads quite successfully, but only sophisticated exporters that can guarantee no collisions of time series, generate their own monotonic counters, etc. are using this at this time.

If you're looking for a solution that also involves aggregation support, M3 Coordinator (either standalone or combined with M3 Aggregator) supports Remote Write as a backend (and is thus compatible with Thanos, Cortex and of course Prometheus itself too due to the PRW receiver).

M3 Coordinator, however, does not have any nice way to publish to it from a serverless environment (I would assume because the primary protocol it supports is Prometheus Remote Write, which has no metrics clients, etc.).

Rob

Rob Skillington

6:50 AM
to Rob Skillington, Bartłomiej Płotka, Bjoern Rabenstein, Prometheus Developers, Tobias Schmidt
Here’s the documentation for using M3 Coordinator (with or without M3 Aggregator) with a backend that has a Prometheus Remote Write receiver:

Would be more than happy to do a call some time on this topic. The more we’ve looked at this, the more it looks like primarily a client library issue, well before you consider the backend/receiver aspect. There are options out there for the backend, and those problems are fairly mechanical to overcome. The client library concerns, by contrast, have a lot of ergonomic and practical issues, especially in a serverless environment where you may need to wait for publishing before finishing your request. Perhaps an async process is an option: publish a message to a local serverless message queue like SQS, and have a reader read that and use another client library to push the data out. It would be more type safe and probably less lossy than writing logs, reading them, and then publishing, but would need good client library support for both the serverless producers and the readers/pushers.
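
A sketch of that producer/reader split, with an in-process `queue.Queue` standing in for a real message queue such as SQS. `MetricEventQueue` and the `push` callback are hypothetical names; the design choice shown is that the request path never blocks on publishing, and the reader aggregates a batch before pushing it out once.

```python
import queue


class MetricEventQueue:
    """Stand-in for a local message queue; a real system might use SQS instead."""

    def __init__(self, max_size=1000):
        self._queue = queue.Queue(maxsize=max_size)
        self.dropped = 0

    def publish(self, name, delta):
        """Called inside the request path: never blocks, drops when full."""
        try:
            self._queue.put_nowait((name, delta))
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self, push, batch_size=100):
        """Called by the reader: aggregate up to batch_size events and push once.

        Returns the number of distinct metric names in the pushed batch.
        """
        batch = {}
        for _ in range(batch_size):
            try:
                name, delta = self._queue.get_nowait()
            except queue.Empty:
                break
            batch[name] = batch.get(name, 0) + delta
        if batch:
            push(batch)
        return len(batch)
```

Dropping on a full queue trades completeness for request latency; the `dropped` count at least makes that loss visible.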

Rob
