Timestamp of Duration Metric of a Periodic Task

Dennis Kanygin

Mar 16, 2021, 1:48:40 PM
to Prometheus Users
I am struggling with a Prometheus use case.

A periodic job runs every 30 seconds or so, and its duration varies. I need a bar graph with the time of each run on the x-axis and its duration on the y-axis. I am pushing the duration metric to the Pushgateway. With this setup, the timestamp attached to the duration metric is the time at which the Pushgateway is scraped, so the timestamp of each duration is "off" when graphing. As I understand it, this is a feature, not a bug. Any recommendations on how to accomplish what I need? This seems like a fairly common scenario, so I am curious how others are solving it.

cheers,

Dennis Kanygin

Matthias Rampke

Mar 16, 2021, 7:06:45 PM
to Dennis Kanygin, Prometheus Users
There is a mismatch of models here. You are asking about plotting a set of (x,y) points; Prometheus fundamentally thinks in terms of continuous time series that happen to be sampled at the scrape interval.

One way to resolve this is to consider the continuous time series of "how long did the last run take". This can be sampled at any time, whether 2 or 22 seconds after the job has finished. You need to sample (scrape) at least twice per job run to reliably catch every run, and you will not be able to tell directly whether a run happened to have exactly the same duration as the one before.
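
To make the sampling behaviour concrete, here is a minimal simulation sketch. It is not Prometheus code: the run times, durations, and function names are all made up for illustration. It treats "duration of the last finished run" as a value that exists at every instant and samples it at a fixed interval.

```python
from bisect import bisect_right

def last_run_duration_at(t, finishes):
    """Value of the 'duration of the last finished run' gauge at time t.

    finishes is a list of (finish_time, duration) sorted by finish_time.
    Returns None before the first run has finished.
    """
    times = [finish_time for finish_time, _ in finishes]
    i = bisect_right(times, t)
    return finishes[i - 1][1] if i else None

def observed_runs(finishes, scrape_interval, end):
    """Sample the gauge every scrape_interval seconds; report each change in value."""
    seen, prev = [], None
    t = 0
    while t <= end:
        value = last_run_duration_at(t, finishes)
        if value is not None and value != prev:
            seen.append((t, value))
        prev = value
        t += scrape_interval
    return seen

# Hypothetical runs finishing roughly every 30 s: (finish_time, duration in seconds).
runs = [(12, 4.0), (43, 6.5), (71, 6.5), (104, 3.2)]

print(observed_runs(runs, scrape_interval=15, end=120))
print(observed_runs(runs, scrape_interval=60, end=120))
```

With a 15 s interval (two scrapes per run) every distinct value is seen, but the run finishing at t=71 is invisible because its duration equals the previous one. With a 60 s interval, the first run is never seen at all.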

However, you can use the knowledge of how often the job runs to make the graph look nice: play with the interval and overall time window alignment in Grafana to do that.

/MR


Matthias Rampke

Mar 16, 2021, 7:17:07 PM
to Dennis Kanygin, Prometheus Users
Also consider this: at 30s run intervals, if you were to look at a longer time (say a week), would you still be interested in each individual run's timing? How would you aggregate them?

Prometheus' answer is to construct time series of the number of runs, and cumulative run time, starting at some arbitrary point in time (together these are a summary). By looking at the change in these numbers over time, we can calculate the duty cycle (what fraction of time is spent running vs. idle) or average run time (cumulative run time divided by the number of runs in the same timeframe). Note that this is all phrased in terms of numbers that exist continuously (time spent since …) rather than individual events (time spent in the fifteenth run).
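
As a rough sketch of that arithmetic, assume two samples of the pair (run count, cumulative run time) taken some time apart. The class, field names, and numbers below are hypothetical, and counter resets are ignored for simplicity.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    timestamp: float      # time of the sample, in seconds
    runs_total: float     # number of runs completed so far
    seconds_total: float  # cumulative run time so far, in seconds

def average_run_time(older, newer):
    """Cumulative run time divided by the number of runs in the same window."""
    runs = newer.runs_total - older.runs_total
    if runs <= 0:
        return None  # no run completed in this window
    return (newer.seconds_total - older.seconds_total) / runs

def duty_cycle(older, newer):
    """Fraction of wall-clock time spent running between the two samples."""
    elapsed = newer.timestamp - older.timestamp
    return (newer.seconds_total - older.seconds_total) / elapsed

# Hypothetical samples five minutes apart: 10 runs took 50 s in total.
a = Sample(timestamp=0, runs_total=100, seconds_total=520.0)
b = Sample(timestamp=300, runs_total=110, seconds_total=570.0)

print(average_run_time(a, b))  # average seconds per run in the window
print(duty_cycle(a, b))        # fraction of the window spent running
```

In PromQL the same idea is usually written as the rate of the sum series divided by the rate of the count series over the same range.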

Unfortunately, there is no trivial way to keep these accumulated counters over multiple process invocations, since the client libraries only hold them in memory. Ideally, you could get them from the long-running process that starts these individual runs. If that is not possible, the third-party aggregating pushgateway may be useful to you.

I hope this helps clarify how Prometheus sees the world!

/MR

m...@timescale.com

Mar 17, 2021, 11:27:06 AM
to Prometheus Users
You can also use a Prometheus long-term store (LTS) that supports direct push. Promscale is one such LTS (https://github.com/timescale/promscale/blob/master/docs/writing_to_promscale.md), and the one I work on, but I believe there are others. As MR points out, though, the analysis/query side gets complicated in PromQL because there is an implicit assumption of regular intervals. Promscale's answer is to use SQL, but this does complicate the analysis somewhat.

--
Mat Arye, Promscale Team Lead
See what we're working on (feedback welcome!): tsdb.co/prom-design-doc

Chris Siebenmann

Mar 18, 2021, 5:08:45 PM
to Matthias Rampke, Dennis Kanygin, Prometheus Users, cks.prom...@cs.toronto.edu
> Prometheus' answer is to construct time series of the number of runs,
> and cumulative run time, starting at some arbitrary point in time
> (together these are a summary). By looking at the change in these
> numbers over time, we can calculate the duty cycle (what fraction of
> time is spent running vs. idle) or average run time (cumulative run
> time divided by the number of runs in the same timeframe). Note that
> this is all phrased in terms of numbers that exist continuously (time
> spent since …) rather than individual events (time spent in the
> fifteenth run).
>
> Unfortunately there is no trivial way to keep these accumulated
> counters over multiple process invocations since the client libraries
> only hold them in memory. Ideally, you could get them from the
> long-running process that starts these individual runs. If that is not
> possible, the third party aggregating pushgateway may be useful to
> you.

Probably the easiest way to generate cumulative counters from separate
one-time jobs is to use the statsd gateway, the statsd_exporter. Statsd
supports incrementing persistent counters and it has a very easy wire
protocol to talk to:

echo 'our.counter:+3|c|#label1:aname,area:prod' | nc statsd-host 9125

(There are several ways to add labels; see the statsd_exporter
readme at https://github.com/prometheus/statsd_exporter )

This would let you keep track of the total time jobs have taken to
run and the count of jobs run, among other things. I think you can
even do histograms through statsd_exporter if you want to (with the
statsd exporter doing the hard work for your script).
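
For a script that would rather not shell out to nc, the same thing
can be sketched in Python. This is only an illustration: the metric
names job.runs and job.run_seconds and the area:prod tag are made up,
it assumes the exporter is listening for UDP on statsd-host port 9125
as in the command above, and it assumes the exporter accepts
fractional counter increments.

```python
import socket

def statsd_counter_line(name, value, tags):
    """Format a statsd counter increment using the '#key:value' tag style shown above."""
    tag_part = ",".join(f"{key}:{val}" for key, val in tags.items())
    return f"{name}:{value}|c|#{tag_part}"

def report_run(duration_seconds, host="statsd-host", port=9125):
    """Bump a run counter and a cumulative run-time counter for one finished job."""
    tags = {"area": "prod"}  # hypothetical label
    lines = [
        statsd_counter_line("job.runs", 1, tags),
        statsd_counter_line("job.run_seconds", duration_seconds, tags),
    ]
    # One UDP datagram per metric line.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            sock.sendto(line.encode("ascii"), (host, port))
    return lines

print(statsd_counter_line("our.counter", 3, {"label1": "aname", "area": "prod"}))
```

Each job would call report_run() once when it finishes, passing its
own measured duration.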

Pushgateway is easier to deal with in a number of ways, but it only
supports setting metrics; you can't update them the way you can with
the statsd exporter.

- cks

Matthias Rampke

Mar 18, 2021, 5:11:52 PM
to Chris Siebenmann, Dennis Kanygin, Prometheus Users
The statsd exporter works for this but has the downside that you are mapping through a different metric model. It's okay but can be annoying.

There is also an aggregating pushgateway that may be useful if this is the route you want to go: https://github.com/weaveworks/prom-aggregation-gateway

/MR