On 08/08/2023 20:31, Matt Doughty wrote:
> So you are trying to get discrete metrics for every run of the batch
> job? That sounds like an unbounded cardinality problem as you would
> end up with a timeseries for every run of the batch job.
> Am I misunderstanding or is this accurate?
>
>> You're right I don't need the exact time when the metric is fetched. I only need it to differentiate between iterations within the batch job. Then is creating a separate metric the best way to go?
>>
If that is the case then Prometheus isn't the right tool. Having
distinctly detectable groups of data for a particular job run indicates
you are talking about events, which are quite different to metrics. For
events you'd want to be looking at tools such as Elasticsearch, Loki or
a standard SQL database.
Events and metrics can be (and often are) used in parallel. For example
Prometheus would tell you that the average job runtime is 5 minutes over
the past 3 hours, but you'd then use the events system to find the exact
durations for each run (or the number of events processed, or the error
message returned, etc.).
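
As a rough illustration of the events side, here is a minimal sketch that
uses Python's built-in sqlite3 module as a stand-in for "a standard SQL
database". The table name, column names and helper functions are all
hypothetical, made up for this example. Each run is stored as one row, so
per-run details can be looked up by ID without creating a timeseries per
run.

```python
import sqlite3

# Hypothetical schema: one row per batch-job run (an "event").
# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_runs ("
    " run_id TEXT PRIMARY KEY,"
    " duration_seconds REAL NOT NULL,"
    " items_processed INTEGER NOT NULL,"
    " error TEXT)"
)


def record_run(run_id, duration_seconds, items_processed, error=None):
    """Store the details of a single run as one event row."""
    conn.execute(
        "INSERT INTO job_runs VALUES (?, ?, ?, ?)",
        (run_id, duration_seconds, items_processed, error),
    )


def run_details(run_id):
    """Look up the exact details of one run (the events-style question)."""
    return conn.execute(
        "SELECT duration_seconds, items_processed, error"
        " FROM job_runs WHERE run_id = ?",
        (run_id,),
    ).fetchone()


def average_duration():
    """Aggregate over all runs (the kind of question a metric answers)."""
    return conn.execute(
        "SELECT AVG(duration_seconds) FROM job_runs"
    ).fetchone()[0]


record_run("run-001", 290.0, 1200)
record_run("run-002", 310.0, 1350, error="timeout talking to upstream")
```

In this sketch run_details("run-002") answers the per-run question, while
average_duration() is only there to contrast it with the aggregate view;
in practice that aggregate would come from Prometheus, with a fixed set of
labels rather than a run ID.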
--
Stuart Clark