Hi,
I'm new here, and I saw that someone suggested writing a tutorial about batch jobs. I'm studying Prometheus and I have some questions about that topic.
I have some batch jobs written in PowerShell (the language doesn't matter here, as the Pushgateway has a built-in HTTP API that I can call with Invoke-WebRequest). Of course, I don't want to feed Prometheus with log events but with metrics, such as:
1. Date/time of the last execution and its outcome.
2. Duration of the execution.
3. In some cases, the number of records (e.g. one of my batch jobs updates the list of users, and it would be interesting to know the actual number of users and to correlate it with other metrics).
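A minimal sketch of collecting those three values around one run, written in Python only for illustration. `run_job` and `JobRun` are hypothetical names, and the job is assumed to return its record count (or None) and to raise an exception on failure.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class JobRun:
    finished_at: float        # Unix time (seconds) when the run ended
    duration_seconds: float   # wall-clock run time
    succeeded: bool           # outcome of the run
    records: Optional[int]    # e.g. number of users; None if not applicable


def run_job(job: Callable[[], Optional[int]]) -> JobRun:
    """Run a (hypothetical) job callable and collect outcome, duration, records."""
    start = time.monotonic()  # monotonic clock: unaffected by clock changes
    records = None
    try:
        records = job()
        succeeded = True
    except Exception:
        succeeded = False
    duration = time.monotonic() - start
    return JobRun(time.time(), duration, succeeded, records)


run = run_job(lambda: 3000)
print(run.succeeded, run.records)  # True 3000
```

The same structure maps onto PowerShell with a try/catch block and a stopwatch; only the collected values matter for what gets pushed.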
Let's take the example of a batch job whose role is to "update_users". This job is used with two ALM systems that we will call "QC" and "PC" (the same code runs against two targets).
I'm a bit confused by the definitions in the documentation, mainly because English is not my first language.
The job is "update_users", but can I use any string for the instance name? I mean something other than an IP address: here, could I use "QC" and "PC" instead of the server name?
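A sketch of how the job and instance names end up in the Pushgateway push URL, assuming a Pushgateway at the hypothetical address `http://localhost:9091`. The push API takes the grouping key as path segments (`/metrics/job/<job>/<label>/<value>`), so "QC" and "PC" are used here as plain instance strings. Values containing "/" are simply rejected in this sketch; the Pushgateway README describes a separate encoding for that case.

```python
import urllib.request
from urllib.parse import quote

# Hypothetical Pushgateway address; replace with your own host and port.
PUSHGATEWAY = "http://localhost:9091"


def push_url(job: str, instance: str) -> str:
    """Build /metrics/job/<job>/instance/<instance>; instance can be any string."""
    if "/" in job or "/" in instance:
        raise ValueError("label values containing '/' need a special encoding")
    return (f"{PUSHGATEWAY}/metrics/job/{quote(job, safe='')}"
            f"/instance/{quote(instance, safe='')}")


def build_push_request(job: str, instance: str, body: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request carrying text-format metrics."""
    # Each line must end with "\n"; Windows-style "\r\n" is normalised here.
    body = body.replace("\r\n", "\n")
    if not body.endswith("\n"):
        body += "\n"
    return urllib.request.Request(
        push_url(job, instance),
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


for target in ("QC", "PC"):
    req = build_push_request("update_users", target, "job_update_users_records 3000")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send it.
```

In PowerShell the equivalent should be an `Invoke-WebRequest` call with `-Uri`, `-Method Put` and `-Body`. A PUT replaces every metric in that job/instance group, while a POST replaces only the metrics with the same names as the ones being pushed.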
Now comes the problem of the metric types:
1. Is the outcome a counter or a gauge? If it is a counter, I will need two counters: one for the number of successful executions and another one for the number of failed executions. If it is a gauge, can it take two values, 0 or 1, for OK and KO?
2. The duration of execution should be a gauge with a timestamp.
3. The number of records should be a gauge with a timestamp.
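For comparison, a sketch of the gauge approach. The Prometheus instrumentation guidelines for batch jobs single out the last time the job succeeded as the key metric, and because the Pushgateway replaces pushed values rather than adding them up, a counter would require the script itself to keep a running total between runs. The metric names below are hypothetical, `1` means OK and `0` means KO, and the times are stored as gauge values in seconds rather than as sample timestamps.

```python
from typing import List


def outcome_lines(job: str, succeeded: bool, finished_at: float) -> List[str]:
    """Render one run's outcome as gauges (hypothetical metric names).

    finished_at is a Unix timestamp in seconds.
    """
    prefix = f"job_{job}"
    lines = [
        f"# TYPE {prefix}_last_outcome gauge",
        # 1 = OK, 0 = KO for the most recent run
        f"{prefix}_last_outcome {1 if succeeded else 0}",
        f"# TYPE {prefix}_last_completion_timestamp_seconds gauge",
        f"{prefix}_last_completion_timestamp_seconds {finished_at:.3f}",
    ]
    if succeeded:
        # Only written on success; if pushed with POST, the previous
        # value stays in place after a failed run.
        lines += [
            f"# TYPE {prefix}_last_success_timestamp_seconds gauge",
            f"{prefix}_last_success_timestamp_seconds {finished_at:.3f}",
        ]
    return lines


print("\n".join(outcome_lines("update_users", True, 1398355504.0)))
```

With such a gauge, a Grafana panel could plot something like `time() - job_update_users_last_success_timestamp_seconds` to show how long ago the last successful run was.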
If I understand correctly, the message that I should push would have this content:
# HELP job_update_users_outcome Outcome of the batch job
# TYPE job_update_users_outcome counter
job_update_users_outcome{label="OK"} 1 1398355504000
# Another example with an unfortunate outcome
job_update_users_outcome{label="KO"} 1 1398355504000
# HELP job_update_users_duration duration of the script execution in seconds.
# TYPE job_update_users_duration gauge
job_update_users_duration 2398.28 1398355504000
# HELP job_update_users_records number of records.
# TYPE job_update_users_records gauge
job_update_users_records 3000 1398355504000
EOF
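A sketch that assembles such a message programmatically, using the duration and record-count figures from the example. `Metric` and `render` are hypothetical names. The timestamp is optional and, in the text exposition format, is an integer number of milliseconds since the epoch; the Pushgateway README has a section on timestamps that is worth checking before relying on them. The duplicate-name check reflects the format rule that all samples of one metric name belong to a single group with a single TYPE line.

```python
from typing import List, NamedTuple, Optional


class Metric(NamedTuple):
    name: str
    mtype: str                          # "gauge", "counter", ...
    help_text: str
    value: float
    timestamp_ms: Optional[int] = None  # optional, ms since the epoch


def render(metrics: List[Metric]) -> str:
    """Render metrics in the text exposition format, one group per name."""
    seen = set()
    out = []
    for m in metrics:
        if m.name in seen:
            # A metric name may only appear in one group with one TYPE line.
            raise ValueError(f"duplicate metric name: {m.name}")
        seen.add(m.name)
        out.append(f"# HELP {m.name} {m.help_text}")
        out.append(f"# TYPE {m.name} {m.mtype}")
        sample = f"{m.name} {m.value}"
        if m.timestamp_ms is not None:
            sample += f" {m.timestamp_ms}"
        out.append(sample)
    # Every line, including the last one, must end with "\n".
    return "\n".join(out) + "\n"


text = render([
    Metric("job_update_users_duration", "gauge",
           "Duration of the script execution in seconds.", 2398.28),
    Metric("job_update_users_records", "gauge", "Number of records.", 3000),
])
print(text, end="")
```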
Will I get meaningful graphs in Grafana with this approach, showing the 3 metrics I described at the beginning of the question?
Benjamin