I'm currently working on defining Service Level Objectives (SLOs) for our use of Rundeck and would appreciate some insights or tips.
Background on SLOs: A Service Level Objective (SLO) represents a target level of reliability or performance for a particular service. In the context of Rundeck, this could relate to job execution success rates, response times, or other performance indicators.
Question on Current Metric Use: While considering potential metrics, I've been looking at rundeck_project_execution_status{status="succeeded"}, which measures the count of jobs that have successfully executed. At first glance, this seems like a solid metric for indicating the reliability of Rundeck jobs. However, I've noticed that this metric can be misleading because it doesn't account for user-induced errors or failures. For example, a job may succeed from a system perspective but fail to achieve its intended outcome due to incorrect inputs or misconfigurations by the user.
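For illustration, here is a minimal Python sketch of how a success-ratio SLI could be derived from per-status execution counts and compared against a target. The status names other than "succeeded", the sample counts, and the 99% target are assumptions made up for this example; they are not values from our setup or guaranteed by the exporter.

```python
from typing import Mapping


def success_ratio(counts: Mapping[str, int]) -> float:
    """Fraction of executions in the window whose status is 'succeeded'."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no executions recorded in this window")
    return counts.get("succeeded", 0) / total


def error_budget_remaining(counts: Mapping[str, int], target: float) -> float:
    """Share of the allowed failures still unused (negative when overspent)."""
    total = sum(counts.values())
    allowed_failures = (1.0 - target) * total
    actual_failures = total - counts.get("succeeded", 0)
    if allowed_failures == 0:
        # A 100% target (or an empty window) leaves no budget to spend.
        return 0.0 if actual_failures == 0 else float("-inf")
    return 1.0 - actual_failures / allowed_failures


# Hypothetical per-status counts for one evaluation window (illustrative only).
window = {"succeeded": 995, "failed": 4, "aborted": 1}
TARGET = 0.99  # assumed SLO target, not a recommendation

if __name__ == "__main__":
    print(f"success ratio: {success_ratio(window):.3f}")
    print(f"error budget remaining: {error_budget_remaining(window, TARGET):.2f}")
```

Note that a ratio like this inherits the limitation described above: an execution that ran with wrong inputs but exited cleanly still counts as "succeeded".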
Request for Input: Given this observation, I'm interested in learning what others consider good metrics for SLOs in the context of Rundeck. Specifically:
Our org doesn’t have any formal SLOs for Rundeck aside from common platform uptime and fault restoration time. But as far as our users go, their concerns are: