# Calculates HTTP error responses total
- record: windows:windows_iis_worker_request_errors_total:irate5m
  expr: irate(windows_iis_worker_request_errors_total[5m])
- alert: IIS error requests rate
  expr: sum without () (rate(windows:windows_iis_worker_request_errors_total:irate5m{status_code!="401"}[5m])) > 3
  for: 5m
  labels:
    severity: critical
    component: WindowsOS
  annotations:
    summary: "High IIS worker error rate"
    description: "IIS http responses on {{ if $labels.fqdn }}{{ $labels.fqdn }}{{ else }}{{ $labels.instance }}{{ end }} for {{ $labels.app }} has high rate of errors."
    dashboard:
    runbook:
I'm trying to do something like this to alert when people are getting errors while connecting to a web app. The issue is that the recorded query itself, 'windows:windows_iis_worker_request_errors_total:irate5m', returns non-integer values.
The idea was to evaluate the number of errors over a rolling 5-minute window.
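For context on the non-integer values: irate() and rate() both return per-second rates, so fractional results are expected from them. Both also expect a raw counter as input, so wrapping an already-recorded irate() series in rate() is unlikely to produce a meaningful number. A minimal sketch of the two ways to look at the same counter over 5 minutes, using the metric and label filter from the rule above:

```
# Per-second error rate averaged over the last 5 minutes.
# Fractional values are normal here (e.g. 3 errors in 300s is 0.01/s).
rate(windows_iis_worker_request_errors_total{status_code!="401"}[5m])

# Approximate count of errors over the last 5 minutes.
# For a 5m range this is rate(...) * 300; because Prometheus extrapolates
# to the edges of the window, the result can still be non-integer.
increase(windows_iis_worker_request_errors_total{status_code!="401"}[5m])
```

If the intent is "number of errors in the window", increase() is the closer match; if the intent is "errors per second", rate() on the raw counter is the usual choice.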
Of course, in an ideal world I'd alert on the error ratio by dividing the error metric by the total requests metric. However, the two metrics have a label mismatch and I am unsure how to write that query.
Would really appreciate any assistance!
edit:
Someone in the Prometheus developer group provided me with the following query, which does work:
sum by (fqdn, instance, app) (increase(windows_iis_worker_request_errors_total{status_code!="401"}[5m]))
However, I was wondering if anyone knows how to get a query working on the rate of errors (errors relative to total requests) rather than the increase in count, despite the label mismatch between the IIS total requests and IIS error request metrics.
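One possible shape for such a ratio query, as a sketch only. It assumes the total requests metric is named windows_iis_worker_requests_total and that the mismatch comes from labels such as status_code existing only on the error metric; neither is confirmed above. Aggregating both sides down to the same label set (fqdn, instance, app) before dividing lets the default one-to-one vector matching line the series up:

```
# Fraction of requests that ended in an error (excluding 401s), per app.
# Assumption: windows_iis_worker_requests_total is the total requests counter.
# The 0.05 (5%) threshold is a hypothetical placeholder, not a recommendation.
  sum by (fqdn, instance, app) (
    rate(windows_iis_worker_request_errors_total{status_code!="401"}[5m])
  )
/
  sum by (fqdn, instance, app) (
    rate(windows_iis_worker_requests_total[5m])
  )
> 0.05
```

Because both operands carry exactly the labels fqdn, instance and app after aggregation, no on() or ignoring() modifier should be needed. If one side lacks one of those labels, that label would have to be dropped from both by (...) clauses.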