I created this GitHub issue as a bug report, but brian-brazil directed me here so I guess it is intended behaviour. The original issue can be found below.
I don't understand how this is intended behaviour. If rate(some_metrics[1m]) work, why shouldn't rate(some_metrics/other_metrics_with_constant_value[1m]) ?
--- GitHub Issue ---
What did you do?
I wanted to calculate container CPU saturation. Without a recording rule I would do something like this:
rate(container_cpu_usage_seconds_total{pod!="", container!=""}[1m]) / ON (namespace, pod, container) GROUP_LEFT kube_pod_container_resource_limits_cpu_cores
Then I wanted to set up a recording rule for this query while retaining the ability to change the range selector, so I wrote the recording rule like this:
- record: container_cpu_saturation_total
expr: container_cpu_usage_seconds_total{pod!="", container!=""} / ON(namespace, pod, container) GROUP_LEFT kube_pod_container_resource_limits_cpu_cores
So I could do something like this:
rate(container_cpu_saturation_total[1m])
rate(container_cpu_saturation_total[5m])
rate(container_cpu_saturation_total[15m])
What did you expect to see?
I expect the original PromQL and the one with the recording rule to yield mostly identical results. Perhaps a margin of error no more than a few percents?
What did you see instead? Under which circumstances?
This is the actual result I got:
The difference is rather large. With recording rule, the returned values are always higher, sometimes going over 100% which should not be possible.
I tested this on both v2.15.2 and v2.17.1, both versions have this issue (the screenshot is from v2.15.2)
Environment
Linux 3.10.0-1062.el7.x86_64 x86_64
I tested this on both v2.15.2 and v2.17.1, both versions have this issue (the screenshot is from v2.15.2)
global:
scrape_interval: 15s
scrape_timeout: 10s
evaluation_interval: 30s
alerting:
alertmanagers:
- static_configs:
- targets:
- 10.200.13.200:6007
scheme: http
timeout: 10s
api_version: v1
rule_files:
- /config/*.recording.yml
- /config/*.alerting.yml
scrape_configs:
(other irrelevant config)
- job_name: cadvisor
honor_timestamps: true
scrape_interval: 15s
scrape_timeout: 10s
metrics_path: /metrics/cadvisor
scheme: https
kubernetes_sd_configs:
- role: node
tls_config:
ca_file: /secrets/kubelet/ca
cert_file: /secrets/kubelet/cert
key_file: /secrets/kubelet/key
insecure_skip_verify: false
- job_name: kube-state-metrics
honor_timestamps: true
scrape_interval: 15s
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
kubernetes_sd_configs:
- role: service
namespaces:
names:
- kube-system
relabel_configs:
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: kube-state-metrics
replacement: $1
action: keep
here is another screenshot with graph instead of instant query