Help with Exporting Kueue Metrics to CloudWatch Using CloudWatch Agent + Prometheus

Nishanth Reddy

Jun 26, 2025, 3:32:40 AM
to wg-batch
Hi Kueue team,

I’m currently working on exporting Kueue metrics to Amazon CloudWatch using the CloudWatch Agent with Prometheus integration, and I’ve run into an issue that I was hoping to get your input on.

What I’m trying to do:
I’ve set up a Prometheus scrape_config targeting the kueue-controller-manager pod, and metrics are being scraped successfully. I confirmed this by curling the /metrics endpoint from within the CloudWatch Agent container (a sketch of an equivalent check follows the config below).

The relevant prometheus.yaml includes:

# ConfigMap: Prometheus scrape config
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: amazon-cloudwatch
data:
  prometheus.yaml: |
    global:
      scrape_interval: 1m
      scrape_timeout: 10s
    scrape_configs:
      - job_name: 'kueue-controller-manager'
        metrics_path: /metrics
        sample_limit: 10000
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_container_port_name]
            action: keep
            regex: .*
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: Namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod_name
          - source_labels: [__meta_kubernetes_pod_container_name]
            action: replace
            target_label: container_name
          - source_labels: [__meta_kubernetes_pod_controller_name]
            action: replace
            target_label: pod_controller_name
          - source_labels: [__meta_kubernetes_pod_controller_kind]
            action: replace
            target_label: pod_controller_kind
          - source_labels: [__meta_kubernetes_pod_phase]
            action: replace
            target_label: pod_phase
          - target_label: job
            replacement: kueue-controller-manager
          # Optional: pull in all Kubernetes pod labels as Prometheus labels
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
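For anyone reproducing this, here is a minimal sketch of an equivalent check run from outside the cluster. The kueue-system namespace, the control-plane=controller-manager pod label, and port 8080 are assumptions based on a default Kueue install, not values from my cluster; adjust to your deployment (metrics may sit behind kube-rbac-proxy on 8443, in which case use https and a bearer token):

    # Sketch -- namespace, pod label, and port are assumed defaults.
    POD=$(kubectl -n kueue-system get pod \
      -l control-plane=controller-manager \
      -o jsonpath='{.items[0].metadata.name}')

    # List the container port names -- the first relabel rule above
    # keys off __meta_kubernetes_pod_container_port_name.
    kubectl -n kueue-system get pod "$POD" \
      -o jsonpath='{range .spec.containers[*].ports[*]}{.name}{"\n"}{end}'

    # Confirm the kueue_* series are exposed (add an Authorization:
    # Bearer header and https on 8443 if kube-rbac-proxy fronts metrics).
    kubectl -n kueue-system port-forward "pod/$POD" 8080:8080 &
    sleep 2
    curl -s http://localhost:8080/metrics | grep -c '^kueue_'
    kill $!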
And my cwagentconfig.json includes:

# ConfigMap: CloudWatch Agent config for Prometheus metrics
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-cwagentconfig
  namespace: amazon-cloudwatch
data:
  cwagentconfig.json: |
    {
      "agent": {
        "region": "eu-north-1",
        "debug": true
      },
      "logs": {
        "metrics_collected": {
          "prometheus": {
            "prometheus_config_path": "/etc/prometheusconfig/prometheus.yaml",
            "log_group_name": "/aws/containerinsights/Cluster9EE0221C-test12345/prometheus",
            "cluster_name": "Cluster9EE0221C-7e214e03e5374e27bd58561567510b4c",
            "emf_processor": {
              "metric_declaration": [
                {
                  "source_labels": ["job"],
                  "label_matcher": "kueue-controller-manager",
                  "dimensions": [["ClusterName"]],
                  "metric_selectors": [
                    "^kueue_admitted_workloads_total$",
                    "^kueue_cluster_queue_status$",
                    "^kueue_evicted_workloads_total$",
                    "^kueue_pending_workloads$",
                    "^kueue_quota_reserved_workloads_total$",
                    "^kueue_reserving_active_workloads$"
                  ]
                }
              ]
            }
          }
        },
        "force_flush_interval": 5
      }
    }
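For reference, a sketch of how one can check whether any kueue_ EMF events reach the log group at all, as opposed to being dropped inside the agent before emission (log group name and region copied from the config above):

    # Sketch: search the Prometheus log group for kueue_ EMF events.
    aws logs filter-log-events \
      --region eu-north-1 \
      --log-group-name "/aws/containerinsights/Cluster9EE0221C-test12345/prometheus" \
      --filter-pattern '"kueue_"' \
      --max-items 5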
What’s going wrong:
The CloudWatch Agent log shows that metrics are scraped, but many are being dropped with messages like:

Dropped metric: no metric declaration matched metric name

Metrics like controller_runtime_webhook_requests_total and workqueue_depth are expected to be dropped, but I don’t see any of the kueue_* metrics being emitted to CloudWatch, even though they are visible in the /metrics output. I’ve also verified that the job label on those metrics is correctly set to kueue-controller-manager. I suspect a label mismatch or an unexpected metric structure that doesn’t match the metric_declaration logic in the CloudWatch Agent.
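Since the metric_selectors above are anchored with ^...$, the exposed metric names have to match exactly. A quick cross-check is to dump the distinct kueue_ names from the /metrics output and compare them against the selectors; a sketch, assuming the same port-forward to localhost:8080 as in the earlier check:

    # Sketch: list the distinct kueue_ metric names actually exposed and
    # compare them against the anchored metric_selectors above.
    curl -s http://localhost:8080/metrics \
      | grep -oE '^kueue_[a-zA-Z0-9_]+' | sort -u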
Question:
Can you confirm whether the Kueue metrics (kueue_admitted_workloads_total, etc.) are expected to be exposed under the job=kueue-controller-manager label? And if not, what is the correct label or configuration to capture them via Prometheus scraping?

Appreciate any guidance or examples you could provide!