Hello,
We are monitoring some 30+ Kafka clusters with the following scrape config:
scrape_configs:
  - job_name: Confluent Cloud
    scrape_interval: 1m
    scrape_timeout: 1m
    honor_timestamps: true
    static_configs:
      - targets:
          - api.telemetry.confluent.cloud
    scheme: https
    basic_auth:
      username: <Cloud API Key>
      password: <Cloud API Secret>
    metrics_path: /v2/metrics/cloud/export
    params:
      "resource.kafka.id":
        - lkc-1
        - lkc-2
        - lkc-lots-more
This works well, and all metrics are visible in both Prometheus and Grafana.
But once they are sent off to an Elastic Agent running the Prometheus remote_write module, something happens that I can't figure out: the metrics consistently arrive in Elasticsearch with a 5-minute delay.
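I wonder whether this is related to honor_timestamps: true, which makes Prometheus keep whatever timestamps the Confluent export endpoint itself puts on the samples; those timestamps would then travel through remote_write unchanged. Here is a small Python sketch (a hypothetical helper of my own, not part of any exporter) that I use to check the lag between a sample's embedded timestamp and the current time in the raw exposition output:

```python
import time

def sample_lags_seconds(exposition_text, now_ms=None):
    """Return (metric_line, lag_seconds) for every sample in a
    Prometheus exposition payload that carries its own timestamp.

    Exposition sample lines look like: name{labels} value timestamp_ms
    """
    if now_ms is None:
        now_ms = time.time() * 1000
    lags = []
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.rsplit(" ", 2)
        if len(parts) != 3:
            continue  # sample has no explicit timestamp
        try:
            ts_ms = float(parts[2])
        except ValueError:
            continue  # third field was not a timestamp
        lags.append((parts[0], (now_ms - ts_ms) / 1000.0))
    return lags

# Example with a sample stamped 5 minutes (300 s) in the past:
payload = 'io_confluent_kafka_server_received_bytes{kafka_id="lkc-1"} 42 1000\n'
print(sample_lags_seconds(payload, now_ms=301000))
# → [('io_confluent_kafka_server_received_bytes{kafka_id="lkc-1"}', 300.0)]
```

If the lags reported this way are already around 300 s at scrape time, the delay would be in the data the Metrics API exports rather than anywhere in the remote_write path.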
The Prometheus remote-write queue config is pretty much the defaults from the Prometheus documentation:
queue_config:
  capacity: 2500
  max_samples_per_send: 1000
  max_shards: 200
  min_shards: 1
  batch_send_deadline: 5s
  min_backoff: 30ms
  max_backoff: 5s
  retry_on_http_429: true
metadata_config:
  send: true
  send_interval: 1m
  max_samples_per_send: 1000
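For context on why I don't suspect the queue itself: a back-of-envelope calculation (assuming no sustained backpressure, so retry backoff doesn't accumulate across sends) of the worst-case latency these settings could add:

```python
# Worst-case latency added by the remote_write queue with the
# settings above (assumption: no sustained backpressure, so at most
# one retry backoff applies to a given batch).
batch_send_deadline_s = 5.0  # a sample waits at most this long for a full batch
max_backoff_s = 5.0          # at worst one full backoff before a retried send

worst_case_queue_latency_s = batch_send_deadline_s + max_backoff_s
print(worst_case_queue_latency_s)  # → 10.0, nowhere near the ~300 s observed
```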
Other metrics, such as those from the Confluent Kafka REST Proxy and the Blackbox Exporter, are ingested into Elastic in a timely manner. Only the Confluent Cloud metrics are delayed.
Prometheus is deployed using Helm.
Do any of you fine minds have an idea how I can eliminate, or at least reduce, this latency?
Best regards
Christian Oelsner