Hello Everyone,
We have integrated Ceph with Prometheus.
The ceph-mgr service exports metrics on port 9283 (see the Prometheus config below):
*********************
rule_files:
  - /etc/prometheus/alerting/*
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets:
          - localhost:9092
  - honor_labels: true
    job_name: ceph
    static_configs:
      - labels:
          instance: ceph_cluster
        targets:
          - storagenode1:9283
      - labels:
          instance: ceph_cluster
        targets:
          - storagenode2:9283
      - labels:
          instance: ceph_cluster
        targets:
          - storagenode3:9283
*****************************
We have three ceph-mgr nodes, of which one is active at a time and two are on standby.
We can verify this from the Ceph status output:
[ansible@storagenode1 ~]$ sudo ceph -s
  cluster:
    id: 78dbd380-03e0-48e9-a8c6-d560be215788
    health: HEALTH_OK
  services:
    mgr: storagenode2(active, since 3h)
*************************
The above output shows that ceph-mgr is active on storagenode2, so Prometheus should be able to scrape that node.
But when I look at the Prometheus dashboard,
it shows all three targets as down, including the active one, which should show as up.
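To see why each target is marked down, the Prometheus HTTP API exposes the last scrape error per target at /api/v1/targets. Below is a minimal Python sketch, not a definitive tool. It assumes Prometheus itself is reachable at localhost:9092 (taken from the self-scrape target in the config above; adjust if that is not right). The helper names summarize_targets and fetch_targets are made up for illustration.

```python
import json
from urllib.request import urlopen


def summarize_targets(payload, job="ceph"):
    """Return (scrape_url, health, last_error) for each active target of one job.

    `payload` is the decoded JSON body of GET /api/v1/targets.
    """
    rows = []
    for target in payload.get("data", {}).get("activeTargets", []):
        # Keep only targets that belong to the requested scrape job.
        if target.get("labels", {}).get("job") != job:
            continue
        rows.append((
            target.get("scrapeUrl", ""),
            target.get("health", "unknown"),
            target.get("lastError", ""),
        ))
    return rows


def fetch_targets(base_url="http://localhost:9092"):
    """Fetch the targets list; base_url is an assumption, not a known value."""
    with urlopen(base_url + "/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


# Example use (performs a real HTTP request, so it is left commented out):
# for url, health, err in summarize_targets(fetch_targets()):
#     print(url, health, err or "-")
```

The lastError text for each target (timeout, refused connection, HTTP status, and so on) should narrow down where the scrape is failing.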
Issue:
The ceph-mgr status on the Prometheus dashboard should be in sync with the Ceph health output, i.e. the target for the active mgr should show as up.
Please suggest any possible cause.
Prometheus Version: v2.7.2
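One more check that can be run from the Prometheus host: request /metrics from each of the three configured targets directly, to separate network or listener problems from Prometheus-side problems. This is only a sketch; the function name probe and the opener parameter (there so the function can be exercised without a network) are illustrative, and the host names are the ones from the scrape config.

```python
from urllib.request import urlopen

# Targets copied from the "ceph" job in the Prometheus config.
TARGETS = ["storagenode1:9283", "storagenode2:9283", "storagenode3:9283"]


def probe(target, opener=urlopen, timeout=5):
    """Try to fetch http://<target>/metrics and return (ok, detail)."""
    url = "http://%s/metrics" % target
    try:
        with opener(url, timeout=timeout) as resp:
            body = resp.read()
            return True, "HTTP %d, %d bytes" % (resp.status, len(body))
    except OSError as exc:
        # URLError, HTTPError, timeouts and refused connections are all OSError.
        return False, str(exc)


# Example use (performs real HTTP requests, so it is left commented out):
# for t in TARGETS:
#     print(t, probe(t))
```

If a target fails here as well, the problem is likely between the Prometheus host and that node (name resolution, firewall, or nothing listening on 9283) rather than in the Prometheus configuration.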
Best Regards,
Lokendra