Python client not detecting hosted Prometheus URL

apoorve kalot

Jun 9, 2023, 8:51:32 AM
to Prometheus Users
Hello,

I intend to use the Prometheus Python client library to push custom metrics generated by ML model inference. For now, in the development environment, I am using the following Python code (which currently just pushes randomly generated values):
```
import random
import time
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

class MetricsCollector:
    def __init__(self, endpoint):
        self.registry = CollectorRegistry()
        self.gauges = {}
        self.endpoint = endpoint

    def add_metric(self, name):
        gauge = Gauge(name, 'Description of the metric', registry=self.registry)
        self.gauges[name] = gauge

    def set_metric_value(self, name, value):
        gauge = self.gauges.get(name)
        if gauge is not None:
            gauge.set(value)

    def collect_metrics(self):
        while True:
            for name, gauge in self.gauges.items():
                # Generate random value using random library
                value = random.random()
                self.set_metric_value(name, value)
            # Push metrics to the Prometheus endpoint
            push_to_gateway(self.endpoint, job='my_job', registry=self.registry)
            time.sleep(5)  # Collect metrics every 5 seconds

# Example usage
if __name__ == '__main__':
    collector = MetricsCollector('http://0.0.0.0:9999')  # Replace with your Prometheus endpoint
    # Add metrics
    collector.add_metric('metric1')
    collector.add_metric('metric2')
    # Start collecting and pushing metrics
    collector.collect_metrics()
```
And the following default template for the YAML config file:
```
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
```
Prometheus is started with this command:
```
./prometheus --config.file="prometheus.yml" --web.listen-address="0.0.0.0:9999"
```

Even so, it throws this error:
```
File "/root/miniconda3/envs/mlops/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
```

I tried sending an API request to check whether the URL is accessible, using the following Python code:
```
import requests
url = "http://0.0.0.0:9999"
payload = {}
headers = {}
r = requests.request("GET", url, headers=headers, data=payload)
print(r)
```

which gives the output:
```
<Response [200]>
```
which means that the URL at least exists.

What could be causing this problem?

Brian Candler

Jun 9, 2023, 10:33:46 AM
to Prometheus Users
I think you are confused between prometheus and pushgateway.

If you want to "push_to_gateway", then the thing you want to be pushing to is a running instance of pushgateway (which would be running on port 9999 in your example).  Then separately, you run an instance of prometheus with a scrape job which scrapes the metrics from the pushgateway periodically.
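To see why the push got a 404 while the plain GET got a 200, it can help to capture the request `push_to_gateway` actually builds. The sketch below assumes `prometheus_client` is installed; `capture_handler` is a hypothetical helper that records the request instead of sending it, so nothing touches the network. The client sends a PUT to `<gateway>/metrics/job/<job>`, a pushgateway API path that the Prometheus server does not serve, whereas the earlier `requests` check only hit the root URL, where the Prometheus web UI answers.

```python
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

captured = {}

def capture_handler(url, method, timeout, headers, data):
    """Hypothetical handler: record the request instead of sending it."""
    captured.update(url=url, method=method, data=data)
    # push_to_gateway calls the returned function to perform the request;
    # a no-op means nothing is sent over the network.
    return lambda: None

registry = CollectorRegistry()
Gauge('metric1', 'Description of the metric', registry=registry).set(0.5)

push_to_gateway('http://localhost:9999', job='my_job',
                registry=registry, handler=capture_handler)

print(captured['method'], captured['url'])
# prints: PUT http://localhost:9999/metrics/job/my_job
```

Only a pushgateway listening on that port implements this path.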

It will always scrape the most-recently-pushed value; this value is "sticky" if you don't push to it again. If the metric you want to store represents the status of the current or most-recently run ML job, like "started at time X" or "Y% through" or "successfully completed", then pushgateway is a reasonable approach.  However if you're trying to push a *series* of timestamped values then pushgateway is not what you want.
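For the "status of the most recent ML run" case, a pushed group might look like the sketch below. It assumes `prometheus_client` is installed; the function name `push_run_status`, the job name `ml_inference`, and the three metric names are hypothetical choices for illustration, not anything the pushgateway requires.

```python
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway
from prometheus_client.exposition import default_handler

def push_run_status(gateway, started_at, progress, completed,
                    handler=default_handler):
    """Push the latest ML run status (hypothetical metric and job names)."""
    registry = CollectorRegistry()
    Gauge('ml_run_start_timestamp_seconds',
          'Unix time the latest run started', registry=registry).set(started_at)
    Gauge('ml_run_progress_ratio',
          'Fraction of the latest run completed (0..1)', registry=registry).set(progress)
    Gauge('ml_run_completed',
          '1 if the latest run finished successfully, else 0',
          registry=registry).set(1 if completed else 0)
    # push_to_gateway uses PUT, which replaces every metric previously pushed
    # under this job, so the pushgateway always exposes only the latest status.
    push_to_gateway(gateway, job='ml_inference', registry=registry,
                    handler=handler)
```

A run would call it at start, for example `push_run_status('http://localhost:9999', time.time(), 0.0, False)`, and again when it finishes with `progress=1.0, completed=True`. Prometheus then sees one "sticky" value per metric, which suits status reporting but not a per-inference time series.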

Therefore to make this work:
- you need to run pushgateway on port 9999
- you need to run prometheus on a different port (9090 is normal)
- you need to add a scrape job to your prometheus.yml to fetch metrics from the pushgateway
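As a sketch of the third step, the helper below renders a minimal `prometheus.yml` with a scrape job pointing at a pushgateway on port 9999. The job name `pushgateway` and the `localhost:9999` target are assumptions; `honor_labels: true` is the setting the pushgateway documentation recommends so that the `job` label set at push time is kept rather than overwritten by the scrape job's own labels.

```python
def render_prometheus_config(pushgateway_target='localhost:9999'):
    """Return a minimal prometheus.yml that scrapes a pushgateway (assumed target)."""
    return f"""\
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: pushgateway
    # Keep the job/instance labels attached when the metrics were pushed.
    honor_labels: true
    static_configs:
      - targets: ['{pushgateway_target}']
"""

if __name__ == '__main__':
    print(render_prometheus_config())
```

With that file, the pushgateway would be started on port 9999 (e.g. `./pushgateway --web.listen-address=":9999"`) and Prometheus on its default port with `./prometheus --config.file=prometheus.yml`, leaving the Python client's endpoint pointed at the pushgateway.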

The other way to "push" data directly into Prometheus is to use the remote_write protocol, but that's a whole different ball game.  In this case, your source needs to keep pushing data periodically (at least every 2 minutes), to stop the timeseries from becoming stale.