How to return selected metrics using Flask and not all metrics (python prometheus_client)


ahmer...@gmail.com

Jun 5, 2018, 3:51:31 AM
to Prometheus Users
Hi,
We've decided to adopt Prometheus in our organization, and several groups are very interested. I've taken on the task of setting up a convenient way to add metrics in a Flask application that listens on a single port but exposes multiple URL paths/endpoints, so that we can retrieve the metrics for each component from the path mapped to it.
Here is my challenge. The following is a simplified Flask app that queries a database for jobs that are stuck in the 'retrying' status. The ultimate goal is to alert the relevant parties when such jobs have been stuck for some period of time.

RETRYING_JOBS = Gauge("jobs_retrying_total", "Jobs that were re-submitted and retried that are stuck")

@app.route('/retrying', endpoint='retrying')
def jobs_retries():
    registry = CollectorRegistry()
    sqlalchemy_bootstrapping = SqlalchemySetup()
    retrying_count = sqlalchemy_bootstrapping.session.query(func.count()).filter(jobs.Job.status == 'retrying')
    RETRYING_JOBS.set(retrying_count[0][0])
    return generate_latest(REGISTRY)

if __name__ == '__main__':
    app.run(debug=True, port=9090)


The main problem is that the code above also exposes process_virtual_memory_bytes, process_resident_memory_bytes, process_cpu_seconds_total and other metrics that I do not want. I only want jobs_retrying_total, the metric I set above, and nothing else. Is there a way for me to achieve this? Any pointers will be helpful.
Thanks!

Brian Brazil

Jun 5, 2018, 4:08:36 AM
to ahmer...@gmail.com, Prometheus Users
Usually you would have all this on the same /metrics. Spreading it across endpoints will make it harder to monitor and understand, as well as being harder to implement on the Python end.
 
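You've mixed REGISTRY and registry: the handler creates a new CollectorRegistry as registry but then serves the default REGISTRY, which is where the process metrics come from. If you're proxying metrics like that you should use a custom collector (https://github.com/prometheus/client_python#custom-collectors) rather than normal direct instrumentation such as a gauge; the above code has a small race, for example.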
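
As a rough illustration of the custom collector approach, here is a minimal sketch, not a definitive implementation. It assumes a hypothetical count_retrying_jobs() helper standing in for the SQLAlchemy query in the question (it returns a fixed number here so the snippet runs on its own), and it keeps the /retrying route from the question. The collector is registered on its own CollectorRegistry, so only that registry's metrics are served, and the value is computed at scrape time instead of being written to a shared gauge first.

```python
from flask import Flask, Response
from prometheus_client import CONTENT_TYPE_LATEST, CollectorRegistry, generate_latest
from prometheus_client.core import GaugeMetricFamily


def count_retrying_jobs():
    """Hypothetical stand-in for the database count query in the question."""
    return 3


class RetryingJobsCollector:
    """Custom collector: collect() is called on every scrape of the registry."""

    def collect(self):
        yield GaugeMetricFamily(
            "jobs_retrying_total",
            "Jobs that were re-submitted and retried that are stuck",
            value=count_retrying_jobs(),
        )


# A dedicated registry contains only the collectors registered on it,
# so the default process_* metrics are not part of its output.
registry = CollectorRegistry()
registry.register(RetryingJobsCollector())

app = Flask(__name__)


@app.route("/retrying")
def retrying():
    # Serve this registry only, not the global default REGISTRY.
    return Response(generate_latest(registry), content_type=CONTENT_TYPE_LATEST)
```

Requesting /retrying from this app should return the jobs_retrying_total family and nothing from the default registry.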
 

ahmer...@gmail.com

Jun 5, 2018, 4:40:53 AM
to Prometheus Users
Thank you for the update. I was tinkering with the registry and mistakenly left the CollectorRegistry line in the snippet.
On the topic of exporters, I wasn't sure in which circumstances I'd need to create my own exporter, and it's clear now why I'd have to do it here. Thanks much!

Brian Brazil

Jun 5, 2018, 4:43:15 AM
to ahmer...@gmail.com, Prometheus Users
Exporters are composed (almost) entirely of custom collectors; however, it's not unusual for a normal application to have a few custom collectors too.

Brian 
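
To illustrate an ordinary application carrying a custom collector next to its direct instrumentation, here is a small sketch under stated assumptions. The metric names app_requests and app_queue_depth and the read_queue_depth() helper are hypothetical and only serve the example. Both the counter and the collector live on the default REGISTRY, so one /metrics output would contain both.

```python
from prometheus_client import REGISTRY, Counter, generate_latest
from prometheus_client.core import GaugeMetricFamily

# Direct instrumentation: the application updates this as events happen.
REQUESTS = Counter("app_requests", "Requests handled by the application")


def read_queue_depth():
    """Hypothetical stand-in for reading a value owned by another system."""
    return 7


class QueueDepthCollector:
    """Custom collector: fetches the current value each time it is scraped."""

    def collect(self):
        yield GaugeMetricFamily(
            "app_queue_depth",
            "Items currently waiting in the queue",
            value=read_queue_depth(),
        )


# Register the custom collector on the same default registry as the counter.
REGISTRY.register(QueueDepthCollector())

REQUESTS.inc()
output = generate_latest(REGISTRY).decode("utf-8")
```

Here output holds the text exposition of the default registry, including both the counter and the collected gauge.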
