Prometheus Talk with other web UI

nina guo

Dec 7, 2021, 12:28:38 AM
to Prometheus Users
Hi,

Is there a way for Prometheus to talk with another web UI to get the real-time status of targets?

nina guo

Dec 7, 2021, 5:10:53 AM
to Prometheus Users
Can anyone give us some help on this?

Brian Candler

Dec 7, 2021, 8:18:46 AM
to Prometheus Users
If you are *testing* a remote web UI:
https://github.com/prometheus/blackbox_exporter/
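
For illustration, a typical probe job for the blackbox_exporter looks something like this (just a sketch: it assumes the exporter runs locally on port 9115 with the default http_2xx module, and the URL is a placeholder):

scrape_configs:
- job_name: blackbox
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
  - targets:
    - https://your-web-ui.example/     # placeholder: the page you want to probe
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: 127.0.0.1:9115        # where blackbox_exporter is listening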

If you want to *collect data* from a remote web UI, then you'll need to scrape the HTML to extract the metrics you want and convert them into OpenMetrics format.  Usually it's better to look for an API that you can query, or to add Prometheus metrics exposition to the app itself.

nina guo

Dec 9, 2021, 4:12:16 AM
to Prometheus Users

The web application is the one that manages all the targets. We want to check with the web application to get the real status of a target. Then, if the target has a "monitored" status, we expect an alert to be triggered.

So can Prometheus send this kind of check/command to the web application to get the target's real status?

Brian Candler

Dec 9, 2021, 5:08:19 AM
to Prometheus Users
You could do that - for example, in your inventory system, you expose the status of the targets as a prometheus exporter endpoint:

monitoring_active{instance="foo"} 1
monitoring_active{instance="bar"} 0

Then you scrape this to create a new set of timeseries, and you write your alerting expressions to include these values when deciding whether to alert or not.  These rules can become a bit awkward, using one-to-one or many-to-one vector matches.
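
For example, a sketch of such a rule, combining up with the monitoring_active series above via a set-operation vector match on the instance label:

    # only fire when the inventory says this instance is actively monitored
    expr: up == 0 and on(instance) monitoring_active == 1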

But personally, I would do it the other way around.  What I suggest is that you generate the list of targets to scrape *from* the inventory system in the first place.  Then you can either:
1. Not include machines with monitoring=no in the list of targets (so they don't get scraped at all); or
2. Scrape all targets, but add monitoring="no" or monitoring="yes" as a target label.  You can then use that label in your alerting rules:

    # old
    expr: up == 0

    # new
    expr: up{monitoring="yes"} == 0

Or even simpler, you can use this target label in your alertmanager rules to route the alert to a null endpoint.  That is, you still generate alerts for all targets, you just don't deliver them if they are labeled with monitoring="no".
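
A sketch of that routing approach in alertmanager.yml (the receiver names are just placeholders):

route:
  receiver: default
  routes:
  - match:
      monitoring: "no"
    receiver: "null"        # alerts from unmonitored targets are routed here and dropped

receivers:
- name: default
  # ... your usual notification settings ...
- name: "null"              # no notification integrations: a black hole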

To do this, you will use the "service discovery" features of prometheus, so that your inventory system *pushes* information to prometheus telling it what to scrape and how.  The simplest mechanism is the "file_sd" mechanism, where you just write the targets into a file.
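
A minimal sketch of the corresponding scrape configuration (job name and path are placeholders):

scrape_configs:
- job_name: node
  file_sd_configs:
  - files:
    - /etc/prometheus/targets/*.yml    # prometheus watches these files and picks up changes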

This is what I do:
- my inventory is in Netbox
- I wrote some code to read the inventory via the Netbox API every 5 minutes and write out the prometheus target files (for file_sd_configs)

An example of the generated targets file:

# Auto-generated from Netbox, do not edit as your changes will be overwritten!
- labels:
    netbox_type: device
  targets:
  - nuc1 10.12.255.11
  - nuc2 10.12.255.12
  - storage1 10.12.255.5


You could generate two sets of targets, one with label "monitoring: yes" and one with label "monitoring: no"
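
For example, a sketch of what such a generated file could look like (hostnames reused from the example above, the exact layout is up to your generator):

- labels:
    netbox_type: device
    monitoring: "yes"
  targets:
  - nuc1
  - nuc2
- labels:
    netbox_type: device
    monitoring: "no"
  targets:
  - storage1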

This has the very nice side benefit that adding a device to your inventory will *automatically* add it to prometheus monitoring.

HTH,

Brian. 

nina guo

Dec 9, 2021, 5:29:14 AM
to Prometheus Users
Thank you for your detailed reply.

Yes, currently we use file discovery. But one important thing I may have missed is that the status of a target can change. That is to say, right now the target is in a status that we want to monitor, but several minutes later the target is down and we don't want to monitor the system anymore.

Brian Candler

Dec 9, 2021, 6:39:15 AM
to Prometheus Users
When using file discovery: as soon as you change the contents, prometheus will pick it up and apply the new settings immediately - there's no need to signal prometheus that it has changed.  Or there are other service discovery mechanisms you can use, that your inventory could update.

Now, the fact that a target has gone down doesn't necessarily mean you want to stop monitoring it entirely.  If you do, you will lose a lot of information (did it come back? when did it come back? has it been down and up intermittently?).  So I think you're right to be thinking about suppressing alarms, not suppressing monitoring.

You've already had some notifications when the thing went down first. (Personally I disable sending of automatic repeat notifications, and automatic resolved notifications).

Maybe your management system is really saying "look, we know there's a problem with this system; stop sending any more alerts about it until the problem has been investigated and solved".  If that's what you're trying to do, then there's another option you can look at, which is to dynamically create "silences" in alertmanager.  That is: when the inventory system has marked a machine as being in trouble, it creates a silence matching the instance name, with a defined end time (say 30 minutes in the future), and periodically keeps updating the silence for as long as the machine is out of service.  When monitoring="yes" is set back, then it can delete the silence, or just let it expire.

These silences can be created via the alertmanager API.  There are other applications like karma which talk to this API, which might give you some clues how it works.
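
As a rough sketch of what such an API call looks like (the alertmanager URL, times and comment are placeholders), a silence is created by POSTing JSON to the v2 API:

curl -s -X POST http://alertmanager:9093/api/v2/silences \
  -H 'Content-Type: application/json' \
  -d '{
        "matchers": [{"name": "instance", "value": "device1", "isRegex": false}],
        "startsAt": "2021-12-09T12:00:00Z",
        "endsAt":   "2021-12-09T12:30:00Z",
        "createdBy": "inventory-system",
        "comment": "device marked as out of service"
      }'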
 
Regards,

Brian.

Brian Candler

Dec 9, 2021, 6:45:36 AM
to Prometheus Users
I forgot to say: if you don't want to code directly to the alertmanager API, there is amtool which gives you a simple command-line frontend:

amtool silence add instance=device1
amtool silence expire instance=device1

You should be able to prototype something very easily using this.
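
For example (the duration and comment are placeholders; amtool needs to be told where alertmanager is, e.g. via --alertmanager.url or its config file):

amtool --alertmanager.url=http://localhost:9093 silence add \
  --duration=30m --comment="out of service in inventory" instance=device1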

nina guo

Dec 9, 2021, 8:56:03 PM
to Prometheus Users
Thank you very much Brian.

Our file discovery happens every 5 minutes to get the latest target list. We are worried that during those 5 minutes, one of the targets may change from a monitored state to a state we don't want to monitor. We don't want to receive alerts from that target, but at that point Prometheus doesn't know about the state change and will still trigger an alert.

nina guo

Dec 10, 2021, 12:55:24 AM
to Prometheus Users
Sorry, I missed several words:

So we want this implemented: before triggering an alert, Prometheus sends a request to the inventory management system to get the current state of the target; if the target is in a state we don't want to monitor, then no alert is generated. Can Prometheus do this kind of check with the inventory management system?

Brian Candler

Dec 10, 2021, 3:22:39 AM
to Prometheus Users
I've given you three different solutions to this.  I'm afraid if you don't like any of them, then maybe a different tool will suit your needs better.

nina guo

Dec 13, 2021, 3:31:09 AM
to Prometheus Users
Thank you Brian.

Your suggestion is really good. May I ask one more question?

For example, if you read the inventory via Netbox every 5 minutes: let's assume that during those 5 minutes a target is removed from the inventory. At that point Prometheus still triggers a false alert for the no-longer-existing target, because it only refreshes the inventory list at the next 5-minute interval. So is there a way to fix this issue, from your expert perspective?

nina guo

Dec 13, 2021, 3:33:18 AM
to Prometheus Users
Correction: *there is a target removed from the inventory*

Brian Candler

Dec 13, 2021, 3:50:55 AM
to Prometheus Users
You can always read the inventory as often as you like.  Read it every 5 seconds if you like.  Write it to a new file like "targets.new", compare it with the old "targets" file, and if they are different, rename targets.new to targets.  Then prometheus will pick up the new file immediately.
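
A rough shell sketch of that loop (the paths are placeholders, and generate_targets stands for whatever command exports your inventory):

#!/bin/sh
# regenerate the target list; only replace the live file if it actually changed
generate_targets > /etc/prometheus/targets/inventory.yml.new
if ! cmp -s /etc/prometheus/targets/inventory.yml.new /etc/prometheus/targets/inventory.yml
then
    # rename is atomic, so prometheus never sees a half-written file
    mv /etc/prometheus/targets/inventory.yml.new /etc/prometheus/targets/inventory.yml
else
    rm -f /etc/prometheus/targets/inventory.yml.new
fi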

The third solution I gave you (creating silences) might work better for you.  You can create silences whenever you like, and they take effect immediately.

nina guo

Dec 13, 2021, 4:10:51 AM
to Prometheus Users
Thank you Brian.

Let's assume the refresh interval is only 5 seconds; during those 5 seconds the state might still change. For example, the target is changed to maintenance (which is not a state we want to monitor) and starts a reboot, but at that point Prometheus doesn't know about the state change, so a false alert will also be triggered for the reboot. Since maintenance is not a state we are going to monitor, we don't want to receive such alerts. So we would like to know whether there is a way for Prometheus to learn the target's state instantly, before triggering an alert.

nina guo

Dec 13, 2021, 4:13:03 AM
to Prometheus Users
If someone changes the status to "in maintenance" and performs a reboot ... Prometheus will only notice this status update after 5 seconds, and during this time it could generate false alerts.

Brian Candler

Dec 13, 2021, 6:04:33 AM
to Prometheus Users
I think there's not much point continuing this discussion.  Prometheus does not have exactly what you keep asking for, which is "to check" something externally before sending an alert to alertmanager.  All it can check is what's in its timeseries database at that time.

However, you can achieve what you want by applying a silence to alertmanager.  Note that you can do this *before* an alert fires: a silence is just a matching rule, saying "ignore alerts which match these labels".  Push out a silence matching {instance="X"} as soon as you know you're not interested in alerts from device X, and you won't receive notifications for that device.

If that doesn't meet your needs, then I'm sorry, but perhaps you should look for different software which does, or write something yourself.  You could for example write a webhook receiver for alerts, which performs any check you like before forwarding the alert to its final destination.

nina guo

Dec 13, 2021, 9:17:25 PM
to Prometheus Users
Thank you very much Brian.

ee1

Jan 1, 2022, 10:26:51 PM
to Prometheus Users
Reading this, I think there is a simple solution.  Let's say you go with the aggressive 5-second file_sd interval but you're still worried about a false alert: why not just add a "for" clause to your alerting rules?


groups:
- name: example
  rules:
  - alert: something
    expr: <metric is bad and device SHOULD be monitored>
    for: 30s  # <---   Wait for expr to be true for 30 seconds before dispatching the alert

Would that work?
