William Montgomery
Oct 22, 2020, 11:24:56 AM
to Prometheus Users
I have a query to find out if space is running out:
(100 - (100 * node_filesystem_avail_bytes{job="special_host",mountpoint=~"/my_data/[a-zA-Z]*/.*"} / node_filesystem_size_bytes{job="special_host",mountpoint=~"/my_data/[a-zA-Z]*/.*"}))
For simplicity, let's refer to this expression as SIZE_QUERY.
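To make the substitution concrete, here is a sketch of what a threshold check on SIZE_QUERY looks like when written out in full. The threshold of 80 is the one used in my attempts further down. The line breaks and comments are only for readability and do not change the expression.

```promql
# SIZE_QUERY > 80, expanded: percentage of used space per filesystem
(
  100 - (
    100
    * node_filesystem_avail_bytes{job="special_host",mountpoint=~"/my_data/[a-zA-Z]*/.*"}
    / node_filesystem_size_bytes{job="special_host",mountpoint=~"/my_data/[a-zA-Z]*/.*"}
  )
) > 80
```

Both sides of the division carry the same label set, so the result keeps one series per device/mountpoint pair, which is where the duplicates described next come from.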
This VM is unusual because several of its metrics are equivalent to one another.
I have two categories of mounts on the host:
This group of mounts shares the underlying storage and reports duplicated values (for brevity, only 2 out of many are included):
{device="$DEVICE1",fstype="$FS1",instance="$INSTANCE1",job="special_host",mountpoint="/my_data/first"} 86.6186759625663
{device="$DEVICE2",fstype="$FS1",instance="$INSTANCE1",job="special_host",mountpoint="/my_data/second"} 86.6186759625663
This group of mounts does not share underlying storage:
{device="$DEVICE3",fstype="$FS2",instance="$INSTANCE1",job="special_host",mountpoint="/var/log"} 85.1214545444532
I want to alert when any single host is above the threshold. When the instance is not in the "shared" group, this is trivial. But when the query returns many results, this causes Alertmanager problems.
My PromQL knowledge is lacking on how to get around this limitation, but these are the things I've tried. Each has a problem:
topk: flaps between each of the alerting instances as the labels change.
topk(1, sum by (instance, mountpoint, device) (SIZE_QUERY) > 80)
sum by: returns too many results and brings Alertmanager to its knees, which breaks our alerting in general.
sum by (device, instance) (SIZE_QUERY) > 80
sum by (device, instance, mountpoint) (SIZE_QUERY) > 80
max: doesn't show the labels, which makes it hard to debug the problem from the notification (what instance, what device?).
max(SIZE_QUERY > 80)
Is there a possible solution to this that I haven't considered?