Hi,
On 6/24/20 8:09 PM, yagyans...@gmail.com wrote:
> Hi. Currently I am using a custom threshold in case of my Memory alerts.
> I have 2 main labels for my every node exporter target - cluster and
> component.
> My custom threshold till now has been based on the component as I had to
> define that particular custom threshold for all the servers of the
> component. But now, I have 5 instances, all from different components
> and I have to set the threshold as 97. How do I approach this?
>
> My typical node exporter job.
> - job_name: 'node_exporter_JOB-A'
>   static_configs:
>     - targets: [ 'x.x.x.x:9100' , 'x.x.x.x:9100']
>       labels:
>         cluster: 'Cluster-A'
>         env: 'PROD'
>         component: 'Comp-A'
>   scrape_interval: 10s
>
> Recording rule for custom thresholds.
> - record: abcd_critical
>   expr: 99.9
>   labels:
>     component: 'Comp-A'
>
> - record: xyz_critical
>   expr: 95
>   labels:
>     node: 'Comp-B'
>
> The expression for Memory Alert.
> ((node_memory_MemTotal_bytes - node_memory_MemFree_bytes -
> node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100) *
> on(instance) group_left(nodename) node_uname_info > on(component)
> group_left() (abcd_critical or xyz_critical or on(node) count by
> (component)((node_memory_MemTotal_bytes - node_memory_MemFree_bytes -
> node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100) * 0 + 90)
>
> Now, I have 5 servers with different components. How to include that in
> the most optimized manner?
This looks almost like the pattern described here:
https://www.robustperception.io/using-time-series-as-alert-thresholds
It looks like you have already tried to integrate the two different ways
to specify thresholds, right? Is there any specific problem with it?
Sadly, this pattern quickly becomes complex, especially when nested (as
you would need to do) and when combined with an already long query (as
in your case).
I can only suggest moving some of the complexity out of the query
(e.g. by moving the memory calculation into a recording rule).
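A minimal sketch of such a recording rule, reusing the memory
calculation from your alert expression. The group name and the rule name
instance:node_memory_used:percent are placeholders I made up; pick
whatever fits your naming scheme.

```yaml
groups:
  - name: memory_recording            # hypothetical group name
    rules:
      # Precompute memory usage in percent once per target, so that
      # alert expressions no longer need to repeat the calculation.
      - record: instance:node_memory_used:percent   # hypothetical name
        expr: |
          (
              node_memory_MemTotal_bytes
            - node_memory_MemFree_bytes
            - node_memory_Cached_bytes
          ) / node_memory_MemTotal_bytes * 100
```

The recorded series keeps the target labels (instance, cluster, env,
component), so an alert expression can refer to it by name and still
match on those labels.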
You can also split the rule into multiple rules (with the same alert
name). You just have to ensure that each rule only ever fires for its
own subset of your instances (e.g. the first variant would only fire for
instances with a component-based threshold, the second only for
instances with an instance-based threshold).
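As a rough, untested sketch, the split could look like the following. It
assumes the memory usage in percent is available as a series named
instance:node_memory_used:percent (a hypothetical name, e.g. produced by
a recording rule), and it introduces a hypothetical per-instance
threshold series instance_memory_critical. The group name, alert name
and severity label are only for illustration, the default of 90 is taken
from your expression, and the join with node_uname_info is left out for
brevity.

```yaml
groups:
  - name: memory_alerts               # hypothetical group name
    rules:
      # Per-instance threshold; repeat this rule for each of the 5 instances.
      - record: instance_memory_critical   # hypothetical name
        expr: 97
        labels:
          instance: 'x.x.x.x:9100'

      # Variant 1: only instances that have their own threshold.
      - alert: HighMemoryUsage
        expr: |
          instance:node_memory_used:percent
            > on(instance) instance_memory_critical
        labels:
          severity: critical

      # Variant 2: component-based threshold, skipping instances that are
      # already covered by variant 1. Further component thresholds could
      # be combined with "or", as in your expression.
      - alert: HighMemoryUsage
        expr: |
          (
            instance:node_memory_used:percent
              unless on(instance) instance_memory_critical
          )
            > on(component) group_left() abcd_critical
        labels:
          severity: critical

      # Variant 3: default threshold for everything not covered above.
      - alert: HighMemoryUsage
        expr: |
          (
            instance:node_memory_used:percent
              unless on(instance) instance_memory_critical
              unless on(component) abcd_critical
          ) > 90
        labels:
          severity: critical
```

The "unless" clauses are what keep the variants disjoint: each instance
is compared against exactly one threshold, so the same alert never fires
twice for it.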
Hope this helps.
Kind regards,
Christian