Current rule configuration
<group name="performance_metric,">
<!-- CPU, Memory and Disk usage -->
<rule id="100054" level="3">
<decoded_as>general_health_check</decoded_as>
<description>CPU | MEMORY | DISK usage metrics</description>
</rule>
<!-- High memory usage -->
<rule id="100055" level="12" frequency="2" timeframe="300" ignore="7200">
<if_sid>100054</if_sid>
<field name="memory_usage_%" type="pcre2">^[8-9]\d|100</field>
<description>Memory usage is high: $(memory_usage_%)%</description>
<options>no_full_log</options>
</rule>
<!-- High CPU usage -->
<rule id="100056" level="12" frequency="2" timeframe="300" ignore="7200">
<if_sid>100054</if_sid>
<field name="cpu_usage_%" type="pcre2">^[8-9]\d|100</field>
<description>CPU usage is high: $(cpu_usage_%)%</description>
<options>no_full_log</options>
</rule>
<!-- High disk usage -->
<rule id="100057" level="12" frequency="2" timeframe="300" ignore="7200">
<if_sid>100054</if_sid>
<field name="disk_usage_%" type="pcre2">[8-9]\d|100</field>
<description>Disk space is running low: $(disk_usage_%)%</description>
<options>no_full_log</options>
</rule>
<!-- Load average check -->
<rule id="100058" level="3">
<decoded_as>load_average_check</decoded_as>
<description>load average metrics</description>
</rule>
<!-- memory check -->
<rule id="100059" level="3">
<decoded_as>memory_check</decoded_as>
<description>Memory metrics</description>
</rule>
<!-- Disk check -->
<rule id="100060" level="3">
<decoded_as>disk_check</decoded_as>
<description>Disk metrics</description>
</rule>
</group>
Issue
When one agent triggers the alert, email is sent successfully
If another agent exceeds the threshold within the ignore window, no email is sent within 2 hours (example: agent 1 disk storage up to 80% alert got within 2 hours agent 2 disk storage up to 80% not alert got)
If not used ignore continues mail receive that i not want
I understand ignore works globally per rule ID, but I need alerts to work independently per agent
Questions
Is there a supported way to suppress repeated alerts per agent instead of globally?
Is the recommended approach to avoid ignore for performance metrics and rely on frequency + timeframe?
Are there best practices for disk / CPU / memory alerting in multi-agent environments?
Any guidance or recommended design patterns would be appreciated.