GCE alert policy that checks whether all nodes meet a condition and sends only one alert?


David Poulin

Oct 8, 2021, 10:35:12 AM
to gce-discussion
Hi guys,

We have a GCP project that contains 3 instances that we can describe as workers. All 3 nodes are 99.9% identical and act as a cluster, so the workload is balanced across all 3 nodes.

If one node is busy, there is no impact on clients. If 2 nodes are busy, that should be a warning that something is going wrong. If all 3 nodes are busy, applications may be impacted and performance degradation may occur.

For now, in GCP Monitoring, we can send an alert for each node individually, or we can use a multiple-condition trigger, but that also sends multiple alerts.

So, we need to create a general alert for our application support team that is triggered only when CPU utilization is > 90% on all 3 nodes, and that sends only 1 alert.

Is there a way to do that in GCP? If so, how?


Ahmad P - Cloud Platform Support

Oct 12, 2021, 1:51:15 PM
to gce-discussion

Hello David,


I was unable to find a way to do this. You can file a feature request with our Monitoring product team on the Google public Issue Tracker [1]. At this time, I am not able to provide any ETA or guarantee its implementation.


[1] https://cloud.google.com/support/docs/issue-trackers

Ahmad P - Cloud Platform Support

Oct 13, 2021, 9:16:45 AM
to gce-discussion

I found a way to do this.

When you define the alert, use these settings:

Resource type: VM instance
Metric: CPU utilization
Filter: instance_id =~ firstVMid|secondVMid|thirdVMid
Period: 1 minute

Configuration:

Condition triggers if: All time series violate
Condition: is above
Threshold: 0.9
For: most recent

Don't forget to put a pipe (|) between the instance IDs and to select "All time series violate".



MQL:

fetch gce_instance
| metric 'compute.googleapis.com/instance/cpu/utilization'
| filter (resource.instance_id =~ 'ID1|ID2|ID3')
| group_by 1m, [value_utilization_mean: mean(value.utilization)]
| every 1m
| condition val() > 0.9 '10^2.%'

Ahmad P - Cloud Platform Support

Oct 13, 2021, 9:45:06 AM
to gce-discussion
The condition provided will send only 1 alert if all 3 nodes meet the condition.

Please note that the pipe (|) does act like an OR: we use it only to include all 3 VMs in the filter. The setting Condition triggers if: "All time series violate" works as the AND, meaning that 1 alert is sent only when all 3 VMs meet the condition.
If you use Condition triggers if: "Any time series violates" instead, you will get alerts as soon as any one of the VMs (not all of them) meets the condition.
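
To make the OR/AND distinction concrete, here is a minimal Python sketch of the two trigger modes as described above. It only models the logic and does not call any Cloud Monitoring API. The function name, the VM names, and the CPU readings are all hypothetical.

```python
from typing import Mapping


def condition_triggers(cpu_by_vm: Mapping[str, float],
                       threshold: float = 0.9,
                       mode: str = "all") -> bool:
    """Model the 'All' / 'Any time series violate' trigger modes.

    cpu_by_vm maps a VM name to its latest CPU utilization (0.0 to 1.0).
    The regex filter with pipes only decides WHICH VMs are in this mapping.
    """
    if not cpu_by_vm:
        return False  # no time series selected, nothing can violate
    violations = [value > threshold for value in cpu_by_vm.values()]
    if mode == "all":
        return all(violations)  # AND across the selected time series
    if mode == "any":
        return any(violations)  # OR across the selected time series
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical readings for the three worker nodes.
two_busy = {"vm-1": 0.95, "vm-2": 0.97, "vm-3": 0.40}
three_busy = {"vm-1": 0.95, "vm-2": 0.97, "vm-3": 0.93}

print(condition_triggers(two_busy, mode="all"))
print(condition_triggers(two_busy, mode="any"))
print(condition_triggers(three_busy, mode="all"))
```

With two busy nodes, only the "any" mode triggers. With three busy nodes, both modes do.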

David Poulin

Oct 13, 2021, 10:49:02 AM
to Ahmad P - Cloud Platform Support, gce-discussion
Is there any other logic that could be implemented in Google Cloud to take multiple alerts and combine them into one?

For example, if we use Pub/Sub and all 3 alerts (1 per node) arrive, could we just combine them and send the combined one?

Or some kind of sink that would filter them...

Thanks
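
One way to picture the "combine them" idea is the small Python sketch below. It is only an illustration under stated assumptions: the class name AlertCombiner, the node names, the 120-second window, and the message text are hypothetical, and the state lives in memory. A real Pub/Sub-triggered function would probably need a shared store for that state, since function instances are not guaranteed to share memory.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class AlertCombiner:
    """Collapse per-node notifications into one combined message."""
    expected_nodes: Set[str]
    window_seconds: float = 120.0  # hypothetical grouping window
    _seen: Dict[str, float] = field(default_factory=dict, init=False)

    def add(self, node: str, timestamp: float) -> Optional[str]:
        """Record one notification.

        Returns a combined message once every expected node has reported
        within the window, otherwise None.
        """
        if node not in self.expected_nodes:
            return None  # ignore nodes outside the cluster
        # Drop notifications that are too old to belong to the same event.
        self._seen = {n: t for n, t in self._seen.items()
                      if timestamp - t <= self.window_seconds}
        self._seen[node] = timestamp
        if set(self._seen) == self.expected_nodes:
            self._seen.clear()  # start fresh for the next event
            nodes = ", ".join(sorted(self.expected_nodes))
            return f"CPU above threshold on all nodes: {nodes}"
        return None


combiner = AlertCombiner({"vm-1", "vm-2", "vm-3"})
for i, node in enumerate(["vm-1", "vm-2", "vm-3"]):
    message = combiner.add(node, timestamp=float(i))
    if message is not None:
        print(message)  # printed once, after the third notification
```

The first two notifications are swallowed and the third one produces the single combined message.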

--
© 2018 Google Inc. 1600 Amphitheatre Parkway, Mountain View, CA 94043
 
Email preferences: You received this email because you signed up for the Google Compute Engine Discussion Google Group (gce-dis...@googlegroups.com) to participate in discussions with other members of the Google Compute Engine community and the Google Compute Engine Team.
---
You received this message because you are subscribed to a topic in the Google Groups "gce-discussion" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/gce-discussion/f84qVWyq_sE/unsubscribe.
To unsubscribe from this group and all its topics, send an email to gce-discussio...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gce-discussion/328da053-ea32-4d4f-b2f6-1f7f4d030a94n%40googlegroups.com.


--
David Poulin
Administrateur de systèmes / Systems administrator
Groupe Canam / Canam Group
T438 940-9339

David Poulin

Oct 13, 2021, 10:49:30 AM
to Ahmad P - Cloud Platform Support, gce-discussion
Hi,

Thanks for the answer.

Just to be sure: the condition provided will send only 1 alert if all 3 nodes meet the condition?

Normally the pipe (|) acts like an OR, so from my understanding a notification will be sent if 1 node out of the group of 3 meets the condition, and not only if all 3 do.

Can "&" be used to get an AND operation?

Thanks


David Poulin

Oct 13, 2021, 10:49:31 AM
to Ahmad P - Cloud Platform Support, gce-discussion
OK, thanks for the clarification.

I will give it a try.


Ahmad P - Cloud Platform Support

Oct 13, 2021, 11:52:30 AM
to gce-discussion
Example: [screenshot attachment: Alert.png]

David Poulin

Oct 13, 2021, 6:00:31 PM
to Ahmad P - Cloud Platform Support, gce-discussion
Hi,

First I switched to the legacy GUI to be able to compare with your screenshot.

The only difference was the "most recent value" setting.

The other options seem to be the same on my side too.

David Poulin

Oct 13, 2021, 6:01:43 PM
to Ahmad P - Cloud Platform Support, gce-discussion
Hi,

I did some tests following your instructions, but something still seems to be wrong on my side. Maybe I missed something.

Here are a few screenshots of my alert policy:

[screenshot: image.png]
[screenshot: image.png]
[screenshot: image.png]
Value: XXXXX|XXXXX|YYYY (I followed the same pattern as you)
[screenshot: image.png]

The trigger condition seems to be good, as no notification is sent until the CPU of all 3 nodes is over 3% (for test purposes here), but the result is 3 notifications.

It creates 3 incidents, 3 messages to Pub/Sub via the Pub/Sub notification channel, and 3 Cloud Function executions to send to Google Chat.

Do you see anything incorrect in my test alert?

Thanks

Digil (Google Cloud Platform Support)

Oct 14, 2021, 4:33:15 PM
to gce-discussion
If you create the policy by using the Google Cloud Console, then the default behavior is to send a notification when the condition is met.

As mentioned by the other community support member in one of his suggestions, I believe the only way to achieve this is by defining the alerting policy with MQL. If that does not give you the expected result, it might be a bug in the product, and I would recommend reporting it through the issue tracker.

David Poulin

Oct 15, 2021, 1:10:40 PM
to Digil (Google Cloud Platform Support), gce-discussion
Hi guys,

Here is my MQL:

fetch gce_instance
| metric 'compute.googleapis.com/instance/cpu/utilization'
| filter (resource.instance_id =~ '1443841796196913531|5538148825718182650|865168989707161797')
| group_by 1m, [value_utilization_mean: mean(value.utilization)]
| every 1m
| condition val() > 0.03 '10^2.%'

That is 99% the same as the one provided earlier, except for the IDs and the trigger value.

In my case, no notification is sent until all 3 nodes are over 3% CPU (for test purposes), but when it triggers, 3 notifications are sent (1 per node).

The only thing missing is the single alert.

Is there something else I could check?

Thanks
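
The test above produced one incident per violating time series. One idea that was not tried in this thread is to collapse the three series into a single series before applying the threshold, for example by taking the minimum across nodes: the minimum exceeds the threshold exactly when every node does. The Python sketch below illustrates only that arithmetic. The function names, VM names, and readings are hypothetical, and whether Cloud Monitoring would then open a single incident is an assumption that would need to be verified.

```python
from typing import List, Mapping


def violating_series(cpu_by_vm: Mapping[str, float],
                     threshold: float) -> List[str]:
    """Per-node evaluation: every node above the threshold violates."""
    return sorted(vm for vm, value in cpu_by_vm.items() if value > threshold)


def combined_series_violates(cpu_by_vm: Mapping[str, float],
                             threshold: float) -> bool:
    """Single-series evaluation: compare min across nodes to the threshold."""
    return bool(cpu_by_vm) and min(cpu_by_vm.values()) > threshold


# Hypothetical readings matching the 3% test threshold used above.
all_over = {"vm-1": 0.05, "vm-2": 0.07, "vm-3": 0.04}
one_under = {"vm-1": 0.05, "vm-2": 0.07, "vm-3": 0.02}

print(len(violating_series(all_over, 0.03)))      # three separate violations
print(combined_series_violates(all_over, 0.03))   # one combined result
print(combined_series_violates(one_under, 0.03))  # one node is below 3%
```

Evaluated per node, three series violate. Evaluated on the combined minimum, there is a single result to alert on.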
