Google Cloud Monitoring: Add an alert if Publish succeeds and subscribe fails

Yash Narvaneni

Oct 4, 2016, 9:56:26 PM
to Google Cloud Pub/Sub Discussions

I want to add an alert in Google Cloud Monitoring such that, for a given topic and subscription, I am notified when messages are being published to the topic but the subscription is not acknowledging them at the same or a similar rate over a given time frame.

How do we achieve that using Alerts in Google Cloud Monitoring or StackDriver?


I have tried an approach where I have 2 conditions to satisfy:

  1. If publish operations > 0.016/sec for 2 minutes (meaning at least one publish per minute)
  2. If subscribe acknowledgments < 0.001/sec for 2 minutes (meaning no subscribe acknowledgments in 2 minutes)

Then, alert.

What happens is that, during low load, if there are no publishes for a span of, say, 3 minutes and then a publish happens, both conditions 1 and 2 evaluate to true and the devs are alerted about this as a failure.
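The false alarm described above can be reproduced with a toy model. This is only a sketch: it assumes one sample per minute, evaluates both conditions on each sample independently, and ignores the 2-minute duration, since the thread does not say how Stackdriver actually evaluates it. The traffic pattern and the names `both_conditions_met` and `SAMPLE_SECONDS` are hypothetical.

```python
PUBLISH_THRESHOLD = 0.016  # publishes per second, from condition 1
ACK_THRESHOLD = 0.001      # acknowledgments per second, from condition 2
SAMPLE_SECONDS = 60        # assumed sampling interval (not stated in the thread)


def both_conditions_met(publishes, acks):
    """Return, per sample, whether both rate conditions hold at the same time."""
    results = []
    for published, acked in zip(publishes, acks):
        publish_rate = published / SAMPLE_SECONDS
        ack_rate = acked / SAMPLE_SECONDS
        results.append(
            publish_rate > PUBLISH_THRESHOLD and ack_rate < ACK_THRESHOLD
        )
    return results


# Hypothetical low-load traffic: three idle minutes, then a single publish
# whose acknowledgment is recorded in the following minute.
publishes = [0, 0, 0, 1, 0]
acks = [0, 0, 0, 0, 1]
print(both_conditions_met(publishes, acks))
# [False, False, False, True, False]
```

In this model the single publish satisfies both conditions for one sample even though the message is acknowledged a minute later, which matches the symptom described.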

So, what is the right way of designing such alerts?

If my approach is close to what I want, the next questions that come to my mind are,

  1. Is there a way to start counting the two minutes from the instant the publish happens, and then check whether the acknowledgment condition is satisfied?
  2. Or, is there a way to make the alert wait for 2-3 minutes to see if the incident resolves, and only then send an alert to the devs?
  3. Or, is there a way to count the occurrences of these conditions being satisfied and alert only if there are more than 5 or 10 occurrences in a span of 15 minutes, or something like that?
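The third option could be sketched outside the monitoring system as a sliding-window counter. The thread does not say whether Stackdriver supports this natively, so the following is only an illustration; the function `should_alert` and its parameters `window` and `min_count` are hypothetical, and one sample per minute is assumed.

```python
from collections import deque


def should_alert(violations, window=15, min_count=5):
    """Return, per minute, whether at least `min_count` of the most recent
    `window` minutes were violations."""
    recent = deque(maxlen=window)  # old samples fall off automatically
    decisions = []
    for violated in violations:
        recent.append(bool(violated))
        decisions.append(sum(recent) >= min_count)
    return decisions


# One isolated violation in 15 minutes never reaches the count.
isolated = [False] * 5 + [True] + [False] * 9
print(any(should_alert(isolated)))  # False

# Six consecutive violations trip the alert from the fifth one onward.
print(should_alert([True] * 6))
# [False, False, False, False, True, True]
```

Counting occurrences this way trades detection latency for fewer pages: a real outage is reported only after `min_count` bad minutes have accumulated.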

Sorry for the long post. But, any kind of help is appreciated.

Kir Titievsky

Oct 5, 2016, 10:15:42 AM
to Yash Narvaneni, Google Cloud Pub/Sub Discussions
Hi, Yash,

Thanks for the question! Have you considered alerting on the subscription's "Unacknowledged Messages" metric? If this number is large or growing really fast, you are publishing faster than you are ack'ing.
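To illustrate why the backlog captures the publish/ack mismatch in a single number, here is a toy model. It is not how Pub/Sub computes the metric; it only assumes the backlog is the running total of publishes minus acknowledgments, and the function `backlog_series` and the sample traffic are hypothetical.

```python
def backlog_series(publishes, acks, initial=0):
    """Return the running count of unacknowledged messages per interval."""
    backlog = []
    current = initial
    for published, acked in zip(publishes, acks):
        # The backlog cannot go below zero.
        current = max(0, current + published - acked)
        backlog.append(current)
    return backlog


# Healthy subscriber: acknowledgments keep pace with publishes.
print(backlog_series([5, 5, 5, 5], [5, 5, 5, 5]))  # [0, 0, 0, 0]

# Stalled subscriber: acknowledgments stop after the first interval.
print(backlog_series([5, 5, 5, 5], [5, 0, 0, 0]))  # [0, 5, 10, 15]
```

A stalled subscriber shows up as a steadily growing number, without having to correlate two separate rate conditions.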

Kir Titievsky | Product Manager | Google Cloud Pub/Sub

--
You received this message because you are subscribed to the Google Groups "Google Cloud Pub/Sub Discussions" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cloud-pubsub-discuss+unsub...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cloud-pubsub-discuss/0d33041e-508e-49ba-ab00-f5afa98e0126%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.




Yash Narvaneni

Oct 7, 2016, 1:54:27 PM
to Kir Titievsky, Google Cloud Pub/Sub Discussions
Hi Kir,
I created an alert for a specific subscription with the condition that, if the unacknowledged messages increase at a rate of 20% over 15 minutes, I get alerted. How does this work? I am assuming that the alert calculates the rate of increase of unacknowledged messages every 15 minutes to see if there is any increase and, if so, alerts. Am I correct? If not, let me know how it works. Most of the alerts I am getting are false alarms, like the one below.

How is a change from 0 to 1 represented as a rate? Does Google Cloud Monitoring consider it 100%, or something else? I ask because I am getting alerts like the following:
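The thread does not say how Stackdriver handles a zero baseline. Arithmetically, though, a percent change from 0 is undefined, and at small backlogs any change is a large percentage. A minimal sketch of the plain arithmetic (the helper `percent_change` is hypothetical and is not Stackdriver's definition):

```python
def percent_change(old, new):
    """Return the percent change from `old` to `new`, or None when the
    baseline is zero and the ratio is undefined."""
    if old == 0:
        return None
    return (new - old) * 100.0 / old


print(percent_change(0, 1))      # None: no meaningful percentage from a zero baseline
print(percent_change(1, 2))      # 100.0: one extra message doubles the backlog
print(percent_change(100, 120))  # 20.0: the case the 20% threshold was meant for
```

With a backlog that hovers between 0 and a handful of messages, a 20% threshold is crossed by almost any fluctuation.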

Google Stackdriver has detected that one of your resources has entered an alert state.

Summary: Unacknowledged Messages for user-create~history is increasing rate of 20 with a value of 0 

Violation Began: 2016-10-07 17:40:11 UTC

View violation details

Document:


Events are not getting acknowledged. The Unacknowledged messages are increasing at a rate of 20% for 15 minutes

and the graph on the metric looks something like, 

Inline image 1

I don't want to track these spikes. I want to track whether messages go unacknowledged for a long duration while messages are still being published to the respective topic.


Regards
Yash



Yash Narvaneni

Oct 7, 2016, 2:25:34 PM
to Google Cloud Pub/Sub Discussions, k...@google.com
Hi Kir, do you think that if I add a condition that "Unacknowledged Messages" is greater than 0 for the last 15 minutes, these false alarms can be avoided?

Regards
Yash

Kir Titievsky

Oct 9, 2016, 10:54:53 PM
to Yash Narvaneni, Google Cloud Pub/Sub Discussions
Yash, 

I think rate-of-change metrics are not very meaningful when you are dealing with such low volumes. I could help you track down the specific definitions used in the corner cases, but perhaps a question to the Stackdriver experts on Stack Overflow will get you there faster.

Why not start with the absolute number of messages (metric threshold condition on "Unacknowledged messages" for a subscription)?
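As a rough sketch, a threshold condition of this kind might look like the following. It assumes the JSON shape of an alert policy in the present-day Cloud Monitoring API v3, which postdates this thread, so the field names should be checked against the current documentation. The metric type `pubsub.googleapis.com/subscription/num_undelivered_messages` is assumed to be the one behind "Unacknowledged messages"; the threshold of 0, the 15-minute duration, and the helper `build_backlog_policy` are illustrative choices, not recommendations from the thread.

```python
import json

# Subscription named in the alert email earlier in the thread.
SUBSCRIPTION_ID = "user-create~history"


def build_backlog_policy(subscription_id, threshold=0, duration_seconds=900):
    """Build an alert policy body (assumed API v3 shape) that fires when the
    backlog stays above `threshold` for `duration_seconds`."""
    metric_filter = (
        'metric.type="pubsub.googleapis.com/subscription/num_undelivered_messages" '
        'AND resource.type="pubsub_subscription" '
        f'AND resource.labels.subscription_id="{subscription_id}"'
    )
    return {
        "displayName": f"Backlog on {subscription_id}",
        "combiner": "OR",
        "conditions": [
            {
                "displayName": "Unacknowledged messages above threshold",
                "conditionThreshold": {
                    "filter": metric_filter,
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold,
                    # The condition must hold this long before it fires,
                    # which filters out short spikes.
                    "duration": f"{duration_seconds}s",
                    "aggregations": [
                        {"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN"}
                    ],
                },
            }
        ],
    }


print(json.dumps(build_backlog_policy(SUBSCRIPTION_ID), indent=2))
```

The duration is what separates a sustained backlog from a brief spike: a single message that is acknowledged within a minute or two never keeps the value above the threshold for the full 15 minutes.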

Let me know if this still does not get you unstuck and I could try to loop in some Stackdriver pros.

Thanks for using Pub/Sub!

k


