Need help triggering an alert on the sum of errors over longer alignment intervals for a periodic cronjob


Shah Akram

Sep 17, 2018, 2:25:35 PM
to Google Stackdriver Discussion Forum
Scenario:

I have a cronjob that runs every 11 minutes, and a user-defined logs-based metric that counts the number of errors logged by that job.

I was hoping to create an alert condition that is met when the number of errors exceeds a certain threshold over, say, 15 minutes.

This is currently what the 1-minute interval graph is looking like:

Screenshot 2018-09-17 11.10.45.png

If I change the window to 1 week, the change interval shifts to 1 hour, which looks like the interval I'd want:

Screenshot 2018-09-17 11.06.24.png

I'm not sure how to increase the change interval or if I'm going about this the right way. Any ideas would be much appreciated! Thank you.

Summit Tuladhar

Sep 17, 2018, 2:41:00 PM
to sh...@slytrunk.com, Google Stackdriver Discussion Forum
Hi Akram,
To create an alert condition that alerts on the SUM of counts over a custom alignment period, you will need to use the Alerting API. This cannot currently be configured in the UI; the feature will be added soon.

To create it using the gcloud CLI, you can follow the steps in https://cloud.google.com/monitoring/alerts/using-alerting-api#api-create-policy

Using gcloud, you can perform an ALIGN_SUM over a custom alignment_period: https://cloud.google.com/monitoring/api/ref_v3/rest/v3/projects.alertPolicies#Aggregation
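
To make the effect of ALIGN_SUM concrete, here is a rough Python sketch of what summing per-minute counts over a 15-minute alignment period looks like. It is a simplified model and not how Stackdriver evaluates conditions internally; the `align_sum` helper, the sample counts (4 errors per run) and the threshold of 5 are all made up for illustration.

```python
from collections import defaultdict


def align_sum(points, alignment_period_s):
    """Sum (timestamp_s, value) points into fixed windows of alignment_period_s seconds.

    Hypothetical helper: a simplified stand-in for ALIGN_SUM, not a Stackdriver API.
    """
    buckets = defaultdict(int)
    for timestamp_s, value in points:
        # Map each point to the start of the window that contains it.
        window_start = (timestamp_s // alignment_period_s) * alignment_period_s
        buckets[window_start] += value
    return dict(sorted(buckets.items()))


# Assumed sample data: one point per minute for an hour, where a job that
# runs every 11 minutes logs 4 errors per run and nothing in between.
points = [(minute * 60, 4 if minute % 11 == 0 else 0) for minute in range(60)]

aligned = align_sum(points, alignment_period_s=900)  # 15-minute windows
threshold = 5  # assumed threshold, for illustration only
breaching = [start for start, total in aligned.items() if total > threshold]

print(aligned)    # {0: 8, 900: 4, 1800: 8, 2700: 4}
print(breaching)  # [0, 1800]
```

With 1-minute points no single value exceeds 5, but the 15-minute sums do whenever two runs land in the same window, which is why the alignment period matters for a job that only runs every 11 minutes.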

Here's an example alert policy:
```
{
  "alertPolicies": [
    {
      "name": "projects/my-project/alertPolicies/9096429217245850895",
      "displayName": "test",
      "combiner": "OR",
      "conditions": [
        {
          "conditionThreshold": {
            "filter": "metric.type=\"logging.googleapis.com/user/go_bananas\" AND resource.type=\"global\"",
            "comparison": "COMPARISON_GT",
            "thresholdValue": 10,
            "duration": "0s",
            "trigger": {
              "count": 1
            },
            "aggregations": [
              {
                "alignmentPeriod": "600s",
                "perSeriesAligner": "ALIGN_SUM",
                "crossSeriesReducer": "REDUCE_SUM"
              }
            ]
          },
          "displayName": "More than 10 bananas in 10 minutes",
          "name": "projects/my-project/alertPolicies/9096429217245850895/conditions/9096429217245849694"
        }
      ],
      "notificationChannels": [
        "projects/my-project/notificationChannels/4539285974347733078"
      ],
      "enabled": true
    }
  ]
}
```
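
The JSON above is in the shape of a list response, with server-assigned `name` fields. As a minimal sketch, assuming those `name` fields are left out when creating a new policy, the following Python builds a single policy body with a 15-minute alignment period and writes it to a file. The function name, the output filename and the threshold are placeholders, and notification channels are omitted; the file could then be passed to the gcloud command described on the page linked above (check there for the exact flags).

```python
import json


def build_sum_alert_policy(metric_type, threshold, alignment_period_s,
                           duration_s=0, display_name="Error sum alert"):
    """Build an alert policy body that thresholds on ALIGN_SUM of a metric.

    Hypothetical helper; field names follow the example policy above.
    """
    return {
        "displayName": display_name,
        "combiner": "OR",
        "conditions": [
            {
                "displayName": f"More than {threshold} in {alignment_period_s}s",
                "conditionThreshold": {
                    "filter": f'metric.type="{metric_type}" AND resource.type="global"',
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold,
                    "duration": f"{duration_s}s",
                    "trigger": {"count": 1},
                    "aggregations": [
                        {
                            # Sum the counts over the custom alignment period.
                            "alignmentPeriod": f"{alignment_period_s}s",
                            "perSeriesAligner": "ALIGN_SUM",
                            "crossSeriesReducer": "REDUCE_SUM",
                        }
                    ],
                },
            }
        ],
        "enabled": True,
    }


policy = build_sum_alert_policy(
    metric_type="logging.googleapis.com/user/go_bananas",
    threshold=10,
    alignment_period_s=900,  # 15 minutes
)

# Placeholder filename; point the CLI or API client at this file.
with open("policy.json", "w") as f:
    json.dump(policy, f, indent=2)
```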

--
© 2016 Google Inc. 1600 Amphitheatre Parkway, Mountain View, CA 94043
 
Email preferences: You received this email because you signed up for the Google Stackdriver Discussion Google Group (google-stackdr...@googlegroups.com) to participate in discussions with other members of the GoogleStackdriver community.
---
You received this message because you are subscribed to the Google Groups "Google Stackdriver Discussion Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-stackdriver-d...@googlegroups.com.
To post to this group, send email to google-stackdr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-stackdriver-discussion/5fc12856-6028-4fdb-9752-c72dcc3c0b32%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Shah Akram

Sep 18, 2018, 12:52:17 PM
to Google Stackdriver Discussion Forum
Hi Summit,

Thanks for replying to this post! I created the policy via the API. Please see the attached screenshot of the condition info graph.

The alignment period is 15 minutes. Per the graph, the last time we exceeded the threshold, the value stayed above it for 23 minutes. The condition duration is 12 minutes, so the condition should have been met. Any thoughts on the condition setup? Thanks.
Screenshot 2018-09-18 09.42.13.png

Summit Tuladhar

Sep 18, 2018, 1:11:33 PM
to Shah Akram, Google Stackdriver Discussion Forum
The alert condition looks ok to me. Did it not trigger a notification?


Shah Akram

Sep 18, 2018, 1:15:47 PM
to Google Stackdriver Discussion Forum
Nope, it didn't. Would it help if I privately sent you the alert policy ID?



ddu...@carix.org

Oct 1, 2018, 4:01:57 AM
to Google Stackdriver Discussion Forum
Have you managed to set up an alert condition that works? I've run into the same case, and the Stackdriver docs aren't helping much.