Stackdriver pod uptime monitoring false positive


Roman Rusakov

Mar 25, 2019, 11:15:46 AM
to Google Stackdriver Discussion Forum
Hello,

We have a problem with GKE pod uptime monitoring. The pod shows as running in `kubectl get pods`, and the service inside it is working normally (it produces logs, etc.).
However, the pod uptime alert is still active.

Also, the uptime metric in the alert description is displayed strangely (as a rate), and its value cannot be seen. Maybe this is expected, since the alert condition is "Violates when: Any container.googleapis.com/container/uptime stream is absent for greater than 5 minutes", but it still looks odd; one would expect a value there such as 0 or 1.
In any case, according to this metric everything is fine: it shows about 1 r/s the whole time.
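For reference, a condition like the one quoted above is what the Monitoring API v3 calls a metric-absence condition (`conditionAbsent`). The sketch below builds such a policy body as a plain Python dict. It assumes the legacy `gke_container` resource type and a 60-second `ALIGN_RATE` aligner; the display names are hypothetical, and the actual policy from this thread may differ. If the policy does use a rate aligner, that might explain the "rate" view: the metric counts seconds of uptime, so its rate of change is about 1 per second rather than a 0/1 value.

```python
import json

UPTIME_METRIC = "container.googleapis.com/container/uptime"


def build_absence_policy(metric_type: str, absent_seconds: int) -> dict:
    """Sketch of an AlertPolicy body that fires when any matching stream goes absent.

    Assumptions (not from the thread): display names, the gke_container
    resource type, and the ALIGN_RATE aligner with a 60s alignment period.
    """
    return {
        "displayName": "GKE container uptime absent",  # hypothetical name
        "combiner": "OR",
        "conditions": [
            {
                "displayName": f"{metric_type} absent",  # hypothetical name
                "conditionAbsent": {
                    # Assumed legacy GKE resource type for container.* metrics.
                    "filter": f'metric.type="{metric_type}" AND resource.type="gke_container"',
                    # Durations are strings of seconds, e.g. "300s" for 5 minutes.
                    "duration": f"{absent_seconds}s",
                    "aggregations": [
                        {
                            "alignmentPeriod": "60s",
                            # A rate aligner is an assumption; it would turn
                            # seconds-of-uptime into roughly 1 per second.
                            "perSeriesAligner": "ALIGN_RATE",
                        }
                    ],
                },
            }
        ],
    }


policy = build_absence_policy(UPTIME_METRIC, 5 * 60)
print(json.dumps(policy, indent=2))
```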

Right before the alert appeared, I restarted the pod by scaling it down to 0 and then back up to 1, which succeeded. This could be related.
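One possible explanation, which is an assumption and is not confirmed anywhere in this thread: if each uptime stream is keyed by pod name, scaling to 0 and back to 1 creates a pod with a new name. The new stream then reports normally, while the old pod's stream simply stops and counts as absent. The toy simulation below illustrates that per-stream check with hypothetical pod names (`app-old`, `app-new`). It is not how Stackdriver actually evaluates conditions, and it ignores details such as how long a vanished stream keeps being checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    pod_name: str  # assumed label that distinguishes one stream from another
    t: int         # timestamp in seconds


def absent_streams(points, now, window):
    """Return the streams whose most recent point is older than `window` seconds."""
    last_seen = {}
    for p in points:
        last_seen[p.pod_name] = max(last_seen.get(p.pod_name, p.t), p.t)
    return sorted(name for name, t in last_seen.items() if now - t > window)


# Hypothetical timeline: the old pod reports every 60s until t=600,
# the deployment is scaled to 0, and a new pod reports from t=660 onwards.
points = [Point("app-old", t) for t in range(0, 601, 60)]
points += [Point("app-new", t) for t in range(660, 1501, 60)]

# With a 5-minute window, only the old pod's stream is considered absent.
print(absent_streams(points, now=1500, window=300))  # ['app-old']
```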

Is this a bug or I should check some configuration settings? 

Thanks, 
Roman

Rory Petty

Mar 25, 2019, 11:16:47 AM
to Roman Rusakov, Kevin Miller, Nikki Oyekunbi, Javier Kohen, Google Stackdriver Discussion Forum

--
© 2016 Google Inc. 1600 Amphitheatre Parkway, Mountain View, CA 94043
 
Email preferences: You received this email because you signed up for the Google Stackdriver Discussion Google Group (google-stackdr...@googlegroups.com) to participate in discussions with other members of the GoogleStackdriver community.
---
You received this message because you are subscribed to the Google Groups "Google Stackdriver Discussion Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-stackdriver-d...@googlegroups.com.
To post to this group, send email to google-stackdr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-stackdriver-discussion/e396feab-07f9-4d68-bdb4-e897502d4759%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Nikki Oyekunbi

Mar 25, 2019, 11:25:36 AM
to Rory Petty, Roman Rusakov, Kevin Miller, Javier Kohen, Google Stackdriver Discussion Forum
Hello,
    This is often a source of confusion, but the "uptime" metrics do not come from Uptime Checks / Uptime Monitoring. Uptime Check metrics have the suffix "uptime_check/check_passed", etc. Please see https://cloud.google.com/monitoring/api/metrics_gcp#gcp-monitoring.
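To make the distinction concrete: Uptime Check metrics live under the `monitoring.googleapis.com/uptime_check/` prefix, while the metric in this alert is `container.googleapis.com/container/uptime`. A minimal check on the metric type string, assuming the prefix is the only marker that matters, might look like this (the helper name is hypothetical):

```python
UPTIME_CHECK_PREFIX = "monitoring.googleapis.com/uptime_check/"


def is_uptime_check_metric(metric_type: str) -> bool:
    """True only for metric types produced by Uptime Checks (prefix-based sketch)."""
    return metric_type.startswith(UPTIME_CHECK_PREFIX)


print(is_uptime_check_metric("container.googleapis.com/container/uptime"))  # False
print(is_uptime_check_metric("monitoring.googleapis.com/uptime_check/check_passed"))  # True
```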

     How was the alert set up? Could you please file feedback from the Stackdriver website and include details on the alert policy and project_id?

Thanks,
        Nikki (Uptime Checks team)

Roman Rusakov

Mar 25, 2019, 11:49:26 AM
to Google Stackdriver Discussion Forum
The alert was set up manually in the console. I've sent feedback as you advised and included a link to this thread.

Thank you!
