Kubernetes dashboard - Incorrect memory usage stats vs GCP console


Tiaan Swemmer

May 2, 2019, 3:23:45 AM
to Google Stackdriver Discussion Forum
Hello!

I have a Stackdriver policy that fires an incident for high memory usage on Kubernetes workloads: "Violates when: Any kubernetes.io/container/memory/request_utilization stream is above a threshold of 1 for greater than 5 minutes". There are active incidents for some workloads, but when I investigate via the GCP console, the memory usage appears to be within limits. Please find screenshots from both Stackdriver and the GCP console below.
Not sure where the discrepancy in memory stats comes from.

Stackdriver: (205% memory usage)
Stackdriver-Console-resize.png


GCP: (used memory at 45% of limit)
K8S-Console-resize.png


Thank you
Tiaan

Tiaan Swemmer

May 9, 2019, 5:23:55 AM
to Google Stackdriver Discussion Forum
Hello! Sorry for bumping my own post. Does anyone have insight into the memory stats discrepancy between Stackdriver and the GCP console?

T

Ruxanda Danetiu

May 9, 2019, 6:30:16 AM
to Tiaan Swemmer, Shoucong Chen, Abhay Mujumdar, Google Stackdriver Discussion Forum
+Shoucong Chen +Abhay Mujumdar can you please look into this discrepancy?


linux.il

May 9, 2019, 6:53:58 AM
to Ruxanda Danetiu, Tiaan Swemmer, Shoucong Chen, Abhay Mujumdar, Google Stackdriver Discussion Forum
Aren't these two different metrics: usage against *request* and usage against *limit*?

Tiaan Swemmer

May 9, 2019, 7:45:27 AM
to linux.il, Ruxanda Danetiu, Shoucong Chen, Abhay Mujumdar, Google Stackdriver Discussion Forum
Hello!

Thank you for the reply. Following your response I reviewed the metrics again, and you are correct: the policy tracks usage against the request, not against the limit. The metric is described as "The fraction of the requested memory that is currently in use on the instance". Should I rather use "Metric: kubernetes.io/container/memory/limit_utilization" to track pods/workloads?

I am perhaps then missing how the percentages are calculated. Below is an image of the (1) current workload/pod and the (2) Policy view/chart - not sure how the (3) "204% total memory usage" is calculated.

GCP
image.png

Stackdriver Policy (used vs requested at approx. 400% and correlates to the above image for used/requested)
image.png

Stackdriver Kubernetes (beta) - shows 204% and incident for policy was fired.
image.png

Thanks!
Tiaan

linux.il

May 9, 2019, 8:04:51 AM
to Tiaan Swemmer, Ruxanda Danetiu, Shoucong Chen, Abhay Mujumdar, Google Stackdriver Discussion Forum
Stackdriver doesn't provide an absolute metric for a container's CPU and RAM usage. You can get it relative to either the requested value or the limit.
So, for instance, with 500MB usage, 250MB requested and a 1GB limit, you will get a 200% request_utilization metric and a 50% limit_utilization one.
As long as you choose the same metric in both systems, you should see consistent numbers.
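
For readers who want to check the arithmetic, here is a minimal sketch of the example above. The function and variable names are illustrative only, 1GB is taken as 1000MB, and the threshold check ignores the "for greater than 5 minutes" part of the policy; it is not how Stackdriver itself computes or evaluates the metrics.

```python
# Illustrative only: names are hypothetical, not part of any Stackdriver API.
MB = 1000 * 1000  # assumption: decimal megabytes, so 1GB == 1000MB


def utilization(used_bytes, requested_bytes, limit_bytes):
    """Return (request_utilization, limit_utilization) as fractions."""
    return used_bytes / requested_bytes, used_bytes / limit_bytes


request_util, limit_util = utilization(500 * MB, 250 * MB, 1000 * MB)

# The policy in the first post uses a threshold of 1 on request_utilization,
# i.e. it is exceeded whenever usage is above the *request* (duration ignored here).
exceeds_policy_threshold = request_util > 1.0

print(f"request_utilization = {request_util:.0%}")
print(f"limit_utilization   = {limit_util:.0%}")
print(f"above threshold of 1: {exceeds_policy_threshold}")
```

So the same container can be far above the policy threshold on request_utilization while sitting at half of its limit.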

Shoucong Chen

May 9, 2019, 8:53:28 AM
to linux.il, Tiaan Swemmer, Ruxanda Danetiu, Abhay Mujumdar, Google Stackdriver Discussion Forum
Hi Tiaan and linux.il,

Thanks a lot for contacting us! And especially thanks to linux.il for providing helpful answers!
Yes, it is the difference between "request utilization" and "limit utilization".
Please let me know if you have additional questions!

PS. Stackdriver has absolute CPU and memory metrics ("kubernetes.io/container/cpu/core_usage_time" and "kubernetes.io/container/memory/used_bytes"), but we don't display them by default in our UI, because they are less intuitive and less common than request utilization and limit utilization.

Best Regards,
Shoucong Chen
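
As a rough illustration of how the absolute metrics named in the PS might be selected, the sketch below builds a Monitoring filter string for a given metric type. The helper name is hypothetical, and the `k8s_container` resource type is an assumption that does not appear in this thread; please verify it against your own project before relying on it.

```python
# Hypothetical helper; only the metric type names come from the thread.
def container_metric_filter(metric_type):
    """Build a Monitoring filter string for one container metric type."""
    # Assumption: the metrics are written against the "k8s_container" resource.
    return f'metric.type = "{metric_type}" AND resource.type = "k8s_container"'


memory_filter = container_metric_filter("kubernetes.io/container/memory/used_bytes")
cpu_filter = container_metric_filter("kubernetes.io/container/cpu/core_usage_time")

print(memory_filter)
print(cpu_filter)
```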

Tiaan Swemmer

May 9, 2019, 9:27:33 AM
to Shoucong Chen, linux.il, Ruxanda Danetiu, Abhay Mujumdar, Google Stackdriver Discussion Forum
Thanks linux.il and Shoucong!

Regards,
Tiaan