Feedback around kubelet's /stats/summary

866 views
Skip to first unread message

s...@box.com

unread,
Mar 9, 2016, 4:17:08 PM3/9/16
to kubernetes-sig-node
Hello!

First of all, the new /stats/summary endpoint is AWESOME. It's a huge step up from the kubelet prometheus metrics we've been using and saved me probably a dozen hours of trying to interpret and parse the return values of the original /stats endpoint. It's generally really well-designed and has been enjoyable to work with.

I'm now writing a scraper that takes /stats/summary and converts it into OpenTSDB metrics for nodes and pods, I came up with some questions and feedback. I'm happy to make PR's for all of the below but I'd appreciate guidance on what makes sense to do versus what might just be my misunderstandings.

1. I like to use LoadAverage a lot when determining the health of a system, so I was hoping to add LoadAverage to CPUStats - basically stats.CPUStats.LoadAverage would be set from cadvisorapiv2.ContainerStats.Cpu.LoadAverage. Any reason not to?

2. MemoryStats currently only contains UsageBytes and WorkingSetBytes but not RSS. I guess the questions I most care about is:
* Which one is the generous number that includes linux buffer cache/swap/etc.?
* Which one is the number where if it goes over the memory limit the container gets oomkilled? Or actually, if a container is given a hard memory limit and then goes over, does it get oomkilled or does it start swapping?
Basically I'd love more context on how UsageBytes and WorkingSetBytes are practically used.

3. I found the naming of UsageNanoCores and UsageCoreNanoSeconds very difficult to understand. Reading the descriptions of each, would we consider a rename to AverageCPUNanoSeconds and TotalCPUNanoSeconds respectively? 

4. UsageNanoCores has the comment "Total CPU usage (sum of all cores) averaged over the sample window." What is the sample window?

5. Network currently has Transfer and Errors - why not include the other cadvisor parameters Dropped and Packets? People can ignore data if they don't need it, but by cutting of access altogether you restrict their ability to get access to metrics that may be helpful for their debugging case.

Thanks again! I'd love to get these into 1.2, but if nothing else 3 would be ideal to do ASAP if the naming change makes sense, since this API hasn't been released yet.

-- 
Sam

Vishnu Kannan

unread,
Mar 9, 2016, 4:39:14 PM3/9/16
to s...@box.com, kubernetes-sig-node, Tim St. Clair, Dawn Chen
+Tim St. Clair, +Dawn Chen


On Wed, Mar 9, 2016 at 1:17 PM, sam via kubernetes-sig-node <kubernete...@googlegroups.com> wrote:
Hello!

First of all, the new /stats/summary endpoint is AWESOME. It's a huge step up from the kubelet prometheus metrics we've been using and saved me probably a dozen hours of trying to interpret and parse the return values of the original /stats endpoint. It's generally really well-designed and has been enjoyable to work with.
Great.. Thanks for sharing your feedback..

I'm now writing a scraper that takes /stats/summary and converts it into OpenTSDB metrics for nodes and pods, I came up with some questions and feedback. I'm happy to make PR's for all of the below but I'd appreciate guidance on what makes sense to do versus what might just be my misunderstandings.

1. I like to use LoadAverage a lot when determining the health of a system, so I was hoping to add LoadAverage to CPUStats - basically stats.CPUStats.LoadAverage would be set from cadvisorapiv2.ContainerStats.Cpu.LoadAverage. Any reason not to?
Cost of calculating the load averages might be a concern. I can't think of anything else. 

2. MemoryStats currently only contains UsageBytes and WorkingSetBytes but not RSS. I guess the questions I most care about is:
RSS should be added. cAdvisor support for RSS was merged while the summary API was being finalized.
 
* Which one is the generous number that includes linux buffer cache/swap/etc.?
UsageBytes includes cold pages. So that should be the biggest of the lot. 
 
* Which one is the number where if it goes over the memory limit the container gets oomkilled? Or actually, if a container is given a hard memory limit and then goes over, does it get oomkilled or does it start swapping?
WorkingSet is what the Kernel uses while dealing with OOMs. The kernel will attempt to swap out pages if possible, or get rid of some text pages, before giving up and OOMing a container. So the working set is usually lower with memory limits in place. 

Basically I'd love more context on how UsageBytes and WorkingSetBytes are practically used.
From an end user perspective, RSS will be the primary concern. Apart from that WorkingSetBytes getting close to Limit will be a secondary concern. 
If the API documentation doesn't convey this already, we should fix that.

3. I found the naming of UsageNanoCores and UsageCoreNanoSeconds very difficult to understand. Reading the descriptions of each, would we consider a rename to AverageCPUNanoSeconds and TotalCPUNanoSeconds respectively? 
We found that users have trouble parsing nanoseconds as CPU usage. A rename might be possible in the future. We were hoping to not have to rename/delete fields in the current API unless its absolutely necessary, to provide some level of compatibility to clients across kubelet versions. That said, this API is currently alpha grade.

4. UsageNanoCores has the comment "Total CPU usage (sum of all cores) averaged over the sample window." What is the sample window?
The window is tied to cAdvisor's housekeeping interval, which is 10s as of now. It is a sequential window.
 
 

5. Network currently has Transfer and Errors - why not include the other cadvisor parameters Dropped and Packets? People can ignore data if they don't need it, but by cutting of access altogether you restrict their ability to get access to metrics that may be helpful for their debugging case.
Those additional network metrics are cheap. So I don't see why we cannot add them. 
From a project goals perspective, cAdvisor exposes a lot of data. Our hope was to only surface what is absolutely necessary for running a kubernetes cluster. In the future, we want to only include a few features of cAdvisor in the kubelet. That would mean restricted set of metrics. Users will still be able to run cAdvisor as a standalone container. As part of that transition, Tim is planning on reusing the summary API in cAdvisor as well.

Thanks again! I'd love to get these into 1.2, but if nothing else 3 would be ideal to do ASAP if the naming change makes sense, since this API hasn't been released yet.
Can we solve `3` with better documentation? `Average` keyword doesn't convey the fact that its a sequential window. 

As for solving any of them in v1.2, I'd let Dawn comment on that.
 

-- 
Sam

--
You received this message because you are subscribed to the Google Groups "kubernetes-sig-node" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-sig-...@googlegroups.com.
To post to this group, send email to kubernete...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kubernetes-sig-node/b8ba797d-e685-46d6-b5de-c85cc6b865d6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Tim St. Clair

unread,
Mar 9, 2016, 6:15:05 PM3/9/16
to Vishnu Kannan, s...@box.com, kubernetes-sig-node, Dawn Chen
Thanks for the feedback, I'm glad the new API is proving useful. I think it is very unlikely that we can change anything other than bug fixes & documentation for 1.2 at this point, but as Vish pointed out we labeled this an alpah API, so we may consider changing it in a future release.

WRT adding additional stats to the API (Load, RSS, etc.), I think we need to figure out how we want to define it. Currently the the stats included in the summary are defined by Heapster's needs, so if we decide to include the stats in heapster we would also need to include them in the summary. We are currently seeing performance benefits from sending less data to Heapster, so I'm hesitant to add much more. That said, I think RSS should probably be added. One option we could consider for the future is providing a summary request API to select which fields are (not) included.

Dawn Chen

unread,
Mar 9, 2016, 7:09:35 PM3/9/16
to Tim St. Clair, Vishnu Kannan, Sam Ghods, kubernetes-sig-node
Sam, thanks for using the new API and providing the feedback. I just
advertised this new API at yesterday's sig-node meeting. :-)

Like what Tim and Vishnu mentioned above, we tried to separate stats /
metrics into two categories, and only absolutely required by upstream
control systems will be exported by Kubelet. Rest of stats / metrics
for monitoring purpose will be served through a standalone cAdvisor
pod or some other plugin solution. /stats/summary is the intermediate
step toward to that goal.

This new API is at alpha stage, but I assume after 1.2 release, we
won't do much changes except adding more fields / stats to the
existing API objects, and changing the endpoint. Renaming the existing
fields / stats would be much harder. In this case, I am open to rename
the confusing UsageNanoCores and UsageCoreNanoSeconds, but on another
hand, I don't think AverageCPUNanoSeconds and TotalCPUNanoSeconds make
this less confusion. I guess better description and comments for those
fields might be the solution if we cannot reach an agreement.
Thoughts?

Sam Ghods

unread,
Mar 9, 2016, 8:47:14 PM3/9/16
to Dawn Chen, Tim St. Clair, Vishnu Kannan, kubernetes-sig-node
Thank you all for the prompt responses!

Re: adding too many fields - just adding them to /stats/summary does not mean heapster automatically processes them, right? And I imagine we expect /stats/summary to have more consumers than just heapster. So is there an issue with adding more fields? To me, the difference between /stats/summary and the other metrics endpoints is in the structure and usability of the information, not the quantity. I see how prometheus metrics like requests per second, latencies, etc. shouldn't get brought in, but I don't understand why it's better to make people run their own cAdvisor when you're already doing all the calculating and rendering and just have to output it.

Re: my suggestions above:

1. Load: LoadAverage is already smoothed over the last 10 seconds by cAdvisor in cadvisorapiv2.ContainerStats.Cpu.LoadAverage. I would recommend just pulling that field, which should not add any overhead.

2. I can make a PR to add RSS.

3. Thanks Dawn for renaming these fields pre-1.2. My understanding is that this API has not been released yet so breaking changes would be best to do right now. Re: the specific names, what is the issue with TotalCPUNanoSeconds? It seems to describe the field pretty well. Alternative would be AggregateCPUNanoSeconds. Re: AverageCPUNanoSeconds, what about SampledCPUNanoSeconds? Also I think Nanoseconds is one word, so S of seconds should probably be lowercase, like TotalCPUNanoseconds and SampledCPUNanoseconds (or whatever we pick).

5. I can make a PR to add Dropped and Packets, but it seems like approval for this one is fuzzier depending on the outcome of the conversation above re: how much /stats/summary should have. I'll await further clarification on that point before proceeding on adding these.
--
Sam

pszcz...@google.com

unread,
Mar 10, 2016, 7:57:17 AM3/10/16
to kubernetes-sig-node, dawn...@google.com, stc...@google.com, vis...@google.com, Marcin Wielgus, Filip Grzadkowski
+mwielgus@, filipg@

Sam,

If you are interested in exporting metrics to OpenTSDB it should work out of the box in Heapster. The implementation of the code which exports data into OpenTSDB you can find here. Since we are not very familiarized with OpenTSDB it would be gorgeous if you could test it. Also if there is lack of any features you need feel free contribute. Don't hesitate to reach me or mwielgus@ in case of troubles.

Thanks,
Piotr


>>> To post to this group, send email to
>>> kubernete...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/kubernetes-sig-node/b8ba797d-e685-46d6-b5de-c85cc6b865d6%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>
>>
>



--
Sam

Tim St. Clair

unread,
Mar 10, 2016, 1:57:51 PM3/10/16
to Piotr Szczesniak, kubernetes-sig-node, Dawn Chen, Vishnu Kannan, Marcin Wielgus, Filip Grzadkowski
Re: adding too many fields - just adding them to /stats/summary does not mean heapster automatically processes them, right? 
 
There is still a (small) overhead for serializing & deserializing the extra data. It's small enough that for important stats we don't need to worry about it, but I just wanted to point out that it's not "free".

I don't understand why it's better to make people run their own cAdvisor when you're already doing all the calculating and rendering and just have to output it.

To clarify here, we are planning on moving most of cAdvisor's responsibilities (probably including the summary API) out of the kubelet, and into a standalone pod (run by default as a cluster addon). This won't change the API, just the endpoints.
 
3. Thanks Dawn for renaming these fields pre-1.2. My understanding is that this API has not been released yet so breaking changes would be best to do right now. Re: the specific names, what is the issue with TotalCPUNanoSeconds? It seems to describe the field pretty well. Alternative would be AggregateCPUNanoSeconds. Re: AverageCPUNanoSeconds, what about SampledCPUNanoSeconds? Also I think Nanoseconds is one word, so S of seconds should probably be lowercase, like TotalCPUNanoseconds and SampledCPUNanoseconds (or whatever we pick).
 
I think Dawn was suggesting that we don't change the name, but rather fix the documentation.
WRT your name suggestions:
- I think CPU in the name is redundant (it lives in the CPUStats struct)
- Nanoseconds is not the right unit for the average usage (core-seconds / second = cores). I suppose this is implicit with the word "average", but I prefer units to be explicit to minimize mismatched unit issues. FWIW heapster uses "CpuUsageRate".
- I prefer Cumulative to Total, as it makes it clear that it's total over time, not total over something else (e.g. processors, processes)

I don't mean to bikeshed here, I just want to make sure that if we do go through the trouble of changing the name, we get it right :)

 

>>> To post to this group, send email to



--
Sam

Sam Ghods

unread,
Mar 10, 2016, 3:04:18 PM3/10/16
to Tim St. Clair, Piotr Szczesniak, kubernetes-sig-node, Dawn Chen, Vishnu Kannan, Marcin Wielgus, Filip Grzadkowski
Re: naming, that rationale makes sense - I picked UsageRate and CumulativeNanoseconds as a first draft and created a PR https://github.com/kubernetes/kubernetes/pull/22817. I don't like racing the train as much as the next person but given (a) the API isn't used yet and (b) my subjective opinion that these names are considerably better, I'd love to get this in. Also happy to use different names if we come up with better ones.

--
You received this message because you are subscribed to a topic in the Google Groups "kubernetes-sig-node" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/kubernetes-sig-node/txBjT8-WvM0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to kubernetes-sig-...@googlegroups.com.

To post to this group, send email to kubernete...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Sam
Reply all
Reply to author
Forward
0 new messages