Hello! First of all, the new /stats/summary endpoint is AWESOME. It's a huge step up from the kubelet Prometheus metrics we've been using, and it saved me probably a dozen hours of trying to interpret and parse the return values of the original /stats endpoint. It's generally really well-designed and has been enjoyable to work with.
While writing a scraper that takes /stats/summary and converts it into OpenTSDB metrics for nodes and pods, I came up with some questions and feedback. I'm happy to make PRs for all of the below, but I'd appreciate guidance on what makes sense to do versus what might just be my misunderstandings.
1. I use LoadAverage a lot when determining the health of a system, so I was hoping to add LoadAverage to CPUStats: basically, stats.CPUStats.LoadAverage would be set from cadvisorapiv2.ContainerStats.Cpu.LoadAverage. Any reason not to?
2. MemoryStats currently contains only UsageBytes and WorkingSetBytes, but not RSS. I guess the questions I care most about are:
* Which one is the generous number that includes Linux buffer cache/swap/etc.?
* Which one is the number where, if it goes over the memory limit, the container gets OOM-killed? Or actually, if a container is given a hard memory limit and then goes over, does it get OOM-killed or does it start swapping?
Basically I'd love more context on how UsageBytes and WorkingSetBytes are practically used.
3. I found the names UsageNanoCores and UsageCoreNanoSeconds very difficult to understand. Going by the descriptions of each, would we consider renaming them to AverageCPUNanoSeconds and TotalCPUNanoSeconds respectively?
4. UsageNanoCores has the comment "Total CPU usage (sum of all cores) averaged over the sample window." What is the sample window?
5. Network currently has Transfer and Errors - why not include the other cAdvisor parameters, Dropped and Packets? People can ignore data if they don't need it, but by cutting off access altogether you restrict their ability to get metrics that may be helpful for their debugging case.
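For readers who want a concrete picture of the kind of conversion described at the top of this message, here is a minimal sketch. It is not the actual scraper. It assumes the summary JSON exposes node.nodeName, node.cpu, node.memory, and pods[].podRef / pods[].containers[] with the camelCase field names shown below, so check those against your kubelet. The kubelet.* metric names and the sample values are hypothetical. The output uses the OpenTSDB telnet-style "put <metric> <timestamp> <value> <tags>" line format.

```python
import json

# Field names are assumptions based on the types discussed in this thread;
# verify them against the JSON your kubelet actually returns.
CPU_FIELDS = ("usageNanoCores", "usageCoreNanoSeconds")
MEMORY_FIELDS = ("usageBytes", "workingSetBytes")


def put_lines(prefix, stats, fields, timestamp, tags):
    """Build one OpenTSDB 'put' line per field that is present in stats."""
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return [
        f"put {prefix}.{field} {timestamp} {stats[field]} {tag_str}"
        for field in fields
        if field in stats
    ]


def summary_to_opentsdb(summary, timestamp):
    """Flatten a /stats/summary-style dict into OpenTSDB 'put' lines."""
    node = summary.get("node", {})
    node_tags = {"node": node.get("nodeName", "unknown")}
    # Metric prefixes such as "kubelet.node.cpu" are hypothetical names.
    lines = put_lines("kubelet.node.cpu", node.get("cpu", {}),
                      CPU_FIELDS, timestamp, node_tags)
    lines += put_lines("kubelet.node.memory", node.get("memory", {}),
                       MEMORY_FIELDS, timestamp, node_tags)

    for pod in summary.get("pods", []):
        ref = pod.get("podRef", {})
        for container in pod.get("containers", []):
            tags = dict(
                node_tags,
                namespace=ref.get("namespace", "unknown"),
                pod=ref.get("name", "unknown"),
                container=container.get("name", "unknown"),
            )
            lines += put_lines("kubelet.container.cpu", container.get("cpu", {}),
                               CPU_FIELDS, timestamp, tags)
            lines += put_lines("kubelet.container.memory",
                               container.get("memory", {}),
                               MEMORY_FIELDS, timestamp, tags)
    return lines


# Made-up sample payload, for illustration only.
SAMPLE = json.loads("""
{
  "node": {
    "nodeName": "node-1",
    "cpu": {"usageNanoCores": 250000000, "usageCoreNanoSeconds": 9000000000},
    "memory": {"usageBytes": 2000000, "workingSetBytes": 1500000}
  },
  "pods": [
    {
      "podRef": {"name": "web", "namespace": "default"},
      "containers": [
        {
          "name": "app",
          "cpu": {"usageNanoCores": 10000000},
          "memory": {"workingSetBytes": 500000}
        }
      ]
    }
  ]
}
""")

if __name__ == "__main__":
    for line in summary_to_opentsdb(SAMPLE, 1456000000):
        print(line)
```

Fields that are missing from a stats object are simply skipped, so a scraper written this way would pick up extra fields (for example a LoadAverage value, if one were added) just by extending the field tuples.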
Thanks again! I'd love to get these into 1.2, but if nothing else, item 3 would be ideal to do ASAP if the naming change makes sense, since this API hasn't been released yet.
--Sam
You received this message because you are subscribed to the Google Groups "kubernetes-sig-node" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-sig-...@googlegroups.com.
To post to this group, send email to kubernete...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kubernetes-sig-node/b8ba797d-e685-46d6-b5de-c85cc6b865d6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--Sam
Re: adding too many fields - just adding them to /stats/summary does not mean heapster automatically processes them, right?
I don't understand why it's better to make people run their own cAdvisor when you're already doing all the calculating and rendering and just have to output it.
3. Thanks Dawn for renaming these fields pre-1.2. My understanding is that this API has not been released yet, so breaking changes would be best to do right now. Re: the specific names, what is the issue with TotalCPUNanoSeconds? It seems to describe the field pretty well. An alternative would be AggregateCPUNanoSeconds. Re: AverageCPUNanoSeconds, what about SampledCPUNanoSeconds? Also, I think "nanoseconds" is one word, so the S in Seconds should probably be lowercase, as in TotalCPUNanoseconds and SampledCPUNanoseconds (or whatever we pick).
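To make the difference between the two CPU fields in item 3 concrete, here is a small sketch of how they might relate. It assumes that UsageCoreNanoSeconds is a cumulative counter of core-nanoseconds and that UsageNanoCores is an average rate in nanocores over some window, as the field comments suggest. The function name is hypothetical, and using the interval between two scrapes as the window is an assumption of this sketch, not a statement about the window the kubelet uses.

```python
NANOS_PER_SECOND = 1_000_000_000


def average_nano_cores(prev_usage_ns, prev_time_ns, curr_usage_ns, curr_time_ns):
    """Average CPU usage in nanocores between two cumulative samples.

    usage values are cumulative core-nanoseconds; time values are
    timestamps in nanoseconds. (Hypothetical helper, for illustration.)
    """
    window_ns = curr_time_ns - prev_time_ns
    if window_ns <= 0:
        raise ValueError("samples must be in increasing time order")
    used_ns = curr_usage_ns - prev_usage_ns
    # core-nanoseconds / nanoseconds = cores; scale by 1e9 to get nanocores.
    return used_ns * NANOS_PER_SECOND // window_ns


if __name__ == "__main__":
    # 15 core-seconds consumed over a 10-second window is 1.5 cores.
    print(average_nano_cores(5_000_000_000, 0, 20_000_000_000, 10_000_000_000))
```

Seen this way, one field is a running total and the other is a rate derived from it, which is the distinction names like Total... and Average.../Sampled... are trying to capture.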
--Sam
--
You received this message because you are subscribed to a topic in the Google Groups "kubernetes-sig-node" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/kubernetes-sig-node/txBjT8-WvM0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to kubernetes-sig-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kubernetes-sig-node/CADtktAWNdKGWAMMEay4C_c7bios8shxyBqeSFGki8vE2S%2BbvSg%40mail.gmail.com.