Based on kubernetes/community#306
This issue is meant to track work items and help the community collaborate on adding support for capacity isolation of shared partitions.
Note: Priority of these features can change based on the number of collaborators.
v1.7
Note: @jingxu97 has already been making progress on some of the items for v1.7.
Size (Owner: )
v1.8
v1.9
cc @jingxu97 @kubernetes/sig-node-feature-requests @kubernetes/sig-storage-feature-requests
@derekwaynecarr kindly vet this plan and add any missing items.
I'd like to contribute the "scheduler predicate" work if it doesn't have an owner yet :).
This is pretty impactful; we should enqueue it for the next SIG meeting.
@k82cn some of the work items have already been prototyped, including the predicates you mention. Give us a couple of days to update owners on the list of work items.
@timothysc I wonder if in-person discussions about this should be channeled to the "resource management working group" ... I'm a little worried about spreading discussion between node and scheduling.
@davidopp +1 on having discussions in the resource management working group.
A naive question what's "resource management working group" ?
We've mentioned it on the scheduling SIG mailing list. Please join the scheduling SIG mailing list
https://groups.google.com/forum/#!forum/kubernetes-sig-scheduling
The resource management working group was mentioned here
https://groups.google.com/d/msg/kubernetes-sig-scheduling/a-nvvjjP70M/ZeiC1BfHFAAJ
@davidopp Thanks, I am happy to take some task if needed.
@davidopp I'm ok with that, but we should PSA broadcast in the SIGs about topics like this to ensure interested parties are "in the loop".
/cc @kensimon
Kubelet specifies container log and overlay limits via CRI to let runtimes apply limits (Owner: @jingxu97)
What do you mean by that? I don't see anything about the CRI and runtimes applying limits in kubernetes/community#306. How does it interact with log rotation?
@crassirostris It will be an extension to the existing CRI API. kubernetes/community#306 is a high-level, user-facing proposal. A detailed design doc will be published by @jingxu97, at least for the v1.7 work items.
A short summary around logging is as follows:
To prevent I/O abuse, the kubelet, through the CRI, needs to have runtimes rate-limit logging from containers' stdout and stderr. This is not possible with all runtimes, so it will be an optional feature that some runtimes will support, for example containerd.
Kubelet needs to keep long-running services available for weeks or months and not evict them because they used up all their log space. For this reason, the kubelet will have to rotate logs on demand to keep overall log usage within user-specified limits.
Now the kubelet needs to interact with logging agents to prevent premature log rotation. This is an API I'm hoping you'd drive. Generate requirements around rotation and we can discuss having the kubelet meet those requirements.
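To make the rotation point above concrete, here's a minimal sketch assuming a per-container log directory and a byte limit supplied by the user; enforceLogLimit, the paths, and the file-naming convention are all hypothetical illustrations, not the planned kubelet implementation:

```go
package main

import (
	"os"
	"path/filepath"
	"sort"
)

// enforceLogLimit is a hypothetical helper (not the actual kubelet code): it
// sums the size of every file in a container's log directory and deletes the
// oldest rotated files until total usage is back under the user-specified
// limit. A real implementation would rotate the active log through the
// runtime and coordinate with logging agents so no lines are lost mid-read.
func enforceLogLimit(logDir string, limitBytes int64) error {
	all, err := filepath.Glob(filepath.Join(logDir, "*.log*"))
	if err != nil {
		return err
	}
	var total int64
	sizes := make(map[string]int64, len(all))
	for _, f := range all {
		if info, statErr := os.Stat(f); statErr == nil {
			sizes[f] = info.Size()
			total += info.Size()
		}
	}

	// Rotated files carry timestamp suffixes (an assumption here), so the
	// oldest ones sort first and are reclaimed first.
	rotated, _ := filepath.Glob(filepath.Join(logDir, "*.log.*"))
	sort.Strings(rotated)
	for _, f := range rotated {
		if total <= limitBytes {
			break
		}
		if os.Remove(f) == nil {
			total -= sizes[f]
		}
	}
	return nil
}

func main() {
	// Example: keep this (hypothetical) pod's log directory under 10 MiB.
	_ = enforceLogLimit("/var/log/pods/example", 10<<20)
}
```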
Does this make sense?
Maybe this topic requires a dedicated meeting?
To prevent I/O abuse, the kubelet, through the CRI, needs to have runtimes rate-limit logging from containers' stdout and stderr. This is not possible with all runtimes, so it will be an optional feature that some runtimes will support, for example containerd.
@vishh is there an issue for this? We need to define what "rate limit" actually means. For example, journald just drops log messages, and it could be frustrating to debug without relevant information.
@yujuhong None exists as of now. We do not plan on tackling logging until v1.8.
If it's urgent, we can at least spec out an API in the v1.7 time frame.
I'm thinking of not dropping, but instead just blocking. For example, in the case of containerd, if a container is limited to 100 write IOPS, then the containerd logger can stall writes once that limit has been exceeded. This will result in the application stalling, which will hopefully be caught by application-level metrics (for example, a latency metric around glog).
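For what it's worth, here's a minimal Go sketch of that stall-instead-of-drop behavior; blockingWriter and the 100 ops/sec figure are made up for illustration and assume a logger that owns the container's stdout pipe, not containerd's actual code:

```go
package main

import (
	"context"
	"os"

	"golang.org/x/time/rate"
)

// blockingWriter stalls Write calls once the configured write-ops budget is
// exhausted, instead of dropping log lines. The limiter and the numbers in
// main are illustrative only.
type blockingWriter struct {
	dst     *os.File
	limiter *rate.Limiter
}

func (w *blockingWriter) Write(p []byte) (int, error) {
	// Block until a token is available; nothing is discarded. Because the
	// logger owns the container's stdout pipe, the application stalls too.
	if err := w.limiter.Wait(context.Background()); err != nil {
		return 0, err
	}
	return w.dst.Write(p)
}

func main() {
	f, err := os.Create("container.log")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Allow roughly 100 write operations per second, with a burst of 100.
	w := &blockingWriter{dst: f, limiter: rate.NewLimiter(100, 100)}
	if _, err := w.Write([]byte("a log line\n")); err != nil {
		panic(err)
	}
}
```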
It seems all work items for 1.7 are done; should we move this to the 1.8 milestone?
[MILESTONENOTIFIER] Milestone Labels Incomplete
Action required: This issue requires label changes. If the required changes are not made within 6 days, the issue will be moved out of the v1.8 milestone.
kind: Must specify at most one of ['kind/bug', 'kind/feature', 'kind/cleanup'].
priority: Must specify at most one of ['priority/critical-urgent', 'priority/important-soon', 'priority/important-longterm'].
/kind feature
/priority important-soon
[MILESTONENOTIFIER] Milestone Labels Complete
Issue label settings:
I retargeted this to v1.9 since all work items for 1.8 are done.
[MILESTONENOTIFIER] Milestone Issue Needs Approval
@jingxu97 @vishh @kubernetes/sig-node-bugs @kubernetes/sig-scheduling-bugs @kubernetes/sig-storage-bugs
Action required: This issue must have the status/approved-for-milestone label applied by a SIG maintainer. If the label is not applied within 6 days, the issue will be moved out of the v1.9 milestone.
sig/node sig/scheduling sig/storage: Issue will be escalated to these SIGs if needed.
priority/important-soon: Escalate to the issue owners and SIG owner; move out of milestone after several unsuccessful escalation attempts.
kind/feature: New functionality.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Prevent issues from auto-closing with an /lifecycle frozen comment.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale
/lifecycle frozen
/remove-lifecycle stale
/lifecycle stale
/remove-lifecycle stale
/lifecycle stale
/remove-lifecycle stale
/lifecycle frozen
/remove-kind bug
@jingxu97 where are we with this?
/cc