Is this a BUG REPORT or FEATURE REQUEST?:
@kubernetes/sig-storage-feature-requests
What happened:
This is a follow-up to the 1.11 design discussions regarding volume attach limits and how they will work with delayed volume binding. Here's a rough sketch of how we can integrate the two, based on previous discussions. I'm also trying to consider how this could be extended to work for arbitrary resources.
```yaml
kind: StorageClass
...
nodeResources:
- name: "attachable-volumes-gce-pd"
  quantity: 1
- name: "memory"
  quantity: 100M
```

`name` is the name of the resource that is consumed, and `quantity` is how much of it one PV consumes.

Currently, the max volume count predicate handles resources prefixed with "attachable-volumes", and the PodFitsResources predicate handles the rest. Both would have to be extended to also account for resources consumed by unbound and bound PVs via `StorageClass.nodeResources`.
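As a sketch of what that extended check might look like (all type and field names below are hypothetical simplifications, not the actual scheduler API), the predicate would sum the `nodeResources` requested by each PV's StorageClass and compare the totals against what the node has left:

```go
package main

import "fmt"

// Hypothetical, simplified types for illustration only; the real
// predicates operate on the Kubernetes API objects.
type StorageClass struct {
	NodeResources map[string]int64 // resource name -> quantity consumed per PV
}

type Node struct {
	Allocatable map[string]int64 // resource name -> total capacity
	Used        map[string]int64 // resource name -> amount already consumed
}

// fitsNodeResources checks whether PVs bound/provisioned from the given
// StorageClasses fit within the node's remaining resources. This mirrors
// how both the max volume count predicate and PodFitsResources would need
// to account for StorageClass.nodeResources.
func fitsNodeResources(node Node, classes []StorageClass) bool {
	requested := map[string]int64{}
	for _, sc := range classes {
		for name, qty := range sc.NodeResources {
			requested[name] += qty
		}
	}
	for name, qty := range requested {
		if node.Used[name]+qty > node.Allocatable[name] {
			return false
		}
	}
	return true
}

func main() {
	node := Node{
		Allocatable: map[string]int64{"attachable-volumes-gce-pd": 16},
		Used:        map[string]int64{"attachable-volumes-gce-pd": 15},
	}
	sc := StorageClass{NodeResources: map[string]int64{"attachable-volumes-gce-pd": 1}}
	fmt.Println(fitsNodeResources(node, []StorageClass{sc}))     // one attach point left: fits
	fmt.Println(fitsNodeResources(node, []StorageClass{sc, sc})) // two new PVs: does not fit
}
```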
I'm also trying to see if there's a way we can handle reporting dynamic provisioning capacity for local volumes through a similar or the same mechanism. It would need a special resource prefix again for special handling by the volume binding predicate. And if we want to support more than just local volumes, then we would need a way to specify a topology object kind (e.g. a Rack or Zone CRD, or maybe some first-class TopologyResource object) where the allocatable resources are specified. If so, then this may no longer be just nodeResources. I need to see how much overlap there may be with the device manager here.
Would like to hear your thoughts on this idea. cc @gnufied @liggitt
/cc
It's a little different from it in my mind. Is `quantity` the amount of the resource a single volume consumes, rather than a total capacity? If so, how should the predicate make use of it? Can you give more details? Also, what is the relationship between the `memory` resource and the PodFitsResources predicate, or is that just one example of "arbitrary resources"?

Looks good. One question regarding:
```yaml
- name: "attachable-volumes-gce-pd"
  quantity: 1
```

Will the `quantity` setting for `attachable-volumes-*` ever be something other than 1?
What is the use case for supporting arbitrary resources within PVs or StorageClasses? Before we go there, I think we should gather some real-world data on how those resources would be applied and how the scheduler has to be modified to take them into account. What happens when individual PVs can share the memory (like a shared FUSE mount)? Aren't memory limits applicable to PVs very much storage-implementation specific? What happens to those resource requirements when the storage provider version or implementation changes, such as AWS EFS vs. off-the-shelf NFS, GlusterFS vs. ceph-glusterfs, or GlusterFS v1.x vs. v2.x?
Can the volume attach limit and the memory limit of a PV conflict with each other? Who wins in that case?
You mean the quantity is the amount of the resource a volume consumes, but not the total capacity?
Sorry, I wasn't clear. The total capacity is reported in the Node's resources, and the consumption of a to-be-provisioned PV is reported in the StorageClass. Something like capacity would need to be special-cased with a prefix, since the capacity is inferred from the PVC request.
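A sketch of that special-casing (the `capacity-` prefix and the helper below are hypothetical, not an existing API): for ordinary `nodeResources` the per-PV consumption is the fixed quantity from the StorageClass, while a capacity-style resource would instead take its consumption from the PVC's storage request:

```go
package main

import (
	"fmt"
	"strings"
)

// consumptionFor returns how much of a node resource a to-be-provisioned PV
// consumes. Ordinary resources use the fixed quantity from the StorageClass;
// a hypothetical "capacity-" prefixed resource is instead inferred from the
// PVC's storage request, since capacity varies per claim.
func consumptionFor(resourceName string, classQuantity, pvcRequestBytes int64) int64 {
	if strings.HasPrefix(resourceName, "capacity-") {
		return pvcRequestBytes
	}
	return classQuantity
}

func main() {
	// Attach points: fixed per-PV cost of 1, regardless of PVC size.
	fmt.Println(consumptionFor("attachable-volumes-gce-pd", 1, 100<<30))
	// Capacity: cost is the PVC's requested size (100Gi here).
	fmt.Println(consumptionFor("capacity-gce-pd", 0, 100<<30))
}
```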
Is there a scenario where a driver provisions a single PV that maps to (or spans over) multiple EBS/PD/AzDisk volumes?
@liggitt had a hypothetical example of a driver RAIDing two EBS volumes together.
What is the use case of supporting arbitrary resources within PV or storageClasses?
Volume drivers like NFS or Ceph have kernel components that can consume system memory and network bandwidth that is unaccounted for. That being said, I agree, we probably have many other volume scaling issues that are more pressing than this. The most important resources we're trying to deal with currently are attachable limits and capacity. Limits for other system resources could potentially be accounted for through attach limits in the near term, but I wanted to see if we could come up with a more general API that could be extended in the future, if needed.
One more use case to further complicate all of this:
In addition to attach limits per node, GCE PD also has maximum capacity per node of 64TB on most instances. To handle this, I think we need to count the capacity of PVCs on the node (and inline volumes won't work).
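Counting that could follow the same shape as the attach-limit check (the function and numbers below are illustrative only): sum the requested capacity of the PVCs already placed on the node plus the incoming claim, and reject if the total exceeds the per-node maximum, e.g. 64TB for GCE PD on most instances:

```go
package main

import "fmt"

// Per-node GCE PD capacity limit on most instance types (64TB).
const maxNodeCapacityBytes = int64(64) << 40

// capacityFits reports whether adding a new PVC of requestBytes to a node
// already hosting the given PVC sizes would stay within the node limit.
// Inline volumes are not counted, which is why they wouldn't work here.
func capacityFits(existingPVCBytes []int64, requestBytes int64) bool {
	var total int64
	for _, b := range existingPVCBytes {
		total += b
	}
	return total+requestBytes <= maxNodeCapacityBytes
}

func main() {
	existing := []int64{32 << 40, 30 << 40}    // 62TB already claimed on the node
	fmt.Println(capacityFits(existing, 1<<40)) // 1TB more: 63TB, fits
	fmt.Println(capacityFits(existing, 4<<40)) // 4TB more: 66TB, exceeds the limit
}
```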
Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
I updated the design doc on how to integrate the max PD predicate with volume topology: kubernetes/community#2711
I left out supporting a volume taking up multiple attach points and a volume consuming other node resources like memory/cpu. We don't have any concrete use cases or needs for those yet, and we can reconsider if something comes up. The design for integration is much simpler in this case.
/assign
I will pick this up as we discussed. Thanks for updating the design doc!
/remove-lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with `/reopen`.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
In response to this:
> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
> Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
> /close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Closed #65201.