generic and CSI ephemeral volumes - next steps

Patrick Ohly

Sep 15, 2020, 10:35:03 AM
to kubernetes-...@googlegroups.com
Hello!

As you may have seen, Kubernetes 1.19 got a new alpha feature, generic
ephemeral volumes:
- https://kubernetes.io/blog/2020/09/01/ephemeral-volumes-with-storage-capacity-tracking/
- https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/

Planning for 1.20 just started, so now is a good time to discuss how
this feature should move forward. During the initial review of the
feature, several questions were raised that still need to be answered:
- Do we really need both "generic ephemeral volumes" and "CSI ephemeral
volumes" (the older feature, currently in beta)?
- How can we avoid confusion given that both (at first glance) are so
similar?
- What should the final API look like? Can we somehow unify different
volume sources under a common API?

Let me start by pointing out that the two volume types are indeed very
different: CSI ephemeral volumes are for simple, local driver
deployments that don't need and don't support provisioning and
attaching. Drivers are specifically written for Kubernetes. Generic
ephemeral volumes on the other hand are based on provisioning, which
opens up the whole range of features that are supported for that. No
changes are needed in the driver, the "ephemeral" part is handled
entirely by Kubernetes.

Because they are designed for different use-cases, the API is also
different: CSI ephemeral volumes are parameterized via arbitrary,
custom, per-volume key/value pairs instead of cluster-wide storage
classes. Generic ephemeral volumes have the same parameters as
persistent volumes (custom parameters in the storage class, size and data
source per-volume).
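
To make the difference concrete, here is a minimal sketch of where the parameters live in each case. The provisioner name, the `foo: bar` parameter and the object names are placeholders, not taken from a real driver; the generic variant also assumes that the alpha `GenericEphemeralVolume` feature gate is enabled.

```yaml
# Cluster-wide: custom driver parameters for generic ephemeral volumes
# come from a StorageClass (provisioner and parameter are placeholders).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: scratch-storage-class
provisioner: example.csi.vendor.io
parameters:
  foo: bar
---
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: my-frontend
      image: busybox
      command: [ "sleep", "1000000" ]
      volumeMounts:
        - name: inline-volume
          mountPath: /inline
        - name: scratch-volume
          mountPath: /scratch
  volumes:
    # CSI ephemeral volume: driver name and arbitrary key/value pairs
    # are specified per volume, directly in the pod spec.
    - name: inline-volume
      csi:
        driver: inline.storage.kubernetes.io
        volumeAttributes:
          foo: bar
    # Generic ephemeral volume: only per-volume settings such as the size
    # are given here; driver parameters come from the storage class.
    - name: scratch-volume
      ephemeral:
        volumeClaimTemplate:
          spec:
            accessModes: [ "ReadWriteOnce" ]
            storageClassName: scratch-storage-class
            resources:
              requests:
                storage: 1Gi
```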

Could "generic ephemeral volumes" replace "CSI ephemeral volumes"?
Probably not. The driver would become more complex and provisioning is
likely to be slower (but that would need to be measured). Developers of
CSI drivers that were specifically written for "CSI ephemeral volumes"
are in a better position than myself to comment on this - personally I
don't mind deprecating and removing that feature... so speak up now! :-)

What I can say is that the "generic ephemeral volumes" approach was
necessary to enable features like storage capacity tracking that simply
wouldn't have fit into the concept of "CSI ephemeral volumes".

Perhaps there would be less confusion if the names were more obviously
different. If someone has any bright ideas, then please share them. For
my part, I already tried to explain both concepts in the documentation
and blog posts linked above.

When introducing "generic ephemeral volumes", the following API was
chosen with the notion that the new "ephemeral" volume source might get
extended to also include some of the other, already existing ephemeral
volume sources:

    ephemeral:
      volumeClaimTemplate:
        metadata:
          labels:
            type: my-frontend-volume
        spec:
          accessModes: [ "ReadWriteOnce" ]
          storageClassName: "scratch-storage-class"
          resources:
            requests:
              storage: 1Gi

For example, "CSI ephemeral volumes" could become:

    ephemeral:
      csi:
        driver: inline.storage.kubernetes.io
        volumeAttributes:
          foo: bar

This basically would replace a flat list of alternatives in the
VolumeSource struct
(https://pkg.go.dev/k8s.io/api/core/v1?tab=doc#VolumeSource) with a
hierarchical structure where all (?) volume sources that are considered
"ephemeral" are grouped in EphemeralVolumeSource
(https://pkg.go.dev/k8s.io/api/core/v1?tab=doc#EphemeralVolumeSource).

IMHO this does not improve the API and the code just becomes more
complex, too. I would prefer to go the other way: remove
EphemeralVolumeSource and put VolumeClaimTemplate directly into the
VolumeSource struct, together with all other volume sources. Tim Hockin
suggested that during the API review.
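
As a rough sketch of that alternative (hypothetical, this is not an existing API), the claim template would sit next to the other volume sources instead of under an "ephemeral" wrapper. The volume name is a placeholder.

```yaml
# Hypothetical flattened form: volumeClaimTemplate as a direct member of
# the volume source, next to emptyDir, secret, csi, etc.
volumes:
  - name: scratch-volume
    volumeClaimTemplate:
      metadata:
        labels:
          type: my-frontend-volume
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "scratch-storage-class"
        resources:
          requests:
            storage: 1Gi
```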

As a side effect, EphemeralVolumeSource.ReadOnly would get dropped,
something that Saad was interested in because it is only present for
feature parity with other volume sources that have it. If the volume is
supposed to be read-only inside the pod, then this can still be
specified via the VolumeMount.ReadOnly flag.
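
For illustration, a read-only mount at the container level could look like the following pod spec fragment. The container, image and volume names are placeholders.

```yaml
# Fragment of a pod spec: the read-only flag is set on the mount,
# not on the volume source.
containers:
  - name: my-frontend
    image: busybox
    volumeMounts:
      - name: scratch-volume
        mountPath: /scratch
        readOnly: true
```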

Because the fields in the two volume source APIs are so different, I
don't see how they can be merged into one struct. Suppose an
"EphemeralVolumeSource" allows specifying all the fields from both
types. We would need an additional enum to distinguish which of the
fields are supposed to be used - this doesn't make sense to me.

It's also not the case that one is a true superset of the other. It
might technically be possible to shoehorn the fields from a CSI ephemeral
volume into the PVC template (parameters -> labels, driver name ->
storage class name, no separate enum to determine the type) and then
treat that like the current CSI ephemeral volume source when there is a
corresponding CSIDriver object. But that API would be far from obvious
and error-prone, because the API server cannot validate it well.
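
To show how non-obvious that would be, here is a sketch of such a shoehorned volume. It is purely hypothetical and deliberately awkward; nothing like this exists in the API.

```yaml
# Hypothetical: a CSI ephemeral volume squeezed into the PVC template.
# Nothing in the object itself says whether it should be provisioned
# normally or handled as an inline CSI volume.
volumes:
  - name: inline-volume
    ephemeral:
      volumeClaimTemplate:
        metadata:
          labels:
            foo: bar        # stands in for csi.volumeAttributes.foo
        spec:
          # stands in for csi.driver; the interpretation would depend on
          # whether a CSIDriver object with this name exists
          storageClassName: inline.storage.kubernetes.io
```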

Looking at the beta criteria for generic ephemeral volumes
(https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/1698-generic-ephemeral-volumes#alpha---beta-graduation),
the API design was covered above. I can handle some of the open
technical work (errors as pod events, tests). That leaves gathering of
feedback for the feature.

Let me start by throwing out some questions. Perhaps a more formal
questionnaire would be better, but I'm not sure which tool to use for
that.

Do you think it is useful? Do you plan to use it yourself? For which
use-cases? How many of your pods in the cluster (as percentage and/or
absolute number) will use generic ephemeral volumes? Are those short- or
long-lived? This influences how scalable and fast this solution needs
to be.

We don't have graduation criteria defined for CSI ephemeral volumes. If
we decide to keep the current API for those, are they perhaps already
ready for GA? If not, what's missing?

--
Best Regards

Patrick Ohly

Kevin Fox

Sep 15, 2020, 11:50:22 AM
to Patrick Ohly, kubernetes-...@googlegroups.com
I helped create the "CSI ephemeral volumes" feature and do use it. I would be impacted if it were to be removed. I agree it is a very different thing than "generic ephemeral volumes" and there can be confusion between the two features.

I also think working on the api some more would be useful. For generic, labeling it as ephemeral in the podspec somehow makes some sense to me. The user is stating they want a volume that goes away when done as opposed to a volume that sticks around after the pod goes away. This clarity of purpose I think is important to the user when combined with a feature normally associated with "persistence". They need to ask for data to be deleted. Then k8s can't be blamed when it does what the user told it to do.

The "CSI ephemeral volume" feature was originally intended for "emptyDir" style drivers: drivers that never were "persistent" and behave like emptyDir, but with some additional features. So having a user flag them as ephemeral in the API spec may not make sense. E.g., Vault, cert-manager drivers, etc. The name ephemeral was the best name we could come up with that we all could agree upon. But it may not be the best name when the "generic ephemeral volumes" feature is also in play. It just adds confusion IMO. Not sure what to call that style of driver though. Naming things is hard. :)

Requiring the user to know what CSI is in order to put it into the volume spec has also always felt a bit weird API-wise. Not sure if that should be addressed as well.

Maybe moving the csi to the root, making driver a top-level property, and then basing the fields of the volume on the specified driver. Making emptyDir, secret, hostPath, configMap/PVC show up as drivers, without actually being implemented with CSI, would potentially be a much cleaner API. Not sure though. All sorts of discussions could probably be had about cleaning up the volumes API, as it has grown over time rather than being carefully designed as a whole.

> Do you think it is useful? Do you plan to use it yourself? For which
> use-cases? How many of your pods in the cluster (as percentage and/or
> absolute number) will use generic ephemeral volumes? Are those short- or
> long-lived? This influences how scalable and fast this solution needs
> to be.

Yes, it's useful for the purposes it was designed for: things like the image driver or the cert-manager plugin. The percentage of the cluster using it varies over time. The pods are both long- and short-lived. As things like Volcano become more common, I think the shorter-term use case will become more common as well.


> We don't have graduation criteria defined for CSI ephemeral volumes. If
> we decide to keep the current API for those, are they perhaps already
> ready for GA? If not, what's missing?

I think it's ready, at least in my opinion. But if we wanted to make the API/naming (CSI ephemeral) better to make it less confusing in a world where "generic ephemeral" also exists, now would be a good time to do so, before GA.

Thanks,
Kevin

Patrick Ohly

Sep 15, 2020, 12:40:44 PM
to Kevin Fox, kubernetes-...@googlegroups.com
Kevin Fox <kfox...@gmail.com> writes:
> I helped create the "CSI ephemeral volumes" feature and do use it. I would
> be impacted if it were to be removed.

Don't worry, there are no such plans. On the other hand, we do need to
figure out how to proceed with it because keeping features in beta
in perpetuity is no longer acceptable.

> I also think working on the api some more would be useful. For generic,
> labeling it as ephemeral in the podspec somehow makes some sense to
> me.

I'm fine with:

    ephemeral:
      metadata:
        labels:
          type: my-frontend-volume
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "scratch-storage-class"
        resources:
          requests:
            storage: 1Gi

> The "CSI ephemeral volume" feature was originally intended for "emptyDir"
> style drivers. Drivers that never were "persistent". Behaving like emptyDir
> but with some additional features.

I actually see "CSI ephemeral volume" as closer to volume sources like
secrets and configmap, i.e. something that makes relatively small
amounts of data available. For large, ephemeral scratch space as in
emptyDir the storage capacity tracking part is missing.

> So having a user flag them as ephemeral
> in the api spec may not make sense.

Agreed. "Ephemeral" is just one aspect of such volumes and emphasizing
that aspect may distract from what users really expect from the CSI
driver. "Inline" is another aspect that may be equally important (in
particular the per-pod parameters), but again this is just a means to an
end and not the real purpose. No-one is going to say "I want an
ephemeral, inline volume". What they'll ask for instead is "give me
fast, large scratch space" or "give me access to these secrets".

> Maybe moving the csi to the root, making driver a top level property, and
> then base the fields of volume on the specified driver. make
> emptyDir,Secret,hostPath,configmap/pvc show up as drivers but not actually
> be implemented with csi would be potentially a much cleaner api. Not sure
> though. All sorts of discussions could probably be had about cleaning up
> the volumes api, as it was grown over time rather than being carefully
> designed as a whole.

I don't mind having different volume sources listed as alternatives in
VolumeSource. They are different things which need different APIs.

>> Do you think it is useful? Do you plan to use it yourself? For which
>> use-cases? How many of your pods in the cluster (as percentage and/or
>> absolute number) will use generic ephemeral volumes? Are those short- or
>> long-lived? This influences how scalable and fast this solution needs
>> to be.
>
> Yes, it's useful. For the purposes it was designed for. Things like the
> image driver or the cert-manager plugin. Percentage of cluster varies over
> time. They are both long and short lived. As things like volcano become
> more common I think the shorter term use case will become more common as
> well.

So you answered for "CSI ephemeral volumes". That's okay, but beware
that the questions here were meant for "generic ephemeral volumes".

Patrick Ohly

Oct 30, 2020, 1:24:17 PM
to kubernetes-...@googlegroups.com, Saad Ali, Michelle Au
"Patrick Ohly" <patric...@intel.com> writes:
> As you may have seen, Kubernetes 1.19 got a new alpha feature, generic
> ephemeral volumes:
> - https://kubernetes.io/blog/2020/09/01/ephemeral-volumes-with-storage-capacity-tracking/
> - https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/
[...]

So while the discussion with Kevin confirmed that both flavors of
ephemeral volumes are useful, we have not come close to an agreement on
how to proceed. Perhaps a Zoom meeting will allow us to hash out some of
the open questions from my original email.

If you are interested, please indicate your preferred time slot in this
Doodle: https://www.doodle.com/poll/bxn4gg9i9f54r5iv?utm_source=poll&utm_medium=link

If you want to attend, but none of those slots work, please let me know
and perhaps we can arrange something else.

--
Best Regards

Patrick Ohly
Senior Software Engineer

Aldo Culquicondor

Nov 2, 2020, 2:06:09 PM
to kubernetes-sig-storage
Just filled it out. It would be useful to have an agenda. Could you send a doc with a draft one?

Patrick Ohly

Nov 12, 2020, 11:22:35 AM
to kubernetes-...@googlegroups.com, Saad Ali, Michelle Au
https://docs.google.com/document/d/1yAe3SPPosgC_QgmnY7oJTmZYWrqLrii1oA4de67DEcw/edit
has meeting notes, but let me also recap the conclusion here.

Generic ephemeral volumes will continue as planned, with no API change
compared to the current alpha. I'll try to get it to beta in 1.21.

There was consensus that having "CSI" in "CSI ephemeral volumes" is
confusing, primarily because not all CSI drivers support it. The Google
doc has various alternatives that were considered. Eventually we agreed
to rename the API:
- VolumeSource.CSI -> VolumeSource.Ephemeral.Custom
- CSI.VolumeAttributes -> Custom.Parameters

The existing "CSI" variant will become deprecated. But because it is
part of the v1 VolumeSource API, it cannot be removed in practice, i.e.
the in-tree code will have to support both ways of specifying it. Each
VolumeSource instance may have either CSI or Ephemeral.Custom, but not
both.
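
As a sketch of what the rename could look like in a pod spec: the lower-case YAML field names and the unchanged "driver" field are assumptions derived from the Go names above, not an agreed-upon or implemented API.

```yaml
volumes:
  # Existing v1 form: to be deprecated, but still accepted.
  - name: old-style
    csi:
      driver: inline.storage.kubernetes.io
      volumeAttributes:
        foo: bar
  # Proposed form (hypothetical): same information under ephemeral.custom.
  # A single volume would use one form or the other, never both.
  - name: new-style
    ephemeral:
      custom:
        driver: inline.storage.kubernetes.io
        parameters:
          foo: bar
```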

"CSI ephemeral volumes" needs an owner who has the time to focus on that
change and can move it forward. Because of the API change, introducing
Ephemeral.Custom as beta (?) is probably the next step instead of going
straight to GA with a new API.