Probe failures

Jan Safranek

Feb 14, 2019, 8:43:02 AM
to kubernetes-sig-storage-wg-csi
I went through how all the sidecar containers use the Probe() CSI call
and found the following:

* external-attacher and external-snapshotter call Probe(), but do not
wait until it reports that the driver is ready. They can issue CSI calls
to a driver that is not ready. That may be OK, as the driver is likely to
return an error and the attacher/snapshotter retries later, but I think
it's a bug.

* external-provisioner does not call Probe() at all. IMO, it should.

* cluster-driver-registrar does not call Probe() at all. IMO that's OK.

* Neither node-driver-registrar nor kubelet calls Probe() at all.
IMO one of them should. I'd vote for the registrar, as it's easier to
implement.

I can fix everything except cluster-driver-registrar to call Probe() and:
- call os.Exit when the driver is not healthy (i.e. gRPC returns
FAILED_PRECONDITION)
- wait if the driver is healthy but not ready, OR if Probe() returns any
other gRPC error code
- continue only when the driver is healthy and ready
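
A minimal sketch of these three rules in Go. The Code and Action types and
the decide function are hypothetical stand-ins for illustration only; a real
sidecar would presumably use the gRPC status codes and the CSI client
libraries instead of these local types.

```go
package main

import "fmt"

// Code is a hypothetical stand-in for a gRPC status code.
type Code int

const (
	OK Code = iota
	FailedPrecondition
	Unavailable
	DeadlineExceeded
)

// Action is what the sidecar does after a single Probe() attempt.
type Action int

const (
	Continue Action = iota // driver is healthy and ready: start normal work
	Wait                   // call Probe() again later
	Exit                   // driver is not healthy: terminate the sidecar
)

func (a Action) String() string {
	return [...]string{"continue", "wait", "exit"}[a]
}

// decide maps one Probe() result onto the three rules above.
// ready is only meaningful when code is OK.
func decide(code Code, ready bool) Action {
	switch {
	case code == FailedPrecondition:
		return Exit
	case code == OK && ready:
		return Continue
	default:
		// Healthy but not ready, or any other gRPC error code.
		return Wait
	}
}

func main() {
	fmt.Println(decide(OK, true))
	fmt.Println(decide(OK, false))
	fmt.Println(decide(Unavailable, false))
	fmt.Println(decide(FailedPrecondition, false))
}
```

In this sketch FAILED_PRECONDITION is checked first, so it always leads to
exit regardless of the ready value.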

The CSI spec is not very specific about handling FAILED_PRECONDITION: it
says the driver is not healthy, but does not say whether the driver may
become healthy in a moment (= we should retry Probe()) or whether retrying
is pointless.

Serguei Bezverkhi (sbezverk)

Feb 14, 2019, 1:43:32 PM
to Jan Safranek, kubernetes-sig-storage-wg-csi
Hi Jan,

Why not use the livenessprobe sidecar along with a Kubernetes liveness probe, as is done in the hostpath driver example?

Serguei
--
You received this message because you are subscribed to the Google Groups "kubernetes-sig-storage-wg-csi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-sig-stora...@googlegroups.com.
To post to this group, send email to kubernetes-sig...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kubernetes-sig-storage-wg-csi/9abf6554-41fb-e91e-12b4-046e8f668cdd%40redhat.com.
For more options, visit https://groups.google.com/d/optout.


Michelle Au

Feb 14, 2019, 1:59:39 PM
to Serguei Bezverkhi (sbezverk), Jan Safranek, kubernetes-sig-storage-wg-csi
I think the liveness probe sidecar has a different use case, for restarting the driver container if it becomes unready.  Jan is looking at preventing sidecars from coming up until the driver is ready.

Serguei Bezverkhi (sbezverk)

Feb 14, 2019, 2:45:47 PM
to Michelle Au, Jan Safranek, kubernetes-sig-storage-wg-csi

In that case a Kubernetes readiness probe should be used, again with the livenessprobe sidecar. Only in this case will it be used as an indicator of driver readiness.

Jan Safranek

Feb 15, 2019, 5:09:59 AM
to Serguei Bezverkhi (sbezverk), Michelle Au, kubernetes-sig-storage-wg-csi
The problem I am trying to solve is when a sidecar container should
start calling a CSI driver and start provisioning / attaching /
snapshotting / registering to kubelet. So far we start when Probe()
returns with no error code. We should probably start only when Probe()
returns that the driver is ready.

It's unrelated to the readiness probe, which only reports that the driver
is ready to Kubernetes, not to our sidecar containers.

OK, what about this:

- Sidecar will wait endlessly for Probe to become ready
- Cluster admin may employ readiness / liveness probe and restart the
driver if it takes too long
- Restart of a driver during Probe() will probably result in Unavailable
error code. Sidecar container will call os.Exit() on this error (and any
other error except for timeout).

PR will follow shortly.

Jan


Saad Ali

Feb 15, 2019, 6:29:59 PM
to Jan Safranek, Serguei Bezverkhi (sbezverk), Michelle Au, kubernetes-sig-storage-wg-csi
> - Sidecar will wait endlessly for Probe to become ready

SGTM

> - Cluster admin may employ readiness / liveness probe and restart the driver if it takes too long

SGTM. Serguei, can you verify how long liveness probe waits before it restarts the driver?

> Restart of a driver during Probe() will probably result in Unavailable error code. Sidecar container will call os.Exit() on this error (and any other error except for timeout).

Why os.Exit()? Why not just wait forever and print warnings in logs?
