I am not convinced this fix is safe for the cases where the kubelet is down yet the node is still up, and so are the pods that are using the volume.
Per VMware manuals, the multi-writer option is disabled by default.
@kubernetes/sig-storage-pr-reviews
@BaluDontu
As we discussed in the team, another area of concern is vSphere HA with VM Monitoring enabled. If the VM crashes or guest tools stop responding, the VM will be reset. Let's test out that scenario.
@BaluDontu: The following tests failed, say /retest to rerun them all:
| Test name | Commit | Details | Rerun command |
|---|---|---|---|
| pull-kubernetes-e2e-kops-aws | cfdff1a | link | /test pull-kubernetes-e2e-kops-aws |
| pull-kubernetes-e2e-gce-etcd3 | cfdff1a | link | /test pull-kubernetes-e2e-gce-etcd3 |
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/retest
An alternative solution is to return false for vSphere in the isMultiAttachForbidden() function,
and make a change in DisksAreAttached in vsphere.go - https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/vsphere/vsphere.go#L567
In the DisksAreAttached function we can check the node VM's power state; if the node is powered off, detach the disks and then set false for the detached disks in the returned map.
This way we will not need to wait 6 minutes for pods to be recreated on another node.
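For illustration, a minimal sketch of this proposed flow. The types here (`nodeVM`, `powerState`, `detach`) are hypothetical stand-ins for the real govmomi/vCenter calls, and the signature is simplified relative to the actual vsphere.go code:

```go
package main

import "fmt"

// powerState mirrors the idea of vSphere's VirtualMachinePowerState values.
type powerState string

const (
	poweredOn  powerState = "poweredOn"
	poweredOff powerState = "poweredOff"
)

// nodeVM is a stand-in for the node's VM; real code would query vCenter.
type nodeVM struct {
	state    powerState
	attached map[string]bool // volume path -> currently attached
}

// detach is a stand-in for the vSphere DetachDisk call.
func (vm *nodeVM) detach(volPath string) {
	vm.attached[volPath] = false
}

// disksAreAttached sketches the proposal: if the node VM is powered off,
// detach each disk and report it as not attached, so the attach/detach
// controller can immediately attach it to another node. The detach side
// effect inside a "verify" function is exactly what this thread debates.
func disksAreAttached(vm *nodeVM, volPaths []string) map[string]bool {
	result := make(map[string]bool)
	for _, p := range volPaths {
		if vm.state == poweredOff {
			vm.detach(p)
			result[p] = false
			continue
		}
		result[p] = vm.attached[p]
	}
	return result
}

func main() {
	vm := &nodeVM{
		state:    poweredOff,
		attached: map[string]bool{"[vsanDatastore] kubevols/pvc-1.vmdk": true},
	}
	fmt.Println(disksAreAttached(vm, []string{"[vsanDatastore] kubevols/pvc-1.vmdk"}))
}
```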
@divyenpatel, we cannot do a Detach call in the DisksAreAttached() function. DisksAreAttached is only meant to check whether disks are attached or not; we cannot do any detach there.
@rootfs: #50239 is a different issue altogether. It only addresses the scenario where the node is not found in vCenter, but in the present case the node is still in vCenter, which is what causes this problem.
When a node is deleted from vCenter, volumes are detached automatically, but when a node is powered off in vCenter, volumes remain attached to it.
@BaluDontu you can check the VM power state there
/lgtm cancel
let's assess alternatives and impacts
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: BaluDontu, saad-ali
Associated issue: 50944
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS files. You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment
@rootfs: Yes, if we check the power state, do you want to detach the volume at that point, while we are checking DisksAreAttached()? The problem is that DisksAreAttached() should report whether the volume is attached or not irrespective of power state; it should not depend on the power state. If you return that a volume is not attached to the powered-off node, then Kubernetes will never attempt to remove the volume from that node.
If you think there is any other way we can handle this, please let me know.
@BaluDontu I like the approach @divyenpatel proposed. In addition, in DisksAreAttached, check the power state of the VM. If the VM is powered off, detach the volume there, so that if the VM is restarted, it won't grab the volume.
@rootfs: The question is whether you would want to perform an action different from what the function is intended to do, in this case calling Detach() inside the DisksAreAttached() function. I think that's quite misleading.
> I am not convinced this fix is safe for the cases where the kubelet is down yet the node is still up, and so are the pods that are using the volume.
> Per VMware manuals, the multi-writer option is disabled by default.
> As we discussed in the team, another area of concern is vSphere HA with VM Monitoring enabled. If the VM crashes or guest tools stop responding, the VM will be reset. Let's test out that scenario.
It sounds like multi-attach is not safe for vSphere volumes. Removing approval.
> @BaluDontu I like the approach @divyenpatel proposed. In addition, in DisksAreAttached, check the power state of the VM. If the VM is powered off, detach the volume there, so that if the VM is restarted, it won't grab the volume.
Having a volume implicitly detach during a DisksAreAttached(...) operation, whose sole purpose is to verify that a volume is attached, not to mutate anything, is really ugly. But if it works and doesn't cause any regressions, I'm fine with it. Make sure to consider the case where the node the volume is attached to is powered down, and two or more pods referencing the same volume are scheduled to different nodes. I think that case should be safe, since they will all fight to detach the volume from the powered-down node, and then the attach/detach controller should allow only one to succeed in attaching (but you should verify).
I do not agree with adding a Detach operation inside DisksAreAttached(). A Detach operation also involves internal cache (actual state) updates which are not part of the VerifyVolumesAreAttached function. If you allow detach in VerifyVolumesAreAttached, how do you define the behavior of this function? It will cause a lot of confusion, and this function is already complicated due to the bulk-volume change.
Currently, powering off a VM instance causes a detach delay for all volume types, because based on the current node status the controller cannot tell whether it is safe to detach yet (it is not sure whether the volume is still mounted).
I am working on a proposal for possibly using VerifyVolumesAreAttached before issuing a detach, to solve some other problems. In this case, if we can determine that the mounts are gone because the node is powered off, we can safely detach the volume.
@jingxu97: I have one question. Once we detach the volume in DisksAreAttached() and return that this set of volumes is no longer attached to the node, wouldn't it update the state in that case?
@BaluDontu actually I need to take my comment back. Yes, it will update the actual state if volumes are detached. But VerifyVolumesAreAttached is called periodically, every 1 minute by default, and the user can change the frequency or even disable it entirely. I don't think this is a reliable way to solve the problem.
Spoke with @jingxu97 offline, and I agree: detaching in DisksAreAttached(...) may result in undesired behavior if the node is powered on again very quickly, and possibly in other scenarios.
You can still go down that route if you've tested it and are comfortable with it. Otherwise, this PR as is (whitelisting vSphere for multi-attach) might be a less risky workaround, considering this was the behavior for vSphere in 1.6 and earlier.
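A minimal sketch of what "whitelisting vSphere for multi-attach" amounts to. The volume-name prefix check and the standalone signature here are simplified assumptions for illustration, not the actual attach/detach controller code:

```go
package main

import (
	"fmt"
	"strings"
)

// isMultiAttachForbidden sketches the whitelist approach from this PR:
// for most volume plugins, report that attaching the same read-write
// volume to a second node is forbidden; for vSphere volumes, return
// false so the controller may attach the volume to a new node before
// the detach from a powered-off node completes (the pre-1.7 behavior).
// volumeName here is a simplified "<plugin>/<path>" stand-in.
func isMultiAttachForbidden(volumeName string) bool {
	if strings.HasPrefix(volumeName, "kubernetes.io/vsphere-volume/") {
		return false // vSphere: allow multi-attach handling to proceed
	}
	// Default: forbid attaching the volume to a second node.
	return true
}

func main() {
	fmt.Println(isMultiAttachForbidden("kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/pvc-1.vmdk"))
	fmt.Println(isMultiAttachForbidden("kubernetes.io/aws-ebs/vol-123"))
}
```

The trade-off discussed above is that this leaves the window where a quickly restarted VM and the new node could both see the disk, which is why the thread also examines the power-off scenarios below.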
We analyzed the following scenarios, and the current PR looks safer than the alternative approach of detaching the disk in DisksAreAttached():
- The user powers off the VM.
- The kubelet is not responsive on the node.
- The node VM hangs with vSphere HA and VM Monitoring enabled.

Also, the user might disable the periodic VerifyVolumesAreAttached calls (run every 1 minute by default). In that case, DisksAreAttached() would not be called at all and would not promptly detach the volume from the powered-off node.
I have discussed this with @divyenpatel offline and he agrees with the current approach.
We would like to go ahead with it. @saad-ali @jingxu97, can you lgtm it if you feel it is OK?
/retest
/lgtm.
One question: after the VM was powered off, how long did it take for the pods to get recreated on other nodes?
@jingxu97 : PR still says lgtm label is not present. Can you try it again?
@jingxu97: It took a minute or so for Kubernetes to recognize that the VM was powered off.
After that, it deleted all the pods on the powered-off node (node1 in this case) and immediately started the volume attach on a new node (node3 in this case), after which the pod became available. In total it took around 40-50 seconds for the pod to become available.
```
W0821 19:27:12.632675  1 vsphere.go:1446] VM node1, is not in poweredOn state
I0821 19:27:33.459577  1 controller_utils.go:285] Recording status change NodeNotReady event message for node node1
I0821 19:27:33.459653  1 controller_utils.go:203] Update ready status of pods on node [node1]
I0821 19:27:33.459791  1 event.go:217] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"node1", UID:"c32a5114-86a4-11e7-bff5-005056bd2502", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'NodeNotReady' Node node1 status is now: NodeNotReady
W0821 19:27:33.553165  1 vsphere.go:611] VM node1, is not in poweredOn state
I0821 19:27:33.553213  1 nodecontroller.go:752] Deleting node (no longer present in cloud provider): node1
I0821 19:27:33.553220  1 controller_utils.go:274] Recording Deleting Node node1 because it's not present according to cloud provider event message for node node1
I0821 19:27:33.553384  1 event.go:217] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"node1", UID:"c32a5114-86a4-11e7-bff5-005056bd2502", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'DeletingNode' Node node1 event: Deleting Node node1 because it's not present according to cloud provider
I0821 19:27:38.553459  1 nodecontroller.go:623] NodeController observed a Node deletion: node1
I0821 19:27:38.553518  1 controller_utils.go:274] Recording Removing Node node1 from NodeController event message for node node1
I0821 19:27:38.553811  1 event.go:217] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"node1", UID:"c32a5114-86a4-11e7-bff5-005056bd2502", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RemovingNode' Node node1 event: Removing Node node1 from NodeController
I0821 19:27:53.531262  1 gc_controller.go:157] Found orphaned Pod redis-server-2448813087-m62zw assigned to the Node node1. Deleting.
I0821 19:27:53.531267  1 gc_controller.go:62] PodGC is force deleting Pod: default:redis-server-2448813087-m62zw
I0821 19:27:53.557972  1 gc_controller.go:161] Forced deletion of orphaned Pod redis-server-2448813087-m62zw succeeded
I0821 19:27:56.771960  1 reconciler.go:208] Started AttachVolume for volume "kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-208563c0-86a5-11e7-bff5-005056bd2502.vmdk" to node "node3"
I0821 19:27:57.804622  1 operation_generator.go:290] AttachVolume.Attach succeeded for volume "kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-208563c0-86a5-11e7-bff5-005056bd2502.vmdk" (spec.Name: "pvc-208563c0-86a5-11e7-bff5-005056bd2502") from node "node3".
I0821 19:27:57.880295  1 node_status_updater.go:143] Updating status for node "node3" succeeded. patchBytes: "{\"status\":{\"volumesAttached\":[{\"devicePath\":\"/dev/disk/by-id/wwn-0x6000c291fb5e79d81ea1958e888c9d37\",\"name\":\"kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-208563c0-86a5-11e7-bff5-005056bd2502.vmdk\"}]}}" VolumesAttached: [{kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-208563c0-86a5-11e7-bff5-005056bd2502.vmdk /dev/disk/by-id/wwn-0x6000c291fb5e79d81ea1958e888c9d37}]
```
/test all [submit-queue is verifying that this PR is safe to merge]
Merged #51066.
Automatic merge from submit-queue
Commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.
When this change was applied/discussed, what was the vSphere integration expected to do in the "normal" case? When a pod is restarted on another node, this change makes the volume get attached on the new node first, before the detach on the previous node is executed.
We see kernel panics where the volume is attached on one node before being detached from another, so I am asking to find out whether there is a bug lurking elsewhere, even though this particular change was reverted. Looking at the discussion, it seems the normal case would be handled by the vSphere cloud provider, but what exactly is the expected behavior here? Should it report an error when asked to attach a volume, without attempting the attach, if the volume is attached to a healthy node? Should it detach as part of attach?
To clarify this for future users who may stumble on this: the change made here was reverted in #67825. The Kubernetes versions that no longer include this change are:
- `v1.10.8` (and later)
- `v1.11.4` (and later)
- `v1.12+`