Re: [kubernetes/kubernetes] Pod volume mounting failing even after PV is bound and attached to pod (#49926)


Michelle Au

unread,
May 17, 2018, 6:51:52 PM5/17/18
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

@kubernetes/sig-storage-bugs



Justin Clark

unread,
May 29, 2018, 8:18:17 PM5/29/18
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

Of note, I'm no longer able to reproduce this issue in 1.9.8, but there is a 45-second back-off when the pod is first created where it reports PersistentVolumeClaim is not bound: "test-StatefulSet" (repeated 10 times), even though you can clearly see the PVC was bound instantly.

Tested: us-west-2, m5.large, Kubernetes v1.9.8

Maxim Ivanov

unread,
Jun 14, 2018, 12:44:36 PM6/14/18
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

@u2mejc It looks suspiciously similar to what we see on bare metal: the attach-detach controller can take minutes to update node status. See details in #64549. It looks like it only sees the new value after some internal timeout, at which point it relists resources and picks up the update.
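
For anyone chasing this, a quick way to see what has actually been recorded on the node object (node name is a placeholder): volumesAttached is written by the attach-detach controller and volumesInUse by the kubelet, so a stale entry here points at the controller lag described above.

kubectl get node <node-name> -o jsonpath='{.status.volumesAttached}{"\n"}'
kubectl get node <node-name> -o jsonpath='{.status.volumesInUse}{"\n"}'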

Justin Clark

unread,
Jun 15, 2018, 2:43:51 PM6/15/18
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

Thank you for picking this issue back up @redbaron. I have to admit that since I posted the above, I've found the behavior to be extremely unreliable, and it's hard to reproduce a successful binding. 😨 My success rate of getting this to work is all over the place.

Adam Bovill

unread,
Jun 21, 2018, 12:52:00 AM6/21/18
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

We're having the same problems. It was certainly a non-obvious problem to track down. I found it via the description of the problem in a Stack Overflow question:
https://stackoverflow.com/questions/49661306/timeout-mounting-pvc-volume-to-pod

Is there anything else that we can do to help track down the issue?

Jiashen Cao

unread,
Jun 26, 2018, 3:08:38 PM6/26/18
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

I am facing the same issue on instance type m4.xlarge. Is anyone else seeing this on m4.xlarge machines?

Mikael Sundberg

unread,
Jun 29, 2018, 3:46:07 AM6/29/18
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

Had the same problem using kops 1.9.1 and Kubernetes 1.9.6; changing from m5.medium to t2.large instances worked for me.

Jean-Francois Larouche

unread,
Jun 29, 2018, 9:34:46 PM6/29/18
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

Sorry for my n00b question in advance. I am just starting to use kops and got this issue too in us-east-2a.

How do you specify t2.large? This is my create command:
kops create cluster kitfox-k8s-test.k8s.local --zones us-east-2a --yes

Thanks.

Igor Korsun

unread,
Jul 26, 2018, 2:25:19 AM7/26/18
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

@bobo Same here: m4.xlarge, kops 1.9.1, k8s 1.9.6.

Facing this after an upgrade.

Brandon Harper

unread,
Aug 13, 2018, 1:08:48 PM8/13/18
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

@Rouche Use --master-size and --node-size.
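
For example (instance types here are only an illustration, reusing the cluster name from the command above):

kops create cluster kitfox-k8s-test.k8s.local --zones us-east-2a --master-size t2.large --node-size t2.large --yes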

coryodaniel

unread,
Aug 15, 2018, 5:42:22 PM8/15/18
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

Having the same issue. Changing from m5 to t2 instances fixed it. Would love a fix for m5s. <3

Justin Clark

unread,
Aug 15, 2018, 9:17:50 PM8/15/18
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

Had to downgrade from M5 to M4, because shared-CPU T2 instances are not acceptable for our production environment.

Turgay

unread,
Sep 21, 2018, 2:15:14 AM9/21/18
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

Here is how I fixed it.

Kubernetes version: 1.9.10
Machine type: c5.xlarge

kops 1.8 release notes

New AWS instance types: P3, C5, M5, H1. Please note that NVME volumes are not supported on the default jessie image, so masters will not boot on M5 and C5 instance types unless a stretch image is chosen (change jessie to stretch in the image name). Also note that kubernetes will not support mounting persistent volumes on NVME instances until Kubernetes v1.9.

kops edit ig nodes

Just change the image from jessie to stretch:

from:
image: kope.io/k8s-1.9-debian-jessie-amd64-hvm-ebs-2018-03-11
to:
image: kope.io/k8s-1.9-debian-stretch-amd64-hvm-ebs-2018-03-11

Note: Make sure you have installed Kubernetes version 1.9.x or later.
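
For reference, after kops edit ig nodes the relevant part of the InstanceGroup spec ends up looking roughly like this (the name and machineType are whatever your cluster already uses):

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes
spec:
  image: kope.io/k8s-1.9-debian-stretch-amd64-hvm-ebs-2018-03-11
  machineType: c5.xlarge
  role: Node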

Shantanu Deshpande

unread,
Sep 23, 2018, 4:20:43 AM9/23/18
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

@turgayozgur This indeed fixed the issue. Thanks. Saved me a lot of time.

Omer Haim

unread,
Dec 26, 2018, 3:35:57 PM12/26/18
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

I am also having issues with NVMe instances; in this case I tried using m5.large with a PVC.
This is the StatefulSet that reproduces it:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: k8s-rmq
spec:
  serviceName: "k8s-rmq"
  replicas: 1
  selector:
    matchLabels:
      app: k8s-rmq
  template:
    metadata:
      labels:
        app: k8s-rmq
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    spec:
      nodeSelector:
        kops.k8s.io/instancegroup: nodes
      terminationGracePeriodSeconds: 30
      containers:
      - name: k8s-rmq
        imagePullPolicy: IfNotPresent
        image: rabbitmq:3.7.8-management-alpine
        ports:
        - containerPort: 5672
          name: amqp
        - containerPort: 15672
          name: management
        envFrom:
            - configMapRef:
                name: k8s-dev-aws         
        env:
          - name: RABBITMQ_DEFAULT_USER
            value: example
          - name: RABBITMQ_DEFAULT_PASS
            value: example
        resources:
          limits:
            cpu: "800m"
            memory: "1Gi"
          requests:
            cpu: "100m"
            memory: "128Mi"
        livenessProbe:
          tcpSocket:
            port: 5672
          initialDelaySeconds: 20
          timeoutSeconds: 5
          periodSeconds: 30
          failureThreshold: 2
          successThreshold: 1
        readinessProbe:
          tcpSocket:
            port: 5672
          initialDelaySeconds: 20
          timeoutSeconds: 5
          periodSeconds: 30
          failureThreshold: 2
          successThreshold: 1
        volumeMounts:
        - name: rmqvol
          mountPath: /var/lib/rabbitmq
  volumeClaimTemplates:
  - metadata:
      name: rmqvol
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 20Gi

These are the storage classes:

NAME            PROVISIONER             AGE
default         kubernetes.io/aws-ebs   337d
gp2 (default)   kubernetes.io/aws-ebs   337d

The EBS volume is created and attached to the instance, but kubelet fails to mount the disk into the pod:

1m          1m           1       k8s-rmq-0.1573f1a3938f660f                                     Pod                                                             Warning   FailedMount                    kubelet, ip-172-20-57-150.eu-west-1.compute.internal      Unable to mount volumes for pod "k8s-rmq-0_default(1c45f9a0-0932-11e9-b1e7-0ac8a16a5f0c)": timeout expired waiting for volumes to attach or mount for pod "default"/"k8s-rmq-0". list of unmounted volumes=[rmqvol default-token-wp48g]. list of unattached volumes=[rmqvol default-token-wp48g]

Cluster provisioned using Kops 1.10

kubectl version

Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-03T21:04:45Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.11", GitCommit:"637c7e288581ee40ab4ca210618a89a555b6e7e9", GitTreeState:"clean", BuildDate:"2018-11-26T14:25:46Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

When checking the mounts on the m5.large node, there is no disk mount for the NVMe drive.

When switching to m4.large, the mounts include:
/dev/xvdcu 20G 49M 19G 1% /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-west-1a/vol-0b73b3a1bf15aac39

The node image in Kops is: kope.io/k8s-1.10-debian-jessie-amd64-hvm-ebs-2018-08-17

And on the same note, in a different use case: when launching a new cluster using kops with NVMe instances like m5.large for the masters, host startup fails to mount the etcd volume and protokube:1.10.0 hangs in a loop with:

I1226 20:23:59.194888    1721 aws_volume.go:320] nvme path not found "/rootfs/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0fdeaa59d34bd2ab1"

After installing nvme-cli I can see that the volume exists:

root@ip-10-101-35-149:~# nvme list
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     vol0913974dacc67c490 Amazon Elastic Block Store               1           0.00   B /  68.72  GB    512   B +  0 B   1.0
/dev/nvme1n1     vol0fdeaa59d34bd2ab1 Amazon Elastic Block Store               1           0.00   B /  21.47  GB    512   B +  0 B   1.0
/dev/nvme2n1     vol0a587f2950331bf7b Amazon Elastic Block Store               1           0.00   B /  21.47  GB    512   B +  0 B   1.0

But /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0fdeaa59d34bd2ab1 does not exist.

In fact the /dev/disk/by-id mapping doesn't exist at all; only /dev/disk/by-uuid/ does.
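
A quick way to double-check this (assuming nvme-cli is installed, and using /dev/nvme1n1 from the nvme list output above):

# Are the by-id symlinks there at all?
ls -l /dev/disk/by-id/ 2>/dev/null || echo "no /dev/disk/by-id"

# The EBS volume ID is exposed as the NVMe controller serial number
nvme id-ctrl /dev/nvme1n1 | grep "^sn"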

So basically I can't use any NVMe-based instance for masters, or for nodes that have an EBS PVC.

Chey

unread,
Jan 30, 2019, 10:42:46 PM1/30/19
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

Cloud: AWS
OS: RedHat 7.6
kube version: v1.12.5

Having a similar problem with storage myself. When draining a node that has an EBS volume attached to a Pod and/or Deployment, the storage doesn't move. It releases from the original node but never makes it to the new/next node.

Like others, switching to t2.xlarge instances fixed this for me.

Gabor Debreczeni-Kis

unread,
Feb 12, 2019, 12:04:40 PM2/12/19
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

Same problem here, AWS, k8s version 1.13.0

Bryn Mathias

unread,
Apr 11, 2019, 7:51:23 AM4/11/19
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

Same problem as well: AWS EKS, k8s version 1.11.0.
We see the issue on c5.9xlarge and p2.xlarge instances.

I have a feeling that it might be due to the maximum number of EBS attachments per EC2 node.
I do not know whether the attachment limit counts cumulatively, or whether detached volumes are properly removed from the count.
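
One way to check the attachments currently recorded for a node, assuming the AWS CLI is configured (the instance ID is a placeholder):

aws ec2 describe-volumes \
  --filters Name=attachment.instance-id,Values=i-0123456789abcdef0 \
  --query 'Volumes[].Attachments[].{Device:Device,State:State,VolumeId:VolumeId}' \
  --output table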

Josh Trotter

unread,
Apr 24, 2019, 12:32:12 PM4/24/19
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

I've been seeing a similar issue on EKS with k8s version 1.11. Our support agent suggested the following:

It's very likely that the Kubernetes scheduler was choosing worker nodes in an Availability Zone (AZ) with no matching volumes available. This can happen when the node the scheduler selects for the pod is not in the AZ in which the PersistentVolumeClaim's EBS volume exists, for example because the nodes in that AZ don't have sufficient CPU and/or memory available. The scheduler then chooses a node in another zone, and the pod fails with this error.

This was a known issue in Kubernetes[1][2] and has been fixed by enabling the "VolumeScheduling"[2] feature in the scheduler.

Another workaround could be creating the volumes manually and updating the PVCs, but both options would make the cluster less available/dynamic.

References:
[1] https://github.com/kubernetes/enhancements/issues/490 
[2] https://github.com/kubernetes/kubernetes/issues/34583 
[3] https://kubernetes.io/blog/2018/10/11/topology-aware-volume-provisioning-in-kubernetes/ 
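
For anyone who wants to try the VolumeScheduling route from [3], a minimal StorageClass sketch (the name is illustrative; volumeBindingMode: WaitForFirstConsumer delays volume binding until a pod is scheduled, so the volume gets provisioned in the pod's AZ, and it needs the VolumeScheduling feature gate, beta since 1.10):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp2-topology-aware
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
volumeBindingMode: WaitForFirstConsumer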

Michelle Au

unread,
Apr 24, 2019, 5:31:13 PM4/24/19
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

/assign @leakingtapan

Gabor Debreczeni-Kis

unread,
Apr 29, 2019, 2:21:29 PM4/29/19
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

Unfortunately this has nothing to do with AWS Availability Zones or VolumeScheduling. AZ-related problems are a hot topic nowadays, so people tend to mix that problem up with this one, but a quick look at the Availability Zones makes clear that there is no connection.

Today's testing results:
Kubernetes: 1.13.0
Instance: m5.4xlarge
EBS: gp2

  • Both the worker node and the EBS volume are in the same AZ (us-east-1c). (VolumeScheduling is now enabled in our StorageClass; previously it wasn't, and nothing changed.)
  • EBS volume is successfully created and mounted on the worker node
  • kubectl describe pv reports the volume as "Bound", events are empty
  • kubectl describe pvc reports as "Bound", events are empty
  • pod events:
    - MountVolume.WaitForAttach failed for volume "pvc-xxxxx" : could not find attached AWS Volume "aws://us-east-1c/vol-xxxxx". Timeout waiting for mount paths to be created
    - Unable to mount volumes for pod "xyz": timeout expired waiting for volumes to attach or mount for pod

Gabor Debreczeni-Kis

unread,
Apr 29, 2019, 3:10:08 PM4/29/19
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

After some debugging I found that my problem is this one; all the symptoms match:
coreos/bugs#2371

Shawn Zhang

unread,
May 15, 2019, 6:51:36 AM5/15/19
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

@gabordk I got the same issue. Any solution or progress on this?

fang duan

unread,
Aug 9, 2019, 11:14:48 PM8/9/19
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

Same issue here.

On microk8s with the default storage addon enabled, I installed the Consul Helm chart and hit this: all PVs and PVCs are bound, but the consul-server pods still report "pod has unbound immediate PersistentVolumeClaims". Recreating the pods didn't help.
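
A few commands that help narrow down where it's stuck (the pod name is whatever your Helm release produced, shown here only as an example):

kubectl get pvc
kubectl describe pod consul-server-0 | grep -A 10 Events
kubectl get events --sort-by=.lastTimestamp | grep -i volume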

Joshua Hoblitt

unread,
Aug 29, 2019, 2:37:34 PM8/29/19
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

I believe I'm seeing the same issue on AWS with EKS 1.13.8 (1.13 "eks.2").

  Warning  FailedMount             56s (x11 over 23m)  kubelet, ip-192-168-124-20.ec2.internal  Unable to mount volumes for pod "jenkins-6758665c4c-gg5tl_jenkins(f6440463-ca87-11e9-a31c-0a4da4f89c32)": timeout expired waiting for volumes to attach or mount for pod "jenkins"/"jenkins-6758665c4c-gg5tl". list of unmounted volumes=[jenkins-home]. list of unattached volumes=[plugins tmp jenkins-config plugin-dir secrets-dir jenkins-home sc-config-volume jenkins-token-pn7mq]

Shantanu Deshpande

unread,
Sep 2, 2019, 7:06:22 AM9/2/19
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

@jhoblitt Did you try EKS 1.12.x? I faced some issues with StatefulSets using PVCs on EKS 1.13.x, but everything ran just fine on EKS 1.12.x.

Alejandro Garrido Mota

unread,
Sep 4, 2019, 9:17:01 AM9/4/19
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

@jhoblitt I was facing the same issue until 10 minutes ago. I realised there is a problem/bug with the kubernetes-plugin I was using. I solved it by upgrading the kubernetes-plugin for Jenkins to 1.18.3.

Joshua Hoblitt

unread,
Sep 4, 2019, 3:56:50 PM9/4/19
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

@ishantanu I don't believe I was seeing this problem with 1.12, but it's been a while since I've tested with that version.

@mogaal This problem is present outside of pods managed by Jenkins.

Sergey Vasilyev

unread,
Sep 12, 2019, 10:14:30 AM9/12/19
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

@mogaal
Thank you!! I've upgraded the kubernetes-plugin for Jenkins and it works!

fejta-bot

unread,
Dec 11, 2019, 9:15:50 AM12/11/19
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale



fejta-bot

unread,
Jan 10, 2020, 10:02:35 AM1/10/20
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle rotten

fejta-bot

unread,
Feb 9, 2020, 10:43:49 AM2/9/20
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.


Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Kubernetes Prow Robot

unread,
Feb 9, 2020, 10:43:58 AM2/9/20
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

Closed #49926.

Kubernetes Prow Robot

unread,
Feb 9, 2020, 10:43:58 AM2/9/20
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

sudip-moengage

unread,
Jan 9, 2021, 2:52:55 PM1/9/21
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

/reopen

Kubernetes Prow Robot

unread,
Jan 9, 2021, 2:53:08 PM1/9/21
to kubernetes/kubernetes, k8s-mirror-storage-bugs, Team mention

@sudip-moengage: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
