Of note, I'm no longer able to reproduce this issue in 1.9.8, but when the pod is first created there is a ~45s backoff during which it claims PersistentVolumeClaim is not bound: "test-StatefulSet" (repeated 10 times), even though you can clearly see the PVC was bound instantly.
Tested: us-west-2, m5.large, Kubernetes v1.9.8
Thank you for looking into this issue @redbaron. I have to admit that since I posted the above, I've found the behavior to be extremely unreliable; a successful binding is hard to reproduce. 😨 My success rate at getting this to work is all over the place.
We're having the same problems. It was certainly a non-obvious problem to track down. I found it by seeing the description of the problem in a stackoverflow issue:
https://stackoverflow.com/questions/49661306/timeout-mounting-pvc-volume-to-pod
Is there anything else that we can do to help track down the issue?
I am facing the same issue for instance type m4.xlarge. Has anyone else had the problem on an m4.xlarge machine?
Had the same problem using kops 1.9.1 and kubernetes 1.9.6; changing from m5.medium to t2.large instances worked for me.
Sorry for my n00b question in advance. I am just starting to use kops and got this issue too on us-east-2a.
How do you specify t2.large? This is my create command:
kops create cluster kitfox-k8s-test.k8s.local --zones us-east-2a --yes
Thanks.
@Rouche Use the --master-size and --node-size flags.
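For example, a sketch reusing the cluster name from your command (t2.large for both roles is just an illustration, not a sizing recommendation):
kops create cluster kitfox-k8s-test.k8s.local --zones us-east-2a --master-size t2.large --node-size t2.large --yes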
Having the same issue. Changing from m5 to t2 instances fixed it. Would love a fix for m5s. <3
Had to downgrade from M5 to M4, because shared T2 instances are not acceptable in our production.
Here is how I fixed it.
Kubernetes version: 1.9.10
Machine type: c5.xlarge
Note on the new AWS instance types (P3, C5, M5, H1): NVMe volumes are not supported on the default jessie image, so masters will not boot on M5 and C5 instance types unless a stretch image is chosen (change jessie to stretch in the image name). Also note that Kubernetes will not support mounting persistent volumes on NVMe instances until Kubernetes v1.9.
kops edit ig nodes
Just change the image from jessie to stretch:
from:
image: kope.io/k8s-1.9-debian-jessie-amd64-hvm-ebs-2018-03-11
to:
image: kope.io/k8s-1.9-debian-stretch-amd64-hvm-ebs-2018-03-11
Note: Make sure you are running Kubernetes version 1.9.x or later.
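After editing the instance group, the new image only takes effect once the change is applied and the nodes are replaced; a minimal sketch assuming the usual kops workflow (cluster name and state store already configured in your environment):
kops update cluster --yes
kops rolling-update cluster --yes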
@turgayozgur This indeed fixed the issue. Thanks. Saved me a lot of time.
I am also having issues with NVMe instances; in this case I tried using m5.large with a PVC.
This is the StatefulSet that reproduces it:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: k8s-rmq
spec:
  serviceName: "k8s-rmq"
  replicas: 1
  selector:
    matchLabels:
      app: k8s-rmq
  template:
    metadata:
      labels:
        app: k8s-rmq
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    spec:
      nodeSelector:
        kops.k8s.io/instancegroup: nodes
      terminationGracePeriodSeconds: 30
      containers:
        - name: k8s-rmq
          imagePullPolicy: IfNotPresent
          image: rabbitmq:3.7.8-management-alpine
          ports:
            - containerPort: 5672
              name: amqp
            - containerPort: 15672
              name: management
          envFrom:
            - configMapRef:
                name: k8s-dev-aws
          env:
            - name: RABBITMQ_DEFAULT_USER
              value: example
            - name: RABBITMQ_DEFAULT_PASS
              value: example
          resources:
            limits:
              cpu: "800m"
              memory: "1Gi"
            requests:
              cpu: "100m"
              memory: "128Mi"
          livenessProbe:
            tcpSocket:
              port: 5672
            initialDelaySeconds: 20
            timeoutSeconds: 5
            periodSeconds: 30
            failureThreshold: 2
            successThreshold: 1
          readinessProbe:
            tcpSocket:
              port: 5672
            initialDelaySeconds: 20
            timeoutSeconds: 5
            periodSeconds: 30
            failureThreshold: 2
            successThreshold: 1
          volumeMounts:
            - name: rmqvol
              mountPath: /var/lib/rabbitmq
  volumeClaimTemplates:
    - metadata:
        name: rmqvol
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 20Gi
These are the storage classes:
NAME PROVISIONER AGE
default kubernetes.io/aws-ebs 337d
gp2 (default) kubernetes.io/aws-ebs 337d
The EBS volume is created and attached to the instance, but kubelet fails to mount the disk into the pod:
1m 1m 1 k8s-rmq-0.1573f1a3938f660f Pod Warning FailedMount kubelet, ip-172-20-57-150.eu-west-1.compute.internal Unable to mount volumes for pod "k8s-rmq-0_default(1c45f9a0-0932-11e9-b1e7-0ac8a16a5f0c)": timeout expired waiting for volumes to attach or mount for pod "default"/"k8s-rmq-0". list of unmounted volumes=[rmqvol default-token-wp48g]. list of unattached volumes=[rmqvol default-token-wp48g]
Cluster provisioned using Kops 1.10
kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-03T21:04:45Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.11", GitCommit:"637c7e288581ee40ab4ca210618a89a555b6e7e9", GitTreeState:"clean", BuildDate:"2018-11-26T14:25:46Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
When checking the mounts on an m5.large node, there is no disk mount for the NVMe drive.
When switching to m4.large, the mounts include:
/dev/xvdcu 20G 49M 19G 1% /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-west-1a/vol-0b73b3a1bf15aac39
The node image in Kops is: kope.io/k8s-1.10-debian-jessie-amd64-hvm-ebs-2018-08-17
And on the same note (a different use case): when launching a new cluster with kops where the masters are NVMe instances such as m5.large, host startup fails to mount the etcd volume, and protokube:1.10.0 hangs in a loop with:
I1226 20:23:59.194888 1721 aws_volume.go:320] nvme path not found "/rootfs/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0fdeaa59d34bd2ab1"
After installing nvme-cli I can see that the volume exists:
root@ip-10-101-35-149:~# nvme list
Node SN Model Namespace Usage Format FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1 vol0913974dacc67c490 Amazon Elastic Block Store 1 0.00 B / 68.72 GB 512 B + 0 B 1.0
/dev/nvme1n1 vol0fdeaa59d34bd2ab1 Amazon Elastic Block Store 1 0.00 B / 21.47 GB 512 B + 0 B 1.0
/dev/nvme2n1 vol0a587f2950331bf7b Amazon Elastic Block Store 1 0.00 B / 21.47 GB 512 B + 0 B 1.0
But /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0fdeaa59d34bd2ab1 does not exist.
The /dev/disk/by-id mapping doesn't exist at all; there is only /dev/disk/by-uuid/.
So basically I can't use any NVMe-based instance for masters, or for nodes that have an EBS PVC.
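For diagnosis only (not a proper fix), the by-id symlinks protokube looks for can be recreated by hand, since EC2 exposes the EBS volume id as the NVMe controller serial number. A sketch, assuming nvme-cli is installed and this runs as root on the node:
# Illustrative workaround sketch: recreate the by-id symlinks protokube expects,
# using the EBS volume id reported as the NVMe serial number.
mkdir -p /dev/disk/by-id
for dev in /dev/nvme*n1; do
  sn=$(nvme id-ctrl "$dev" | awk -F: '/^sn /{gsub(/ /,"",$2); print $2}')
  ln -sf "$dev" "/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_${sn}"
done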
Cloud: AWS
OS: RedHat 7.6
kube version: v1.12.5
Having a similar problem with storage myself. When draining a node that has an EBS volume attached to a Pod and/or Deployment, the storage doesn't move: it releases from the original node but never makes it to the new node.
Like others, switching to t2.xlarge instances fixed this for me.
Same problem here, AWS, k8s version 1.13.0
Same problem as well, AWS, eks. k8s version 1.11.0
I see the issue on c5.9xlarge machines and on p2.xlarge instances.
I have a feeling that it might be due to the maximum number of EBS attachments on an EC2 node.
I do not know whether the attachment limit is being hit cumulatively or whether attachments are properly removed; one way to check what a node reports as attached is sketched below.
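A quick way to count what the control plane believes is attached to a given node (.status.volumesAttached is standard node status for in-tree volume plugins; the node name here is just an example taken from an earlier comment):
# count the volumes this node reports as attached
kubectl get node ip-172-20-57-150.eu-west-1.compute.internal -o jsonpath='{range .status.volumesAttached[*]}{.name}{"\n"}{end}' | wc -l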
I've been seeing a similar issue on EKS with k8s version 1.11. Our support agent suggested the following:
It's very likely that the Kubernetes scheduler was choosing worker nodes in Availability Zones (AZs) where no volume is available. This happens when the node the scheduler selects for the pod is not in the availability zone where the EBS volume backing the persistent volume claim exists, for example when there aren't sufficient CPU and/or memory resources on the nodes in that zone; the scheduler then picks a node in another zone and fails to schedule the pod with this error.
This was a known issue in Kubernetes [1][2] and has been addressed by enabling the "VolumeScheduling" feature in the scheduler (see the StorageClass sketch after the references).
Another workaround would be creating the volumes manually and updating the PVCs, but both options make the cluster less available/dynamic.
References:
[1] https://github.com/kubernetes/enhancements/issues/490
[2] https://github.com/kubernetes/kubernetes/issues/34583
[3] https://kubernetes.io/blog/2018/10/11/topology-aware-volume-provisioning-in-kubernetes/
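For reference, the topology-aware provisioning described in [3] is opted into per StorageClass via volumeBindingMode, so the volume isn't provisioned until a pod using it is scheduled. A minimal sketch (the class name gp2-wait is illustrative):
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp2-wait
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
volumeBindingMode: WaitForFirstConsumer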
/assign @leakingtapan
Unfortunately this has nothing to do with AWS Availability Zones or VolumeScheduling. AZ-related problems are kinda hot nowadays, so people like to mix that problem up with this one, but a quick look at the Availability Zones makes it clear that there is no connection.
Today's testing results:
Kubernetes: 1.13.0
Instance: m5.4xlarge
EBS: gp2
After some debugging I found that my problem is this one; all the symptoms match:
coreos/bugs#2371
@gabordk I got the same issue. Any solution or progress on this?
Same issue here.
On microk8s with the default storage addon enabled, installing the Consul Helm chart hits this: all PVs and PVCs are bound, but the consul-server pods still get "pod has unbound immediate PersistentVolumeClaims", and recreating the pods didn't help.
I believe I'm seeing the same issue on AWS with EKS 1.13.8 (1.13, platform version "eks.2").
Warning FailedMount 56s (x11 over 23m) kubelet, ip-192-168-124-20.ec2.internal Unable to mount volumes for pod "jenkins-6758665c4c-gg5tl_jenkins(f6440463-ca87-11e9-a31c-0a4da4f89c32)": timeout expired waiting for volumes to attach or mount for pod "jenkins"/"jenkins-6758665c4c-gg5tl". list of unmounted volumes=[jenkins-home]. list of unattached volumes=[plugins tmp jenkins-config plugin-dir secrets-dir jenkins-home sc-config-volume jenkins-token-pn7mq]
@jhoblitt Did you try with EKS 1.12.x? I faced some issues with StatefulSets with PVCs on EKS 1.13.x, but everything ran just fine on EKS 1.12.x.
@jhoblitt I was facing the same issue until 10 minutes ago; I realised there is a problem/bug with the kubernetes-plugin I was using. I solved it by upgrading the Jenkins kubernetes-plugin to 1.18.3.
@ishantanu I don't believe I was seeing this problem with 1.12, but it's been a while since I've tested with that version.
@mogaal This problem is present outside of pods managed by jenkins.
@mogaal Thank you!! I've upgraded the kubernetes-plugin version for Jenkins and it works!!!
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
Closed #49926.
@fejta-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
@sudip-moengage: You can't reopen an issue/PR unless you authored it or you are a collaborator.
In response to this:
/reopen
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.