Re: [kubernetes/kubernetes] Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod (#43515)


Yu-Ju Hong

unread,
Mar 22, 2017, 12:51:45 PM3/22/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

/cc @kubernetes/sig-storage-misc



Jing Xu

unread,
Mar 22, 2017, 4:04:45 PM3/22/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

If you delete a pod and recreate it soon after, it is possible to see errors like "timeout expired waiting for volumes to attach/mount" temporarily. Could you wait a few more minutes to see whether the pods start correctly?

Jonathan Kosgei

unread,
Mar 22, 2017, 4:10:29 PM3/22/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention
I actually tried that: deleted the pods, waited 5 minutes, and recreated them, with the same error. Maybe it wasn't long enough.



Hemant Kumar

unread,
Mar 22, 2017, 4:16:22 PM3/22/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

Can you post the output of kubectl describe pod <pod_name>? It may have some other events pointing to the failure.

Jonathan Kosgei

unread,
Mar 22, 2017, 4:25:12 PM3/22/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention
I deleted the cluster this afternoon, but I checked constantly and those were the only errors: the timeout on mount and a failed-to-sync error.



Jing Xu

unread,
Mar 22, 2017, 4:44:55 PM3/22/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

If you run the kubelet at log level 4 (--v=4), it should also give the detailed messages for the failed mount so that we can figure out why it failed.
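
For example, something along these lines (just a sketch; where the kubelet flags are set depends on the install):

# After adding --v=4 to the kubelet flags (systemd drop-in, /etc/default/kubelet, ...),
# restart the kubelet and pull out the mount-related messages.
sudo systemctl restart kubelet
journalctl -u kubelet | grep -E 'MountVolume|Unable to mount|timeout expired'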

Michelle Au

unread,
Mar 22, 2017, 4:57:47 PM3/22/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

@jonathan-kosgei Can you email me your project name, cluster name, and zone information? I can take a look at the master logs to see if there's anything useful there.

Jonathan Kosgei

unread,
Mar 23, 2017, 3:41:24 PM3/23/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

This is starting to happen more frequently.

Jing Xu

unread,
Mar 23, 2017, 3:58:04 PM3/23/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

@jonathan-kosgei could you provide more details about the situation in which it happens frequently? The kubelet log could give us more information. If you use a GKE cluster, run "journalctl -u kubelet > /tmp/log" to get the log. You could also share the log with us if that is OK.

Jonathan Kosgei

unread,
Mar 23, 2017, 4:10:47 PM3/23/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

This seems to be the relevant part of the log:

0:00:02.411401    1330 conversion.go:134] failed to handle multiple devices for container. Skipping Filesystem stats
0:00:02.411783    1330 conversion.go:134] failed to handle multiple devices for container. Skipping Filesystem stats
0:00:02.550786    1330 glusterfs.go:122] glusterfs: failed to get endpoints glusterfs-cluster[endpoints "glusterfs-cluster" not found]
0:00:02.551237    1330 reconciler.go:296] operationExecutor.MountVolume failed for volume "kubernetes.io/glusterfs/ee23b663-1002-11e7-9e8a-42010a8001ee-gluster-etcd" (spec.Nam
0:00:02.751142    1330 glusterfs.go:122] glusterfs: failed to get endpoints glusterfs-cluster[endpoints "glusterfs-cluster" not found]
0:00:02.751683    1330 reconciler.go:296] operationExecutor.MountVolume failed for volume "kubernetes.io/glusterfs/1174a83a-1003-11e7-9e8a-42010a8001ee-gluster-mysql" (spec.Na
0:00:02.951575    1330 glusterfs.go:122] glusterfs: failed to get endpoints glusterfs-cluster[endpoints "glusterfs-cluster" not found]
0:00:02.952012    1330 reconciler.go:296] operationExecutor.MountVolume failed for volume "kubernetes.io/glusterfs/ee23b663-1002-11e7-9e8a-42010a8001ee-gluster-etcd" (spec.Nam
0:00:03.150966    1330 glusterfs.go:122] glusterfs: failed to get endpoints glusterfs-cluster[endpoints "glusterfs-cluster" not found]
0:00:03.151410    1330 reconciler.go:296] operationExecutor.MountVolume failed for volume "kubernetes.io/glusterfs/ee23b663-1002-11e7-9e8a-42010a8001ee-gluster-etcd" (spec.Nam
0:00:03.352325    1330 glusterfs.go:122] glusterfs: failed to get endpoints glusterfs-cluster[endpoints "glusterfs-cluster" not found]
0:00:03.352861    1330 reconciler.go:296] operationExecutor.MountVolume failed for volume "kubernetes.io/glusterfs/1174a83a-1003-11e7-9e8a-42010a8001ee-gluster-mysql" (spec.Na
0:00:03.751054    1330 glusterfs.go:122] glusterfs: failed to get endpoints glusterfs-cluster[endpoints "glusterfs-cluster" not found]
0:00:03.751508    1330 reconciler.go:296] operationExecutor.MountVolume failed for volume "kubernetes.io/glusterfs/ee23b663-1002-11e7-9e8a-42010a8001ee-gluster-etcd" (spec.Nam
0:00:04.151432    1330 glusterfs.go:122] glusterfs: failed to get endpoints glusterfs-cluster[endpoints "glusterfs-cluster" not found]
0:00:04.151905    1330 reconciler.go:296] operationExecutor.MountVolume failed for volume "kubernetes.io/glusterfs/1174a83a-1003-11e7-9e8a-42010a8001ee-gluster-mysql" (spec.Na
0:00:04.350979    1330 glusterfs.go:122] glusterfs: failed to get endpoints glusterfs-cluster[endpoints "glusterfs-cluster" not found]
0:00:04.351422    1330 reconciler.go:296] operationExecutor.MountVolume failed for volume "kubernetes.io/glusterfs/ee23b663-1002-11e7-9e8a-42010a8001ee-gluster-etcd" (spec.Nam
0:00:04.551345    1330 glusterfs.go:122] glusterfs: failed to get endpoints glusterfs-cluster[endpoints "glusterfs-cluster" not found]
0:00:04.551902    1330 reconciler.go:296] operationExecutor.MountVolume failed for volume "kubernetes.io/glusterfs/1174a83a-1003-11e7-9e8a-42010a8001ee-gluster-mysql" (spec.Na
0:00:04.750850    1330 glusterfs.go:122] glusterfs: failed to get endpoints glusterfs-cluster[endpoints "glusterfs-cluster" not found]
0:00:04.751326    1330 reconciler.go:296] operationExecutor.MountVolume failed for volume "kubernetes.io/glusterfs/ee23b663-1002-11e7-9e8a-42010a8001ee-gluster-etcd" (spec.Nam
0:00:04.951155    1330 glusterfs.go:122] glusterfs: failed to get endpoints glusterfs-cluster[endpoints "glusterfs-cluster" not found]
0:00:04.951663    1330 reconciler.go:296] operationExecutor.MountVolume failed for volume "kubernetes.io/glusterfs/1174a83a-1003-11e7-9e8a-42010a8001ee-gluster-mysql" (spec.Na
0:00:05.011296    1330 conversion.go:134] failed to handle multiple devices for container. Skipping Filesystem stats

I re-created the glusterfs-cluster endpoints and my pods sprang back up. The weird thing is I had created them, along with the PVCs and PVs, but it seems they're somehow getting deleted. There's almost certainly something I'm doing wrong, but I'm not sure what.

Michelle Au

unread,
Mar 23, 2017, 6:17:33 PM3/23/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

Hi Jonathan, can you clarify what you had to recreate?

Jonathan Kosgei

unread,
Mar 23, 2017, 6:25:37 PM3/23/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention
The glusterfs endpoints


shepp67

unread,
Apr 5, 2017, 5:44:40 PM4/5/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

I have a very similar problem running OpenShift 3.4 in AWS EC2, doing the exact same thing.
Brand new OSE install:
[root@ip-172-31-40-126 master]# oc version
oc v3.4.1.12
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://172.31.40.126:8443
openshift v3.4.1.12
kubernetes v1.4.0+776c994

The pod in question is the docker-registry (kind of important for OSE to work).
I've followed this to see if it would fix the issue:
http://alexkerney.com/2016/10/kubernetes-troubleshooting.html

But no luck or change.

Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "docker-registry-3-afbbt"/"default". list of unattached/unmounted volumes=[registry-storage]

then:
Unable to mount volumes for pod "docker-registry-3-afbbt_default(3c6182b3-1a45-11e7-abf8-0a155858833b)": timeout expired waiting for volumes to attach/mount for pod "docker-registry-3-afbbt"/"default". list of unattached/unmounted volumes=[registry-storage]

I have no idea what log(s) to look at; this was of no help when I deployed a new docker-registry:
oc logs -f dc/docker-registry

Any help or guidance would be greatly appreciated as this is a blocker issue for me.
thanks and best wishes,

Jing Xu

unread,
Apr 5, 2017, 6:05:11 PM4/5/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

@shepp67 are you using glusterfs for the registry-storage volume, or another type of volume? Could you share your kubelet log with us if possible, over Slack (@jinxu)? Thanks!
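
Something like this should surface the kubelet-side mount errors on an OpenShift 3.x node (a rough sketch; the node service unit name varies by install, e.g. atomic-openshift-node or origin-node):

oc describe pod docker-registry-3-afbbt -n default    # check the Events section
oc get events -n default
# on the node the pod is scheduled to:
journalctl -u atomic-openshift-node | grep -iE 'MountVolume|timeout expired'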

shepp67

unread,
Apr 5, 2017, 6:14:49 PM4/5/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention
Hi Jing Xu,

Would you know where this log would be? Or is the name kubelet<something>, so I can find it for you?

I just requested to join the Slack channel; as soon as I'm signed in I will share the log. Probably tomorrow, as I'm on the US east coast.

Thank you!

Joel



Jing Xu

unread,
Apr 5, 2017, 6:42:31 PM4/5/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention
If you are using GKE with the GCI image, you can use "journalctl -u kubelet > /tmp/kubelet.log" to get the log. Otherwise, it should be in /var/log/kubelet.log.

Ewa Czechowska

unread,
Apr 7, 2017, 9:18:42 AM4/7/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

I also experienced this.

Environment

  • Kubernetes version:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.3", GitCommit:"029c3a408176b55c30846f0faedf56aae5992e9b", GitTreeState:"clean", BuildDate:"2017-02-15T06:40:50Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.3", GitCommit:"029c3a408176b55c30846f0faedf56aae5992e9b", GitTreeState:"clean", BuildDate:"2017-02-15T06:34:56Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: OpenStack
  • OS (e.g. from /etc/os-release): Ubuntu 16.04.1 LTS
  • Kernel (e.g. uname -a): Linux k8s-testing-gogs-workstation-worker-1 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: custom

kubelet logs

That Kubernetes worker VM actually had the volumes attached; I saw it on the OpenStack dashboard. But:

$ journalctl -ru kubelet
-- Logs begin at Fri 2017-04-07 14:49:58 CEST, end at Fri 2017-04-07 14:59:27 CEST. --
Apr 07 14:59:27 k8s-testing-gogs-workstation-worker-1 kubelet[2922]: E0407 14:59:27.321403    2922 cinder_util.go:182] error running udevadm trigger fork/exec /usr/bin/udevadm: no such file or directory
Apr 07 14:59:27 k8s-testing-gogs-workstation-worker-1 kubelet[2922]: E0407 14:59:27.316085    2922 attacher.go:180] Error: could not find attached Cinder disk "43c97a33-0a6b-49a9-af55-13ee512a8c01": <nil>
Apr 07 14:59:27 k8s-testing-gogs-workstation-worker-1 kubelet[2922]: W0407 14:59:27.315662    2922 openstack_volumes.go:191] Failed to find device for the diskid: "43c97a33-0a6b-49a9-af55-13ee512a8c01"
Apr 07 14:59:27 k8s-testing-gogs-workstation-worker-1 kubelet[2922]: E0407 14:59:27.314320    2922 cinder_util.go:182] error running udevadm trigger fork/exec /usr/bin/udevadm: no such file or directory

and actually:

$ whereis udevadm
udevadm: /bin/udevadm /sbin/udevadm /usr/share/man/man8/udevadm.8.gz

Besides, this is a new worker VM. I have other worker VMs already running and using Cinder volumes, and there were no such errors.

Other

> If you delete a pod and recreate it soon after, it is possible to see errors like "timeout expired waiting for volumes to attach/mount" temporarily.

I use Helm to deploy this Pod. I had deployed it earlier and removed it, but that was more than 2 days ago.

Solution

I found the solution here: #29832 (comment). On that worker VM I ran:

ln -s /sbin/udevadm /usr/bin/udevadm

vdavidoff

unread,
Apr 7, 2017, 9:50:20 AM4/7/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

I'm seeing this same error on all six of my manually launched, private, Ubuntu-based clusters, running 1.5.3 and 1.6.0. In my case the messages reference only stopped pods launched by CronJob-created Jobs.

The messages indicate issues mounting every volume in the pod, including PVs and node-local volumes. These pods, and future ones launched by these CronJobs, are fully successful when they run. It is only after they run to successful completion and exit that these messages show up.

Glancing at the kubelet code, this appears to be coming out of a housekeeping/sync loop.

E0407 12:02:18.299394   22419 kubelet.go:1549] Unable to mount volumes for pod "kube-master-etcd-backup-cronjob-1491566400-6lkgw_default(be1dc7a8-1b89-11e7-a1df-00163e216233)": timeout expired waiting for volumes to attach/mount for pod "default"/"kube-master-etcd-backup-cronjob-1491566400-6lkgw". list of unattached/unmounted volumes=[varetcd etcdsnaps etckube etcssl usrsharessl varssl usrssl usrlibssl usrlocalopenssl etcopenssl etcpkitls default-token-6pw37]; skipping pod
E0407 12:02:18.299458   22419 pod_workers.go:182] Error syncing pod be1dc7a8-1b89-11e7-a1df-00163e216233 ("kube-master-etcd-backup-cronjob-1491566400-6lkgw_default(be1dc7a8-1b89-11e7-a1df-00163e216233)"), skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"kube-master-etcd-backup-cronjob-1491566400-6lkgw". list of unattached/unmounted volumes=[varetcd etcdsnaps etckube etcssl usrsharessl varssl usrssl usrlibssl usrlocalopenssl etcopenssl etcpkitls default-token-6pw37]
I0407 12:02:18.299644   22419 server.go:345] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"kube-master-etcd-backup-cronjob-1491566400-6lkgw", UID:"be1dc7a8-1b89-11e7-a1df-00163e216233", APIVersion:"v1", ResourceVersion:"113670", FieldPath:""}): type: 'Warning' reason: 'FailedMount' Unable to mount volumes for pod "kube-master-etcd-backup-cronjob-1491566400-6lkgw_default(be1dc7a8-1b89-11e7-a1df-00163e216233)": timeout expired waiting for volumes to attach/mount for pod "default"/"kube-master-etcd-backup-cronjob-1491566400-6lkgw". list of unattached/unmounted volumes=[varetcd etcdsnaps etckube etcssl usrsharessl varssl usrssl usrlibssl usrlocalopenssl etcopenssl etcpkitls default-token-6pw37]
I0407 12:02:18.299689   22419 server.go:345] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"kube-master-etcd-backup-cronjob-1491566400-6lkgw", UID:"be1dc7a8-1b89-11e7-a1df-00163e216233", APIVersion:"v1", ResourceVersion:"113670", FieldPath:""}): type: 'Warning' reason: 'FailedSync' Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"kube-master-etcd-backup-cronjob-1491566400-6lkgw". list of unattached/unmounted volumes=[varetcd etcdsnaps etckube etcssl usrsharessl varssl usrssl usrlibssl usrlocalopenssl etcopenssl etcpkitls default-token-6pw37]
I0407 12:02:18.857848   22419 kubelet.go:1884] SyncLoop (housekeeping)

Matthew Wong

unread,
Apr 21, 2017, 2:05:26 PM4/21/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

I've identified a bug and am working on a fix for the case where "Unable to mount volumes for pod..." + "Error syncing pod" happen indefinitely BUT the pod is actually running just fine, as observed by @vdavidoff. For others: if you see this message but your pod is Running, you can safely ignore it. It's caused by a small limitation in the reconstruction of state (which volumes are mounted), so there is a chance you will see this, but only if you restart the kubelet while your pods had volumes mounted.

Jing Xu

unread,
Apr 21, 2017, 6:51:21 PM4/21/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

Because the error message is very generic, I think there might be several different problems mentioned in this thread.
@vdavidoff 's issue could be fixed by @wongma7 's PR.
@shepp67 do you still have any issues with mounting?
@jonathan-kosgei could you also let us know whether you still have the issue? Thanks!

Jing

Jonathan Kosgei

unread,
Apr 22, 2017, 4:16:25 AM4/22/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

Hi @jingxu97, my issue was related to #12964 (comment), which @thockin helped me sort out. I was creating the Gluster endpoints without a service, and it turns out the endpoints controller deletes, at regular intervals, any endpoints that are not attached to a service. So every so often, when a pod went down, tried to come up again, and mount its respective PV, it failed because it couldn't connect to Gluster. The fix is to create a service for the Gluster endpoints.
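
For anyone else hitting this, a minimal sketch of the fix (the endpoints name matches the one in my logs above; the IPs and the port are placeholders for your own Gluster nodes):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: glusterfs-cluster   # same name as the Endpoints keeps the controller from deleting them
spec:
  ports:
  - port: 1
---
apiVersion: v1
kind: Endpoints
metadata:
  name: glusterfs-cluster
subsets:
- addresses:
  - ip: 10.0.0.1            # placeholder: Gluster node IP
  - ip: 10.0.0.2            # placeholder: Gluster node IP
  ports:
  - port: 1
EOF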

discordianfish

unread,
May 2, 2017, 12:36:04 PM5/2/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

I also just ran into such a timeout issue:

After running a load test against my Postgres-backed web application, the database stopped responding, for reasons still unknown. According to Kubernetes it was up and running:

database-postgresql-1781528732-bvv40   2/2       Running            1          13d

The log showed a bunch of these messages. Not sure yet why, but possibly it couldn't access the volume anymore:

LOG:  incomplete startup packet

Shortly after that, the pod entered the "error" state: https://gist.github.com/discordianfish/9188cc11370804b5d23ea273c5624695

The event log showed:

Error syncing pod, skipping: network is not ready: [Kubenet does not have netConfig. This is most likely due to lack of PodCIDR

This error disappeared after a few minutes (not sure how long exactly, but it took a while), but then I got the same error as described here:

Events:
  FirstSeen	LastSeen	Count	From							SubObjectPath				Type		Reason		Message
  ---------	--------	-----	----							-------------				--------	------		-------
  11m		10m		6	{kubelet gke-xx-xx-pool-small-3338bda2-l2dp}	spec.containers{database-postgresql}	Warning		Unhealthy	Readiness probe failed: 10.116.1.7:5432 - no response

  9m	9m	1	{kubelet gke-xx-xx-pool-small-3338bda2-l2dp}	spec.containers{database-postgresql}	Warning	Unhealthy	Liveness probe failed: 10.116.1.7:5432 - no response

  8m	8m	4	{kubelet gke-xx-xx-pool-small-3338bda2-l2dp}		Warning	FailedSync	Error syncing pod, skipping: network is not ready: [Kubenet does not have netConfig. This is most likely due to lack of PodCIDR]
  4m	3m	4	{kubelet gke-xx-xx-pool-small-3338bda2-l2dp}		Warning	FailedSync	Error syncing pod, skipping: network is not ready: [Kubenet does not have netConfig. This is most likely due to lack of PodCIDR]
  1m	1m	1	{kubelet gke-xx-xx-pool-small-3338bda2-l2dp}		Warning	FailedMount	Unable to mount volumes for pod "database-postgresql-1781528732-bvv40_default(c9b5822c-24f1-11e7-bcac-42010af0012b)": timeout expired waiting for volumes to attach/mount for pod "default"/"database-postgresql-1781528732-bvv40". list of unattached/unmounted volumes=[data]
  1m	1m	1	{kubelet gke-xx-xx-pool-small-3338bda2-l2dp}		Warning	FailedSync	Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"database-postgresql-1781528732-bvv40". list of unattached/unmounted volumes=[data]

I couldn't figure out anything more on my own. Nothing in the syslog, nothing more than this in the kubelet log. Finally I rebooted the machine, which fixed the issue.

Jing Xu

unread,
May 2, 2017, 3:45:16 PM5/2/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

cc @MrHohn do you know what the "network is not ready" error is about?
@discordianfish It might be helpful if you could share the kubelet log with us to debug. Thanks!

Zihong Zheng

unread,
May 2, 2017, 4:18:06 PM5/2/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

@jingxu97 AFAIK, the "network not ready" error is benign; kubelet will retry after this error (ref).

cc @freehan

discordianfish

unread,
May 4, 2017, 12:15:58 PM5/4/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

@jingxu97 I hope I didn't miss any secrets in there; it's quite a chatty log. But here it is: https://gist.github.com/discordianfish/ea71d76a87a302997819e1ffc9f340ff#file-gistfile1-txt-L1314

The database stopped responding because it ran out of memory and was killed, as also shown in the log, although in the log it appears it was OOM-killed after it became unresponsive. Either way, a bunch of other things got killed too, so granted, "not ideal" situation :). Still, it shouldn't drop the volume, and it should re-attach it.

Jing Xu

unread,
May 9, 2017, 6:05:30 PM5/9/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

@discordianfish I think you are hitting the issue @wongma7 mentioned above. He has a PR, #44781, to fix it. I will make sure it gets merged soon. Thanks!

discordianfish

unread,
May 10, 2017, 6:33:27 AM5/10/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

@jingxu97 Great, thanks! Do you know roughly when this will be available in GCE?

Jing Xu

unread,
May 10, 2017, 1:13:59 PM5/10/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

@discordianfish you mean GKE or GCE? For GKE it will take a little longer. On GCE, once the PR is merged, the fix should be available in the next release (which I think should be soon).

discordianfish

unread,
May 11, 2017, 5:29:51 AM5/11/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

Running on GKE. I'd rather not launch my product with this bug around. Is there a workaround for it, or something I can monitor for? Or maybe it could get hot-fixed on GKE? If there is no workaround, this effectively renders GKE useless for stateful stuff.

Matthew Wong

unread,
May 11, 2017, 3:32:29 PM5/11/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

It's hard to say if volumes are actually being dropped/unmounted or if the FailedMount errors are wrong & misleading, so I wouldn't go so far as to say it's broken because of some volume issues. It could be something else entirely.

If nothing else, the PR Jing's referring to will clean up the event noise and help us debug the root cause. In the meantime, I wouldn't trust the FailedMount events to be accurate, so if your applications are unresponsive, there may be other causes worth considering.

discordianfish

unread,
May 12, 2017, 1:48:31 PM5/12/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

@wongma7 Well, I can only say that my volumes were dropped and it didn't recover on its own (only after a reboot).

Dropped as seen by Kubernetes; I don't know if the volumes were still attached, but they weren't usable by my scheduled database. To be clear: I'm not "complaining", and this was just test/R&D, but this is the scariest kind of issue to have happen in prod, given that it's a managed service where I can't find a workaround on my own. Rebooting fixed it in my case, but that's a big hammer.

Jing Xu

unread,
May 12, 2017, 1:58:11 PM5/12/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention
You don't need to reboot the node. Deleting the pod and restarting it should work for this issue. Is this workaround OK for your case?
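
For example, with the pod from your events above (its Deployment/ReplicaSet will recreate it right away):

kubectl delete pod database-postgresql-1781528732-bvv40
kubectl get pods -w    # watch the replacement pod come up and remount the volume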


discordianfish

unread,
May 12, 2017, 2:07:58 PM5/12/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

Good to know, thanks! Will try this next time.

fejta-bot

unread,
Dec 24, 2017, 2:22:21 AM12/24/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

racyber

unread,
Dec 26, 2017, 10:13:48 AM12/26/17
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

Hi,

I'm facing the same issue with k8s 1.9, installed by kops on AWS.

Dec 26 15:08:43 ip-172-20-119-141 kubelet[1386]: E1226 15:08:43.081255    1386 kubelet.go:1625] Unable to mount volumes for pod "jenkins-leader-6f68684779-mdjhm_default(602df05f-ea4e-11e7-9cf0-0adc55d1ca26)": timeout expired waiting for volumes to attach/mount for pod "default"/"jenkins-leader-6f68684779-mdjhm". list of unattached/unmounted volumes=[jenkins-home]; skipping pod
Dec 26 15:08:43 ip-172-20-119-141 kubelet[1386]: E1226 15:08:43.081276    1386 pod_workers.go:186] Error syncing pod 602df05f-ea4e-11e7-9cf0-0adc55d1ca26 ("jenkins-leader-6f68684779-mdjhm_default(602df05f-ea4e-11e7-9cf0-0adc55d1ca26)"), skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"jenkins-leader-6f68684779-mdjhm". list of unattached/unmounted volumes=[jenkins-home]

The volume is created successfully on AWS and attached to the right node, but the pod fails to start.

/remove-lifecycle stale

Jing Xu

unread,
Jan 2, 2018, 7:36:00 PM1/2/18
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

@racyber, could you please provide more details about your issue? What are the steps to reproduce it? Thanks!

Hemant Kumar

unread,
Jan 2, 2018, 9:11:33 PM1/2/18
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

@racyber yeah, can you open a new bug for EBS and cc me with the details? I was hoping we fixed most of the EBS problems in 1.9. I will be more than happy to walk through it and find what is causing the volume to not mount. As I understand it, this is different from typical EBS problems where the volume fails to attach; in your case, the volume is being attached to the right node but it is not being seen by the kubelet.

racyber

unread,
Jan 11, 2018, 2:44:47 PM1/11/18
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

@gnufied @jingxu97

I found the issue: I was using c5 instances with NVMe. This is why the volume was never attached; k8s looks for /dev/xvdXX, and with NVMe that path is different. I did notice that 1.9 is supposed to support NVMe natively, but it seems it's still not stable.
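
You can see it directly on an affected node, roughly like this (a sketch; exact device names will differ):

lsblk                     # on c5/m5, EBS volumes show up as nvme0n1, nvme1n1, ... rather than xvdf etc.
ls -l /dev/disk/by-id/    # the NVMe links typically include the EBS volume id in their names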

thanks

Hemant Kumar

unread,
Jan 11, 2018, 2:47:16 PM1/11/18
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

@racyber oh weird. You should open a bug about NVMe support then. I was thinking it would just work.

Jing Xu

unread,
Jan 11, 2018, 4:17:53 PM1/11/18
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

cc @davidz627 Is there an existing issue related to NVMe support?
@racyber please let us know about any issues you still have with 1.9. Thanks!

Michelle Au

unread,
Jan 11, 2018, 4:53:27 PM1/11/18
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

This is NVMe on AWS, not related to the work @davidz627 is doing on GCE.

Chris Campo

unread,
Jan 22, 2018, 5:04:55 PM1/22/18
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

Thanks @racyber. Definitely related to NVMe support. Ran into this trying to mount a gp2 EBS volume on a node which was an m5 EC2 instance.

John

unread,
Jan 31, 2018, 10:22:43 PM1/31/18
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

I think I am seeing something like this with c4.xlarge, but I'm unsure of the "why" at the moment. It would seem r3.large works perfectly fine.

Hemant Kumar

unread,
Mar 19, 2018, 11:47:22 AM3/19/18
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

@racyber @ccampo133 did you guys have a chance to open a bug for NVMe support on EBS? If you do, please paste the output of ls /dev/disk/by-id/ and the PersistentVolume spec. We can't fix it unless we know a bit more. :(

Chris Campo

unread,
Mar 19, 2018, 11:59:45 AM3/19/18
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

@gnufied sorry, we worked around this by reverting back to m4-class instances. It should be simple enough to reproduce, however: create a k8s cluster with m5 EBS-backed nodes and create a simple pod with a persistent volume. You'll notice that the pod will never launch, and you should be able to debug further. If I do run into this again I'll be sure to get some more data for you.
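
Roughly something like this (names are placeholders, and it assumes a default StorageClass that provisions EBS, e.g. the gp2 class kops creates):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nvme-repro-claim
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: nvme-repro
spec:
  containers:
  - name: shell
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: nvme-repro-claim
EOF

kubectl describe pod nvme-repro    # watch for FailedMount / "timeout expired" events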

Kubernetes Submit Queue

unread,
Mar 28, 2018, 4:53:32 PM3/28/18
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

Closed #43515 via #61357.

Arunkumar Anandhan

unread,
Apr 11, 2018, 12:52:55 PM4/11/18
to kubernetes/kubernetes, k8s-mirror-storage-misc, Team mention

I too have the "Error syncing pod" error.

kubectl describe pod kafka1-kafka-0
....
....
....
Warning FailedSync 5m (x141 over 35m) kubelet, 10.186..* Error syncing pod
Warning BackOff 57s (x164 over 35m) kubelet, 10.186..* Back-off restarting failed container

kubectl logs kafka1-kafka-0 -p

export KAFKA_BROKER_ID=kafka1-kafka-0
export KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://172.30.*.:9092
/etc/confluent/docker/run
sh: 3: /etc/confluent/docker/run: not found
