I have seen the same thing, also with smaller images sometimes.
It randomly solves itself after a couple retries.
@bvandewalle thanks for +1'ing, out of curiosity, are you also hosting your images on docker hub? I'm wondering if moving to Google Container Registry will help with these issues.
Just want to follow up and mention that this issue is sometimes observed for images hosted on GCR as well.
I am seeing this with images hosted on ECS
I am seeing this with images hosted in a private registry.
I have the same issue on a vanilla cluster running on AWS, using docker hub registry:
When scheduling a pod which uses a large image (5GB), I see ErrImagePull and rpc error: code = Canceled desc = context canceled in the pod events. I can still pull the image with docker pull, after which the scheduling succeeds. It doesn't seem to recover on its own though. What also worked is recreating the pod, so it seems like an intermittent issue and I haven't found a way to reliably reproduce it.
Did you set the --image-pull-progress-deadline on the kubelet?
@woopstar No, should that be necessary? The flag says:
If no pulling progress is made before this deadline, the image pulling will be cancelled
But I would assume that pulling should make progress.
Looking at https://github.com/kubernetes/kubernetes/blob/915798d229b7be076d8e53d6aa1573adabd470d2/pkg/kubelet/dockershim/libdocker/kube_docker_client.go#L374 it seems that it expects to get some sort of pull status update from docker and only if there is no new message from docker for 1 minute it aborts.
Still unclear if I need to tune this flag for large images or whether docker in that case should still report some progress unless something is broken where the deadline would just hide this.
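To make that mechanism concrete, here is a rough sketch of the pattern the linked code implements (illustrative only, not the actual kubelet code; names like monitorPull and pullStatus are mine): decode Docker's JSON progress stream, reset a timer on every message, and cancel the pull if the timer ever fires.

```go
// Rough sketch of the progress-deadline watchdog described above.
// This is NOT the real kubelet code; names and structure are simplified.
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"time"
)

type pullStatus struct {
	Status   string `json:"status"`
	Progress string `json:"progress"`
}

// monitorPull reads Docker's JSON progress stream and calls cancelPull if no
// new message arrives within deadline (the role --image-pull-progress-deadline plays).
func monitorPull(stream io.Reader, deadline time.Duration, cancelPull func()) error {
	msgs := make(chan pullStatus)
	errs := make(chan error, 1)
	go func() {
		dec := json.NewDecoder(stream)
		for {
			var m pullStatus
			if err := dec.Decode(&m); err != nil {
				errs <- err // io.EOF means the stream (and the pull) ended
				return
			}
			msgs <- m
		}
	}()

	timer := time.NewTimer(deadline)
	defer timer.Stop()
	for {
		select {
		case m := <-msgs:
			fmt.Printf("progress: %s %s\n", m.Status, m.Progress)
			timer.Reset(deadline) // any message counts as progress
		case err := <-errs:
			if err == io.EOF {
				return nil
			}
			return err
		case <-timer.C:
			cancelPull() // surfaces to the caller as "context canceled"
			return fmt.Errorf("cancel pulling image: no progress for %v", deadline)
		}
	}
}

func main() {
	pr, pw := io.Pipe()
	go func() {
		// One progress message, then silence: the watchdog should fire.
		fmt.Fprintln(pw, `{"status":"Downloading","progress":"[=>   ] 1MB/5GB"}`)
	}()
	err := monitorPull(pr, 2*time.Second, func() { pw.CloseWithError(io.ErrClosedPipe) })
	fmt.Println(err) // e.g. "cancel pulling image: no progress for 2s"
}
```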
@discordianfish According to the blog post here, they had to increase the --image-pull-progress-deadline on the kubelet, as they got rpc error: code = 2 desc = net/http: request canceled errors when pulling large images.
Had the same issue on GKE with a public image from the docker hub.
Node and master version is v1.8.8-gke.0
I got the same thing, but for me it seemed to be Kubernetes failing to pull the image internally. I got on the box in question and ran docker pull <IMAGE> - it was kibana, so we're only talking about 250MB on a fast internet connection, so it completed near-immediately - then reran the Kubernetes command and it completed just fine. My problem didn't seem to have anything to do with the timeout for pulling the image; it appeared that Kubernetes failed to do it altogether. Plus, I already have it set up so that it retries in the case of timeouts, so it had around 7.5 minutes to perform this task. Not sure if it is helpful, but here's the really angry block of text Ansible spat out:
fatal: [rockserver1.lan]: FAILED! => {"attempts": 45, "changed": true, "cmd": "kubectl get deployments kibana -n default -o go-template --template='{{if ne (.status.replicas) (.status.readyReplicas)}}false{{end}}'", "delta": "0:00:02.985879", "end": "2018-04-21 10:37:20.887500", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2018-04-21 10:37:17.901621", "stderr": "error: error executing template \"{{if ne (.status.replicas) (.status.readyReplicas)}}false{{end}}\": template: output:1:5: executing \"output\" at <ne (.status.replicas...>: error calling ne: invalid type for comparison"}
(The stdout repeats the same template error and then dumps the full kibana Deployment object for debugging; its status at that point was {"conditions":[{"lastTransitionTime":"2018-04-21T15:27:59Z","lastUpdateTime":"2018-04-21T15:27:59Z","message":"Deployment does not have minimum availability.","reason":"MinimumReplicasUnavailable","status":"False","type":"Available"},{"lastTransitionTime":"2018-04-21T15:27:58Z","lastUpdateTime":"2018-04-21T15:28:00Z","message":"ReplicaSet \"kibana-5bcb7799d\" is progressing.","reason":"ReplicaSetUpdated","status":"True","type":"Progressing"}],"observedGeneration":1,"replicas":1,"unavailableReplicas":1,"updatedReplicas":1}, i.e. no readyReplicas yet.)
Had the same issue on GKE with a public image from the docker hub.
Node and master versions are v1.8.8-gke.0
I am also facing the same issue with Kubernetes 1.8.x, where I am using ECR as the container registry and my cluster is running on AWS using kops.
Experiencing the same issue with a GitLab Omnibus registry on 1.8.8-gke.0, but the image size is only 74.04 MiB. It helps to delete the whole namespace and re-deploy with GitLab again. Occasionally I also get a back-off after resizing my cluster nodes, for example when workloads are rebalanced to different nodes.
Also seeing this on GKE 1.8.8-gke.0 with a 300MiB image. All containers seeing the same issue.
I'm seeing the same 'Back-off pulling image "FOO": rpc error: code = Canceled desc = context canceled'. Using a Kube 1.10 cluster and pulling a large image. Using --image-pull-progress-deadline=60m on the kubelet bypassed the issue, per @woopstar.
I'm trying to get to the root of this.
This will abort the pull if there hasn't been progress for the deadline: https://github.com/kubernetes/kubernetes/blob/915798d229b7be076d8e53d6aa1573adabd470d2/pkg/kubelet/dockershim/libdocker/kube_docker_client.go#L374
On the Docker side, the progress gets posted by the ProgressReader: https://github.com/moby/moby/blob/53683bd8326b988977650337ee43b281d2830076/distribution/pull_v2.go#L234
This is supposed to send a progress message at least every 512kb: https://github.com/moby/moby/blob/3a633a712c8bbb863fe7e57ec132dd87a9c4eff7/pkg/progress/progressreader.go#L34
So unless there is a bug I missed, the pulls here fail to download 512kb within the default 60s deadline, so there is something wrong with the registry, docker or the network.
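For illustration, that reader pattern boils down to something like the following (a minimal sketch in the spirit of the moby code linked above, not its actual implementation; the Update/Reader names are mine): wrap the download stream and emit an update each time roughly another 512kb has flowed through.

```go
// Minimal sketch of a progress-reporting reader, in the spirit of (but not
// identical to) moby's progress reader linked above.
package progresssketch

import "io"

const updateEvery = 512 * 1024 // bytes between progress updates, per the linked code

// Update is a single progress report for one download.
type Update struct {
	Current, Total int64
}

// Reader wraps a download stream and reports progress on out.
type Reader struct {
	in           io.Reader
	out          chan<- Update
	total        int64
	current      int64
	lastReported int64
}

func NewReader(in io.Reader, total int64, out chan<- Update) *Reader {
	return &Reader{in: in, out: out, total: total}
}

func (r *Reader) Read(p []byte) (int, error) {
	n, err := r.in.Read(p)
	r.current += int64(n)
	// Report whenever at least another 512KB has passed through (or at EOF),
	// so a healthy download produces a steady stream of progress messages.
	if r.current-r.lastReported >= updateEvery || err == io.EOF {
		r.out <- Update{Current: r.current, Total: r.total}
		r.lastReported = r.current
	}
	return n, err
}
```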
LOL. +1 to "there is something wrong with the registry, docker or the network." :)
Sounds like a reason to set up wireshark and monitor traffic. If I get some (ha) time I'll do this and report back.
I'll be looking for a lack of 512kb (KiB?) data transfer inside of a minute. Let me know if I should look for something different.
@dims Sorry for not isolating this further but it confirms that there is an issue causing the pulls to stall. Network is rather unlikely to be the problem though.
@discordianfish not at all, it was just funny. I totally understand it's a very tricky problem to debug or make sense of.
@AlexB138 I totally deserved that thumbs down!
@discordianfish what's the docker daemon max-concurrent-downloads set to? (the default seems to be 3) And it looks like the serial puller has a max set to 10 (https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/images/puller.go#L57).
Encountered the same here. It was working before.
@dims I think I'm using the default..
So it's possible that if the kubelet pulls 10 images (layers?) in parallel and docker only processes 3, it will stall on the remaining ones.
Maybe the kubelet puller should use a lower max-concurrent-downloads?
or bump up what you have in the docker daemon (see some detail here: https://blog.openai.com/scaling-kubernetes-to-2500-nodes/#dockerimagepulls)
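If that mismatch is the culprit, it is easy to see how waiting pulls could look stalled. A toy sketch (hypothetical numbers, not kubelet or dockerd code) of 10 pulls competing for 3 download slots:

```go
// Toy illustration of the suspected mismatch: 10 image pulls requested in
// parallel, but only 3 downloads allowed at once (docker's default
// max-concurrent-downloads). The rest wait for a slot, and from the caller's
// point of view that waiting can look like a pull making no progress.
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	const requested = 10 // pulls issued concurrently (hypothetical)
	const slots = 3      // downloads allowed to run at once (hypothetical)

	sem := make(chan struct{}, slots) // counting semaphore
	var wg sync.WaitGroup

	for i := 1; i <= requested; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			start := time.Now()
			sem <- struct{}{} // block until a download slot is free
			waited := time.Since(start).Round(time.Second)
			fmt.Printf("pull %2d started after waiting %v\n", id, waited)
			time.Sleep(2 * time.Second) // pretend to download a layer
			<-sem
		}(i)
	}
	wg.Wait()
}
```

Bumping max-concurrent-downloads in the Docker daemon (as the OpenAI post above suggests) or lowering the kubelet's parallelism would bring the two limits back in line.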
I'm having this issue too when pulling images down from EC2. After some time the pod stands up without issue though.
I am seeing this same issue when using gcr.io (google container registry). What is interesting however is that doing a 'docker pull' works without issue every time.
Saw this issue on AKS, pulling from GCR.
I think the root problem to fix here is that either the docs need to tell users to increase max-concurrent-downloads, or the kubelet puller shouldn't use more concurrent downloads than docker allows by default.
That being said, I do indeed have networking issues in my cluster, and many here might too, so I would discourage just increasing the staleness limits.
I am seeing this issue on a single pod in a replica set of 3 pods, pulling the image from an Azure Registry to an Azure AKS cluster. 2 pods out of 3 can correctly pull the image, while the other one keeps stalling (12 retries in 10 minutes).
Seeing this on ECS.
I've got a machine in my cluster that reliably spews a message like: kubelet[2589]: E1002 12:13:46.064879 2589 kube_docker_client.go:341] Cancel pulling image "us.gcr.io/...:4099cd1e356386df36c122fbfff51243674d6433" because of no progress for 1m0s, latest progress: "8bc388a983a5: Download complete"
@bryanlarsen did you see the tips I referred to earlier? (in https://blog.openai.com/scaling-kubernetes-to-2500-nodes/#dockerimagepulls)
Sorry, I didn't leave enough context. The tips did fix things for me -- the last message saying "download complete" just means that at least one layer has been fully downloaded; there may still be many more layers to download and/or extract. The frustration was that the message sent me on a wild goose chase looking for the problem elsewhere.
Agreeing with @bryanlarsen, this error is really misleading.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
Closed #59376.
@fejta-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
How would one configure this on a cloud provider (e.g. GKE) such that the setting is preserved across upgrades and applies to nodes automatically as they are scaled?
@clarketm: Reopened this issue.
In response to this:
/reopen
How would one configure this on a cloud provider (e.g. GKE) such that the setting is preserved across upgrades and applies to nodes automatically as they are scaled?
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Reopened #59376.
/remove-lifecycle rotten
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/reopen
/remove-lifecycle stale
/remove-lifecycle rotten
/lifecycle freeze
/cc @fejta Feedback: I don't like it.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/lifecycle frozen
Have the same issue with a private registry, using minikube as a dev environment. I had pulled the image before, but then I pushed some code changes, the CI pipeline built a new image, and when I try to upgrade using helm upgrade --install I get the same issue :/
After reading the whole thread, there seem to be several possible causes.
--image-pull-progress-deadline defaults to 1m for docker (dockershim, which is deprecated); configuring it to 60m would solve the problem if the cause is a slow registry or a large image that cannot make progress within 1m. Otherwise, as noted above, "the pulls here fail to download 512kb within the default 60s deadline, so there is something wrong with the registry, docker or the network."
--serialize-image-pulls can be set to false so that one stuck image pull does not block the others. BTW, using https://github.com/dragonflyoss/Dragonfly is another solution in my opinion.
To improve it in Kubernetes, I think what we can do immediately
/remove-sig storage
Honestly, I don't think this is a bug, but a performance issue that we should tune.
@pacoxu it seems that the issue is fixed by increasing --image-pull-progress-deadline, and we have deprecated dockershim so we cannot fine-tune that parameter anymore. I think we can close it?
Agree.
/close
/triage accepted
I suggest opening a new issue if a user still suffers from ImagePullBackOff errors with their container runtime. Closing this as we need more specific repro steps; it can be reopened if there are more details (kubelet log and container runtime log are appreciated).
@pacoxu: Closing this issue.
In response to this:
Agree.
/close
/triage accepted
I suggest opening a new issue if a user still suffers from ImagePullBackOff errors with their container runtime. Closing this as we need more specific repro steps; it can be reopened if there are more details (kubelet log and container runtime log are appreciated).
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
—
Closed #59376.