@kubernetes/sig-apps-bugs
because this may be behavior of the deployment itself, not the CLI interaction
/assign
I was testing this today without being able to reproduce it :( kubectl rollout status uses the status of the Deployment to determine whether a Deployment has finished rolling out, so this must be an issue in the Deployment controller where the ProgressDeadlineExceeded condition is set prematurely. Do you have access to the controller manager logs?
When in fact the deployment continued to work just fine; we never hit the limit of our progressDeadlineSeconds at all.
Can you post the output of the Deployment next time you hit this?
kubectl get deploy/testapp -o yaml
Do you have access to the controller manager logs?
Ideally at --v=4 (log level).
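(For reference, a quick way to inspect the condition that rollout status reports on, plus the configured deadline; this is only a sketch, so adjust the deployment name and namespace to yours:)
# Show the reason on the Progressing condition of the Deployment.
kubectl get deploy/testapp -o jsonpath='{.status.conditions[?(@.type=="Progressing")].reason}{"\n"}'
# Show the configured progress deadline, to confirm whether it could plausibly have been hit.
kubectl get deploy/testapp -o jsonpath='{.spec.progressDeadlineSeconds}{"\n"}'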
@kargakis Thanks for taking a gander. As stated, I run into this only intermittently and at random.
I'll see if I can get the logs. I'm running this in GKE, so I'll need to figure out where those may live.
In the meantime, I've modified my deploy script to see if it can help with troubleshooting. The relevant bits:
echo "Deploying to ${environment}" kubectl rollout history --namespace ${environment} deployments/${deployment_name} | tail kubectl set --namespace ${environment} image deployments/${deployment_name} ${container_name}=${image_name}:${CIRCLE_SHA1} --record kubectl rollout history --namespace ${environment} deployments/${deployment_name} | tail kubectl get pods -l app=app --namespace ${environment} for sleep_time in {1..10} do echo "Take ${sleep_time}" kubectl rollout status deployments/${deployment_name} --namespace ${environment} if [[ $? == 0 ]] then break fi kubectl get pods -l app=app --namespace ${environment} sleep ${sleep_time} done kubectl rollout status deployments/${deployment_name} --namespace ${environment} if [[ $? != 0 ]] then echo "The deploy has failed, I'm going to attempt to roll back..." kubectl rollout undo deployments/${deployment_name} --namespace ${environment} kubectl rollout status deployments/${deployment_name} --namespace ${environment} kubectl rollout history --namespace ${environment} deployments/${deployment_name} | tail if [[ $? != 0 ]] then echo "Rollback was unsuccessful." else echo "Rollback was successful." fi exit 1 else echo "Deployment has succeeded."
And another example failure. In this case it takes 9 seconds of retries before rollout status reports the successful rollout I expected on the first attempt. (Things were redacted for readability.)
Deploying to staging3
19 kubectl set image deployments/app app=app:19 --namespace=staging3 --record=true
20 kubectl set image deployments/app app=app:20 --namespace=staging3 --record=true
21 kubectl set image deployments/app app=app:21 --namespace=staging3 --record=true
23 kubectl set image deployments/app app=app:23 --namespace=staging3 --record=true
25 kubectl set image deployments/app app=app:25 --namespace=staging3 --record=true
26 kubectl set image deployments/app app=app:26 --namespace=staging3 --record=true
27 kubectl set image deployments/app app=app:27 --namespace=staging3 --record=true
28 kubectl set image deployments/app app=app:28 --namespace=staging3 --record=true
29 kubectl scale deploy/app --replicas=1 --namespace=staging3
deployment "app" image updated
20 kubectl set image deployments/app app=app:20 --namespace=staging3 --record=true
21 kubectl set image deployments/app app=app:21 --namespace=staging3 --record=true
23 kubectl set image deployments/app app=app:23 --namespace=staging3 --record=true
25 kubectl set image deployments/app app=app:25 --namespace=staging3 --record=true
26 kubectl set image deployments/app app=app:26 --namespace=staging3 --record=true
27 kubectl set image deployments/app app=app:27 --namespace=staging3 --record=true
28 kubectl set image deployments/app app=app:28 --namespace=staging3 --record=true
29 kubectl scale deploy/app --replicas=1 --namespace=staging3
30 kubectl set image deployments/app app=app:30 --namespace=staging3 --record=true
NAME READY STATUS RESTARTS AGE
app-1070618494-m0glv 1/1 Running 0 33m
app-1761012463-bclcv 0/1 ContainerCreating 0 0s
Take 1
error: deployment "app" exceeded its progress deadline
NAME READY STATUS RESTARTS AGE
app-1070618494-m0glv 1/1 Running 0 33m
app-1761012463-bclcv 0/1 ContainerCreating 0 0s
Take 2
error: deployment "app" exceeded its progress deadline
NAME READY STATUS RESTARTS AGE
app-1070618494-m0glv 1/1 Running 0 33m
app-1761012463-bclcv 0/1 Running 0 2s
Take 3
error: deployment "app" exceeded its progress deadline
NAME READY STATUS RESTARTS AGE
app-1070618494-m0glv 1/1 Running 0 33m
app-1761012463-bclcv 0/1 Running 0 4s
Take 4
deployment "app" successfully rolled out
deployment "app" successfully rolled out
Deployment has succeeded.
The above output is just one example. I haven't been able to determine if the status of the pod it's spinning up affects the outcome of the status command yet.
Unfortunately it looks like I won't have access to the controller manager logs. GKE only exposes the API server logs.
It may or may not help but please post the output of kubectl get deploy/testapp -o yaml next time you reproduce this. Reason: ProgressDeadlineExceeded should be in the status.
This happens to me too.
status:
availableReplicas: 1
conditions:
- lastTransitionTime: 2017-08-03T20:41:52Z
lastUpdateTime: 2017-08-03T20:41:52Z
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
- lastTransitionTime: 2017-08-04T06:49:55Z
lastUpdateTime: 2017-08-04T06:49:55Z
message: ReplicaSet "app-stg-2565272565" has timed out progressing.
reason: ProgressDeadlineExceeded
status: "False"
type: Progressing
observedGeneration: 195
readyReplicas: 1
replicas: 2
unavailableReplicas: 1
updatedReplicas: 1
Yeah, flipping from False to True is ok since the actual state of the Deployment is True. Are you also running on GKE?
I'm on AWS.
When you say controller manager logs, what exactly do you mean? The kubelet logs?
Can you point me to exactly how to fetch this info?
kops
@justinsb how can you get the logs from the controller manager in a cluster installed by kops?
I think we're seeing similar issues. We're also using set image to deploy and randomly see rollout status return with a progress deadline error. Is there anything I can provide to help track this down?
Controller manager logs at log level 4 when you reproduce it, ideally.
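(For anyone not on GKE, e.g. on a kops cluster, a rough sketch of how to get at these; pod names and paths vary by installer:)
# On self-hosted clusters the controller manager usually runs as a static pod in
# kube-system, so its logs are reachable with kubectl (the pod name below is a placeholder):
kubectl -n kube-system get pods | grep controller-manager
kubectl -n kube-system logs kube-controller-manager-<master-node-name>
# On kops masters the same log is typically also written to
# /var/log/kube-controller-manager.log on the master node. Raising verbosity to
# --v=4 means changing the controller manager's flags, which is installer-specific.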
Also running on GKE and it looks like controller manager logs aren't exposed
I think I have figured out why this is happening - opened #49637 and once it merges I will backport it to 1.7.
/priority critical-urgent
[MILESTONENOTIFIER] Milestone Labels Incomplete
Action required: This issue requires label changes. If the required changes are not made within 6 days, the issue will be moved out of the v1.8 milestone.
priority: Must specify exactly one of [priority/critical-urgent, priority/important-longterm, priority/important-soon].
[MILESTONENOTIFIER] Milestone Labels Complete
/remove-sig cli
Hey, so automated rollbacks in Kubernetes are currently done only with kubectl's "undo" command.
Is there a way for Kubernetes to identify a failed deployment and roll back to the previous running deployment by itself, rather than having to run kubectl rollout undo deploy/ manually?
Also, with kubectl patch deployment/nginx-deployment -p '{"spec":{"progressDeadlineSeconds":100}}', Kubernetes does not by default roll back to the previous successful deployment after the new deployment fails and exceeds the 100s progress deadline.
Can we make Kubernetes roll back by itself based on progressDeadlineSeconds?
I have tried kubectl patch deployment/nginx-deployment -p '{"spec":{"spec.autoRollback":true}}', which didn't seem to work.
Please let me know if there is a way Kubernetes can automatically roll back to the previous successful deployment by itself.
Thank you
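(As far as I know there is no built-in auto-rollback field like spec.autoRollback; the usual pattern is the one from the deploy script earlier in this thread, i.e. wrap the rollout in a script and undo on failure. A minimal sketch, with nginx-deployment as an example name:)
# Wait for the rollout; if it fails (e.g. the progress deadline is exceeded),
# roll back to the previous revision.
if ! kubectl rollout status deployment/nginx-deployment; then
  kubectl rollout undo deployment/nginx-deployment
fi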
If you hit a problem like this in a GitHub Actions deployment:
error: deployment "gke-apiadministrasi" exceeded its progress deadline
check your flow and your code. Here I assume you are using Go:
Build and push your Docker image. I use gcloud builds submit --tag gcr.io/[PROJECT_ID]/[IMAGE_NAME] . (or gcloud builds submit --config cloudbuild.yaml . if you use Cloud Build) and make sure the image build succeeds in Google Container Registry.
Then reference the built image in deployment.yaml, e.g. image: gcr.io/<project-id>/web-app-administrasi:<latest or the tag produced by the build on Google Container Registry>, so the new image tag is injected into the Deployment.
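(A condensed sketch of that flow; the container name "app" is an assumption and the project, image, and deployment names are placeholders, so adjust to your manifests:)
# Build and push the image with Cloud Build, point the Deployment at it,
# then wait for the rollout. The :latest tag is what --tag produces by default here.
gcloud builds submit --tag gcr.io/[PROJECT_ID]/[IMAGE_NAME] .
kubectl set image deployment/gke-apiadministrasi app=gcr.io/[PROJECT_ID]/[IMAGE_NAME]:latest
kubectl rollout status deployment/gke-apiadministrasi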
I think what is likely causing this issue is your cluster running a health check or liveness check against your containers during the deploy.
The cluster decides that your application isn't running and the pod enters a waiting state due to CrashLoopBackOff.
This can be fixed by creating a default GET route at "/" in your API application that returns a 200, and pointing the probe at it (see the sketch below).
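(A rough sketch of pointing the probe at that route; the container name "app" and port 8080 are both assumptions here:)
# Strategic-merge patch that sets an HTTP liveness probe on "/".
# Adjust the deployment name, container name, and port to your setup.
kubectl patch deployment/app -p '{
  "spec": {
    "template": {
      "spec": {
        "containers": [{
          "name": "app",
          "livenessProbe": {
            "httpGet": {"path": "/", "port": 8080},
            "initialDelaySeconds": 10,
            "periodSeconds": 10
          }
        }]
      }
    }
  }
}'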
