@frittentheke: Reiterating the mentions to trigger a notification:
@kubernetes/sig-api-machinery-bugs
I enabled log level 5 to see more details about the garbage collection, and this apparently is the issue:
I0628 10:03:39.308749 1 garbagecollector.go:364] processing item [v1/Pod, namespace: namespaceabc, name: nameabc-757b9c656-77lmr, uid: 9e81c654-7386-11e8-87a1-06b7587fe442]
I0628 10:03:39.310672 1 garbagecollector.go:483] remove DeleteDependents finalizer for item [v1/Pod, namespace: namespaceabc, name: nameabc-757b9c656-77lmr, uid: 9e81c654-7386-11e8-87a1-06b7587fe442]
E0628 10:03:39.315051 1 garbagecollector.go:265] error syncing item &garbagecollector.node{identity:garbagecollector.objectReference{OwnerReference:v1.OwnerReference{APIVersion:"v1", Kind:"Pod", Name:"nameabc-757b9c656-77lmr", UID:"9e81c654-7386-11e8-87a1-06b7587fe442", Controller:(*bool)(nil), BlockOwnerDeletion:(*bool)(nil)}, Namespace:"namespaceabc"}, dependentsLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:1, readerWait:0}, dependents:map[*garbagecollector.node]struct {}{}, deletingDependents:true, deletingDependentsLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, beingDeleted:true, beingDeletedLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, virtual:false, virtualLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, owners:[]v1.OwnerReference{v1.OwnerReference{APIVersion:"extensions/v1beta1", Kind:"ReplicaSet", Name:"nameabc-757b9c656", UID:"95da59b1-286d-11e8-b3a8-02e226510ede", Controller:(*bool)(0xc42b654a1a), BlockOwnerDeletion:(*bool)(0xc42b654a1b)}}}: Pod "nameabc-757b9c656-77lmr" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds` or `spec.tolerations` (only additions to existing tolerations)
The replicaset (or the deployment) does not exist anymore BTW.
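In case someone wants to reproduce this verbosity, a minimal sketch of how the extra GC detail can be obtained (assuming a static-pod control plane; the manifest path and node suffix are placeholders):
# add the flag to /etc/kubernetes/manifests/kube-controller-manager.yaml:
#   - --v=5
# then follow the garbage collector lines:
kubectl -n kube-system logs -f kube-controller-manager-<node> | grep garbagecollector.go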
/cc @yliaog @caesarxuchao
When you managed to remove the deployment and ReplicaSet, what DeleteOptions did you specify? Did you use cascading = true? (https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/)
You mentioned that "the kubelet on that node where the Pod / container used to run was actively sending DELETE requests to the API with no luck". Do you know what DeleteOptions the kubelet sends?
@yliaog I did not specify cascading, so the default, "true".
Could there be a problem if the parent is gone before the dependent? For example, a pod that references a ReplicaSet not being deleted because the ReplicaSet that started it is already gone? The same would apply to deployment -> ReplicaSet.
Or to put it differently: would the GC still delete a pod whose owner (ReplicaSet) is already gone?
That depends on the DeleteOptions setting: if you explicitly set the propagation policy to "Orphan", the GC would not delete it.
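A minimal sketch of the delete variants being discussed (object names are placeholders; the string form of --cascade needs kubectl >= 1.20, older clients use the boolean form):
kubectl delete deployment <name> --cascade=orphan       # dependents are kept, the GC will not delete them
kubectl delete deployment <name> --cascade=foreground   # dependents are deleted before the owner
# on kubectl < 1.20, --cascade=false has the same orphaning effect
# the equivalent raw API body would be {"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Orphan"}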
Is the pod nameabc-757b9c656-77lmr the one you cannot delete or is it another pod?
Do you have dynamic admission webhooks in your system, especially ones that inject sidecar containers into pods? The last time I saw "error syncing item" it was caused by an admission webhook that was incompatible with a built-in admission controller.
Cannot open the pod_forever_Terminating.json.txt, could you paste it here if it's not too long?
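One quick way to check for such webhooks (assuming the admissionregistration API is enabled in this cluster):
kubectl get mutatingwebhookconfigurations
kubectl get validatingwebhookconfigurations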
{
"kind": "Pod",
"apiVersion": "v1",
"metadata": {
"name": "name-566ccd445b-bcm8f",
"generateName": "nameabc-566ccd445b-",
"namespace": "namespaceabc",
"selfLink": "/api/v1/namespaces/namespaceabc/pods/nameabc-566ccd445b-bcm8f",
"uid": "80d7bd47-6e45-11e8-a560-028e0a1f08a4",
"resourceVersion": "47640604",
"creationTimestamp": "2018-06-12T13:35:42Z",
"deletionTimestamp": "2018-06-28T05:31:10Z",
"deletionGracePeriodSeconds": 0,
"labels": {
"pod-template-hash": "1227780016",
"run": "nameabc"
},
"annotations": {
"kubernetes.io/limit-ranger": "LimitRanger plugin set: memory request for container nameabc; memory limit for container nameabc"
},
"ownerReferences": [
{
"apiVersion": "extensions/v1beta1",
"kind": "ReplicaSet",
"name": "nameabc-566ccd445b",
"uid": "4a5344e7-5dcb-11e8-b980-0a7a97b25f0a",
"controller": true,
"blockOwnerDeletion": true
}
],
"finalizers": [
"foregroundDeletion"
]
},
"spec": {
"volumes": [
{
"name": "default-token-bk8pn",
"secret": {
"secretName": "default-token-bk8pn",
"defaultMode": 420
}
}
],
"containers": [
{
"name": "nameabc",
"image": "postgres",
"args": [
"bash"
],
"resources": {
"limits": {
"memory": "2Gi"
},
"requests": {
"memory": "512Mi"
}
},
"volumeMounts": [
{
"name": "default-token-bk8pn",
"readOnly": true,
"mountPath": "/var/run/secrets/kubernetes.io/serviceaccount"
}
],
"terminationMessagePath": "/dev/termination-log",
"terminationMessagePolicy": "File",
"imagePullPolicy": "Always",
"stdin": true,
"tty": true
}
],
"restartPolicy": "Always",
"terminationGracePeriodSeconds": 30,
"dnsPolicy": "ClusterFirst",
"serviceAccountName": "default",
"serviceAccount": "default",
"nodeName": "ip-1-2-3-4.eu-central-1.compute.internal",
"securityContext": {
},
"imagePullSecrets": [
{
"name": "ci-odm-readonly"
}
],
"schedulerName": "default-scheduler",
"tolerations": [
{
"key": "node.kubernetes.io/not-ready",
"operator": "Exists",
"effect": "NoExecute",
"tolerationSeconds": 300
},
{
"key": "node.kubernetes.io/unreachable",
"operator": "Exists",
"effect": "NoExecute",
"tolerationSeconds": 300
}
]
},
"status": {
"phase": "Running",
"conditions": [
{
"type": "Initialized",
"status": "True",
"lastProbeTime": null,
"lastTransitionTime": "2018-06-12T13:35:42Z"
},
{
"type": "Ready",
"status": "False",
"lastProbeTime": null,
"lastTransitionTime": "2018-06-20T10:24:47Z",
"reason": "ContainersNotReady",
"message": "containers with unready status: [nameabc]"
},
{
"type": "PodScheduled",
"status": "True",
"lastProbeTime": null,
"lastTransitionTime": "2018-06-12T13:35:42Z"
}
],
"hostIP": "1.2.3.4",
"podIP": "100.116.0.26",
"startTime": "2018-06-12T13:35:42Z",
"containerStatuses": [
{
"name": "nameabc",
"state": {
"terminated": {
"exitCode": 0,
"startedAt": null,
"finishedAt": null
}
},
"lastState": {
"terminated": {
"exitCode": 0,
"reason": "Completed",
"startedAt": "2018-06-17T03:33:07Z",
"finishedAt": "2018-06-18T18:34:31Z",
"containerID": "docker://9a9363d2692742172044ffcb941b12fa1bea1f478db2d181b7a1fb57b65e6aa5"
}
},
"ready": false,
"restartCount": 2,
"image": "postgres:latest",
"imageID": "docker-pullable://postgres@sha256:d9c44f9fc460dd8962c388eacf88a0e252b858ccdf33bc223f68112617e81fc9",
"containerID": "docker://5e9844ad66b047c2aad649956485740cd4accdc36371321566b8321cc6e379c8"
}
],
"qosClass": "Burstable"
}
}
Same here: #65936
@frittentheke the error message you pasted in #65569 (comment) is the cause of the problem. Unfortunately the message didn't say what field of the Pod was mutated.
Could you check the apiserver log to see if it contained more detail?
This is a long shot: try disabling the LimitRanger admission plugin and see if that fixes the problem?
@caesarxuchao bullseye!
I just removed the LimitRanger and all pods stuck in Terminating were gone immediately after.
Could you describe this procedure a little bit more, please?
@jomeier I simply removed LimitRanger from the list of admission controllers to be loaded and restarted the API server ... see https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers for how that's done.
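A sketch of what that looks like on the kube-apiserver command line (flag names as in the linked docs; the plugin lists below are illustrative assumptions, not a recommendation):
# either enumerate the plugins you want and leave LimitRanger out:
--enable-admission-plugins=NamespaceLifecycle,ServiceAccount,DefaultStorageClass,ResourceQuota
# or keep the defaults and only switch LimitRanger off:
--disable-admission-plugins=LimitRanger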
@caesarxuchao is the PR, #65987, which @klausenbusk referenced the bugfix for this, or just something similar?
@frittentheke, I'm curious, did you configure kube-reserved (and system-reserved) on the node where the pod lived to avoid oom-killer from being invoked? See: https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/
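For reference, a sketch of the kubelet flags that question refers to (values are purely illustrative assumptions):
--kube-reserved=cpu=200m,memory=512Mi
--system-reserved=cpu=200m,memory=512Mi
--eviction-hard=memory.available<500Mi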
We have this occurring occasionally and are able to delete the stuck pods with:
kubectl delete pod iamastuckpod-655c7947c9-pgzj2 -n namespace --force --grace-period=0
We experience the same problem. kubectl delete --force --grace-period=0 does not work.
Deleting through the REST API with {"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Background"} indeed helps.
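For completeness, one way to send exactly that body (assuming kubectl proxy on its default 127.0.0.1:8001; namespace and pod name are placeholders):
kubectl proxy &
curl -X DELETE -H 'Content-Type: application/json' \
  -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Background"}' \
  http://127.0.0.1:8001/api/v1/namespaces/<namespace>/pods/<pod-name>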
Closed #65569.
@sadovnikov are you certain you are running a Kubernetes version that already includes #62673? That would mean the bug is either not fixed, or there is another issue.
Reopened #65569.
@frittentheke we are on v1.9.81, which does include #62673. What we found out this morning is that this bug only shows up when the pod includes a sidecar from Istio 1.0.2 or 1.0.3. Without it, the pods are removed without any problems.
Closed #65569.
/reopen
I also encounter this issue in Kubernetes v1.18.x.
@runzexia: You can't reopen an issue/PR unless you authored it or you are a collaborator.
@runzexia We need the kind of detail in the OP or this comment before we can help.
@lavalamp
Not all pods in this cluster are affected.
But the pods belonging to this deployment cannot be deleted with foreground deletion.
apiVersion: v1
kind: Pod
metadata:
annotations:
cni.projectcalico.org/podIP: 100.96.8.196/32
cni.projectcalico.org/podIPs: 100.96.8.196/32
kubernetes.io/psp: gardener.privileged
mutated: "true"
mutator: role-permission-sidecar-mutator
creationTimestamp: "2020-08-14T04:31:20Z"
deletionGracePeriodSeconds: 0
deletionTimestamp: "2020-08-17T03:01:04Z"
finalizers:
- foregroundDeletion
generateName: webhook-test-79fbccfb66-
labels:
app: webhook-test
pod-template-hash: 79fbccfb66
release: eureka-e2e-dev1-eureka-webhook-test
role: web
role-permission-sidecar-injected: "true"
managedFields:
- apiVersion: v1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:generateName: {}
f:labels:
.: {}
f:app: {}
f:pod-template-hash: {}
f:release: {}
f:role: {}
f:role-permission-sidecar-injected: {}
f:ownerReferences:
.: {}
k:{"uid":"78f279a6-32cf-4531-8a7c-d84d5748d62e"}:
.: {}
f:apiVersion: {}
f:blockOwnerDeletion: {}
f:controller: {}
f:kind: {}
f:name: {}
f:uid: {}
f:spec:
f:containers:
k:{"name":"webhook-test"}:
.: {}
f:env:
.: {}
k:{"name":"MY_NAMESPACE"}:
.: {}
f:name: {}
f:valueFrom:
.: {}
f:fieldRef:
.: {}
f:apiVersion: {}
f:fieldPath: {}
k:{"name":"ds_maximum_pool_size"}:
.: {}
f:name: {}
f:value: {}
k:{"name":"jwt_enable"}:
.: {}
f:name: {}
f:value: {}
f:image: {}
f:imagePullPolicy: {}
f:livenessProbe:
.: {}
f:failureThreshold: {}
f:httpGet:
.: {}
f:path: {}
f:port: {}
f:scheme: {}
f:initialDelaySeconds: {}
f:periodSeconds: {}
f:successThreshold: {}
f:timeoutSeconds: {}
f:name: {}
f:ports:
.: {}
k:{"containerPort":8080,"protocol":"TCP"}:
.: {}
f:containerPort: {}
f:protocol: {}
f:readinessProbe:
.: {}
f:failureThreshold: {}
f:httpGet:
.: {}
f:path: {}
f:port: {}
f:scheme: {}
f:periodSeconds: {}
f:successThreshold: {}
f:timeoutSeconds: {}
f:resources:
.: {}
f:limits:
.: {}
f:cpu: {}
f:memory: {}
f:requests:
.: {}
f:cpu: {}
f:memory: {}
f:terminationMessagePath: {}
f:terminationMessagePolicy: {}
f:volumeMounts:
.: {}
k:{"mountPath":"/etc/secrets/kafka"}:
.: {}
f:mountPath: {}
f:name: {}
k:{"mountPath":"/etc/secrets/postgres"}:
.: {}
f:mountPath: {}
f:name: {}
f:dnsPolicy: {}
f:enableServiceLinks: {}
f:restartPolicy: {}
f:schedulerName: {}
f:securityContext: {}
f:terminationGracePeriodSeconds: {}
f:volumes:
.: {}
k:{"name":"kafka"}:
.: {}
f:name: {}
f:secret:
.: {}
f:defaultMode: {}
f:secretName: {}
k:{"name":"postgres"}:
.: {}
f:name: {}
f:secret:
.: {}
f:defaultMode: {}
f:secretName: {}
manager: kube-controller-manager
operation: Update
time: "2020-08-14T04:31:19Z"
- apiVersion: v1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
f:cni.projectcalico.org/podIP: {}
f:cni.projectcalico.org/podIPs: {}
manager: calico
operation: Update
time: "2020-08-14T04:31:22Z"
- apiVersion: v1
fieldsType: FieldsV1
fieldsV1:
f:status:
f:conditions:
k:{"type":"ContainersReady"}:
.: {}
f:lastProbeTime: {}
f:lastTransitionTime: {}
f:message: {}
f:reason: {}
f:status: {}
f:type: {}
k:{"type":"Initialized"}:
.: {}
f:lastProbeTime: {}
f:lastTransitionTime: {}
f:status: {}
f:type: {}
k:{"type":"Ready"}:
.: {}
f:lastProbeTime: {}
f:lastTransitionTime: {}
f:message: {}
f:reason: {}
f:status: {}
f:type: {}
f:containerStatuses: {}
f:hostIP: {}
f:phase: {}
f:startTime: {}
manager: kubelet
operation: Update
time: "2020-08-17T03:01:07Z"
name: webhook-test-79fbccfb66-6kb7f
namespace: dev1
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet
name: webhook-test-79fbccfb66
uid: 78f279a6-32cf-4531-8a7c-d84d5748d62e
resourceVersion: "22998876"
selfLink: /api/v1/namespaces/dev1/pods/webhook-test-79fbccfb66-6kb7f
uid: 41f8c3d1-ce94-4754-9041-251ac7d30f16
spec:
containers:
- env:
- name: MY_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: ds_maximum_pool_size
value: "5"
- name: jwt_enable
value: "true"
image: harbor.eurekacloud.io/eureka/webhook-test:cfc2dfac5bdb5832a2278a2be6d5d44a56017626
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /actuator/health
port: 8080
scheme: HTTP
initialDelaySeconds: 180
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
name: webhook-test
ports:
- containerPort: 8080
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /actuator/health
port: 8080
scheme: HTTP
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources:
limits:
cpu: "1"
memory: 2Gi
requests:
cpu: 500m
memory: 2Gi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/secrets/postgres
name: postgres
- mountPath: /etc/secrets/kafka
name: kafka
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-kphfg
readOnly: true
- image: harbor.eurekacloud.io/eureka/role-permission-sidecar:v1
imagePullPolicy: Always
livenessProbe:
failureThreshold: 3
httpGet:
path: /health
port: 80
scheme: HTTP
periodSeconds: 30
successThreshold: 1
timeoutSeconds: 10
name: role-permission-sidecar
ports:
- containerPort: 80
name: http
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /health
port: 80
scheme: HTTP
periodSeconds: 30
successThreshold: 1
timeoutSeconds: 10
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: role-permission-sidecar-token-8wzn7
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
imagePullSecrets:
- name: image-pull-secret
nodeName: shoot--eureka--e2e-worker-exhnf-z1-6d66c49b9f-8k9cb
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: role-permission-sidecar
serviceAccountName: role-permission-sidecar
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: postgres
secret:
defaultMode: 420
secretName: webhook-test-postgres
- name: kafka
secret:
defaultMode: 420
secretName: webhook-test-kafka
- name: default-token-kphfg
secret:
defaultMode: 420
secretName: default-token-kphfg
- name: role-permission-sidecar-token-8wzn7
secret:
defaultMode: 420
secretName: role-permission-sidecar-token-8wzn7
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2020-08-14T04:31:20Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2020-08-17T03:01:07Z"
message: 'containers with unready status: [webhook-test role-permission-sidecar]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2020-08-17T03:01:07Z"
message: 'containers with unready status: [webhook-test role-permission-sidecar]'
reason: ContainersNotReady
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2020-08-14T04:31:20Z"
status: "True"
type: PodScheduled
containerStatuses:
- image: harbor.eurekacloud.io/eureka/role-permission-sidecar:v1
imageID: ""
lastState: {}
name: role-permission-sidecar
ready: false
restartCount: 0
started: false
state:
waiting:
reason: ContainerCreating
- image: harbor.eurekacloud.io/eureka/webhook-test:cfc2dfac5bdb5832a2278a2be6d5d44a56017626
imageID: ""
lastState: {}
name: webhook-test
ready: false
restartCount: 0
started: false
state:
waiting:
reason: ContainerCreating
hostIP: 10.250.0.27
phase: Pending
qosClass: Burstable
startTime: "2020-08-14T04:31:20Z"
➜ ~ kubectl describe po -n dev1 webhook-test-79fbccfb66-6kb7f
Name: webhook-test-79fbccfb66-6kb7f
Namespace: dev1
Priority: 0
Node: shoot--eureka--e2e-worker-exhnf-z1-6d66c49b9f-8k9cb/10.250.0.27
Start Time: Fri, 14 Aug 2020 12:31:20 +0800
Labels: app=webhook-test
pod-template-hash=79fbccfb66
release=eureka-e2e-dev1-eureka-webhook-test
role=web
role-permission-sidecar-injected=true
Annotations: cni.projectcalico.org/podIP: 100.96.8.196/32
cni.projectcalico.org/podIPs: 100.96.8.196/32
kubernetes.io/psp: gardener.privileged
mutated: true
mutator: role-permission-sidecar-mutator
Status: Terminating (lasts 3m5s)
Termination Grace Period: 0s
IP:
IPs: <none>
Controlled By: ReplicaSet/webhook-test-79fbccfb66
Containers:
webhook-test:
Container ID:
Image: harbor.eurekacloud.io/eureka/webhook-test:cfc2dfac5bdb5832a2278a2be6d5d44a56017626
Image ID:
Port: 8080/TCP
Host Port: 0/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
cpu: 1
memory: 2Gi
Requests:
cpu: 500m
memory: 2Gi
Liveness: http-get http://:8080/actuator/health delay=180s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:8080/actuator/health delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
MY_NAMESPACE: dev1 (v1:metadata.namespace)
ds_maximum_pool_size: 5
jwt_enable: true
Mounts:
/etc/secrets/kafka from kafka (rw)
/etc/secrets/postgres from postgres (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-kphfg (ro)
role-permission-sidecar:
Container ID:
Image: harbor.eurekacloud.io/eureka/role-permission-sidecar:v1
Image ID:
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Liveness: http-get http://:80/health delay=0s timeout=10s period=30s #success=1 #failure=3
Readiness: http-get http://:80/health delay=0s timeout=10s period=30s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from role-permission-sidecar-token-8wzn7 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
postgres:
Type: Secret (a volume populated by a Secret)
SecretName: webhook-test-postgres
Optional: false
kafka:
Type: Secret (a volume populated by a Secret)
SecretName: webhook-test-kafka
Optional: false
default-token-kphfg:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-kphfg
Optional: false
role-permission-sidecar-token-8wzn7:
Type: Secret (a volume populated by a Secret)
SecretName: role-permission-sidecar-token-8wzn7
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Killing 3m5s kubelet, shoot--eureka--e2e-worker-exhnf-z1-6d66c49b9f-8k9cb Stopping container webhook-test
Normal Killing 3m5s kubelet, shoot--eureka--e2e-worker-exhnf-z1-6d66c49b9f-8k9cb Stopping container role-permission-sidecar
Warning Unhealthy 3m3s kubelet, shoot--eureka--e2e-worker-exhnf-z1-6d66c49b9f-8k9cb Readiness probe failed: Get http://100.96.8.196:8080/actuator/health: dial tcp 100.96.8.196:8080: connect: invalid argument
@runzexia Do you have a broken aggregated apiserver? E.g. does kubectl api-resources report any errors?
@caesarxuchao how does one query the GC for children of a given resource?
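A rough client-side approximation (assuming jq is installed, <owner-uid> is the owner's UID, and accepting that this only covers one resource type at a time) would be to filter by ownerReferences:
kubectl get pods -n <namespace> -o json \
  | jq -r '.items[] | select(any(.metadata.ownerReferences[]?; .uid == "<owner-uid>")) | .metadata.name'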
@lavalamp kubectl api-resources does not show errors.
Only the pods belonging to this deployment have such a problem.
The binary in this pod is built using GraalVM.
We're encountering this issue on 1.18 on our test cluster; our review and prod clusters running 1.17 (with the same configs and workloads) have no such issues at all.
I think we will need more detail on the child objects. Foreground deletion means the GC will try to delete all child objects first. Do you know why the pod already has a foreground deletion finalizer on it?
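For context on how that finalizer normally appears: deleting an object with Foreground propagation puts it on the object the delete was issued against. A minimal way to observe this (assuming kubectl >= 1.20; the deployment name is a placeholder):
kubectl delete deployment <name> --cascade=foreground --wait=false
kubectl get deployment <name> -o jsonpath='{.metadata.finalizers}'
# expected to include foregroundDeletion while dependents are still being cleaned up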
We experience this issue too, on K8s 1.16 on AWS EKS. For us it looks like mostly only pods with a PVC are affected; other pods are deleted correctly. If we force-kill the pods with kubectl delete <podname> --grace-period 0 --force,
this leads to ghost Docker containers still running and serving requests on the nodes. Only a Docker force delete gets rid of them (or recreating the node).
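In case it helps anyone cleaning up those ghost containers by hand, a sketch of what can be done on the affected node (under dockershim the container names typically look like k8s_<container>_<pod>_<namespace>_<uid>_<attempt>; the ID below is a placeholder):
docker ps | grep <pod-name>
docker rm -f <container-id>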
Same issue here on 1.18. Need to check whether only pods with a PVC are affected or not.
it looks like mostly only pods with a PVC are affected
If you can post complete metadata (at least we need to see the ownerrefs and finalizers) for pod and PVC, that might give a clue. But I fear we'll need the complete subgraph of owners, and I don't think there's an easy way to produce this.
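A compact way to pull just that metadata (names are placeholders; the jsonpath simply dumps the ownerReferences and finalizers of each object):
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.metadata.ownerReferences}{"\n"}{.metadata.finalizers}{"\n"}'
kubectl get pvc <pvc-name> -n <namespace> -o jsonpath='{.metadata.ownerReferences}{"\n"}{.metadata.finalizers}{"\n"}'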
Still having the same issue in:
kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:12:29Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
The pod is stuck and doesn't terminate by itself; I will try to kill it forcibly. It was a redis-slave pod.