Re: [kubernetes/kubernetes] Some pods stuck in Terminating state and can only be removed if deleting with propagationPolicy=Background (#65569)


k8s-ci-robot
Jun 28, 2018, 5:21:50 AM
to kubernetes/kubernetes, k8s-mirror-api-machinery-bugs, Team mention

@frittentheke: Reiterating the mentions to trigger a notification:
@kubernetes/sig-api-machinery-bugs

In response to this:

@kubernetes/sig-api-machinery-bugs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


You are receiving this because you are on a team that was mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

Christian Rohmann
Jun 28, 2018, 6:11:33 AM

I enabled log level 5 to see more details about the garbage collection, and there does indeed seem to be an issue:

I0628 10:03:39.308749       1 garbagecollector.go:364] processing item [v1/Pod, namespace: namespaceabc, name: nameabc-757b9c656-77lmr, uid: 9e81c654-7386-11e8-87a1-06b7587fe442]
I0628 10:03:39.310672       1 garbagecollector.go:483] remove DeleteDependents finalizer for item [v1/Pod, namespace: namespaceabc, name: nameabc-757b9c656-77lmr, uid: 9e81c654-7386-11e8-87a1-06b7587fe442]
E0628 10:03:39.315051       1 garbagecollector.go:265] error syncing item &garbagecollector.node{identity:garbagecollector.objectReference{OwnerReference:v1.OwnerReference{APIVersion:"v1", Kind:"Pod", Name:"nameabc-757b9c656-77lmr", UID:"9e81c654-7386-11e8-87a1-06b7587fe442", Controller:(*bool)(nil), BlockOwnerDeletion:(*bool)(nil)}, Namespace:"namespaceabc"}, dependentsLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:1, readerWait:0}, dependents:map[*garbagecollector.node]struct {}{}, deletingDependents:true, deletingDependentsLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, beingDeleted:true, beingDeletedLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, virtual:false, virtualLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, owners:[]v1.OwnerReference{v1.OwnerReference{APIVersion:"extensions/v1beta1", Kind:"ReplicaSet", Name:"nameabc-757b9c656", UID:"95da59b1-286d-11e8-b3a8-02e226510ede", Controller:(*bool)(0xc42b654a1a), BlockOwnerDeletion:(*bool)(0xc42b654a1b)}}}: Pod "nameabc-757b9c656-77lmr" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds` or `spec.tolerations` (only additions to existing tolerations)

By the way, the ReplicaSet (and the Deployment) do not exist anymore.
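
(Editor's note, for readers who land here with the same symptom: a common last-resort workaround, not suggested by the maintainers in this thread, is to clear the pod's finalizers so the apiserver can complete the deletion. This is a sketch using the pod from the log above; it bypasses the garbage collector's bookkeeping, so use it with care.)

```shell
# Clear the finalizers (including foregroundDeletion) on the stuck pod,
# letting the pending delete complete immediately:
kubectl patch pod nameabc-757b9c656-77lmr -n namespaceabc \
  --type=merge -p '{"metadata":{"finalizers":null}}'
```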

Federico Bongiovanni
Jun 28, 2018, 4:20:07 PM

Yu Liao
Jun 28, 2018, 6:36:51 PM

When you managed to remove the Deployment and ReplicaSet, which DeleteOptions did you specify? Did you set cascading = true? (https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/)

You mentioned "the kubelet on that node where the Pod / container used to run was actively sending DELETE requests to the API with no luck". Do you know which DeleteOptions the kubelet sends?

Christian Rohmann
Jun 29, 2018, 6:52:23 AM

@yliaog I did not specify cascading, so "true".

Could there be a problem if the parent is gone before the dependent? For example, a pod that references a ReplicaSet not being deleted because the ReplicaSet that started it is already gone? The same would apply to Deployment -> ReplicaSet.

Or to put it differently: would the GC still delete a pod whose owner (ReplicaSet) is already gone?

Yu Liao
Jun 29, 2018, 2:06:30 PM

That depends on the DeleteOptions setting: if you explicitly set the propagation policy to "Orphan", the GC will not delete it.
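
(Editor's note: the three propagation policies discussed here can be exercised from kubectl as sketched below. The flag spelling depends on the client version: modern kubectl accepts `--cascade=orphan|background|foreground`, while clients from the era of this thread used `--cascade=true|false`. The ReplicaSet name is a placeholder.)

```shell
# Leave dependents (pods) alone, delete only the owner:
kubectl delete replicaset my-rs --cascade=orphan

# Delete the owner immediately; GC removes dependents afterwards:
kubectl delete replicaset my-rs --cascade=background

# Delete dependents first (foregroundDeletion finalizer), then the owner:
kubectl delete replicaset my-rs --cascade=foreground
```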

Chao Xu
Jun 29, 2018, 2:10:42 PM

Is the pod nameabc-757b9c656-77lmr the one you cannot delete, or is it another pod?

Do you have dynamic admission webhooks in your system, especially ones that inject sidecar containers into pods? The last time I saw "error syncing item", it was caused by an admission webhook that was incompatible with a built-in admission controller.

Chao Xu
Jun 29, 2018, 2:23:15 PM

I cannot open pod_forever_Terminating.json.txt; could you paste it here if it's not too long?

Christian Rohmann
Jul 4, 2018, 3:43:55 AM

@caesarxuchao

  1. Yes, nameabc-757b9c656-77lmr is the pod still stuck in "Terminating"
  2. There are NO dynamic admission webhooks.
  3. Here you go with the JSON:
{
  "kind": "Pod",
  "apiVersion": "v1",
  "metadata": {
    "name": "name-566ccd445b-bcm8f",
    "generateName": "nameabc-566ccd445b-",
    "namespace": "namespaceabc",
    "selfLink": "/api/v1/namespaces/namespaceabc/pods/nameabc-566ccd445b-bcm8f",
    "uid": "80d7bd47-6e45-11e8-a560-028e0a1f08a4",
    "resourceVersion": "47640604",
    "creationTimestamp": "2018-06-12T13:35:42Z",
    "deletionTimestamp": "2018-06-28T05:31:10Z",
    "deletionGracePeriodSeconds": 0,
    "labels": {
      "pod-template-hash": "1227780016",
      "run": "nameabc"
    },
    "annotations": {
      "kubernetes.io/limit-ranger": "LimitRanger plugin set: memory request for container nameabc; memory limit for container nameabc"
    },
    "ownerReferences": [
      {
        "apiVersion": "extensions/v1beta1",
        "kind": "ReplicaSet",
        "name": "nameabc-566ccd445b",
        "uid": "4a5344e7-5dcb-11e8-b980-0a7a97b25f0a",
        "controller": true,
        "blockOwnerDeletion": true
      }
    ],
    "finalizers": [
      "foregroundDeletion"
    ]
  },
  "spec": {
    "volumes": [
      {
        "name": "default-token-bk8pn",
        "secret": {
          "secretName": "default-token-bk8pn",
          "defaultMode": 420
        }
      }
    ],
    "containers": [
      {
        "name": "nameabc",
        "image": "postgres",
        "args": [
          "bash"
        ],
        "resources": {
          "limits": {
            "memory": "2Gi"
          },
          "requests": {
            "memory": "512Mi"
          }
        },
        "volumeMounts": [
          {
            "name": "default-token-bk8pn",
            "readOnly": true,
            "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount"
          }
        ],
        "terminationMessagePath": "/dev/termination-log",
        "terminationMessagePolicy": "File",
        "imagePullPolicy": "Always",
        "stdin": true,
        "tty": true
      }
    ],
    "restartPolicy": "Always",
    "terminationGracePeriodSeconds": 30,
    "dnsPolicy": "ClusterFirst",
    "serviceAccountName": "default",
    "serviceAccount": "default",
    "nodeName": "ip-1-2-3-4.eu-central-1.compute.internal",
    "securityContext": {
      
    },
    "imagePullSecrets": [
      {
        "name": "ci-odm-readonly"
      }
    ],
    "schedulerName": "default-scheduler",
    "tolerations": [
      {
        "key": "node.kubernetes.io/not-ready",
        "operator": "Exists",
        "effect": "NoExecute",
        "tolerationSeconds": 300
      },
      {
        "key": "node.kubernetes.io/unreachable",
        "operator": "Exists",
        "effect": "NoExecute",
        "tolerationSeconds": 300
      }
    ]
  },
  "status": {
    "phase": "Running",
    "conditions": [
      {
        "type": "Initialized",
        "status": "True",
        "lastProbeTime": null,
        "lastTransitionTime": "2018-06-12T13:35:42Z"
      },
      {
        "type": "Ready",
        "status": "False",
        "lastProbeTime": null,
        "lastTransitionTime": "2018-06-20T10:24:47Z",
        "reason": "ContainersNotReady",
        "message": "containers with unready status: [nameabc]"
      },
      {
        "type": "PodScheduled",
        "status": "True",
        "lastProbeTime": null,
        "lastTransitionTime": "2018-06-12T13:35:42Z"
      }
    ],
    "hostIP": "1.2.3.4",
    "podIP": "100.116.0.26",
    "startTime": "2018-06-12T13:35:42Z",
    "containerStatuses": [
      {
        "name": "nameabc",
        "state": {
          "terminated": {
            "exitCode": 0,
            "startedAt": null,
            "finishedAt": null
          }
        },
        "lastState": {
          "terminated": {
            "exitCode": 0,
            "reason": "Completed",
            "startedAt": "2018-06-17T03:33:07Z",
            "finishedAt": "2018-06-18T18:34:31Z",
            "containerID": "docker://9a9363d2692742172044ffcb941b12fa1bea1f478db2d181b7a1fb57b65e6aa5"
          }
        },
        "ready": false,
        "restartCount": 2,
        "image": "postgres:latest",
        "imageID": "docker-pullable://postgres@sha256:d9c44f9fc460dd8962c388eacf88a0e252b858ccdf33bc223f68112617e81fc9",
        "containerID": "docker://5e9844ad66b047c2aad649956485740cd4accdc36371321566b8321cc6e379c8"
      }
    ],
    "qosClass": "Burstable"
  }
}

Josef Meier
Jul 7, 2018, 3:22:14 AM

Same here: #65936

Chao Xu
Jul 9, 2018, 4:49:05 PM

@frittentheke the error message you pasted in #65569 (comment) is the cause of the problem. Unfortunately the message doesn't say which field of the Pod was mutated.

Could you check the apiserver log to see if it contained more detail?

This is a long shot, but could you try disabling the LimitRanger admission plugin and see if that fixes the problem?

Christian Rohmann
Jul 10, 2018, 8:57:50 AM

@caesarxuchao bullseye!

I just removed the LimitRanger and all pods stuck in Terminating were gone immediately after.

Josef Meier
Jul 10, 2018, 1:52:27 PM

Could you describe this procedure a little bit more, please?

Christian Rohmann
Jul 11, 2018, 6:16:47 PM

@jomeier I simply removed LimitRanger from the list of admission controllers to be loaded and restarted the API server ... see https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers for how that's done.
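
(Editor's note: a minimal sketch of what "removing LimitRanger" looks like on the kube-apiserver command line. The exact flag names depend on the Kubernetes version; on 1.10+ there is a dedicated disable flag, while older versions required editing the `--admission-control` list. The enabled-plugin list shown is illustrative only.)

```shell
# Keep the default admission chain but explicitly disable LimitRanger,
# then restart the apiserver (e.g. by updating its static pod manifest):
kube-apiserver \
  --disable-admission-plugins=LimitRanger \
  <other apiserver flags unchanged>
```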

Christian Rohmann
Jul 12, 2018, 3:53:44 AM

@caesarxuchao is the PR #65987 that @klausenbusk referenced the bugfix for this, or just for something similar?

Dan Yocum
Jul 20, 2018, 6:08:25 PM

@frittentheke, I'm curious, did you configure kube-reserved (and system-reserved) on the node where the pod lived to avoid oom-killer from being invoked? See: https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/

Tim Meade
Sep 28, 2018, 7:55:59 AM

We have this occurring occasionally and are able to delete the stuck pods with:

kubectl delete pod iamastuckpod-655c7947c9-pgzj2 -n namespace --force --grace-period=0

Chao Xu
Sep 28, 2018, 1:08:44 PM

The bug fixed by #65987 didn't cause this problem. #62673 probably fixes the root cause.

The problem @TimMeade hit is a different one. In the original bug, kubectl delete --force --grace-period=0 does not work.

Viktor Sadovnikov
Dec 12, 2018, 10:05:37 AM

We experience the same problem. kubectl delete --force --grace-period=0 does not work.
Deleting through the REST API with {"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Background"} does indeed help.
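
(Editor's note: a sketch of that REST call via `kubectl proxy`, using the DeleteOptions payload quoted above; the namespace and pod name are placeholders.)

```shell
# Expose the API locally, then DELETE the pod with Background propagation,
# which skips the foregroundDeletion machinery that is stuck:
kubectl proxy --port=8001 &
curl -X DELETE "http://127.0.0.1:8001/api/v1/namespaces/mynamespace/pods/mypod" \
  -H "Content-Type: application/json" \
  -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Background"}'
```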

Christian Rohmann
Dec 13, 2018, 2:14:23 AM

Closed #65569.

Christian Rohmann
Dec 13, 2018, 2:14:27 AM

@sadovnikov are you certain you are running a Kubernetes version that includes
#62673 already? That would mean the bug is either not fixed, or there is another issue.

Christian Rohmann
Dec 13, 2018, 2:14:32 AM

Reopened #65569.

Viktor Sadovnikov
Dec 13, 2018, 7:22:26 AM

@frittentheke we are on v1.9.81, which does include #62673. What we found out this morning is that this bug shows up only when the pod includes a sidecar from Istio 1.0.2 or 1.0.3. Without it, the pods are removed without any problems.

Christian Rohmann
Dec 13, 2018, 5:05:09 PM

Closed #65569.

runzexia
Aug 14, 2020, 12:57:51 AM

/reopen
i also encounter this issue in kubernetes v1.18.x



Kubernetes Prow Robot
Aug 14, 2020, 12:58:05 AM

@runzexia: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen
i also encounter this issue in kubernetes v1.18.x

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Daniel Smith
Aug 14, 2020, 12:39:38 PM

@runzexia We need the kind of detail in the OP or this comment before we can help.

runzexia
Aug 16, 2020, 11:06:59 PM

@lavalamp
Not all pods in this cluster are affected.
But the pods belonging to this deployment cannot be deleted with foreground deletion:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/podIP: 100.96.8.196/32
    cni.projectcalico.org/podIPs: 100.96.8.196/32
    kubernetes.io/psp: gardener.privileged
    mutated: "true"
    mutator: role-permission-sidecar-mutator
  creationTimestamp: "2020-08-14T04:31:20Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2020-08-17T03:01:04Z"
  finalizers:
  - foregroundDeletion
  generateName: webhook-test-79fbccfb66-
  labels:
    app: webhook-test
    pod-template-hash: 79fbccfb66
    release: eureka-e2e-dev1-eureka-webhook-test
    role: web
    role-permission-sidecar-injected: "true"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:generateName: {}
        f:labels:
          .: {}
          f:app: {}
          f:pod-template-hash: {}
          f:release: {}
          f:role: {}
          f:role-permission-sidecar-injected: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"78f279a6-32cf-4531-8a7c-d84d5748d62e"}:
            .: {}
            f:apiVersion: {}
            f:blockOwnerDeletion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec:
        f:containers:
          k:{"name":"webhook-test"}:
            .: {}
            f:env:
              .: {}
              k:{"name":"MY_NAMESPACE"}:
                .: {}
                f:name: {}
                f:valueFrom:
                  .: {}
                  f:fieldRef:
                    .: {}
                    f:apiVersion: {}
                    f:fieldPath: {}
              k:{"name":"ds_maximum_pool_size"}:
                .: {}
                f:name: {}
                f:value: {}
              k:{"name":"jwt_enable"}:
                .: {}
                f:name: {}
                f:value: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:livenessProbe:
              .: {}
              f:failureThreshold: {}
              f:httpGet:
                .: {}
                f:path: {}
                f:port: {}
                f:scheme: {}
              f:initialDelaySeconds: {}
              f:periodSeconds: {}
              f:successThreshold: {}
              f:timeoutSeconds: {}
            f:name: {}
            f:ports:
              .: {}
              k:{"containerPort":8080,"protocol":"TCP"}:
                .: {}
                f:containerPort: {}
                f:protocol: {}
            f:readinessProbe:
              .: {}
              f:failureThreshold: {}
              f:httpGet:
                .: {}
                f:path: {}
                f:port: {}
                f:scheme: {}
              f:periodSeconds: {}
              f:successThreshold: {}
              f:timeoutSeconds: {}
            f:resources:
              .: {}
              f:limits:
                .: {}
                f:cpu: {}
                f:memory: {}
              f:requests:
                .: {}
                f:cpu: {}
                f:memory: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
            f:volumeMounts:
              .: {}
              k:{"mountPath":"/etc/secrets/kafka"}:
                .: {}
                f:mountPath: {}
                f:name: {}
              k:{"mountPath":"/etc/secrets/postgres"}:
                .: {}
                f:mountPath: {}
                f:name: {}
        f:dnsPolicy: {}
        f:enableServiceLinks: {}
        f:restartPolicy: {}
        f:schedulerName: {}
        f:securityContext: {}
        f:terminationGracePeriodSeconds: {}
        f:volumes:
          .: {}
          k:{"name":"kafka"}:
            .: {}
            f:name: {}
            f:secret:
              .: {}
              f:defaultMode: {}
              f:secretName: {}
          k:{"name":"postgres"}:
            .: {}
            f:name: {}
            f:secret:
              .: {}
              f:defaultMode: {}
              f:secretName: {}
    manager: kube-controller-manager
    operation: Update
    time: "2020-08-14T04:31:19Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:cni.projectcalico.org/podIP: {}
          f:cni.projectcalico.org/podIPs: {}
    manager: calico
    operation: Update
    time: "2020-08-14T04:31:22Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
          k:{"type":"ContainersReady"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
          k:{"type":"Initialized"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:status: {}
            f:type: {}
          k:{"type":"Ready"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
        f:containerStatuses: {}
        f:hostIP: {}
        f:phase: {}
        f:startTime: {}
    manager: kubelet
    operation: Update
    time: "2020-08-17T03:01:07Z"
  name: webhook-test-79fbccfb66-6kb7f
  namespace: dev1
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: webhook-test-79fbccfb66
    uid: 78f279a6-32cf-4531-8a7c-d84d5748d62e
  resourceVersion: "22998876"
  selfLink: /api/v1/namespaces/dev1/pods/webhook-test-79fbccfb66-6kb7f
  uid: 41f8c3d1-ce94-4754-9041-251ac7d30f16
spec:
  containers:
  - env:
    - name: MY_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: ds_maximum_pool_size
      value: "5"
    - name: jwt_enable
      value: "true"
    image: harbor.eurekacloud.io/eureka/webhook-test:cfc2dfac5bdb5832a2278a2be6d5d44a56017626
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /actuator/health
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 180
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: webhook-test
    ports:
    - containerPort: 8080
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /actuator/health
        port: 8080
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        cpu: "1"
        memory: 2Gi
      requests:
        cpu: 500m
        memory: 2Gi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/secrets/postgres
      name: postgres
    - mountPath: /etc/secrets/kafka
      name: kafka
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-kphfg
      readOnly: true
  - image: harbor.eurekacloud.io/eureka/role-permission-sidecar:v1
    imagePullPolicy: Always
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /health
        port: 80
        scheme: HTTP
      periodSeconds: 30
      successThreshold: 1
      timeoutSeconds: 10
    name: role-permission-sidecar
    ports:
    - containerPort: 80
      name: http
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /health
        port: 80
        scheme: HTTP
      periodSeconds: 30
      successThreshold: 1
      timeoutSeconds: 10
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: role-permission-sidecar-token-8wzn7
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: image-pull-secret
  nodeName: shoot--eureka--e2e-worker-exhnf-z1-6d66c49b9f-8k9cb
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: role-permission-sidecar
  serviceAccountName: role-permission-sidecar
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: postgres
    secret:
      defaultMode: 420
      secretName: webhook-test-postgres
  - name: kafka
    secret:
      defaultMode: 420
      secretName: webhook-test-kafka
  - name: default-token-kphfg
    secret:
      defaultMode: 420
      secretName: default-token-kphfg
  - name: role-permission-sidecar-token-8wzn7
    secret:
      defaultMode: 420
      secretName: role-permission-sidecar-token-8wzn7
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-08-14T04:31:20Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-08-17T03:01:07Z"
    message: 'containers with unready status: [webhook-test role-permission-sidecar]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-08-17T03:01:07Z"
    message: 'containers with unready status: [webhook-test role-permission-sidecar]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-08-14T04:31:20Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: harbor.eurekacloud.io/eureka/role-permission-sidecar:v1
    imageID: ""
    lastState: {}
    name: role-permission-sidecar
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        reason: ContainerCreating
  - image: harbor.eurekacloud.io/eureka/webhook-test:cfc2dfac5bdb5832a2278a2be6d5d44a56017626
    imageID: ""
    lastState: {}
    name: webhook-test
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        reason: ContainerCreating
  hostIP: 10.250.0.27
  phase: Pending
  qosClass: Burstable
  startTime: "2020-08-14T04:31:20Z"

➜  ~ kubectl describe po -n dev1 webhook-test-79fbccfb66-6kb7f
Name:                      webhook-test-79fbccfb66-6kb7f
Namespace:                 dev1
Priority:                  0
Node:                      shoot--eureka--e2e-worker-exhnf-z1-6d66c49b9f-8k9cb/10.250.0.27
Start Time:                Fri, 14 Aug 2020 12:31:20 +0800
Labels:                    app=webhook-test
                           pod-template-hash=79fbccfb66
                           release=eureka-e2e-dev1-eureka-webhook-test
                           role=web
                           role-permission-sidecar-injected=true
Annotations:               cni.projectcalico.org/podIP: 100.96.8.196/32
                           cni.projectcalico.org/podIPs: 100.96.8.196/32
                           kubernetes.io/psp: gardener.privileged
                           mutated: true
                           mutator: role-permission-sidecar-mutator
Status:                    Terminating (lasts 3m5s)
Termination Grace Period:  0s
IP:
IPs:                       <none>
Controlled By:             ReplicaSet/webhook-test-79fbccfb66
Containers:
  webhook-test:
    Container ID:
    Image:          harbor.eurekacloud.io/eureka/webhook-test:cfc2dfac5bdb5832a2278a2be6d5d44a56017626
    Image ID:
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  2Gi
    Requests:
      cpu:      500m
      memory:   2Gi
    Liveness:   http-get http://:8080/actuator/health delay=180s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:8080/actuator/health delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      MY_NAMESPACE:          dev1 (v1:metadata.namespace)
      ds_maximum_pool_size:  5
      jwt_enable:            true
    Mounts:
      /etc/secrets/kafka from kafka (rw)
      /etc/secrets/postgres from postgres (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-kphfg (ro)
  role-permission-sidecar:
    Container ID:
    Image:          harbor.eurekacloud.io/eureka/role-permission-sidecar:v1
    Image ID:
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Liveness:       http-get http://:80/health delay=0s timeout=10s period=30s #success=1 #failure=3
    Readiness:      http-get http://:80/health delay=0s timeout=10s period=30s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from role-permission-sidecar-token-8wzn7 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  postgres:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  webhook-test-postgres
    Optional:    false
  kafka:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  webhook-test-kafka
    Optional:    false
  default-token-kphfg:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-kphfg
    Optional:    false
  role-permission-sidecar-token-8wzn7:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  role-permission-sidecar-token-8wzn7
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age   From                                                          Message
  ----     ------     ----  ----                                                          -------
  Normal   Killing    3m5s  kubelet, shoot--eureka--e2e-worker-exhnf-z1-6d66c49b9f-8k9cb  Stopping container webhook-test
  Normal   Killing    3m5s  kubelet, shoot--eureka--e2e-worker-exhnf-z1-6d66c49b9f-8k9cb  Stopping container role-permission-sidecar
  Warning  Unhealthy  3m3s  kubelet, shoot--eureka--e2e-worker-exhnf-z1-6d66c49b9f-8k9cb  Readiness probe failed: Get http://100.96.8.196:8080/actuator/health: dial tcp 100.96.8.196:8080: connect: invalid argument

Daniel Smith
Aug 17, 2020, 12:50:59 PM

@runzexia Do you have a broken aggregated apiserver? e.g. does kubectl api-resources report any errors?

@caesarxuchao how does one query the GC for children of a given resource?

runzexia
Aug 17, 2020, 10:18:44 PM

@lavalamp kubectl api-resources does not show errors.
Only the pods belonging to one deployment have this problem.
The binary program in this pod is built using GraalVM.

Ruben de Vries
Oct 19, 2020, 4:36:42 AM

We're encountering this issue on 1.18 on our test cluster; our review and prod clusters running 1.17 (with the same configs and workloads) have no such issues at all.

Daniel Smith
Oct 19, 2020, 7:42:37 PM

I think we will need more detail on the child objects. Foreground deletion means the GC will try to delete all child objects first. Do you know why the pod already has a foreground deletion finalizer on it?
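
(Editor's note: one way to answer the question above is to look for the finalizer directly. This sketch, assuming `jq` is available, lists every pod cluster-wide that carries the foregroundDeletion finalizer, which marks objects deleted with Foreground propagation.)

```shell
# List namespace/name of all pods that still carry the foregroundDeletion
# finalizer (i.e. pods whose foreground delete has not completed):
kubectl get pods --all-namespaces -o json \
  | jq -r '.items[]
      | select((.metadata.finalizers // []) | index("foregroundDeletion"))
      | "\(.metadata.namespace)/\(.metadata.name)"'
```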

Steffen Rumpf
Dec 4, 2020, 5:44:45 AM

We experience this issue too, on K8s 1.16 on AWS EKS. For us it looks like mostly only pods with a PVC are affected; other pods are deleted correctly. If we force kill the pods with kubectl delete <podname> --grace-period 0 --force, this leads to ghost docker containers still running and serving requests on the nodes. Only a docker force delete gets rid of them (or recreating the node).

Balazs Varga
Jul 23, 2021, 1:20:11 PM

Same issue here on 1.18. We need to check whether only pods with a PVC are affected or not.

Daniel Smith
Jul 23, 2021, 1:26:18 PM

it looks like mostly only pods with a PVC are affected

If you can post complete metadata (at least we need to see the ownerrefs and finalizers) for pod and PVC, that might give a clue. But I fear we'll need the complete subgraph of owners, and I don't think there's an easy way to produce this.
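
(Editor's note: a minimal sketch of gathering the metadata requested above; the pod, PVC, and namespace names are placeholders for your own stuck objects.)

```shell
# Dump ownerReferences and finalizers for the stuck pod and its PVC,
# which is the minimum the maintainers asked for:
kubectl get pod mypod -n myns \
  -o jsonpath='{.metadata.ownerReferences}{"\n"}{.metadata.finalizers}{"\n"}'
kubectl get pvc mypvc -n myns \
  -o jsonpath='{.metadata.ownerReferences}{"\n"}{.metadata.finalizers}{"\n"}'
```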

Beck
Oct 5, 2021, 3:47:05 AM

Still having the same issue in:

kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:12:29Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}

The pod is stuck and doesn't terminate by itself; I will try to kill it forcibly. It was a redis-slave pod.

