Daemon set pods get deleted every 5 minutes


Ori Popowski

Aug 22, 2016, 4:26:01 AM
to Kubernetes user discussion and Q&A
Hi,

We have a Fluentd daemon set and the pods get deleted and re-scheduled every five minutes for no apparent reason.

    fluentd-elasticsearch-2hye3    1/1    Running    0    3m
    fluentd-elasticsearch-2qh9t    1/1    Running    0    3m
    fluentd-elasticsearch-8o9n6    1/1    Running    0    3m
    fluentd-elasticsearch-90yk7    1/1    Running    0    3m
    fluentd-elasticsearch-d2xw8    1/1    Running    0    3m
    fluentd-elasticsearch-kvi78    1/1    Running    0    3m
    fluentd-elasticsearch-loato    1/1    Running    0    3m
    fluentd-elasticsearch-os5lu    1/1    Running    0    3m
    fluentd-elasticsearch-q31d9    1/1    Running    0    3m
    fluentd-elasticsearch-siviz    1/1    Running    0    3m


These are the events of one of the pods right before it gets killed:

    FirstSeen    LastSeen    Count    From                                                     SubobjectPath                              Type      Reason     Message
    ---------    --------    -----    ----                                                     -------------                              --------  ------     -------
    5m           5m          1        {kubelet ip-172-20-0-161.eu-west-1.compute.internal}     spec.containers{fluentd-elasticsearch}     Normal    Pulled     Container image "gcr.io/google_containers/fluentd-elasticsearch:1.17" already present on machine
    5m           5m          1        {kubelet ip-172-20-0-161.eu-west-1.compute.internal}     spec.containers{fluentd-elasticsearch}     Normal    Created    Created container with docker id cc587a6ca2be
    5m           5m          1        {kubelet ip-172-20-0-161.eu-west-1.compute.internal}     spec.containers{fluentd-elasticsearch}     Normal    Started    Started container with docker id cc587a6ca2be
    0s           0s          1        {kubelet ip-172-20-0-161.eu-west-1.compute.internal}     spec.containers{fluentd-elasticsearch}     Normal    Killing    Killing container with docker id cc587a6ca2be: Need to kill pod.

Please note here that the final event is Killing and the message type is Normal.


Here are the Fluentd logs:

2016-08-22 08:02:24 +0000 [info]: Connection opened to Elasticsearch cluster => {:host=>"elasticsearch-logging", :port=>9200, :scheme=>"http"}
2016-08-22 08:02:35 +0000 [info]: detected rotation of /var/log/containers/fluentd-elasticsearch-55f88_kube-system_fluentd-elasticsearch-02e8c8d176e5f25ab9c4ee9739806d2b4a2279caa89a4e69624bbb011ff83556.log; waiting 5 seconds
2016-08-22 08:02:35 +0000 [info]: detected rotation of /var/log/containers/fluentd-elasticsearch-55f88_kube-system_POD-33029bd161b8d47ca8fea2c2139e8e722bf1a5387803663fbbaee1c0761b3228.log; waiting 5 seconds

2016-08-22 08:07:44 +0000 [info]: shutting down fluentd
2016-08-22 08:07:45 +0000 [info]: shutting down input type="tail" plugin_id="object:134307c"
2016-08-22 08:07:45 +0000 [info]: shutting down input type="tail" plugin_id="object:fb0f7c"
2016-08-22 08:07:45 +0000 [info]: shutting down input type="tail" plugin_id="object:fa25d0"
2016-08-22 08:07:45 +0000 [info]: shutting down input type="tail" plugin_id="object:f96d84"
2016-08-22 08:07:45 +0000 [info]: shutting down input type="tail" plugin_id="object:eff86c"
2016-08-22 08:07:45 +0000 [info]: shutting down input type="tail" plugin_id="object:f1abe4"
2016-08-22 08:07:45 +0000 [info]: shutting down input type="tail" plugin_id="object:f2d2d0"
2016-08-22 08:07:45 +0000 [info]: shutting down input type="tail" plugin_id="object:f45d30"
2016-08-22 08:07:45 +0000 [info]: shutting down input type="tail" plugin_id="object:f607ac"
2016-08-22 08:07:45 +0000 [info]: shutting down input type="tail" plugin_id="object:f6f414"
2016-08-22 08:07:45 +0000 [info]: shutting down input type="tail" plugin_id="object:f85ee4"
2016-08-22 08:07:45 +0000 [info]: shutting down filter type="kubernetes_metadata" plugin_id="object:dc4434"
2016-08-22 08:07:45 +0000 [info]: shutting down output type="null" plugin_id="object:e1c094"
2016-08-22 08:07:45 +0000 [info]: shutting down output type="null" plugin_id="object:1516c00"
2016-08-22 08:07:45 +0000 [info]: shutting down output type="elasticsearch" plugin_id="object:134e4b8"
2016-08-22 08:07:45 +0000 [info]: process finished code=0


Please note here that the exit code is 0, which means Fluentd itself was functioning correctly and shut down cleanly rather than crashing.


Here's the yaml:


    apiVersion: extensions/v1beta1
    kind: DaemonSet
    metadata:
      name: fluentd-elasticsearch
      namespace: kube-system
    spec:
      template:
        metadata:
          labels:
            name: fluentd-elasticsearch
        spec:
          containers:
          - name: fluentd-elasticsearch
            image: gcr.io/google_containers/fluentd-elasticsearch:1.17
            command: [ "td-agent", "-c", "/etc/config/td-agent.conf" ]
            resources:
              limits:
                memory: 200Mi
              requests:
                cpu: 100m
                memory: 200Mi
            volumeMounts:
            - name: fluentd-config
              mountPath: /etc/config
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: mntephemeraldockercontainers
              mountPath: /mnt/ephemeral/docker/containers
              readOnly: true
          terminationGracePeriodSeconds: 30
          volumes:
          - name: fluentd-config
            configMap:
              name: fluentd
          - name: varlog
            hostPath:
              path: /var/log
          - name: varlibdockercontainers
            hostPath:
              path: /var/lib/docker/containers
          - name: mntephemeraldockercontainers
            hostPath:
              path: /mnt/ephemeral/docker/containers


I'd like to know what's causing the problem and how to troubleshoot it. What logs do I need to look at, etc.? By the way, we're using Kubernetes version 1.3.0.

Thanks






David Aronchick

Aug 22, 2016, 11:39:54 AM
to kubernet...@googlegroups.com
Can you look to see if you have conflicting replication controllers or daemon sets?

kubectl get rc
kubectl get deployments
kubectl get daemonsets

My first guess would be that two of these are conflicting: one controller starts a pod, which is then killed by the other.
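For example, something along these lines should surface a duplicate (the pod name is just one from your listing above; on clusters of this era the kubernetes.io/created-by annotation usually records which controller created the pod, so treat that part as an assumption for your version):

    # List controllers in every namespace to spot duplicates managing the same pods
    kubectl get rc,deployments,daemonsets --all-namespaces

    # See which controller claims one of the fluentd pods
    kubectl get pod fluentd-elasticsearch-2hye3 --namespace=kube-system -o yaml | grep -A2 created-by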


Daniel Smith

Aug 22, 2016, 2:24:37 PM
to kubernet...@googlegroups.com
Five minutes is close to the amount of time the node controller waits before evicting pods from a node it thinks is unhealthy. Does this happen on every node or just one? Are the nodes healthy? Are the pods passing liveness and readiness checks?
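A quick way to check (assuming you have kubectl access; the node name below is just the one from your events):

    kubectl get nodes                                                    # every node should be Ready
    kubectl describe node ip-172-20-0-161.eu-west-1.compute.internal    # conditions and recent events
    kubectl get events --all-namespaces                                  # often shows who is deleting/evicting the pods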

Conflicting daemonsets is a possibility. Also at one point I think we were running fluentd via static manifests on the nodes-- if that's in place on your cluster, it could cause something like this. Although I'd expect it to not live 5 minutes consistently in the case of fighting system components-- more like five seconds.

You can look at the logs of the kube-controller-manager and/or the kubelet if you want to go sleuthing.
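For example (the exact log locations are an assumption; they depend on how the cluster was brought up):

    # On the master, if kube-controller-manager logs to a file (typical for kube-up style installs)
    grep -i fluentd-elasticsearch /var/log/kube-controller-manager.log

    # On the node that ran the pod, the kubelet log should say why it killed the container
    grep -i killing /var/log/kubelet.log         # file-based logging
    journalctl -u kubelet | grep -i killing      # if the kubelet runs under systemd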

Ori Popowski

Aug 22, 2016, 4:29:09 PM
to Kubernetes user discussion and Q&A


Yes, it was indeed conflicting daemon sets. Thanks!

Ori Popowski

Aug 22, 2016, 4:31:58 PM
to Kubernetes user discussion and Q&A

Yeah, the pods were passing liveness and readiness checks. It was evident in the pod's events.
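For reference, failing probes would have shown up as Unhealthy events, which is easy to confirm with something like (the pod name is just one from the listing above):

    kubectl describe pod fluentd-elasticsearch-2hye3 --namespace=kube-system    # Events section lists any probe failures
    kubectl get events --namespace=kube-system | grep -i unhealthy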

We are not running Fluentd via static manifests. We installed our cluster with ENABLE_LOGGING=false to give us full control of log collection.

In the end, the problem was indeed conflicting daemon sets. Thanks!
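In case it helps anyone else: once the duplicate shows up in kubectl get daemonsets --all-namespaces, removing it is something like the following (the name is a placeholder for whichever DaemonSet is the unwanted one):

    # Remove the conflicting DaemonSet; its pods should be cleaned up with it
    kubectl delete daemonset <duplicate-daemonset-name> --namespace=kube-system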
