Hi,
We have a Fluentd DaemonSet whose pods get deleted and rescheduled every five minutes for no apparent reason.
fluentd-elasticsearch-2hye3 1/1 Running 0 3m
fluentd-elasticsearch-2qh9t 1/1 Running 0 3m
fluentd-elasticsearch-8o9n6 1/1 Running 0 3m
fluentd-elasticsearch-90yk7 1/1 Running 0 3m
fluentd-elasticsearch-d2xw8 1/1 Running 0 3m
fluentd-elasticsearch-kvi78 1/1 Running 0 3m
fluentd-elasticsearch-loato 1/1 Running 0 3m
fluentd-elasticsearch-os5lu 1/1 Running 0 3m
fluentd-elasticsearch-q31d9 1/1 Running 0 3m
fluentd-elasticsearch-siviz 1/1 Running 0 3m
These are the events of one of the pods right before it gets killed:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
5m 5m 1 {kubelet ip-172-20-0-161.eu-west-1.compute.internal} spec.containers{fluentd-elasticsearch} Normal Pulled Container image "gcr.io/google_containers/fluentd-elasticsearch:1.17" already present on machine
5m 5m 1 {kubelet ip-172-20-0-161.eu-west-1.compute.internal} spec.containers{fluentd-elasticsearch} Normal Created Created container with docker id cc587a6ca2be
5m 5m 1 {kubelet ip-172-20-0-161.eu-west-1.compute.internal} spec.containers{fluentd-elasticsearch} Normal Started Started container with docker id cc587a6ca2be
0s 0s 1 {kubelet ip-172-20-0-161.eu-west-1.compute.internal} spec.containers{fluentd-elasticsearch} Normal Killing Killing container with docker id cc587a6ca2be: Need to kill pod.
Please note that the final event is Killing and that its type is Normal.
Here are the Fluentd logs:
2016-08-22 08:02:24 +0000 [info]: Connection opened to Elasticsearch cluster => {:host=>"elasticsearch-logging", :port=>9200, :scheme=>"http"}
2016-08-22 08:02:35 +0000 [info]: detected rotation of /var/log/containers/fluentd-elasticsearch-55f88_kube-system_fluentd-elasticsearch-02e8c8d176e5f25ab9c4ee9739806d2b4a2279caa89a4e69624bbb011ff83556.log; waiting 5 seconds
2016-08-22 08:02:35 +0000 [info]: detected rotation of /var/log/containers/fluentd-elasticsearch-55f88_kube-system_POD-33029bd161b8d47ca8fea2c2139e8e722bf1a5387803663fbbaee1c0761b3228.log; waiting 5 seconds
2016-08-22 08:07:44 +0000 [info]: shutting down fluentd
2016-08-22 08:07:45 +0000 [info]: shutting down input type="tail" plugin_id="object:134307c"
2016-08-22 08:07:45 +0000 [info]: shutting down input type="tail" plugin_id="object:fb0f7c"
2016-08-22 08:07:45 +0000 [info]: shutting down input type="tail" plugin_id="object:fa25d0"
2016-08-22 08:07:45 +0000 [info]: shutting down input type="tail" plugin_id="object:f96d84"
2016-08-22 08:07:45 +0000 [info]: shutting down input type="tail" plugin_id="object:eff86c"
2016-08-22 08:07:45 +0000 [info]: shutting down input type="tail" plugin_id="object:f1abe4"
2016-08-22 08:07:45 +0000 [info]: shutting down input type="tail" plugin_id="object:f2d2d0"
2016-08-22 08:07:45 +0000 [info]: shutting down input type="tail" plugin_id="object:f45d30"
2016-08-22 08:07:45 +0000 [info]: shutting down input type="tail" plugin_id="object:f607ac"
2016-08-22 08:07:45 +0000 [info]: shutting down input type="tail" plugin_id="object:f6f414"
2016-08-22 08:07:45 +0000 [info]: shutting down input type="tail" plugin_id="object:f85ee4"
2016-08-22 08:07:45 +0000 [info]: shutting down filter type="kubernetes_metadata" plugin_id="object:dc4434"
2016-08-22 08:07:45 +0000 [info]: shutting down output type="null" plugin_id="object:e1c094"
2016-08-22 08:07:45 +0000 [info]: shutting down output type="null" plugin_id="object:1516c00"
2016-08-22 08:07:45 +0000 [info]: shutting down output type="elasticsearch" plugin_id="object:134e4b8"
2016-08-22 08:07:45 +0000 [info]: process finished code=0
Please note that the exit code is 0, which means that Fluentd functioned correctly.
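As a rough check of the "every five minutes" interval, here is a small Python sketch that computes the time between the first log line shown above and the shutdown line. The timestamps are copied from the log; the helper name `lifetime_seconds` is made up for illustration, and the first log line is only assumed to be close to the container start.

```python
from datetime import datetime

# Timestamps copied from the Fluentd log above.
# Assumption: the first line shown is close to the container start.
started = "2016-08-22 08:02:24 +0000"  # "Connection opened to Elasticsearch cluster"
stopped = "2016-08-22 08:07:44 +0000"  # "shutting down fluentd"

# Fluentd's log timestamp layout: date, time, numeric UTC offset.
FMT = "%Y-%m-%d %H:%M:%S %z"


def lifetime_seconds(start: str, stop: str) -> float:
    """Return the number of seconds between two Fluentd log timestamps."""
    delta = datetime.strptime(stop, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


lifetime = lifetime_seconds(started, stopped)
print(f"container lived for about {lifetime:.0f} s")  # prints: container lived for about 320 s
```

That is 5 minutes 20 seconds, which matches the 5m age of the Started event and the 0s age of the Killing event in the pod events.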
Here's the YAML:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  namespace: kube-system
spec:
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
    spec:
      containers:
      - name: fluentd-elasticsearch
        image: gcr.io/google_containers/fluentd-elasticsearch:1.17
        command: [ "td-agent", "-c", "/etc/config/td-agent.conf" ]
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: fluentd-config
          mountPath: /etc/config
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: mntephemeraldockercontainers
          mountPath: /mnt/ephemeral/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: fluentd-config
        configMap:
          name: fluentd
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: mntephemeraldockercontainers
        hostPath:
          path: /mnt/ephemeral/docker/containers
I'd like to know what's causing the problem and how to troubleshoot it. Which logs do I need to look at, etc.? By the way, we're using version 1.3.0.
Thanks