Fluentd pods go into CrashLoopBackOff when setting workers in the conf file


mohit jain

Oct 8, 2020, 1:54:45 AM
to Fluentd Google Group
Hi,

I am using fluentd 1.11.1 with ruby 2.4.10. When I configure <system> worker 4 </system>, the pods do not come up: they keep restarting and then go into the CrashLoopBackOff state. I have checked the logs but did not find any error message. Can anyone guide me on why I am facing this issue?

mohit jain

Oct 9, 2020, 12:40:14 AM
to Fluentd Google Group
The worker configuration ( <system> worker 4 </system> ) works fine with fluentd version 1.9.2.

Mr. Fiber

Oct 12, 2020, 6:02:23 PM
to Fluentd Google Group
> <system> worker 4 </system>
Is this a typo? 'workers' is correct, not 'worker'.
One popular cause is using a plugin like in_tail that doesn't support multi-worker mode.
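
If that is the case, one option (assuming a Fluentd v1 release that supports the <worker N> directive) is to pin the single-worker-only plugin to one worker while the rest of the pipeline uses all workers. A minimal sketch; the path, tag, and parser below are illustrative placeholders, not from this thread:

```
<system>
  workers 4
</system>

# Plugins inside <worker 0> run only in worker 0;
# everything outside runs in every worker.
<worker 0>
  <source>
    @type tail
    path /var/log/app.log
    pos_file /var/log/fluentd/app.log.pos
    tag app.logs
    <parse>
      @type none
    </parse>
  </source>
</worker>
```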


mohit jain

Oct 13, 2020, 4:27:37 AM
to Fluentd Google Group
It was my mistake (it is workers, not worker). With workers it is not working in fluentd 1.11.1, but with version 1.9.2 it works fine.

Mr. Fiber

Oct 15, 2020, 11:20:06 AM
to Fluentd Google Group
Could you paste a reproducible conf so I can check the problem?

mohit jain

Oct 19, 2020, 9:55:20 AM
to Fluentd Google Group
Hi, please find attached the conf file I am using for my case. With this configuration you can reproduce the issue.

<system>
  workers 4
</system>
 
<source>
  @type systemd
  path /var/log/journal
  tag journal
  <entry>
    fields_strip_underscores true
    fields_lowercase true
  </entry>
</source>
# system-fluentd config
# drop fluentd logs
<match fluent.**>
  @type null
</match>
 

# forward all tenants logs to public elasticsearch
# do not split per type now as standard k8s logs do not have type
<match xxx.logging.**>
  @type elasticsearch_dynamic
  @log_level info
  include_tag_key true
  host elasticsearch
  port 9200
  logstash_format true
  logstash_prefix log-${tag_parts[2]}
  reload_connections false
  reconnect_on_error true
  reload_on_failure true
  request_timeout 10s
  <buffer>
    @type file
    chunk_limit_size 8MB
    path /var/log/td-agent/es-fluentd-buffer/xxx.logging.all
  </buffer>
</match>


mohit jain

Oct 21, 2020, 10:01:52 PM
to Fluentd Google Group
Hello, can you please help me with this issue?

Mr. Fiber

Oct 23, 2020, 8:58:47 PM
to Fluentd Google Group
Hmm... your configuration doesn't work with v1.9.2 on my local environment.

2020-10-24 00:56:24 +0000 [error]: #0 config error file="f.conf" error_class=Fluent::ConfigError error="Plugin 'systemd' does not support multi workers configuration (Fluent::Plugin::SystemdInput)"

Could you recheck whether the combination of multi-worker mode and the systemd input plugin really works with v1.9.2?
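
To isolate which plugin rejects multi-worker mode without crash-looping the pod, a cut-down config can be checked locally. A sketch; the null match is only there to make the config complete:

```
<system>
  workers 4
</system>

# keep only the suspect input
<source>
  @type systemd
  path /var/log/journal
  tag journal
</source>

# discard events
<match journal>
  @type null
</match>
```

Running `fluentd --dry-run -c test.conf` should surface the same Fluent::ConfigError at startup if the input plugin is single-worker only.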

mohit jain

Oct 24, 2020, 12:33:19 AM
to Fluentd Google Group
Can you please check this configuration? It works with fluentd version 1.9.2.

<system>
  workers 4
</system>

<source>
  @type prometheus
</source>

<source>
  @type prometheus_monitor
  <labels>
    host ${hostname}
  </labels>
</source>

<match xxx.logging.**>
  @type elasticsearch_dynamic
  @log_level info
  include_tag_key true
  host elasticsearch
  port 9200
  logstash_format true
  logstash_prefix log-${tag_parts[2]}
  reload_connections false
  reconnect_on_error true
  reload_on_failure true
  request_timeout 10s
  <buffer>
    @type file
    chunk_limit_size 8MB
    path /var/log/td-agent/es-fluentd-buffer/xxx.logging.all
  </buffer>
</match>


Mr. Fiber

Oct 26, 2020, 7:19:58 AM
to Fluentd Google Group
Yes. It works because the prometheus plugin supports multi-worker mode.

mohit jain

Oct 26, 2020, 7:26:13 AM
to Fluentd Google Group
Okay, but the same configuration is not working with fluentd 1.11.1. May I know why, and how can I solve this issue if I am using fluentd 1.11.1?

Mr. Fiber

Oct 26, 2020, 10:16:03 AM
to Fluentd Google Group
> how can I solve this issue if I am using fluentd 1.11.1 version.

I'm not sure because your configuration works with fluentd v1.11.1 on my environment.
Maybe the problem is not the configuration.

$ fluentd -c p.conf
2020-10-26 11:35:39 +0000 [info]: parsing config file is succeeded path="p.conf"
2020-10-26 11:35:39 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '4.2.2'
2020-10-26 11:35:39 +0000 [info]: gem 'fluent-plugin-prometheus' version '1.8.4'
2020-10-26 11:35:39 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.2'
2020-10-26 11:35:39 +0000 [info]: gem 'fluentd' version '1.11.1'
2020-10-26 11:35:40 +0000 [info]: using configuration file: <ROOT>

  <system>
    workers 4
  </system>
  <source>
    @type prometheus
  </source>
  <source>
    @type prometheus_monitor
    <labels>
      host ${hostname}
    </labels>
  </source>
  <match xxx.logging.**>
    @type elasticsearch_dynamic
    @log_level "info"
    include_tag_key true
    host "127.0.0.1"

    port 9200
    logstash_format true
    logstash_prefix "log-${tag_parts[2]}"
    reload_connections false
    reconnect_on_error true
    reload_on_failure true
    request_timeout 10s
    <buffer>
      @type "file"
      chunk_limit_size 8MB
      path "./es-fluentd-buffer/xxx.logging.all"
    </buffer>
  </match>
</ROOT>
2020-10-26 11:35:40 +0000 [info]: starting fluentd-1.11.1 pid=8662 ruby="2.7.2"
2020-10-26 11:35:40 +0000 [info]: spawn command to main:  cmdline=["/path/to/2.7.2/bin/ruby", "-Eascii-8bit:ascii-8bit", "/path/to/2.7.2/bin/fluentd", "-c", "p.conf", "--under-supervisor"]
2020-10-26 11:35:46 +0000 [info]: adding match pattern="xxx.logging.**" type="elasticsearch_dynamic"
2020-10-26 11:35:48 +0000 [warn]: #2 Detected ES 7.x: `_doc` will be used as the document `_type`.
2020-10-26 11:35:49 +0000 [warn]: #1 Detected ES 7.x: `_doc` will be used as the document `_type`.
2020-10-26 11:35:49 +0000 [warn]: #0 Detected ES 7.x: `_doc` will be used as the document `_type`.
2020-10-26 11:35:49 +0000 [info]: adding source type="prometheus"
2020-10-26 11:35:49 +0000 [warn]: #3 Detected ES 7.x: `_doc` will be used as the document `_type`.
2020-10-26 11:35:49 +0000 [info]: #2 starting fluentd worker pid=8696 ppid=8662 worker=2
2020-10-26 11:35:49 +0000 [info]: #2 fluentd worker is now running worker=2
2020-10-26 11:35:49 +0000 [info]: adding source type="prometheus_monitor"
2020-10-26 11:35:49 +0000 [info]: #0 starting fluentd worker pid=8694 ppid=8662 worker=0
2020-10-26 11:35:49 +0000 [info]: #1 starting fluentd worker pid=8695 ppid=8662 worker=1
2020-10-26 11:35:49 +0000 [info]: #3 starting fluentd worker pid=8697 ppid=8662 worker=3
2020-10-26 11:35:49 +0000 [info]: #0 fluentd worker is now running worker=0
2020-10-26 11:35:49 +0000 [info]: #3 fluentd worker is now running worker=3
2020-10-26 11:35:49 +0000 [info]: #1 fluentd worker is now running worker=1

mohit jain

Nov 3, 2020, 3:05:30 AM
to Fluentd Google Group
Hi,

I tried the workers parameter with the configuration below. I executed case 1 and case 2, but it is not working with fluentd 1.11.1 whenever the number of workers is more than 1; the pods are exiting with exit code 137. Can you please suggest how I can resolve this issue?
Case 1: workers with fluentd 1.11.1
I am using fluentd 1.11.1; in the configuration file I enabled workers together with the http input plugin. When I deploy the fluentd chart, the pod starts for a few seconds and then exits with exit code 137. To resolve this I increased the pod memory to 2Gb (I configured 4 workers), but after increasing the memory the issue remained the same. Please find below the configuration I used.

Case 2: workers with fluentd 1.9.2
I used the same configuration with fluentd 1.9.2 and it works fine; in this case too I am using 4 workers, with pod memory 500Mi and CPU 500m.
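
On the exit code itself: 137 is 128 + 9, meaning the process was killed with SIGKILL rather than crashing on its own; in Kubernetes that usually points to the OOM killer or the kubelet killing the container (for example after a failed liveness probe), not to a Fluentd configuration error. A quick shell demonstration of the encoding:

```shell
# A process terminated by signal N exits with status 128 + N.
# SIGKILL is signal 9, so an OOM-killed container reports 137.
sh -c 'kill -9 $$'
echo "exit status: $?"   # prints "exit status: 137"
```

If `kubectl describe pod` shows `OOMKilled` as the last termination reason, note that each Fluentd worker is a separate Ruby process, so memory limits may need to scale with the worker count.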

Configuration

<system>
  workers 4
</system>

<source>
  @type http
  @log_level error
  @id input_http_ipv4
  port 9000
  bind 0.0.0.0
</source>

<source>
  @type http
  @log_level error
  @id input_http_ipv6
  port 9000
  bind ::
</source>

<filter pmdata>
  @type record_transformer
  enable_ruby
  <record>
    date ${ require 'date'; DateTime.rfc3339(record["measurement_time"]).strftime('%Y-%m-%d') }
  </record>
</filter>

<filter fmdata>
  @type record_transformer
  enable_ruby
  <record>
    date ${ require 'date'; DateTime.rfc3339(record["alarm_time"]).strftime('%Y-%m-%d') }
  </record>
</filter>

<filter {*security_logs*,*debug_logs*,*LI_logs*,*audit_logs*}>
  @type record_transformer
  enable_ruby
  <record>
    date ${ require 'date'; DateTime.rfc3339(record["log_event_time_stamp"]).strftime('%Y-%m-%d') }
  </record>
</filter>

<filter fmdata>
  @type elasticsearch_genid
  hash_id_key _hash
</filter>

<filter pmdata>
  @type elasticsearch_genid
  hash_id_key _hash
</filter>

<filter {*security_logs*,*debug_logs*,*LI_logs*,*audit_logs*}>
  @type elasticsearch_genid
  hash_id_key _hash
</filter>

<match pmdata>
  @type copy
  <store>
    @type elasticsearch
    @log_level error
    index_name ${tag}-${dnf_name}-${date}
    type_name pm_data
    host elasticsearch
    port 9200
    # same key name as specified in hash_id_key
    id_key _hash
    # Elasticsearch doesn't like keys that start with _
    remove_keys _hash
    logstash_format false
    bulk_message_request_threshold 5M
    request_timeout 30s
    reconnect_on_error true
    reload_on_failure true
    reload_connections false
    <buffer tag, date, dnf_name>
      @type file
      @log_level error
      path /data/fluentdlogs/pm
      timekey 1d
      flush_thread_count 4
      chunk_limit_size 4MB
      overflow_action block
      flush_mode interval
      flush_interval 5s
      total_limit_size 1GB
    </buffer>
  </store>
</match>

<match fmdata>
  @type copy
  <store>
    @type elasticsearch
    @log_level error
    index_name ${tag}-${dnf_name}-${date}
    type_name fm_data
    host elasticsearch
    port 9200
    # same key name as specified in hash_id_key
    id_key _hash
    # Elasticsearch doesn't like keys that start with _
    remove_keys _hash
    logstash_format false
    bulk_message_request_threshold 5M
    request_timeout 30s
    reconnect_on_error true
    reload_on_failure true
    reload_connections false
    <buffer tag, date, dnf_name>
      @type file
      @log_level error
      path /data/fluentdlogs/fm
      timekey 1d
      flush_thread_count 4
      chunk_limit_size 4MB
      overflow_action block
      flush_mode interval
      flush_interval 5s
      total_limit_size 2GB
    </buffer>
  </store>
</match>

<match {*security_logs*,*debug_logs*,*LI_logs*,*audit_logs*}>
  @type copy
  <store>
    @type elasticsearch
    @log_level error
    index_name ${tag}-${facility}-${dnf_name}-${date}
    type_name logs_data
    host elasticsearch
    port 9200
    # same key name as specified in hash_id_key
    id_key _hash
    # Elasticsearch doesn't like keys that start with _
    remove_keys _hash
    logstash_format false
    bulk_message_request_threshold 5M
    request_timeout 30s
    reconnect_on_error true
    reload_on_failure true
    reload_connections false
    <buffer tag, date, facility, dnf_name>
      @type file
      @log_level error
      path /data/fluentdlogs/logs
      timekey 1d
      flush_thread_count 4
      chunk_limit_size 4MB
      overflow_action block
      flush_mode interval
      flush_interval 5s
      total_limit_size 5GB
    </buffer>
  </store>
</match>

Thanks in advance
