Docker JSON log with tail plugin: inconsistent logging

Grigoriy Frolov

Apr 6, 2021, 12:29:27 PM
to Fluentd Google Group
Hi all.
I am wondering if anyone else has faced a similar problem while tailing Docker logs.

I am conducting a performance test for my application in EKS. Fluentd is installed as a DaemonSet. A mock application is packaged as a container; it emits log records as fast as it can, and Fluentd is supposed to collect them from the Docker JSON log, parse them, and route them to Elasticsearch.
Everything goes fine until Docker rotates the log.

My "emitter" application emits 100k sequential lines like this:
```{"line_number":"080683","emit_timestamp":"2021-04-06T15:52:38,977180191+00:00","name":"emitter-66b6d9fcc5-rnhl7","uuid":"1d1e9bb4-96f0-11eb-9f27-0e65e37fcb7c"}```

Fluentd collects these lines from the /var/log/containers/${container_name}.log file and sends the records to Elasticsearch. After the experiment I count the events in the Elasticsearch index.
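
For context, with Docker's json-file logging driver each emitted line is wrapped in an envelope along these lines before it reaches /var/log/containers (the outer `time` value here is illustrative); the tail source parses this envelope, and the parser filter in the configuration below then parses the application line stored under the `log` key:

```
{"log":"{\"line_number\":\"080683\",\"emit_timestamp\":\"2021-04-06T15:52:38,977180191+00:00\",\"name\":\"emitter-66b6d9fcc5-rnhl7\",\"uuid\":\"1d1e9bb4-96f0-11eb-9f27-0e65e37fcb7c\"}\n","stream":"stdout","time":"2021-04-06T15:52:39.000000000Z"}
```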

The Elasticsearch index contains 101822 documents, which means some records were duplicated by Fluentd. At the same time, some records were completely lost. The documents in Elasticsearch are consistent up to the first rotation of the Docker log.

Fluentd is deployed with the fluent/fluentd-kubernetes-daemonset:v1.12-debian-elasticsearch7-1 Docker image. Its configuration is below:

```
    <system>
      @log_level "warn"
    </system>

    <source>
      @type tail
      path "/var/log/containers/emitter-*_dev_*"
      pos_file /var/run/fluentd/dev/emitter.pos
      follow_inodes true
      tag "my_application.*"
      <parse>
        @type json
      </parse>
    </source>

    <filter my_application.**>
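      # The application line from the Docker envelope sits under the 'log' key; parse it as JSON when possible, otherwise fall back to raw text.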
      @type parser
      key_name log
      <parse>
        @type multi_format
        <pattern>
          format json
        </pattern>
        <pattern>
          format none
        </pattern>
      </parse>
    </filter>

    <filter my_application.**>
      @type record_modifier
      remove_keys __dummy__
      <record>
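        # Default the 'type' key to 'perftest' when it is missing; the __dummy__ key itself is dropped by remove_keys above.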
        __dummy__ ${if not record.has_key?('type'); record['type'] = 'perftest'; end; nil}
      </record>
    </filter>

    <match my_application.**>
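      # Route records by their 'type' field: Trace -> app-traces, Event -> app-events, anything else -> perftest.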
      @type rewrite_tag_filter
      <rule>
        key type
        pattern /^Trace$/
        tag "app-traces"
      </rule>
      <rule>
        key type
        pattern /^Event$/
        tag "app-events"
      </rule>
      <rule>
        key type
        pattern /^(?!Event|Trace)(.+)$/
        tag "perftest"
      </rule>
    </match>

    <match {app-traces,app-events}>
      @type rewrite_tag_filter
      <rule>
        key version
        pattern /(.+)/
        tag "${tag}-$1"
      </rule>
    </match>

    <filter {app-events-**,app-traces-**,perftest}>
      @type record_modifier
      remove_keys version,type
      <record>
        collect_timestamp "${Time.at(time).strftime('%Y-%m-%dT%H:%M:%S,%9N%:z')}"
      </record>
    </filter>

    <match {app-events-**,app-traces-**,perftest}>
      @type copy
      <store>
        @type elasticsearch
        @log_level warn
        suppress_type_name true
        host "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"
        user "#{ENV['FLUENT_ELASTICSEARCH_USER']}"
        password "#{ENV['FLUENT_ELASTICSEARCH_PASSWORD']}"
        index_name "${tag}"
        <buffer>
          flush_thread_count "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_FLUSH_THREAD_COUNT'] || '8'}"
          flush_interval "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_FLUSH_INTERVAL'] || '5s'}"
          chunk_limit_size "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_CHUNK_LIMIT_SIZE'] || '2M'}"
          queue_limit_length "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_QUEUE_LIMIT_LENGTH'] || '32'}"
          retry_max_interval "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_RETRY_MAX_INTERVAL'] || '30'}"
          retry_forever false
        </buffer>
      </store>
    </match>
```

I believe the problem is somewhere in the source configuration. I tried different parameters for the tail plugin, but still no luck: either Fluentd loses events, or it both loses and duplicates them.

Kentaro Hayashi

Apr 6, 2021, 10:29:35 PM
to Fluentd Google Group
Hi,

Does the running fluent/fluentd-kubernetes-daemonset:v1.12-debian-elasticsearch7-1 image use Fluentd 1.12.2?
If not (the pulled image is a bit old), it may be better to refresh it to the latest one.

Regards,

On Wednesday, April 7, 2021 at 1:29:27 UTC+9, gvf...@gmail.com wrote:

hiroshi hatake

Apr 6, 2021, 10:40:48 PM
to flu...@googlegroups.com
Hi,

As Hayashi said, fluent/fluentd-kubernetes-daemonset:v1.12-debian-elasticsearch7-1 refers to the Fluentd v1.12 Elasticsearch image series.
So, you should first update to the latest fluent/fluentd-kubernetes-daemonset:v1.12-debian-elasticsearch7-1 image.

> The Elasticsearch index contains 101822 documents, which means some records were duplicated by Fluentd. At the same time, some records were completely lost. The documents in Elasticsearch are consistent up to the first rotation of the Docker log.

Have you checked the logs of the deployed DaemonSet with `kubectl logs <deployed name>`?
With an older fluentd-elasticsearch image, you will get a warning in the Fluentd DaemonSet log saying that `follow_inodes true` is not used.

There are some known issues around Docker container log rotation.
The wait after a file rotation defaults to 5 seconds, so in a high-volume logging environment Fluentd can lose or overlook logs when rotation happens at high frequency; adjusting the rotation-related parameters, as sketched below, sometimes helps.
I guess your log generation rate is not high enough to cause that overlooked-rotation situation, but I have no idea how long it took to generate the 101822 documents in the Elasticsearch cluster. Within a few minutes?
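
A minimal sketch of the rotation-related in_tail parameters you could experiment with (the values are illustrative, not recommendations):

```
<source>
  @type tail
  path "/var/log/containers/emitter-*_dev_*"
  pos_file /var/run/fluentd/dev/emitter.pos
  follow_inodes true
  read_from_head true   # pick up lines written before the file is discovered
  rotate_wait 30        # keep reading a rotated file longer than the 5-second default
  refresh_interval 5    # rescan the path pattern more often than the 60-second default
  tag "my_application.*"
  <parse>
    @type json
  </parse>
</source>
```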

Your configuration seems fine to me.
Could you share the suspicious part of the Fluentd DaemonSet's running log?

There may be some hints in there to resolve this issue.

Thanks,

Hiroshi

On 2021/04/07 at 1:29, Grigoriy Frolov <gvf...@gmail.com> wrote:


Grigoriy Frolov

Apr 7, 2021, 3:52:20 AM
to Fluentd Google Group
I updated the Fluentd version to fluent/fluentd-kubernetes-daemonset:v1.12.2-debian-elasticsearch7-1.1 and tried setting tail's refresh_interval to 5, but that does not seem to do the trick.
The latest experiment was run with 50k lines. Inspecting the logs in /var/lib/docker/containers/${container_id} I can see that the log has two parts: the first contains lines 1-40342, the second contains lines 40343-50000, which looks pretty consistent. The first line was emitted at 7:14:46, the last at 7:18:31, so the approximate rate is 222 lines per second.

Inspecting the Elasticsearch index I can see that there are 40342 documents with line_number greater than or equal to "000001" and less than or equal to "040342". That seems right. Documents with line_number from "040343" to "040427" are absent from the index. And there are 10700 documents with line_number from "040428" to "050000", even though only 9573 lines fall into that range. That looks weird. Narrowing the search I found that, for example, the documents with line_number from "041090" to "041099" are duplicated. That is really strange.

The Fluentd log looks fine.

```
2021-04-07 07:13:55 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
2021-04-07 07:13:55 +0000 [info]: gem 'fluent-plugin-concat' version '2.4.0'
2021-04-07 07:13:55 +0000 [info]: gem 'fluent-plugin-dedot_filter' version '1.0.0'
2021-04-07 07:13:55 +0000 [info]: gem 'fluent-plugin-detect-exceptions' version '0.0.13'
2021-04-07 07:13:55 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '5.0.2'
2021-04-07 07:13:55 +0000 [info]: gem 'fluent-plugin-grok-parser' version '2.6.2'
2021-04-07 07:13:55 +0000 [info]: gem 'fluent-plugin-json-in-json-2' version '1.0.2'
2021-04-07 07:13:55 +0000 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '2.6.0'
2021-04-07 07:13:55 +0000 [info]: gem 'fluent-plugin-multi-format-parser' version '1.0.0'
2021-04-07 07:13:55 +0000 [info]: gem 'fluent-plugin-parser-cri' version '0.1.0'
2021-04-07 07:13:55 +0000 [info]: gem 'fluent-plugin-prometheus' version '1.8.5'
2021-04-07 07:13:55 +0000 [info]: gem 'fluent-plugin-record-modifier' version '2.1.0'
2021-04-07 07:13:55 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.4.0'
2021-04-07 07:13:55 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.2'
2021-04-07 07:13:55 +0000 [info]: gem 'fluentd' version '1.12.2'
2021-04-07 07:13:55 +0000 [info]: adding rewrite_tag_filter rule: type [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f7045b418d0 @keys="type">, /^Trace$/, "", "app-traces", nil]
2021-04-07 07:13:55 +0000 [info]: adding rewrite_tag_filter rule: type [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f7045b81070 @keys="type">, /^Event$/, "", "app-events", nil]
2021-04-07 07:13:55 +0000 [info]: adding rewrite_tag_filter rule: type [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f7045b83208 @keys="type">, /^(?!Event|Trace)(.+)$/, "", "perftest", nil]
2021-04-07 07:13:55 +0000 [info]: adding rewrite_tag_filter rule: version [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f704530e550 @keys="version">, /(.+)/, "", "${tag}-$1", nil]
2021-04-07 07:13:55 +0000 [info]: using configuration file: <ROOT>
   ... configuration omitted for brevity 
</ROOT>
2021-04-07 07:13:55 +0000 [info]: starting fluentd-1.12.2 pid=7 ruby="2.6.6"
2021-04-07 07:13:55 +0000 [info]: spawn command to main:  cmdline=["/usr/local/bin/ruby", "-Eascii-8bit:ascii-8bit", "/fluentd/vendor/bundle/ruby/2.6.0/bin/fluentd", "-c", "/fluentd/etc/fluent.conf", "-p", "/fluentd/plugins", "--gemfile", "/fluentd/Gemfile", "-r", "/fluentd/vendor/bundle/ruby/2.6.0/gems/fluent-plugin-elasticsearch-5.0.2/lib/fluent/plugin/elasticsearch_simple_sniffer.rb", "--under-supervisor"]
2021-04-07 07:13:55 +0000 [info]: adding filter pattern="my_application.**" type="parser"
2021-04-07 07:13:55 +0000 [info]: adding filter pattern="my_application.**" type="record_modifier"
2021-04-07 07:13:55 +0000 [info]: adding match pattern="my_application.**" type="rewrite_tag_filter"
2021-04-07 07:13:55 +0000 [info]: #0 adding rewrite_tag_filter rule: type [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f07eeb124a0 @keys="type">, /^Trace$/, "", "app-traces", nil]
2021-04-07 07:13:55 +0000 [info]: #0 adding rewrite_tag_filter rule: type [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f07eeb11140 @keys="type">, /^Event$/, "", "app-events", nil]
2021-04-07 07:13:55 +0000 [info]: #0 adding rewrite_tag_filter rule: type [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f07eeb10330 @keys="type">, /^(?!Event|Trace)(.+)$/, "", "perftest", nil]
2021-04-07 07:13:55 +0000 [info]: adding match pattern="{app-traces,app-events}" type="rewrite_tag_filter"
2021-04-07 07:13:55 +0000 [info]: #0 adding rewrite_tag_filter rule: version [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f07eeb7a910 @keys="version">, /(.+)/, "", "${tag}-$1", nil]
2021-04-07 07:13:55 +0000 [info]: adding filter pattern="{app-events-**,app-traces-**,perftest}" type="record_modifier"
2021-04-07 07:13:55 +0000 [info]: adding match pattern="{app-events-**,app-traces-**,perftest}" type="copy"
2021-04-07 07:13:56 +0000 [info]: adding source type="tail"
2021-04-07 07:13:56 +0000 [info]: #0 starting fluentd worker pid=17 ppid=7 worker=0
2021-04-07 07:13:56 +0000 [info]: #0 following tail of /var/log/containers/emitter-66b6d9fcc5-2948m_dev_bash-d884ab7e30917b53779990c45560f3c799b03c7f4de314185053a113f800d458.log
2021-04-07 07:13:56 +0000 [info]: #0 fluentd worker is now running worker=0
2021-04-07 07:17:48 +0000 [info]: #0 detected rotation of /var/log/containers/emitter-66b6d9fcc5-2948m_dev_bash-d884ab7e30917b53779990c45560f3c799b03c7f4de314185053a113f800d458.log; waiting 5.0 seconds
2021-04-07 07:17:48 +0000 [info]: #0 following tail of /var/log/containers/emitter-66b6d9fcc5-2948m_dev_bash-d884ab7e30917b53779990c45560f3c799b03c7f4de314185053a113f800d458.log
2021-04-07 07:17:51 +0000 [info]: #0 following tail of /var/log/containers/emitter-66b6d9fcc5-2948m_dev_bash-d884ab7e30917b53779990c45560f3c799b03c7f4de314185053a113f800d458.log
```

The only suspicious thing is the repeated "following tail" message for the log file right after its rotation.

Have you guys seen something like that?

Grigoriy Frolov

Apr 7, 2021, 5:18:41 AM
to Fluentd Google Group
Also, I noticed that if I set enable_watch_timer to false, the log records in the Elasticsearch index are consistent, but Fluentd tails the log in a weird manner. Right after the start it does not do anything: the index is empty. About 3 minutes after start it sends roughly 41k of the 50k records to Elasticsearch and then stops sending for at least 10 minutes (I did not wait longer). But when I gracefully stop Fluentd, it finishes the log collection and I end up with 50k records in the index.
Setting @log_level to trace did not give me any more information about what is going on.

cosmo09...@gmail.com

Apr 9, 2021, 4:48:43 AM
to Fluentd Google Group
Hi,

fluent-plugin-elasticsearch bundles an elasticsearch_genid filter plugin to prevent duplicated records.
Could you try the plugin?

For more detail, please refer to the doc:

In this case, I recommend calculating the hash from the entire record:

The elasticsearch_genid filter plugin description is here:
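
For example, a sketch along these lines (illustrative; adjust the tag patterns and keep the rest of your existing output settings):

```
<filter {app-events-**,app-traces-**,perftest}>
  @type elasticsearch_genid
  use_entire_record true   # hash the whole record so a replayed line maps to the same _id
  hash_id_key _hash
</filter>

<match {app-events-**,app-traces-**,perftest}>
  @type copy
  <store>
    @type elasticsearch
    id_key _hash           # use the generated hash as the Elasticsearch document _id
    remove_keys _hash      # drop the helper key before indexing
    # ... the rest of your existing elasticsearch and <buffer> settings ...
  </store>
</match>
```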

Hope this helps.

Best,

Hiroshi

On Wednesday, April 7, 2021 at 18:18:41 UTC+9, gvf...@gmail.com wrote: