Docker JSON log with tail plugin: inconsistent logging

Grigoriy Frolov

Apr 6, 2021, 12:29:27 PM
to Fluentd Google Group
Hi all.
I am wondering if anyone else has faced a similar problem while tailing Docker logs.

I am conducting a performance test for my application in EKS. Fluentd is installed as a DaemonSet. A mock application is packaged as a container; it emits log records as fast as it can, and Fluentd is supposed to collect them from the Docker JSON log, parse them, and route them to Elasticsearch.
Everything goes fine until Docker rotates the log.

My "emitter" application emits 100k sequential lines like this:
```{"line_number":"080683","emit_timestamp":"2021-04-06T15:52:38,977180191+00:00","name":"emitter-66b6d9fcc5-rnhl7","uuid":"1d1e9bb4-96f0-11eb-9f27-0e65e37fcb7c"}```

Fluentd collects these lines from the /var/log/containers/${container_name}.log file and sends the records to Elasticsearch. After the experiment I count the events in the Elasticsearch index.
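
For context, with Docker's json-file logging driver each emitted line is wrapped in an envelope along these lines before it reaches /var/log/containers (the outer `time` value here is illustrative); the tail source parses this envelope, and the parser filter in the configuration below then parses the application line stored under the `log` key:

```
{"log":"{\"line_number\":\"080683\",\"emit_timestamp\":\"2021-04-06T15:52:38,977180191+00:00\",\"name\":\"emitter-66b6d9fcc5-rnhl7\",\"uuid\":\"1d1e9bb4-96f0-11eb-9f27-0e65e37fcb7c\"}\n","stream":"stdout","time":"2021-04-06T15:52:39.000000000Z"}
```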

The Elasticsearch index contains 101822 documents, which means some records were duplicated by Fluentd. At the same time, some records were completely lost. The documents in Elasticsearch are consistent up to the first rotation of the Docker log.

Fluentd is deployed with the fluent/fluentd-kubernetes-daemonset:v1.12-debian-elasticsearch7-1 Docker image. Its configuration is below:

```
    <system>
      @log_level "warn"
    </system>

    <source>
      @type tail
      path "/var/log/containers/emitter-*_dev_*"
      pos_file /var/run/fluentd/dev/emitter.pos
      follow_inodes true
      tag "my_application.*"
      <parse>
        @type json
      </parse>
    </source>

    <filter my_application.**>
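      # The application line from the Docker envelope sits under the 'log' key; parse it as JSON when possible, otherwise fall back to raw text.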
      @type parser
      key_name log
      <parse>
        @type multi_format
        <pattern>
          format json
        </pattern>
        <pattern>
          format none
        </pattern>
      </parse>
    </filter>

    <filter my_application.**>
      @type record_modifier
      remove_keys __dummy__
      <record>
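        # Default the 'type' key to 'perftest' when it is missing; the __dummy__ key itself is dropped by remove_keys above.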
        __dummy__ ${if not record.has_key?('type'); record['type'] = 'perftest'; end; nil}
      </record>
    </filter>

    <match my_application.**>
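      # Route records by their 'type' field: Trace -> app-traces, Event -> app-events, anything else -> perftest.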
      @type rewrite_tag_filter
      <rule>
        key type
        pattern /^Trace$/
        tag "app-traces"
      </rule>
      <rule>
        key type
        pattern /^Event$/
        tag "app-events"
      </rule>
      <rule>
        key type
        pattern /^(?!Event|Trace)(.+)$/
        tag "perftest"
      </rule>
    </match>

    <match {app-traces,app-events}>
      @type rewrite_tag_filter
      <rule>
        key version
        pattern /(.+)/
        tag "${tag}-$1"
      </rule>
    </match>

    <filter {app-events-**,app-traces-**,perftest}>
      @type record_modifier
      remove_keys version,type
      <record>
        collect_timestamp "${Time.at(time).strftime('%Y-%m-%dT%H:%M:%S,%9N%:z')}"
      </record>
    </filter>

    <match {app-events-**,app-traces-**,perftest}>
      @type copy
      <store>
        @type elasticsearch
        @log_level warn
        suppress_type_name true
        host "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"
        user "#{ENV['FLUENT_ELASTICSEARCH_USER']}"
        password "#{ENV['FLUENT_ELASTICSEARCH_PASSWORD']}"
        index_name "${tag}"
        <buffer>
          flush_thread_count "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_FLUSH_THREAD_COUNT'] || '8'}"
          flush_interval "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_FLUSH_INTERVAL'] || '5s'}"
          chunk_limit_size "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_CHUNK_LIMIT_SIZE'] || '2M'}"
          queue_limit_length "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_QUEUE_LIMIT_LENGTH'] || '32'}"
          retry_max_interval "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_RETRY_MAX_INTERVAL'] || '30'}"
          retry_forever false
        </buffer>
      </store>
    </match>
```

I believe the problem is somewhere in the source configuration. I tried different parameters for the tail plugin, but still no luck: either Fluentd loses events, or it both loses and duplicates them.

Kentaro Hayashi

Apr 6, 2021, 10:29:35 PM
to Fluentd Google Group
Hi,

Does the running fluent/fluentd-kubernetes-daemonset:v1.12-debian-elasticsearch7-1 image use Fluentd 1.12.2?
If not (the pulled image is a bit old), it may be better to refresh it to the latest one.

Regards,

On Wednesday, April 7, 2021 at 1:29:27 UTC+9, gvf...@gmail.com wrote:

hiroshi hatake

Apr 6, 2021, 10:40:48 PM
to flu...@googlegroups.com
Hi,

As Hayashi said, fluent/fluentd-kubernetes-daemonset:v1.12-debian-elasticsearch7-1 refers to the Fluentd v1.12 Elasticsearch image series.
So, you should first update to the latest fluent/fluentd-kubernetes-daemonset:v1.12-debian-elasticsearch7-1 image.

> The Elasticsearch index contains 101822 documents, which means some records were duplicated by Fluentd. At the same time, some records were completely lost. The documents in Elasticsearch are consistent up to the first rotation of the Docker log.

Have you checked the logs of the deployed DaemonSet with `kubectl logs <deployed name>`?
With an older fluentd-elasticsearch image, you will get a warning in the Fluentd DaemonSet log saying that `follow_inodes true` is not used.

There are some known issues around Docker container log rotation.
The wait after a file rotation defaults to 5 seconds, so in a high-volume logging environment Fluentd can lose or overlook logs when rotation happens at high frequency; adjusting the rotation-related parameters, as sketched below, sometimes helps.
I guess your log generation rate is not high enough to cause that overlooked-rotation situation, but I have no idea how long it took to generate the 101822 documents in the Elasticsearch cluster. Within a few minutes?
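
A minimal sketch of the rotation-related in_tail parameters you could experiment with (the values are illustrative, not recommendations):

```
<source>
  @type tail
  path "/var/log/containers/emitter-*_dev_*"
  pos_file /var/run/fluentd/dev/emitter.pos
  follow_inodes true
  read_from_head true   # pick up lines written before the file is discovered
  rotate_wait 30        # keep reading a rotated file longer than the 5-second default
  refresh_interval 5    # rescan the path pattern more often than the 60-second default
  tag "my_application.*"
  <parse>
    @type json
  </parse>
</source>
```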

Your configuration seems fine to me.
Could you share the suspicious part of the Fluentd DaemonSet's running log?

There may be some hints in there to resolve this issue.

Thanks,

Hiroshi

On 2021/04/07 at 1:29, Grigoriy Frolov <gvf...@gmail.com> wrote:


Grigoriy Frolov

Apr 7, 2021, 3:52:20 AM
to Fluentd Google Group
I updated the Fluentd version to fluent/fluentd-kubernetes-daemonset:v1.12.2-debian-elasticsearch7-1.1 and tried setting tail's refresh_interval to 5, but that does not seem to do the trick.
The latest experiment was run with 50k lines. Inspecting the logs in /var/lib/docker/containers/${container_id} I can see that the log has two parts: the first contains lines 1-40342, the second contains lines 40343-50000, which looks pretty consistent. The first line was emitted at 7:14:46, the last at 7:18:31, so the approximate rate is 222 lines per second.

Inspecting the Elasticsearch index I can see that there are 40342 documents with line_number greater than or equal to "000001" and less than or equal to "040342". That seems right. Documents with line_number from "040343" to "040427" are absent from the index. And there are 10700 documents with line_number from "040428" to "050000", even though only 9573 lines fall into that range. That looks weird. Narrowing the search I found that, for example, the documents with line_number from "041090" to "041099" are duplicated. That is really strange.

The Fluentd log looks fine.

```
2021-04-07 07:13:55 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
2021-04-07 07:13:55 +0000 [info]: gem 'fluent-plugin-concat' version '2.4.0'
2021-04-07 07:13:55 +0000 [info]: gem 'fluent-plugin-dedot_filter' version '1.0.0'
2021-04-07 07:13:55 +0000 [info]: gem 'fluent-plugin-detect-exceptions' version '0.0.13'
2021-04-07 07:13:55 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '5.0.2'
2021-04-07 07:13:55 +0000 [info]: gem 'fluent-plugin-grok-parser' version '2.6.2'
2021-04-07 07:13:55 +0000 [info]: gem 'fluent-plugin-json-in-json-2' version '1.0.2'
2021-04-07 07:13:55 +0000 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '2.6.0'
2021-04-07 07:13:55 +0000 [info]: gem 'fluent-plugin-multi-format-parser' version '1.0.0'
2021-04-07 07:13:55 +0000 [info]: gem 'fluent-plugin-parser-cri' version '0.1.0'
2021-04-07 07:13:55 +0000 [info]: gem 'fluent-plugin-prometheus' version '1.8.5'
2021-04-07 07:13:55 +0000 [info]: gem 'fluent-plugin-record-modifier' version '2.1.0'
2021-04-07 07:13:55 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.4.0'
2021-04-07 07:13:55 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.2'
2021-04-07 07:13:55 +0000 [info]: gem 'fluentd' version '1.12.2'
2021-04-07 07:13:55 +0000 [info]: adding rewrite_tag_filter rule: type [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f7045b418d0 @keys="type">, /^Trace$/, "", "app-traces", nil]
2021-04-07 07:13:55 +0000 [info]: adding rewrite_tag_filter rule: type [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f7045b81070 @keys="type">, /^Event$/, "", "app-events", nil]
2021-04-07 07:13:55 +0000 [info]: adding rewrite_tag_filter rule: type [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f7045b83208 @keys="type">, /^(?!Event|Trace)(.+)$/, "", "perftest", nil]
2021-04-07 07:13:55 +0000 [info]: adding rewrite_tag_filter rule: version [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f704530e550 @keys="version">, /(.+)/, "", "${tag}-$1", nil]
2021-04-07 07:13:55 +0000 [info]: using configuration file: <ROOT>
   ... configuration omitted for brevity 
</ROOT>
2021-04-07 07:13:55 +0000 [info]: starting fluentd-1.12.2 pid=7 ruby="2.6.6"
2021-04-07 07:13:55 +0000 [info]: spawn command to main:  cmdline=["/usr/local/bin/ruby", "-Eascii-8bit:ascii-8bit", "/fluentd/vendor/bundle/ruby/2.6.0/bin/fluentd", "-c", "/fluentd/etc/fluent.conf", "-p", "/fluentd/plugins", "--gemfile", "/fluentd/Gemfile", "-r", "/fluentd/vendor/bundle/ruby/2.6.0/gems/fluent-plugin-elasticsearch-5.0.2/lib/fluent/plugin/elasticsearch_simple_sniffer.rb", "--under-supervisor"]
2021-04-07 07:13:55 +0000 [info]: adding filter pattern="my_application.**" type="parser"
2021-04-07 07:13:55 +0000 [info]: adding filter pattern="my_application.**" type="record_modifier"
2021-04-07 07:13:55 +0000 [info]: adding match pattern="my_application.**" type="rewrite_tag_filter"
2021-04-07 07:13:55 +0000 [info]: #0 adding rewrite_tag_filter rule: type [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f07eeb124a0 @keys="type">, /^Trace$/, "", "app-traces", nil]
2021-04-07 07:13:55 +0000 [info]: #0 adding rewrite_tag_filter rule: type [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f07eeb11140 @keys="type">, /^Event$/, "", "app-events", nil]
2021-04-07 07:13:55 +0000 [info]: #0 adding rewrite_tag_filter rule: type [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f07eeb10330 @keys="type">, /^(?!Event|Trace)(.+)$/, "", "perftest", nil]
2021-04-07 07:13:55 +0000 [info]: adding match pattern="{app-traces,app-events}" type="rewrite_tag_filter"
2021-04-07 07:13:55 +0000 [info]: #0 adding rewrite_tag_filter rule: version [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f07eeb7a910 @keys="version">, /(.+)/, "", "${tag}-$1", nil]
2021-04-07 07:13:55 +0000 [info]: adding filter pattern="{app-events-**,app-traces-**,perftest}" type="record_modifier"
2021-04-07 07:13:55 +0000 [info]: adding match pattern="{app-events-**,app-traces-**,perftest}" type="copy"
2021-04-07 07:13:56 +0000 [info]: adding source type="tail"
2021-04-07 07:13:56 +0000 [info]: #0 starting fluentd worker pid=17 ppid=7 worker=0
2021-04-07 07:13:56 +0000 [info]: #0 following tail of /var/log/containers/emitter-66b6d9fcc5-2948m_dev_bash-d884ab7e30917b53779990c45560f3c799b03c7f4de314185053a113f800d458.log
2021-04-07 07:13:56 +0000 [info]: #0 fluentd worker is now running worker=0
2021-04-07 07:17:48 +0000 [info]: #0 detected rotation of /var/log/containers/emitter-66b6d9fcc5-2948m_dev_bash-d884ab7e30917b53779990c45560f3c799b03c7f4de314185053a113f800d458.log; waiting 5.0 seconds
2021-04-07 07:17:48 +0000 [info]: #0 following tail of /var/log/containers/emitter-66b6d9fcc5-2948m_dev_bash-d884ab7e30917b53779990c45560f3c799b03c7f4de314185053a113f800d458.log
2021-04-07 07:17:51 +0000 [info]: #0 following tail of /var/log/containers/emitter-66b6d9fcc5-2948m_dev_bash-d884ab7e30917b53779990c45560f3c799b03c7f4de314185053a113f800d458.log
```

The only suspicious thing is the repeated "following tail" message for the log file right after its rotation.

Have you guys seen something like that?

Grigoriy Frolov

Apr 7, 2021, 5:18:41 AM
to Fluentd Google Group
Also, I noticed that if I set enable_watch_timer to false, the log records in the Elasticsearch index are consistent, but Fluentd tails the log in a weird manner. Right after the start it does not do anything: the index is empty. About 3 minutes after start it sends roughly 41k of the 50k records to Elasticsearch and then stops sending for at least 10 minutes (I did not wait longer). But when I gracefully stop Fluentd, it finishes the log collection and I end up with 50k records in the index.
Setting @log_level to trace did not give me any more information about what is going on.

cosmo09...@gmail.com

Apr 9, 2021, 4:48:43 AM
to Fluentd Google Group
Hi,

fluent-plugin-elasticsearch bundles an elasticsearch_genid filter plugin to prevent duplicated records.
Could you try the plugin?

For more detail, please refer to the doc:

In this case, I recommend calculating the hash from the entire record:

The elasticsearch_genid filter plugin description is here:
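
For example, a sketch along these lines (illustrative; adjust the tag patterns and keep the rest of your existing output settings):

```
<filter {app-events-**,app-traces-**,perftest}>
  @type elasticsearch_genid
  use_entire_record true   # hash the whole record so a replayed line maps to the same _id
  hash_id_key _hash
</filter>

<match {app-events-**,app-traces-**,perftest}>
  @type copy
  <store>
    @type elasticsearch
    id_key _hash           # use the generated hash as the Elasticsearch document _id
    remove_keys _hash      # drop the helper key before indexing
    # ... the rest of your existing elasticsearch and <buffer> settings ...
  </store>
</match>
```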

Hope this helps.

Best,

Hiroshi

On Wednesday, April 7, 2021 at 18:18:41 UTC+9, gvf...@gmail.com wrote: