Fluentd (td-agent) and pos_file handling when tailing a multiple-file pattern

Darren Spruell

Dec 21, 2020, 3:44:59 PM
to Fluentd Google Group
Greetings,

td-agent 1.11.2 (ruby 2.7.1p83)
Linux 4.19.0-12-amd64
Debian 10 buster

We are reading from a file source with @type tail and specifying a wildcard file pattern. This input tails multiple files but, if I'm not mistaken, uses a single shared pos_file.

<source>
    @type tail
    @id zeek_json
    @label @zeek
    tag zeek.*
    path /nsm/zeek/logs/current/json_streaming_*.log
    pos_file /var/log/td-agent/tmp/zeek_json.pos
    <parse>
        @type json
    </parse>
</source>

The issue I'm encountering is that the log source also outputs rotated files that match the path pattern. Fluentd picks up these backup files and reads them too, resulting in duplicate ingestion of log lines: every line written to the log is re-ingested once for each time the file is rotated. This is described here:


I would prefer not to have to list every file explicitly for this.

Using fluent-bit isn't an option here as we use an output plugin that is only supported by Fluentd.

If the remarks at https://github.com/corelight/json-streaming-logs/issues/5#issuecomment-736726157 are accurate, would it be possible to adjust Fluentd/td-agent so that it tracks positions properly across wildcarded files as rotation occurs, to prevent duplicate ingestion?

- Darren

cosmo09...@gmail.com

Dec 22, 2020, 12:33:18 AM
to Fluentd Google Group
Hi,

> The issue I'm encountering is that the log source outputs rotated files that match the path pattern as well, and fluentd also picks up the backup files and reads them as well, resulting in duplicate ingestion of lines in the logs.

Are these files rotated outside of Fluentd?
If so, "follow_inodes true" should be appropriate in this case.
Fluentd v1.12.0 will support this parameter:
https://github.com/fluent/fluentd/commit/3c858075dbed69da560db759ba6ffd57ec1ecc03

Until Fluentd v1.12.0 is released, you would need to build the fluentd gem from Fluentd master yourself to use the follow_inodes parameter.
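
To illustrate why tracking by inode helps here, the following is a minimal Python sketch (not Fluentd code) of what happens to inodes during rotation. It assumes rename-based rotation on a POSIX filesystem; the file names and the function simulate_rotation are hypothetical and only for illustration. The renamed file keeps its inode, while the newly created live file gets a different one, so a tailer that identifies files by inode rather than by path can recognize the rotated file as one it has already read.

```python
import os
import tempfile


def inode_of(path):
    """Return the inode number of the file at the given path."""
    return os.stat(path).st_ino


def simulate_rotation(directory):
    """Write a log, rotate it by rename, then recreate the live file.

    Returns (inode before rotation, inode of rotated file, inode of new live file).
    The file names below are hypothetical examples that both match a
    json_streaming_*.log style wildcard.
    """
    live = os.path.join(directory, "json_streaming_conn.log")
    rotated = os.path.join(directory, "json_streaming_conn.1.log")

    with open(live, "w") as f:
        f.write('{"msg": "first"}\n')
    inode_before = inode_of(live)

    # Rotation by rename: same file contents, same inode, new path.
    os.rename(live, rotated)

    # The writer opens a fresh file at the original path.
    with open(live, "w") as f:
        f.write('{"msg": "second"}\n')

    return inode_before, inode_of(rotated), inode_of(live)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        before, rotated_inode, live_inode = simulate_rotation(d)
        print("rotated file kept its inode:", before == rotated_inode)
        print("new live file has a new inode:", live_inode != before)
```

A path-keyed tracker sees the rotated name as a new file and starts reading it from the beginning, which matches the duplicate ingestion described above.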

On Tuesday, December 22, 2020 at 5:44:59 UTC+9, Darren Spruell wrote:

cosmo09...@gmail.com

Dec 22, 2020, 3:19:11 AM
to Fluentd Google Group
Hi,

Fluentd v1.12.0.rc2 has been released, and it supports the follow_inodes parameter in in_tail.
Could you try it out?

Cheers,

Hiroshi

On Tuesday, December 22, 2020 at 14:33:18 UTC+9, cosmo09...@gmail.com wrote:

Darren Spruell

Dec 22, 2020, 5:32:57 PM
to flu...@googlegroups.com
This configuration with follow_inodes seems to be working really well in early testing.

<source>
    @type tail
    @id zeek_json
    @label @zeek
    tag zeek.*
    path /nsm/zeek/logs/current/json_streaming_*.log
    pos_file /var/log/fluent/tmp/zeek_json.pos
    follow_inodes true

    <parse>
        @type json
    </parse>
</source>

Watching Fluentd's stdout, it appears to be tailing only the intended files. Will test more.
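
One way to cross-check this is to inspect the pos_file directly. The sketch below is in Python and assumes the in_tail pos_file layout of one entry per line with three tab-separated fields (path, hexadecimal byte offset, hexadecimal inode). That layout and the helper name parse_pos_file are assumptions to verify against the Fluentd version in use.

```python
def parse_pos_file(text):
    """Parse pos_file contents into a list of dicts.

    Assumed line format (verify for your Fluentd version):
        <path>\t<offset in hex>\t<inode in hex>
    Lines that do not have exactly three fields are skipped.
    """
    entries = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        path, pos_hex, inode_hex = parts
        entries.append(
            {
                "path": path,
                "pos": int(pos_hex, 16),
                "inode": int(inode_hex, 16),
            }
        )
    return entries


if __name__ == "__main__":
    # Path taken from the configuration above; adjust as needed.
    with open("/var/log/fluent/tmp/zeek_json.pos") as f:
        for entry in parse_pos_file(f.read()):
            print(entry["path"], entry["pos"], entry["inode"])
```

If rotated files are being tailed, their paths would be expected to show up as extra entries in this listing.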

--
Darren Spruell
Senior Threat Analyst
Sumo Logic SpecOps
O 1-800-335-0403

Darren Spruell

Dec 28, 2020, 3:54:44 PM
to flu...@googlegroups.com
Further verified, and I'm happy with the results. Validated that an observed message was only collected once over the course of rotation, and the Fluentd process appears to be stable.

- Darren

cosmo09...@gmail.com

Jan 4, 2021, 2:05:52 AM
to Fluentd Google Group
Awesome! I'm happy to hear that :)

- Hiroshi

On Tuesday, December 29, 2020 at 5:54:44 UTC+9, Darren Spruell wrote: