Tail multiple files and forward individually


Mark Sutyak

Jun 28, 2021, 9:41:28 AM
to Fluentd Google Group
My goal is to tail numerous files in one directory and forward the contents to my aggregator solution individually, without having to use possibly hundreds of unique source/match blocks.

Currently my configuration uses wildcards to read every file in the directory, but it lumps all the data together before the match block receives it.  The result is one large POST to my aggregator, with the data being split and buffered.
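A simplified sketch of the current setup (paths, tag, and endpoint below are placeholders, not the real values):

<source>
  @type tail
  tag files.csv
  path "C:/Path/*.csv"
  pos_file "C:/Path/pos/files.csv.pos"
  path_key tailed_path
  read_from_head true
  <parse>
    @type none
  </parse>
</source>

<match files.csv>
  # Every file matched by the wildcard shares this one tag, so all of their
  # events are batched together into large POSTs.
  @type http
  endpoint http://aggregator.example/ingest
  <format>
    @type json
  </format>
</match>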

In this scenario Fluentd is reading files that already exist, always tailing from the head, so the second goal is to avoid overflowing buffers and to keep POST sizes down by segregating the data.

The expected input on the aggregator side would be data from a single file per POST, plus an identifier (probably tailed_path) to identify the payload.  Uniquely identifying the payload would let the aggregator gracefully handle data that Fluentd's buffering splits across POSTs.
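For example, a single POST body would carry something like this (path and field name are illustrative only):

{"tailed_path": "C:/Path/file1.csv", "message": "<contents of file1.csv>"}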

Thanks in advance for any guidance.

Mark

Mark Sutyak

Jul 27, 2021, 12:16:20 PM
to Fluentd Google Group
For reference, I resolved this with a two-step approach: first, using the multiline parser plugin to read each well-formed CSV file as a single chunk of text, and second, parsing the CSV at the aggregator endpoint.
Here is my source block using the multiline parser.  "ID01" is a reliable first-line identifier.  The entire text is captured into "jsonData" to be parsed later.
Leaving "multiline_flush_interval" unset was frustrating: by default the last parsed object stays in memory and isn't written to the buffer until Fluentd is stopped.  Setting a value forces the flush.

<source>
  tag Identifier.csv
  @type tail
  path "C:/Path/*.csv"
  pos_file "C:/Program Files/App/Fluentd/pos/Identifier.csv.pos"
  path_key tailed_path
  read_from_head true
  read_lines_limit 5000
  # multiline_flush_interval must be present or Fluentd keeps the last record in memory
  # rather than writing it to the buffer.
  multiline_flush_interval 5
  @log_level debug
  <parse>
    @type multiline
    format_firstline /^ID01/
    format1 /^(?<jsonData>ID01.*)/
  </parse>
</source>
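The aggregator-side parsing (step two) looks roughly like this.  A minimal Python sketch, where the "jsonData" and "tailed_path" field names match the source block above, but the record shape and the helper function are simplified assumptions, not the actual aggregator code:

```python
import csv
import io
import json

def parse_record(record_json: str) -> list[dict]:
    """Parse one Fluentd record whose "jsonData" field holds a file's raw CSV text.

    Assumes the first CSV row starts with the "ID01" identifier; adjust the
    row handling to the real file layout.
    """
    record = json.loads(record_json)
    rows = list(csv.reader(io.StringIO(record["jsonData"])))
    source = record.get("tailed_path", "unknown")
    # Attach the originating file path to every parsed row so split payloads
    # can still be correlated with their source file.
    return [{"source": source, "fields": row} for row in rows]

# Example usage with a hypothetical payload:
payload = json.dumps({
    "tailed_path": "C:/Path/example.csv",
    "jsonData": "ID01,alpha\nID02,beta",
})
for parsed in parse_record(payload):
    print(parsed)
```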
