ingest all files from a directory with fluentd?


supreme patadia

Aug 8, 2014, 5:26:03 PM8/8/14
to flu...@googlegroups.com
Hi,
I've tried looking for help on this topic in the mailing list before sending it out to the group, but no luck.  I am trying to use fluentd to ingest multiple CSV (log) files that are dropped into a particular directory once a day, and then send their contents to Elasticsearch.  I've tried the tail_ex plugin with a wildcard path, but it doesn't work because the files are never appended to in place; they are dropped/moved into this directory once a day by third-party proprietary software.  Is there any way for fluentd to read the contents of all new files and send them to Elasticsearch?  Here's my failed attempt using tail_ex.

Thanks
sp


<source>
  type tail_ex
  tag message
  format csv
  time_format %Y-%m-%d %H:%M:%S%z
  path /archived_logs/xxxxxxx/xxxx_xxxxxxx3-%Y-%m-%d-*.csb
  keys key1,key2,key3,key4,key5,key6,key7,key8,key9,key10,key11,key12,key13,key14,key15,key16,key17,key18,key19,key20,key21,key22,key23,key24
  time_key key3
  #path /var/log/jetty-*/%Y_%m_%d.stderrout.log
  pos_file /var/log/td-agent/xxxxx.log.pos
  refresh_interval 60
</source>

## match tag=debug.** and dump to console
<match debug.**>
  type stdout
</match>

<match **>
  type elasticsearch
  logstash_format true
  host elastic-host
  port 9200
  index_name maillogs
  type_name maillogs
</match>

Kiyoto Tamura

Aug 8, 2014, 5:51:51 PM8/8/14
to flu...@googlegroups.com
Hi,

>I've tried looking for help on this topic in the mailing list before sending it out to the group but no luck.

Sorry for overlooking your earlier post =/

> I am trying to use fluentd to ingest multiple csv(logs) files that are dropped in a particular directory once a day.

1. I recommend against using in_tail_ex in general. Its functionality has been subsumed by in_tail upstream.
2. Don't use in_tail for periodic file dumps like this. in_tail is designed for log files to which data is appended in a streaming manner.

For something like this, if you want to use Fluentd, the better path would be in_exec (http://docs.fluentd.org/articles/in_exec) with a script that watches the directory, checks whether each file has already been uploaded to Elasticsearch, and if not, reads the file and writes its records to stdout, where Fluentd picks them up.
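A minimal sketch of such a script, to be run from an in_exec `<source>` with `format json` so each line printed to stdout becomes one Fluentd event. The directory path, state-file location, and key names below are assumptions for illustration; the state file stands in for "check that the file was already uploaded":

```python
import csv
import json
import os

def find_new_files(directory, processed):
    """Return CSV files in `directory` not yet recorded in `processed`."""
    return sorted(
        name for name in os.listdir(directory)
        if name.endswith(".csv") and name not in processed
    )

def emit_records(path, keys):
    """Print one JSON object per CSV row; in_exec with `format json`
    reads each stdout line as a Fluentd event."""
    with open(path, newline="") as fh:
        for row in csv.reader(fh):
            print(json.dumps(dict(zip(keys, row))))

if __name__ == "__main__":
    # Hypothetical paths and columns -- adjust to your environment.
    archive_dir = "/archived_logs"
    state_file = "/var/tmp/ingested_files.txt"
    keys = ["key1", "key2", "key3"]

    # Load the names of files already sent to Elasticsearch.
    processed = set()
    if os.path.exists(state_file):
        with open(state_file) as fh:
            processed = set(fh.read().split())

    # Emit each new file's rows, then record it as done.
    for name in find_new_files(archive_dir, processed):
        emit_records(os.path.join(archive_dir, name), keys)
        with open(state_file, "a") as fh:
            fh.write(name + "\n")
```

Wired up on the Fluentd side, this would sit behind something like `type exec` / `format json` / `run_interval 60s`, so the script runs periodically instead of tailing anything.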

Kiyoto


--
You received this message because you are subscribed to the Google Groups "Fluentd Google Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fluentd+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Check out Fluentd, the open source data collector for high-volume data streams

supreme patadia

Aug 8, 2014, 5:58:00 PM8/8/14
to flu...@googlegroups.com
Thank you very much, I appreciate your quick reply.