ingest all files from a directory with fluentd?


supreme patadia

Aug 8, 2014, 5:26:03 PM8/8/14
to flu...@googlegroups.com
Hi,
I've tried looking for help on this topic in the mailing list before sending it out to the group, but no luck.  I am trying to use fluentd to ingest multiple CSV (log) files that are dropped into a particular directory once a day, and then send their contents to Elasticsearch.  I've tried the tail_ex plugin with a wildcard path, but it doesn't work because the files are never appended to in place; they are dropped/moved into this directory once a day by third-party proprietary software.  Is there any way for fluentd to read the contents of all new files and send them to Elasticsearch?  Here's my failed attempt using tail_ex.

Thanks
sp


<source>
  type tail_ex
  tag message
  format csv
  time_format %Y-%m-%d %H:%M:%S%z
  path /archived_logs/xxxxxxx/xxxx_xxxxxxx3-%Y-%m-%d-*.csb
  keys key1,key2,key3,key4,key5,key6,key7,key8,key9,key10,key11,key12,key13,key14,key15,key16,key17,key18,key19,key20,key21,key22,key23,key24
  time_key key3
  #path /var/log/jetty-*/%Y_%m_%d.stderrout.log
  pos_file /var/log/td-agent/xxxxx.log.pos
  refresh_interval 60
</source>

## match tag=debug.** and dump to console
<match debug.**>
  type stdout
</match>

<match **>
  type elasticsearch
  logstash_format true
  host elastic-host
  port 9200
  index_name maillogs
  type_name maillogs
</match>

Kiyoto Tamura

Aug 8, 2014, 5:51:51 PM8/8/14
to flu...@googlegroups.com
Hi,

>I've tried looking for help on this topic in the mailing list before sending it out to the group but no luck.

Sorry for overlooking your earlier post =/

> I am trying to use fluentd to ingest multiple csv(logs) files that are dropped in a particular directory once a day.

1. I recommend against using in_tail_ex in general. Its functionality has been subsumed by in_tail upstream.
2. Don't use in_tail for periodic file dumps like this. in_tail is designed for log files to which data is appended in a streaming manner.

For something like this, if you want to use Fluentd, the better path would be in_exec (http://docs.fluentd.org/articles/in_exec) with a script that watches the directory, checks whether each file has already been uploaded to Elasticsearch, and if not, reads the file and writes its records to stdout, where Fluentd picks them up.
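A minimal sketch of such a script, to be run from an in_exec `<source>` with `format json` so each line printed to stdout becomes one Fluentd event. The directory path, state-file location, and key names below are assumptions for illustration; the state file stands in for "check that the file was already uploaded":

```python
import csv
import json
import os

def find_new_files(directory, processed):
    """Return CSV files in `directory` not yet recorded in `processed`."""
    return sorted(
        name for name in os.listdir(directory)
        if name.endswith(".csv") and name not in processed
    )

def emit_records(path, keys):
    """Print one JSON object per CSV row; in_exec with `format json`
    reads each stdout line as a Fluentd event."""
    with open(path, newline="") as fh:
        for row in csv.reader(fh):
            print(json.dumps(dict(zip(keys, row))))

if __name__ == "__main__":
    # Hypothetical paths and columns -- adjust to your environment.
    archive_dir = "/archived_logs"
    state_file = "/var/tmp/ingested_files.txt"
    keys = ["key1", "key2", "key3"]

    # Load the names of files already sent to Elasticsearch.
    processed = set()
    if os.path.exists(state_file):
        with open(state_file) as fh:
            processed = set(fh.read().split())

    # Emit each new file's rows, then record it as done.
    for name in find_new_files(archive_dir, processed):
        emit_records(os.path.join(archive_dir, name), keys)
        with open(state_file, "a") as fh:
            fh.write(name + "\n")
```

Wired up on the Fluentd side, this would sit behind something like `type exec` / `format json` / `run_interval 60s`, so the script runs periodically instead of tailing anything.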

Kiyoto


--
You received this message because you are subscribed to the Google Groups "Fluentd Google Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fluentd+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Check out Fluentd, the open source data collector for high-volume data streams

supreme patadia

Aug 8, 2014, 5:58:00 PM8/8/14
to flu...@googlegroups.com
Thank you very much, I appreciate your quick reply.