How to know when Fluentd has finished processing a file


Cyril Barbier

Aug 25, 2015, 5:52:10 AM
to Fluentd Google Group
Hi,

I'm working on a project where I want to schedule Fluentd, and I need a way to know when Fluentd has finished processing a given file.
I know there is a .pos file that indicates the position of the last byte processed.
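
A minimal sketch of how an external script could use the .pos file for this. It assumes the pos file written by in_tail holds one tab-separated line per watched file (path, byte offset in hexadecimal, inode in hexadecimal) and that an offset of ffffffffffffffff marks a file that is no longer watched; both are assumptions to check against the installed Fluentd version. The function names are made up for the illustration.

```python
import os

# Assumed sentinel that in_tail writes for a file it has stopped watching.
UNWATCHED_OFFSET = 0xFFFFFFFFFFFFFFFF


def read_pos_file(pos_path):
    """Return {watched_path: byte_offset} parsed from an in_tail pos file.

    Assumed line format: <path> TAB <offset as hex> TAB <inode as hex>
    """
    offsets = {}
    with open(pos_path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 3:
                continue  # skip blank or malformed lines
            path, offset_hex, _inode_hex = parts
            offsets[path] = int(offset_hex, 16)
    return offsets


def is_fully_read(pos_path, log_path):
    """True if the recorded offset has reached the current size of log_path."""
    offset = read_pos_file(pos_path).get(log_path)
    if offset is None or offset == UNWATCHED_OFFSET:
        return False  # not recorded, or no longer watched
    return offset >= os.path.getsize(log_path)
```

Note that a matching offset only says the bytes were read by the input side. It does not say that buffered events were flushed by the output, and it is only meaningful once the file has stopped growing.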

Is there another way, such as a plugin or a native option, to do that easily?

I installed the multiprocess plugin and want to schedule files dynamically, directing each file to be processed to a specific thread, so I need a way to know when a file has been processed and its thread is free for another one.
I don't want to simply split the files (which vary in size) into as many queues as there are threads; the goal is to have no idle thread.
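
The difference between a static split and the dynamic scheduling described here can be sketched as a shared queue from which each worker pulls the next file as soon as it is free. This is plain Python, not Fluentd or multiprocess plugin code; `process_file` is a hypothetical callback standing in for "hand the file to one worker and wait until it is done".

```python
import queue
import threading


def run_dynamic_schedule(files, worker_count, process_file):
    """Process files with worker_count workers.

    Each worker pulls the next pending file as soon as it is free, so a
    worker that gets small files simply handles more of them.
    Returns {worker_id: [files handled by that worker]}.
    """
    pending = queue.Queue()
    for path in files:
        pending.put(path)

    handled = {i: [] for i in range(worker_count)}

    def worker(worker_id):
        while True:
            try:
                path = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to do: this worker stops
            process_file(path)  # hypothetical per-file work
            handled[worker_id].append(path)

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(worker_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled
```

With this shape, no worker stays idle while files are pending, whatever the file sizes are. The open question in this thread is how `process_file` would learn that a worker has finished a file.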

Maybe there is an existing plugin that does this?

Thanks in advance for your help.

Cyril

Mr. Fiber

Aug 25, 2015, 9:38:29 AM
to Fluentd Google Group
Hi Cyril,

> I installed the multiprocess plugin and want to schedule files dynamically, directing each file to be processed to a specific thread, so I need a way to know when a file has been processed and its thread is free for another one.

Does that mean you are writing an input plugin?
Fluentd handles logs as a stream, so almost no existing plugin treats EOF as a special trigger.
For example, in_tail stops reading when it hits EOF and tries to read newer lines at the next timer trigger.
So, unlike bulk-load tools, it is hard for Fluentd to know when it has finished processing a given file.

Writing a new input plugin for watching files may be the better option.


Masahiro


--
You received this message because you are subscribed to the Google Groups "Fluentd Google Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fluentd+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Cyril Barbier

Aug 25, 2015, 10:19:14 AM
to Fluentd Google Group
In my case, the files will not be updated at every timer trigger.

A server receives logs via syslog and sends me their content as files at a fixed interval. So once Fluentd has read a file, it will move on to another (new) one. Is there a way to have a flag raised when it reaches EOF?

Or would it be better in that case to send the logs directly in syslog format?

I have some constraints:

I have not one but 16 servers, and they do not produce logs at the same rate (they don't have the same computing power). So I really need to balance the log volume with a scheduler that I will write as a script (or something similar). For that balancing, I need an indicator that tells me which of the parallel processes has finished.

If you have a syslog-based solution, I'll take it. I don't want to develop an input plugin for the moment; I'm looking for an existing solution. If none exists, I will adapt and find another way to do what I want.

Thanks for your help

Mr. Fiber

Aug 25, 2015, 1:40:26 PM
to Fluentd Google Group
> A server receives logs via syslog and sends me their content as files at a fixed interval. So once Fluentd has read a file, it will move on to another (new) one. Is there a way to have a flag raised when it reaches EOF?

Does that mean your syslog server creates a file per request?
Is the content of each file very small?

There seem to be two approaches:

- Use in_syslog and send the received events directly.
- Extend in_tail so that, once a file has been read, it emits an event signalling that reading of that file has finished.
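
The second approach can be pictured as an event stream that ends with one extra marker event. The sketch below is plain Python and only illustrates that shape; it is not in_tail code (in_tail is written in Ruby), and the `.eof` tag suffix and the marker record fields are hypothetical.

```python
def events_with_eof_marker(path, tag):
    """Yield (tag, record) for every line of the file, then one marker event.

    The marker uses a separate, hypothetical tag (tag + ".eof") so that a
    downstream consumer can route it apart from the ordinary log events.
    """
    line_count = 0
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line_count += 1
            yield tag, {"message": line.rstrip("\n")}
    # Emitted once, after the last line has been read.
    yield tag + ".eof", {"path": path, "lines": line_count}
```

A scheduler watching for the marker event would then know that the file is done and that the worker which read it is free again.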

I'm not sure I understand the entire log flow and the scheduler mechanism.
Is this enough, or do you need an extra feature?

Cyril Barbier

Aug 26, 2015, 3:23:29 AM
to flu...@googlegroups.com
No, the syslog server writes all the requests made during one minute into a single file. That is about 1000 req/s, and each request writes 25 log lines.

Previously it sent only one line per request, and each hour's worth of logs took 30 minutes to process. Now I have to use multiprocess to handle this huge volume of logs (which I use to compute statistics).

That is the point of a home-made scheduler: sending files to whichever thread is free, to balance the log volume.

Files are also much easier for me because I have another constraint downstream of Fluentd: my logs must be in chronological order. So if I make a mistake in the scheduler ...
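
One possible way to restore chronological order after parallel processing is to merge the per-worker outputs by timestamp. The sketch below assumes each worker's output is already sorted and that every line starts with a 19-character sortable timestamp such as 2015-08-26T03:23:29; the real log format is not given in this thread, so both are assumptions.

```python
import heapq

# Assumed width of a leading "YYYY-MM-DDTHH:MM:SS" timestamp.
TIMESTAMP_WIDTH = 19


def merge_chronologically(*streams):
    """Merge already-sorted iterables of log lines into one sorted iterator.

    Lines are compared on their leading timestamp only; because the assumed
    format is fixed-width and zero-padded, string order equals time order.
    """
    return heapq.merge(*streams, key=lambda line: line[:TIMESTAMP_WIDTH])
```

Because `heapq.merge` is lazy, the streams can be open file objects and the merged result can be written out line by line without loading everything into memory.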

Thanks
