fluent-plugin-forest scalability?

Mark Fine

Apr 21, 2015, 5:00:30 AM
to flu...@googlegroups.com
Hi!

I'm evaluating fluentd and fluent-plugin-forest to dynamically route inputs to outputs - something along the lines of:

<match device.*>
  type forest
  subtype s3
  <template>
    type s3
    s3_object_key_format %{path}%{time_slice}_%{index}.%{file_extension}
    path devices/${tag}/logs/
    buffer_path /var/log/td-agent/s3/${tag}
    time_slice_format %Y%m%d%H
    time_slice_wait 10m
  </template>
</match>

Most of the use cases I've run across for fluent-plugin-forest involve reusing a template configuration across a small number of tags. Are there any issues with using these templates when there will be a large number of "plants" (>1-2K devices)? Looking through the plugin's source, nothing jumps out at me beyond the mutex around #plant (my simplified reading is sketched below). Should I be concerned about this approach? Is there a better way to realize this kind of dynamic routing? Are there any similar use cases out there with large numbers of "plants"? Thanks!
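For context, here is my simplified reading of that pattern (not the plugin's verbatim source; the class name and factory block are mine):

class ForestSketch
  # One output instance ("plant") per tag, created lazily behind a Mutex.
  def initialize(&factory)
    @factory = factory   # block that builds a configured output for a tag
    @mapping = {}        # tag => planted output instance
    @mutex   = Mutex.new # serializes planting across input threads
  end

  def plant(tag)
    # Fast in the common case: the instance already exists and is reused.
    @mutex.synchronize { @mapping[tag] ||= @factory.call(tag) }
  end
end

# Example: forest = ForestSketch.new { |tag| "s3 output for #{tag}" }
#          forest.plant("device.1234") # built once, then reused

If that reading is right, the mutex is taken on every lookup but the critical section is tiny. What worries me more is that each plant is a full buffered output with its own buffer and, as far as I can tell, its own flush thread, so >1-2K devices means that many buffers and threads.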

Mark

Satoshi Tagomori

Apr 21, 2015, 1:04:57 PM
to flu...@googlegroups.com
Hi Mark,

I'm the author of fluent-plugin-forest, and I think your concern is well founded.
What fluent-plugin-forest does is not optimized at all.

That said, with the current Fluentd APIs it is the only way to handle a dynamically growing set of tags.
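Without forest, every tag needs its own static <match> section written before startup. For illustration only (the device names here are made up):

<match device.AAA>
  type s3
  s3_object_key_format %{path}%{time_slice}_%{index}.%{file_extension}
  path devices/device.AAA/logs/
  buffer_path /var/log/td-agent/s3/device.AAA
  time_slice_format %Y%m%d%H
  time_slice_wait 10m
</match>

...and another such section for device.BBB, device.CCC, and every other tag, all known in advance. Forest builds the equivalent outputs at runtime from your template instead.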

I'm also a core committer on the Fluentd project, and I'm now working on improved plugin APIs for the next version, v0.14,
to make handling a growing number of tags easy.

It's not a simple solution, and it will require fixes to existing plugins, but it's the only way to make things better.
Please comment on the branch/WIP pull request if you have any concerns about this problem!

tagomoris.

On Tuesday, April 21, 2015 at 18:00:30 UTC+9, Mark Fine wrote:

Lance N.

Apr 22, 2015, 7:24:23 PM
to flu...@googlegroups.com
If your fluentd instance crashes and restarts, the forest plugin will not find your old buffered files and finish sending them to S3, presumably because outputs are only planted when a tag arrives again, so chunks for tags that never reappear stay on disk unflushed.

Also, you pay per request with S3. Every chunk flush is a separate PUT request, and with one output per device tag the request count scales with the number of devices.
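As a rough, hypothetical calculation, assuming hourly time slices (about one flush per device per hour) and S3's roughly $0.005 per 1,000 PUT requests (US Standard pricing as of 2015):

  2,000 devices x 24 flushes/day x 30 days = 1,440,000 PUT requests/month
  1,440,000 / 1,000 x $0.005 = about $7.20/month

That cost scales linearly with device count and flush frequency.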