Fluentd S3 upload data every 10 mins

Oleg Hordiychuck

Dec 15, 2014, 9:42:14 AM
to flu...@googlegroups.com
It's possible to set the fluentd s3 time_slice_format option to upload files hourly or every minute. But is it possible to upload data at a custom interval? In my case I want to upload data every 10 minutes.

Kiyoto Tamura

Dec 15, 2014, 11:42:00 AM
to flu...@googlegroups.com
Oleg-

If you mean creating file names with 10-minute increments, I don't think this is possible right now. However, if it is just about uploading every 10 minutes, you can use "flush_interval 10m".
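
For example, a minimal out_s3 sketch along these lines (the bucket name, credentials, and paths are placeholders, not from this thread):

<match s3.**>
  type s3
  # placeholder credentials and bucket
  aws_key_id YOUR_AWS_KEY_ID
  aws_sec_key YOUR_AWS_SECRET_KEY
  s3_bucket your-log-bucket
  path logs/
  # local buffer location
  buffer_path /var/log/fluent/s3
  # hourly slices in the object names
  time_slice_format %Y%m%d%H
  # flush (and upload) a chunk every 10 minutes
  flush_interval 10m
</match>

With an hourly time_slice_format, each 10-minute flush is uploaded as a separate indexed file within that hour's slice.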

Kiyoto

Oleg Hordiychuck

Dec 16, 2014, 5:06:36 AM
to flu...@googlegroups.com
Thank you for the answer. Yes, I mean creating files every 10 minutes. 

Kiyoto Tamura

Dec 16, 2014, 3:19:12 PM
to flu...@googlegroups.com
Ok. No, this is not supported with time_slice_format right now. Can you walk me through why hourly is not sufficient?

Kiyoto

Oleg Hordiychuck

Dec 17, 2014, 5:52:38 AM
to flu...@googlegroups.com
Well, we want to build something like this:

event generators -> fluentd -> S3 -> EMR (Amazon's MapReduce) -> Redshift (Amazon's Postgres) <- analytics app

As you can see from this chain, the analytics app receives new events no faster than hourly (actually even slower, due to EMR cycles and other jobs). Also, the finest time scale in the analytics app is 10 minutes. So we can provide that time scale now, but with a significant delay.

Oleg Hordiychuck

Dec 17, 2014, 6:00:14 AM
to flu...@googlegroups.com
More details. We can't downscale to 1-minute slices with time_slice_format because:
1. It introduces storage overhead in our archive (it's better to archive bigger files than small ones). Though perhaps we could ignore this disadvantage, since bytes are very cheap these days;
2. It introduces more complex logic for EMR;
3. It creates 10x more files than 10-minute slices would, and 10 minutes is a reasonable size for us when we want to work with the files manually.

Mr. Fiber

Dec 17, 2014, 7:39:39 AM
to flu...@googlegroups.com
Files uploaded with 'flush_interval 10m' look like this:

logs/2014010123_0.gz
logs/2014010123_1.gz
logs/2014010123_2.gz
logs/2014010123_3.gz
logs/2014010123_4.gz
logs/2014010123_5.gz
logs/2014010200_0.gz

Is that not enough for you?
Do you need actual time slices like below?

logs/201401012300.gz
logs/201401012310.gz
logs/201401012320.gz
logs/201401012330.gz
logs/201401012340.gz
logs/201401012350.gz
logs/201401020000.gz


Masahiro

Oleg Hordiychuck

Dec 17, 2014, 8:32:16 AM
to flu...@googlegroups.com
The question is not how to store files per 10 minutes. The question is how to upload data to S3 every 10 minutes.

Kiyoto Tamura

Dec 18, 2014, 1:51:38 AM
to flu...@googlegroups.com
Hi Oleg-

Just to make sure: what Masa suggested creates a new file on S3 every 10 minutes (I wasn't aware of this myself), indexed 0 through 5. You can think of them as the 0th through 5th 10-minute blocks within an hour. This means the data is uploaded every 10 minutes as well.

Does that make sense?

Kiyoto

Oleg Hordiychuck

Dec 18, 2014, 3:36:32 AM
to flu...@googlegroups.com
Well, I know that it's possible to upload ten one-minute files every 10 minutes using flush_interval 10m and time_slice_format %Y-%m-%d-%H-%M. It's definitely a workaround, but it introduces some overhead, makes manual work with the files painful, and complicates our scripts (EMR has an easy API for working with 1 file and a harder one for 10 files). There are other pitfalls too, all solvable somehow. I'm not saying that the current fluentd s3 API can't solve my task; rather, it introduces additional pain in developing, testing, and supporting our software solution. So it would be nice if time_slice_format were some day extended to support custom periods, like %Y-%m-%d-%H-%10M. It might even be reasonable to split this option in two, something like file_format %Y-%m-%d-%H-%M and file_period 10m. This would make things simpler for users like me, but I'm not sure anyone besides me needs this feature.
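
For reference, that workaround is roughly the earlier placeholder <match> sketch with two lines changed:

  # minute-level slices in the file names, flushed in 10-minute batches
  time_slice_format %Y-%m-%d-%H-%M
  flush_interval 10m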

Kiyoto Tamura

Dec 18, 2014, 2:37:46 PM
to flu...@googlegroups.com
Oleg-


>Well, I know that it's possible to upload 10 "1 minutes long" files every 10 minutes using flush_interval 10m and time_slice_format %Y-%m-%d-%H-%M.

No, that would be six 10-minute files every hour.


> (there is an easy API in EMR for working with 1 file and harder API for 10 files)

I don't get this point. I thought you wanted separate files for every 10 minutes, right?

If your suggestion is being able to create files like

2014-12-20-00-10.txt
2014-12-20-00-20.txt
...

as opposed to

2014-12-20-00_0.txt
2014-12-20-00_1.txt
...

then I understand it. But I am confused as to how this relates to your point that EMR's API works better with a single file than with multiple files.

Kiyoto