Aggregated vs Not Aggregated?

Vladimir Bašić

Feb 14, 2015, 4:07:24 PM
to flu...@googlegroups.com
Hi everyone,

Just a quick question:

What are the pros and cons of sending logs directly to AWS S3 versus adding one more node in between that aggregates everything (http://bit.ly/1FbniZS) and forwards it to S3?

Thanx a lot! :)

Vladimir Bašić

Feb 14, 2015, 4:13:54 PM
to flu...@googlegroups.com
Just to be more precise: in my case, the data only needs to be processed by Hadoop directly from S3. I do not need other storage options such as Elasticsearch. I only need to store logs from many servers on S3.

Kiyoto Tamura

Feb 14, 2015, 4:20:22 PM
to flu...@googlegroups.com
Hi Vlad-

The advantage of the forwarder-aggregator pattern is division of labor. Collecting data from various data sources (input) and compressing and uploading data to external storage systems (output) can both be CPU intensive. By splitting the two between forwarders and aggregators, one can make better use of CPU resources and scale out.

You can try running without forwarder/aggregator first, and if you start running into issues, consider the forwarder/aggregator approach.

Kiyoto

--
You received this message because you are subscribed to the Google Groups "Fluentd Google Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fluentd+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.




Vladimir Bašić

Feb 14, 2015, 4:58:23 PM
to flu...@googlegroups.com
Hi Kiyoto,

This info is exactly what I needed!

Thanks for the quick reply!!! :)

Lance N.

Feb 16, 2015, 12:51:46 AM
to flu...@googlegroups.com
The disadvantage is reliability. 

Fluentd does not implement backpressure. If any stage in a message-passing chain cannot deliver or buffer all the messages it needs to pass forward, it drops those messages and continues to accept new ones. If you have two programs on two different servers with a network between them, your reliability has just dropped. You now have three links in your chain instead of one, and one weak link will break the chain. And, sometimes it seems like they all compete to be the weakest link :)

Cheers,

Lance
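
The "more links, lower reliability" point above can be put into numbers. This is only a rough sketch: it assumes each link fails independently, and the 99% per-link delivery rate is a made-up figure for illustration, not a measurement of Fluentd or S3.

```python
from math import prod


def chain_delivery_probability(link_probabilities):
    """Probability that a message survives every link in the chain.

    Assumes the links fail independently of each other (an assumption
    made for this sketch, not something stated in the thread).
    """
    return prod(link_probabilities)


# Hypothetical per-link delivery rate of 99%.
direct = chain_delivery_probability([0.99])
via_aggregator = chain_delivery_probability([0.99, 0.99, 0.99])

print(f"one link: {direct:.4f}, three links: {via_aggregator:.4f}")
```

Under these assumptions, three 99% links deliver about 97% of messages end to end, so the chain is always at most as reliable as its weakest link.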

Kiyoto Tamura

Feb 16, 2015, 1:31:35 AM
to flu...@googlegroups.com
Lance-


>Fluentd does not implement backpressure.

However, the "backpressure" approach has its own drawback: the application needs to keep trying to send data until the pressure comes down; otherwise, data will be lost. Fluentd's design encourages the application side to "log and forget".

Kiyoto
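
The trade-off discussed in the last two posts can be illustrated with a toy bounded buffer. This is a minimal sketch, not Fluentd's buffer implementation: `BoundedBuffer`, `offer`, and `run` are hypothetical names, and the capacity of 3 is arbitrary. With dropping enabled the sender can "log and forget" but overflow is lost; with backpressure nothing is lost in the buffer, but the sender is left holding whatever was refused.

```python
from collections import deque


class BoundedBuffer:
    """Toy buffer that either drops on overflow or refuses (backpressure)."""

    def __init__(self, capacity, drop_when_full):
        self.capacity = capacity
        self.drop_when_full = drop_when_full
        self.items = deque()
        self.dropped = 0

    def offer(self, item):
        """Return True if the sender may forget the item, False if it must retry."""
        if len(self.items) < self.capacity:
            self.items.append(item)
            return True
        if self.drop_when_full:
            self.dropped += 1
            return True   # sender moves on; the item is lost
        return False      # backpressure: sender has to hold the item


def run(drop_when_full):
    """Offer 5 items to a buffer of capacity 3 that is never drained."""
    buf = BoundedBuffer(capacity=3, drop_when_full=drop_when_full)
    held_by_sender = 0
    for i in range(5):
        if not buf.offer(i):
            held_by_sender += 1
    return len(buf.items), buf.dropped, held_by_sender


print("drop on overflow:", run(drop_when_full=True))
print("backpressure:    ", run(drop_when_full=False))
```

In both modes two of the five items do not fit; the only difference is who ends up responsible for them.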

Mr. Fiber

Feb 16, 2015, 2:22:23 AM
to flu...@googlegroups.com
Yeah. Fluentd's approach is: if one stream goes over capacity, use another stream.
So Fluentd has load balancing, secondary outputs, at-least-once delivery (since v0.12), and so on to avoid data loss.

BTW, non-blocking / async backpressure is a challenging problem.


Masahiro
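
The "if one stream is over capacity, use another" idea can be sketched as retry-then-fallback. This is an illustration only, not Fluentd's code: `deliver`, `make_flaky`, and the destination names are hypothetical, and real at-least-once delivery also needs acknowledgements from the receiver, which this sketch reduces to a boolean return value.

```python
def deliver(chunk, destinations, max_attempts=3):
    """Try each destination in order, retrying each up to max_attempts times.

    Returns the name of the destination that accepted the chunk,
    or None if every attempt on every destination failed.
    """
    for name, send in destinations:
        for _ in range(max_attempts):
            if send(chunk):  # True stands in for an acknowledgement
                return name
    return None


def make_flaky(fail_times):
    """Build a sender that fails the first fail_times calls, then succeeds."""
    state = {"calls": 0}

    def send(chunk):
        state["calls"] += 1
        return state["calls"] > fail_times

    return send


def always_down(chunk):
    """A sender that never accepts anything."""
    return False


destinations = [("primary", always_down), ("secondary", make_flaky(1))]
result = deliver(["log line"], destinations)
print("delivered via:", result)
```

Because the sender retries until something acknowledges the chunk, a lost acknowledgement would cause a resend, which is why this style of delivery is "at least once" rather than "exactly once".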