Handling High-traffic Load Questions


Huy Nguyen

Oct 15, 2013, 5:40:54 AM
to flu...@googlegroups.com
We're revisiting our Heroku fluentd setup and moving it to a dedicated box / EC2 running td-agent / fluentd. We currently receive about 80,000 records per minute on average, and we expect that to grow.

I have a few questions and hope someone can help answer them:

1/ What kind of monitoring software could I use to see (or get notified) when the node starts dropping records due to high traffic? Or should this never happen at all?

2/ For this diagram about log forwarder - log aggregator ( http://docs.fluentd.org/articles/high-availability )

a) Why is it recommended to have log-forwarder nodes? What's the downside of not having them and having the application nodes send traffic directly to the log aggregator (through either http or tcp)?

b) Does this model also support load balancing (i.e. when the main node starts to choke, does it start sending traffic to the backup node)?

3/ How would you suggest scaling up if the throughput exceeds the stated limit (18,000 records / second)? Load balancing?


Thanks a lot!
Huy

Satoshi Tagomori

Oct 16, 2013, 6:54:10 AM
to flu...@googlegroups.com
Hi Huy,

I may be able to help with some suggestions, since we handle over 100,000 records per second...

1. To monitor for missing records, we use two approaches:
  * watching Fluentd's own logs with a configuration such as '<match fluent.**>'
  * counting the flow with 'fluent-plugin-flowcounter' and plotting it on graphs
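
A minimal sketch of both approaches in one configuration, using the `type` syntax of the fluentd versions current when this thread was written (later versions spell it `@type`). The `app.**` tag pattern, the file paths, the aggregator host name, the `message` field, and the `fluentd.traffic` tag are assumptions for illustration. The flowcounter parameters are written from that plugin's README and should be checked against the version you install:

```
# Fluentd's own log events (fluent.info, fluent.warn, fluent.error, ...)
<match fluent.**>
  type file
  path /var/log/fluent/internal
</match>

# Counts emitted by flowcounter below; graph or alert on these
<match fluentd.traffic>
  type file
  path /var/log/fluent/traffic
</match>

# Application records: count them and forward them at the same time
<match app.**>
  type copy
  <store>
    type flowcounter
    count_keys message
    unit minute
    aggregate all
    tag fluentd.traffic
  </store>
  <store>
    type forward
    <server>
      host aggregator.example.com
      port 24224
    </server>
  </store>
</match>
```

Running the same counter on both the forwarders and the aggregators and comparing the two graphs is one way to notice records going missing between the tiers.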

2a. Log-forwarder nodes are used to provide these features:
  * buffering during network downtime between the nodes and the log aggregators
  * forwarding with load balancing and/or active-standby support
    (some of the fluent-logger libraries don't have these features)
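
As a rough illustration of those two features, a forwarder's configuration might look like the sketch below. It uses the `type` syntax of the fluentd versions of this era (later versions use `@type`); the host names and buffer path are hypothetical, and the flush interval is an arbitrary example value:

```
# Accept records from the local application via a fluent-logger library
<source>
  type forward
  port 24224
</source>

<match **>
  type forward

  # Buffer to local disk so records survive network or aggregator downtime
  buffer_type file
  buffer_path /var/log/fluent/buffer/forward
  flush_interval 10s

  # Active aggregator
  <server>
    host aggregator1.example.com
    port 24224
  </server>

  # Used only while the active aggregator is unreachable
  <server>
    host aggregator2.example.com
    port 24224
    standby
  </server>
</match>
```

With this in place the application only ever talks to a local fluentd, so retry and fail-over logic does not have to be implemented in each logger library.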

2b. Of course, yes.

3. Scaling up the throughput of a single fluentd is very hard, whereas scaling out
    by load balancing is simple and easy to do. (That is what we do.)

On Tuesday, October 15, 2013 at 18:40:54 UTC+9, Huy Nguyen wrote:

Huy Nguyen

Oct 21, 2013, 4:02:30 AM
to flu...@googlegroups.com
I just figured out the answer to this question:

> a) Why is it recommended to have log-forwarder nodes? What's the downside of not having them and having the application nodes send traffic directly to the log aggregator (through either http or tcp)?

So that we can easily set up cascading fail-over.


But I'm still unsure about the other questions, especially the performance ones. Could any of the fluentd contributors share some thoughts? :)



Sadayuki Furuhashi

Oct 21, 2013, 3:28:37 PM
to flu...@googlegroups.com
Hi Huy,

Regarding performance, one problem with the current fluentd is that it's difficult to scale up on
a single server because Ruby can't take advantage of multiple CPU cores. So we need to
either run multiple fluentd processes or scale out to improve performance.

As far as I know, Satoshi is using a custom init script to run multiple fluentd processes on
a single server.

I created a plugin that runs multiple fluentd processes as children of a single fluentd process:
https://github.com/frsyuki/fluent-plugin-multiprocess
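
A sketch of how the parent configuration for that plugin might look, based on the plugin's README. The child configuration paths and the sleep values are hypothetical. Each child file is an ordinary fluentd configuration and would need its own listening port:

```
<source>
  type multiprocess

  # Each <process> section launches one child fluentd with its own config
  <process>
    cmdline -c /etc/fluent/fluentd_child1.conf
    sleep_before_start 1s
    sleep_before_shutdown 5s
  </process>
  <process>
    cmdline -c /etc/fluent/fluentd_child2.conf
    sleep_before_start 1s
    sleep_before_shutdown 5s
  </process>
</source>
```

Senders then spread their traffic across the children's ports, for example by listing each port as a separate server in their forward output.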

The next major version will include multi-process functionality.
Here is the code of the next major version: https://github.com/fluent/fluentd/tree/v11

--
Sadayuki Furuhashi
http://fluentd.org http://msgpack.org
twitter:@frsyuki
> --
> You received this message because you are subscribed to the Google Groups "Fluentd Google Group" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to fluentd+u...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.

Huy Nguyen

Oct 22, 2013, 8:33:50 PM
to flu...@googlegroups.com
Hi Sada and Satoshi,

Somehow Google Groups didn't send me an email notification when there were replies to the topic, so I missed your answers until now. Thanks a lot for the answers; they cover my doubts very well.

Cheers,
Huy

Ed James

Jun 12, 2015, 9:31:29 AM
to flu...@googlegroups.com
Hi

I have some questions regarding the network topology diagram here: 

My understanding is that the aggregators are set up as master/slave rather than N+1: all messages are sent to the master aggregator, and if that server goes down, the slave starts receiving messages.

Assuming this is correct, what happens if the master goes down because of extremely high load? If that load remains constant, surely the slave(s) will also go down for the same reason?

Instead, is it possible to have all the forwarders simply forward to a load balancer? Behind the load balancer we could have N+1 aggregators, each receiving a balanced share of the load. I'm hoping this would let us easily add more aggregators under extremely high load, and similarly shut aggregators down when we don't need them, making our "aggregator layer" elastic.

Has anyone done something like this before?

I would really appreciate any help/advice on this.

Many thanks,
Ed.

Lance N.

Jun 12, 2015, 10:10:19 PM
to flu...@googlegroups.com
"N+1 aggregators": that's roughly what I have. I set up the 'pen' load balancer (HAProxy would also work) as a front end to several Fluentd instances, all on the same server. A single CRuby process cannot make use of multiple CPU cores, so to run Fluentd effectively on a modern multi-core CPU I had to put several instances behind a load balancer.

"Elastic aggregator": I've heard of people doing exactly this with Amazon's load balancer / Auto Scaling group system. I'm not interested because
1) it takes time for new servers to start, and 
2) sometimes Amazon can't get you new servers.

My logging requirement is that records are never, ever dropped. I can't tolerate the uncertainty.

If you're on Amazon, you might look at using Kinesis as your transport. It guarantees that messages are kept for 24 hours.

Mr. Fiber

Jun 15, 2015, 12:16:31 AM
to flu...@googlegroups.com
> Has anyone done something like this before?

Many users take similar approaches with load balancer middleware.
Lance mentioned ELB. I think it's popular in AWS environments.

Fluentd's forward plugin has its own load-balancing feature, controlled by the weight option.
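
A minimal sketch of weight-based balancing on the forwarder side, with hypothetical server names and hosts. Traffic is split in proportion to the weights, so the values below would send roughly three times as many records to the first aggregator as to the second (newer fluentd versions write `@type` instead of `type`):

```
<match **>
  type forward

  <server>
    name aggregator1
    host aggregator1.example.com
    port 24224
    weight 60
  </server>
  <server>
    name aggregator2
    host aggregator2.example.com
    port 24224
    weight 20
  </server>
</match>
```

Giving every server the same weight spreads the load evenly, and adding capacity means adding another server section on each forwarder.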


Some users use this with enough aggregators.


Masahiro

