Fluentd aggregator performance


Manoj Muraleedharan

Apr 8, 2016, 2:46:28 AM
to Fluentd Google Group
I have used td-agent as both forwarder and aggregator. I want the aggregator Fluentd process to handle at least 25,000 lines per second of forwarded logs. On the aggregator I used the multiprocess plugin with num_threads 2 for each process, and I configured 4 processes. On the forwarder I used the tail plugin to forward the logs to the aggregator with a flush interval of 5 seconds (the log file has 100,000 lines total and is 18 MB in size). But the aggregator averages only about 5,000 lines per second when processing and generating buffer files.

Config files

#Forwarder 01

<source>
 type tail
 path /logs/tester/testlog.log
 pos_file /var/log/td-agent/file.log.pos
 read_lines_limit 25000
 read_from_head true

 format none
 tag squid.secure
 log_level trace
</source>

# To aggregator nodes
<match squid.secure>
 type secure_forward
 self_hostname Forwarder1
 shared_key    MyKey
 secure yes
 ca_cert_path /etc/pki/tls/certs/ca_cert.pem
 enable_strict_verification no
#buffer
 buffer_type file
 buffer_path /logs/tester/buffer/td
 buffer_chunk_limit 32m
 buffer_queue_limit 4000
 flush_interval 5s
 retry_wait 1m
 log_level trace
 num_threads 2
 <server>
  host 192.168.4.25 # or IP
  port 24224
 </server>
 <server>
  host 192.168.4.25  # or IP
  port 24225
 </server>
 <server>
  host 192.168.4.25  # or IP
  port 24226
 </server>
 <server>
  host 192.168.4.25  # or IP
  port 24227
 </server>
 <secondary>
 type file
  path /logs/tester/failed/fail
 </secondary>
</match>



#Aggregator

<source>
 type secure_forward
 bind 0.0.0.0 # default
 port 24224 # default
 self_hostname Aggregator1
 shared_key    MyKey
 secure yes
 ca_cert_path        /etc/pki/tls/certs/ca_cert.pem
 ca_private_key_path /etc/pki/tls/certs/ca_key.pem
 ca_private_key_passphrase ----------------
 log_level trace
</source>


<source>
  @type multiprocess

 <process>
    cmdline -c /etc/td-agent/td-agent1.conf --log /var/log/td-agent/td-agent1.log
    sleep_before_start 3s
    sleep_before_shutdown 5s
  </process>
 <process>
    cmdline -c /etc/td-agent/td-agent2.conf --log /var/log/td-agent/td-agent2.log
   sleep_before_start 3s
    sleep_before_shutdown 5s
  </process>
 <process>
    cmdline -c /etc/td-agent/td-agent3.conf --log /var/log/td-agent/td-agent3.log
    sleep_before_start 3s
    sleep_before_shutdown 5s
  </process>
 log_level trace
</source>

.....

.......
<match anonymizer.secure>
type azurestorage
......
.......

#buffering
 buffer_type file
 buffer_path /datadrive/buffer/td
 buffer_chunk_limit 32m
 buffer_queue_limit 16384
 flush_interval 2m
 retry_wait 1m
 num_threads 2
 log_level trace
</match>


Forwarder machine: 7 GB RAM, 2 CPU cores
Aggregator machine: 14 GB RAM, 4 CPU cores
Can anyone help me improve the aggregator performance so it can process 25,000 lines per second?

Mr. Fiber

Apr 8, 2016, 6:01:13 AM
to Fluentd Google Group
First, does the forwarder actually send 25,000 lines per second to the aggregator?
You can check the traffic by inserting the flowcounter_simple filter.

<source>
  @type tail
</source>

<filter>
  @type flowcounter_simple
</filter>
 
<match squid.secure>
  @type secure_forward
</match>

> read_lines_limit 25000

We don't recommend this setting.
A large read_lines_limit creates lots of temporary objects.
I think the default, 1000, is enough in almost all cases.
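
For illustration, the forwarder's in_tail source can simply omit the parameter to fall back to the default:

```
<source>
 type tail
 path /logs/tester/testlog.log
 pos_file /var/log/td-agent/file.log.pos
 # read_lines_limit omitted -> defaults to 1000
 format none
 tag squid.secure
</source>
```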


Masahiro


Manoj Muraleedharan

Apr 8, 2016, 8:02:50 AM
to Fluentd Google Group
I have installed the flowcounter_simple plugin on both the forwarder and aggregator machines.
I ran a sample test and the statistics are given below.
Forwarder log:
2016-04-08 11:48:55 +0000 [debug]: ssl session connected host="192.168.4.25" port=24225
2016-04-08 11:48:55 +0000 [debug]: on_read
2016-04-08 11:48:55 +0000 [debug]: checking helo
2016-04-08 11:48:55 +0000 [debug]: generating ping
2016-04-08 11:48:55 +0000 [debug]: ssl session connected host="192.168.4.25" port=24226
2016-04-08 11:48:55 +0000 [debug]: ssl session connected host="192.168.4.25" port=24226
2016-04-08 11:48:56 +0000 [info]: plugin:out_flowcounter_simple count:25000    indicator:num    unit:second
2016-04-08 11:48:56 +0000 [debug]: on_read
2016-04-08 11:48:56 +0000 [debug]: checking pong
2016-04-08 11:48:56 +0000 [info]: connection established to 192.168.4.25


Aggregator log:

2016-04-08 11:58:15 +0000 [debug]: on_read
2016-04-08 11:58:16 +0000 [info]: plugin:out_flowcounter_simple count:6806      indicator:num   unit:second
2016-04-08 11:58:17 +0000 [info]: plugin:out_flowcounter_simple count:12556     indicator:num   unit:second
2016-04-08 11:58:18 +0000 [info]: plugin:out_flowcounter_simple count:12724     indicator:num   unit:second
2016-04-08 11:58:19 +0000 [info]: plugin:out_flowcounter_simple count:12403     indicator:num   unit:second
2016-04-08 11:58:20 +0000 [info]: plugin:out_flowcounter_simple count:13072     indicator:num   unit:second
2016-04-08 11:58:21 +0000 [info]: plugin:out_flowcounter_simple count:12439     indicator:num   unit:second
2016-04-08 11:58:22 +0000 [info]: plugin:out_flowcounter_simple count:5008      indicator:num   unit:second
2016-04-08 11:58:23 +0000 [info]: plugin:out_flowcounter_simple count:1 indicator:num   unit:second
2016-04-08 11:58:24 +0000 [info]: plugin:out_flowcounter_simple count:1 indicator:num   unit:second
2016-04-08 11:58:25 +0000 [info]: plugin:out_flowcounter_simple count:1 indicator:num   unit:second




[Sample log file used: 25,000 lines total]


It takes more than 5 seconds to write the buffer file on the aggregator.

Can you give any suggestions to improve the performance?

Mr. Fiber

Apr 8, 2016, 8:27:54 AM
to Fluentd Google Group
> It takes more than 5 seconds to write the buffer file on the aggregator.

The forwarder's flush_interval is 5s, so this seems like normal behaviour.

> <match anonymizer.secure>

The tag has changed from squid.secure. Do you apply several filters to the received events?


Masahiro

Manoj Muraleedharan

Apr 8, 2016, 8:41:26 AM
to Fluentd Google Group
The above test was run with a 10-second flush interval on the forwarder and a 30-second flush interval on the aggregator.
I have used filters on the received events, and they contain a very complex regular expression:

(((?<data1>.*?))(\s*) ((?<data2>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))))) ((?<data18>(?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9]))) ((?<data3>\b\w+\b))/((?<data4>(?:[+-]?(?:[0-9]+)))) ((?<data5>(?:[+-]?(?:[0-9]+)))) (((?<data6>\b\w+\b))|(.*)) (((?<data7>[A-Za-z]+(\+[A-Za-z+]+)?)://)?(?<data8>(?:\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)|(?:((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?|(?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9]))))(?::(?<port>\b(?:[1-9][0-9]*)\b))?(?:(?<uri_param>\S+)|)|(.*)) (?:(((?<data10>(?:[+-]?(?:[0-9]+))))\\((?<data15>(?:[+-]?(?:[0-9]+))))\\)|)((?<user>.*?)) 
((?<data9>\b\w+\b))/((?<data13>.*?)|-) ((?<data14>.*?)) ((((?<daat12>\b\w+\b)|-)/((?<data11>(?:[+-]?(?:[0-9]+))))/((?<daat16>(?:[+-]?(?:[0-9]+))))/((?<data17>(?:[+-]?(?:[0-9]+)))|-))|))/

Mr. Fiber

Apr 8, 2016, 8:49:34 AM
to Fluentd Google Group
How about the performance without this filter?

BTW, this regexp seems toooooo complex.
Was this pattern generated by a regular-expression generator?
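
Since the tag is squid.secure, the input is presumably a Squid-style access log. As a rough illustration of what a simpler equivalent could look like (the sample line and field names here are hypothetical, not the poster's actual format), a field-by-field anchored pattern built from plain `\S+` and `\d+` tokens backtracks only locally, unlike the nested alternations and lookbehinds above:

```python
import re

# Hypothetical Squid-style access-log line (native log format).
line = ("1460116135.123    45 10.0.0.7 TCP_MISS/200 1024 GET "
        "http://example.com/index.html user01 DIRECT/93.184.216.34 text/html")

# Each field is matched with a simple token; stricter validation
# (IP ranges, port numbers, etc.) is done afterwards in code
# instead of inside the regex.
SQUID = re.compile(
    r"^(?P<ts>\d+\.\d+)\s+(?P<duration>\d+)\s+(?P<client>\S+)\s+"
    r"(?P<result>\S+)/(?P<status>\d{3})\s+(?P<bytes>\d+)\s+"
    r"(?P<method>\S+)\s+(?P<url>\S+)\s+(?P<user>\S+)\s+"
    r"(?P<hierarchy>\S+)/(?P<peer>\S+)\s+(?P<mime>\S+)$"
)

m = SQUID.match(line)
print(m.group("status"), m.group("url"))
# -> 200 http://example.com/index.html
```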


Manoj Muraleedharan

Apr 11, 2016, 3:06:10 AM
to Fluentd Google Group
After disabling the regex processing, the Fluentd aggregator shows much higher performance on log-line processing. In our system design, multiple forwarders pull streams of live logs into the aggregator machine, so together the forwarder machines generate more than 25,000 lines per second. The regex above is used for parsing the log data and appending some fields before sending it to Azure Blob Storage.

Manoj Muraleedharan

Apr 11, 2016, 6:03:35 AM
to Fluentd Google Group
But we need that regex for validating and parsing the fields from the logs. Is there any other alternative to improve Fluentd's performance?

Mr. Fiber

Apr 11, 2016, 7:10:05 PM
to Fluentd Google Group
I assume your regexp consumes lots of CPU resources.
Ruby's regexp engine is written in C but your regexp is tooo complex...
There are several approaches.

- Improve the regex pattern. This regex is slow: lots of grouping, lookbehind, etc.
  From my experience, application logs don't have such complex patterns, so you may find a more effective pattern.
- Move the parsing routine to the forwarder side.
- Try out_exec / out_exec_filter to call a high-performance external program using the above regexp.
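
The out_exec_filter approach could look roughly like the sketch below (the command path is hypothetical, and the parameters should be verified against the plugin's documentation for your td-agent version):

```
<match anonymizer.secure>
 type exec_filter
 command /path/to/fast_parser   # hypothetical external parser program
 in_format json
 out_format json
 tag parsed.secure
 num_children 2
</match>
```

The external program reads events on stdin, applies the regexp, and writes parsed events on stdout; num_children lets several parser processes run in parallel across CPU cores.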

Manoj Muraleedharan

Apr 12, 2016, 1:32:07 AM
to Fluentd Google Group
Thanks for your suggestion.
Our log file contains 20 fields, including a Unix timestamp, URLs (sometimes more than 8K characters), etc.
Can we use the parser plugin in the Fluentd forwarder to apply the regular expression that formats the logs? And if we can use the parser plugin on the forwarder, how do we find the lines that fail to match the parser?

Mr. Fiber

Apr 12, 2016, 3:48:43 AM
to Fluentd Google Group
If you use the in_tail plugin, the format parameter can take a regular expression.
Do you use format none on the Fluentd forwarder side?
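
If parsing moves to the forwarder, the in_tail source above could switch from format none to a regexp along these lines (the pattern here is only a placeholder, not the real 20-field pattern):

```
<source>
 type tail
 path /logs/tester/testlog.log
 pos_file /var/log/td-agent/file.log.pos
 # placeholder pattern; replace with the real field definitions
 format /^(?<ts>\S+) (?<duration>\d+) (?<client>\S+) (?<rest>.*)$/
 tag squid.secure
</source>
```

As far as I know, lines that do not match the format are reported in the td-agent log as "pattern not match" warnings, which is one way to find the mismatched lines asked about above.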

Manoj Muraleedharan

Apr 12, 2016, 4:20:45 AM
to Fluentd Google Group
We currently use format none in the forwarder config. Is there any other impact if we apply the regex on the forwarder (CPU / memory resource usage)? Some other applications are running on the forwarder machines; those processes average 40% CPU and 12% memory usage (these applications are always resident in memory, and their logs are what we are currently processing).

Mr. Fiber

Apr 18, 2016, 5:53:09 PM
to Fluentd Google Group
Sorry for the late reply.

> Is there any other impact if we apply the regex in the forwarder (CPU / memory resource usage)?

If using the regexp consumes too many resources, fluent-agent-hydra is another option.
The latest hydra supports regexp parsing.

