Memory Usage - Fluentd/Ruby


Daniel Fisicaro

Jun 29, 2020, 12:58:20 PM
to Fluentd Google Group
Hi All,

We have an ongoing issue with Fluentd (td-agent) memory usage on our three Fluentd Aggregators. We have been tweaking our environment but keep hitting the same problem: ingest works for almost a week, then the Aggregators run out of memory and ingest comes to a halt. Our monitoring graphs show the memory used on each Aggregator climbing slowly, roughly 5-10% per day, until it consumes most of the memory available to td-agent and the OS, at which point ingest stops. We then usually need to stop the Fluentd Forwarder, flush its buffer, restart the Aggregators, and make sure they are all up before restarting ingest on the Forwarder.

We don't see any issues with our file buffer or S3 buffer sizes; they are all working fine.

Our three Aggregators have the following specs:

16 Core (Currently 8 workers assigned in our Fluentd Config)
32GB RAM

We have followed the suggested Performance Tuning Guide, including the section on Ruby GC, but with no luck: https://docs.fluentd.org/deployment/performance-tuning-single-process

I added the following to the OS environment variables:

RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=1.2

and:

flush_thread_count 4 (for the buffer for our ingest into Elastic Search)

Using td-agent version 3.7.0 (Amazon Linux 2).

Config per Aggregator attached below:

<worker 0-7>
  <source>
    @type forward
    port 24224
    bind 0.0.0.0
    <transport tls>
      cert_path /etc/td-agent/certs/fluentd.crt
      private_key_path /etc/td-agent/certs/fluentd.key
      private_key_passphrase ############
    </transport>
    <security>
      self_hostname fluentd-aggregator
      shared_key ###########
      user_auth true
      <user>
        username test
        password ##########
      </user>
    </security>
  </source>

  <filter ######.all>
    @type parser
    key_name message
    <parse>
      @type grok
      grok_name_key grok_name
      grok_failure_key grokfailure
      # reserve_data true
      # reserve_time true
      <grok>
        pattern %{################}
      </grok>
      # (the block above repeats once per redacted grok pattern)
      <grok>
        name "################"
        pattern %{################}
      </grok>
      <grok>
        name "################"
        pattern %{################}
      </grok>
    </parse>
  </filter>

  <filter ######.all>
    @type elasticsearch_genid
    hash_id_key _hash
  </filter>

  <match ######.all>
    @type copy
    <store>
      @type s3
      store_as gzip_command
      s3_bucket ############
      path ############/
      s3_region us-west-2
      s3_object_key_format %{path}################_%{time_slice}_%{index}.%{file_extension}
      <buffer>
        @type file
        path /var/log/td-agent/s3_########
        chunk_limit_size 16MB
        total_limit_size 5120MB
        flush_at_shutdown true
        timekey 30
        timekey_use_utc true
        timekey_wait 10s
      </buffer>
    </store>
    <store>
      @type elasticsearch
      host ################
      user ################
      password ################
      include_timestamp true
      time_key @timestamp
      id_key _hash      # same key name as specified in hash_id_key
      remove_keys _hash # Elasticsearch doesn't like keys that start with _
      index_name ################
      log_es_400_reason true
      include_tag_key true
      include_timestamp true
      time_key @timestamp
      tag_key @log_name
      reconnect_on_error true
      reload_on_failure true
      reload_connections false
      <buffer>
        @type file
        path /var/log/td-agent/buffer_################
        # chunk + enqueue
        total_limit_size 10240MB
        chunk_limit_size 16MB
        flush_at_shutdown true
        flush_mode interval
        flush_thread_count 4
        flush_interval 5s
        retry_timeout 1h
        retry_max_interval 30
        overflow_action drop_oldest_chunk
      </buffer>
      <secondary>
        @type secondary_file
        directory /var/log/td-agent/error
        basename ################.*
      </secondary>
    </store>
  </match>
</worker>

<system>
  # equal to -qq option
  # log_level debug
  workers 8
</system>



Thanks,

Daniel

Mr. Fiber

Jun 30, 2020, 9:59:40 PM
to Fluentd Google Group
> I added the following to the OS Environmental Variables:

What does "OS environment variables" mean?
Does systemd get envvars from it?

> From our monitoring graphs we can see that the memory % used on each Aggregator starts to increase slowly each day 

Hmm... I have heard that the dentry cache caused similar behavior before, but I'm not sure whether it still happens.
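One way to check is to see how much of the "used" memory is actually reclaimable kernel slab, which is where dentries live. A Linux-only sketch in Ruby:

```ruby
# Linux-only sketch: dentry/inode caches sit in the kernel slab, and a
# large SReclaimable value can make "used" memory on monitoring graphs
# look like a process leak even though the kernel will give it back.
meminfo = File.read('/proc/meminfo')
%w[Slab SReclaimable SUnreclaim].each do |key|
  line = meminfo[/^#{key}:\s*\d+ kB$/]
  puts line if line
end
```

If SReclaimable is large, the theory can be tested non-destructively by dropping reclaimable caches as root (sync && echo 2 > /proc/sys/vm/drop_caches) and watching whether the graphs fall without restarting td-agent.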



Daniel Fisicaro

Jul 2, 2020, 8:05:29 AM
to Fluentd Google Group
Thanks for the response. 

On the fluentd site it says: 

Ruby has several GC parameters to tune GC performance and you can configure these parameters via environment variable.
So I added them to the Linux OS environment variables, but it has not helped.

If you run env, the variable shows up in the list of defined variables, and Fluentd was started with it set.
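One caveat worth noting: if td-agent is started by systemd, the service does not inherit variables from a login shell or /etc/profile, so env showing the variable in an interactive session does not guarantee the daemon sees it. A sketch of making it explicit with a systemd drop-in (unit name and path assumed for td-agent 3.x; RPM installs typically also read /etc/sysconfig/td-agent via EnvironmentFile if that file exists):

```ini
# /etc/systemd/system/td-agent.service.d/gc-tuning.conf  (assumed path)
[Service]
Environment="RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=1.2"
```

followed by systemctl daemon-reload and systemctl restart td-agent. The running daemon's actual environment can be confirmed with tr '\0' '\n' < /proc/<td-agent-pid>/environ.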

We have the same issue no matter which version of td-agent we use (3.7.0, and now 3.8.0, which we are testing). Memory constantly increases until it is at almost 100%, and only releases once it gets to that point. Ingest can keep working at times, but it would be good to work out a way to control it.
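To narrow down whether that growth is retained Ruby objects or off-heap memory (malloc fragmentation, C extensions), Ruby's built-in GC.stat counters can be sampled over time and compared against RSS; a sketch using plain Ruby (key names per Ruby 2.x, which td-agent 3 bundles):

```ruby
require 'json'

# Snapshot Ruby-heap counters via Ruby's built-in GC.stat API.
# Sampled periodically: if old_objects keeps climbing alongside RSS,
# retained Ruby objects are the likely culprit; if these counters stay
# flat while RSS grows, suspect off-heap memory instead.
def gc_snapshot
  s = GC.stat
  {
    heap_live_slots:         s[:heap_live_slots],
    old_objects:             s[:old_objects],
    total_allocated_objects: s[:total_allocated_objects],
    major_gc_count:          s[:major_gc_count]
  }
end

puts JSON.pretty_generate(gc_snapshot)
```

In the flat-GC-stat case, RUBY_GC_* tuning will not help much, and the allocator matters more (td-agent packages normally ship with jemalloc for this reason).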

I'll check about dentry cache.

Any other suggestions are appreciated. We have seen a few articles related to this issue.