Strange memory leak issue

Δημήτρης Χαλάτσης

Dec 14, 2020, 4:49:21 AM
to Fluentd Google Group
Hi guys,

I am new to FluentD and trying to manage a handed-over configuration that seems to have a memory leak problem. I have also created a GitHub issue, but I have come to believe it might be related to the configuration rather than an actual bug, since we can reproduce it on FluentD versions 1.6.3 through 1.11.4, and it seems rather improbable that such a major bug would go unnoticed across all those versions.

I will just quote the whole GitHub issue here for the time being, and if needed I will delete either the issue or this conversation (depending on which is the proper place for it). Thanks for understanding!


Describe the bug
We are up against a really strange and frustrating problem. I do not have any prior experience with FluentD, so I will try to describe the situation as completely as possible.

We have deployed FluentD as a DaemonSet in a Kubernetes cluster. FluentD is configured to gather logs from multiple sources (Docker daemon, network, etc.) and send them to a hosted AWS Elasticsearch.

Along with the logging mentioned above, we have in-app mechanisms that log directly to FluentD through a separate @type forward source created only for these in-app loggers; their events are then shipped through a match section with @type elasticsearch.
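
To make the shape concrete (the real config is in the pastebin linked further down; the port, the tag prefix and the Elasticsearch endpoint below are placeholders, not our actual values, and the options shown are just common fluent-plugin-elasticsearch ones), the relevant part looks roughly like this:

  <source>
    @type forward
    port 24224                # assumed default forward port
    bind 0.0.0.0
  </source>

  <match app.**>              # "app.**" stands in for whatever tag prefix the in-app loggers use
    @type elasticsearch
    host search-example.eu-west-1.es.amazonaws.com   # placeholder AWS Elasticsearch endpoint
    port 443
    scheme https
    logstash_format true
  </match>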

The problem is that this in-app log flow creates a slow but steady memory leak on the node it runs on. The even stranger thing is that the leak is not happening in userspace application memory: both the apps' and FluentD's process memory remain stable. What constantly increases is kernel memory, resulting in constantly decreasing available memory on the node until memory starvation problems begin. Note that I am referring to non-cache kernel memory that is not freed when reclaim is requested. The applications are not that logging-heavy; maximum throughput should be around 10 log lines/sec from all of them together.

This is not happening with any of the other log configurations in FluentD, where Docker, system, Kubernetes and other logs are scraped. If I turn off this in-app mechanism, there is no memory leak!

I have installed different monitoring tools on the server to see whether some other metric's trend correlates with the memory decrease. The only thing I found that matches closely is IPv4 TCP memory usage, which kind of makes sense, since TCP is how the in-app logs are sent to FluentD and it is also kernel-related. However, although the trend is similar, the actual amounts do not match: in the screenshots attached below, over the same time period, system memory decreases by around 700 MB while TCP memory usage increases by only about 30 MB. The trend, though, is a complete match!
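
(For reference, the kernel's own accounting of TCP socket memory can be read directly on the node; I assume the monitoring tool's "TCP memory usage" metric corresponds to the "mem" counter below, which is counted in pages, typically 4 KiB each.)

  # run on the node: the TCP line shows sockets in use and allocated TCP socket memory ("mem", in pages)
  cat /proc/net/sockstat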

Any help with this problem would be really appreciated! Feel free to ask for any extra information you might need.

Below are the details of my configuration and setup.

To Reproduce
A simple pod running a NodeJS app that sends logs directly to FluentD using the fluent-logger npm package is enough to cause the memory problem.
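
A rough sketch of what such a reproducer looks like (the tag, host, port and interval below are placeholders rather than our real app code; it just emits a small record every 100 ms, i.e. about 10 lines/sec):

  // reproducer sketch using the fluent-logger npm package
  const logger = require('fluent-logger');

  logger.configure('app', {            // placeholder tag prefix
    host: 'fluentd.logging.svc',       // placeholder address of the FluentD forward source
    port: 24224,
    timeout: 3.0
  });

  setInterval(() => {
    logger.emit('test', { message: 'hello from the reproducer', ts: Date.now() });
  }, 100);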

Expected behavior
I expect the kernel memory to remain stable when usage is also stable, as is the case with the rest of the logging configuration.

Your Environment

  • Fluentd or td-agent version: 1.11.4
  • Operating system: Debian GNU/Linux 9 (stretch)
  • Kernel version: 4.9.0-14-amd64
  • Kubernetes version: v1.16.15

Your Configuration
The FluentD DaemonSet is deployed using the latest chart version (v11.3.0) from https://github.com/kokuwaio/helm-charts/blob/main/charts/fluentd-elasticsearch/Chart.yaml
Since there is a lot of configuration, I will only include here the config relevant to the problem. If all of it is needed, let me know and I will put the rest in a pastebin as well.

FluentD Config: https://pastebin.com/g7CNrphr

Δημήτρης Χαλάτσης

Dec 14, 2020, 8:34:53 AM
to Fluentd Google Group
Hi,

I am also adding the screenshots referenced in the issue.

Available memory decrease: https://pasteboard.co/JESysQ6.png
TCP memory increase: https://pasteboard.co/JESyPPp.png

Thanks!

Mr. Fiber

Dec 16, 2020, 8:15:33 AM
to Fluentd Google Group
I'm not familiar with the fluent-logger node library.
Your graph shows that TCP socket memory increases.
Does this mean the TCP connection is not released after sending to fluentd, or
that the TCP connection is closed correctly but TCP socket memory still keeps increasing?
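
One rough way to check, if you can run it inside the Fluentd pod's network namespace (this assumes ss is available there and that the forward source listens on the default port 24224):

  # if this count keeps growing, connections are piling up;
  # if it stays flat while the TCP "mem" value in /proc/net/sockstat grows, memory stays on existing sockets
  ss -tn 'sport = :24224' | grep -c ESTAB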


Δημήτρης Χαλάτσης

Dec 16, 2020, 4:35:29 PM
to Fluentd Google Group
There is a single persistent TCP connection from the app to the FluentD server, so no memory is spent on extra sockets. Somehow memory usage accumulates on the app -> FluentD TCP communication.
I don't actually think there is a problem with the NodeJS fluent-logger module, but since you mentioned it, I will try to test with something different that produces similar traffic to the FluentD service.
Is there a ready-made dummy service that produces fluent traffic?
If not, I will probably spin one up with Python.
