Limit RAM usage


Christoph Puppe

Feb 28, 2017, 5:51:22 AM
to Fluentd Google Group
Hi

I've searched and tried ... but nothing helped.

Env is: GCE Compute Engine, f1-micro, for very small container-like deployments. So RAM is very limited.

Checking top, ruby is eating RAM.
  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND   
10453 root      20   0  475636 126540   9352 S  0.7 21.0   0:05.74 ruby                                         
10448 root      20   0  145092  45944   4220 S  0.0  7.6   0:00.12 ruby  

I'm using the standard config, just added:
buffer_queue_limit 24

Cgroups seem to be the only way to limit this process. Any other ideas?

Thanks for taking the time to think about it! :)

Christoph

Christoph Puppe

Feb 28, 2017, 9:13:14 AM
to Fluentd Google Group
Ok, this is not fun ...

Managed to limit the memory with systemd ...

Moved the autogenerated service file to the regular place, added a memory limit, and started. The limit is enforced.

Reactions:

50M -> service fails to load
150M -> service starts, but gets killed and restarted ... eating up the whole CPU
200M -> all fine, service stays at 170+ MB RAM and is happy
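For reference, the experiment above amounts to a systemd drop-in roughly like this (a sketch; the unit name `google-fluentd.service` and the exact directive are assumptions about the setup):

```
# /etc/systemd/system/google-fluentd.service.d/memory.conf  (hypothetical path)
[Service]
# Hard memory limit; per the reactions above, anything below ~200M
# either fails to start or gets the service OOM-killed in a loop.
MemoryLimit=200M
```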

The goal of reducing memory usage is not reached.

Any ideas?

Or an alternative to fluentd to send the logs to stackdriver?

Mr. Fiber

Feb 28, 2017, 6:47:51 PM
to Fluentd Google Group
Any ideas?


Could you try this setting?
If it doesn't help, your traffic needs 170MB average.


Masahiro

--
You received this message because you are subscribed to the Google Groups "Fluentd Google Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fluentd+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Christoph Puppe

Mar 1, 2017, 5:29:24 AM
to flu...@googlegroups.com
Thanks for the hint. After the first restart it didn't change anything.

But while I'm at it I'll limit the CPU shares on fluentd as well. So when bursts come in, customers are served first and log files are uploaded later. Only 300 CPU shares.

After a restart of the service it went down to 140 MB. Don't know if that's related.

I still think a third of the RAM for log-file uploads is too much :)




--
Regards

Pluto   -   SysAdmin of Hades
Free information! Freedom through knowledge. Wisdom for all!! =:-)

Waiting without hoping. Being without wanting. Acting without intent. (Tao)

Kiyoto Tamura

Mar 1, 2017, 1:59:40 PM
to flu...@googlegroups.com
Christoph,

Can you share the config file? Typically, users end up with memory-based buffering inadvertently, and that can lead to a high memory footprint. This is common among people coming to us from k8s/GCP, as many configs out in the wild default to memory-based buffering (as opposed to disk-based).
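A minimal sketch of what switching from the (default) memory buffer to the file buffer looks like in v0.12 syntax; the match pattern and buffer path here are assumptions:

```
<match **>
  type google_cloud
  # memory buffering is the default; with the file buffer, queued
  # chunks live on disk instead of in the Ruby heap
  buffer_type file
  buffer_path /var/log/fluent/out.*.buffer
</match>
```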

Kiyoto
Refer a friend to Treasure Data and get paid! Ask me about our referral program =)

Christoph Puppe

Mar 2, 2017, 5:06:32 AM
to flu...@googlegroups.com
Hi

I've changed to a 1.7 GB RAM instance; I needed a stable service, but at this size per node, autoscaling becomes quite expensive. My service doesn't need that big an instance, and paying 3 times the money just for log aggregation seems ineffective :(

Now fluentd is at 300 MB:

 Process: 24980 ExecStart=/etc/init.d/google-fluentd start (code=exited, status=0/SUCCESS)
    Tasks: 105
   Memory: 295.8M
      CPU: 4.527s

My google-fluentd.conf:
<match **>
  type google_cloud
  # Set the chunk limit conservatively to avoid exceeding the limit
  # of 10MB per write request.
  buffer_chunk_limit 2M
  buffer_queue_limit 24
  flush_interval 5s
  # Never wait longer than 5 minutes between retries.
  max_retry_wait 300
  # Disable the limit on the number of retries (retry forever).
  disable_retry_limit
  # Use multiple threads for processing.
  num_threads 8
  buffer_type file
  buffer_path /var/log/fluent/myapp.*.buffer
</match>
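For reference, a rough upper bound on what this buffer can queue (it's a file buffer, so this lands on disk rather than in RAM; per-chunk Ruby overhead is ignored):

```python
# buffer_chunk_limit (2 MB) x buffer_queue_limit (24 chunks)
buffer_chunk_limit_mb = 2
buffer_queue_limit = 24

max_buffered_mb = buffer_chunk_limit_mb * buffer_queue_limit
print(max_buffered_mb)  # -> 48 (MB queued at most)
```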
And I'm trying to limit memory in google-fluentd.service with:
[Service]
MemoryHigh=150M
CPUShares=300
Type=forking

Xavi Ramirez

May 8, 2017, 9:50:28 PM
to Fluentd Google Group
I'm running into something similar. It feels like there's no way to limit fluentd's memory consumption. I'm using fluentd to forward logs to a Kinesis stream. Memory utilization seems reasonable when I first start the container, but if there's any kind of disruption (i.e. if it can't send logs for a while), fluentd's memory seems to grow without bound while catching up. Once it has caught up, it continues to use an exorbitant amount of memory (~1 GB). It doesn't go back down, even after 24 hours. Reducing buffer_queue_limit has no effect.

Here's what my config looks like:

<source>
  @type monitor_agent
  bind 0.0.0.0
  port 9001
</source>

<source>
  @type tail
  path "#{ENV['LOG_FILE']}"
  tag logfile
  pos_file "#{ENV['LOG_FILE_POS']}"
  format none
</source>

<match logfile>
  @type kinesis_producer

  # no explicit credential config
  # the ruby aws lib should end up using AWS_* env vars in test,
  # ECS role in dev and prod
  region "#{ENV['KINESIS_REGION']}"
  stream_name "#{ENV['KINESIS_STREAM']}"

  # format none in the tail source generates an event {"message":"<log line>"}
  # data_key tells the kinesis plugin to extract the "message" key in order to send just the log line
  data_key message

  # kinesis plugin recommends these fluentd buffer settings:
  buffer_chunk_limit 1m
  buffer_queue_full_action block
  flush_interval 1
  try_flush_interval 0.1
  queued_chunk_flush_interval 0.01
  num_threads 15

  buffer_queue_full_action block
  buffer_queue_limit 16

  <kinesis_producer>
    aggregation_enabled true
    log_level info
    record_max_buffered_time 300
    record_ttl 120000
    metrics_namespace "#{ENV['KINESIS_METRICS_NAMESPACE']}"
  </kinesis_producer>
</match>

I've also set RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR to 1.3. Any pointers would be much appreciated.

Thanks,
Xavi

Mr. Fiber

May 9, 2017, 8:11:14 AM
to Fluentd Google Group
I've also set RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR to 1.3.  

How about `0.9`?
And do you use td-agent or vanilla fluentd?
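If it helps, one place to put that knob for a service install (a sketch; the drop-in path and unit name are assumptions — in a container you'd pass it via `docker run -e` or an `ENV` line instead):

```
# /etc/systemd/system/td-agent.service.d/gc.conf  (hypothetical path)
[Service]
Environment=RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=0.9
```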

Christoph Puppe

May 9, 2017, 8:30:39 AM
to flu...@googlegroups.com
The problem has now started to eat up all the RAM on the box, and Apache becomes unresponsive.

Sorry to say, but fluentd will be uninstalled soon. Sadly, Stackdriver has only this as a means of getting data in, so I'll lose that functionality as well.

Xavi Ramirez

May 9, 2017, 2:33:37 PM
to Fluentd Google Group
I tried setting RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR to `0.9` and I'm seeing the same memory behavior.

I'm using vanilla fluentd in a container. Here's the Dockerfile:

```
FROM fluent/fluentd:v0.12-debian

# need access to /var/log, so run as root :-/
USER root

RUN fluent-gem install fluent-plugin-kinesis -v 1.1.3

ADD etc/config.conf /fluentd/etc/config.conf
```

Thanks again,
Xavi