Fluentd Memory Usage


Bryan York

Oct 18, 2013, 5:41:25 PM
to flu...@googlegroups.com
Hello,

I'm currently using Fluentd to parse Scribe messages and put them into ElasticSearch for use with Kibana.

My config:

<source>
  type scribe
  port 1463
  msg_format json
</source>

<match *>
  type elasticsearch
  logstash_format true
  include_tag_key true
  tag_key _key
  buffer_type file
  buffer_path /mnt/fluentd/fluentd.log
  flush_at_shutdown true
  buffer_chunk_limit 1m
  buffer_queue_limit 100
  flush_interval 30s # for testing
</match>

Plugins:

Everything works, except for two things:

1) Memory usage continues to grow. I've tried Ruby 1.9.3p448 and Ruby 2.0.0p195, and both leak memory.
2) I can either __always__ parse JSON or never parse JSON; I'd like to be able to handle both. I can only set one msg_format in my scribe source. (Maybe I should file a bug with the scribe plugin?)

I really would like to figure out how to deal with this memory leak I seem to have.

Thanks,

-Bryan

Kazuki Ohta

Oct 18, 2013, 6:01:05 PM
to flu...@googlegroups.com
Bryan,

> 1) Memory usage continues to leak. I've tried Ruby 1.9.3p448 and Ruby 2.0.0p195 and they both leak memory.

Could you give me your `gem list`? I want to check the version of the msgpack gem, because msgpack v0.4 has a known fragmentation issue. We recommend using v0.5.

In addition to that, td-agent uses jemalloc to suppress memory fragmentation.
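If you run plain fluentd rather than td-agent, you can get the same effect by preloading jemalloc yourself. This is a sketch only: the jemalloc path and config path below are assumptions and vary by distro (e.g. `libjemalloc.so.1` on Debian/Ubuntu), so adjust them for your system.

```shell
# Build the command line that runs plain fluentd under jemalloc,
# mimicking what the td-agent package does.
# NOTE: both paths below are assumptions; adjust for your distro.
JEMALLOC="/usr/lib/libjemalloc.so"
FLUENTD_CMD="fluentd -c /etc/fluent/fluent.conf"

if [ -f "$JEMALLOC" ]; then
  # Preload jemalloc so Ruby's heap fragments less under heavy load
  CMD="env LD_PRELOAD=$JEMALLOC $FLUENTD_CMD"
else
  # Fall back to the default allocator if jemalloc isn't installed
  CMD="$FLUENTD_CMD"
fi

echo "$CMD"
```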

> 2) I can only __always__ parse JSON, or never parse JSON. I'd like to have the ability to parse both. I can only set one msg_format in my scribe source. (Maybe I should file a bug with the scribe plugin?)

Yeah, we don't have that feature right now. Please create an issue describing what you want (I'm one of the maintainers of this plugin).


Thanks -K

Bryan York

Oct 18, 2013, 6:16:12 PM
to flu...@googlegroups.com
root@prod-kibana-01:~# gem list --local

*** LOCAL GEMS ***

cool.io (1.1.1)
fluent-plugin-elasticsearch (0.1.4)
fluent-plugin-scribe (0.10.11)
fluentd (0.10.39)
http_parser.rb (0.5.3)
iobuffer (1.1.2)
json (1.8.0)
msgpack (0.5.6)
scribe-rb (2.2.1)
thrift (0.8.0)
yajl-ruby (1.1.0)

Also, I'm using straight fluentd, not td-agent. Is that ok?

Thanks,

-Bryan

Kazuki Ohta

Oct 18, 2013, 6:24:16 PM
to flu...@googlegroups.com
Bryan,

Sounds like you're already using msgpack v0.5 :( td-agent is essentially just fluentd packaged with jemalloc, so plain fluentd should be fine.

Since we haven't seen this problem before, we'd like to narrow it down with a few experiments.

Could you try the following settings for a while? I want to determine first whether the cause is the Scribe plugin or something else.

<source>
  type scribe
  port 1463
  msg_format json
</source>

<match *>
  type null
</match>
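To make the experiment measurable, it helps to sample fluentd's resident memory over time and see whether it still grows with output disabled. A minimal sketch (the pid lookup is up to you, e.g. `pgrep -f fluentd`):

```shell
# Sample a process's resident set size (RSS) in MB. Run this
# periodically (loop or cron) during the null-output experiment
# and compare samples to see whether memory keeps growing.
rss_mb() {
  ps -o rss= -p "$1" | awk '{ printf "%.1f\n", $1 / 1024 }'
}

# Demo: sample the current shell's own RSS
rss_mb "$$"
```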

Sorry for the inconvenience. Thanks -K


 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
::   Kazuki Ohta - [http://www.treasure-data.com]
::   Founder and CTO, Treasure Data, Inc
 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 




Bryan York

Oct 18, 2013, 6:30:29 PM
to flu...@googlegroups.com
I just tried that, and within 5 minutes I was already up to 4.0 GB of resident memory. We log a lot of info here (about 90 GB/day), so there's a lot of data coming in. Additionally, fluentd's CPU usage is almost always at 100% of one core, since it's single-threaded.

Thanks,

-Bryan

Kazuki Ohta

Nov 30, 2013, 7:12:53 AM
to flu...@googlegroups.com, by...@jawbone.com
Bryan,

Maybe it's too late, but Treasure Data has started recommending the in_multiprocess plugin to customers for solving CPU bottlenecks.


It's really simple but works just fine. We now have a customer who is collecting 10+ billion records / day (some TBs). Thanks -K
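For reference, a multiprocess setup looks roughly like the sketch below, based on the fluent-plugin-multiprocess README: a parent fluentd spawns child fluentd processes, each with its own config, so work spreads across cores. The child config and log paths here are hypothetical placeholders.

```
<source>
  type multiprocess
  <process>
    cmdline -c /etc/fluent/fluentd_child1.conf --log /var/log/fluent/child1.log
    sleep_before_start 1s
    sleep_before_shutdown 5s
  </process>
  <process>
    cmdline -c /etc/fluent/fluentd_child2.conf --log /var/log/fluent/child2.log
    sleep_before_start 1s
    sleep_before_shutdown 5s
  </process>
</source>
```

Each child config would contain its own source (e.g. the scribe input on a distinct port) and match sections.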
--
--------------------------------------------------
Kazuki Ohta: http://kzk9.net/