[URGENT] Failed to send logs. Size of the emitted data exceeds buffer_chunk_limit


astro

Nov 25, 2015, 5:42:58 AM
to Fluentd Google Group
I am new to td-agent. We're storing ~10-15 GB of logs daily in an EFK (Elasticsearch, Fluentd, Kibana) stack.

But after some time I start getting the following warning in the logs, and no logs are sent to Elasticsearch:

2015-11-22 09:34:11 +0000 [warn]: Size of the emitted data exceeds buffer_chunk_limit.
2015-11-22 09:34:11 +0000 [warn]: This may occur problems in the output plugins ``at this server.``
2015-11-22 09:34:11 +0000 [warn]: To avoid problems, set a smaller number to the buffer_chunk_limit
2015-11-22 09:34:11 +0000 [warn]: in the forward output ``at the log forwarding server.``

Looking at this post, I tried to change buffer_chunk_limit on both the server side and the agent side, based on this advice:

"This error happened when your one chunk is larger than buffer_chunk_limit of destination server.

You should decrease the buffer_chunk_limit of agent server and
increase the buffer_chunk_limit of destination server."
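For illustration, this is roughly the kind of change I tried on each side (a simplified sketch with placeholder tags and hosts, not my exact config):

# Agent (forwarder) side -- sketch only
<match app.**>
  type forward                  # plain forward shown for brevity; we actually use secure_forward
  buffer_chunk_limit 512k       # keep chunks leaving the agent small
  flush_interval 5s
  <server>
    host aggregator.example.com
    port 24224
  </server>
</match>

# Server (aggregator) side -- sketch only
<match app.**>
  type elasticsearch
  host es.example.com
  port 9200
  logstash_format true
  buffer_chunk_limit 8m         # allow larger chunks at the destination
</match>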

But it still fails with the same warning.

I have tried the following combinations of buffer_chunk_limit:

Server: 8m
Agent side: 512k

Server: 8m
Agent side: 4m

Server: 50m
Agent side: 4m

But still it fails. I also get:

2015-11-25 04:17:53 +0000 [warn]: unexpected error in in_secure_forward error_class=Fluent::BufferQueueLimitError error=#<Fluent::BufferQueueLimitError: queue size exceeds limit>

There are close to 20 clients sending logs to the td-agent server.

Kibana 4.2, ES 2.0.0, fluentd 0.12.12, 8 GB memory.


What would be the right values of buffer_chunk_limit and buffer_queue_limit (if required) on the agent side and the server side to avoid this issue, given that there is no obvious way to estimate what these values should be?

Any help will be appreciated.






Mr. Fiber

Nov 25, 2015, 11:15:15 AM
to Fluentd Google Group
First, please paste your forwarder-side and aggregator-side configurations here.

> 2015-11-22 09:34:11 +0000 [warn]: Size of the emitted data exceeds buffer_chunk_limit.

This is only a warning, so by itself it doesn't cause a 'failed to send logs'.
This warning appears in the aggregator logs, right?


> 2015-11-25 04:17:53 +0000 [warn]: unexpected error in in_secure_forward error_class=Fluent::BufferQueueLimitError error=#<Fluent::BufferQueueLimitError: queue size exceeds limit>

"queue size exceeds limit" sometimes happens when your destination becomes slow
or has a problem.
I think your ES cluster can't handle your log traffic so fluentd's buffer is growing.
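If you want the aggregator to absorb temporary slowness instead of hitting the queue limit, a file buffer with a larger queue is one option. This is just a sketch with a placeholder tag and host; tune the numbers for your traffic and disk:

<match your.logs.**>
  type elasticsearch
  host your-es-host
  port 9200
  logstash_format true
  buffer_type file                          # spill chunks to disk instead of keeping them in memory
  buffer_path /var/log/td-agent/buffer/es
  buffer_chunk_limit 8m
  buffer_queue_limit 256                    # more queued chunks before BufferQueueLimitError
  flush_interval 5s
</match>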


Masahiro


astro

Nov 26, 2015, 2:09:55 AM
to Fluentd Google Group
We've scaled up our ES instance, so as of now I'm not seeing any warnings.

Thanks, Masahiro. 

Mr. Fiber

Nov 30, 2015, 3:20:26 AM
to Fluentd Google Group
Good to hear.
Elasticsearch indexes incoming data by default, so it needs more resources for data import than data-analytics platforms like Hadoop, Treasure Data, or BigQuery.



astro

Dec 9, 2015, 1:14:43 AM
to Fluentd Google Group
Hi Masahiro,

The same issue has arisen again. We have scaled up the ES cluster from 8 GB to 32 GB, and now I don't see any errors on the ES side.
Unexpectedly, I see the same warnings in the fluentd logs:

2015-12-08 15:08:58 +0000 [warn]: suppressed same stacktrace
2015-12-08 15:09:31 +0000 [warn]: Could not push logs to Elasticsearch, resetting connection and trying again. read timeout reached
2015-12-08 15:09:33 +0000 [info]: Connection opened to Elasticsearch cluster => {:host=>"172.6.7.45", :port=>9200, :scheme=>"http"}
2015-12-08 15:10:05 +0000 [warn]: Could not push logs to Elasticsearch, resetting connection and trying again. read timeout reached
2015-12-08 15:10:09 +0000 [info]: Connection opened to Elasticsearch cluster => {:host=>"172.6.7.45", :port=>9200, :scheme=>"http"}
2015-12-08 15:10:40 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2015-12-08 15:04:27 +0000 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Could not push logs to Elasticsearch after 2 retries. read timeout reached" plugin_id="object:3fdbcf2d2b6c" 

The next_retry in this log line is a bit confusing, since it points to a time in the past:

2015-12-08 15:10:40 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2015-12-08 15:04:27 +0000 

As suggested, I checked the ES logs around this timestamp but found no errors or warnings in the ES cluster. The cluster was functioning normally.
In all cases, whenever I restarted the fluentd service, it started sending logs to ES again. I have observed this many times by now.

Is there any configuration in fluentd to make it retry pushing logs more frequently?

What may be the issue?

In addition, this is what the fluentd logs look like when logs are not being forwarded:

2015-12-08 14:57:03 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-elasticsearch-1.1.0/lib/fluent/plugin/out_elasticsearch.rb:184:in `rescue in send'
  2015-12-08 14:57:03 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-elasticsearch-1.1.0/lib/fluent/plugin/out_elasticsearch.rb:174:in `send'
  2015-12-08 14:57:03 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-elasticsearch-1.1.0/lib/fluent/plugin/out_elasticsearch.rb:168:in `write'
  2015-12-08 14:57:03 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.12/lib/fluent/buffer.rb:325:in `write_chunk'
  2015-12-08 14:57:03 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.12/lib/fluent/buffer.rb:304:in `pop'
  2015-12-08 14:57:03 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.12/lib/fluent/output.rb:321:in `try_flush'
  2015-12-08 14:57:03 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.12/lib/fluent/output.rb:140:in `run'

td-agent.conf at the server:

<match text**, test2** >
   type elasticsearch
   logstash_format true
   flush_interval 5s #debug
  # buffer_chunk_limit 50m
   host 172.19.23.34
   port 9200
   include_tag_key true
   tag_key _key
   index_name fluentd
   type_name fluentd
</match>
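For what it's worth, these are the retry/timeout knobs I am considering adding to that match block, based on the buffered-output and elasticsearch-plugin docs (the values are guesses, not something I have verified):

<match text**, test2** >
   type elasticsearch
   logstash_format true
   host 172.19.23.34
   port 9200
   include_tag_key true
   tag_key _key
   index_name fluentd
   type_name fluentd
   flush_interval 5s
   request_timeout 15s          # elasticsearch plugin read timeout, if this plugin version supports it
   retry_wait 1s                # initial interval before the first flush retry
   max_retry_wait 30s           # cap the exponential backoff between retries
   buffer_type file             # persist the buffer to disk across restarts
   buffer_path /var/log/td-agent/buffer/es
   buffer_chunk_limit 8m
   buffer_queue_limit 256
</match>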


 Thanks in advance!

 

Mr. Fiber

Dec 9, 2015, 5:18:24 AM
to Fluentd Google Group
How about sending events to multiple ES nodes?
Having a single instance receive all logs is not the ES way.




astro

Dec 9, 2015, 8:12:49 AM
to Fluentd Google Group
I tried that a few days back, but due to many such issues I replaced it with a single host.


Thanks!
 

astro

Dec 9, 2015, 8:30:35 AM
to Fluentd Google Group


 
Update: We have an ES cluster with 2 nodes.

astro

Mar 16, 2016, 1:41:19 AM
to Fluentd Google Group
Hi Masahiro,

I see lots of [warn]: emit transaction failed: error_class=Fluent::BufferQueueLimitError error="queue size exceeds limit" messages in the logs. This time, we're sending logs to multiple ES servers. The ES servers have sufficient memory and look completely fine.

Using fluentd v0.12.12

Is there any way to handle this issue?

Thanks!

Mr. Fiber

Mar 16, 2016, 5:12:53 AM
to Fluentd Google Group
> The ES servers have sufficient memory and look completely fine.

Did you store the buffer size history from monitor_agent?
Importing logs into Elasticsearch takes a fairly long time under heavy traffic,
so if your traffic has spikes, that may be the cause.
Or, if you have network trouble, flushing the buffer is delayed.
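If you haven't set it up yet, something like this exposes buffer metrics over HTTP so you can record them over time (a minimal sketch):

<source>
  type monitor_agent
  bind 0.0.0.0
  port 24220
</source>

# then poll it periodically, e.g. from cron:
#   curl http://localhost:24220/api/plugins.json
# and record buffer_queue_length / buffer_total_queued_size for the elasticsearch output.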



astro

Apr 8, 2016, 4:23:06 AM
to Fluentd Google Group
 
> Did you store the buffer size history from monitor_agent?
No.

Recently we are facing this issue more frequently. On the other side, the Elasticsearch cluster is healthy, has a lot of free memory, and comparatively little data.
Using fluentd v0.12.12.
Seeing lots of emit transaction failed: error_class=Fluent::BufferQueueLimitError error="queue size exceeds limit" in the logs.

Is there any quick workaround to avoid this? We restart td-agent in such cases to make it send logs again.

Thanks.
       

Mr. Fiber

Apr 8, 2016, 6:05:48 AM
to Fluentd Google Group
> The Elasticsearch cluster is healthy

How about the CPU usage of Fluentd and Elasticsearch?
And is bandwidth also not a problem?

If there are no error/warning logs in the fluentd logs,
then the flushing speed is simply slower than the rate of incoming logs.
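If flushing simply can't keep up, one knob worth trying on the elasticsearch output is num_threads, which flushes multiple buffer chunks in parallel (a sketch; the default is 1):

<match text**, test2** >
   type elasticsearch
   # ... your existing host / format settings ...
   num_threads 4        # flush several buffer chunks in parallel
</match>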



astro

Apr 8, 2016, 6:16:29 AM
to Fluentd Google Group




> How about the CPU usage of Fluentd and Elasticsearch?
CPU usage is very low on both.

> And is bandwidth also not a problem?
No issues there.

> If there are no error/warning logs in the fluentd logs
I am seeing lots of emit transaction failed: error_class=Fluent::BufferQueueLimitError error="queue size exceeds limit" in the td-agent logs.



astro

Apr 8, 2016, 9:23:18 AM
to Fluentd Google Group
Below are more verbose logs:

2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.3.2/lib/fluent/plugin/in_secure_forward.rb:256:in `on_message'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.3.2/lib/fluent/plugin/input_session.rb:124:in `on_read'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.3.2/lib/fluent/plugin/input_session.rb:185:in `feed_each'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.3.2/lib/fluent/plugin/input_session.rb:185:in `block in start'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.3.2/lib/fluent/plugin/input_session.rb:178:in `loop'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.3.2/lib/fluent/plugin/input_session.rb:178:in `start'
2016-04-08 13:17:35 +0000 [warn]: unexpected error in in_secure_forward error_class=Fluent::BufferQueueLimitError error=#<Fluent::BufferQueueLimitError: queue size exceeds limit>
2016-04-08 13:17:35 +0000 [warn]: emit transaction failed: error_class=Fluent::BufferQueueLimitError error="queue size exceeds limit" tag="nginx-ip-10.0.22.243"
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.12/lib/fluent/buffer.rb:189:in `block in emit'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/2.1.0/monitor.rb:211:in `mon_synchronize'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.12/lib/fluent/buffer.rb:179:in `emit'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.12/lib/fluent/output.rb:251:in `emit'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.12/lib/fluent/event_router.rb:88:in `emit_stream'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.12/lib/fluent/engine.rb:116:in `emit_stream'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.3.2/lib/fluent/plugin/in_secure_forward.rb:256:in `on_message'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.3.2/lib/fluent/plugin/input_session.rb:124:in `on_read'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.3.2/lib/fluent/plugin/input_session.rb:185:in `feed_each'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.3.2/lib/fluent/plugin/input_session.rb:185:in `block in start'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.3.2/lib/fluent/plugin/input_session.rb:178:in `loop'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.3.2/lib/fluent/plugin/input_session.rb:178:in `start'
2016-04-08 13:17:35 +0000 [warn]: unexpected error in in_secure_forward error_class=Fluent::BufferQueueLimitError error=#<Fluent::BufferQueueLimitError: queue size exceeds limit>

Is there any solution or workaround for this? It is becoming more severe and is causing instability in the dashboard app.

Thanks. 

astro

Apr 8, 2016, 3:30:31 PM
to Fluentd Google Group
Any thoughts on this?

Mr. Fiber

Apr 8, 2016, 7:10:50 PM
to Fluentd Google Group
I want to know how long the elasticsearch plugin takes to flush events.
Could you insert the following '+' lines into your copy of the elasticsearch plugin?

diff --git a/lib/fluent/plugin/out_elasticsearch.rb b/lib/fluent/plugin/out_elasticsearch.rb
index fdc52ed..589f27c 100644
--- a/lib/fluent/plugin/out_elasticsearch.rb
+++ b/lib/fluent/plugin/out_elasticsearch.rb
@@ -189,6 +189,7 @@ class Fluent::ElasticsearchOutput < Fluent::BufferedOutput
   end

   def write(chunk)
+    start = Time.now
     bulk_message = []

     chunk.msgpack_each do |tag, time, record|
@@ -229,6 +230,8 @@ class Fluent::ElasticsearchOutput < Fluent::BufferedOutput

     send(bulk_message) unless bulk_message.empty?
     bulk_message.clear
+
+    log.info "Flushing event took #{Time.now - start} seconds"
   end

   def send(data)
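With that patch applied and td-agent restarted, you can grep the td-agent log for "Flushing event took" to see how long each flush actually takes.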


BTW, if you have a problem with the buffer,
collecting buffer metrics with in_monitor_agent is good for the investigation, e.g. to see when the buffer starts growing.


