[URGENT] Failed to send logs. Size of the emitted data exceeds buffer_chunk_limit


astro

Nov 25, 2015, 5:42:58 AM
to Fluentd Google Group
I am new to td-agent. We're storing ~10-15 GB of logs daily in an EFK (Elasticsearch, Fluentd, Kibana) stack.

But after some time I start getting the following warning in the logs, and no logs are sent to Elasticsearch:

2015-11-22 09:34:11 +0000 [warn]: Size of the emitted data exceeds buffer_chunk_limit.
2015-11-22 09:34:11 +0000 [warn]: This may occur problems in the output plugins ``at this server.``
2015-11-22 09:34:11 +0000 [warn]: To avoid problems, set a smaller number to the buffer_chunk_limit
2015-11-22 09:34:11 +0000 [warn]: in the forward output ``at the log forwarding server.``

Looking at this post, I tried to change buffer_chunk_limit on both the server side and the agent side, based on this advice:

"This error happened when your one chunk is larger than buffer_chunk_limit of destination server.

You should decrease the buffer_chunk_limit of agent server and
increase the buffer_chunk_limit of destination server."
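For illustration, this is roughly the kind of change I tried on each side (a simplified sketch with placeholder tags and hosts, not my exact config):

# Agent (forwarder) side -- sketch only
<match app.**>
  type forward                  # plain forward shown for brevity; we actually use secure_forward
  buffer_chunk_limit 512k       # keep chunks leaving the agent small
  flush_interval 5s
  <server>
    host aggregator.example.com
    port 24224
  </server>
</match>

# Server (aggregator) side -- sketch only
<match app.**>
  type elasticsearch
  host es.example.com
  port 9200
  logstash_format true
  buffer_chunk_limit 8m         # allow larger chunks at the destination
</match>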

But it still fails with the same warning.

I have tried the following combinations of buffer_chunk_limit:

Server: 8m
Agent side: 512k

Server: 8m
Agent side: 4m

Server: 50m
Agent side: 4m

But still it fails. I also get:

2015-11-25 04:17:53 +0000 [warn]: unexpected error in in_secure_forward error_class=Fluent::BufferQueueLimitError error=#<Fluent::BufferQueueLimitError: queue size exceeds limit>

There are close to 20 clients sending logs to the td-agent server.

Kibana 4.2, ES 2.0.0, fluentd 0.12.12, 8 GB memory.


What would be the right values of buffer_chunk_limit and buffer_queue_limit (if required) on the agent side and the server side to avoid this issue, given that there is no obvious way to estimate what these values should be?

Any help will be appreciated.






Mr. Fiber

Nov 25, 2015, 11:15:15 AM
to Fluentd Google Group
First, please paste your forwarder-side and aggregator-side configurations here.

> 2015-11-22 09:34:11 +0000 [warn]: Size of the emitted data exceeds buffer_chunk_limit.

This is only a warning, so by itself it doesn't cause a 'failed to send logs'.
This warning appears in the aggregator logs, right?


> 2015-11-25 04:17:53 +0000 [warn]: unexpected error in in_secure_forward error_class=Fluent::BufferQueueLimitError error=#<Fluent::BufferQueueLimitError: queue size exceeds limit>

"queue size exceeds limit" sometimes happens when your destination becomes slow
or has a problem.
I think your ES cluster can't handle your log traffic so fluentd's buffer is growing.
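If you want the aggregator to absorb temporary slowness instead of hitting the queue limit, a file buffer with a larger queue is one option. This is just a sketch with a placeholder tag and host; tune the numbers for your traffic and disk:

<match your.logs.**>
  type elasticsearch
  host your-es-host
  port 9200
  logstash_format true
  buffer_type file                          # spill chunks to disk instead of keeping them in memory
  buffer_path /var/log/td-agent/buffer/es
  buffer_chunk_limit 8m
  buffer_queue_limit 256                    # more queued chunks before BufferQueueLimitError
  flush_interval 5s
</match>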


Masahiro


astro

Nov 26, 2015, 2:09:55 AM
to Fluentd Google Group
We've scaled up our ES instance, so as of now I'm not seeing any warnings.

Thanks, Masahiro. 

Mr. Fiber

Nov 30, 2015, 3:20:26 AM
to Fluentd Google Group
Good to hear.
Elasticsearch indexes incoming data by default, so it needs more resources for data import than data-analytics platforms like Hadoop, Treasure Data, or BigQuery.



astro

Dec 9, 2015, 1:14:43 AM
to Fluentd Google Group
Hi Masahiro,

The same issue has arisen again. We have scaled up the ES cluster from 8 GB to 32 GB, and now I don't see any errors on the ES side.
Unexpectedly, I see the same warnings in the fluentd logs:

2015-12-08 15:08:58 +0000 [warn]: suppressed same stacktrace
2015-12-08 15:09:31 +0000 [warn]: Could not push logs to Elasticsearch, resetting connection and trying again. read timeout reached
2015-12-08 15:09:33 +0000 [info]: Connection opened to Elasticsearch cluster => {:host=>"172.6.7.45", :port=>9200, :scheme=>"http"}
2015-12-08 15:10:05 +0000 [warn]: Could not push logs to Elasticsearch, resetting connection and trying again. read timeout reached
2015-12-08 15:10:09 +0000 [info]: Connection opened to Elasticsearch cluster => {:host=>"172.6.7.45", :port=>9200, :scheme=>"http"}
2015-12-08 15:10:40 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2015-12-08 15:04:27 +0000 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Could not push logs to Elasticsearch after 2 retries. read timeout reached" plugin_id="object:3fdbcf2d2b6c" 

The next_retry in this log line is a bit confusing, since it points to a time in the past:

2015-12-08 15:10:40 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2015-12-08 15:04:27 +0000 

As suggested, I checked the ES logs around this timestamp but found no errors or warnings in the ES cluster. The cluster was functioning normally.
In all cases, whenever I restarted the fluentd service, it started sending logs to ES again. I have observed this many times by now.

Is there any configuration in fluentd to make it retry pushing logs more frequently?

What may be the issue?

In addition, this is what the fluentd logs look like when logs are not being forwarded:

2015-12-08 14:57:03 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-elasticsearch-1.1.0/lib/fluent/plugin/out_elasticsearch.rb:184:in `rescue in send'
  2015-12-08 14:57:03 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-elasticsearch-1.1.0/lib/fluent/plugin/out_elasticsearch.rb:174:in `send'
  2015-12-08 14:57:03 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-elasticsearch-1.1.0/lib/fluent/plugin/out_elasticsearch.rb:168:in `write'
  2015-12-08 14:57:03 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.12/lib/fluent/buffer.rb:325:in `write_chunk'
  2015-12-08 14:57:03 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.12/lib/fluent/buffer.rb:304:in `pop'
  2015-12-08 14:57:03 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.12/lib/fluent/output.rb:321:in `try_flush'
  2015-12-08 14:57:03 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.12/lib/fluent/output.rb:140:in `run'

td-agent.conf at the server:

<match text**, test2** >
   type elasticsearch
   logstash_format true
   flush_interval 5s #debug
  # buffer_chunk_limit 50m
   host 172.19.23.34
   port 9200
   include_tag_key true
   tag_key _key
   index_name fluentd
   type_name fluentd
</match>
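For what it's worth, these are the retry/timeout knobs I am considering adding to that match block, based on the buffered-output and elasticsearch-plugin docs (the values are guesses, not something I have verified):

<match text**, test2** >
   type elasticsearch
   logstash_format true
   host 172.19.23.34
   port 9200
   include_tag_key true
   tag_key _key
   index_name fluentd
   type_name fluentd
   flush_interval 5s
   request_timeout 15s          # elasticsearch plugin read timeout, if this plugin version supports it
   retry_wait 1s                # initial interval before the first flush retry
   max_retry_wait 30s           # cap the exponential backoff between retries
   buffer_type file             # persist the buffer to disk across restarts
   buffer_path /var/log/td-agent/buffer/es
   buffer_chunk_limit 8m
   buffer_queue_limit 256
</match>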


 Thanks in advance!

 

Mr. Fiber

Dec 9, 2015, 5:18:24 AM
to Fluentd Google Group
How about sending events to multiple ES nodes?
Having a single instance receive all logs is not the ES way.




astro

Dec 9, 2015, 8:12:49 AM
to Fluentd Google Group
I tried that a few days back, but due to many such issues I replaced it with a single host.


Thanks!
 

astro

Dec 9, 2015, 8:30:35 AM
to Fluentd Google Group


 
Update: We have an ES cluster with 2 nodes.

astro

Mar 16, 2016, 1:41:19 AM
to Fluentd Google Group
Hi Masahiro,

I see lots of [warn]: emit transaction failed: error_class=Fluent::BufferQueueLimitError error="queue size exceeds limit" messages in the logs. This time, we're sending logs to multiple ES servers. The ES servers have sufficient memory and look completely fine.

Using fluentd v0.12.12

Is there any way to handle this issue?

Thanks!

Mr. Fiber

Mar 16, 2016, 5:12:53 AM
to Fluentd Google Group
> The ES servers have sufficient memory and look completely fine.

Did you store the buffer size history from monitor_agent?
Importing logs into Elasticsearch takes a fairly long time under heavy traffic,
so if your traffic has spikes, that may be the cause.
Or, if you have network trouble, flushing the buffer is delayed.
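If you haven't set it up yet, something like this exposes buffer metrics over HTTP so you can record them over time (a minimal sketch):

<source>
  type monitor_agent
  bind 0.0.0.0
  port 24220
</source>

# then poll it periodically, e.g. from cron:
#   curl http://localhost:24220/api/plugins.json
# and record buffer_queue_length / buffer_total_queued_size for the elasticsearch output.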



astro

Apr 8, 2016, 4:23:06 AM
to Fluentd Google Group
 
> Did you store the buffer size history from monitor_agent?
No.

Recently we are facing this issue more frequently. On the other side, the Elasticsearch cluster is healthy, has a lot of free memory, and comparatively little data.
Using fluentd v0.12.12.
Seeing lots of emit transaction failed: error_class=Fluent::BufferQueueLimitError error="queue size exceeds limit" in the logs.

Is there any quick workaround to avoid this? We restart td-agent in such cases to make it send logs again.

Thanks.
       

Mr. Fiber

Apr 8, 2016, 6:05:48 AM
to Fluentd Google Group
> The Elasticsearch cluster is healthy

How about the CPU usage of Fluentd and Elasticsearch?
And is bandwidth also not a problem?

If there are no error/warning logs in the fluentd logs,
then the flushing speed is simply slower than the rate of incoming logs.
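If flushing simply can't keep up, one knob worth trying on the elasticsearch output is num_threads, which flushes multiple buffer chunks in parallel (a sketch; the default is 1):

<match text**, test2** >
   type elasticsearch
   # ... your existing host / format settings ...
   num_threads 4        # flush several buffer chunks in parallel
</match>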



astro

Apr 8, 2016, 6:16:29 AM
to Fluentd Google Group




> How about the CPU usage of Fluentd and Elasticsearch?
CPU usage is very low on both.

> And is bandwidth also not a problem?
No issues there.

> If there are no error/warning logs in the fluentd logs
I am seeing lots of emit transaction failed: error_class=Fluent::BufferQueueLimitError error="queue size exceeds limit" in the td-agent logs.



astro

Apr 8, 2016, 9:23:18 AM
to Fluentd Google Group
Below are more verbose logs:

2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.3.2/lib/fluent/plugin/in_secure_forward.rb:256:in `on_message'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.3.2/lib/fluent/plugin/input_session.rb:124:in `on_read'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.3.2/lib/fluent/plugin/input_session.rb:185:in `feed_each'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.3.2/lib/fluent/plugin/input_session.rb:185:in `block in start'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.3.2/lib/fluent/plugin/input_session.rb:178:in `loop'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.3.2/lib/fluent/plugin/input_session.rb:178:in `start'
2016-04-08 13:17:35 +0000 [warn]: unexpected error in in_secure_forward error_class=Fluent::BufferQueueLimitError error=#<Fluent::BufferQueueLimitError: queue size exceeds limit>
2016-04-08 13:17:35 +0000 [warn]: emit transaction failed: error_class=Fluent::BufferQueueLimitError error="queue size exceeds limit" tag="nginx-ip-10.0.22.243"
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.12/lib/fluent/buffer.rb:189:in `block in emit'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/2.1.0/monitor.rb:211:in `mon_synchronize'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.12/lib/fluent/buffer.rb:179:in `emit'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.12/lib/fluent/output.rb:251:in `emit'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.12/lib/fluent/event_router.rb:88:in `emit_stream'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.12/lib/fluent/engine.rb:116:in `emit_stream'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.3.2/lib/fluent/plugin/in_secure_forward.rb:256:in `on_message'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.3.2/lib/fluent/plugin/input_session.rb:124:in `on_read'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.3.2/lib/fluent/plugin/input_session.rb:185:in `feed_each'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.3.2/lib/fluent/plugin/input_session.rb:185:in `block in start'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.3.2/lib/fluent/plugin/input_session.rb:178:in `loop'
  2016-04-08 13:17:35 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.3.2/lib/fluent/plugin/input_session.rb:178:in `start'
2016-04-08 13:17:35 +0000 [warn]: unexpected error in in_secure_forward error_class=Fluent::BufferQueueLimitError error=#<Fluent::BufferQueueLimitError: queue size exceeds limit>

Is there any solution or workaround for this? It is becoming more severe and is causing instability in the dashboard app.

Thanks. 

astro

Apr 8, 2016, 3:30:31 PM
to Fluentd Google Group
Any thoughts on this?

Mr. Fiber

Apr 8, 2016, 7:10:50 PM
to Fluentd Google Group
I want to know how long the elasticsearch plugin takes to flush events.
Could you insert the following '+' lines into your copy of the elasticsearch plugin?

diff --git a/lib/fluent/plugin/out_elasticsearch.rb b/lib/fluent/plugin/out_elasticsearch.rb
index fdc52ed..589f27c 100644
--- a/lib/fluent/plugin/out_elasticsearch.rb
+++ b/lib/fluent/plugin/out_elasticsearch.rb
@@ -189,6 +189,7 @@ class Fluent::ElasticsearchOutput < Fluent::BufferedOutput
   end

   def write(chunk)
+    start = Time.now
     bulk_message = []

     chunk.msgpack_each do |tag, time, record|
@@ -229,6 +230,8 @@ class Fluent::ElasticsearchOutput < Fluent::BufferedOutput

     send(bulk_message) unless bulk_message.empty?
     bulk_message.clear
+
+    log.info "Flushing event took #{Time.now - start} seconds"
   end

   def send(data)
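With that patch applied and td-agent restarted, you can grep the td-agent log for "Flushing event took" to see how long each flush actually takes.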


BTW, if you have a problem with the buffer,
collecting buffer metrics with in_monitor_agent is good for the investigation, e.g. to see when the buffer starts growing.


