We are using fluentd with the AWS Elasticsearch plugin in our cloud-hosted software.
After a few days, the AWS Elasticsearch plugin loses network connectivity to the AWS Elasticsearch service.
I tried fluent-plugin-aws-elasticsearch-service-hotfix, but I still hit the same issue:
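For context, our output section looks roughly like the sketch below. The endpoint URL, region, tag pattern, and buffer path are placeholders rather than our real values, and the commented-out options are settings I have seen suggested for connection problems, not ones we have verified:

```
<match **>
  # Sketch of our output configuration; values are placeholders.
  type aws-elasticsearch-service
  logstash_format true
  flush_interval 10s

  # File buffer so chunks survive a td-agent restart (path is an example).
  buffer_type file
  buffer_path /var/log/td-agent/buffer/es

  # Options sometimes suggested for "Cannot get new connection from pool";
  # we have not confirmed whether they help in our setup.
  # reload_connections false
  # reload_on_failure true

  <endpoint>
    url https://search-example-xxxxxxxx.us-west-2.es.amazonaws.com
    region us-west-2
  </endpoint>
</match>
```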
2017-02-23 00:30:25 -0800 [warn]: temporarily failed to flush the buffer. next_retry=2017-02-23 00:30:26 -0800 error_class="Elasticsearch::Transport::Transport::Error" error="Cannot get new connection from pool." plugin_id="object:3f86c9870854"
2017-02-23 00:30:25 -0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/elasticsearch-transport-1.0.18/lib/elasticsearch/transport/transport/base.rb:249:in `perform_request'
2017-02-23 00:30:25 -0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/elasticsearch-transport-1.0.18/lib/elasticsearch/transport/transport/http/faraday.rb:20:in `perform_request'
2017-02-23 00:30:25 -0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/elasticsearch-transport-1.0.18/lib/elasticsearch/transport/client.rb:128:in `perform_request'
2017-02-23 00:30:25 -0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/elasticsearch-api-1.0.18/lib/elasticsearch/api/actions/bulk.rb:90:in `bulk'
2017-02-23 00:30:25 -0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-elasticsearch-1.9.2/lib/fluent/plugin/out_elasticsearch.rb:353:in `send_bulk'
2017-02-23 00:30:25 -0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-elasticsearch-1.9.2/lib/fluent/plugin/out_elasticsearch.rb:339:in `write_objects'
2017-02-23 00:30:25 -0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.31/lib/fluent/output.rb:490:in `write'
2017-02-23 00:30:25 -0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.31/lib/fluent/buffer.rb:354:in `write_chunk'
2017-02-23 00:30:25 -0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.31/lib/fluent/buffer.rb:333:in `pop'
2017-02-23 00:30:25 -0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.31/lib/fluent/output.rb:342:in `try_flush'
2017-02-23 00:30:25 -0800 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.31/lib/fluent/output.rb:149:in `run'
After this, I see many instances of:
2017-02-23 00:39:17 -0800 [warn]: temporarily failed to flush the buffer. next_retry=2017-02-23 00:48:27 -0800 error_class="Elasticsearch::Transport::Transport::Error" error="Cannot get new connection from pool." plugin_id="object:3f86c9870854"
2017-02-23 00:39:17 -0800 [warn]: suppressed same stacktrace
Could anyone please provide a solution or workaround for this problem? We have a large deployment and are affected by this issue.
Restarting td-agent resolves the problem, but it comes back after a few days. I could write a cron job to restart td-agent every day,
but does Docker's fluentd logging driver buffer the logs while td-agent is unavailable? I don't know the answer.
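If it helps to be concrete, the daily-restart workaround I have in mind is just a crontab entry along these lines (the time and the init script path are assumptions about a typical td-agent install, not something we run today):

```
# Restart td-agent once a day at 03:00 local time (sketch only;
# adjust the path/command for your init system, e.g. systemctl).
0 3 * * * /etc/init.d/td-agent restart
```

My concern is exactly the window during the restart: if the Docker fluentd driver does not buffer, we lose logs every day.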
Another question, on supportability: if the owner of an important plugin like the AWS Elasticsearch one is not responding or
no longer interested in maintaining it, could Treasure Data take over the plugin and support it?
regards,
Starship