Application restart needed to get logging going (Docker/Amazon ECS), how can we fix this?


Marco Pas

Jun 30, 2016, 3:22:56 AM
to Fluentd Google Group

Hi there, we are facing an issue when running our apps inside Amazon ECS. We are trying to log to Fluentd, but when we first fire up our Docker containers, we need to restart them to get logging working. Is anyone else experiencing this issue?


Our application setup is as follows:


Multiple Apps (in docker) -> Fluentd (in docker) 


So when we fire up the infrastructure, we need to restart the apps to get logging working. My guess is that this depends on the startup sequence of the Docker containers, but in ECS we have no control over the startup order. Our applications have the following log driver setup:


    "logConfiguration": {
        "logDriver": "fluentd",
        "options": {
            "fluentd-address": "fluentd url...",
            "fluentd-async-connect": "true",
            "tag": "plain.docker.runtime"
        }
    },


Any clue what we are missing to get logging working right from the start?
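
For context, the receiving end of a topology like this is typically Fluentd's in_forward input. A minimal sketch of the aggregator side (the port and bind address shown are the Fluentd defaults, an assumption about this setup rather than config taken from the thread):

```
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>
```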

Marco Pas

Jun 30, 2016, 4:37:55 AM
to Fluentd Google Group
When checking the infrastructure, we see the connections to Fluentd go into a state from which they do not seem to recover. Using netstat on the Docker host, we see that the connection to Fluentd is:

tcp        1      0 ip-xxxxx.ec2.i:50882 ip-10-64-48-230.ec2.i:24224 CLOSE_WAIT


It looks like the connection stays stuck in CLOSE_WAIT and blocks logging until we restart the application, after which it continues to log.
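
A stuck connection like the one above can be counted from netstat output. This sketch filters for CLOSE_WAIT sockets whose remote end is the Fluentd forward port (24224); the sample line is embedded for illustration, and on a real host you would pipe `netstat -tan` (or `ss -tan`) instead. Note the non-zero Recv-Q (second column), another sign of unread data on the socket.

```shell
# Sample netstat line (on a real host: netstat -tan | awk ...)
sample='tcp        1      0 ip-xxxxx.ec2.i:50882 ip-10-64-48-230.ec2.i:24224 CLOSE_WAIT'

# Field 5 is the remote address, field 6 the TCP state.
count=$(printf '%s\n' "$sample" | awk '$6 == "CLOSE_WAIT" && $5 ~ /:24224$/' | wc -l)
echo "stuck connections to fluentd: $count"
```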

Mr. Fiber

Jul 5, 2016, 6:52:31 AM
to Fluentd Google Group
Does this CLOSE_WAIT happen only with the fluentd logging driver?
Or does Docker without ECS also have this problem?

I want to know whether the fluentd logging driver has a problem or not.


Masahiro


Kevin Grant

Jul 6, 2016, 10:05:52 AM
to Fluentd Google Group
+1 same here, fluentd stops forwarding to elasticsearch after a certain period of time. Trying to track it down.

Using AWS EB docker containers, although our logging is configured in the docker app itself, rather than the EB layer.

David Wood

Jul 7, 2016, 1:55:09 AM
to Fluentd Google Group
> fluentd stops forwarding to elasticsearch after a certain period of time

Is this Amazon Elasticsearch? You'll want to keep your buffer_chunk_limit below the HTTP request payload maximum.


David

Marcus Morris

Jul 27, 2016, 11:34:03 AM
to Fluentd Google Group
I seem to be experiencing something that may be related. I am not using Amazon ES, but I am forwarding to my own ES cluster, and it seems like a bunch of my hosts have stopped showing up in Kibana even though I can see the logs in the Fluentd container.

How can I go about troubleshooting this? My logs are kind of dead in the water at the moment. I tried restarting the Fluentd container and my app containers, as well as Docker itself.

Marcus Morris

Jul 28, 2016, 12:58:16 AM
to Fluentd Google Group
So it looks like deleting and redeploying the fluentd container gets logs going again (although that means the missing logs are gone).

If this has to do with the buffer_chunk_limit, how do I figure out my HTTP request payload maximum if I'm using my own ES cluster?
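
For a self-managed cluster, the relevant Elasticsearch setting is `http.max_content_length`, which caps the size of an HTTP request body and defaults to 100mb. A sketch of where it lives (the value shown is the default, not a recommendation to change it):

```yaml
# elasticsearch.yml
# Maximum HTTP request body size Elasticsearch will accept.
# buffer_chunk_limit on the Fluentd side should stay below this.
http.max_content_length: 100mb
```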

Mr. Fiber

Jul 28, 2016, 11:04:53 PM
to Fluentd Google Group
> If this has to do with the buffer_chunk_limit,

Is your problem related to buffer_chunk_limit?


Kevin Grant

Aug 1, 2016, 8:03:14 AM
to Fluentd Google Group
Yes David

Dozens of Docker containers in EB send logs directly to a Fluentd aggregator, which forwards a copy to stdout and Amazon ES. (I should probably add forwarding agents on each host.)

Restarting the Fluentd container used to restore service (at first once a day, then hourly), but now even that doesn't work. I have tried adding a buffer_chunk_limit of 5m to no avail. (I may need to open a separate thread for this issue.)

David Wood

Aug 1, 2016, 4:06:33 PM
to Fluentd Google Group
>> Is this Amazon Elasticsearch?
>> ...
> I have tried adding a buffer_chunk_limit 5m to no avail

Is there nothing of interest in the logs?  I've found these settings are also important:

resurrect_after 5s
reload_connections false

I'd expect you would see "Cannot get new connection from pool" errors in the logs if you didn't have these set.
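
Putting these settings together, a sketch of a v0.12-era fluent-plugin-elasticsearch match block (the match pattern, host, port, and scheme are placeholders, not values from this thread):

```
<match **>
  @type elasticsearch
  host es.example.com        # placeholder endpoint
  port 443
  scheme https
  buffer_chunk_limit 5m      # keep below the ES HTTP payload limit
  resurrect_after 5s         # retry failed connections after 5s
  reload_connections false   # don't rediscover nodes behind a managed endpoint
</match>
```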

David

Mr. Fiber

Aug 2, 2016, 1:46:14 AM
to Fluentd Google Group
Yeah. If you use fluent-plugin-elasticsearch with AWS Elasticsearch,
these two options are needed.



Kevin Grant

Aug 2, 2016, 7:26:46 AM
to Fluentd Google Group
Restarted the Fluentd container with the recommended options; still not seeing anything in AWS ES. What other debugging options are there?
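
One generic next step (a sketch, not advice specific to this thread) is to raise Fluentd's own log verbosity so connection and retry errors become visible in its output:

```
<system>
  log_level debug
</system>
```

Equivalently, the process can be started with `fluentd -vv` for trace-level output.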