Hello,
I have a production environment with 50+ nodes. Every node sends its logs to a central server via td-agent, and on the central server td-agent collects those logs and forwards them to Elasticsearch.
The issue I am facing is that after some time (~6-7 hours), td-agent on the nodes stops sending logs to the central server. In the td-agent logs I see the following:
2021-03-29 20:31:10 +0000 [warn]: #0 suppressed same stacktrace
2021-03-29 20:31:10 +0000 [warn]: #0 retry succeeded. chunk_id="5beb2c2855d94714d90bf8ec80aede33"
2021-03-29 20:32:30 +0000 [warn]: #0 failed to flush the buffer. retry_time=0 next_retry_seconds=2021-03-29 20:32:31.604646018 +0000 chunk="5beb2c74bd60a90aa5890a6ae6132e84" error_class=RuntimeError error="no one nodes with valid ssl session"
These messages keep repeating and no logs are sent to the central server until I restart td-agent on the node; only then do the logs arrive at the central server and, from there, in Elasticsearch. There are a few similar cases online, but they all relate to the Elasticsearch plugin. In my case I am not sending logs to Elasticsearch directly, and no Elasticsearch plugin is running on the nodes.
Could anybody help me understand what the issue is and how to resolve it?
Thank you
Config:
<match **>
  type copy
  <store>
    type secure_forward
    secure true
    self_hostname ad-x.x.x-01
    shared_key xxxxxxxxxxxxx
    buffer_type file
    buffer_path /var/log/td-agent/buffer/forward
    flush_interval 10s
    num_threads 4
    retry_wait 10s
    ca_cert_path /etc/ssl/certs/ca_cert_kibana.pem
    <server>
      host xxxxxx
      port 24285
    </server>
  </store>
  <store>
    type forest
    subtype s3
    <template>
      s3_bucket "xxxx"
      s3_region "eu-west-1"
      path xxxxx
      s3_object_key_format %{path}%{time_slice}_%{index}.%{file_extension}
      time_slice_format ${tag}/YEAR=%Y/MONTH=%m/DAY=%d/HOSTNAME=${hostname}/HOUR=%H/%M
      <format>
        @type json
      </format>
      store_as gzip
      <buffer time>
        timekey 60
        @type file
        path /var/log/td-agent/buffer/s3/${tag}
        timekey_wait 1m
        chunk_limit_size 50m
        flush_at_shutdown true
      </buffer>
    </template>
  </store>
</match>
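For reference, the receiving side on the central server is a standard secure_forward input. I don't have the exact file in front of me, so the sketch below is only an approximation of that input section; the hostname, shared key, port, and certificate paths are placeholders, not the real values.

```
<source>
  type secure_forward
  secure true
  # placeholders: substitute the central server's actual identity and key
  self_hostname xxxxxx
  shared_key xxxxxxxxxxxxx
  port 24285
  ca_cert_path /etc/ssl/certs/ca_cert_kibana.pem
</source>
```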
td-agent version: 4
P.S. I had a similar question earlier; I wanted to edit it and ended up deleting it instead. Sorry.