Has anyone seen a td-agent "sender" process incorrectly conclude that no nodes are available when it tries to flush its buffer? The sender appears to have hit connection problems with the "receiver" process (which runs on another machine), and from that point on, even after the receiver became reachable again, it still treats the receiver as unavailable.
I can confirm the receiver is accepting data, because when I send
```echo -e '{"message":"TEST MESSAGE","host":"hd1app1","service":"test_service"}\0' | nc 10.0.1.138 42185```
the receiver gets it.
Here are logs from td-agent.log when the connection failure first happened:
```
2016-11-02 03:58:18 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-11-02 03:57:16 +0000 error_class="Errno::ETIMEDOUT" error="Connection timed out - connect(2) for \"10.0.1.138\" port 42185" plugin_id="object:3fed2e39429c"
2016-11-02 03:58:18 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-11-02 03:57:17 +0000 error_class="RuntimeError" error="no nodes are available" plugin_id="object:3fed2e39429c"
2016-11-02 03:58:18 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-11-02 03:57:21 +0000 error_class="RuntimeError" error="no nodes are available" plugin_id="object:3fed2e39429c"
```
After confirming with netcat that the receiver works, I sent a signal to the sender process to flush the buffer, but it still can't see the node:
```sudo kill -s USR1 18815```
produces this in td-agent.log:
```
2016-11-02 17:31:52 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-11-02 20:01:06 +0000 error_class="RuntimeError" error="no nodes are available" plugin_id="object:3fed2e39429c"
```
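To see how far out the retries have drifted, the relevant fields can be pulled out of the log. This is only a minimal sketch, assuming the log line layout shown above; `summarize_flush_failures` is a hypothetical helper name, not something td-agent provides:

```shell
# Hypothetical helper: print "<next_retry> <error_class>" for every
# failed-flush warning in the given td-agent log file.
# Assumes the line layout seen in the excerpts above.
summarize_flush_failures() {
  grep 'failed to flush the buffer' "$1" |
    sed -E 's/.*next_retry=([0-9-]+ [0-9:]+) [+-][0-9]{4} error_class="([^"]+)".*/\1 \2/'
}
```

Run against a file containing the line above, it prints `2016-11-02 20:01:06 RuntimeError`, i.e. the next retry is scheduled roughly two and a half hours after the 17:31:52 flush attempt.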
Here is the config on the sender:
```
<match hd.**>
  @type copy
  <store>
    @type stdout
  </store>
  <store>
    @type forward
    # primary host
    <server>
      host 10.0.1.138
      port 42185
    </server>
    buffer_type file
    buffer_path /var/log/td-agent/buffer/hd.*.buffer
    buffer_chunk_limit 128m
    buffer_queue_limit 64
    flush_interval 20s
  </store>
</match>
```
and on the receiver:
```
<source>
  type forward
  port 42185
  protocol_type tcp
  tag hd
  format none
</source>
```
Restarting the sending td-agent flushes the buffer, but I don't want to have to do that. I'd like td-agent to realize on its own that the receiving node/service is back up.
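One thing that may be worth ruling out is the heartbeat transport. As far as I understand, the forward output in Fluentd v0.12 decides node availability from heartbeats, and those default to UDP, which the netcat test above (TCP) does not exercise. The fragment below is only a sketch of the sender's forward `<store>` with the heartbeat switched to TCP. It assumes the v0.12 `heartbeat_type` option and is not a confirmed fix for this problem:

```
<store>
  @type forward
  # Assumption: use TCP heartbeats instead of the v0.12 default (udp),
  # so the liveness check takes the same path as the netcat test.
  heartbeat_type tcp
  <server>
    host 10.0.1.138
    port 42185
  </server>
  # buffer_* and flush_interval settings stay as in the config above
</store>
```

If UDP on port 42185 is blocked between the two machines, this would explain why the node stays marked unavailable while TCP connections succeed, but that is a guess from the symptoms rather than something the logs prove.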