Connection drops due to keep alive expired

646 views
Skip to first unread message

Aakash Ratkal

unread,
Apr 15, 2021, 8:08:15 AM4/15/21
to vernemq-users
I have set up a dockerized verneMQ cluster for my upcoming product. All the IOT devices will be publishing to VerneMQ topics and on the server side we will be processing these packets. I am seeing a lot of connection drops on the client side due to keep-alive expired. 
I have checked the client trace and I see that connect keep alive time is 180 seconds and also all of our devices are sending heartbeat at the rate of 60sec. Ideally there should not be any keep alive expiry. I am unable to pinpoint what is causing this issue. Can somebody help me with this?
Please find few of the logs on my vernemq.
Any help is appreciated.

Regards,
Aakash

06:53:53.531 [warning] client {[],<<"E8:XX:XX:XX:XX:F0">>} with username <<"Hello">> stopped due to keepalive expired
07:30:21.472 [warning] client {[],<<"AC:XX:XX:XX:XX:F4">>} with username <<"Anonymous">> stopped due to keepalive expired
07:50:37.831 [warning] client {[],<<"E8:XX:XX:XX:XX:18">>} with username <<"Anonymous">> stopped due to keepalive expired
08:17:48.778 [warning] client {[],<<"E8:XX:XX:XX:XX:F0">>} with username <<"Anonymous">> stopped due to keepalive expired
08:21:46.927 [warning] client {[],<<"E8:XX:XX:XX:XX:E0">>} with username <<"Anonymous">> stopped due to keepalive expired
08:37:45.206 [warning] client {[],<<"E8:XX:XX:XX:XX:E0">>} with username <<"Anonymous">> stopped due to keepalive expired
08:39:39.087 [warning] client {[],<<"E8:XX:XX:XX:XX:18">>} with username <<"Anonymous">> stopped due to keepalive expired
11:03:12.223 [warning] client {[],<<"E8:XX:XX:XX:XX:E0">>} with username <<"Anonymous">> stopped due to keepalive expired
11:06:01.142 [warning] client {[],<<"E8:XX:XX:XX:XX:74">>} with username <<"Anonymous">> stopped due to keepalive expired
11:09:46.010 [warning] client {[],<<"E8:XX:XX:XX:XX:18">>} with username <<"Anonymous">> stopped due to keepalive expired
12:02:19.214 [warning] client {[],<<"E8:XX:XX:XX:XX:E0">>} with username <<"Anonymous">> stopped due to keepalive expired
12:04:46.013 [warning] client {[],<<"E8:XX:XX:XX:XX:F0">>} with username <<"Anonymous">> stopped due to keepalive expired
12:25:16.895 [warning] client {[],<<"E8:XX:XX:XX:XX:F0">>} with username <<"Anonymous">> stopped due to keepalive expired
14:31:19.879 [warning] client {[],<<"E8:XX:XX:XX:XX:E0">>} with username <<"Anonymous">> stopped due to keepalive expired
15:09:25.880 [warning] client {[],<<"E8:XX:XX:XX:XX:18">>} with username <<"Anonymous">> stopped due to keepalive expired
17:31:12.546 [warning] client {[],<<"AC:XX:XX:XX:XX:F4">>} with username <<"Anonymous">> stopped due to keepalive expired
17:49:15.209 [warning] client {[],<<"AC:XX:XX:XX:XX:F4">>} with username <<"Anonymous">> stopped due to keepalive expired
18:31:45.966 [warning] client {[],<<"AC:XX:XX:XX:XX:F4">>} with username <<"Anonymous">> stopped due to keepalive expired
19:20:47.151 [warning] client {[],<<"E8:XX:XX:XX:XX:90">>} with username <<"Anonymous">> stopped due to keepalive expired
19:45:03.080 [warning] client {[],<<"E8:XX:XX:XX:XX:74">>} with username <<"Anonymous">> stopped due to keepalive expired
08:25:45.286 [warning] client {[],<<"E8:XX:XX:XX:XX:7C">>} with username <<"Anonymous">> stopped due to keepalive expired
08:39:51.766 [warning] client {[],<<"E8:XX:XX:XX:XX:78">>} with username <<"Anonymous">> stopped due to keepalive expired
08:39:55.175 [warning] client {[],<<"E8:XX:XX:XX:XX:C4">>} with username <<"Anonymous">> stopped due to keepalive expired
09:22:13.173 [warning] can't authenticate client {[],<<"MqttClient">>} from X.Y.Z.92:57058 due to plugin_chain_exhausted
09:24:41.688 [warning] client {[],<<"E8:XX:XX:XX:XX:C8">>} with username <<"Anonymous">> stopped due to keepalive expired
09:46:13.648 [warning] client {[],<<"AC:XX:XX:XX:XX:F4">>} with username <<"Anonymous">> stopped due to keepalive expired
09:48:14.092 [warning] client {[],<<"E8:XX:XX:XX:XX:90">>} with username <<"Anonymous">> stopped due to keepalive expired
10:07:02.368 [warning] client {[],<<"AC:XX:XX:XX:XX:F4">>} with username <<"Anonymous">> stopped due to keepalive expired
10:36:49.033 [warning] client {[],<<"AC:XX:XX:XX:XX:F4">>} with username <<"Anonymous">> stopped due to keepalive expired
10:40:20.211 [warning] client {[],<<"E8:XX:XX:XX:XX:C8">>} with username <<"Anonymous">> stopped due to keepalive expired
10:54:40.013 [warning] client {[],<<"E8:XX:XX:XX:XX:C8">>} with username <<"Anonymous">> stopped due to keepalive expired
10:58:51.613 [warning] client {[],<<"AC:XX:XX:XX:XX:F4">>} with username <<"Anonymous">> stopped due to keepalive expired

André Fatton

unread,
Apr 15, 2021, 8:34:41 AM4/15/21
to vernemq-users
Hi Aakash,

this has come up a couple of times, and there are Github Issues about this, I think.
I've tried to reason about the expiry and timeout heuristics used in the MQTT session state machine on the broker. Back then, nothing has come up as a clear explanation causing this, just from server side.

What components are on the network path? do you run a loadbalancer/proxy with its own timeout configuration?
Overall, how reproducible is that behaviour? (happening sometimes, or regularly if it's above a certain time window)
Best,
André

Aakash Ratkal

unread,
Apr 15, 2021, 8:42:04 AM4/15/21
to vernemq-users
Hi Andre,

1. I have found few github issues around this but all of them say this is due to client side configuration. However, that part I have ruled out by doing the trace of a client and verifying 'ka' field there. I have tried the same client on Mosquitto broker and things work perfectly with it.

2. Currently my vernemq cluster is behing AWS NLB (Which keeps timeout of 350 seconds). In order to verify if NLB is causing the issue. I performed same experiment removing the NLB and the issue is observed there too.

3. This is happening with around 50-60% of my clients. And there is no time window for it.

Thanks,
Aakash 

Reply all
Reply to author
Forward
0 new messages