Unable to connect to Redis/ElastiCache cluster after resharding


Ivan Brusic

Oct 5, 2021, 6:43:34 PM
to vert.x
We are using vert.x 4.0.3, connected to a clustered Redis 6.0.5 instance on AWS ElastiCache.

After encountering memory issues in the cluster, we decided to scale horizontally, increasing our shards from 5 to 8. There were no issues at that point. It was not until we restarted our API instances that we discovered that our max pool size did not adequately cover the new number of endpoints. With 24 nodes (8x3), we set maxPoolSize to 24 and maxPoolWaiting to 96 (4 times the pool size, as per the default). We still cannot connect: RedisClusterClient throws "Cannot connect to any of the provided endpoints".
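For reference, the sizing arithmetic above can be written down as a small sketch. `PoolSizing` and its methods are hypothetical names, not part of the vert.x API, and the idea that one pooled connection per node is sufficient is an assumption taken from the description above rather than something confirmed by the client documentation.

```java
// Hypothetical helper (not part of the vert.x API): derives pool settings
// from the cluster topology described in the post.
final class PoolSizing {
    private PoolSizing() {}

    /** Assumes one pooled connection per node: shards x nodes per shard. */
    static int maxPoolSize(int shards, int nodesPerShard) {
        if (shards <= 0 || nodesPerShard <= 0) {
            throw new IllegalArgumentException("shards and nodesPerShard must be positive");
        }
        return Math.multiplyExact(shards, nodesPerShard);
    }

    /** Waiting queue sized as a multiple of the pool size (the post uses 4). */
    static int maxPoolWaiting(int maxPoolSize, int factor) {
        if (maxPoolSize <= 0 || factor <= 0) {
            throw new IllegalArgumentException("maxPoolSize and factor must be positive");
        }
        return Math.multiplyExact(maxPoolSize, factor);
    }

    public static void main(String[] args) {
        int poolSize = maxPoolSize(8, 3);          // 8 shards, 3 nodes each
        int waiting = maxPoolWaiting(poolSize, 4); // 4x the pool size
        System.out.println("maxPoolSize=" + poolSize + " maxPoolWaiting=" + waiting);
        // prints: maxPoolSize=24 maxPoolWaiting=96
    }
}
```

For the old 5x3 topology the same calculation gives a pool size of 15, which is why a value tuned for 5 shards may no longer cover 8.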

Network connectivity should not be an issue. I can connect via redis-cli locally, but not via Java, since my machine is not in the AWS VPC. The client will connect to the endpoint via an SSH tunnel, but will then fail to connect to the slot endpoints. I am not sure if I can debug this locally.
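One way to narrow this down from inside the VPC might be a plain TCP probe against each node address that the cluster reports. The sketch below uses only the JDK; `EndpointProbe` is a hypothetical name, the `host:port` list is something you would have to supply yourself (for example from the output of `CLUSTER NODES`), and a successful TCP connect only shows that the port is reachable, not that the vert.x client will succeed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic helper: checks raw TCP reachability of each node.
final class EndpointProbe {
    private EndpointProbe() {}

    /** Returns true if a TCP connection to host:port opens within the timeout. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable hosts.
            return false;
        }
    }

    /** Probes every "host:port" entry and keeps the results in input order. */
    static Map<String, Boolean> probeAll(List<String> hostPorts, int timeoutMillis) {
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (String hostPort : hostPorts) {
            int idx = hostPort.lastIndexOf(':');
            String host = hostPort.substring(0, idx);
            int port = Integer.parseInt(hostPort.substring(idx + 1));
            results.put(hostPort, isReachable(host, port, timeoutMillis));
        }
        return results;
    }

    public static void main(String[] args) {
        // Pass node addresses as arguments; the default below is only a placeholder.
        List<String> nodes = args.length > 0 ? List.of(args) : List.of("127.0.0.1:6379");
        probeAll(nodes, 1000).forEach((node, ok) ->
            System.out.println(node + " -> " + (ok ? "reachable" : "unreachable")));
    }
}
```

If some of the 24 nodes show up as unreachable from an API instance while the main endpoint is reachable, that would point at networking rather than pool sizing.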

With no logs, I am at a loss to identify the root cause. Redis is used for caching, so thankfully we have fallback systems, but we are looking to move back to Redis.

Cheers,

Ivan

Ivan Brusic

Oct 6, 2021, 4:48:06 PM
to vert.x
I updated the max pool size to 48 and then 96, but there is still no connection. We are running the exact same code against a smaller cluster (5x3) with no issues. It has the same security group, node type and availability zones.

The options used:

{
  "endpoint": "redis://....cache.amazonaws.com:6379",
  "endpoints": [
  ],
  "masterName": "mymaster",
  "maxNestedArrays": 32,
  "maxPoolSize": 96,
  "maxPoolWaiting": 96,
  "maxWaitingHandlers": 2048,
  "netClientOptions": {
    "logActivity": false,
    "receiveBufferSize": -1,
    "reuseAddress": true,
    "reusePort": false,
    "sendBufferSize": -1,
    "trafficClass": -1,
    "crlPaths": [],
    "crlValues": [],
    "enabledCipherSuites": [],
    "enabledSecureTransportProtocols": [
      "TLSv1",
      "TLSv1.1",
      "TLSv1.2"
    ],
    "idleTimeout": 0,
    "idleTimeoutUnit": "SECONDS",
    "soLinger": -1,
    "ssl": false,
    "sslHandshakeTimeout": 10,
    "sslHandshakeTimeoutUnit": "SECONDS",
    "tcpCork": false,
    "tcpFastOpen": false,
    "tcpKeepAlive": true,
    "tcpNoDelay": true,
    "tcpQuickAck": true,
    "useAlpn": false,
    "connectTimeout": 100,
    "metricsName": "redis",
    "trustAll": false
  },
  "poolCleanerInterval": -1,
  "poolRecycleTimeout": 15000,
  "role": "MASTER",
  "type": "CLUSTER",
  "useReplicas": "ALWAYS"
}


Ivan Brusic

Oct 6, 2021, 4:49:19 PM
to vert.x
Testing against a new cluster that has no data. Default slot values.