During ElastiCache failover and recovery, Jedis fails to return to original performance and to the original primary node


Julian Medina

Oct 11, 2021, 7:01:43 AM
to Jedis

Hello all. After multiple days of testing, my team and I seem to be hitting a wall with Jedis and ElastiCache Redis failover procedures. I thought I'd post here before opening a GitHub issue, in case it's user error.

### Expected behavior

  1. ElastiCache Redis performs a manual failover test (or an actual primary node fails), making a secondary node the new primary node.
  2. Jedis switches to the new primary node (recovering most if not all performance).
  3. The original primary node recovers.
  4. Jedis reconnects to the original primary node and recovers original performance.

### Actual behavior

  1. ElastiCache Redis performs a manual failover test (or an actual primary node fails), making a secondary node the new primary node.
  2. Jedis switches to the new primary node (at a very poor performance level, creating thousands of new connections and spitting out large amounts of errors), e.g.
    • redis.clients.jedis.exceptions.JedisClusterOperationException: Cluster retry deadline exceeded.
    • redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out
    • redis.clients.jedis.JedisFactory: Error while close
  3. The original primary node recovers.
  4. Jedis keeps creating connections with the old node (at a very poor performance level, creating thousands of new connections and spitting out large amounts of errors), e.g.
    • redis.clients.jedis.exceptions.JedisClusterOperationException: Cluster retry deadline exceeded.
    • redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out
    • redis.clients.jedis.JedisFactory: Error while close

### Steps to reproduce:

Please create a reproducible case of your problem. Make sure
that case repeats consistently and it's not random
1. Make ElastiCache Redis perform a manual failover of a primary node.
2. Use the Jedis cluster client.
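
To make the steps above easier to repeat and compare across runs, a small probe could tally outcomes by exception class before, during, and after the failover. This is only a sketch: `FailoverProbe` and its `run` method are hypothetical names, not part of Jedis, and the operation is passed in as a plain `Callable` so the probe itself has no Jedis dependency.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;

/**
 * Hypothetical probe (not part of Jedis): runs one operation repeatedly and
 * tallies outcomes by exception class, so error rates can be compared
 * before, during, and after a failover.
 */
final class FailoverProbe {

    /**
     * Calls op the given number of times and returns counts keyed by "OK"
     * or by the simple class name of the exception that was thrown.
     */
    static Map<String, Integer> run(Callable<?> op, int iterations) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < iterations; i++) {
            String key;
            try {
                op.call();
                key = "OK";
            } catch (Exception e) {
                // e.g. JedisClusterOperationException or JedisConnectionException
                key = e.getClass().getSimpleName();
            }
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }
}
```

In an actual test, `op` would presumably be a call on the `JedisCluster` instance under test, for example `() -> cluster.get("probe-key")`, where `cluster` and the key name are placeholders. Printing the returned map at intervals would show whether the error mix changes once the original primary recovers.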

#### Jedis version:
3.6.0
#### Redis version:
6.0.5
#### Java version:
8

Any help or ideas on what to try would be greatly appreciated, thanks!

Sazzadul Hoque

Oct 11, 2021, 11:42:46 AM
to jedis...@googlegroups.com
You keep saying "node", but the JedisClusterOperationException makes it evident that this is a cluster.

Have you tried reporting this to ElastiCache? Jedis has no control over how they run Redis.

------------------------------------------------------------------------------------------------------------------------------------------------

If you just want to check Jedis, you can try testing older versions of Jedis (e.g. 3.5.2, 3.3.0).
