Our project uses a single Redis instance as its persistent database (running on localhost, AOF enabled, no master/slave replication, no cluster, version 2.8.24),
and Tomcat 6 as the application server. The two communicate through Jedis (version 2.5.1). The project was released half a year ago and had been running smoothly.
However, once the Redis dataset grew large (about 10 GB), something unexpected happened: with very low probability, our Redis instance appeared to mix up
keys and values. For instance, suppose we had written:
set key1, value1
set key2, value2
......
set keyN, valueN
Most of the time, getting keyX returned valueX as expected, but occasionally it returned valueY, the value of a different key.
This looks like a thread-safety problem, but we use Jedis with the following pattern:
    private void doRequest(HttpServletRequest req, HttpServletResponse resp) {
        Jedis jedis = new Jedis();
        try {
            //do things
        } finally {
            jedis.close();
        }
    }
Since every request creates and closes its own Jedis instance, this should be thread-safe.
Analyzing the server's logs, we noticed that all of the incorrect writes happened while an AOF rewrite was in progress.
During those periods Jedis also reported many socket read timeout exceptions.
Perhaps the high latency of Redis during the rewrite triggers this apparent thread-safety bug.
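One way latency could produce mixed-up values without any threading involved (this is only a guess on my side, nothing in the logs confirms it) is that after a read timeout the late reply is still sitting unread on the socket, so the next command sent over the same connection reads the previous command's reply. The toy model below does not use Jedis at all; `StaleReplyDemo`, `get`, and the `clientTimesOut` flag are hypothetical names made up purely for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of a single request/response connection. This is NOT Jedis code. */
public class StaleReplyDemo {
    // Replies the "server" has sent that the client has not read yet, in order.
    private final Deque<String> unreadReplies = new ArrayDeque<String>();

    /** Hypothetical GET: the server always answers; the client may stop waiting. */
    String get(String key, boolean clientTimesOut) {
        unreadReplies.addLast("value-of-" + key);
        if (clientTimesOut) {
            // The client gives up, but the reply still arrives on the same socket.
            throw new IllegalStateException("read timed out");
        }
        // The client reads the oldest unread reply, whichever command it belongs to.
        return unreadReplies.pollFirst();
    }

    /** Times out on key1, then asks for key2 on the same connection. */
    static String demo() {
        StaleReplyDemo conn = new StaleReplyDemo();
        try {
            conn.get("key1", true);
        } catch (IllegalStateException e) {
            // Swallowing the timeout and carrying on is the risky part.
        }
        return conn.get("key2", false);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "value-of-key1"
    }
}
```

If something like this is what happens, it would only matter where the `//do things` part catches a timeout and keeps using the same `jedis` object; throwing the connection away after a timeout would avoid it.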
We have since set no-appendfsync-on-rewrite to yes. The problem has not occurred again, but the root cause is still unresolved.
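For reference, the workaround corresponds to a redis.conf fragment along these lines. This is a sketch of the relevant directives only, and the rest of our configuration is not shown:

```
# redis.conf (fragment)
appendonly yes
# Do not fsync the AOF while a BGSAVE or BGREWRITEAOF child is running.
no-appendfsync-on-rewrite yes
```

The same setting can be applied at runtime with `CONFIG SET no-appendfsync-on-rewrite yes`. The trade-off is durability: according to the comments in redis.conf, in the worst case up to 30 seconds of log can be lost (with default Linux settings) if the server crashes during a rewrite.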