I have two Redis servers on EC2: one is an m1.large that has a slave on another m1.large, and the other is an m1.small. Both are running Redis 2.6.10.
The m1.large has only about 35% of its memory in use and the m1.small only about 9%. Neither server has any save lines configured, and appendfsync is set to no. CPU load average is very low on both: over the last 7 days, according to New Relic, the m1.small peaked at about 0.07 and the m1.large at about 0.10. New Relic's disk utilization graphs show practically 0% the entire time.
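In case it's useful, this is roughly how I'm double-checking the persistence settings and memory usage on each instance. It's a minimal sketch using redis-py; the host and port are placeholders, not my real instances:

    import redis

    # Placeholder connection details; I run this against each instance separately.
    r = redis.Redis(host='127.0.0.1', port=6379, decode_responses=True)

    # Confirm there are no RDB save points and AOF fsync is off.
    cfg = {**r.config_get('save'), **r.config_get('appendfsync')}
    print('save:', repr(cfg.get('save')))          # expect '' (no save points)
    print('appendfsync:', cfg.get('appendfsync'))  # expect 'no'

    # Rough memory picture to compare against the instance's RAM.
    mem = r.info('memory')
    print('used_memory_human:', mem.get('used_memory_human'))
    print('used_memory_rss:', mem.get('used_memory_rss'))

    # Persistence section shows whether a background save is ever in flight.
    pers = r.info('persistence')
    print('rdb_bgsave_in_progress:', pers.get('rdb_bgsave_in_progress'))
    print('aof_rewrite_in_progress:', pers.get('aof_rewrite_in_progress'))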
We recently tracked some performance issues down to waiting on Redis commands, so I opened up the slowlog and found lots of operations taking between 60 and 90 ms each. The odd thing is that these operations are all O(1): lpop, hincrby, get, set, etc. Since they're under 200 ms, they seem too low for turning on the software watchdog to be useful. Using the commands from the Redis latency troubleshooting page, I also confirmed that no paging is happening.
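For reference, this is roughly how I'm pulling the slowlog numbers. Again a sketch with redis-py; durations come back in microseconds, the 50 ms cutoff is arbitrary, and the connection details are placeholders:

    import redis

    r = redis.Redis(host='127.0.0.1', port=6379, decode_responses=True)

    # SLOWLOG GET returns entries with a 'duration' field in microseconds.
    for entry in r.slowlog_get(128):
        cmd = entry['command']
        if isinstance(cmd, bytes):          # redis-py may return bytes here
            cmd = cmd.decode(errors='replace')
        ms = entry['duration'] / 1000.0
        if ms >= 50:                        # the entries I'm worried about
            print('%8.1f ms  %s' % (ms, cmd))

Almost everything it prints is a single-key lpop/hincrby/get/set in that 60-90 ms range.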
I'm wondering if EC2 itself is the issue (I'm willing to move to dedicated hardware if necessary), but I don't want to just chalk it up to "Redis on EC2 is bad" without a good understanding of why, especially since I'm not persisting on these machines, which is the scenario where I'd expect frequent forking to cause latency.
Does anyone have a suggestion on how to troubleshoot?
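If it helps frame an answer, the next step I'm planning is to time individual commands from the application host and compare the round-trip numbers against the server-reported slowlog durations, along the lines of the sketch below. The host, port, and key name are placeholders, not from my actual app:

    import time
    import statistics
    import redis

    # Slowlog durations measure command execution only and exclude network time,
    # so comparing these round-trip numbers against the slowlog should show
    # whether the 60-90 ms is spent inside Redis or somewhere in between.
    r = redis.Redis(host='127.0.0.1', port=6379)

    samples = []
    for _ in range(1000):
        start = time.perf_counter()
        r.set('latency:probe', 'x')
        r.get('latency:probe')
        samples.append((time.perf_counter() - start) * 1000.0)  # ms per round trip

    samples.sort()
    print('median: %.2f ms' % statistics.median(samples))
    print('p99:    %.2f ms' % samples[int(len(samples) * 0.99)])
    print('max:    %.2f ms' % samples[-1])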