Redis slow on O(1) commands - I'm on EC2 but not saving


Jon

Apr 8, 2013, 10:02:00 AM
to redi...@googlegroups.com
I have two Redis servers on EC2: one is an m1.large that has a slave on another m1.large, and the other is an m1.small. Both are running Redis 2.6.10.

The m1.large has only about 35% of its memory in use and the m1.small has only about 9% of its memory in use. Neither server is set up with any save lines and appendfsync is set to no. CPU load average is very low on these servers: over the last 7 days, according to New Relic, the m1.small peaked at about 0.07 and the m1.large around 0.10. New Relic's graphs on disk utilization show it at practically 0% the entire time.

We recently traced some performance issues to time spent waiting on Redis commands, so I opened up the slowlog and found lots of operations taking between 60 and 90 ms each. The odd thing is that these operations are all O(1): lpop, hincrby, get, set, etc. Because that's under 200 ms, it seems too low for turning on the watchdog process to help. I confirmed that no paging is happening using the commands on the Redis latency page.
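
For reference, the slowlog records only the time Redis spent executing a command, not network or client time, so grouping the entries by command can show whether the stalls follow any pattern. A minimal Ruby sketch, assuming each entry has the Redis 2.6 `SLOWLOG GET` shape of `[id, unix_timestamp, duration_in_microseconds, [command, *args]]`; the `summarize_slowlog` helper and the sample entries are hypothetical, made up for illustration:

```ruby
# Group slowlog entries by command name and report count and worst duration.
# Each entry is assumed to look like [id, timestamp, duration_us, [cmd, *args]].
def summarize_slowlog(entries)
  summary = Hash.new { |hash, cmd| hash[cmd] = { count: 0, max_ms: 0.0 } }
  entries.each do |_id, _timestamp, duration_us, argv|
    stats = summary[argv.first.to_s.downcase]
    stats[:count] += 1
    stats[:max_ms] = [stats[:max_ms], duration_us / 1000.0].max
  end
  summary
end

# Made-up sample data, not taken from the servers discussed in this thread.
SAMPLE_ENTRIES = [
  [12, 1_365_429_720, 61_000, ["LPOP", "queue:jobs"]],
  [13, 1_365_429_725, 88_000, ["lpop", "queue:jobs"]],
  [14, 1_365_429_731, 73_000, ["HINCRBY", "stats", "hits", "1"]]
].freeze

summarize_slowlog(SAMPLE_ENTRIES).each do |cmd, stats|
  puts format("%-8s count=%d max=%.1fms", cmd, stats[:count], stats[:max_ms])
end
```

The entries themselves come from the `SLOWLOG GET` command, for example run from `redis-cli`.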

I'm wondering if EC2 is the issue (I'm willing to move to dedicated hardware if necessary), but I don't want to just chalk it up to "Redis on EC2 is bad" without a good understanding of why, especially since I'm not saving on these machines and I would have expected frequent forking to be what causes issues.

Does anyone have a suggestion on how to troubleshoot?

Yiftach Shoolman

Apr 8, 2013, 10:26:53 AM
to redi...@googlegroups.com
1. Do you run only Redis on these instances?
2. These instances are heavily shared between EC2 users. I would suggest choosing a stronger instance, as suggested here, and running both of your Redis processes on it.


--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+u...@googlegroups.com.
To post to this group, send email to redi...@googlegroups.com.
Visit this group at http://groups.google.com/group/redis-db?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
 
 




Jonathan Hyman

Apr 8, 2013, 11:38:12 AM
to redi...@googlegroups.com
Yes, we only run Redis on those instances. I just upgraded the m1.small to an m1.xlarge and will monitor it over the next few hours to see how performance changes.



Jonathan Hyman

Apr 11, 2013, 7:15:49 PM
to redi...@googlegroups.com
To follow up here: upgrading to m1.xlarge seems to have helped with the slowlog and also improved the latency seen in a `redis-cli --latency` check. However, it made no material difference to our API's response times or to our thread profiling results, which show that we're still spending 30-40% of our API's time in redis/client.rb -> connection/hiredis.rb -> ext/connection.rb in the #read method. My guess from here is that it is in fact a latency issue on EC2 and that round-trip time is heavily affecting us.
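
One way to see the round-trip cost from inside the app process, rather than from `redis-cli --latency`, is to time a cheap command repeatedly in Ruby. A minimal sketch, assuming the caller supplies the operation as a block; the `measure_latency` helper is hypothetical, and the short `sleep` stands in for a real call such as `Redis.current.ping`:

```ruby
# Time a block `samples` times and return min/avg/max in milliseconds.
# Uses the monotonic clock so wall-clock adjustments do not skew results.
def measure_latency(samples: 1000)
  timings = Array.new(samples) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000.0
  end
  { min_ms: timings.min, avg_ms: timings.sum / timings.size, max_ms: timings.max }
end

# Stand-in operation; in the app this would be something like
# `Redis.current.ping` (assumed here, not run).
stats = measure_latency(samples: 100) { sleep 0.001 }
puts format("min=%.2fms avg=%.2fms max=%.2fms",
            stats[:min_ms], stats[:avg_ms], stats[:max_ms])
```

Comparing these numbers with what `redis-cli --latency` reports from the same host would show how much of the wait is added on the client side.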

Over the next few weeks, I'll be moving from a fleet of c1.mediums to a smaller number of c1.xlarge servers, which should have higher I/O throughput, and will follow up again.

Yiftach Shoolman

Apr 12, 2013, 3:38:53 AM
to redi...@googlegroups.com
Jonathan,

A few questions:
1. Are your Redis server and app servers located in the same AWS availability zone? Deploying them in the same AZ should reduce latency.
2. Are you using connection pooling? Otherwise you are adding connection setup/teardown time to each Redis operation.
3. Are you using pipelining? Pipelining does two things: (1) it cuts latency, and (2) it improves Redis throughput.
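
The effect of pipelining on latency comes down to arithmetic: the client waits roughly one network round trip per batch of commands rather than one per command. A back-of-the-envelope sketch, in which the 2 ms round-trip time and the six commands are assumed figures for illustration and `estimated_wait_ms` is a hypothetical helper (it ignores server execution time and connection setup):

```ruby
# Rough model: total network wait = number of round trips * round-trip time.
# batch_size is how many commands are sent per pipeline (1 = no pipelining).
def estimated_wait_ms(commands:, rtt_ms:, batch_size: 1)
  round_trips = (commands.to_f / batch_size).ceil
  round_trips * rtt_ms
end

unpipelined = estimated_wait_ms(commands: 6, rtt_ms: 2.0)
pipelined   = estimated_wait_ms(commands: 6, rtt_ms: 2.0, batch_size: 6)
puts "one command per round trip: #{unpipelined} ms"
puts "all six in one pipeline:    #{pipelined} ms"
```

The model only applies to commands that do not depend on each other's replies; a command that needs an earlier result still costs its own round trip.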

Jonathan Hyman

Apr 12, 2013, 8:49:32 AM
to redi...@googlegroups.com
1. Yes, our servers are in the same AZ. redis-cli --latency shows an average of about 1.6ms between the two with a max of 6ms after about 1000 samples. When my Redis servers were on m1.large and m1.small, --latency showed an average of 3.5ms with a max of 93ms. This is why I'm a bit confused: I would have expected huge gains in performance from moving to m1.xlarge, since the average and max latency dropped so much, but we didn't see any.
2. We just use the redis-rb gem, storing a global connection in Redis.current in our app's initializer and using that. Any suggestions here?
3. Yeah, we're using pipelining in a lot of places. I think that one of the problems is that we end up making a lot of round trips regardless, e.g., lots of code that looks like:

class MyObj
  def some_result
    if (cached_value = Redis.current.get(cache_key))
      return cached_value
    end
    val = Database.where(...)
    Redis.current.set(cache_key, val)
    return val
  end
end

So each request ends up making a lot of round trips, because there are half a dozen places with code that looks like this.
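
When several independent cache lookups happen in the same request, one option is to read all of the keys with a single `MGET` and fall back to the database only for the misses, so the reads cost one round trip instead of one per key. A minimal sketch, assuming the lookups do not depend on each other; `CountingFakeRedis` is a hypothetical in-memory stand-in for the real client (it only mimics `mget` and `set` and counts calls), and `fetch_all` and the loader block are likewise made up for illustration:

```ruby
# In-memory stand-in for a Redis client that counts simulated round trips.
class CountingFakeRedis
  attr_reader :round_trips

  def initialize
    @store = {}
    @round_trips = 0
  end

  def mget(*keys)
    @round_trips += 1
    keys.map { |key| @store[key] }
  end

  def set(key, value)
    @round_trips += 1
    @store[key] = value.to_s
    "OK"
  end
end

# Read every key in one MGET, then compute and cache only the misses.
# The block plays the role of the database query for a missing key.
def fetch_all(redis, keys)
  cached = redis.mget(*keys)
  keys.zip(cached).to_h do |key, value|
    if value.nil?
      value = yield(key).to_s
      redis.set(key, value)
    end
    [key, value]
  end
end

redis = CountingFakeRedis.new
keys = %w[user:1:name user:1:plan user:1:quota]
first = fetch_all(redis, keys) { |key| "db-value-for-#{key}" }
trips_after_first = redis.round_trips # 1 MGET + 3 SETs on a cold cache
second = fetch_all(redis, keys) { |key| raise "unexpected miss for #{key}" }
puts "cold cache: #{trips_after_first} round trips"
puts "warm cache: #{redis.round_trips - trips_after_first} round trip"
```

With the real redis-rb client, the `set` calls for the misses could also be grouped in a `pipelined` block, which would bring a cold-cache request down to about two round trips.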