Redis CPU spikes at the top of each hour


mruf...@gmail.com

Nov 26, 2018, 4:58:44 PM
to Redis DB

Hello, my team and I are new to Redis. We have implemented a new Redis remote cache in our application and have been live in production for a month or so. We are seeing a CPU spike virtually every hour, at the top of the hour, which in turn is causing some socket timeouts in the Jedis client in our application. We normally average around 75% idle CPU, but for a few seconds at the top of each hour Redis maxes out the CPU. I can also see smaller CPU spikes at the top of each minute (i.e. 0, 1, or 2 seconds past the minute), but they are nowhere near as extreme or impactful as the spike at the top of the hour (i.e. 00 minutes past the hour). We are not doing any replication or background saving. I also see these CPU spikes on our backup and development servers, even with no traffic or entries hitting the cache. I'm providing the INFO stats, command stats, and sar output in case it helps. I'm losing my mind here, so any ideas or suggestions would be a huge help. Thanks in advance!

 

info

# Server

redis_version:3.0.7

redis_git_sha1:00000000

redis_git_dirty:0

redis_build_id:72cf55e957a6f159

redis_mode:standalone

os:Linux 3.10.0-862.14.4.el7.x86_64 x86_64

arch_bits:64

multiplexing_api:epoll

gcc_version:4.8.5

process_id:1265

run_id:4ea2d39a518051c2934bbfe3b6079eab90713e4b

tcp_port:6379

uptime_in_seconds:1954685

uptime_in_days:22

hz:10

lru_clock:16533909

config_file:/opt/redis/conf/redis.conf

 

# Clients

connected_clients:604

client_longest_output_list:0

client_biggest_input_buf:125976

blocked_clients:0

 

# Memory

used_memory:5317756536

used_memory_human:4.95G

used_memory_rss:5719674880

used_memory_peak:6309424888

used_memory_peak_human:5.88G

used_memory_lua:36864

mem_fragmentation_ratio:1.08

mem_allocator:jemalloc-3.6.0

 

# Persistence

loading:0

rdb_changes_since_last_save:697187150

rdb_bgsave_in_progress:0

rdb_last_save_time:1541305880

rdb_last_bgsave_status:ok

rdb_last_bgsave_time_sec:-1

rdb_current_bgsave_time_sec:-1

aof_enabled:0

aof_rewrite_in_progress:0

aof_rewrite_scheduled:0

aof_last_rewrite_time_sec:-1

aof_current_rewrite_time_sec:-1

aof_last_bgrewrite_status:ok

aof_last_write_status:ok

 

# Stats

total_connections_received:7235698

total_commands_processed:1482965886

instantaneous_ops_per_sec:1552

total_net_input_bytes:41835380717205

total_net_output_bytes:3654192431288

instantaneous_input_kbps:46642.76

instantaneous_output_kbps:1763.65

rejected_connections:0

sync_full:0

sync_partial_ok:0

sync_partial_err:0

expired_keys:627574702

evicted_keys:0

keyspace_hits:68917352

keyspace_misses:697123893

pubsub_channels:0

pubsub_patterns:0

latest_fork_usec:0

migrate_cached_sockets:0

 

# Replication

role:master

connected_slaves:0

master_repl_offset:0

repl_backlog_active:0

repl_backlog_size:1048576

repl_backlog_first_byte_offset:0

repl_backlog_histlen:0

 

# CPU

used_cpu_sys:324772.25

used_cpu_user:105355.06

used_cpu_sys_children:0.00

used_cpu_user_children:0.00

 

# Cluster

cluster_enabled:0

 

# Keyspace

db0:keys=83335,expires=83335,avg_ttl=20452

 

 

 

# Commandstats

cmdstat_get:calls=766297676,usec=6369187523,usec_per_call=8.31

cmdstat_setex:calls=697420911,usec=26393919537,usec_per_call=37.85

cmdstat_auth:calls=6441161,usec=81666380,usec_per_call=12.68

cmdstat_ping:calls=13303192,usec=58604787,usec_per_call=4.41

cmdstat_info:calls=434,usec=213743,usec_per_call=492.50

cmdstat_monitor:calls=4,usec=27,usec_per_call=6.75

cmdstat_config:calls=3,usec=226,usec_per_call=75.33

cmdstat_slowlog:calls=32,usec=1676,usec_per_call=52.38

 

 

08:59:45 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
08:59:55 AM     all      5.68      0.00     20.45      0.00      0.00     73.86
08:59:56 AM     all      5.68      0.00     18.75      0.00      0.00     75.57
08:59:57 AM     all      4.92      0.00     14.21      0.00      0.00     80.87
08:59:58 AM     all      4.97      0.00     14.92      0.00      0.00     80.11
08:59:59 AM     all      3.87      0.00     16.02      0.00      0.00     80.11
09:00:00 AM     all      4.97      0.00     16.57      0.00      0.00     78.45
09:00:01 AM     all     19.58      0.00     34.97      0.00      0.00     45.45
09:00:02 AM     all      6.38      0.00     24.47      0.00      0.00     69.15
09:00:05 AM     all     23.38      0.00     76.62      0.00      0.00      0.00
09:00:06 AM     all     27.94      0.00     72.06      0.00      0.00      0.00

09:00:06 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
09:00:07 AM     all     22.73      0.00     68.18      0.00      0.00      9.09
09:00:08 AM     all     40.00      0.00     60.00      0.00      0.00      0.00
09:00:09 AM     all     53.93      0.00     46.07      0.00      0.00      0.00
09:00:10 AM     all     45.33      0.00     54.67      0.00      0.00      0.00
09:00:11 AM     all     24.79      0.00     29.91      0.00      0.00     45.30
09:00:12 AM     all     13.25      0.00     56.02      0.00      0.00     30.72
09:00:13 AM     all     14.20      0.00     46.15      0.00      0.00     39.64
09:00:14 AM     all     17.75      0.00     53.25      0.00      0.00     28.99
09:00:15 AM     all     12.74      0.00     61.15      0.00      0.00     26.11
09:00:16 AM     all     10.86      0.00     52.00      0.00      0.00     37.14
09:00:17 AM     all     11.05      0.00     61.63      0.00      0.00     27.33
09:00:18 AM     all     11.43      0.00     53.71      0.00      0.00     34.86
09:00:19 AM     all      9.04      0.00     34.46      0.00      0.00     56.50

Seandon Mooy

Nov 26, 2018, 5:11:18 PM
to redi...@googlegroups.com
Hi Mrufkahr -

My guess, based on your used memory, is that you have a blocking call somewhere. With this level of GET/SETEX activity, something like pausing for a background save would certainly be expected to cause a stall, but it doesn't look like that's the culprit. Is it possible you have a piece of code that runs on the hour in each environment? Calls like KEYS block all other activity and shouldn't be used outside of debugging. My go-to here would be to enable the Redis slow log (https://redis.io/commands/slowlog) and look for commands other than your usual GET/SETEX.
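
If it's easier to pull the slow log from the application side, here is a rough sketch using Jedis (the host, port, and 10ms threshold are placeholders; the Slowlog import path assumes Jedis 3.x, on 2.x it lives at redis.clients.util.Slowlog):

import java.util.List;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.util.Slowlog;

public class SlowlogCheck {
    public static void main(String[] args) {
        // placeholder host/port -- point this at your Redis instance
        try (Jedis jedis = new Jedis("redis-host", 6379)) {
            // jedis.auth("password");  // uncomment if your instance requires AUTH

            // log every command that takes longer than 10,000 microseconds (10 ms)
            jedis.configSet("slowlog-log-slower-than", "10000");

            // fetch the 25 most recent slow entries and print duration + command
            List<Slowlog> entries = jedis.slowlogGet(25);
            for (Slowlog entry : entries) {
                System.out.println(entry.getExecutionTime() + " usec: " + entry.getArgs());
            }
        }
    }
}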

Hopefully this helps!

- Seandon

Xingbo Wang

Nov 26, 2018, 5:45:17 PM
to redi...@googlegroups.com
Hi Mrufkahr:

Could you check whether this could be caused by another process? Since, as you mentioned, the CPU spikes at the top of each hour and each minute, it sounds like other processes may be running on the instance, such as a cron job (e.g. service log uploads).

Thanks,
Shawn
--
Best Regards,

Xingbo

Damon LaCaille

Nov 26, 2018, 6:13:35 PM
to redi...@googlegroups.com
You should also check your redis.conf file for persistence settings. It could be set up to persist your data to disk every X seconds if Y updates have occurred.

The docs show an example like this:
save 900 1
save 300 10
save 60 10000
Plus, as Shawn mentioned, a cron job could be running, perhaps one that triggers a BGSAVE? You can try running LASTSAVE to see.
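
If you'd rather check those two things from the application than from redis-cli, here's a quick Jedis sketch (host/port are placeholders):

import java.util.List;
import redis.clients.jedis.Jedis;

public class PersistenceCheck {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("redis-host", 6379)) {
            // jedis.auth("password");  // uncomment if your instance requires AUTH

            // CONFIG GET save -> ["save", "900 1 300 10 60 10000"], or an empty value if RDB saves are disabled
            List<String> saveConfig = jedis.configGet("save");
            System.out.println("save config: " + saveConfig);

            // LASTSAVE -> unix timestamp of the last successful RDB save
            long lastSave = jedis.lastsave();
            System.out.println("last RDB save (unix time): " + lastSave);
        }
    }
}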

Damon LaCaille
Solutions Architect

Phone: 512.744.5724
Skype: @damon.lacaille


Redis Labs

Sripathi Krishnan

Nov 26, 2018, 6:52:09 PM
to redi...@googlegroups.com
I see a few problems with your application. They may not be directly related to your problem, but I feel these things add up.

First, total_connections_received is 7235698, or 7.2M, over a period of 22 days (uptime_in_days). Your application is creating roughly 328K new connections every day. Given that you only have 604 connected clients, that is a lot of connection churn. Another way to look at it: each connection performs only about 205 operations before being closed, which is very low.

You want your connections to be long-lived. Look at how your app is managing connections. You may want to look at JedisPool to help you manage them.
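
Something along these lines; this is only a sketch, and the pool sizes, host, and password are placeholders rather than recommendations:

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

public class CacheClient {
    // one pool per application instance, created once and reused for the life of the app
    private static final JedisPoolConfig POOL_CONFIG = new JedisPoolConfig();
    static {
        POOL_CONFIG.setMaxTotal(64);  // placeholder sizes -- tune to your workload
        POOL_CONFIG.setMaxIdle(32);
        POOL_CONFIG.setMinIdle(8);
    }
    private static final JedisPool POOL =
            new JedisPool(POOL_CONFIG, "redis-host", 6379, 2000, "password");

    public static void put(String key, String value) {
        // borrow a connection and return it to the pool when the try block exits
        try (Jedis jedis = POOL.getResource()) {
            jedis.setex(key, 60, value);
        }
    }

    public static String get(String key) {
        try (Jedis jedis = POOL.getResource()) {
            return jedis.get(key);
        }
    }
}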

Second, your cache hit ratio is far too low, at about 9%. Cache hit ratio = keyspace_hits / (keyspace_hits + keyspace_misses) * 100. This means only about 9% of the GETs sent to Redis actually return something useful. Ideally, you want that number to be in the 90% range.
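
With the numbers from your INFO output: 68917352 / (68917352 + 697123893) * 100 ≈ 9.0%.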

Because of the low cache hit ratio, you are spending most of your time sending data to Redis instead of reading from it. You are pushing 46642.76 kbps into Redis and pulling out only 1763.65 kbps.

In other words, right now, redis is not adding any value to your application. You might as well turn it off. 

Third, you have 83,335 keys in Redis, and every one of them has an expiration set (expires=83335). I am guessing you are bulk-loading keys into Redis and they all expire around the same time. This may be causing the problems you are experiencing: if a large percentage of keys expire on the hour, there will be a lot of activity in Redis around that time.
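
One general pattern that can smooth this out (not something anyone has mentioned above, just a common mitigation) is to add a small random jitter to each TTL so that keys written together do not all expire in the same instant. A minimal Jedis sketch:

import java.util.concurrent.ThreadLocalRandom;
import redis.clients.jedis.Jedis;

public class JitteredSetex {
    // spread expirations out by adding a random offset of up to maxJitterSeconds to the base TTL
    public static void setexWithJitter(Jedis jedis, String key, String value,
                                       int baseTtlSeconds, int maxJitterSeconds) {
        int ttl = baseTtlSeconds + ThreadLocalRandom.current().nextInt(maxJitterSeconds + 1);
        jedis.setex(key, ttl, value);
    }
}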

Fourth, your average value size is ~63KB (used_memory divided by the number of keys). By Redis standards, this is quite big. If I were to hazard a guess, you are storing serialized Java objects in Redis. You should rethink your caching approach. Perhaps you can break your objects into smaller objects and cache those?

--Sri

Thomas Love

Nov 27, 2018, 4:25:47 PM
to Redis DB
I second Sripathi's concern about your cache hit ratio in particular. It suggests a design problem, and I would guess TTLs are being set inappropriately.

Try configuring Redis as an LRU cache: set maxmemory to 6gb and maxmemory-policy to allkeys-lru, and replace your SETEX calls with plain SET.
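
In redis.conf those two settings would be roughly:

maxmemory 6gb
maxmemory-policy allkeys-lru

(They can also be changed at runtime with CONFIG SET.)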

mruf...@gmail.com

Nov 30, 2018, 9:53:19 AM
to Redis DB

First and foremost, thank you to all who responded and helped here. While we have yet to find the culprit, we will be focusing on the cron job theory. Again, these CPU spikes also occur on backup and development servers that have no traffic or keys in the cache.

 

Just to answer a few of the questions: the slow log has produced no unexpected results. The last save occurred almost a month ago, when the server was last restarted. And while I admit that we are using Redis in an unorthodox manner, it is in fact by design and is providing a big impact/savings for our company. The cache hit rate of 9% is expected; each key/value is intentionally set with an expiration of 60 seconds. There is no bulk loading taking place. We are using a JedisPool, but I do not have an exact understanding of how the connections are being used and may eventually shift my focus there.

 

Again… thank you!



