Re: Redis latency issues


Salvatore Sanfilippo

Dec 11, 2012, 11:18:39 AM
to Redis DB
Hello Michael,

before asking more questions, have you already tried following the
document at http://redis.io/topics/latency to see whether any of the
common issues apply to you? It also shows how to use the software
watchdog, which may give you a hint when everything else fails.

Thanks!
Salvatore

On Tue, Dec 11, 2012 at 5:16 PM, Michael Chang <mi...@tellapart.com> wrote:
> Hi all,
>
> We're having some trouble with Redis latency. We're running Redis
> with a variety of different queries, but seeing some wild latency spikes.
> For example, a sorted set range-by-score query takes 2.52 ms
> (measured from our application-side code), but 334.13 ms at the 95th
> percentile and 8 seconds at the 99th percentile. While it is possible that
> our application code is blocking on something, I'm not seeing any slowlog
> entries on these Redis processes above ~25 ms. I suspect that we may be
> hitting the single-threaded limits of Redis and that requests are getting
> queued up before being processed. Is there some way to figure out if that
> is the case?
>
> Cheers,
> Michael
>
> --
> You received this message because you are subscribed to the Google Groups
> "Redis DB" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/redis-db/-/w2rtgfI1DsgJ.
> To post to this group, send email to redi...@googlegroups.com.
> To unsubscribe from this group, send email to
> redis-db+u...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/redis-db?hl=en.



--
Salvatore 'antirez' Sanfilippo
open source developer - VMware
http://invece.org

Beauty is more important in computing than anywhere else in technology
because software is so complicated. Beauty is the ultimate defence
against complexity.
— David Gelernter

Michael Chang

Dec 11, 2012, 2:40:22 PM
to redi...@googlegroups.com
Hey Salvatore,

I had already looked at that page to try to debug some of the common issues.
  • Running redis-cli --latency as described didn't show anything problematic.
  • We are pipelining commands, and using a LIFO connection pool on the client side to keep long-lived connections around. We're not using MSET/MGET right now, but it's in the works. Like I mentioned above, though, I don't see any individual command taking more than ~20ms in the slowlog.
  • On the single-threadedness: we are running 8 Redis processes (4 master/4 slave) on a single machine with 4 cores, so that could be a point of contention. Each of these Redis processes writes an RDB snapshot every 5 minutes. We're currently not using AOF at all because it was causing Redis processes to die (mysteriously).
  • We're not using swap.
  • Unfortunately we're still on Redis 2.4, so I can't use the watchdog...
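The LIFO connection pool Michael mentions can be sketched in a few lines. This is a minimal illustration, assuming a generic `factory` callable that opens a connection; the class `LifoPool` and its parameters are hypothetical and do not come from any particular Redis client. The most recently released connection is handed out first, so a few warm connections stay busy while surplus idle ones sink to the bottom of the stack.

```python
import collections
from typing import Callable, Deque, Generic, TypeVar

T = TypeVar("T")


class LifoPool(Generic[T]):
    """Hypothetical LIFO connection pool (illustration only)."""

    def __init__(self, factory: Callable[[], T], max_idle: int = 8) -> None:
        self._factory = factory    # opens a new connection when none is idle
        self._max_idle = max_idle  # cap on idle connections kept for reuse
        self._idle: Deque[T] = collections.deque()

    def acquire(self) -> T:
        # Reuse the connection that was released most recently, if any.
        if self._idle:
            return self._idle.pop()
        return self._factory()

    def release(self, conn: T) -> None:
        # Push it back on top of the stack; when the pool is already full the
        # connection is simply dropped (a real pool would close it here).
        if len(self._idle) < self._max_idle:
            self._idle.append(conn)

    def idle_count(self) -> int:
        return len(self._idle)
```

For a real client the factory might be something like `lambda: socket.create_connection(("127.0.0.1", 6379))`, with host and port being assumptions. A FIFO pool would instead rotate evenly through every connection, keeping all of them warm.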

Josiah Carlson

Dec 11, 2012, 4:49:59 PM
to redi...@googlegroups.com
I'd bet a case of good beer that this is your problem: 8 Redis processes
(4 master/4 slave) on a 4-core machine.

Have you checked top/htop to see where your processor time is going
to? How much spare memory do you have? How fast is your network? Have
you checked your network usage?

I believe one of two things is happening: either your network is
saturated, or your machine is overloaded:
1. Your system has reached its upper limit
2. Each Redis process is fighting for processor time
3. When an instance gets a packet in from the network, it gets woken
up, performs its operation quickly, and sends the response back to
the client within a single time slice
4. Your high latencies are caused by processes not being woken up in a
timely manner to service requests (which is why Redis' slowlog doesn't
show anything interesting)
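Josiah's overload theory can be checked roughly from inside the box by comparing the 1-minute load average with the number of cores. A minimal sketch, assuming a Unix host (`os.getloadavg` is not available on Windows); `cpu_pressure` is a hypothetical helper name. On Linux the load average also counts tasks blocked in uninterruptible disk I/O, so heavy iowait can push it up even when the CPUs themselves are mostly idle.

```python
import os
from typing import Dict


def cpu_pressure(redis_processes: int) -> Dict[str, float]:
    """Hypothetical rough oversubscription check for N single-threaded servers."""
    cores = os.cpu_count() or 1
    load1, _load5, _load15 = os.getloadavg()  # Unix only
    return {
        "cores": cores,
        # More single-threaded servers than cores means they must take turns
        # on the CPUs whenever several of them are busy at the same time.
        "processes_per_core": redis_processes / cores,
        # A value that stays above 1.0 suggests tasks are waiting for a turn
        # (on Linux this includes tasks stuck in disk I/O).
        "load1_per_core": load1 / cores,
    }


if __name__ == "__main__":
    print(cpu_pressure(8))  # 8 Redis processes, as in this thread
```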

Let us know more information about your load, memory, network use,
whether you are running a real machine or VM, etc., and we should be
able to help you more.

Regards,
- Josiah

Salvatore Sanfilippo

Dec 11, 2012, 5:48:52 PM
to Redis DB
Still, I'm surprised that redis-cli --latency shows no latency. This is
very strange and points towards a client-side problem.

Michael Chang

Dec 12, 2012, 9:36:58 PM
to redi...@googlegroups.com
Hey Josiah,

The %CPU looks okay.

  • Through iftop, it looks like we're using about 25Mbps of network bandwidth.
  • From the output of 'free', we have about 4GB of free memory (buffers+cache).
  • It doesn't look like our CPUs are being taxed. Is %iowait high here? We're making RDB snapshots every 5 minutes for each process:
02:15:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
02:25:01 AM     all      4.71      0.03      1.78     14.42      0.14     78.92
02:25:01 AM       0      0.74      0.13      1.29     32.34      0.41     65.09
02:25:01 AM       1      3.14      0.00      1.07      0.17      0.04     95.58
02:25:01 AM       2      6.84      0.00      2.22     11.88      0.06     79.01
02:25:01 AM       3      8.11      0.00      2.53     13.31      0.06     75.98

  • We're running this on an m1.xlarge instance on EC2, writing RDB files to an EBS volume, which I realize could be a point of contention. (How big, though?)
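To answer "is %iowait high here?" mechanically over a longer capture, the data rows can be filtered on their %iowait column. A minimal sketch, assuming output in exactly the nine-column layout shown above (the layout can vary with sysstat version and locale, for example without the AM/PM field) and an arbitrary 10% threshold; `high_iowait_cpus` is a hypothetical helper name.

```python
from typing import Dict


def high_iowait_cpus(sar_text: str, threshold: float = 10.0) -> Dict[str, float]:
    """Hypothetical helper: return {cpu: %iowait} for rows above `threshold`.

    Assumes the 9-column layout: time, AM/PM, CPU, %user, %nice,
    %system, %iowait, %steal, %idle.
    """
    flagged: Dict[str, float] = {}
    for line in sar_text.splitlines():
        fields = line.split()
        if len(fields) != 9 or fields[2] == "CPU":
            continue  # header line, blank line, or a different layout
        try:
            iowait = float(fields[6])
        except ValueError:
            continue  # not a numeric data row
        if iowait > threshold:
            flagged[fields[2]] = iowait
    return flagged
```

Run on the six lines above, only CPU 1 (0.17%) stays below the threshold; the "all" row and CPUs 0, 2 and 3 are flagged.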
Thanks,
Michael


Michael Chang

Dec 12, 2012, 9:38:40 PM
to redi...@googlegroups.com
Hi Salvatore,

Just to confirm, redis-cli --latency measures responsiveness of a redis-server process, correct?  E.g. it sends a ping to a redis-server process and measures how long it takes to get a response?
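The round trip Michael describes can be illustrated by timing a single PING over a raw socket, speaking the Redis protocol (RESP) directly. This is a minimal sketch, not redis-cli's actual implementation; it assumes an already-connected socket, and `ping_once` is a hypothetical name.

```python
import socket
import time

# PING encoded as a RESP array containing one bulk string.
PING = b"*1\r\n$4\r\nPING\r\n"


def ping_once(sock: socket.socket) -> float:
    """Hypothetical helper: send one PING on a connected socket and return
    the round-trip time in milliseconds. The expected reply is +PONG."""
    start = time.perf_counter()
    sock.sendall(PING)
    reply = b""
    while not reply.endswith(b"\r\n"):
        chunk = sock.recv(64)
        if not chunk:
            raise ConnectionError("server closed the connection")
        reply += chunk
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if reply != b"+PONG\r\n":
        raise RuntimeError(f"unexpected reply: {reply!r}")
    return elapsed_ms
```

Against a local server this could be called as `ping_once(socket.create_connection(("127.0.0.1", 6379)))`, host and port being assumptions. redis-cli --latency repeats this kind of round trip continuously and reports the minimum, maximum and average.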

Thanks,
Michael

Josiah Carlson

Dec 12, 2012, 11:00:45 PM
to redi...@googlegroups.com
I know your problem, see my inline reply.

On Wed, Dec 12, 2012 at 6:36 PM, Michael Chang <mi...@tellapart.com> wrote:
> We're running this on a m1.xlarge instance on ec2, with writing RDB files to
> an EBS volume, which I realize could be a point of contention. (how big
> though?)

EBS is remote data storage. Any data you write to your EBS disk gets
written over the network. Think of it like NFS. So any time your MySQL
reads/writes, it is doing a network read/write. Any time Redis dumps
to an RDB file, the data gets written to the network. If any of your
slaves loses sync and reconnects, a dump occurs (over the network),
which then gets read over the network again to sync to the slave
(unless you've got enough spare memory for it not to be an issue).
Check your free memory, buffers, and cache.

My recommendation:
1. Use an Amazon RDS instance with replication to multiple
availability zones. Let them take care of maintaining MySQL (also add
daily snapshots, just in case).
2. Switch Redis to an instance-store backed EC2 VM, use one of the
filesystem watch applications to notice when a dump occurs, which then
can signal an upload to S3 (explicitly limit your bandwidth to ensure
that this doesn't kill your network utilization), and remember to
create a machine image for this after you've gotten it set up right.
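Step 2 of the recommendation above (notice a new dump, then upload it with a bandwidth cap) can be sketched as follows. This is a minimal sketch under stated assumptions: polling the file's modification time stands in for a filesystem watch tool, and `throttled_copy` writes to any file-like object, standing in for an S3 upload whose client code is not shown; both helper names are hypothetical. Redis writes each snapshot to a temporary file and renames it over the dump file when it finishes, so a changed mtime on the final path should correspond to a complete dump.

```python
import io
import os
import time
from typing import BinaryIO, Optional


def rdb_changed(path: str, last_mtime: Optional[float]) -> Optional[float]:
    """Hypothetical helper: return the file's current mtime if it differs
    from `last_mtime`, or None if unchanged or the file does not exist."""
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        return None
    return mtime if mtime != last_mtime else None


def throttled_copy(src: BinaryIO, dst: BinaryIO, max_bytes_per_sec: int,
                   chunk_size: int = 64 * 1024) -> int:
    """Hypothetical helper: copy `src` to `dst`, sleeping between chunks so
    the average rate stays at or below `max_bytes_per_sec`."""
    start = time.monotonic()
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return copied
        dst.write(chunk)
        copied += len(chunk)
        # Earliest moment at which `copied` bytes fit within the rate limit.
        delay = start + copied / max_bytes_per_sec - time.monotonic()
        if delay > 0:
            time.sleep(delay)


if __name__ == "__main__":
    # Demo on in-memory buffers instead of a real dump file and S3 upload.
    src = io.BytesIO(b"x" * 200_000)
    dst = io.BytesIO()
    print(throttled_copy(src, dst, max_bytes_per_sec=1_000_000), "bytes copied")
```

A watcher loop would call `rdb_changed` every few seconds on the configured dump path and, when it returns a new mtime, open the file and stream it through `throttled_copy` into the upload.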

These two changes will not only improve your reliability and
resilience to EC2 outages, but your network hiccups will go away.

- Josiah

Michael Chang

Dec 13, 2012, 1:52:12 PM
to redi...@googlegroups.com
Sorry, where does MySQL play into this?  By network hiccups, are you referring to the %iowait?

Yiftach Shoolman

Dec 13, 2012, 3:07:38 PM
to redi...@googlegroups.com
Michael, can you share your iostat output captured during a latency spike, i.e. from: iostat -x 2

--

Yiftach Shoolman
+972-54-7634621

Josiah Carlson

Dec 13, 2012, 3:58:15 PM
to redi...@googlegroups.com
I thought I read that you were running MySQL on that same machine... I
must have confused your post with someone else's.

That said, EBS during snapshot is the most likely cause of your
network hiccups. And by network hiccups, I mean your occasional
multi-second latency spike. You can verify this by running the latency
tests over 60 seconds and signaling a BGSAVE on one of your Redis
servers during that time.
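The experiment described above can be scripted. A minimal sketch, assuming two caller-supplied callables: `ping` performs one round trip (for example `client.ping` from the redis-py client, if that library is in use) and `trigger` starts the event under test (for example `client.bgsave`); the function `latency_around_event` and its parameters are hypothetical. If EBS writes during the snapshot are the culprit, the worst latency after the trigger should jump well above the worst latency before it.

```python
import time
from typing import Callable, Dict, List


def latency_around_event(ping: Callable[[], object],
                         trigger: Callable[[], object],
                         duration_s: float = 60.0,
                         trigger_at_s: float = 20.0,
                         interval_s: float = 0.01) -> Dict[str, float]:
    """Hypothetical harness: time `ping` repeatedly for `duration_s` seconds,
    call `trigger` once after `trigger_at_s` seconds, and report the worst
    round trip (in ms) seen before and after the trigger."""
    before: List[float] = []
    after: List[float] = []
    fired = False
    start = time.monotonic()
    while time.monotonic() - start < duration_s:
        if not fired and time.monotonic() - start >= trigger_at_s:
            trigger()  # e.g. start a BGSAVE on one of the servers
            fired = True
        t0 = time.perf_counter()
        ping()
        elapsed_ms = (time.perf_counter() - t0) * 1000.0
        (after if fired else before).append(elapsed_ms)
        time.sleep(interval_s)
    return {
        "max_before_ms": max(before, default=0.0),
        "max_after_ms": max(after, default=0.0),
        "samples_before": len(before),
        "samples_after": len(after),
    }
```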

My apologies,
- Josiah

M. Edward (Ed) Borasky

Dec 14, 2012, 2:35:13 AM
to redi...@googlegroups.com
I'm not sure sar/iostat numbers are valid inside a guest VM on a
hypervisor, but if they are, the iowait numbers you're showing
indicate that you're not *reading* data off the 'disk' as fast as you
need to be for maximum throughput / minimum response times. Iostat
will show you the 'block devices' and partitions that are heavily
utilized, and it will also show you whether you're read- or
write-limited. I'm guessing read-limited, from the iowait numbers.

If, as Josiah notes, your 'disks' are actually talking to your
instance via a network connection, there's not going to be much you
can do except get a higher transfer rate over that connection. On a
machine with actual disks, adding RAM for page cache would help cut
down the read activity and the iowait.
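On Linux, iostat derives its per-device figures from the kernel counters in /proc/diskstats, so the read-versus-write question can also be answered by diffing two snapshots. A minimal sketch, assuming the classic field layout (major, minor, device name, then reads completed, reads merged, sectors read, time reading, writes completed, writes merged, sectors written, ...); the helper names are hypothetical. Inside a guest VM these counters describe the virtual block device, with the same caveat raised above about numbers measured inside a VM.

```python
from typing import NamedTuple


class DiskCounters(NamedTuple):
    device: str
    reads: int            # reads completed
    sectors_read: int
    writes: int           # writes completed
    sectors_written: int


def parse_diskstats_line(line: str) -> DiskCounters:
    """Hypothetical parser for one line of /proc/diskstats."""
    f = line.split()
    # f[0..2] = major, minor, name; f[3] reads, f[5] sectors read,
    # f[7] writes, f[9] sectors written.
    return DiskCounters(f[2], int(f[3]), int(f[5]), int(f[7]), int(f[9]))


def io_mix(before: DiskCounters, after: DiskCounters) -> str:
    """Classify the interval between two snapshots by sectors transferred."""
    read = after.sectors_read - before.sectors_read
    written = after.sectors_written - before.sectors_written
    if read == 0 and written == 0:
        return "idle"
    return "read-heavy" if read > written else "write-heavy"
```

In practice one would read the volume's line from /proc/diskstats, wait a few seconds, read it again, and pass both snapshots to `io_mix`; on Xen-based EC2 instances EBS volumes often appear as xvd* devices.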

I haven't done much with Amazon - most of my hypervisor experience is
with VMware ESX / Virtual Center. There we had *oodles* of tools at
the hypervisor level for guest performance monitoring and tuning. Is
there something like that on Amazon?
--
Twitter: http://twitter.com/znmeb; Computational Journalism Publishers
Workbench: http://znmeb.github.com/Computational-Journalism-Publishers-Workbench/

How the Hell can the lion sleep with all those people singing "A weem
oh way!" at the top of their lungs?

Salvatore Sanfilippo

Dec 14, 2012, 8:44:27 AM
to Redis DB
On Thu, Dec 13, 2012 at 3:38 AM, Michael Chang <mi...@tellapart.com> wrote:
> Just to confirm, redis-cli --latency measures responsiveness of a
> redis-server process, correct? E.g. it sends a ping to a redis-server
> process and measures how long it takes to get a response?

Yes. Whatever the cause of your latency, if it is related to Redis in
some way (slow disks, slow fork, slow commands executed) you *must* be
able to see it with redis-cli --latency.

Just keep it running for some time, and see what the max latency reported is.

If with redis-cli --latency everything is ok, then the problem is
probably on the client side.
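This rule of thumb can be applied by comparing two numbers: the tail latency the application measures around its own Redis calls, and the maximum latency redis-cli --latency reports over the same period. A minimal sketch using a nearest-rank percentile; the 99th percentile and the factor of 2 are arbitrary assumptions, and both function names are hypothetical. A client-side tail far above anything the server-side PING ever saw points away from Redis itself.

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least p%
    of all samples are less than or equal to it (0 < p <= 100)."""
    if not samples_ms or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def likely_client_side(client_ms: Sequence[float], server_max_ms: float,
                       p: float = 99.0, factor: float = 2.0) -> bool:
    """Hypothetical heuristic: True when the client-observed p-th percentile
    exceeds `factor` times the worst redis-cli --latency value."""
    return percentile(client_ms, p) > factor * server_max_ms
```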

Cheers,
Salvatore