Problems with performance - Watchdog


Teresa F B

Jul 27, 2017, 11:19:29 AM7/27/17
to Redis DB
We have some issues with Redis; we set the watchdog to 500. We don't know why, but there's a point where we get more than 200 clients and queries to Redis slow down. I attach the server's Redis conf. We have configured 6 DBs with a JedisPool, and we store the content compressed in the DB.

    JedisPoolConfig jedisPoolConfig = new JedisPoolConfig();
    jedisPoolConfig.setMaxTotal(1000);
    jedisPoolConfig.setBlockWhenExhausted(false);
    jedisPoolConfig.setMaxWaitMillis(100);

------- data ------ --------------------- load -------------------- - child -
keys       mem      clients blocked requests            connections
967485     496.69M  319     0       47127817 (+0)       800908
967508     496.97M  323     0       47128813 (+996)     800973
967520     494.59M  323     0       47129648 (+835)     801030
967499     485.71M  323     0       47129747 (+99)      801030


--- WATCHDOG TIMER EXPIRED ---
EIP:
/lib64/libpthread.so.0(write+0x2d)[0x3a1660e7cd]

Backtrace:
/usr/local/bin/redis-server localhost:6379(logStackTrace+0x3c)[0x45c3bc]
/usr/local/bin/redis-server localhost:6379(watchdogSignalHandler+0x1b)[0x45c46b]
/lib64/libpthread.so.0[0x3a1660f7e0]
/lib64/libpthread.so.0(write+0x2d)[0x3a1660e7cd]
/usr/local/bin/redis-server localhost:6379(writeToClient+0x53)[0x433273]
/usr/local/bin/redis-server localhost:6379(handleClientsWithPendingWrites+0x5b)[0x4335cb]
/usr/local/bin/redis-server localhost:6379(beforeSleep+0x5d)[0x427f9d]
/usr/local/bin/redis-server localhost:6379(aeMain+0x1e)[0x41fe5e]
/usr/local/bin/redis-server localhost:6379(main+0x370)[0x4275e0]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3a15e1ed5d]
/usr/local/bin/redis-server localhost:6379[0x41d339]
11410:signal-handler (1501167531) --------

11410:signal-handler (1501167532)
--- WATCHDOG TIMER EXPIRED ---
EIP:
/lib64/libpthread.so.0(write+0x2d)[0x3a1660e7cd]

Backtrace:
/usr/local/bin/redis-server localhost:6379(logStackTrace+0x3c)[0x45c3bc]
/usr/local/bin/redis-server localhost:6379(watchdogSignalHandler+0x1b)[0x45c46b]
/lib64/libpthread.so.0[0x3a1660f7e0]
/lib64/libpthread.so.0(write+0x2d)[0x3a1660e7cd]
/usr/local/bin/redis-server localhost:6379(writeToClient+0x53)[0x433273]
/usr/local/bin/redis-server localhost:6379(handleClientsWithPendingWrites+0x5b)[0x4335cb]
/usr/local/bin/redis-server localhost:6379(beforeSleep+0x5d)[0x427f9d]
/usr/local/bin/redis-server localhost:6379(aeMain+0x1e)[0x41fe5e]
/usr/local/bin/redis-server localhost:6379(main+0x370)[0x4275e0]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3a15e1ed5d]
/usr/local/bin/redis-server localhost:6379[0x41d339]
11410:signal-handler (1501167532) --------

6379.conf

hva...@gmail.com

Jul 27, 2017, 2:51:56 PM7/27/17
to Redis DB
What do your server monitoring graphs say about the machine's disk i/o and memory consumption statistics around the times your Redis seems to get slow and connections from clients increase?

Teresa F B

Jul 28, 2017, 8:19:32 AM7/28/17
to Redis DB

Hi,

This is the info:


procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 6  0  72612 1171972 2368688 8796536    0    1    43   112    0    0 50  7 43  1  0
 6  0  72612 1171972 2368692 8796768    0    0    40   112 11876 9482 48  8 44  0  0
11  0  72612 1164848 2368712 8797104    0    0    88   204 11963 13856 74  4 22  0  0



------- data ------ --------------------- load -------------------- - child -
keys       mem      clients blocked requests            connections
681668     380.97M  316     0       98277451 (+742)     1198884
681681     380.56M  315     0       98278684 (+1233)    1198945
681689     381.07M  309     0       98279881 (+1197)    1199010
681705     380.46M  308     0       98281197 (+1316)    1199108
681717     379.45M  304     0       98282467 (+1270)    1199215
681731     380.92M  303     0       98283620 (+1153)    1199286
681739     381.67M  313     0       98284907 (+1287)    1199373
681758     381.89M  324     0       98286080 (+1173)    1199485
681781     382.29M  332     0       98287298 (+1218)    1199561


And when running redis-benchmark, performance is really awful:
redis-benchmark -t get -c 300
4688.45 requests per second

Teresa F B

Jul 28, 2017, 9:07:28 AM7/28/17
to Redis DB
sar26

hva...@gmail.com

Jul 31, 2017, 12:40:04 PM7/31/17
to Redis DB
Thanks, but the raw output of vmstat and the redis-cli stats don't really help, especially without comparing the figures during slow periods and non-slow periods.  Are you graphing the sar data, especially the readout of how busy the disk device is?  Check those graphs for busy disks at the times your Redis is slow.
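If sysstat is available on the box, the disk-busyness figures hvarzan is asking about can be pulled out directly; a sketch, assuming the collected data lives in the usual /var/log/sa/ location:

```shell
# Live view: extended per-device stats every second; a %util close to 100
# means the device is saturated
iostat -x 1

# Or read back already-collected sysstat data for a given day
# (the sa26 file name here is an example)
sar -d -f /var/log/sa/sa26
```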

Salvatore Sanfilippo

Jul 31, 2017, 12:46:13 PM7/31/17
to redi...@googlegroups.com
Hello, that's quite strange, since it appears that the process is
blocked on write(2) while it is writing to non-blocking sockets.
With 200 clients, such a long time to complete the writes is not normal
at all. Maybe the kernel is not able to give run time to the Redis
process because the system load is too high?
When this kind of thing happens, you may want to measure the
intrinsic latency of the system using redis-cli (running the redis-cli
command in intrinsic latency measurement mode directly inside the box
running Redis).
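For reference, the intrinsic latency measurement Salvatore mentions is a redis-cli mode that must be run on the Redis host itself; a sketch:

```shell
# Busy-loops for 100 seconds and reports the largest scheduling stall
# the process observed; run it on the same box as redis-server
redis-cli --intrinsic-latency 100
```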
> --
> You received this message because you are subscribed to the Google Groups
> "Redis DB" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to redis-db+u...@googlegroups.com.
> To post to this group, send email to redi...@googlegroups.com.
> Visit this group at https://groups.google.com/group/redis-db.
> For more options, visit https://groups.google.com/d/optout.



--
Salvatore 'antirez' Sanfilippo
open source developer - Redis Labs https://redislabs.com

"If a system is to have conceptual integrity, someone must control the
concepts."
— Fred Brooks, "The Mythical Man-Month", 1975.

Salvatore Sanfilippo

Jul 31, 2017, 12:47:44 PM7/31/17
to redi...@googlegroups.com
Thanks for your help hvarzan,

I wonder if you are implying something similar to what I wrote, that
is, when the kernel is busy elsewhere (high load or heavy I/O), it is
not able to schedule the process to run.

hva...@gmail.com

Jul 31, 2017, 12:56:02 PM7/31/17
to Redis DB
Hi Antirez, my requests are oriented toward checking out the case where snapshot writes to disk are making the disks very busy and this makes everything on the machine slow.  This might not be Teresa's issue, but it's a very common one, and worth confirming or eliminating quickly.

Salvatore Sanfilippo

Jul 31, 2017, 1:10:23 PM7/31/17
to redi...@googlegroups.com
Yep, that's what I had in mind. While counter-intuitive, Linux kernels
have a long history of busy disks affecting the scheduler's ability to
provide run time to processes not directly using the disk. Maybe
somebody here knows the root cause of this behavior, which I never
studied, but it is a famous one AFAIK.

Teresa F B

Aug 1, 2017, 12:22:38 PM8/1/17
to Redis DB
Hi,
I'm showing you our system status when we experienced the problems with Redis yesterday at 18:22. We restarted the systems almost half an hour later.

I executed the latency program while we were experiencing the problems and, as you can see, the average latency is quite high. I don't know if it's still acceptable. What do you think?

This is also the increase in sockets that we detect on the system when we hit the bug.

TIME             totsck    tcpsck    udpsck    rawsck   ip-frag    tcp-tw
12:30:01 PM        1577      1188        30         1         0       234
12:35:01 PM        1940      1557        30         1         0       208
12:40:02 PM        2871      2487        30         1         0       179
12:45:01 PM        3321      2947        30         1         0       271


Thanks a lot!
SystemStatus.png
LatencyNodo1.png

hva...@gmail.com

Aug 1, 2017, 11:05:26 PM8/1/17
to Redis DB
Well, your disk graph doesn't show how busy the disk is, but it does show a large increase in disk transactions/sec at pretty much the same time you report the slowness.

My suggestion is to run a test to see whether Redis is causing this by saving the data to disk in snapshots: disable snapshots in Redis and see if the slowness and the spikes in disk TPS stop. Then enable snapshots again and see if the slowness and the spikes in disk TPS return.
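Assuming one wants to toggle snapshots for this test without restarting the server, it can be done at runtime; a sketch (the re-enable schedule shown is just an example):

```shell
# Disable RDB snapshots on the running server
redis-cli CONFIG SET save ""

# ...observe whether the slowness and the disk TPS spikes stop...

# Re-enable a snapshot schedule afterwards (example values)
redis-cli CONFIG SET save "900 1 300 10 60 10000"
```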

hva...@gmail.com

Aug 1, 2017, 11:31:19 PM8/1/17
to Redis DB
My theory is that the connection between heavy disk I/O and processes getting blocked is not too mysterious.  I think it doesn't happen with heavy I/O on just any disk, but rather with heavy I/O on the *swap* disk.

I believe this is a fairly common situation:  A single-disk machine with the kernel swappiness setting unchanged (still at its default), a long-running Redis server process, and, for reasons known only to the VM subsystem, the kernel has decided to swap out some pages from running processes, including Redis.  Redis clients keep up a steady amount of write activity, but only to a portion of the database.

So when a snapshot starts:
  • The writes make the disk very busy.  Memory is fast and disk is slow, so the kernel's buffer for the RDB file fills up with pending writes.
  • The client write activity starts making the memory consumption of the parent and child Redis processes grow (because of copy-on-write behavior), reducing the free memory
  • The parent and the child Redis reach memory pages that were swapped to disk, and are blocked by the kernel until the pages can be read back from the extremely busy disk
  • Even after the child process is blocked, the disk remains busy for minutes because of the buffered writes waiting to be delivered to the disk
I have seen reports on this list where snapshots caused slowness, but moving the RDB file to a different disk made a big improvement.
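A quick way to check whether this swap theory applies to a given box; a sketch, assuming a Linux host and that pgrep finds the server process:

```shell
# Default swappiness (60) makes the kernel fairly eager to swap
cat /proc/sys/vm/swappiness

# Watch the si/so columns: nonzero values mean active swap traffic
vmstat 1 5

# How much of the Redis process is currently swapped out
grep VmSwap /proc/$(pgrep -o redis-server)/status
```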

Teresa F

Aug 2, 2017, 12:47:39 AM8/2/17
to redi...@googlegroups.com
Thanks hvarzan, we changed the conf some days ago so that it doesn't save to disk:

#
#   save ""

#save 900 1
#save 300 10
#save 60 10000



But we still get the error:



--- WATCHDOG TIMER EXPIRED ---
EIP:
/lib64/libpthread.so.0(write+0x2d)[0x339280e7cd]


Backtrace:
/usr/local/bin/redis-server localhost:6379(logStackTrace+0x3c)[0x45c3bc]
/usr/local/bin/redis-server localhost:6379(watchdogSignalHandler+0x1b)[0x45c46b]
/lib64/libpthread.so.0[0x339280f7e0]
/lib64/libpthread.so.0(write+0x2d)[0x339280e7cd]

/usr/local/bin/redis-server localhost:6379(writeToClient+0x53)[0x433273]
/usr/local/bin/redis-server localhost:6379(handleClientsWithPendingWrites+0x5b)[0x4335cb]
/usr/local/bin/redis-server localhost:6379(beforeSleep+0x5d)[0x427f9d]
/usr/local/bin/redis-server localhost:6379(aeMain+0x1e)[0x41fe5e]
/usr/local/bin/redis-server localhost:6379(main+0x370)[0x4275e0]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x339201ed5d]
/usr/local/bin/redis-server localhost:6379[0x41d339]
34340:signal-handler (1501629414) --------

34340:signal-handler (1501629472)

--- WATCHDOG TIMER EXPIRED ---
EIP:
/lib64/libpthread.so.0(write+0x2d)[0x339280e7cd]


Backtrace:
/usr/local/bin/redis-server localhost:6379(logStackTrace+0x3c)[0x45c3bc]
/usr/local/bin/redis-server localhost:6379(watchdogSignalHandler+0x1b)[0x45c46b]
/lib64/libpthread.so.0[0x339280f7e0]
/lib64/libpthread.so.0(write+0x2d)[0x339280e7cd]

/usr/local/bin/redis-server localhost:6379(writeToClient+0x53)[0x433273]
/usr/local/bin/redis-server localhost:6379(handleClientsWithPendingWrites+0x5b)[0x4335cb]
/usr/local/bin/redis-server localhost:6379(beforeSleep+0x5d)[0x427f9d]
/usr/local/bin/redis-server localhost:6379(aeMain+0x1e)[0x41fe5e]
/usr/local/bin/redis-server localhost:6379(main+0x370)[0x4275e0]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x339201ed5d]
/usr/local/bin/redis-server localhost:6379[0x41d339]
34340:signal-handler (1501629472) --------


Is it necessary to have save "" uncommented? I don't see any modifications to dump.rdb since Jul 30 03:53.
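One way to check what the running server actually has in effect, rather than inferring it from the conf file; a sketch:

```shell
# An empty value for "save" means RDB snapshotting is disabled
redis-cli CONFIG GET save

# Unix time of the last completed RDB save, to cross-check against dump.rdb
redis-cli INFO persistence | grep rdb_last_save_time
```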

Thanks for your help!





--
Teresa Fínez