Problems with performance - Watchdog


Teresa F B

Jul 27, 2017, 11:19:29 AM7/27/17
to Redis DB
We have some issues with Redis; we set the watchdog to 500. We don't know why, but there's a point where we get more than 200 clients and queries to Redis slow down. I attach the server's Redis conf. We have configured 6 DBs with a JedisPool, and we store the content compressed in the DB.

    JedisPoolConfig jedisPoolConfig = new JedisPoolConfig();
    jedisPoolConfig.setMaxTotal(1000);
    jedisPoolConfig.setBlockWhenExhausted(false);
    jedisPoolConfig.setMaxWaitMillis(100);

------- data ------ --------------------- load -------------------- - child -
keys       mem      clients blocked requests            connections
967485     496.69M  319     0       47127817 (+0)       800908
967508     496.97M  323     0       47128813 (+996)     800973
967520     494.59M  323     0       47129648 (+835)     801030
967499     485.71M  323     0       47129747 (+99)      801030


--- WATCHDOG TIMER EXPIRED ---
EIP:
/lib64/libpthread.so.0(write+0x2d)[0x3a1660e7cd]

Backtrace:
/usr/local/bin/redis-server localhost:6379(logStackTrace+0x3c)[0x45c3bc]
/usr/local/bin/redis-server localhost:6379(watchdogSignalHandler+0x1b)[0x45c46b]
/lib64/libpthread.so.0[0x3a1660f7e0]
/lib64/libpthread.so.0(write+0x2d)[0x3a1660e7cd]
/usr/local/bin/redis-server localhost:6379(writeToClient+0x53)[0x433273]
/usr/local/bin/redis-server localhost:6379(handleClientsWithPendingWrites+0x5b)[0x4335cb]
/usr/local/bin/redis-server localhost:6379(beforeSleep+0x5d)[0x427f9d]
/usr/local/bin/redis-server localhost:6379(aeMain+0x1e)[0x41fe5e]
/usr/local/bin/redis-server localhost:6379(main+0x370)[0x4275e0]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3a15e1ed5d]
/usr/local/bin/redis-server localhost:6379[0x41d339]
11410:signal-handler (1501167531) --------

11410:signal-handler (1501167532)
--- WATCHDOG TIMER EXPIRED ---
EIP:
/lib64/libpthread.so.0(write+0x2d)[0x3a1660e7cd]

Backtrace:
/usr/local/bin/redis-server localhost:6379(logStackTrace+0x3c)[0x45c3bc]
/usr/local/bin/redis-server localhost:6379(watchdogSignalHandler+0x1b)[0x45c46b]
/lib64/libpthread.so.0[0x3a1660f7e0]
/lib64/libpthread.so.0(write+0x2d)[0x3a1660e7cd]
/usr/local/bin/redis-server localhost:6379(writeToClient+0x53)[0x433273]
/usr/local/bin/redis-server localhost:6379(handleClientsWithPendingWrites+0x5b)[0x4335cb]
/usr/local/bin/redis-server localhost:6379(beforeSleep+0x5d)[0x427f9d]
/usr/local/bin/redis-server localhost:6379(aeMain+0x1e)[0x41fe5e]
/usr/local/bin/redis-server localhost:6379(main+0x370)[0x4275e0]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3a15e1ed5d]
/usr/local/bin/redis-server localhost:6379[0x41d339]
11410:signal-handler (1501167532) --------

6379.conf

hva...@gmail.com

Jul 27, 2017, 2:51:56 PM7/27/17
to Redis DB
What do your server monitoring graphs say about the machine's disk i/o and memory consumption statistics around the times your Redis seems to get slow and connections from clients increase?

Teresa F B

Jul 28, 2017, 8:19:32 AM7/28/17
to Redis DB

Hi,

This is the info:


procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 6  0  72612 1171972 2368688 8796536    0    1    43   112    0    0 50  7 43  1  0
 6  0  72612 1171972 2368692 8796768    0    0    40   112 11876 9482 48  8 44  0  0
11  0  72612 1164848 2368712 8797104    0    0    88   204 11963 13856 74  4 22  0  0



------- data ------ --------------------- load -------------------- - child -
keys       mem      clients blocked requests            connections
681668     380.97M  316     0       98277451 (+742)     1198884
681681     380.56M  315     0       98278684 (+1233)    1198945
681689     381.07M  309     0       98279881 (+1197)    1199010
681705     380.46M  308     0       98281197 (+1316)    1199108
681717     379.45M  304     0       98282467 (+1270)    1199215
681731     380.92M  303     0       98283620 (+1153)    1199286
681739     381.67M  313     0       98284907 (+1287)    1199373
681758     381.89M  324     0       98286080 (+1173)    1199485
681781     382.29M  332     0       98287298 (+1218)    1199561


And when running redis-benchmark, performance is really awful:
redis-benchmark -t get -c 300
4688.45 requests per second

Teresa F B

Jul 28, 2017, 9:07:28 AM7/28/17
to Redis DB
sar26

hva...@gmail.com

Jul 31, 2017, 12:40:04 PM7/31/17
to Redis DB
Thanks, but the raw output of vmstat and the redis-cli stats don't really help, especially without comparing the figures during slow periods and non-slow periods.  Are you graphing the sar data, especially the readout of how busy the disk device is?  Check those graphs for busy disks at the times your Redis is slow.
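If sysstat is available on the box, the disk-busyness figures hvarzan is asking about can be pulled out directly; a sketch, assuming the collected data lives in the usual /var/log/sa/ location:

```shell
# Live view: extended per-device stats every second; a %util close to 100
# means the device is saturated
iostat -x 1

# Or read back already-collected sysstat data for a given day
# (the sa26 file name here is an example)
sar -d -f /var/log/sa/sa26
```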

Salvatore Sanfilippo

Jul 31, 2017, 12:46:13 PM7/31/17
to redi...@googlegroups.com
Hello, that's quite strange, since it appears that the process is
blocked on write(2) while it is writing to non-blocking sockets.
With 200 clients, such a long time to complete the writes is not normal
at all. Maybe the kernel is not able to give run time to the Redis
process because the system load is too high?
When this kind of thing happens, you may want to measure the
intrinsic latency of the system using redis-cli (running the redis-cli
command in intrinsic latency measurement mode directly inside the box
running Redis).
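For reference, the intrinsic latency measurement Salvatore mentions is a redis-cli mode that must be run on the Redis host itself; a sketch:

```shell
# Busy-loops for 100 seconds and reports the largest scheduling stall
# the process observed; run it on the same box as redis-server
redis-cli --intrinsic-latency 100
```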
> --
> You received this message because you are subscribed to the Google Groups
> "Redis DB" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to redis-db+u...@googlegroups.com.
> To post to this group, send email to redi...@googlegroups.com.
> Visit this group at https://groups.google.com/group/redis-db.
> For more options, visit https://groups.google.com/d/optout.



--
Salvatore 'antirez' Sanfilippo
open source developer - Redis Labs https://redislabs.com

"If a system is to have conceptual integrity, someone must control the
concepts."
— Fred Brooks, "The Mythical Man-Month", 1975.

Salvatore Sanfilippo

Jul 31, 2017, 12:47:44 PM7/31/17
to redi...@googlegroups.com
Thanks for your help hvarzan,

I wonder if you are implying something similar to what I wrote, that
is, when the kernel is busy elsewhere (high load or heavy I/O), it is
not able to schedule the process to run.

hva...@gmail.com

Jul 31, 2017, 12:56:02 PM7/31/17
to Redis DB
Hi Antirez, my requests are oriented toward checking out the case where snapshot writes to disk are making the disks very busy and this makes everything on the machine slow.  This might not be Teresa's issue, but it's a very common one, and worth confirming or eliminating quickly.

Salvatore Sanfilippo

Jul 31, 2017, 1:10:23 PM7/31/17
to redi...@googlegroups.com
Yep, that's what I had in mind. While counter-intuitive, Linux kernels
have a long history of busy disks affecting the scheduler's ability to
provide run time to processes not directly using the disk. Maybe
somebody here knows the root cause of this behavior, which I never
studied, but it is a famous one AFAIK.

Teresa F B

Aug 1, 2017, 12:22:38 PM8/1/17
to Redis DB
Hi,
I'm showing you our system status when we experienced the problems with Redis yesterday at 18:22. We restarted the systems almost half an hour later.

I executed the latency program while we were experiencing the problems and, as you can see, the average latency is quite high. I don't know if it's still acceptable. What do you think?

This is also the increase in sockets that we detect on the system when we hit the bug.

TIME             totsck    tcpsck    udpsck    rawsck   ip-frag    tcp-tw
12:30:01 PM        1577      1188        30         1         0       234
12:35:01 PM        1940      1557        30         1         0       208
12:40:02 PM        2871      2487        30         1         0       179
12:45:01 PM        3321      2947        30         1         0       271


Thanks a lot!
SystemStatus.png
LatencyNodo1.png

hva...@gmail.com

Aug 1, 2017, 11:05:26 PM8/1/17
to Redis DB
Well, your disk graph doesn't show how busy the disk is, but it does show a large increase in disk transactions/sec at pretty much the same time you report the slowness.

My suggestion is to run a test to see whether Redis is causing this by saving the data to disk in snapshots: disable snapshots in Redis and see if the slowness and the spikes in disk TPS stop. Then enable snapshots again and see if the slowness and the spikes in disk TPS return.
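Assuming one wants to toggle snapshots for this test without restarting the server, it can be done at runtime; a sketch (the re-enable schedule shown is just an example):

```shell
# Disable RDB snapshots on the running server
redis-cli CONFIG SET save ""

# ...observe whether the slowness and the disk TPS spikes stop...

# Re-enable a snapshot schedule afterwards (example values)
redis-cli CONFIG SET save "900 1 300 10 60 10000"
```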

hva...@gmail.com

Aug 1, 2017, 11:31:19 PM8/1/17
to Redis DB
My theory is that the connection between heavy disk I/O and processes getting blocked is not too mysterious.  I think it doesn't happen with heavy I/O on just any disk, but rather with heavy I/O on the *swap* disk.

I believe this is a fairly common situation:  A single-disk machine with the kernel swappiness setting unchanged (still at its default), a long-running Redis server process, and, for reasons known only to the VM subsystem, the kernel has decided to swap out some pages from running processes, including Redis.  Redis clients keep up a steady amount of write activity, but only to a portion of the database.

So when a snapshot starts:
  • The writes make the disk very busy.  Memory is fast and disk is slow, so the kernel's buffer for the RDB file fills up with pending writes.
  • The client write activity starts making the memory consumption of the parent and child Redis processes grow (because of copy-on-write behavior), reducing the free memory
  • The parent and the child Redis reach memory pages that were swapped to disk, and are blocked by the kernel until the pages can be read back from the extremely busy disk
  • Even after the child process is blocked, the disk remains busy for minutes because of the buffered writes waiting to be delivered to the disk
I have seen reports on this list where snapshots caused slowness, but moving the RDB file to a different disk made a big improvement.
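A quick way to check whether this swap theory applies to a given box; a sketch, assuming a Linux host and that pgrep finds the server process:

```shell
# Default swappiness (60) makes the kernel fairly eager to swap
cat /proc/sys/vm/swappiness

# Watch the si/so columns: nonzero values mean active swap traffic
vmstat 1 5

# How much of the Redis process is currently swapped out
grep VmSwap /proc/$(pgrep -o redis-server)/status
```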

Teresa F

Aug 2, 2017, 12:47:39 AM8/2/17
to redi...@googlegroups.com
Thanks hvarzan, we changed the conf some days ago so that it doesn't save to disk:

#
#   save ""

#save 900 1
#save 300 10
#save 60 10000



But we still get the error:



--- WATCHDOG TIMER EXPIRED ---
EIP:
/lib64/libpthread.so.0(write+0x2d)[0x339280e7cd]


Backtrace:
/usr/local/bin/redis-server localhost:6379(logStackTrace+0x3c)[0x45c3bc]
/usr/local/bin/redis-server localhost:6379(watchdogSignalHandler+0x1b)[0x45c46b]
/lib64/libpthread.so.0[0x339280f7e0]
/lib64/libpthread.so.0(write+0x2d)[0x339280e7cd]

/usr/local/bin/redis-server localhost:6379(writeToClient+0x53)[0x433273]
/usr/local/bin/redis-server localhost:6379(handleClientsWithPendingWrites+0x5b)[0x4335cb]
/usr/local/bin/redis-server localhost:6379(beforeSleep+0x5d)[0x427f9d]
/usr/local/bin/redis-server localhost:6379(aeMain+0x1e)[0x41fe5e]
/usr/local/bin/redis-server localhost:6379(main+0x370)[0x4275e0]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x339201ed5d]
/usr/local/bin/redis-server localhost:6379[0x41d339]
34340:signal-handler (1501629414) --------

34340:signal-handler (1501629472)

--- WATCHDOG TIMER EXPIRED ---
EIP:
/lib64/libpthread.so.0(write+0x2d)[0x339280e7cd]


Backtrace:
/usr/local/bin/redis-server localhost:6379(logStackTrace+0x3c)[0x45c3bc]
/usr/local/bin/redis-server localhost:6379(watchdogSignalHandler+0x1b)[0x45c46b]
/lib64/libpthread.so.0[0x339280f7e0]
/lib64/libpthread.so.0(write+0x2d)[0x339280e7cd]

/usr/local/bin/redis-server localhost:6379(writeToClient+0x53)[0x433273]
/usr/local/bin/redis-server localhost:6379(handleClientsWithPendingWrites+0x5b)[0x4335cb]
/usr/local/bin/redis-server localhost:6379(beforeSleep+0x5d)[0x427f9d]
/usr/local/bin/redis-server localhost:6379(aeMain+0x1e)[0x41fe5e]
/usr/local/bin/redis-server localhost:6379(main+0x370)[0x4275e0]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x339201ed5d]
/usr/local/bin/redis-server localhost:6379[0x41d339]
34340:signal-handler (1501629472) --------


Is it necessary to have save "" uncommented? I don't see any modifications to dump.rdb since Jul 30 03:53.
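One way to check what the running server actually has in effect, rather than inferring it from the conf file; a sketch:

```shell
# An empty value for "save" means RDB snapshotting is disabled
redis-cli CONFIG GET save

# Unix time of the last completed RDB save, to cross-check against dump.rdb
redis-cli INFO persistence | grep rdb_last_save_time
```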

Thanks for your help!





--
Teresa Fínez