signal-handler (1602304033) Received SIGTERM scheduling shutdown...

3,816 views
Skip to first unread message

vuvant...@gmail.com

unread,
Oct 10, 2020, 7:32:28 AM10/10/20
to Redis DB

Please be allow me to ask a question here relevant to Redis 5.0.5 (00000000/0) 64 bit.

I have an production on EC2 AWS, it's running LEMP Stack (PHP 7.3.9 (cli) (built: Aug 27 2019 22:52:39) ( NTS )/nginx version: nginx/1.12.2/Server version: 10.4.8-MariaDB-log MariaDB Server)

Currently, we have recorded sometime web server down in 1-2 minutes. At the time, I have looked at the logs file (Nginx/Php/Redis). None one of them points out to me what's the problem happening? Then I was looking at the Redis logs, and I have a little doubt about this. Please can you look at it as the logs below?

5214:M 10 Oct 2020 04:10:25.329 # Accepting client connection: accept: Too many open files in system
5214:M 10 Oct 2020 04:12:23.094 * 10 changes in 300 seconds. Saving...
5214:M 10 Oct 2020 04:12:23.097 * Background saving started by pid 9727
9727:C 10 Oct 2020 04:12:26.441 * DB saved on disk
9727:C 10 Oct 2020 04:12:26.442 * RDB: 6 MB of memory used by copy-on-write
5214:M 10 Oct 2020 04:12:26.503 * Background saving terminated with success
5214:signal-handler (1602304033) Received SIGTERM scheduling shutdown...
5214:M 10 Oct 2020 04:27:13.456 # User requested shutdown...
5214:M 10 Oct 2020 04:27:13.457 * Saving the final RDB snapshot before exiting.
5214:M 10 Oct 2020 04:27:16.869 * DB saved on disk
5214:M 10 Oct 2020 04:27:16.869 * Removing the pid file.
5214:M 10 Oct 2020 04:27:16.870 # Redis is now ready to exit, bye bye...
5221:C 10 Oct 2020 04:26:01.063 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
5221:C 10 Oct 2020 04:26:01.064 # Redis version=5.0.5, bits=64, commit=00000000, modified=0, pid=5221, just started
5221:C 10 Oct 2020 04:26:01.064 # Configuration loaded
5221:C 10 Oct 2020 04:26:01.064 * supervised by systemd, will signal readiness
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 5.0.5 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                   
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 5221
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           http://redis.io        
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

5221:M 10 Oct 2020 04:26:01.067 # Server initialized
5221:M 10 Oct 2020 04:26:01.067 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
5221:M 10 Oct 2020 04:26:01.100 * <graph> Thread pool created, using 16 threads.
5221:M 10 Oct 2020 04:26:01.101 * Module 'graph' loaded from /usr/local/src/RedisGraph/src/redisgraph.so
5221:M 10 Oct 2020 04:26:05.774 * DB loaded from disk: 4.674 seconds
5221:M 10 Oct 2020 04:26:05.774 * Ready to accept connections
5221:M 10 Oct 2020 04:31:02.095 * 10 changes in 300 seconds. Saving...
5221:M 10 Oct 2020 04:31:02.097 * Background saving started by pid 6790
6790:C 10 Oct 2020 04:31:05.450 * DB saved on disk
6790:C 10 Oct 2020 04:31:05.451 * RDB: 34 MB of memory used by copy-on-write
5221:M 10 Oct 2020 04:31:05.502 * Background saving terminated with success

I do research aground the Internet to find out about this issue, but I'm not sure about this case. I have to recreate the Redis in the Lab environment, which has the same production environment.
1- Kill the Redis
2- Everything is normal. The webserver still running.

--> It's doesn't help me to understand this case.

I would really appreciate any ideas and advice.

Regards,


Greg Andrews

unread,
Oct 10, 2020, 5:01:10 PM10/10/20
to Redis DB
The message about not accepting new connections because the Redis server process had no more available file descriptors is likely due to your operating system's configuration applying a limit on the Redis server process that's too low.  If you search the Internet on terms like "file descriptors" along with "web server" or "Redis server" you will find more detailed explanations.  You will also find the file descriptor limit as a performance bottleneck (and several other types) in this blog post by Trivago:  https://tech.trivago.com/2017/01/25/learn-redis-the-hard-way-in-production/

If you had mentioned which Linux distro and version you're using, and whether you installed your Redis from a package provided as part of the distro or from a third-party package, or compiled it yourself from source code, and I might be able to suggest how the file descriptor limit is being applied when your Redis server process is launched, and how you might be able to expand that limit.

The message about being killed is because some process that has the right permissions issued the signal to make the process stop.  There can be a variety of sources for such a signal, but the most common ones are:
  1. The kernel detected that the amount of available RAM was running low and, following the guidelines in its configuration, started killing processes that were using the largest amount of RAM.  If this happened, there will be messages to this effect in the kernel or general Linux logfile (the particular file is different for different Linux distros).
  2. The systemd daemon (which launches and terminates processes) tried to probe the Redis server process to make sure it was still alive.  The probe was unsuccessful, so systemd skilled the process and did not re-start it.  If this happened, there will be messages to this effect in the output of 'systemctl status <redis-service-name>' or a log that the journalctl command will display.
  3. Redis Sentinel or another monitoring/control program decided to kill the Redis server process.  If this happened, the should be messages to this effect in the logfiles written by the program.

Is there a connection between the complaint about an error accepting new connections and the termination signal?  Maybe.  The termination happened 17 minutes later so there's no obvious connection.  However, if monitoring by systemd or Sentinel or other monitoring has a 15-20 minute interval, then the delay between Redis's complaint and the time the monitoring process decided to terminate Redis can make sense.

With respect to the possibility of the server running low on available RAM, what do your monitoring graphs of cpu, memory, and network usage show you around that time?

Vu Van Than

unread,
Oct 12, 2020, 1:44:42 AM10/12/20
to redi...@googlegroups.com

Dear Greg Andrews, 


Thank you for sending me the information as well explained more in detail in this case. 


Please allow me to ask you more a question. As I mentioned previously in this message, I would like to provide you with more information about the environment I'm running on EC2 Production. 


[1] Linuy distro (CentOS Linux release 7.6.1810 (Core))


[2] I have installed Redis from a package provided as part of the distro with Rpm.


[3] I checked CPU/RAM/usage around that time. It's low 


[4] As far as I know in the blogs you give me before.


Tip Redis is single-threaded. Every command may block other commands. Keep this in mind if you start thinking about problems or use cases. This seems obvious but is mostly overlooked and the root cause of many issues.


Is there a connection that will be killed or Redis will keep blocking the access that will cause our customers can't access our web at that time? 


[5] it's the files /etc/php-fpm.d/www.conf.


php_value[session.save_handler] = redis

php_value[session.save_path] = tcp://127.0.0.1:6379


I want to mention it because if we have a Redis, the website is still alive. Is it can't stop or deny access from outside?


Only a hypothesis can happen, it's Redis blocking access, PHP-FPM is waiting for the Redis release session. It denies a new connection.


[6] I checked the Redis logs and discovered that Redis was saving data to disk every few minutes:


24808:C 12 Oct 2020 04:07:15.620 * DB saved on disk

24808:C 12 Oct 2020 04:07:15.622 * RDB: 8 MB of memory used by copy-on-write

5221:M 12 Oct 2020 04:07:15.701 * Background saving terminated with success

5221:M 12 Oct 2020 04:12:16.006 * 10 changes in 300 seconds. Saving...

5221:M 12 Oct 2020 04:12:16.008 * Background saving started by pid 30330

30330:C 12 Oct 2020 04:12:19.526 * DB saved on disk

30330:C 12 Oct 2020 04:12:19.527 * RDB: 8 MB of memory used by copy-on-write

5221:M 12 Oct 2020 04:12:19.610 * Background saving terminated with success

5221:M 12 Oct 2020 04:17:20.085 * 10 changes in 300 seconds. Saving...

5221:M 12 Oct 2020 04:17:20.087 * Background saving started by pid 3544

3544:C 12 Oct 2020 04:17:23.616 * DB saved on disk

3544:C 12 Oct 2020 04:17:23.618 * RDB: 8 MB of memory used by copy-on-write

5221:M 12 Oct 2020 04:17:23.692 * Background saving terminated with success


[7] php-fpm log


[11-Oct-2020 16:00:09] WARNING: [pool www] child 21023, script '/var/www/index.php' (request: "GET /index.php") executing too slow (3.967548 sec), logging

[11-Oct-2020 16:00:09] NOTICE: child 21023 stopped for tracing

[11-Oct-2020 16:00:09] NOTICE: about to trace 21023

[11-Oct-2020 16:00:09] NOTICE: finished trace of 21023

[11-Oct-2020 16:00:13] WARNING: [pool www] child 12154, script '/var/www/index.php' (request: "POST /index.php") executing too slow (3.552117 sec), logging



[8] As far I know from the blog "One connection per request" I quote it again. 


We accepted the challenge with the ever growing amount of traffic in the following months and further optimized our application and stack in several ways. Some of our goals were tackling the consumption of memory per request, optimizing our database queries (slow query log), tuning our caching layers and adding more hardware (web servers) to our datacenters. Especially the last change, adding more web servers to our stack, created yet another challenge.


As mentioned earlier, we were dealing with stateless applications. Without the usage of php-fpm and persistent connections this means:


An HTTP request comes in

The application creates connections to various services

Operations are executed, queries are run, requests are made, etc.

The application closes connections created in 2.

The reponse is delivered to the client

This worked great so far and this is the way many applications work. But if you scale the number of servers that can accept incoming requests, your traffic grows and you don`t pay special attention to your 3rd party components this can go wrong. Very wrong.


Depending on the request, the application creates third party connections, executes one or two commands and disconnects again. 50% up to 75% of the commands we execute are used for connection handling. Remember Redis is single threaded. If you have a lot of clients that try to connect to your Redis instance continuously, you will keep your instance busy with connection handling instead of executing the commands you run your business logic on. This may lead to a slowdown/blocking of your Redis instance. The (simplified) image above visualizes the problem. Every arrow represents one HTTP client request.


With the logs above I think "At some point in time one machine will not be able to handle those use cases alone anymore, because the data won’t fit into RAM
the number of read/write requests is to high for one machine
As a follow up we started to shard our data over several machines using consistent hashing".

I'm looking forward to hearing from you soon. 

Regards,





--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/redis-db/da186d26-d40b-4056-8d2e-509d380078c3n%40googlegroups.com.


--
___________________________________________
With best regards, and please stay safe!
Vu Van Than,
DevSecOps

David Maier (Redis Labs)

unread,
Oct 12, 2020, 7:50:20 AM10/12/20
to redi...@googlegroups.com
Hi Vu Van Than,

I have the feeling that you gave yourself an answer already, but let me try to understand/summarize:
  1. `Currently, we have recorded sometime web server down in 1-2 minutes`: I guess 'down' means in this case that the web server is not responsive for a specific period of time. Or do you mean literally 'down'?
  2. `We were dealing with stateless applications`: Yeah, this means that each request causes a new connection to Redis. Here an example: 100K requests happening in parallel would need 100K connections. So how many requests are happening in parallel?
  3. `Accepting client connection: accept: Too many open files in system`: This means that Redis can't accept any new connections because you didn't configure enough file descriptors on the OS side. Each socket connection needs a file descriptor.
So given 1..3, I would assume that the following happens:
  • Your application has comparable high response times without using a cache, especially under high loads. Maybe the response times are such high (seconds) that your monitoring solution identifies them as 'downtime'.
  • The higher the load, the more connections are opened to Redis, BUT Redis can't accept any new connections when you reach the configured file-descriptor limit So the cache is not available for at least a portion of your requests. This has in the first step nothing to do with the single threaded character of Redis, and also not with writing files in the background. It's just the case that Redis can't open new Sockets.
  • So I would say that your workload is just higher than for what you configured the Redis hosts.
Here some proposals:
  • The short term solution would be to increase the number of file-descriptors, but this might just heal the symptom.
  • In addition, you might consider using connection pooling. If your application can't be easily refactored to have a better connection management, then another option could be to put a proxy in the middle (between Redis and the applications) which acts as a connection multiplexer.
  • I would also recommend that you review the sizing of your Redis solution because I got the feeling that you are running tons of application servers that are all ending up on a single Redis instance. The charm of Redis is that it is easily scalable. So you should scale it together with your service tier.
This is a community forum, so I am not trying to sell Redis Enterprise here. However, just mentioning (along with some other options) that Redis Enterprise has a multi-threaded proxy built-in to address exactly such cases. There are other OSS proxies out there that might also help you (e.g., twemproxy). You might also want to take a closer look at Redis Cluster and the Redis Cluster Proxy project (https://github.com/RedisLabs/redis-cluster-proxy).

Regards,
David







--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+u...@googlegroups.com.


--
David Maier
Director of Technical Enablement






Disclaimer

The information contained in this communication from the sender is confidential. It is intended solely for use by the recipient and others authorized to receive it. If you are not the recipient, you are hereby notified that any disclosure, copying, distribution or taking action in relation of the contents of this information is strictly prohibited and may be unlawful.

Vu Van Than

unread,
Oct 13, 2020, 12:41:46 AM10/13/20
to redi...@googlegroups.com
Dear David Maier,

Thank you for your time and consideration.

It's clear to me now.



--
___________________________________________
With best regards, and please stay safe!
Vu Van Than,
SecOps
Reply all
Reply to author
Forward
0 new messages