I asked a related question but got no response:
In my case I was focused on the error case.
Here is one thing that helps:
See here:
I'm trying to figure out what can postpone the moment when NodeJS fails completely.
This doesn't increase the performance of the NodeJS app if you measure performance internally to the app. But it does increase the practical performance of the app, if you measure from the outside. That is, we can forestall the moment when NodeJS dies completely.
One thing I've noticed, which is very worrying, is that once a NodeJS app runs out of memory it can take many minutes before it actually dies.
I have NodeJS running under the control of the daemon Supervisord, with this setup for restarts:
[program:dupe_res_v7]
command=/usr/local/bin/node --max_old_space_size=6000 /home/ec2-user/daemons/deduplication_api/v7/dupe-res/server.js
autostart=true
autorestart=true
startretries=100
numprocs=1
stderr_logfile=/var/log/dupe_res_stderr_v7.log
stdout_logfile=/var/log/dupe_res_stdout_v7.log
stdout_logfile_maxbytes=50MB
stderr_logfile_maxbytes=50MB
At some point, users (my fellow co-workers) wrote to me and said "Hey, the API is non-responsive." I looked into it. The app would run out of memory and freeze, sometimes for as long as 30 minutes, before it would die and be restarted by Supervisord.
I've learned that aggressive health checks are necessary to kill the app sooner, so that Supervisord can restart it sooner.
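As a minimal sketch of what such a health check could look like (this is an illustration, not the exact check I run): an in-process watchdog that polls V8's heap statistics and exits once usage crosses a threshold, so that Supervisord's autorestart takes over. The names heapIsExhausted and startWatchdog, the 0.9 threshold, and the 5 second interval are all assumptions made for the example.

```javascript
const v8 = require('v8');

// Returns true when the used heap exceeds `fraction` of V8's heap size limit.
// `stats` has the shape returned by v8.getHeapStatistics().
function heapIsExhausted(stats, fraction) {
  return stats.used_heap_size / stats.heap_size_limit > fraction;
}

// Hypothetical watchdog: polls heap usage and calls onExhausted when the
// threshold is crossed. By default it exits with a non-zero code so that
// Supervisord (autorestart=true) starts a fresh process.
function startWatchdog({
  fraction = 0.9,      // assumed threshold, tune for your app
  intervalMs = 5000,   // assumed polling interval
  onExhausted = () => process.exit(1),
} = {}) {
  const timer = setInterval(() => {
    if (heapIsExhausted(v8.getHeapStatistics(), fraction)) {
      onExhausted();
    }
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for the watchdog
  return timer;
}
```

Calling startWatchdog() once at startup in server.js would be enough. One caveat: a timer inside the process may not fire on schedule if the event loop is stalled by garbage collection, so an external check that requests a URL and kills the process on timeout is probably still worth having alongside it.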
---- lawrence