I wrote a blog post related to this a few weeks ago exploring what can be
done with node from a deployment & uptime point of view.
In a nutshell, you can use the native cluster module to manage a number of
workers which handle the processing of requests, can be started/stopped as
needed, and can be re-spawned when they die. You can also do zero-downtime
deployments by rolling over your workers whenever there is new code to load.
Cluster workers can share ports and open sockets, and incoming requests will
be distributed between them.
The problem of storing state across failures/deployments is not really a
node issue: if you write to a persistent store (DB, disk, etc.), you
should be able to recover in the normal way. Another fun addition is that
node provides a message-passing interface within the cluster module
between the master and its workers, so you might also investigate keeping
a replica of any important data in the master's memory space to seed new
children with. That is worth looking into if the "state" data moves too
fast for Redis or a database.