I wrote a blog post related to this a few weeks ago, exploring what can be done with Node from a deployment and uptime point of view: http://blog.evantahler.com/production-deployment-with-node-js-clusters
In a nutshell, you can use the native cluster module to manage a number of workers that handle the processing of requests. Workers can be started and stopped as needed, and re-spawned when they die. You can also do zero-downtime deployments by rolling over your workers when there is new code to load. Cluster workers can share ports and open sockets, and incoming requests are distributed among them.
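To make the worker management above concrete, here is a minimal sketch rather than a production setup. The port, the worker count, the `--serve` start flag and the use of SIGHUP as the "new code is deployed" trigger are all assumptions made for illustration; only the `cluster` calls themselves come from Node.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

const PORT = 8080;                      // hypothetical port
const WORKER_COUNT = os.cpus().length;  // assumption: one worker per CPU core

// Replace old workers one at a time: fork a fresh worker (which loads the
// current code from disk), wait until it is listening, then retire an old one.
// The `fork` parameter exists only so the logic can be exercised without
// spawning real processes.
function rollWorkers(oldWorkers, fork = () => cluster.fork()) {
  const [old, ...rest] = oldWorkers;
  if (!old) return;
  const replacement = fork();
  replacement.once('listening', () => {
    old.disconnect(); // the old worker closes its servers and then goes away
    rollWorkers(rest, fork);
  });
}

function startMaster() {
  for (let i = 0; i < WORKER_COUNT; i++) cluster.fork();

  // Re-spawn workers that died on their own. Workers retired through
  // disconnect() have exitedAfterDisconnect set and are not replaced here.
  cluster.on('exit', (worker) => {
    if (!worker.exitedAfterDisconnect) cluster.fork();
  });

  // Assumption: a SIGHUP sent to the master means "roll over to the new code".
  process.on('SIGHUP', () => rollWorkers(Object.values(cluster.workers)));
}

function startWorker() {
  // Every worker listens on the same port; the cluster module shares it.
  http
    .createServer((req, res) => res.end(`handled by worker ${process.pid}\n`))
    .listen(PORT);
}

// Hypothetical guard: only start when run as `node app.js --serve`.
// cluster.isMaster is named cluster.isPrimary in newer Node versions.
if (process.argv.includes('--serve')) {
  if (cluster.isMaster) startMaster(); else startWorker();
}
```

Note that only the workers pick up new code this way; the master keeps running whatever it loaded at startup, which is a good reason to keep the master as small as possible. This sketch also does not handle a replacement worker that dies before it starts listening.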
The problem of storing state across failures and deployments is not really a Node issue: if you write to a persistent store (DB, disk, etc.), you should be able to recover in the normal way. Another fun thing Node adds is a message-passing interface within the cluster module between the master and its workers, so you might also investigate keeping a replica of any important data in the master's memory space to seed new children with. This is worth looking into if the "state" data changes too fast for Redis or a database.
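A rough sketch of the seeding idea, under assumptions of my own: the message shapes (`{ type: 'set' }`, `{ type: 'seed' }`), the `replica` object, the `applyUpdate` helper and the `--serve` flag are all made up for illustration. Only the cluster events and the `send` calls are part of Node.

```javascript
const cluster = require('cluster');

// Hypothetical in-memory replica of the important state, held by the master.
const replica = {};

// Apply one update message to a state object (the message shape is made up).
function applyUpdate(state, msg) {
  if (msg && msg.type === 'set') state[msg.key] = msg.value;
  return state;
}

function startMaster(workerCount) {
  // Workers report changes; the master folds them into its replica.
  cluster.on('message', (worker, msg) => applyUpdate(replica, msg));

  // Seed every worker with the current replica as soon as it is running.
  cluster.on('online', (worker) => worker.send({ type: 'seed', state: replica }));

  // Replace dead workers; the 'online' handler above seeds the newcomer too.
  cluster.on('exit', () => cluster.fork());

  for (let i = 0; i < workerCount; i++) cluster.fork();
}

function startWorker() {
  let state = {};
  process.on('message', (msg) => {
    // Messages are serialized, so this is a copy and not shared memory.
    if (msg && msg.type === 'seed') state = msg.state;
  });
  // Example change reported back to the master.
  process.send({ type: 'set', key: `startedAt:${process.pid}`, value: Date.now() });
}

// Hypothetical guard: only start when run as `node app.js --serve`.
if (process.argv.includes('--serve')) {
  if (cluster.isMaster) startMaster(2); else startWorker();
}
```

Keep in mind that the replica lives only as long as the master does, so this complements a persistent store rather than replacing it.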