This is a huge problem, isn't it?
If a server or a process crashes for some reason, then after the reboot/restart, the bigger the database is, the longer it takes to become available to users.
Why do Lucene indexes get corrupted? In other products based on Lucene (such as Solr) this does not happen.
I'm asking because I don't consider it acceptable to have to introduce replication just to prevent a failure like the one Paul is facing.
This is particularly important when deploying to Azure, where the lifecycle of the VM/Web/Worker/Role is totally out of our control.
.m