On 2 February 2015 at 23:32:08, Jeff Outlaw (
jeffrey...@gmail.com) wrote:
> I am attempting to troubleshoot a database corruption issue.
> Attached is a dump from procmon where I see it accessing the various
> databases. The service continues to start then crash shortly
> after start. When I delete the 0.rdg file in C:\Windows\System32\config\system
> profile\AppData\Roaming\RabbitMQ\db\rabbit@MachineName-mnesia\msg_store_persistent
> and restart the service RMQ successfully comes back up again
> and stays running.
What was in the logs?
> This behavior can be reproduced/simulated by going into task
> manager and ending the erlsrv process. I realize this is a bad
> thing to do but I am simulating a hard shutdown here without actually
> powering off the PC. Side note I have seen several instances of
> the following message inside of Windows Event Viewer: RabbitMQ:
> Erlang machine voluntarily stopped. The service is not restarted
> as OnFail is set to ignore. This message shows up periodically
> on a client PC where they are NOT doing the procedure I described
> at the start of this paragraph, but the result is the same: erlsrv.exe
> is shown as the source in Event Viewer and is somehow crashing.
>
> I am running the latest RMQ 3.4.3 server on a Windows 7 x64 WORKSTATION
> with the latest Erlang (I stress workstation because I am suspicious,
> and have empirical evidence, that the PC is
> either not being shut down correctly or is going into hibernation
> mode, or both). I don't see these issues on a Windows Server 2012
> machine under our control.
>
> Comments and thoughts appreciated.
If you delete some of the files, that database directory can no longer be used,
and it is only a matter of time before the message store discovers inconsistencies.
Since RabbitMQ cannot magically restore missing data, it shuts down. If this is not
acceptable, you should use multiple nodes and mirror some or even all queues.
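
As a rough illustration of the mirroring suggestion, the sketch below builds the argument list for a `rabbitmqctl set_policy` call that mirrors every queue to all nodes of a cluster. The policy name `ha-all` and the catch-all pattern `^` are illustrative choices, not something taken from your setup, and the helper function is hypothetical. It only prints the command; running it requires `rabbitmqctl` on the PATH and only makes sense once more than one node is clustered.

```python
import json
import shlex


def mirror_all_policy_argv(name="ha-all", pattern="^"):
    """Build (but do not run) a set_policy command that mirrors matching queues.

    `name` and `pattern` are illustrative defaults; "^" matches every queue name.
    """
    # "ha-mode": "all" asks for a mirror on every node in the cluster.
    definition = json.dumps({"ha-mode": "all"}, separators=(",", ":"))
    return ["rabbitmqctl", "set_policy", name, pattern, definition]


argv = mirror_all_policy_argv()
# Show the command with POSIX shell quoting; cmd.exe on Windows quotes differently.
print(" ".join(shlex.quote(a) for a in argv))
```

To mirror only some queues, pass a narrower pattern, for example `mirror_all_policy_argv(pattern="^important\\.")`.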
Note that deleting a random message store file is not really simulating a power failure. In case of
a power failure, chances are that the message store will have had some data only in RAM (writes to disk
happen at short intervals or when RabbitMQ decides it is idle enough to do it), and losing that data
won't necessarily result in inconsistencies.
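
To make the difference concrete, here is a toy model in Python. It is not how the RabbitMQ message store is implemented: the class, its fields and the idea of a single persisted index are simplifying assumptions made for illustration. It only shows why losing unflushed writes leaves the on-disk state self-consistent, while removing a file that the index still points to does not.

```python
class ToyStore:
    """Toy model only: a persisted index plus data files, with a RAM buffer."""

    def __init__(self):
        self.disk = {}    # file name -> message ids written to that file
        self.index = {}   # message id -> file name the store expects it in
        self.buffer = []  # message ids accepted but still only in RAM

    def publish(self, msg_id):
        self.buffer.append(msg_id)

    def flush(self, file_name):
        # Write buffered messages to a file and record them in the index.
        self.disk.setdefault(file_name, []).extend(self.buffer)
        for msg_id in self.buffer:
            self.index[msg_id] = file_name
        self.buffer = []

    def hard_kill(self):
        # Power failure or killed process: whatever was only in RAM is gone.
        self.buffer = []

    def missing(self):
        # Messages the index knows about that cannot be found on disk.
        return sorted(m for m, f in self.index.items()
                      if m not in self.disk.get(f, []))


def after_hard_kill():
    store = ToyStore()
    store.publish(1)
    store.publish(2)
    store.flush("0.rdg")
    store.publish(3)      # never flushed
    store.hard_kill()     # message 3 is lost, but nothing refers to it
    return store.missing()


def after_file_deleted():
    store = ToyStore()
    store.publish(1)
    store.publish(2)
    store.flush("0.rdg")
    del store.disk["0.rdg"]  # file removed by hand; the index still refers to it
    return store.missing()


print("hard kill, missing:", after_hard_kill())
print("file deleted, missing:", after_file_deleted())
```

In the first case a message is lost but every message the index refers to is still present; in the second the index refers to messages that no longer exist, which is the kind of inconsistency described above.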
--
MK
Staff Software Engineer, Pivotal/RabbitMQ