Basically, the nodes were scaled down in GKE (Google Kubernetes Engine) from 5 to 0. That takes the nodes down in order 4, 3, 2, 1 and then 0, which was the corrupt node.
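(For context, that shutdown order is what you get when the cluster runs as a Kubernetes StatefulSet: pods are removed from the highest ordinal down, so node 0 is always the last one standing. Below is a minimal sketch of that kind of scale-down with the official Kubernetes Python client; the StatefulSet name "eventstore" and its namespace are my own placeholders, not something stated in this thread, and the same effect could equally have come from resizing the GKE node pool.)

# Sketch: scale an EventStore StatefulSet from 5 replicas down to 0.
# Assumptions (not from the thread): the cluster is a StatefulSet named
# "eventstore" in the "eventstore" namespace, and kubeconfig already points
# at the GKE cluster.
from kubernetes import client, config

config.load_kube_config()              # use the current kubeconfig context
apps = client.AppsV1Api()

# StatefulSet pods are deleted from the highest ordinal down, so this removes
# eventstore-4, then -3, -2, -1, and finally eventstore-0.
apps.patch_namespaced_stateful_set_scale(
    name="eventstore",
    namespace="eventstore",
    body={"spec": {"replicas": 0}},
)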
On Thursday, April 16, 2020 at 6:43:38 PM UTC+1, Greg Young wrote:
This sounds like something odd has happened ... How did you take the nodes down, etc.? I would usually say it smells like caching ...
But ...
The values are quite far apart! The chaser checkpoint is almost 50 MB ahead of the writer checkpoint ... could it be copied from time to time, etc.?
We recently scaled our Event Store cluster down from 5 nodes to 0 as a test.
Node 0 then failed to come back up, with the following output:
[00001,01,15:16:01.561] "WRITER CHECKPOINT:" 12747737373 (0x2F7D3091D)
[00001,01,15:16:01.567] "CHASER CHECKPOINT:" 12795330217 (0x2FAA93EA9)
[00001,01,15:16:01.567] "EPOCH CHECKPOINT:" 12745752545 (0x2F7B4BFE1)
[00001,01,15:16:01.567] "TRUNCATE CHECKPOINT:" -1 (0xFFFFFFFFFFFFFFFF)
[00001,01,15:16:01.749] MessageHierarchy initialization took 00:00:00.1641696.
[00001,01,15:16:01.782] Unhandled exception while starting application: EXCEPTION OCCURRED Corrupt database detected.
[00001,01,15:16:01.801] "Corrupt database detected. Checkpoint 'chaser' has greater value than writer checkpoint."
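(A quick note on those numbers: each checkpoint is just a byte offset into the transaction log, so the gap Greg mentions above is 12795330217 - 12747737373 = 47,592,844 bytes, roughly 47.6 MB of chaser-ahead-of-writer. For anyone wanting to inspect a node in this state, here is a rough Python sketch of reading the on-disk checkpoint files; it assumes the usual layout where writer.chk and chaser.chk start with a 64-bit little-endian offset, and the data-directory path is only a placeholder, so double-check both against your version.)

# Rough sketch: read EventStore's on-disk checkpoint files and repeat the
# comparison that made this node report "Corrupt database detected".
# Assumptions (not from the thread): the data directory path, and that each
# .chk file stores its checkpoint as a 64-bit little-endian integer at offset 0.
import struct
from pathlib import Path

DATA_DIR = Path("/var/lib/eventstore")   # placeholder for the node's data dir

def read_checkpoint(name: str) -> int:
    raw = (DATA_DIR / f"{name}.chk").read_bytes()[:8]
    return struct.unpack("<q", raw)[0]    # signed little-endian int64

writer = read_checkpoint("writer")
chaser = read_checkpoint("chaser")
print(f"writer={writer} chaser={chaser} gap={chaser - writer} bytes")

if chaser > writer:
    # This is the condition the node trips over at startup.
    print("chaser is ahead of writer -> startup fails with 'Corrupt database detected'")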
While we were able to recover from the other 4 nodes, is there anything we can do to prevent or repair this?