-- Martin Krasser blog: http://krasserm.blogspot.com code: http://github.com/krasserm twitter: http://twitter.com/mrt1nz
Yes looks great!
Couple of questions related to snapshots in general, not really to the proposal...
Although I think journal implementations should handle the details of storage, do you think some standardization or utilities around serialization formats would make sense?
In practice, how will this work if you attempt to snapshot a processor with a lot of state (say, > 1 GB)?
Should the snapshotter be a different actor/process than the journal?
I wonder how to handle a situation like this:
- processor emits message on a default channel.
- message is not confirmed right away.
- processor does a successful snapshot.
- system crashes.
If processor replay restores from last snapshot, then we lost a message?
Use reliable channels behind snapshotted processors?
Or is there some smarter way to take this into account, e.g. snapshot - 1 or similar, if the system did not shut down properly? I guess this one is probably not a walk in the park.
The Snapshots proposal looks good.
Hi Alex,
On 10.04.13 00:44, ahjohannessen wrote:
I wonder how to handle a situation like this:
- processor emits message on a default channel.
- message is not confirmed right away.
- processor does a successful snapshot.
- system crashes.
If processor replay restores from last snapshot, then we lost a message?
Correct, this can happen. Very good catch.
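To make the failure mode concrete, here is a minimal simulation of the scenario. It is not Eventsourced code: `Journal` and `recover` are hypothetical names, and the model assumes that recovery replays only messages after the snapshot and re-emits only those without a confirmation.

```python
from dataclasses import dataclass, field


@dataclass
class Journal:
    """Hypothetical in-memory journal (not the Eventsourced API)."""
    messages: list = field(default_factory=list)   # journaled inputs, seq = index + 1
    confirmed: set = field(default_factory=set)    # sequence numbers with a delivery ack
    snapshots: list = field(default_factory=list)  # (sequence number, state) pairs


def recover(journal, from_snapshot):
    """Return the sequence numbers whose output is re-emitted during recovery."""
    start = journal.snapshots[-1][0] if from_snapshot and journal.snapshots else 0
    # Replay only messages after the snapshot; skip those already confirmed.
    return [seq for seq in range(start + 1, len(journal.messages) + 1)
            if seq not in journal.confirmed]


journal = Journal()
journal.messages.append("order-1")            # seq 1: emitted on a default channel
# ... no confirmation arrives yet ...
journal.snapshots.append((1, {"orders": 1}))  # successful snapshot covering seq 1
# system crashes here

print(recover(journal, from_snapshot=True))   # [] : seq 1 is never re-emitted
print(recover(journal, from_snapshot=False))  # [1] : a full replay re-emits seq 1
```

Recovery from the snapshot skips the unconfirmed message, whereas a replay from scratch emits it again.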
Use reliable channels behind snapshotted processors?
This would mitigate the risk but not completely avoid it. For example:
- processor emits message to reliable channel
- processor does a successful snapshot
- system crashes before reliable channel stores message together with ack
This is very unlikely but it can happen.
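The remaining window can be sketched as follows. `ReliableChannel` here is a hypothetical stand-in, not the library's class; the sketch assumes the channel first accepts a message in memory and only later makes it durable.

```python
class ReliableChannel:
    """Hypothetical channel that persists messages asynchronously (an assumption)."""

    def __init__(self):
        self.pending = []  # accepted in memory, not yet durable
        self.stored = []   # durable; survives a crash and can be redelivered

    def emit(self, msg):
        self.pending.append(msg)

    def flush(self):
        # Stands in for storing the message together with its ack.
        self.stored.extend(self.pending)
        self.pending.clear()

    def crash(self):
        self.pending.clear()  # in-memory contents are lost


def surviving_messages(flush_before_crash):
    channel = ReliableChannel()
    channel.emit("order-1")  # processor emits to the reliable channel
    # The processor snapshot succeeds here, so "order-1" will not be replayed.
    if flush_before_crash:
        channel.flush()
    channel.crash()
    return list(channel.stored)


print(surviving_messages(flush_before_crash=True))   # ['order-1']
print(surviving_messages(flush_before_crash=False))  # []
```

Only the crash between `emit` and `flush` loses the message, which is why the window is small but not zero.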
Or is there some smarter way to take this into account, e.g. snapshot - 1 or similar, if the system did not shut down properly? I guess this one is probably not a walk in the park.
- A practical solution is to recover only from snapshots that are older than a certain limit (see also this post). For example, if you only recover from snapshots older than 1 hour and you expect all receipt confirmations to arrive within 1 hour, you should be on the safe side.
- To completely avoid the situation you mentioned, for both default and reliable channels, you can still do a replay from scratch.
I added "Several snapshots per processor and selection criteria on ReplayParams" (= recovery from older snapshots) to ticket #8. Would this support your needs?
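
As a rough illustration of recovery from older snapshots, the sketch below picks the most recent snapshot that is at least a given age and derives the replay start from it. The names `Snapshot`, `select_snapshot`, and `replay_start` are hypothetical; this is not the ReplayParams API, only one way such a selection criterion could behave.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Snapshot:
    sequence_nr: int   # last message covered by the snapshot
    timestamp: float   # seconds since the epoch when it was taken


def select_snapshot(snapshots: Sequence[Snapshot], now: float,
                    min_age: float) -> Optional[Snapshot]:
    """Pick the most recent snapshot that is at least min_age seconds old."""
    eligible = [s for s in snapshots if now - s.timestamp >= min_age]
    return max(eligible, key=lambda s: s.sequence_nr, default=None)


def replay_start(snapshots, now, min_age):
    """First sequence number to replay; 1 means a replay from scratch."""
    chosen = select_snapshot(snapshots, now, min_age)
    return 1 if chosen is None else chosen.sequence_nr + 1


HOUR = 3600.0
now = 10 * HOUR
snapshots = [
    Snapshot(sequence_nr=100, timestamp=now - 3 * HOUR),
    Snapshot(sequence_nr=250, timestamp=now - 2 * HOUR),
    Snapshot(sequence_nr=400, timestamp=now - 0.5 * HOUR),  # too recent to trust
]

print(replay_start(snapshots, now, min_age=HOUR))      # 251
print(replay_start(snapshots, now, min_age=5 * HOUR))  # 1
```

With a 1 hour limit the newest snapshot is skipped and replay starts after sequence number 250; if no snapshot is old enough, the sketch falls back to a replay from scratch.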