It would be good to persist async tasks and to periodically checkpoint their progress, so that they can be resumed properly from the last checkpoint.
That is likely not something that can be accomplished as a quick fix, however. Can you absorb the extra cost of restarting your rebalance from scratch in the short term? How urgently do you need a solution for this?
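For illustration, a rough sketch of what persisting per-task progress could look like, assuming a simple durable put/get metadata interface; all class and method names here are hypothetical, not actual Voldemort APIs:

```java
// Hypothetical sketch: a partition-move task that checkpoints its progress
// after each partition, so a restart can resume instead of starting over.
import java.util.List;

interface TaskMetadataStore {
    // Durably persist a small progress marker (hypothetical interface).
    void put(String key, String value);
    // Return the stored value, or null if nothing was checkpointed yet.
    String get(String key);
}

class CheckpointedPartitionMove {
    private final TaskMetadataStore metadata;
    private final String taskKey;

    CheckpointedPartitionMove(TaskMetadataStore metadata, String taskKey) {
        this.metadata = metadata;
        this.taskKey = taskKey;
    }

    void run(List<Integer> partitionsToMove) {
        // Resume from the last checkpoint, if one exists.
        String saved = metadata.get(taskKey);
        int start = (saved == null) ? 0 : Integer.parseInt(saved);

        for (int i = start; i < partitionsToMove.size(); i++) {
            movePartition(partitionsToMove.get(i)); // the expensive data copy
            // Checkpoint only after the partition has been fully moved.
            metadata.put(taskKey, Integer.toString(i + 1));
        }
        metadata.put(taskKey, "DONE");
    }

    private void movePartition(int partitionId) {
        // Placeholder for the actual fetch-and-update of one partition.
    }
}
```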
Gentle bump! Any opinions on this?

Cheers,
Chinmay
On Friday, 11 March 2016 02:06:13 UTC-8, cgu...@apple.com wrote:

Hi community,

Rebalance of a Voldemort cluster, as it stands currently, suffers from transient node failures. Let's say we are zone-expanding a Voldemort cluster and the partitions have almost completed moving to the new nodes in the new zone, but just before the successful finish, Voldemort suffers a failure on one of the new nodes. The entire rebalance process fails and automatically rolls back to the original cluster state (which is a good thing!).

Let's assume the data does not get upserted often. In that case, even though the data migration is almost complete and just needs a final verification and sync, the entire data migration will restart when we run rebalance again. This will have a greater impact on big clusters with huge data sizes and thousands of partition moves involved.

Has anyone given a thought to making the rebalance process resumable, so that we do not lose the work already done?

One possible solution would be a Merkle-tree-based approach (like the one used by Dynamo and Cassandra for out-of-band repair): stealer nodes would request a Merkle tree of the partitions to be streamed from the donor nodes, the stealer node would validate it against the Merkle tree of its existing data, and then request only the missing subset rather than the entire partition via the fetch-and-update admin command (since fetch-and-update is used internally by rebalance).

I am deep-diving into the code to gauge the feasibility of such an approach but would like to get an early opinion from the community at large, especially if anyone has already thought about it.

Thanks much,
Chinmay
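To make the quoted proposal concrete, here is a minimal sketch of the stealer-side tree comparison it describes, assuming a simplified binary hash tree over key ranges; the classes below are hypothetical and not existing Voldemort, Dynamo, or Cassandra code:

```java
// Hypothetical sketch: comparing a donor's Merkle tree with the stealer's local
// tree, collecting only the key ranges whose hashes differ (and hence still
// need to be fetched).
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MerkleNode {
    final byte[] hash;              // hash over this node's key range
    final int rangeStart, rangeEnd; // key range covered by this node
    final MerkleNode left, right;   // null for leaf nodes

    MerkleNode(byte[] hash, int rangeStart, int rangeEnd,
               MerkleNode left, MerkleNode right) {
        this.hash = hash;
        this.rangeStart = rangeStart;
        this.rangeEnd = rangeEnd;
        this.left = left;
        this.right = right;
    }

    boolean isLeaf() {
        return left == null && right == null;
    }
}

class MerkleDiff {
    // Returns the key ranges whose hashes differ between donor and stealer,
    // i.e. the only ranges the stealer still needs to fetch.
    static List<int[]> differingRanges(MerkleNode donor, MerkleNode stealer) {
        List<int[]> missing = new ArrayList<>();
        collect(donor, stealer, missing);
        return missing;
    }

    private static void collect(MerkleNode donor, MerkleNode stealer, List<int[]> out) {
        if (Arrays.equals(donor.hash, stealer.hash)) {
            return; // subtrees match: this range is already in sync
        }
        if (donor.isLeaf() || stealer.isLeaf()) {
            out.add(new int[] { donor.rangeStart, donor.rangeEnd });
            return;
        }
        collect(donor.left, stealer.left, out);
        collect(donor.right, stealer.right, out);
    }
}
```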
--
Félix
It would be good to persist async tasks and to periodically checkpoint their progress, so that they can be resumed properly from the last checkpoint.

I am not too familiar with the BDB storage engine and how the data is laid out on disk, but wouldn't the checkpoint lose accuracy if new data starts flowing in during the rebalance pause and cleaner threads start cleaning the data?
Also, is there a WIP branch for the Read-Only fetches I can take a look at?