I am looking for a little clarity regarding rollup node redundancy. We are planning to expand our deployment very soon.
I guess what I am asking is: is the redundancy at the shard level, or is it at the node level?
One configuration scenario that was described to us informally was 4 pairs of rollup nodes, with each pair handling its own set of shards.
That sounds like one node in each pair is rolling up while the other node in the pair is on standby.
When the active node goes down, the standby node is activated and continues rolling up the set of shards.
This seems to imply that a rollup node is either entirely on standby, or entirely active.
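To make that reading concrete, here is a toy Python model of the active/standby-pair scenario. It is only a sketch of my understanding of the informal description, not Blueflood code; the class, the node names, and the assumption of 128 shards split into 4 blocks of 32 are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RollupPair:
    """Hypothetical active/standby pair: one node rolls up ALL of the pair's shards."""
    shards: range
    active: str
    standby: str

    def fail(self, node: str) -> None:
        # Only a failure of the active node matters: the standby takes over
        # everything, and the failed node would rejoin later as the standby.
        if node == self.active:
            self.active, self.standby = self.standby, node

    def roller_of(self, shard: int) -> str:
        if shard not in self.shards:
            raise KeyError(shard)
        return self.active


# 4 pairs, 32 shards per pair (assumes 128 shards numbered 0-127)
pairs: List[RollupPair] = [
    RollupPair(range(32 * i, 32 * (i + 1)), f"node{2 * i}", f"node{2 * i + 1}")
    for i in range(4)
]

print(sorted(p.active for p in pairs))  # ['node0', 'node2', 'node4', 'node6']
pairs[0].fail("node0")
print(pairs[0].roller_of(5))            # node1 now rolls up all of shards 0-31
```

In this model only 4 of the 8 nodes ever do work at a time, which is the "entirely active or entirely on standby" behavior described above.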
But the documentation I have been able to find is rather ambiguous on this point:
Each rollup node is responsible for managing one or more 'shards.' It is possible (recommended!) to configure your Blueflood cluster in such a way that multiple rollup nodes are responsible for the same shards. If a rollup node goes down, another rollup node will pick up the shards assigned to the downed node and roll up the metrics in those shards.
Zookeeper is used by nodes to claim active 'ownership' of a particular shard so that multiple nodes aren't rolling up the same data.
This suggests (but does not confirm) that redundancy might work in a completely different way: on a shard-by-shard basis, where the active node for each shard is determined independently of the other shards.
In other words, if a node goes down, its shards could be taken over by other nodes that are already actively rolling up other shards. A rollup node could be rolling up some shards and, if another node goes down, start rolling up MORE shards.
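Here is how I picture the per-shard reading, as a toy Python model. A plain dict stands in for the ZooKeeper shard locks, and each unowned shard goes to the first live node in its configured list; that preference order is my own simplification (in a real cluster it presumably depends on which node grabs the lock first), and none of these names come from Blueflood.

```python
from typing import Dict, List, Optional


class ShardLocks:
    """Hypothetical per-shard ownership: each shard's owner is decided on its own."""

    def __init__(self, candidates: Dict[int, List[str]]) -> None:
        self.candidates = candidates  # shard -> nodes configured for that shard
        self.owner: Dict[int, Optional[str]] = {s: None for s in candidates}

    def claim_round(self, live: List[str]) -> None:
        # Every unowned shard is claimed by its first live candidate.
        for shard, nodes in self.candidates.items():
            if self.owner[shard] is None:
                self.owner[shard] = next((n for n in nodes if n in live), None)

    def node_down(self, node: str) -> None:
        # Only the dead node's locks are released; other shards are untouched.
        for shard, owner in self.owner.items():
            if owner == node:
                self.owner[shard] = None

    def shards_of(self, node: str) -> List[int]:
        return sorted(s for s, o in self.owner.items() if o == node)


candidates = {0: ["A", "B"], 1: ["B", "C"], 2: ["C", "A"],
              3: ["A", "B"], 4: ["B", "C"], 5: ["C", "A"]}
locks = ShardLocks(candidates)
locks.claim_round(live=["A", "B", "C"])
print({n: locks.shards_of(n) for n in "ABC"})
# {'A': [0, 3], 'B': [1, 4], 'C': [2, 5]}
locks.node_down("B")
locks.claim_round(live=["A", "C"])
print({n: locks.shards_of(n) for n in "ABC"})
# {'A': [0, 3], 'B': [], 'C': [1, 2, 4, 5]}
```

Here node C was already active before B failed and simply ends up with more shards, which is exactly the behavior I am asking about.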
Or maybe this is a distinction without a difference, and what is technically possible is ruled out by best practices.
Can anyone clear this up for me?
Could I configure Blueflood like this:
- node A, shards 0-63
- node B, shards 32-95
- node C, shards 64-127
- node D, shards 96-127 and 0-31
In that configuration, each shard is assigned to 2 nodes. If a node fails, another node is available to take over.
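A quick Python check of that layout, assuming 128 shards numbered 0-127 as the ranges imply. It confirms every shard is assigned to exactly 2 nodes, and it also lists which failures would leave some shard with no configured node at all.

```python
from collections import Counter
from itertools import combinations

NUM_SHARDS = 128  # assumption: shards numbered 0-127, as the ranges above imply


def shards(lo: int, hi: int) -> set:
    """Inclusive shard range, matching the '0-63' notation above."""
    return set(range(lo, hi + 1))


config = {
    "A": shards(0, 63),
    "B": shards(32, 95),
    "C": shards(64, 127),
    "D": shards(96, 127) | shards(0, 31),
}

# How many nodes are configured for each shard?
coverage = Counter(s for owned in config.values() for s in owned)
print(set(coverage.values()), len(coverage))  # {2} 128

# Which single or double failures leave shards with no configured node?
for down in [(n,) for n in config] + list(combinations(config, 2)):
    survivors = [owned for n, owned in config.items() if n not in down]
    orphaned = set(range(NUM_SHARDS)) - set().union(*survivors)
    if orphaned:
        print(down, "->", len(orphaned), "orphaned shards")
```

No single failure orphans anything. Losing two "neighbors" that share a range (A+B, B+C, C+D, or D+A) orphans 32 shards, while losing A+C or B+D does not.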
If this is a possible configuration, would all 4 nodes be rolling up some of their shards at the same time?
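If ownership really is decided per shard, I would expect something like the following. This Python sketch uses a hypothetical deterministic rule (shard `s` goes to `candidates[s % len(candidates)]` among the live nodes configured for it) purely for illustration. With real ZooKeeper locks the split would depend on which node claims each shard first, so it could be much less even.

```python
from typing import Dict, List, Set

NUM_SHARDS = 128  # assumption: shards 0-127, as the ranges above imply


def shards(lo: int, hi: int) -> Set[int]:
    return set(range(lo, hi + 1))


config: Dict[str, Set[int]] = {
    "A": shards(0, 63),
    "B": shards(32, 95),
    "C": shards(64, 127),
    "D": shards(96, 127) | shards(0, 31),
}


def assign(live: List[str]) -> Dict[str, List[int]]:
    """Hypothetical policy: each shard goes to one of its live configured nodes."""
    rolled: Dict[str, List[int]] = {n: [] for n in live}
    for s in range(NUM_SHARDS):
        candidates = [n for n in live if s in config[n]]
        if candidates:
            rolled[candidates[s % len(candidates)]].append(s)
    return rolled


print({n: len(s) for n, s in assign(["A", "B", "C", "D"]).items()})
# {'A': 32, 'B': 32, 'C': 32, 'D': 32}
print({n: len(s) for n, s in assign(["A", "B", "C"]).items()})
# {'A': 48, 'B': 32, 'C': 48}
```

Under this model all 4 nodes are active, and when D fails its shards are split between A and C, which were already rolling up other shards.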