Hi Bart,
Let me first distinguish some terms:
- "Autofailover" will redistribute the workload of a failing instance to other previously existing instances (https://en.wikipedia.org/wiki/Failover).
This can be done within an existing ArangoDB cluster and without the help of an orchestration framework.
- "Replacement of failing nodes by fresh instances" is more than autofailover, and requires the control of an outside orchestration framework.
- "Autoscaling" means that the whole system can deploy more or fewer instances during runtime to scale according to the load pattern.
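To make the distinction between these three terms concrete, here is a toy sketch in Python. It is not ArangoDB or Mesos code: the function names, the shard-to-instance dictionary and the target_load parameter are all made up for illustration, and a real cluster does considerably more than this.

```python
import math

# Toy model only: none of these names come from ArangoDB or Mesos.
# A "cluster" is a dict mapping an instance name to the shards it serves.


def autofailover(assignments, failed):
    """Move the failed instance's shards onto the already existing survivors."""
    survivors = sorted(name for name in assignments if name != failed)
    if not survivors:
        raise RuntimeError("no surviving instance to fail over to")
    result = {name: list(assignments[name]) for name in survivors}
    for i, shard in enumerate(assignments[failed]):
        # Round-robin the orphaned shards over the survivors.
        result[survivors[i % len(survivors)]].append(shard)
    return result


def replace_failed_node(assignments, failed, fresh):
    """Hand the failed instance's shards to a freshly started instance."""
    result = {
        name: list(shards)
        for name, shards in assignments.items()
        if name != failed
    }
    result[fresh] = list(assignments[failed])
    return result


def autoscale(instance_count, load_per_instance, target_load=0.7):
    """Return the instance count that brings the average load to target_load or below."""
    total_load = instance_count * load_per_instance
    return max(1, math.ceil(total_load / target_load))
```

With cluster = {"db1": ["s1", "s2"], "db2": ["s3"], "db3": ["s4"]}, autofailover(cluster, "db1") spreads s1 and s2 over db2 and db3 without adding anything new, while replace_failed_node(cluster, "db1", "db4") needs somebody outside the cluster to have started db4 first. That "somebody" is the orchestration framework.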
We have chosen to start implementing cluster management with the most complete open source framework on the market: Mesosphere.
The orchestration of the ArangoDB instances is controlled via the ArangoDB Mesos Framework (available at
https://github.com/arangodb/arangodb-mesos-framework) from the DCOS.
In the current state of the implementation, the cluster orchestration framework controls automatic failover and the replacement of broken nodes with healthy ones.
The already available ArangoDB 2.8 comes with asynchronous replication; the soon-to-be-released 3.0 will also bring synchronous replication.
With ArangoDB 3.1 we plan to support failover for asynchronous replication without the aid of a cloud orchestration framework.
Autoscaling and the replacement of failed nodes by new nodes are, and will remain, under the control of a cloud orchestration framework.
Once the work on the Mesosphere framework is complete, we plan to replicate these efforts to the other cloud orchestration frameworks.
The architecture of the middleware is modular with these future enhancements in mind, and 3.0 takes a big step toward making the framework portable.
Most probably we're going to start out next with porting the framework to Kubernetes. Others are going to follow one by one.
As always, we're open to contributions from the community.
In summary, how tightly ArangoDB is coupled to Mesosphere differs between versions, and the coupling will become less tight with 3.0.
However, a certain amount of work is always needed to integrate with a different cloud orchestration framework.
In 3.0, automatic failover and rebalancing in the above sense are done completely inside the ArangoDB cluster, as long as you use synchronous replication.
This means it should be relatively straightforward to set up 3.0 with another cloud orchestration framework.
Cheers,
Willi