Is ArangoDB Clustering Production Ready?


Sébastien Médan

Mar 31, 2016, 10:49:36 AM
to ArangoDB
I have a few questions regarding the choice of ArangoDB to offer persistence in our cloud components.

How well does ADB scale when adding nodes to cope with increasing loads:
  Can the `numberOfShards` attribute be altered on existing collections?
  Can we rebalance (even manually) after adding a node?

Also, these roadmap features are important to us, and I would like to know if you have an estimate on when these features are expected to become available?

- Transaction isolation within a cluster
- Automatic failover
- Master / master replication
- Rebalance

From the FAQ:
> Actually, ArangoDB doesn’t compete with massively distributed systems like Cassandra with thousands of nodes and many terabytes of data.

While we're not aiming at such a massively distributed system, we do need a scalable db infrastructure.
Do you consider your clustering to be production ready? 

Thanks for any insight you might have on the above questions!
-Sébastien

Claudius Weinberger

Apr 1, 2016, 6:40:51 AM
to aran...@googlegroups.com


Sébastien Médan wrote:

> I have a few questions regarding the choice of ArangoDB to offer persistence in our cloud components.
>
> How well does ADB scale when adding nodes to cope with increasing loads:
>   Can the `numberOfShards` attribute be altered on existing collections?

Altering `numberOfShards` is not possible at the moment; support will come later this year, but even then it will be an expensive operation. In the meantime, you can simply choose a `numberOfShards` value up to 100 times your current number of servers, so that you can grow your cluster to that size. You need at least one shard per machine, and a shard count 100 times larger than your current server count has no notable performance implications.
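For example, in arangosh it might look as follows (a minimal sketch; the collection name and the shard count of 400 are just placeholders for a 4-server cluster meant to grow to up to a few hundred servers):

```js
// arangosh, connected to a cluster coordinator.
// Pick a generous shard count up front, because numberOfShards
// cannot be changed after creation (as of 2.8):
db._create("events", { numberOfShards: 400 });

// Verify the setting:
db.events.properties().numberOfShards; // 400
```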

>   Can we rebalance (even manually) after adding a node?

At the moment this is a manual process that unfortunately involves downtime for your cluster; 3.0 will do this automatically. With 2.8, you have to dump your collections, add the node, and restore the collections. Please keep in mind that `numberOfShards` cannot be changed, so choose a good value for it from the beginning.


> Also, these roadmap features are important to us, and I would like to know if you have an estimate on when these features are expected to become available?
>
> - Transaction isolation within a cluster

Yes, it will come later this year, hopefully in the first half.

> - Automatic failover

This will come with 3.0. ETA is May.


> - Master / master replication

This will come with 3.0. ETA is May.

> - Rebalance

This will come with 3.0. ETA is May.


> From the FAQ:
> > Actually, ArangoDB doesn’t compete with massively distributed systems like Cassandra with thousands of nodes and many terabytes of data.
>
> While we're not aiming at such a massively distributed system, we do need a scalable db infrastructure.
> Do you consider your clustering to be production ready?

Short answer: yes.
We already have happy customers running an ArangoDB cluster in production. As you can see above, some cluster features are still missing; 3.0 and the following releases will bring a lot of improvements. To give you a more detailed answer it would be great to hear more about your use case. Please contact me at clau...@arangodb.com.

Sébastien Médan

Apr 1, 2016, 11:48:38 AM
to ArangoDB
Thank you, that was very detailed and helpful.

Bart DS

May 24, 2016, 8:58:50 AM
to ArangoDB
Hi,

Is it possible with ArangoDB 3 to set up such a cluster environment (i.e. with automatic failover and rebalancing) without having to rely on Mesos?
In other words, how tight is the coupling between ArangoDB 3 and Mesos?

The reason I'm asking is that we are currently considering other orchestration frameworks such as Swarm or Kubernetes.

Bart

Wilfried Gösgens

May 25, 2016, 5:12:31 AM
to ArangoDB
Hi Bart,

Let me first distinguish some terms:
 - "Autofailover" redistributes the workload of a failing instance to other previously existing instances ( https://en.wikipedia.org/wiki/Failover ).
   This can be done within an existing ArangoDB cluster, without the help of an orchestration framework.
 - "Replacement of failing nodes by fresh instances" is more than autofailover, and requires the control of an outside orchestration framework.
 - "Autoscaling" means that the whole system can deploy more or fewer instances at runtime to scale according to the load pattern.


We chose to start implementing cluster management on the most complete open-source framework on the market: Mesosphere.

The orchestration of the ArangoDB instances is controlled from DC/OS via the ArangoDB Mesos framework (available at https://github.com/arangodb/arangodb-mesos-framework).

In the current state of the implementation, the orchestration framework controls the automatic failover management and the replacement of broken nodes with good ones.

The already available ArangoDB 2.8 comes with asynchronous replication; the soon-to-be-released 3.0 will also bring synchronous replication.
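For illustration, synchronous replication in 3.0 is planned to be configured per collection via a replication factor (a hedged arangosh sketch of the announced 3.0 behaviour, not a final API; names and values are placeholders):

```js
// arangosh against a 3.0 cluster coordinator (announced behaviour):
// each of the 9 shards is kept in sync on 2 DBServers, so one
// server can fail without losing confirmed writes.
db._create("events", { numberOfShards: 9, replicationFactor: 2 });
```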

With ArangoDB 3.1 we plan to support failover for asynchronous replication without the aid of a cloud orchestration framework.

Autoscaling and the replacement of failed nodes by new ones are, and will remain, under the control of a cloud orchestration framework.

Once the work on the Mesosphere framework is complete, we plan to replicate these efforts for the other cloud orchestration frameworks.
The middleware architecture was designed in a modular way with these future enhancements in mind; 3.0 takes a big step toward making the framework portable.

Most probably we will next port the framework to Kubernetes; others will follow one by one.
As usual, we're always open for contributions from the community.

In summary, the coupling with Mesosphere differs between ArangoDB versions and will become less tight with 3.0.
However, a certain amount of work is always needed to integrate with a given cloud orchestration framework.
In 3.0, automatic failover and rebalancing in the above sense are done completely inside the ArangoDB cluster, as long as you use synchronous replication.
This means it should be relatively straightforward to set up 3.0 with another cloud orchestration framework.


Cheers,
Willi

Bart DS

May 25, 2016, 4:49:17 PM
to ArangoDB
Hi Willi,

Thanks for this very detailed explanation.
If I understand correctly, I can set up an ArangoDB 3.0 cluster without any orchestration framework (such as Mesos)?
When using asynchronous replication there will be no automatic failover or rebalancing (yet), but when using synchronous replication I will have automatic failover and rebalancing out of the box?

So if I have e.g. 5 servers in the cluster and one server goes down, one (or more?) of the other 4 servers will take over the tasks of the failing server and all reads/writes will still succeed?
Is that correct?

Also, if I run 3.0 on another orchestration framework, will I be able to dynamically add/remove servers to/from the cluster already? Or isn't that possible yet?

Can you point me to any documentation on how to set up such a cluster with ArangoDB 3.0?

Bart

m...@arangodb.com

May 25, 2016, 5:11:37 PM
to ArangoDB
Hi Bart,

Max from ArangoDB here. We are putting the finishing touches on the 3.0 release basically as we speak. Good cluster setup documentation is unfortunately one of the things we are still missing, but we will definitely have this in place for the release.

On Wednesday, May 25, 2016 at 10:49:17 PM UTC+2, Bart DS wrote:
> Hi Willi,
>
> Thanks for this very detailed explanation.
> If I understand correctly, I can set up an ArangoDB 3.0 cluster without any orchestration framework (such as Mesos)?

Yes, the startup process has been simplified a lot for 3.0. You basically have to fire up a bunch of Docker containers (all with the same image) with certain command line options. Everything else organises itself within the ArangoDB cluster. We will publish a blog post shortly after the release to explain how this is done.
> When using asynchronous replication there will be no automatic failover or rebalancing (yet), but when using synchronous replication I will have automatic failover and rebalancing out of the box?

Yes, all of this is done within the ArangoDB cluster thanks to our own implementation of the Raft consensus protocol. The only thing the cluster cannot do on its own is restart containers or launch new ones. And automatic failover for asynchronous replication within the ArangoDB cluster will only land with 3.1.

> So if I have e.g. 5 servers in the cluster and one server goes down, one (or more?) of the other 4 servers will take over the tasks of the failing server and all reads/writes will still succeed?
> Is that correct?

Exactly so. And due to synchronous replication you will not lose committed and confirmed data.

> Also, if I run 3.0 on another orchestration framework, will I be able to dynamically add/remove servers to/from the cluster already? Or isn't that possible yet?

You can simply start new Docker containers to add more coordinators or DBServers. You can simply kill coordinators without losing anything.
You can ask the ArangoDB cluster to clean out a DBServer in a controlled fashion, such that its data is relocated to other servers automatically.
Once this has completed, you can simply kill the DBServer by stopping its Docker container.
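For illustration, such a clean-out can be triggered through the cluster admin HTTP API; a hedged arangosh sketch, assuming the `/_admin/cluster/cleanOutServer` route and a placeholder server ID:

```js
// arangosh, connected to a coordinator; "DBServer0004" is a placeholder ID.
// The call starts an asynchronous job that moves all shards off that server:
var result = arango.POST("/_admin/cluster/cleanOutServer",
                         JSON.stringify({ server: "DBServer0004" }));
print(result); // contains a job id you can poll until the clean-out is done
```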

> Can you point me to any documentation on how to set up such a cluster with ArangoDB 3.0?

Unfortunately, this is not yet written. I do have a bash script which launches a cluster locally using just Docker containers. I attach the script so that you can see how things will work. It uses the 3.0.0b3 build which we are about to publish. Note that not all of the above-mentioned features work in it yet.

Cheers,
  Max.

Attachment: startLocalDockerCluster.sh

Bart DS

May 25, 2016, 5:50:13 PM
to ArangoDB
Hi Max,

This sounds very promising!
What's the estimated timeframe for the 3.0 release?

Regarding client connections, is the official arangojs client cluster-aware?
In other words, is it able to survive server / coordinator / host failures in the cluster by connecting to a different host / coordinator in such situations?

Thanks,

Bart

Max Neunhoeffer

May 25, 2016, 7:10:31 PM
to aran...@googlegroups.com
Hi Bart,
We definitely plan to push out 3.0 early in June.
Your question about arangojs is a good one. I would guess that the answer is "no" at this stage. However, I will contact the author, Alan, and ask him. This should definitely be added and should not be much trouble: the client code would have to specify two or more coordinator endpoints when the connection is initially made, and from then on failover can be transparent to the client.
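Purely as a sketch of what that could look like on the client side (the array-of-URLs option is hypothetical here, not a feature of arangojs at the time of writing; host names are placeholders):

```js
// Hypothetical cluster-aware arangojs configuration:
// the client is given several coordinator endpoints and can fall back
// to another one if the current connection fails.
const { Database } = require("arangojs");

const db = new Database({
  url: [
    "http://coordinator1:8529",
    "http://coordinator2:8529",
    "http://coordinator3:8529",
  ],
});
```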

Cheers,
Max

Bart DS

May 26, 2016, 3:01:38 AM
to ArangoDB
Hi Max,

That's great!
Please let me know when you have more information regarding arangojs.

I have one last question for now: Will it be possible to mix different versions of ArangoDB in the same cluster?
This would make it a lot easier to upgrade to a newer version of ArangoDB.

Bart

max.neu...@gmail.com

May 26, 2016, 2:15:33 PM
to aran...@googlegroups.com
Hi Bart,

On Thu, May 26, 2016 at 12:01:38AM -0700, Bart DS wrote:
> Hi Max,
>
> That's great!
> Please let me know when you have more information regarding arangojs.
Will do.
>
> I have one last question for now: Will it be possible to mix different
> versions of ArangoDB in the same cluster?
It is not possible at all to mix any 2.x release with any 3.x release.
For 3.0 one will have to arangodump/arangorestore the data, because we
have changed the internal and on-disk data format completely.

Within the 3.0.x releases we will do our best to allow mixing different
versions. We will offer a convenient way to perform a rolling upgrade
without service interruption, at least in the Mesos context. This is now
feasible because of the persistent volumes we use from Mesos and because
of our ability to move shards from machine to machine.

Cheers,
Max.

Bart DS

May 26, 2016, 4:27:07 PM
to ArangoDB, m...@arangodb.com
Hi Max,


On Thursday, May 26, 2016 at 8:15:33 PM UTC+2, Max Neunhöffer wrote:
> Hi Bart,
>
> On Thu, May 26, 2016 at 12:01:38AM -0700, Bart DS wrote:
> > Hi Max,
> >
> > That's great!
> > Please let me know when you have more information regarding arangojs.
> Will do.

Thanks!
 
> > I have one last question for now: Will it be possible to mix different
> > versions of ArangoDB in the same cluster?
> It is not possible at all to mix any 2.x release with any 3.x release.
> For 3.0 one will have to arangodump/arangorestore the data, because we
> have changed the internal and on-disk data format completely.
>
> Within the 3.0.x releases we will do our best to allow mixing different
> versions. We will offer a convenient way to perform a rolling upgrade
> without service interruption, at least in the Mesos context. This is now
> feasible because of the persistent volumes we use from Mesos and because
> of our ability to move shards from machine to machine.

Ok, fair enough.
I understand 2.x can't be mixed with 3.x, but rolling upgrades as you describe would be really nice.
 

> Cheers,
>   Max.
> > This would make it a lot easier to upgrade to a newer version of ArangoDB.
> >
> > Bart

regards,

Bart