[erlang-questions] Large scale deployments testing

Zhemzhitsky Sergey

Sep 21, 2012, 9:29:08 AM

to riak-...@lists.basho.com, erlang-questions

Hello gurus,

We’re developing a riak-core application that has no persistence and works entirely in memory, and we’re wondering what the best use cases are for testing riak-core and Erlang itself in large-scale deployments (>100 physical nodes).

For example, some MapReduce frameworks (such as Hadoop) have performance tests like TeraSort that show how far the whole framework can scale.

So could you share some ideas about best practices for testing large-scale deployments of riak-core and Erlang applications? Which synthetic tests and benchmarks could be run to answer the following questions:

1. Does the system scale well?

2. Can the system be considered linearly scalable?

3. Is the system truly fault-tolerant?
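One common way to make question 2 testable (a methodology assumption, not something proposed in this thread) is to run the same synthetic load at several cluster sizes and compare measured throughput against a straight-line projection from the smallest run. A minimal Python sketch; the helper names, the 0.9 tolerance, and the throughput numbers are all hypothetical:

```python
def scaling_efficiency(baseline_nodes, baseline_tput, nodes, tput):
    """Measured throughput divided by what perfect linear scaling would predict."""
    ideal = baseline_tput * nodes / baseline_nodes
    return tput / ideal


def is_near_linear(measurements, tolerance=0.9):
    """measurements: list of (node_count, ops_per_sec), smallest cluster first.

    Returns True if every larger cluster keeps at least `tolerance`
    of the throughput that linear scaling from the first run predicts.
    """
    base_nodes, base_tput = measurements[0]
    return all(
        scaling_efficiency(base_nodes, base_tput, n, t) >= tolerance
        for n, t in measurements[1:]
    )


# Hypothetical benchmark results: (nodes, operations per second).
runs = [(10, 50_000), (20, 98_000), (40, 190_000), (80, 372_000)]
for n, t in runs[1:]:
    print(f"{n} nodes: {scaling_efficiency(10, 50_000, n, t):.2f} of linear")
print("near-linear:", is_near_linear(runs))
```

The same harness answers question 1 loosely (does throughput keep rising at all?) and question 2 strictly (does efficiency stay near 1.0?). For question 3, the usual complement is to repeat the runs while killing nodes mid-test and checking that throughput and correctness recover.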

Best Regards,

Sergey

_______________________________________________________

The information contained in this message may be privileged and confidential and protected from disclosure. If you are not the original intended recipient, you are hereby notified that any review, retransmission, dissemination, or other use of, or taking of any action in reliance upon, this information is prohibited. If you have received this communication in error, please notify the sender immediately by replying to this message and delete it from your computer. Thank you for your cooperation. Troika Dialog, Russia.

If you need assistance please contact our Contact Center (+7495) 258 0500 or go to www.troika.ru/eng/Contacts/system.wbp

Joel Meyer

Sep 21, 2012, 6:00:36 PM

to Zhemzhitsky Sergey, riak-...@lists.basho.com, erlang-questions

Hi Sergey,

We (OpenX) have a riak_core based application that's running on a 125 node cluster (there are also other smaller clusters). We never really tested to see where it would fall over (and the cluster was much smaller when it started), but I see no indicators that it will fall over when we add the 126th node. FWIW, it's running riak_core 0.13.0, and I assume the newer versions of riak_core have only gotten better. Answers to some of your other questions (based solely on my experience) in-line below.

On Fri, Sep 21, 2012 at 6:29 AM, Zhemzhitsky Sergey <Sergey_Zh...@troika.ru> wrote:

1. Does the system scale well?

Yes, so far it has scaled well.

2. Can the system be considered linearly scalable?

Yes, the riak_core portion can be considered linearly scalable. The overall behavior is largely dependent on what you're doing in your vnodes and how well you hash the things you want distributed. In theory, if you hash poorly you can get hot-spots that will prevent linear scalability, but I haven't seen that happen with our workload.
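To see why the hash matters, here is a minimal Python sketch of ring-based partitioning in the spirit of riak_core's consistent hashing: keys are hashed with SHA-1 onto a 2^160 ring that is split into equal partitions. The 64-partition ring, the helper names, and hashing a plain string key (rather than the bucket/key term riak_core actually hashes) are simplifying assumptions for illustration.

```python
import hashlib
from collections import Counter

RING_SIZE = 2 ** 160  # SHA-1 output space
NUM_PARTITIONS = 64   # assumed ring size for this sketch


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to one of num_partitions equal-sized segments of the ring."""
    h = int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")
    return h // (RING_SIZE // num_partitions)


def load_imbalance(keys) -> float:
    """Busiest partition's load divided by the mean load (1.0 = perfectly even)."""
    counts = Counter(partition_for(k) for k in keys)
    mean = len(keys) / NUM_PARTITIONS
    return max(counts.values()) / mean


# Many distinct keys versus a workload dominated by a single hot key.
uniform = [f"user-{i}" for i in range(64_000)]
skewed = ["hot-key"] * 57_600 + uniform[:6_400]
print(f"uniform keys: busiest partition at {load_imbalance(uniform):.2f}x mean load")
print(f"skewed keys:  busiest partition at {load_imbalance(skewed):.2f}x mean load")
```

With many distinct keys the busiest partition stays close to the mean. When most requests target one key, that key's partition (and the vnode that owns it) absorbs nearly all the load no matter how many nodes are added, which is the hotspot effect that breaks linear scalability.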

3. Is the system truly fault-tolerant?

For the most part, 'yes', but that again depends on how you implement your vnode. The problems that I've encountered were due to my own inexperience with Erlang when implementing my vnode.

In general I've been very happy with riak_core and we're definitely looking at using it more for places where it's the right solution.

Cheers,

Joel

_______________________________________________
riak-users mailing list
riak-...@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Michael Truog

Sep 21, 2012, 9:58:01 PM

to Joel Meyer, Zhemzhitsky Sergey, riak-...@lists.basho.com, erlang-questions

The Riak list expressed optimism about Riak clusters of 200+ nodes in this thread:
http://riak-users.197444.n3.nabble.com/does-Riak-cluster-maintain-fully-connected-Erlang-network-td3695942.html

However, as mentioned in the thread, a fully connected network of nodes (fully connected because of the use of distributed Erlang) does have a natural limit on scalability, set by network speed in combination with the net tick time: every node keeps a connection to every other node, so a cluster of n nodes has n(n-1)/2 connections to keep alive. You can always increase the net tick time, but then failures will take longer to detect.
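The trade-off can be put into numbers. Per the Erlang kernel documentation, a peer that goes silent is considered down after somewhere between net_ticktime - net_ticktime/4 and net_ticktime + net_ticktime/4 seconds (the default net_ticktime is 60), and ticks are sent every net_ticktime/4 seconds on links that are otherwise idle. A small Python sketch of that arithmetic; the helper names are hypothetical and the tick rate is only an upper bound for idle links:

```python
def mesh_connections(nodes: int) -> int:
    """Connections in a fully connected cluster: one per unordered node pair."""
    return nodes * (nodes - 1) // 2


def detection_window(net_ticktime: float) -> tuple[float, float]:
    """(min, max) seconds a silent peer can go unnoticed before it is declared down."""
    quarter = net_ticktime / 4
    return (net_ticktime - quarter, net_ticktime + quarter)


def max_idle_ticks_per_node(nodes: int, net_ticktime: float) -> float:
    """Upper bound on ticks one node sends per second when all its links are idle:
    one tick per peer every net_ticktime/4 seconds."""
    return (nodes - 1) / (net_ticktime / 4)


for n in (100, 125, 200):
    for ticktime in (60, 120):
        lo, hi = detection_window(ticktime)
        print(f"{n} nodes, net_ticktime={ticktime}s: "
              f"{mesh_connections(n)} connections, "
              f"<= {max_idle_ticks_per_node(n, ticktime):.1f} ticks/s per node, "
              f"failure detected after {lo:.0f}-{hi:.0f}s")
```

Doubling net_ticktime halves the idle tick rate but doubles the detection window, while the connection count grows quadratically with cluster size regardless. In Erlang itself the value is set with `-kernel net_ticktime N` at startup or changed at runtime with `net_kernel:set_net_ticktime/1`.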

So whether this works for you may depend on your fault-tolerance requirements.

Zhemzhitsky Sergey

Sep 24, 2012, 2:15:24 AM

to Michael Truog, Joel Meyer, riak-...@lists.basho.com, erlang-questions

Hi guys,

Thanks for the information and links!

Best Regards,

Sergey Zhemzhitsky

Phone. +7 495 2580500 ext. 1246
