Test results for NATS server cluster 0.5.0.beta.8 with NATS 0.4.28 as client with heavy workloads

Han Li

Sep 29, 2013, 12:59:50 AM

to vcap...@cloudfoundry.org

Hi all,
        Our lab has designed and implemented NATS_Cluster, a stateless, lightweight MOM system that uses the existing NATS release (0.4.28) as the client and the cluster version of NATS (0.5.0.beta.8) as the server.
        We found that a NATS cluster not only removes the SPOF but also copes better with heavy workloads, because it can hold many more connections than a single NATS server.

In the tests, we set up NATS clusters with three nodes and with four nodes. The topologies are as follows:

   Test results:

        The test results of the single-node NATS server 0.4.28 are as follows:
        ID   Environment        Sub/Pub times   time/s (overall time)
        1    Single-Node nats   1               0.00166894
        2    Single-Node nats   1000            1.014913508
        3    Single-Node nats   2000            1.93486609
        4    Single-Node nats   5000            4.990354504
        5    Single-Node nats   6000            `add_oneshot_timer': ran out of timers; use #set_max_timers to increase limit (RuntimeError)

        The test results of the NATS cluster 0.5.0.beta.8 are as follows:
        ID   Environment   Sub/Pub times   time/s (overall time)
        1    3 Servers     1               0.006009529
        2    3 Servers     1000            5.718525281
        3    3 Servers     2000            11.301621944
        4    3 Servers     5000            28.653961873
        5    3 Servers     6000            34.31817196
        6    3 Servers     7000            40.862727237
        7    3 Servers     8000            47.568740444
        8    3 Servers     9000            53.245474721
        9    3 Servers     10000           59.83681407
        10   3 Servers     20000           118.399914877
        11   3 Servers     50000           307.370709952
        12   3 Servers     100000          623.333510497
        13   3 Servers     200000          1236.563246527
        14   4 Servers     1               0.005060188
        15   4 Servers     1000            5.775566259
        16   4 Servers     2000            12.187137876
        17   4 Servers     5000            30.722105271
        18   4 Servers     6000            38.640032527
        19   4 Servers     7000            46.091264959
        20   4 Servers     8000            50.221791602
        21   4 Servers     9000            55.904621584
        22   4 Servers     10000           61.55738544
        23   4 Servers     20000           118.963757949
        24   4 Servers     50000           311.795263769

     We can see that the NATS cluster can hold many more connections than a single node!
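For a per-iteration view of the tables above, here is a minimal sketch that divides the overall time by the Sub/Pub count. The constant and method names (`SINGLE_NODE`, `per_iteration_ms`, `report`) are made up for illustration; the numbers are copied from a few rows of the tables, nothing is newly measured.

```ruby
# Overall times in seconds, copied from the result tables, keyed by Sub/Pub count.
SINGLE_NODE   = { 1000 => 1.014913508, 2000 => 1.93486609, 5000 => 4.990354504 }.freeze
THREE_SERVERS = { 1000 => 5.718525281, 5000 => 28.653961873, 200_000 => 1236.563246527 }.freeze
FOUR_SERVERS  = { 1000 => 5.775566259, 5000 => 30.722105271, 50_000 => 311.795263769 }.freeze

# Average wall-clock time of one connect/subscribe/publish/stop cycle, in milliseconds.
def per_iteration_ms(overall_seconds, iterations)
  overall_seconds / iterations * 1000.0
end

# Prints one line per table row with the derived per-iteration time.
def report(label, results)
  results.each do |iterations, seconds|
    puts format('%-12s %7d iterations: %.3f ms each',
                label, iterations, per_iteration_ms(seconds, iterations))
  end
end

report('single node', SINGLE_NODE)
report('3 servers', THREE_SERVERS)
report('4 servers', FOUR_SERVERS)
```

Read this way, the single node spends about 1 ms per iteration and the cluster roughly 6 ms, so what these runs show in the cluster's favour is the number of iterations it completes, not per-iteration speed.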

     BTW, here are the test cases:

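  it 'should work when multi clients work' do
    start = Time.now
    sum = 0
    # 0..5999 gives 6000 iterations; change the upper bound for the other rows.
    0.upto(5999) { |num|
      temp_start = Time.now
      # Each iteration runs a fresh reactor with its own client connection.
      EM.run do
        NATS.start($single_opts) {  |nc|
          NATS.subscribe('a') { |msg| msg.should == "a" }
          # The 'exit' message tears down the connection and stops the reactor.
          NATS.subscribe('exit') { NATS.unsubscribe('a'); NATS.stop {EM.stop} }

          NATS.publish('a','a')
          NATS.publish('exit')
        }
      end
      temp_stop = Time.now
      # sum accumulates the per-iteration times; overall time also includes loop overhead.
      sum += temp_stop - temp_start
    }
    stop = Time.now
    puts "sum = #{sum},overall time = #{stop - start}"
  end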

We also ran a robustness test; the results are as follows:

ID | Environment | Action         | Interval between pub and sub/s | Client | Sub | Pub | Pub interval/s | time/s       | Notes
1  | 3 Servers   | nothing        | 10 | 1 | 100 | 100 | 0.1 | 20.048878163 |
2  | 3 Servers   | add 1 Server   | 10 | 1 | 100 | 100 | 0.1 | 20.092233372 |
3  | 4 Servers   | kill 1 Server  | 10 | 1 | 100 | 100 | 0.1 | 20.123346696 |
4  | 4 Servers   | kill 2 Servers | 10 | 1 | 100 | 100 | 0.1 | 20.086477384 |
5  | 4 Servers   | kill 3 Servers | 10 | 1 | 100 | 100 | 0.1 | 20.097209607 | Administrator receives an email; only one server is left.
6  | 4 Servers   | kill 4 Servers | 10 | 1 | 100 | 100 | 0.1 | 20.084954242 | After rebooting 1 server, the system can recover.
7  | 4 Servers   | kill 4 Servers | 10 | 1 | 100 | 100 | 0.1 | not limited  | The procedure is blocked.
8  | 4 Servers   | nothing        | 10 | 1 | 100 | 100 | 0   | 20.086735649 |
9  | 4 Servers   | kill 1 Server  | 10 | 1 | 100 | 100 | 0   | 20.088677783 |
10 | 4 Servers   | kill 2 Servers | 10 | 1 | 100 | 100 | 0   | 20.047546054 |
11 | 4 Servers   | kill 3 Servers | 10 | 1 | 100 | 100 | 0   | 20.076328895 | Administrator receives an email; only one server is left.
12 | 4 Servers   | kill 4 Servers | 10 | 1 | 100 | 100 | 0   | not limited  | The procedure is blocked.

        The test case for the robustness test:

  it 'should work when some servers are down' do
    start = Time.now
    EM.run do
      NATS.start($opts)
      0.upto(99) do
        NATS.subscribe('a') { |msg| msg.should == "a" }
      end
      NATS.subscribe('exit') do
        NATS.unsubscribe('a')
        NATS.stop { EM.stop }
      end
      EM.add_timer(10) do
        test_timer = EM.add_periodic_timer(0.1) do
          NATS.publish('a','a')
        end
        EM.add_timer(10) do
          NATS.publish('exit')
          EM.cancel_timer(test_timer)
        end
      end
      puts "The END of EM"
    end
    stop = Time.now
    puts "over all time = #{stop-start}"
  end

---

Li Han

Zhejiang University

ZJU-SEL

http://github.com/ZJU-SEL

Matt Reider

Sep 29, 2013, 11:06:52 AM

to vcap...@cloudfoundry.org, vcap...@cloudfoundry.org

Han,

Is this in a public repository?

To unsubscribe from this group and stop receiving emails from it, send an email to vcap-dev+u...@cloudfoundry.org.

James Bayer

Sep 29, 2013, 6:28:18 PM

to vcap...@cloudfoundry.org

Li Han,

Thank you so much for sharing these NATS cluster results. We have been very curious about the clustered branch of NATS and certainly are eager to remove the SPOF and increase the scalability of NATS.

I've shared the results with our team and I'll let you know if we have any questions.


--

Thank you,

James Bayer

Derek Collison

Sep 29, 2013, 9:10:25 PM

to vcap...@cloudfoundry.org

Li/James,

I would be happy to give you access to the high-performance Go gnatsd, which is already clustered. I will OSS it soon, but if you want to test against it, please let me know. The Ruby server peaks at about 150k msgs/sec. The Go version, gnatsd, is around 5-6M msgs/sec.

Cheers,

=derek

Dr Nic Williams

Sep 29, 2013, 9:35:55 PM

to vcap...@cloudfoundry.org, vcap...@cloudfoundry.org

You're a beautiful man.

Han Li

Sep 29, 2013, 11:01:20 PM

to vcap...@cloudfoundry.org

Hi Matt,
We set up this NATS_Cluster by using some other repositories. We don't actually have much code of our own. The architecture is:
1. Use HAProxy as a TCP reverse proxy in front of the NATS servers.
2. Add a module to monitor the NATS servers.
3. Use Redis to store the latest IP and port information of the NATS servers in the cluster.
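
To make steps 2 and 3 a bit more concrete, here is a rough sketch of what a monitor could look like. It is not our actual module: `ClusterMonitor`, `nats_reachable?` and the key name `nats:live_servers` are hypothetical, the liveness check is just a plain TCP connect, and a Ruby Hash stands in for Redis so the snippet runs with the standard library only. The HAProxy part is not shown.

```ruby
require 'socket'

# Returns true if a TCP connection to host:port succeeds within the timeout.
# Assumption: a successful connect is treated as "server is alive".
def nats_reachable?(host, port, timeout = 1)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError
  false
end

# Hypothetical monitor: probes every configured server and records the live
# ones in a store. The store only needs to respond to #[]=, so a Hash works
# here; a real deployment could pass a thin wrapper around a Redis client.
class ClusterMonitor
  LIVE_KEY = 'nats:live_servers' # illustrative key name, not from our setup

  def initialize(servers, store)
    @servers = servers
    @store = store
  end

  # Probes all servers once and saves the "host:port" strings of the live ones.
  def poll
    live = @servers.select { |s| nats_reachable?(s[:host], s[:port]) }
    @store[LIVE_KEY] = live.map { |s| "#{s[:host]}:#{s[:port]}" }
    live
  end
end

if __FILE__ == $PROGRAM_NAME
  store = {}
  monitor = ClusterMonitor.new([{ host: '127.0.0.1', port: 4222 }], store)
  monitor.poll
  puts "live servers: #{store[ClusterMonitor::LIVE_KEY].inspect}"
end
```

A scheduler would call `poll` periodically, and the proxy configuration or the clients would read the stored list to find the current servers.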

Han Li

Sep 29, 2013, 11:07:10 PM

to vcap...@cloudfoundry.org

Glad to exchange ideas about it.

Han Li

Sep 29, 2013, 11:08:49 PM

to vcap...@cloudfoundry.org

I'll follow it some time.

Han Li

Sep 29, 2013, 11:09:11 PM

to vcap...@cloudfoundry.org

A girl actually.

Dr Nic Williams

Sep 29, 2013, 11:12:44 PM

to vcap...@cloudfoundry.org, vcap...@cloudfoundry.org

Sorry, Han. I was referring to Derek (who I personally know) for making the new goNATS available.

I am sorry I didn't comment on your post itself yet. I was keen to see if the core team picked it up and were motivated by it.
