Test results for NATS server cluster 0.5.0.beta.8 with NATS 0.4.28 as client with heavy workloads


han li

Sep 29, 2013, 12:59:50 AM
to vcap...@cloudfoundry.org
Hi all,
        Our lab has designed and implemented NATS_Cluster, a stateless, lightweight MOM system that uses the existing NATS release (0.4.28) as the client and the cluster release of NATS (0.5.0.beta.8) as the server.
        We found that a NATS cluster not only removes the SPOF but also handles heavy workloads better, because it can hold many more connections than a single NATS server.

In the tests, we set up NATS clusters with three nodes and with four nodes. The topologies were as follows:

     
      
   Test results:


        The test results of the single-node NATS server 0.4.28 are as follows:
        ID   Environment        Sub/Pub iterations   Overall time (s)
        1    Single-node NATS   1                    0.00166894
        2    Single-node NATS   1000                 1.014913508
        3    Single-node NATS   2000                 1.93486609
        4    Single-node NATS   5000                 4.990354504
        5    Single-node NATS   6000                 failed: `add_oneshot_timer': ran out of timers; use #set_max_timers to increase limit (RuntimeError)

        The test results of the NATS cluster 0.5.0.beta.8 are as follows:
        ID   Environment   Sub/Pub iterations   Overall time (s)
        1    3 Servers     1                    0.006009529
        2    3 Servers     1000                 5.718525281
        3    3 Servers     2000                 11.301621944
        4    3 Servers     5000                 28.653961873
        5    3 Servers     6000                 34.31817196
        6    3 Servers     7000                 40.862727237
        7    3 Servers     8000                 47.568740444
        8    3 Servers     9000                 53.245474721
        9    3 Servers     10000                59.83681407
        10   3 Servers     20000                118.399914877
        11   3 Servers     50000                307.370709952
        12   3 Servers     100000               623.333510497
        13   3 Servers     200000               1236.563246527
        14   4 Servers     1                    0.005060188
        15   4 Servers     1000                 5.775566259
        16   4 Servers     2000                 12.187137876
        17   4 Servers     5000                 30.722105271
        18   4 Servers     6000                 38.640032527
        19   4 Servers     7000                 46.091264959
        20   4 Servers     8000                 50.221791602
        21   4 Servers     9000                 55.904621584
        22   4 Servers     10000                61.55738544
        23   4 Servers     20000                118.963757949
        24   4 Servers     50000                311.795263769
       
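     We can see that the NATS cluster can hold many more connections than a single node: the single-node server failed at 6000 iterations, while the 3-server cluster completed 200000.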
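
     The per-iteration cost can be read off the tables by dividing the overall time by the iteration count. The snippet below is only a small sketch of that arithmetic; the numbers are copied from the tables above, and the constant and method names (`SINGLE_NODE`, `THREE_SERVERS`, `ms_per_iteration`) are made up for this illustration.

```ruby
# Overall times in seconds, copied from the result tables, keyed by iteration count.
SINGLE_NODE   = { 1000 => 1.014913508, 2000 => 1.93486609,   5000 => 4.990354504 }
THREE_SERVERS = { 1000 => 5.718525281, 2000 => 11.301621944, 5000 => 28.653961873 }

# Average time per sub/pub iteration, in milliseconds (hypothetical helper).
def ms_per_iteration(results)
  results.map { |count, seconds| [count, (seconds / count * 1000).round(3)] }.to_h
end

single  = ms_per_iteration(SINGLE_NODE)
cluster = ms_per_iteration(THREE_SERVERS)

single.each_key do |count|
  puts format('%5d iterations: single node %.3f ms, 3 servers %.3f ms',
              count, single[count], cluster[count])
end
```

     On these figures a single iteration takes roughly 1 ms on the single node and a little under 6 ms on the 3-server cluster, so the cluster's advantage in this test is the number of iterations it completes, not the time per iteration.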
  
     BTW, here are the test cases: 
    
  it 'should work when multi clients work' do
    start = Time.now
    sum = 0
    0.upto(5999) { |num|
      temp_start = Time.now
      EM.run do
        NATS.start($single_opts) { |nc|
          NATS.subscribe('a') { |msg| msg.should == "a" }
          NATS.subscribe('exit') { NATS.unsubscribe('a'); NATS.stop { EM.stop } }

          NATS.publish('a', 'a')
          NATS.publish('exit')
        }
      end
      temp_stop = Time.now
      sum += temp_stop - temp_start
    }
    stop = Time.now
    puts "sum = #{sum},overall time = #{stop - start}"
  end
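
The timing pattern in this test case (time each iteration, add the times up, and also time the whole loop) can be tried without a NATS server. This is a minimal sketch under stated assumptions, not part of the test suite: `time_iterations` is a hypothetical helper, the `sleep` stands in for one connect/subscribe/publish round, and a monotonic clock replaces `Time.now`.

```ruby
# Hypothetical helper: runs the block `count` times and returns both the
# summed per-iteration time and the overall wall time of the loop, in seconds.
def time_iterations(count)
  clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  start = clock.call
  sum = 0.0
  count.times do |i|
    iteration_start = clock.call
    yield i
    sum += clock.call - iteration_start
  end
  { sum: sum, overall: clock.call - start }
end

# Stand-in workload; the real test opens a NATS connection in each iteration.
result = time_iterations(3) { sleep 0.01 }
puts "sum = #{result[:sum]},overall time = #{result[:overall]}"
```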
     
We also ran robustness tests. The results are as follows:

      ID | Environment | Action         | Interval between pub and sub (s) | Clients | Sub | Pub | Pub interval (s) | Time (s)     | Notes
       1 | 3 Servers   | none           | 10                               | 1       | 100 | 100 | 0.1              | 20.048878163 |
       2 | 3 Servers   | add 1 server   | 10                               | 1       | 100 | 100 | 0.1              | 20.092233372 |
       3 | 4 Servers   | kill 1 server  | 10                               | 1       | 100 | 100 | 0.1              | 20.123346696 |
       4 | 4 Servers   | kill 2 servers | 10                               | 1       | 100 | 100 | 0.1              | 20.086477384 |
       5 | 4 Servers   | kill 3 servers | 10                               | 1       | 100 | 100 | 0.1              | 20.097209607 | Administrator receives an email: only one server left.
       6 | 4 Servers   | kill 4 servers | 10                               | 1       | 100 | 100 | 0.1              | 20.084954242 | After rebooting 1 server, the system can recover.
       7 | 4 Servers   | kill 4 servers | 10                               | 1       | 100 | 100 | 0.1              | not limited  | The test process is blocked.
       8 | 4 Servers   | none           | 10                               | 1       | 100 | 100 | 0                | 20.086735649 |
       9 | 4 Servers   | kill 1 server  | 10                               | 1       | 100 | 100 | 0                | 20.088677783 |
      10 | 4 Servers   | kill 2 servers | 10                               | 1       | 100 | 100 | 0                | 20.047546054 |
      11 | 4 Servers   | kill 3 servers | 10                               | 1       | 100 | 100 | 0                | 20.076328895 | Administrator receives an email: only one server left.
      12 | 4 Servers   | kill 4 servers | 10                               | 1       | 100 | 100 | 0                | not limited  | The test process is blocked.
       
        The test case for the robustness tests:

  it 'should work when some servers are down' do
    start = Time.now
    EM.run do
      NATS.start($opts)
      0.upto(99) do
        NATS.subscribe('a') { |msg| msg.should == "a" }
      end
      NATS.subscribe('exit') do
        NATS.unsubscribe('a')
        NATS.stop { EM.stop }
      end
      EM.add_timer(10) do
        test_timer = EM.add_periodic_timer(0.1) do
          NATS.publish('a', 'a')
        end
        EM.add_timer(10) do
          NATS.publish('exit')
          EM.cancel_timer(test_timer)
        end
      end

      puts "The END of EM"
    end
    stop = Time.now
    puts "over all time = #{stop-start}"
  end

---

Li Han

Zhejiang University

ZJU-SEL

http://github.com/ZJU-SEL 

Matt Reider

Sep 29, 2013, 11:06:52 AM
to vcap...@cloudfoundry.org, vcap...@cloudfoundry.org
Han,

Is this in a public repository?

James Bayer

Sep 29, 2013, 6:28:18 PM
to vcap...@cloudfoundry.org
Li Han,

Thank you so much for sharing these NATS cluster results. We have been very curious about the clustered branch of NATS and certainly are eager to remove the SPOF and increase the scalability of NATS.

I've shared the results with our team and I'll let you know if we have any questions.





--
Thank you,

James Bayer

Derek Collison

Sep 29, 2013, 9:10:25 PM
to vcap...@cloudfoundry.org
Li/James,

I would be happy to give you access to the high-performance Go gnatsd, which is clustered already. I will OSS it soon, but if you want to test against it, please let me know. The Ruby server peaks at about 150k msgs/sec. The Go version, gnatsd, is around 5-6M msgs/sec.

Cheers,
=derek

Dr Nic Williams

Sep 29, 2013, 9:35:55 PM
to vcap...@cloudfoundry.org, vcap...@cloudfoundry.org
You're a beautiful man.

Han Li

Sep 29, 2013, 11:01:20 PM
to vcap...@cloudfoundry.org
Hi Matt,
    We set up NATS_Cluster using some other existing repositories, so we don't actually have much code of our own. The architecture is:
1. Use HAProxy as a TCP reverse proxy in front of the NATS servers.
2. Add a module to monitor the NATS servers.
3. Use Redis to store the latest IP and port information of the NATS servers in the cluster.
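
A minimal sketch of how steps 2 and 3 might fit together. It is an illustration only, not the code used in NATS_Cluster: `reachable?`, `refresh_live_servers`, and the `nats:live` key are hypothetical names, a plain Hash stands in for Redis, and a bare TCP connect is assumed to be a good enough health check.

```ruby
require 'socket'

# Hypothetical health check: true if a TCP connection to host:port succeeds
# within `timeout` seconds. A real monitor might speak the NATS protocol instead.
def reachable?(host, port, timeout: 1)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Probes every configured server and records the reachable ones in `store`.
# `store` is a plain Hash here; in the architecture above it would be Redis.
def refresh_live_servers(servers, store)
  live = servers.select { |server| reachable?(server[:host], server[:port]) }
  store['nats:live'] = live.map { |server| "#{server[:host]}:#{server[:port]}" }
  live
end

# Example run against a local NATS server on its default port 4222 (assumed).
store = {}
refresh_live_servers([{ host: '127.0.0.1', port: 4222 }], store)
puts store.inspect
```

A proxy configuration could then be regenerated from the stored list whenever it changes, so that clients only ever see the proxy address.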

Han Li

Sep 29, 2013, 11:07:10 PM
to vcap...@cloudfoundry.org
Glad to exchange ideas about it.

Han Li

Sep 29, 2013, 11:08:49 PM
to vcap...@cloudfoundry.org
I'll follow it some time.

Han Li

Sep 29, 2013, 11:09:11 PM
to vcap...@cloudfoundry.org
A girl actually.

Dr Nic Williams

Sep 29, 2013, 11:12:44 PM
to vcap...@cloudfoundry.org, vcap...@cloudfoundry.org
Sorry, Han. I was referring to Derek (who I personally know) for making the new goNATS available.

I am sorry I didn't comment on your post itself yet. I was keen to see if the core team picked it up and were motivated by it.