Hi all,
Our lab has designed and implemented NATS_Cluster, a stateless, lightweight MOM (message-oriented middleware) system. It uses the existing NATS release (0.4.28) as the client and the clustered NATS release (0.5.0.beta.8) as the server.
We found that a NATS cluster not only removes the single point of failure (SPOF), but also handles heavier workloads by holding many more connections than a single NATS server.
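To illustrate why a cluster removes the single point of failure, here is a minimal Ruby sketch of client-side failover. It assumes the client is given a list of server URIs and moves on to the next one when a connection attempt fails. `FailoverList`, the addresses, and the block that stands in for a connection attempt are all hypothetical; this is not the NATS_Cluster client code or the NATS client API.

```ruby
# Hypothetical sketch: client-side failover across a list of cluster servers.
# The URIs and the connection check passed as a block are placeholders.
class FailoverList
  def initialize(uris)
    raise ArgumentError, "need at least one server" if uris.empty?
    @uris = uris.dup
    @index = 0
  end

  # Try each server at most once, starting from the current one. Returns the
  # first URI for which the block returns truthy, or nil if every server failed.
  def connect
    @uris.size.times do
      uri = @uris[@index]
      return uri if yield(uri)
      @index = (@index + 1) % @uris.size # advance to the next server
    end
    nil
  end
end

# 4222 is the default NATS port; the host addresses are made up.
servers = %w[nats://10.0.0.1:4222 nats://10.0.0.2:4222 nats://10.0.0.3:4222]
down = ["nats://10.0.0.1:4222"] # pretend the first server was killed
chosen = FailoverList.new(servers).connect { |uri| !down.include?(uri) }
puts chosen # prints nats://10.0.0.2:4222
```

With a single server the list has one entry, so losing that server leaves the client nothing to fail over to; with three or four entries, any surviving server keeps it connected.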
In the tests, we set up NATS clusters with three nodes and with four nodes. The topologies are as follows:
Test results

The results for a single-node NATS server (0.4.28) are as follows:

| ID | Environment | Sub/Pub rounds | Overall time (s) |
|---|---|---|---|
| 1 | Single-node NATS | 1 | 0.00166894 |
| 2 | Single-node NATS | 1000 | 1.014913508 |
| 3 | Single-node NATS | 2000 | 1.93486609 |
| 4 | Single-node NATS | 5000 | 4.990354504 |
| 5 | Single-node NATS | 6000 | failed (RuntimeError) |

Run 5 aborted with: `` `add_oneshot_timer': ran out of timers; use #set_max_timers to increase limit (RuntimeError) ``
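A quick way to read the single-node table is the average time per sub/pub round. The Ruby snippet below is only a sketch over the numbers in the table: it divides each overall time by the number of rounds.

```ruby
# Overall times (s) for the single-node NATS 0.4.28 runs, copied from the table.
single_node = {
  1    => 0.00166894,
  1000 => 1.014913508,
  2000 => 1.93486609,
  5000 => 4.990354504
}

# Average wall-clock time per sub/pub round, in milliseconds.
per_round_ms = single_node.map { |rounds, secs| [rounds, secs * 1000.0 / rounds] }.to_h

per_round_ms.each do |rounds, ms|
  puts format("%6d rounds: %.3f ms/round", rounds, ms)
end
```

For 1000 to 5000 rounds this comes out at roughly 1 ms per round. The 1-round figure is higher because it is a single measurement that includes connection setup.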
The results for the NATS cluster (0.5.0.beta.8) are as follows:

| ID | Environment | Sub/Pub rounds | Overall time (s) |
|---|---|---|---|
| 1 | 3 servers | 1 | 0.006009529 |
| 2 | 3 servers | 1000 | 5.718525281 |
| 3 | 3 servers | 2000 | 11.301621944 |
| 4 | 3 servers | 5000 | 28.653961873 |
| 5 | 3 servers | 6000 | 34.31817196 |
| 6 | 3 servers | 7000 | 40.862727237 |
| 7 | 3 servers | 8000 | 47.568740444 |
| 8 | 3 servers | 9000 | 53.245474721 |
| 9 | 3 servers | 10000 | 59.83681407 |
| 10 | 3 servers | 20000 | 118.399914877 |
| 11 | 3 servers | 50000 | 307.370709952 |
| 12 | 3 servers | 100000 | 623.333510497 |
| 13 | 3 servers | 200000 | 1236.563246527 |
| 14 | 4 servers | 1 | 0.005060188 |
| 15 | 4 servers | 1000 | 5.775566259 |
| 16 | 4 servers | 2000 | 12.187137876 |
| 17 | 4 servers | 5000 | 30.722105271 |
| 18 | 4 servers | 6000 | 38.640032527 |
| 19 | 4 servers | 7000 | 46.091264959 |
| 20 | 4 servers | 8000 | 50.221791602 |
| 21 | 4 servers | 9000 | 55.904621584 |
| 22 | 4 servers | 10000 | 61.55738544 |
| 23 | 4 servers | 20000 | 118.963757949 |
| 24 | 4 servers | 50000 | 311.795263769 |
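To compare the cluster with the single node, one can fit a straight line to the 3-server timings. The sketch below is an ordinary least-squares fit in plain Ruby over the table values (the 1-round run is left out); the slope estimates the time per round. It is an illustrative analysis, not part of the original test suite.

```ruby
# Overall times (s) for the 3-server cluster runs, copied from the table.
three_servers = [
  [1000, 5.718525281],    [2000, 11.301621944],   [5000, 28.653961873],
  [6000, 34.31817196],    [7000, 40.862727237],   [8000, 47.568740444],
  [9000, 53.245474721],   [10000, 59.83681407],   [20000, 118.399914877],
  [50000, 307.370709952], [100000, 623.333510497], [200000, 1236.563246527]
]

# Ordinary least-squares fit of: time = intercept + slope * rounds.
def linear_fit(points)
  n = points.size.to_f
  mean_x = points.sum { |x, _| x } / n
  mean_y = points.sum { |_, y| y } / n
  sxy = points.sum { |x, y| (x - mean_x) * (y - mean_y) }
  sxx = points.sum { |x, _| (x - mean_x)**2 }
  slope = sxy / sxx
  [mean_y - slope * mean_x, slope]
end

intercept, slope = linear_fit(three_servers)
puts format("3 servers: %.2f ms per round (intercept %.2f s)", slope * 1000, intercept)
```

The slope comes out at roughly 6 ms per round, against roughly 1 ms per round on the single node. The timings grow close to linearly with the number of rounds, so the cluster's advantage in these runs is capacity rather than per-round speed.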
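The results show that the NATS cluster can hold far more connections than a single node: the single server failed at 6000 rounds, while the 3-server cluster completed 200,000. Each round was slower on the cluster, though (about 6 ms versus about 1 ms per round). Here is the test case we used (shown with the single-node options):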
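    it 'should work when multi clients work' do
      start = Time.now
      sum = 0
      # 6000 sequential rounds: each round starts its own EventMachine reactor
      # and NATS connection, does one sub/pub exchange, then shuts down.
      6000.times do
        temp_start = Time.now
        EM.run do
          NATS.start($single_opts) do
            # Keep the subscription id: NATS.unsubscribe takes a sid, not a subject.
            sid = NATS.subscribe('a') { |msg| msg.should == "a" }
            # Tear everything down once the 'exit' message arrives.
            NATS.subscribe('exit') do
              NATS.unsubscribe(sid)
              NATS.stop { EM.stop }
            end
            NATS.publish('a', 'a')
            NATS.publish('exit')
          end
        end
        temp_stop = Time.now
        sum += temp_stop - temp_start
      end
      stop = Time.now
      puts "sum = #{sum}, overall time = #{stop - start}"
    end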
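The spec above measures both the summed per-round time and the overall time with `Time.now`. A minimal helper in plain Ruby that does the same bookkeeping with a monotonic clock (which is not affected by system clock adjustments) might look like this. `time_rounds` is a hypothetical name, and the block passed in only stands in for one sub/pub round.

```ruby
# Hypothetical helper: run a block n times and report the summed per-round
# time and the overall elapsed time, using a monotonic clock.
def time_rounds(n)
  clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  start = clock.call
  sum = 0.0
  n.times do |i|
    round_start = clock.call
    yield i                        # one round of work (e.g. a sub/pub exchange)
    sum += clock.call - round_start
  end
  { sum: sum, overall: clock.call - start, per_round: sum / n }
end

# Stand-in workload; in the real spec this would be one EM.run/NATS round.
stats = time_rounds(1000) { |i| i.to_s * 10 }
puts format("sum = %.6f, overall time = %.6f", stats[:sum], stats[:overall])
```

Because the rounds are disjoint intervals inside the overall interval, `sum` can never exceed `overall`; the gap between them is the loop's own bookkeeping overhead.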
We also ran robustness tests. The results are as follows:

| ID | Environment | Action | Sub-to-pub delay (s) | Clients | Subscriptions | Publishes | Pub interval (s) | Overall time (s) | Notes |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 servers | none | 10 | 1 | 100 | 100 | 0.1 | 20.048878163 | |
| 2 | 3 servers | add 1 server | 10 | 1 | 100 | 100 | 0.1 | 20.092233372 | |
| 3 | 4 servers | kill 1 server | 10 | 1 | 100 | 100 | 0.1 | 20.123346696 | |
| 4 | 4 servers | kill 2 servers | 10 | 1 | 100 | 100 | 0.1 | 20.086477384 | |
| 5 | 4 servers | kill 3 servers | 10 | 1 | 100 | 100 | 0.1 | 20.097209607 | Administrator received an email alert: only one server left. |
| 6 | 4 servers | kill 4 servers | 10 | 1 | 100 | 100 | 0.1 | 20.084954242 | After one server was restarted, the system recovered. |
| 7 | 4 servers | kill 4 servers | 10 | 1 | 100 | 100 | 0.1 | unbounded | The process blocked. |
| 8 | 4 servers | none | 10 | 1 | 100 | 100 | 0 | 20.086735649 | |
| 9 | 4 servers | kill 1 server | 10 | 1 | 100 | 100 | 0 | 20.088677783 | |
| 10 | 4 servers | kill 2 servers | 10 | 1 | 100 | 100 | 0 | 20.047546054 | |
| 11 | 4 servers | kill 3 servers | 10 | 1 | 100 | 100 | 0 | 20.076328895 | Administrator received an email alert: only one server left. |
| 12 | 4 servers | kill 4 servers | 10 | 1 | 100 | 100 | 0 | unbounded | The process blocked. |
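The post does not say how the administrator alert was implemented. As an assumption only, a check of this kind could map the number of live servers to a notification. `cluster_alert`, the server names, and the message texts below are hypothetical.

```ruby
# Hypothetical sketch of the "only one server left" alert: given the
# liveness of each cluster node, decide which notification (if any) to send.
def cluster_alert(alive)
  live = alive.values.count(true)
  case live
  when 0 then "CRITICAL: no servers alive; clients will block"
  when 1 then "WARNING: only one server left"
  else nil # two or more servers: no alert needed
  end
end

# Scenario from rows 5 and 11: three of the four servers killed.
status = { "nats-1" => false, "nats-2" => false, "nats-3" => false, "nats-4" => true }
puts cluster_alert(status) # prints WARNING: only one server left
```

In a real deployment the liveness map would come from some health check against each server; how that check is done is left open here.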
The test case for the robustness tests:
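    it 'should work when some servers are down' do
      start = Time.now
      EM.run do
        NATS.start($opts)
        # 100 subscriptions to 'a' on one client; keep their sids for cleanup.
        sids = Array.new(100) { NATS.subscribe('a') { |msg| msg.should == "a" } }
        NATS.subscribe('exit') do
          sids.each { |sid| NATS.unsubscribe(sid) }
          NATS.stop { EM.stop }
        end
        # Wait 10 s (servers are added or killed during the run), then publish
        # every 0.1 s for another 10 s before sending 'exit'.
        EM.add_timer(10) do
          test_timer = EM.add_periodic_timer(0.1) do
            NATS.publish('a', 'a')
          end
          EM.add_timer(10) do
            EM.cancel_timer(test_timer) # stop publishing before signalling exit
            NATS.publish('exit')
          end
        end
      end
      # EM.run returns only after EM.stop, so this marks the real end of the run.
      puts "The END of EM"
      stop = Time.now
      puts "overall time = #{stop - start}"
    end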
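The durations in the robustness table follow from the timers in this spec. As a rough check, the plain Ruby arithmetic below uses the spec's parameters and the measured times of the completed 0.1 s runs; it does not cover the runs with a 0 s publish interval.

```ruby
# Parameters of the robustness spec (pub interval 0.1 s case).
WAIT_BEFORE_PUB_S = 10  # EM.add_timer(10) before publishing starts
PUB_WINDOW_S      = 10  # inner EM.add_timer(10) that sends 'exit'
PUB_INTERVAL_MS   = 100 # EM.add_periodic_timer(0.1)
SUBSCRIPTIONS     = 100 # 100 subscriptions to 'a'

# About 100 publishes; the periodic timer's last tick may race with the exit timer.
pubs = PUB_WINDOW_S * 1000 / PUB_INTERVAL_MS
deliveries = pubs * SUBSCRIPTIONS # each publish is delivered to every subscription
expected_s = WAIT_BEFORE_PUB_S + PUB_WINDOW_S

# Measured overall times (s) of the completed 0.1 s runs (rows 1 to 6).
measured = [20.048878163, 20.092233372, 20.123346696,
            20.086477384, 20.097209607, 20.084954242]
overheads_ms = measured.map { |t| ((t - expected_s) * 1000).round }

puts "publishes=#{pubs} deliveries=#{deliveries} expected=#{expected_s}s"
puts "overhead (ms): #{overheads_ms.join(', ')}"
```

The completed runs finish within about 50 to 125 ms of the nominal 20 s, including the runs where servers were added or killed.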
---
Li Han
Zhejiang University
ZJU-SEL
http://github.com/ZJU-SEL