Unable to obtain any performance.


Werner Grift

Nov 2, 2015, 9:15:31 AM
to CMB User Forum
Hi.

I am busy with a POC that makes use of Amazon's C# API obtained via NuGet.

It seems that no matter how much hardware I throw at it, how many threads I start, how many queues per thread, how many Redis shards, how many Cassandra nodes, or how many CMB instances fronted by nginx load balancing, I never get more than 200 msg/s.

I am at a total loss as to what I am doing wrong. 

My current setup. 

5 Linux machines (6 CPUs per box, 4 GB RAM per box). Total virtualized hardware = 4 x 3 GHz Xeon CPUs on 2 ESXi boxes, 1 x i7 3.4 GHz CPU under VirtualBox on my dev PC
3 cassandra nodes
5 redis "shards"
5 cmb instances
1 nginx

200 msgs/s

My test consists of starting 6 threads, each with its own queue and each with one producer and one consumer. I aggregate the total messages received across all subscribers as my msgs/s.

One of my 3 cassandra nodes shows high load and htop shows lots of 80%+ RED cpu usage where the others show mostly 50% green usage.

What am I doing wrong?


Capture.PNG
Capture2.PNG
Capture3.PNG

boriwo

Nov 2, 2015, 5:26:46 PM
to CMB User Forum
What you describe does sound unusual, more like you have a bottleneck somewhere in your setup. In our own testing we have scaled to thousands of messages per second. Have you checked the logs of CMB, Redis, Cassandra and nginx for errors?

If I understand correctly you are testing CNS (the pub/sub component) and not CQS. Have you tried benchmarking CQS performance alone, with all CNS producers/consumers turned off in cmb.properties? Knowing how many messages you can pump through a single CQS queue would give you a good baseline for what's achievable in CNS (as CNS uses CQS internally).

Also, in order for me to reproduce your test I would need to know a few more details: What's your message size? How many topics do you have and how many subscribers per topic? What subscribers do you use, HTTP endpoints or CQS queues? Also I'm not sure I understand where you measure 200 msg/sec: Is that on a single endpoint subscribed to a topic, or all subscribers combined?

Looking at your screenshots I notice a number of API errors (ReceiveMessage and GetQueueUrl). Have you combed through your cmb log files to see if there are any errors or exceptions on any of your cmb nodes?

Further, it looks like one of your nodes (10.0.0.150) has a delayed heartbeat and timestamp and may therefore be dead. The 127.0.1.1 IP also looks somewhat out of place. If you like you can send me a copy of your cmb.properties for review.

Finally, on the dashboard screen there's a "stats" link you may be able to use to figure out where most of your latency is spent (CMB, Redis or Cassandra).

Best regards,
Boris

Werner Grift

Nov 3, 2015, 2:24:59 AM
to CMB User Forum
Correct, I am testing CNS pub/sub, as I assumed I would get more performance given the supposedly transient nature of CNS. This has not been the case, and neither has the transiency. I struggle to understand how CNS could be made to work by using CQS anyway, unless CQS has a non-persistent flavor, which would bring into question the entire existence of CNS. Anyway, I think I am using CQS endpoints: UseHttp = false by default, and:

var sr = queue.snsClient.Subscribe(new SubscribeRequest
{
    Endpoint = qa.QueueARN,
    Protocol = "cqs",
    TopicArn = queue.topicArn
});


My Message size is small, 120 chars or so, with overall bandwidth in and out of my dev box sitting at just under 15Mb/s. The latency stat is the difference between the time serialized on publish and deserialized on subscribe.
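A latency figure of this kind can be sketched roughly as follows. This is only an illustration: the helper names `Stamp` and `LatencyOf` and the `timestamp|text` payload layout are assumptions, not what the test harness in this thread actually uses. Note that if publisher and subscriber run on different machines, any clock difference between them ends up in the number.

```csharp
using System;
using System.Globalization;

// Publisher side (hypothetical helper): prefix the text with the UTC send time.
string Stamp(string text, DateTime sentAtUtc) => $"{sentAtUtc:O}|{text}";

// Subscriber side (hypothetical helper): latency = receive time minus embedded send time.
TimeSpan LatencyOf(string payload, DateTime receivedAtUtc)
{
    string stamp = payload.Substring(0, payload.IndexOf('|'));
    var sentAtUtc = DateTime.Parse(stamp, CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind);
    return receivedAtUtc - sentAtUtc;
}

// In a real test the payload would travel through Publish and ReceiveMessage in between.
string wire = Stamp("ping", DateTime.UtcNow);
TimeSpan latency = LatencyOf(wire, DateTime.UtcNow);
Console.WriteLine($"latency: {latency.TotalMilliseconds:F3} ms");
```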

I have split up the queue subscribers in various ways: 3 queues with 1 pub and 1 sub per queue, or 4 queues with 2 pubs and 2 subs per queue, etc. No combination improves things. If anything, the higher those numbers, the lower the combined msgs/s (aggregated over all queues and subs) I usually get.

I get the occasional error in CMB's logs, none elsewhere. 

Node 10.0.0.150 does seem to have some issues. I cannot, for example, install the OpsCenter agent on it. It just won't take. Node 10.0.0.142 displays strange resource usage characteristics on Linux: lots of "system CPU" time. I will have to investigate that further. OpsCenter also reports that node 142 is under load sometimes. I set the replication factor on the keyspaces to 3. Combined with cmb.cassandra.readConsistencyLevel=QUORUM and cmb.cassandra.writeConsistencyLevel=QUORUM, that might also be my issue. I am not entirely sure what those should be.
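For reference, a QUORUM in Cassandra is a majority of the replicas, floor(RF / 2) + 1. With a replication factor of 3, each QUORUM read or write therefore waits for 2 of the 3 replicas and still succeeds with one replica down. A minimal sketch of that arithmetic (the function name `QuorumSize` is just illustrative):

```csharp
using System;

// Quorum size for a given replication factor: a majority of the replicas.
int QuorumSize(int replicationFactor) => replicationFactor / 2 + 1;

foreach (var rf in new[] { 1, 2, 3, 5 })
{
    int quorum = QuorumSize(rf);
    // Replicas that may be unavailable while QUORUM requests still succeed.
    int tolerated = rf - quorum;
    Console.WriteLine($"RF={rf}: QUORUM waits for {quorum} replica(s), tolerates {tolerated} down");
}
```

Whether this matters for throughput here depends on how slow the slowest of the two answering nodes is, so it is worth checking alongside the node 142 load.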

The 127.0.1.1 node is an interesting case. I have an nginx load balancer on that box (10.0.0.139), which means that the CMB instance's ports were moved to 6060 and 6062. For some reason that jams the IP to 127.0.1.1, and the node generally fails to show up like the rest do. I load balance all CMB instances and all of CMB's connections to the 3 Cassandra nodes. The only effect this had was lower message latency.

I have attached some stats, split roughly 50/50 between the test running and not running.

I will do some more digging, there are lots of things I can still try to tweak.

Thanks for the great feedback. You have given me a few ideas to work on.

stats1.PNG
stats2.PNG
stats3.PNG

boriwo

Nov 4, 2015, 9:46:50 PM
to CMB User Forum
CNS and CQS are two very different APIs: While CNS does fanout / pubsub, CQS does queues so the two don't really compare at all.

As for the CNS implementation: When you call Publish(), CNS puts the message on an internal CQS queue. Then a publish worker picks the message up from that queue, performs the actual publishing of the message to the endpoint(s) and finally deletes the message from the queue. So while CNS is transient, as you point out, messages get temporarily stored in a CQS queue, primarily for load distribution and resiliency. If you have many subscribers, the work is internally chopped up into several sub-tasks which are queued separately so multiple workers can do the fanout in parallel. An added benefit of this design is that we don't lose any messages in case a worker dies (the queue will ensure they reappear and get handled by another worker).
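The receive, publish, delete cycle described above can be illustrated with a toy in-memory queue. This is a sketch under assumptions, not CMB code: the helper names, the 30 second visibility timeout and the single-process list are all made up for the demo. It shows why a message survives a worker that dies before deleting it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the internal CQS queue (not CMB code).
var visibilityTimeout = TimeSpan.FromSeconds(30);   // arbitrary value for the demo
var items = new List<(int Id, string Body, DateTime VisibleAt)>();
int nextId = 0;

void Send(string body) => items.Add((nextId++, body, DateTime.MinValue));

// Returns the first visible message and hides it until the timeout elapses.
(int Id, string Body)? Receive(DateTime now)
{
    int index = items.FindIndex(item => item.VisibleAt <= now);
    if (index < 0) return null;
    var (id, body, _) = items[index];
    items[index] = (id, body, now + visibilityTimeout);
    return (id, body);
}

// Only an explicit delete removes the message; until then it can reappear.
void Delete(int id) => items.RemoveAll(item => item.Id == id);

var start = DateTime.UtcNow;
Send("hello subscribers");

var first = Receive(start);                   // worker 1 takes the message, then dies
var hidden = Receive(start.AddSeconds(10));   // still invisible to other workers
var retry = Receive(start.AddSeconds(31));    // timeout elapsed, so it reappears

if (retry.HasValue)
{
    // Worker 2 publishes to the endpoints and only then deletes the message.
    Console.WriteLine($"worker 2 publishes '{retry.Value.Body}'");
    Delete(retry.Value.Id);
}

Console.WriteLine($"first={first?.Body}, hidden={hidden.HasValue}, remaining={items.Count}");
```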

Bottom line, when you make a CNS Publish call, there will also be several internal calls to CQS APIs and then finally the call to CQS SendMessage to your CQS subscribers. That's why I'm suggesting that, while troubleshooting your setup, it may make sense to simplify things and just run some basic CQS throughput tests on a CQS queue and then, once that's running smoothly, turn CNS back on and run your CNS tests. Also, when testing CNS I'd suggest comparing performance of CQS endpoints vs HTTP endpoints.

Your stats screenshots look fine except for the Cassandra percentiles: 50% have a latency near 0 and the other 50% > 9ms - that sounds a bit odd (are all your nodes hosted in the same data center?). What are the CMB log errors you see, do you have any examples?


Werner Grift

Nov 12, 2015, 3:12:50 AM
to CMB User Forum
Turns out I was running my consumers flat out, instead of, say, 100 times per second.

This was causing huge overhead because of many receives yielding no messages.
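A paced consumer loop along those lines might look like the sketch below. It assumes .NET 6 or later for `PeriodicTimer`; `ReceiveOnceAsync` is a hypothetical stand-in for one ReceiveMessage round trip, and the 200 ms demo window is arbitrary.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for one ReceiveMessage round trip; returns the number of messages read.
int calls = 0;
Task<int> ReceiveOnceAsync()
{
    calls++;
    return Task.FromResult(0); // pretend the queue is empty
}

// Poll at most 100 times per second (one tick every 10 ms) for roughly 200 ms.
using var timer = new PeriodicTimer(TimeSpan.FromMilliseconds(10));
using var stop = new CancellationTokenSource(TimeSpan.FromMilliseconds(200));
try
{
    while (await timer.WaitForNextTickAsync(stop.Token))
    {
        int received = await ReceiveOnceAsync();
        // process the received messages here
    }
}
catch (OperationCanceledException)
{
    // the demo window elapsed
}

Console.WriteLine($"receive calls issued in ~200 ms: {calls}");
```

The point is only that the number of receive calls is bounded by the timer rather than by how fast the loop can spin.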

boriwo

Nov 12, 2015, 1:51:22 PM
to CMB User Forum
Good to hear you figured it out. Also note that if your consumers are reading from CQS queues, you do not have to poll frequently: you can instead use long polling by setting the WaitTimeSeconds parameter when calling ReceiveMessage. You can also read batches of up to 10 messages at a time. These options may give you even better performance, depending on your test.
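With the AWS SDK for .NET, long polling and batch reads could look roughly like this. Treat it as a sketch: the service URL and queue URL are placeholders for your own CMB setup, the 10 second wait is an arbitrary choice, credentials are assumed to come from the SDK's default credential chain, and it needs the AWSSDK.SQS NuGet package plus a reachable endpoint to run.

```csharp
using System;
using System.Collections.Generic;
using Amazon.SQS;
using Amazon.SQS.Model;

// Placeholders (assumptions): point these at your own CMB CQS endpoint and queue.
var config = new AmazonSQSConfig { ServiceURL = "http://cmb.example.local/" };
using var client = new AmazonSQSClient(config);
var queueUrl = "http://cmb.example.local/<user-id>/<queue-name>";

while (true) // runs until the process is stopped
{
    var response = await client.ReceiveMessageAsync(new ReceiveMessageRequest
    {
        QueueUrl = queueUrl,
        WaitTimeSeconds = 10,      // long poll: wait for messages instead of returning empty at once
        MaxNumberOfMessages = 10   // read up to 10 messages per call
    });

    // Some SDK versions return null rather than an empty list when nothing arrived.
    var messages = response.Messages ?? new List<Message>();

    foreach (var message in messages)
    {
        Console.WriteLine(message.Body);
        // Delete only after the message has been handled, otherwise it reappears later.
        await client.DeleteMessageAsync(queueUrl, message.ReceiptHandle);
    }
}
```

An empty response now costs one request per wait period instead of one per loop iteration, which is where the saving over flat-out polling comes from.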
