--
You received this message because you are subscribed to the Google Groups "Tinode General" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tinode+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tinode/d7363fdf-d316-4428-acb0-3b19c18ae158n%40googlegroups.com.
Hi, thanks for the prompt response.
This was one of the first suspicions I had as well, but we are not reaching anywhere close to the networking limits available.
I had the single node and the cluster deployed on an EKS cluster, with Prometheus storing the metrics. On the cluster networking dashboard I don’t see any dropped packets, and the peak bandwidth is as follows:
Single-node load test:
- Max receive bandwidth: 17.67 MB/s
- Max transmit bandwidth: 19.72 MB/s

3-node cluster load test:
- Max receive bandwidth: 29.88 MB/s
- Max transmit bandwidth: 31.25 MB/s
NOTE: These numbers somewhat match the ~2x networking overhead that you mentioned.
Running the `ethtool` command on the EC2 instances didn’t show any non-zero values for bw_in_allowance_exceeded or bw_out_allowance_exceeded, so this agrees with the Prometheus stats.
The DB machine is a c5.xlarge EC2 instance running MongoDB in a Docker container. The following numbers are from the EC2 metrics page:
Peak average CPUUtilization: 6.8%
Peak max CPUUtilization: 9.43%
Tinode max single node CPU usage: 26%
Tinode max 3 node cluster CPU usages: 19%, 15%, 17%
Attaching a log file from all 3 nodes during the load test.
E2022/07/19 19:36:15 hub: topic's broadcast queue is full grpd4nN5Al4tHw
E2022/07/19 19:36:16 s.publish: sub.broadcast channel full, topic grpd4nN5Al4tHw e5rkwyN3aJ0
The first typical candidate for it would be your database. Can you check if you are maxing it out (cpu-wise, network-wise, or in any other way) or if there are any errors in your db logs?
I understand that the errors occur because the broadcast queue and channel fill up (visible as the 500 errors in the Gatling results). Another type of issue we noticed was timeout errors in Gatling (the timeout was set to 60s), meaning the server either didn't reply at all or took too long to reply. Could that be due to goroutines being 'stuck'?
> The first typical candidate for it would be your database. Can you check if you are maxing it out (cpu-wise, network-wise, or in any other way) or if there are any errors in your db logs?
CPU- and network-wise we are not stressing the DB machine at all. I have included the MongoDB logs in the attached zip file. Also attaching the Tinode logs captured after sending SIGABRT to the Tinode servers during and after the load tests.
--
Thanks and Regards,
Aditya Chandak
| Min | 50th pct | 75th pct | 95th pct | 99th pct | Max | Mean | Std Dev |
|-----|----------|----------|----------|----------|-----|------|---------|
| 1   | 3        | 4        | 9        | 15       | 51  | 4    | 4       |
| Min | 50th pct | 75th pct | 95th pct | 99th pct | Max | Mean | Std Dev |
|-----|----------|----------|----------|----------|-----|------|---------|
| 5   | 7        | 9        | 19       | 56       | 91  | 10   | 9       |