High network latency between nodes


Shi Chen

Apr 5, 2025, 1:11:59 AM
to cloudlab-users
Hi, CloudLab team,
Recently I have been running experiments on the r650 cluster, and I found that the latency between two nodes is very high when testing gRPC over TCP. Several months ago I ran the same kind of test and the latency was in the normal range. Is something wrong with the CloudLab network? Could you please look into it? It's really important for my experiments. Thanks!
微信图片_20250405131128.png

Mike Hibler

Apr 5, 2025, 7:22:33 PM
to cloudla...@googlegroups.com
You should include the URL of the status page of an active experiment when
you have one, so I don't have to spend time figuring out what experiment it is.

In this case I did find it. I am going to assume you did your due diligence
in following best practices for configuring gRPC and that the previous case
(where performance was normal) was identical in configuration (though you
likely were not assigned the exact same nodes).

Both nodes are connected to the same Dell Z9432F switch via 400Gb -> 4 x 100Gb
breakout cables. The only unusual thing I see on the switch ports is that the
port for the sender has a lot of cases of "FEC corrected". But I don't know
if those are the result of your experiment or previous uses. Can you run it
again a few times (node2 to node3) and I will see if the counters go up?
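
If you want to watch from the host side as well, the NIC's physical-layer counters should show up in the ethtool stats; the exact counter names vary by NIC and driver, so take this as a sketch:

ethtool -S <experiment-iface> | grep -iE 'fec|phy'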

Have you tried this on the Cloudlab Utah d6515 or c6525-100g nodes? Have you
seen this problem with < 100Gb interfaces (10Gb or 25Gb)?


Shi Chen

Apr 9, 2025, 1:35:17 PM
to cloudlab-users
Hello, I'm sorry for replying so late. I have run some experiments on sm110p, r650, c6525-100g, and r6525, and I found that only on r6525 are the results in the normal range. Here are my experiment URLs (history URLs):
r650: 
c6525-100g:
r6525:

The average latency on r6525 is nearly half of that on r650 and c6525-100g.
I have not tried <100 Gbps NICs yet; I will check later whether any are available. Thanks for replying!

Yuanzhuo Yang

Apr 9, 2025, 2:45:06 PM
to cloudla...@googlegroups.com
Your average RPC latency looks quite high (I assume the unit is microseconds). I tried using fortio to test the RPC ping latency on two r650 nodes; the average latency I got was 91 µs, and the 99th-percentile latency was 990 µs. The commands I ran were
fortio load -grpc -c 1 -s 1 -qps 0 -t 30s -payload-size 4096 node-2:8079 (client side) and 
fortio grpcping -n 5 localhost (server side).
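
To separate raw network latency from gRPC stack overhead, it may also help to get a baseline RTT first, e.g. (netperf is an assumption here; if it isn't installed, plain ping at least gives a rough ICMP baseline):

ping -c 100 node-2                  # rough ICMP RTT baseline
netserver                           # on the server node, if netperf is installed
netperf -H node-2 -t TCP_RR -l 30   # TCP request/response round-trips from the client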

Shi Chen

Apr 9, 2025, 3:36:34 PM
to cloudlab-users
Here is my new test result on two c8220 nodes:
Client threads = 1, I/O queue depth = 1, i.e., synchronous. Each request's latency is shown in order (unit: ns; the unit for avg, p90, p99, and p999 is µs). The payload size is 4096 bytes, and the server side replies to each request immediately without doing any other work. The code itself should be sound, since we have already tested it on a local laptop (loopback) and on two servers connected directly to each other without a switch.
This experiment's URL is: https://www.cloudlab.us/status.php?uuid=e961c670-156a-11f0-af1a-e4434b2381fc. Maybe you could check what happened on the network while the test was running.
Thanks for helping!
Screenshot 2025-04-10 032336.png
Screenshot 2025-04-10 032307.png

Yuanzhuo Yang

Apr 9, 2025, 3:54:38 PM
to cloudla...@googlegroups.com
Sorry about this, I cannot really check your experiment or code because I am also just a user. However, I noticed that you are using 10.10.1.2 as the target IP. You should use the cluster’s internal IP (e.g., 127.0.0.1) so that the traffic is routed through the correct NIC.

Shi Chen

Apr 9, 2025, 3:58:26 PM
to cloudlab-users
I tried another tool called ghz to test gRPC performance. The server side starts gRPC's example greeter_server (sync), and the client side runs the command below:
./ghz --insecure --proto=./helloworld.proto --call=helloworld.Greeter.SayHello --data='{"name":"Joe"}' --concurrency=10 --total=2000 10.10.1.2:50051
And I got this result:
微信图片_20250410035244.png
The average latency is at the millisecond level.

Shi Chen

Apr 9, 2025, 4:04:04 PM
to cloudlab-users
Oh sorry, I thought you were a CloudLab admin. But why should I use 127.0.0.1? I have two nodes, and the server side's IP (the RDMA NIC interface's LAN IP) is 10.10.1.2; I think I should use that IP to send gRPC requests through that interface from the client node.

Yuanzhuo Yang

Apr 9, 2025, 4:20:08 PM
to cloudla...@googlegroups.com
Take r650 for example: it has two NICs, as listed below. You need to use the ConnectX-6 100 Gb NIC, not the 25 Gb one, for experiments. The 25 Gbps NIC is the control link and the 100 Gbps NIC is the experiment link; traffic to the public address is routed over the 25 Gb NIC. (See the sketch after the spec table below for a way to verify which interface your traffic takes.)

r650    32 nodes (Intel Ice Lake, 72 core, 256GB RAM, 1.6TB NVMe)
CPU     Two 36-core Intel Xeon Platinum 8360Y at 2.4GHz
RAM     256GB ECC Memory (16x 16GB 3200MHz DDR4)
Disk    One 480GB SATA SSD
Disk    One 1.6TB NVMe SSD (PCIe v4.0)
NIC     Dual-port Mellanox ConnectX-5 25 Gb NIC (PCIe v4.0)
NIC     Dual-port Mellanox ConnectX-6 100 Gb NIC (PCIe v4.0)
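
As a quick sanity check (a sketch; the interface name varies per node), you can confirm which NIC carries the 10.10.1.x traffic:

ip route get 10.10.1.2          # shows which interface the traffic leaves through
ip -br addr                     # maps interfaces to their addresses
ethtool <iface> | grep Speed    # the experiment link should report 100000Mb/s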


Shi Chen

Apr 9, 2025, 10:17:21 PM
to cloudlab-users
I checked the network config; the IP of one port of the 100 Gb NIC is 10.10.1.x, so using 10.10.1.x is not wrong.

Scott Groel

Apr 9, 2025, 11:19:42 PM
to cloudla...@googlegroups.com
Have you accounted for NUMA locality when running your tests? You should pin the processes to CPUs in the same NUMA domain as the network card. Otherwise you will add cross-socket latency.

Shi Chen

Apr 10, 2025, 12:06:17 AM
to cloudlab-users
No, I did not do that (I didn't even know about it). Is there a tutorial or a section of the CloudLab manual about this?

Scott Groel

Apr 10, 2025, 12:19:35 AM
to cloudla...@googlegroups.com
First determine the NUMA node the NIC is on. You can do

lspci -vv

And look for the Mellanox ConnectX-6 device. There will be a NUMA Node field there. Then you can bind your test application/script to that NUMA node using numactl.

numactl --cpunodebind=<NUMA Node> --membind=<NUMA Node> <application>

Fill in the correct values for the <> placeholders.
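
On Linux you can usually also read the NUMA node straight from sysfs (assuming the standard sysfs layout; a value of -1 means the platform did not report one):

cat /sys/class/net/<iface>/device/numa_node   # <iface> = the ConnectX-6 port
numactl --hardware                            # lists which CPUs belong to each NUMA node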

Shi Chen

Apr 10, 2025, 3:32:11 AM
to cloudlab-users
OK, I will try this in my next experiment and check whether it helps. Thanks!

Mike Hibler

Apr 10, 2025, 9:59:21 AM
to cloudla...@googlegroups.com
Sorry, I was out in the field yesterday. I am a Cloudlab admin and can say
that you *should* be using the 10.x.x.x interfaces (host name aliases are
in /etc/hosts). 127.0.0.1 is the local loopback and is fine if you want to
test within a single node. Addresses associated with the FQDN should *not*
be used (e.g., 130.127.x.x, 128.110.x.x, 128.105.x.x, 155.98.x.x).
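
A quick way to see the experiment-LAN names and addresses (the alias names depend on your profile, so treat this as a sketch):

getent hosts node1 node2    # or simply: grep -i node /etc/hosts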

Shi Chen

Apr 10, 2025, 1:04:27 PM
to cloudlab-users
Hello, it's very strange that the average gRPC latency (same version, v1.50.0) on c8220 and sm110p nodes differs so much.
The environment:
* c8220: 2 nodes, Ubuntu 20.04, node1 --> node2, using the ghz perf tool with the args shown in the screenshot;
* sm110p: 2 nodes, but Ubuntu 22.04, node1 --> node2, using the ghz perf tool with the same args.
I have also verified that NUMA locality has a negligible impact on the latency.

Would you please check what was happening on the network in the background during the gRPC tests in these two experiments?

Experiment URLs: 

Here are my results (the first is c8220, the second is sm110p):
c8220_ghz.png
sm110p_ghz.png

Shi Chen

Apr 10, 2025, 1:19:19 PM
to cloudlab-users
r650's performance is the one I care about most, but I cannot get r650 nodes until tomorrow. I'm worried that it may have the same problem.

Adding fortio perf tool test results; the first is c8220 and the second is sm110p:
c8220_fortio.png
sm110p_fortio.png

Shi Chen

Apr 10, 2025, 1:36:44 PM
to cloudlab-users
Another comparison, between my ThinkPad laptop (12th Gen Intel i5) and an sm110p node, using loopback.
It seems this is not a network problem?
As you can see, sm110p is even worse than my laptop, so what could lead to such a result? Is there some sm110p configuration that I have overlooked?

sm110p_lo.png
laptop_lo.png