Using VMA with multiple ConnectX 3 NICs

Andrew Jameson

Jun 25, 2015, 11:19:16 PM

to libvm...@googlegroups.com

Hi,

I've written a UDP server and client to evaluate the maximum performance I can get between two servers, each with two dual-port ConnectX-3 NICs. The servers are directly connected (no switch) and run 40Gb Ethernet with a 9000-byte MTU (i.e. a total of 4 x 40Gb links). My eventual application needs to receive around 23 Gb/s per link (92 Gb/s total), but I'm seeing inconsistent results in my UDP transmit performance.
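A quick sanity check on the target rate, assuming the 23 Gb/s figure counts UDP payload only (not IP/UDP headers or Ethernet framing) and using the 8200-byte datagrams from these tests: each link must carry about 350,600 datagrams per second, roughly 1.4 million per second across all four links. An 8200-byte payload plus the 20-byte IPv4 and 8-byte UDP headers is 8228 bytes, which fits in one 9000-byte frame, so no IP fragmentation occurs. The two helpers below are hypothetical, written only to show the arithmetic.

```c
/* Hypothetical helper: datagrams per second needed to carry `gbps`
 * of UDP payload when every datagram carries `payload_bytes`.
 * Header and Ethernet framing overhead are deliberately ignored. */
static double datagrams_per_second(double gbps, unsigned payload_bytes)
{
    return gbps * 1e9 / (payload_bytes * 8.0);
}

/* Hypothetical helper: nonzero if the payload plus an IPv4 header
 * without options (20 bytes) and a UDP header (8 bytes) fits in `mtu`,
 * i.e. the datagram will not be fragmented. */
static int fits_in_mtu(unsigned payload_bytes, unsigned mtu)
{
    return payload_bytes + 20u + 8u <= mtu;
}
```

For example, `datagrams_per_second(23.0, 8200)` gives about 350,610, and `fits_in_mtu(8200, 9000)` is true while `fits_in_mtu(8200, 1500)` is false.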

I'm using VMA offloading for the Tx and Rx and that works nicely for a single UDP stream on a single NIC. My packet size is 8200 bytes.
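VMA is normally used without source changes: libvma.so is preloaded with LD_PRELOAD and intercepts ordinary BSD socket calls. A minimal single-stream sender of the kind described might look like the sketch below; the function name `send_udp_stream` and the all-zero payload are illustrative assumptions, not the poster's actual code.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define PAYLOAD_BYTES 8200  /* datagram size used in the post */

/* Hypothetical sender: send `count` datagrams of PAYLOAD_BYTES to
 * dest_ip:dest_port. Returns the number sent, or -1 on setup error. */
static long send_udp_stream(const char *dest_ip, unsigned short dest_port,
                            long count)
{
    static char payload[PAYLOAD_BYTES];  /* zero-filled dummy payload */
    struct sockaddr_in dst;
    long sent = 0;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(dest_port);
    if (inet_pton(AF_INET, dest_ip, &dst.sin_addr) != 1) {
        close(fd);
        return -1;
    }

    /* Connect once so the hot loop can use send() instead of sendto(). */
    if (connect(fd, (struct sockaddr *)&dst, sizeof dst) < 0) {
        close(fd);
        return -1;
    }

    for (; sent < count; sent++) {
        if (send(fd, payload, sizeof payload, 0) != (ssize_t)sizeof payload)
            break;  /* stop on the first short or failed send */
    }
    close(fd);
    return sent;
}
```

The same binary runs either on the kernel stack or, started as `LD_PRELOAD=libvma.so ./sender`, on the VMA-offloaded path.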

When I try to run 2, 3 or 4 UDP streams on the other links between the two servers, I see substantial variation in my Tx performance. I'm binding each UDP sender to a separate CPU core and only allocating memory from the NUMA node local to that CPU, so I believe everything is set up optimally in that respect.
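The per-thread pinning described above can be sketched with sched_setaffinity(2). This assumes Linux and relies on its default first-touch memory policy: a page is placed on the NUMA node of the CPU that first writes it, so allocating and touching a large buffer after pinning keeps it on the local node. `pin_to_core` and `pinned_local_buffer` are hypothetical names; libnuma's numa_alloc_onnode() is an explicit alternative to relying on first touch.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: pin the calling thread (pid 0) to one CPU core.
 * Returns 0 on success, -1 on error. */
static int pin_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return sched_setaffinity(0, sizeof set, &set);
}

/* Hypothetical helper: pin first, then allocate and touch the buffer.
 * Under the default first-touch policy the pages land on the NUMA node
 * of `core`. Small malloc() requests may reuse memory that was already
 * touched elsewhere, so this is intended for large send buffers. */
static char *pinned_local_buffer(int core, size_t bytes)
{
    if (pin_to_core(core) != 0)
        return NULL;
    char *buf = malloc(bytes);
    if (buf)
        memset(buf, 0, bytes);  /* first touch happens here */
    return buf;
}
```

Each sender thread would call `pinned_local_buffer()` with its own core before entering its send loop, so both the thread and its buffer stay on the node closest to the NIC it drives.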

Does libvma have internal resources that are shared across threads, processes, ports or event devices? In particular, is there only one internal thread per server?

Thanks in advance,

Andrew

M Kelly

Jun 27, 2015, 12:22:23 PM

to libvm...@googlegroups.com

I have seen performance vary when multiple threads use the same NIC. Perhaps one of the polling settings might help?

orkme...@gmail.com

Jun 28, 2015, 2:35:13 AM

to libvm...@googlegroups.com

By default, VMA has resources that are shared between all threads using the same NIC interface.

If your process is multithreaded, try following this:

https://github.com/Mellanox/libvma/wiki/VMA_Parameters#my-application-is-multithreaded

VMA has one internal thread per process (the CPU affinity of the internal thread can be controlled using VMA_INTERNAL_THREAD_AFFINITY).
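Since the sender threads are already pinned to their own cores, one option is to place VMA's internal thread on a different core. The sketch below assumes VMA_INTERNAL_THREAD_AFFINITY accepts a taskset-style hexadecimal bitmask (bit n set means CPU n is allowed), as the VMA parameter documentation describes; the helpers `cpu_mask_from_list` and `format_cpu_mask` are hypothetical and only build that mask string.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: build a bitmask with bit n set for each CPU n
 * in `cores`. Only CPUs 0..63 fit; returns 0 if any core is out of range. */
static unsigned long long cpu_mask_from_list(const int *cores, size_t n)
{
    unsigned long long mask = 0;
    for (size_t i = 0; i < n; i++) {
        if (cores[i] < 0 || cores[i] > 63)
            return 0;  /* out of range for a 64-bit mask */
        mask |= 1ULL << cores[i];
    }
    return mask;
}

/* Hypothetical helper: format the mask as "0x..." hex, the form taskset
 * accepts. Returns 0 on success, -1 if `out` is too small. */
static int format_cpu_mask(unsigned long long mask, char *out, size_t outlen)
{
    int w = snprintf(out, outlen, "0x%llx", mask);
    return (w < 0 || (size_t)w >= outlen) ? -1 : 0;
}
```

For example, if the four senders run on cores 0 to 3, choosing core 4 for the internal thread gives the mask "0x10", which would be exported as `VMA_INTERNAL_THREAD_AFFINITY=0x10` before starting the process under `LD_PRELOAD=libvma.so`.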
