Each EPYC socket has 4 NUMA nodes, so the server has 8 NUMA nodes in total.
The first NIC is connected to NUMA 0 and the second to NUMA 3 (both NUMA 0 and 3 are from the same socket).
TRex works with the NIC on NUMA 0 but fails with the NIC on NUMA 3. Both NICs are in loopback.
I am using CentOS 7.5
$ uname -a
Linux amd-010236107136.amd.com 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
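To double-check which NUMA node each NIC is attached to, sysfs can be queried. Below is a minimal Python sketch; the PCI addresses are the ones from the two configs in this thread, and `pci_numa_node` is just an illustrative helper name, not part of TRex:

```python
import os

def pci_numa_node(bdf, domain="0000"):
    # Read the NUMA node a PCI device is attached to from sysfs.
    # Returns None if the device (or the attribute) is not present.
    path = "/sys/bus/pci/devices/%s:%s/numa_node" % (domain, bdf)
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return int(f.read().strip())

# PCI addresses taken from the two trex_cfg.yaml files below.
for bdf in ("01:00.0", "31:00.0"):
    print(bdf, "-> NUMA node", pci_numa_node(bdf))
```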
Here is the trex_cfg.yaml for the first NIC (it works):
- port_limit : 2
  version : 2
  # List of interfaces. Change to suit your setup. Use ./dpdk_setup_ports.py -s to see available options
  interfaces : ["01:00.0","01:00.1"]
  port_info :  # Port IPs. Change to suit your needs. In case of loopback, you can leave as is.
    - ip : 198.18.1.1
      default_gw : 198.18.2.1
    - ip : 198.18.2.1
      default_gw : 198.18.1.1
Here is the trex_cfg.yaml for the second NIC (it fails):
- port_limit : 2
  version : 2
  # List of interfaces. Change to suit your setup. Use ./dpdk_setup_ports.py -s to see available options
  interfaces : ["31:00.0","31:00.1"]
  port_info :  # Port IPs. Change to suit your needs. In case of loopback, you can leave as is.
    - ip : 198.18.3.1
      default_gw : 198.18.4.1
    - ip : 198.18.4.1
      default_gw : 198.18.3.1
Here are the logs when it fails:
$ sudo ./t-rex-64 -i -c 8
Killing Scapy server... Scapy server is killed
Starting Scapy server.... Scapy server is started
The ports are bound/configured.
Starting TRex v2.35 please wait ...
set driver name net_mlx5
driver capability : TCP_UDP_OFFLOAD
Number of ports found: 2
zmq publisher at: tcp://*:4500
PMD: net_mlx5: 0x56403c9adc00: Drop queue allocation failed: Unknown error -1
    [previous message repeated 9 more times]
PMD: net_mlx5: 0x56403c9b1c80: Drop queue allocation failed: Unknown error -1
    [previous message repeated 9 more times]
wait 1 sec .
port : 0
------------
link : link : Link Up - speed 100000 Mbps - full-duplex
promiscuous : 0
port : 1
------------
link : link : Link Up - speed 100000 Mbps - full-duplex
promiscuous : 0
./t-rex-64: line 72: 4222 Segmentation fault (core dumped) ./_$(basename $0) $INPUT_ARGS $EXTRA_INPUT_ARGS
--
You received this message because you are subscribed to the Google Groups "TRex Traffic Generator" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trex-tgn+u...@googlegroups.com.
To post to this group, send email to trex...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/trex-tgn/e54aa131-e776-4fc1-a5cf-7cad38c809c2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
I will try the second NUMA node when I come back from the 4th of July holiday.
Charlie
Reading more about EPYC, it is pretty impressive. In the case of a Mellanox CX-5, there would be a limit of 8 cores per 100Gb port (for NUMA locality), which might not be enough to drive full line rate, but there are plenty of dies (Zeppelin) to drive more total bandwidth.
The Cisco C125 M5, for example, will expose only 2x PCIe x16, while theoretically it could expose a PCIe x16 per die, a total of 16 slots of 100Gb for a dual socket, i.e. 1.6Tb/sec of traffic.
Once the UCS C125 is available, we will tune the scripts for 16 NUMA nodes instead of the current maximum of 2.
Here is how I would try to fix this:
1) Raise the DPDK NUMA node limit:
   #define RTE_MAX_NUMA_NODES 8  ==>  16
2) This should be 16, and should auto-identify the maximum number of NUMA nodes:
   for socket_id in range(2):
       filename = '/sys/devices/system/node/node%d/hugepages/hugepages-2048kB/nr_hugepages' % socket_id
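A rough sketch of that auto-identification, using only standard Linux sysfs paths (untested, and `numa_node_ids` is a name I made up for illustration):

```python
import glob
import re

def numa_node_ids():
    # Discover NUMA node ids from sysfs rather than assuming range(2).
    ids = []
    for path in glob.glob('/sys/devices/system/node/node[0-9]*'):
        m = re.search(r'node(\d+)$', path)
        if m:
            ids.append(int(m.group(1)))
    return sorted(ids)

# Same hugepage paths as the original loop, but covering every node found.
for socket_id in numa_node_ids():
    filename = ('/sys/devices/system/node/node%d/hugepages/'
                'hugepages-2048kB/nr_hugepages' % socket_id)
    print(filename)
```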
3) Example configuration file (this one is for a CX-5):
$ more /etc/trex_cfg.yaml
### Config file generated by dpdk_setup_ports.py ###
- port_limit: 2
  version: 2
  interfaces: ['04:00.0', '04:00.1']
  port_info:
    - ip: 1.1.1.1
      default_gw: 2.2.2.2
    - ip: 2.2.2.2
      default_gw: 1.1.1.1
  platform:
    master_thread_id: 0
    latency_thread_id: 14
    dual_if:
      - socket: 0
        threads: [1,2,3,4,5,6,7,8,9,10,11,12,13]
You would need to convert it to be more NUMA-local manually (never tested). Here is an example for an 8-port CX-5 setup, with a maximum of 7 cores per NUMA node (a "socket" in our terms):
  interfaces: ['04:00.0', '04:00.1',
               '05:00.0', '05:00.1',
               '06:00.0', '06:00.1',
               '07:00.0', '07:00.1'
              ]
  platform:
    master_thread_id: 0
    latency_thread_id: 15
    dual_if:
      - socket: 0  #<< NUMA 0 in socket 0; associated with interfaces 0,1 ['04:00.0', '04:00.1']
        threads: [1,2,3,4,5,6,7]
      - socket: 1  #<< NUMA 1 in socket 0; associated with interfaces 2,3 ['05:00.0', '05:00.1']
        threads: [8,9,10,11,12,13,14]
      - socket: 2  #<< NUMA 2 in socket 0; associated with ['06:00.0', '06:00.1']
        threads: [16,17,18,19,20,21,22]
      - socket: 3  #<< NUMA 3 in socket 0; associated with ['07:00.0', '07:00.1']
        threads: [24,25,26,27,28,29,30]
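When building those per-NUMA thread lists by hand, the CPUs belonging to each node can be read from sysfs. A small sketch (the cpulist format is the standard kernel one, e.g. "0-7,64-71"; the helper names are mine):

```python
def parse_cpulist(s):
    # Parse a kernel cpulist string such as "0-7,64-71" into a list of ints.
    cpus = []
    for part in s.strip().split(','):
        if '-' in part:
            lo, hi = part.split('-')
            cpus.extend(range(int(lo), int(hi) + 1))
        elif part:
            cpus.append(int(part))
    return cpus

def node_cpus(node_id):
    # CPUs belonging to one NUMA node, read from standard Linux sysfs.
    path = '/sys/devices/system/node/node%d/cpulist' % node_id
    with open(path) as f:
        return parse_cpulist(f.read())
```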
Hope it helps.
Thanks,
Hanoh
Just tested it on NUMA 1 and it worked. As you said, NUMA 0 and 1 work.
Charlie
My system has 8 NUMA nodes. Is it possible for me to make some changes so TRex will work with all 8?
1) #define RTE_MAX_NUMA_NODES 8 ==> 16
I do not need to change this because my system already has 8 NUMA nodes.
2) should be 16 and auto-identify the maximum NUMA
   for socket_id in range(2):
       filename = '/sys/devices/system/node/node%d/hugepages/hugepages-2048kB/nr_hugepages' % socket_id
Can you point me to the file where I can make this change?
Thanks,
Charlie
In dpdk_setup_ports.py, I made the following three changes:
Line #108:  for i in range(0, len(self.interfaces), 4):
Line #191:  for i in range(0, len(self.interfaces), 4):
Line #212:  for i in range(0, len(self.interfaces), 4):
But I still got the same errors. I guess I must have missed something.
Thanks,
Charlie
After downloading the be_latest version and changing the loop to range(4), NUMA 3 started to work.
-Per port stats table
ports | 0 | 1 | 2 | 3
-----------------------------------------------------------------------------------------
opackets | 0 | 0 | 0 | 0
obytes | 0 | 0 | 0 | 0
ipackets | 0 | 0 | 0 | 0
ibytes | 0 | 0 | 0 | 0
ierrors | 0 | 0 | 0 | 0
oerrors | 1263291820153344 | 1263291819281280 | 1263291857244864 | 1263289731283584
Tx Bw | 0.00 bps | 0.00 bps | 0.00 bps | 0.00 bps
The only strange thing is the "oerrors" counter, but it does not seem to affect anything.
-Per port stats table
ports | 0 | 1 | 2 | 3
-----------------------------------------------------------------------------------------
opackets | 136227845 | 136246338 | 136208024 | 136224018
obytes | 139497324500 | 139516259292 | 139477023716 | 139493404632
ipackets | 136230253 | 136243862 | 136207886 | 136224112
ibytes | 139499787232 | 139513721828 | 139476882404 | 139493498848
ierrors | 0 | 0 | 0 | 0
oerrors | 1261023989340672 | 1261023988468608 | 1261024026432192 | 1261021900470912
Tx Bw | 54.26 Gbps | 54.00 Gbps | 54.05 Gbps | 54.15 Gbps
-Global stats enabled
Cpu Utilization : 25.1 % 123.4 Gb/core
Platform_factor : 1.0
Total-Tx : 216.46 Gbps
Total-Rx : 216.46 Gbps
Total-PPS : 26.42 Mpps
Total-CPS : 0.00 cps
Expected-PPS : 0.00 pps
Expected-CPS : 0.00 cps
Expected-BPS : 0.00 bps
Active-flows : 0 Clients : 0 Socket-util : 0.0000 %
Open-flows : 0 Servers : 0 Socket : 0 Socket/Clients : -nan
Total_queue_full : 40081021
drop-rate : 0.00 bps
current time : 43.0 sec
test duration : 0.0 sec
Thanks,
Charlie