Hi,
I'm trying to figure out why starting TRex on a VM with X710 NICs in pass-through crashes the host. The same setup works perfectly fine with 100G Mellanox NICs.
Let me know if you have any tips :)
On the host:
Network devices using kernel driver
===================================
0000:3b:00.0 'Ethernet Controller X710 for 10GbE SFP+' if=p1p1 drv=i40e unused=vfio-pci
0000:3b:00.1 'Ethernet Controller X710 for 10GbE SFP+' if=p1p2 drv=i40e unused=vfio-pci
0000:3b:00.2 'Ethernet Controller X710 for 10GbE SFP+' if=p1p3 drv=i40e unused=vfio-pci
0000:3b:00.3 'Ethernet Controller X710 for 10GbE SFP+' if=p1p4 drv=i40e unused=vfio-pci
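The listing above shows the host functions still on `drv=i40e`, so the current binding may be worth double-checking at the moment the VM is running. Below is a minimal sketch that reads the standard sysfs symlinks `driver` and `iommu_group` under `/sys/bus/pci/devices/<address>/` to report which driver each X710 function is bound to and which IOMMU group it belongs to. The helper names (`pci_binding`, `_link_name`) are hypothetical, and the `0000:3b:00.x` addresses are taken from the listing above.

```python
import os
from typing import Optional


def _link_name(path: str) -> Optional[str]:
    """Return the last component of a symlink's target, or None if the link is missing."""
    try:
        return os.path.basename(os.readlink(path))
    except OSError:
        return None


def pci_binding(addr: str, sysfs: str = "/sys/bus/pci/devices") -> dict:
    """Report the bound driver and IOMMU group number of one PCI function."""
    dev = os.path.join(sysfs, addr)
    return {
        "addr": addr,
        "driver": _link_name(os.path.join(dev, "driver")),            # e.g. 'i40e' or 'vfio-pci'
        "iommu_group": _link_name(os.path.join(dev, "iommu_group")),  # group number as a string
    }


if __name__ == "__main__":
    # The four X710 functions from the host listing.
    for fn in range(4):
        print(pci_binding(f"0000:3b:00.{fn}"))
```

If the functions share an IOMMU group, all of them have to be handed to vfio-pci together, which is one thing to compare against the Mellanox setup.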
ethtool -i p1p1
driver: i40e
version: 2.9.21
firmware-version: 5.05 0x80002aab 255.65535.255
expansion-rom-version:
bus-info: 0000:3b:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
On the VM:
cat /etc/trex_cfg.yaml
### Config file generated by dpdk_setup_ports.py ###
- version: 2
  interfaces: ['00:07.0', '00:08.0']
  port_info:
      - dest_mac: 00:e0:ed:73:a0:91 # MAC OF LOOPBACK TO IT'S DUAL INTERFACE
        src_mac: 00:e0:ed:73:a0:90
      - dest_mac: 00:e0:ed:73:a0:90 # MAC OF LOOPBACK TO IT'S DUAL INTERFACE
        src_mac: 00:e0:ed:73:a0:91
  platform:
      master_thread_id: 0
      latency_thread_id: 1
      dual_if:
        - socket: 0
          threads: [2,3,4,5]
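As a sanity check on this config, here is a sketch that verifies the two ports form a loopback pair (each port's `dest_mac` is the other's `src_mac`) and derives the DPDK `-c` core mask from the `platform` section. The config is written out as the dict PyYAML would return for it. The rule inside `core_mask` (master and latency threads plus the first N entries of `threads`, where N is TRex's `-c` value) is an assumption inferred from the TRex output further down (`cores : 1`, `core_mask 7`), not taken from the TRex source. The function names are hypothetical.

```python
from typing import Dict, List

# The trex_cfg.yaml entry above, as a parsed dict.
CFG = {
    "version": 2,
    "interfaces": ["00:07.0", "00:08.0"],
    "port_info": [
        {"dest_mac": "00:e0:ed:73:a0:91", "src_mac": "00:e0:ed:73:a0:90"},
        {"dest_mac": "00:e0:ed:73:a0:90", "src_mac": "00:e0:ed:73:a0:91"},
    ],
    "platform": {
        "master_thread_id": 0,
        "latency_thread_id": 1,
        "dual_if": [{"socket": 0, "threads": [2, 3, 4, 5]}],
    },
}


def check_loopback(cfg: Dict) -> List[str]:
    """Return a list of problems in the port layout (an empty list means it looks consistent)."""
    problems = []
    ports = cfg["port_info"]
    if len(cfg["interfaces"]) != len(ports):
        problems.append("interfaces and port_info have different lengths")
    # In a loopback setup, ports 2k and 2k+1 should point at each other's MAC.
    for i in range(0, len(ports) - 1, 2):
        a, b = ports[i], ports[i + 1]
        if a["dest_mac"] != b["src_mac"] or b["dest_mac"] != a["src_mac"]:
            problems.append(f"ports {i} and {i + 1} are not a loopback pair")
    return problems


def core_mask(cfg: Dict, dp_cores_per_dual_if: int = 1) -> int:
    """Assumed rule: master + latency lcores plus the first N threads of each dual_if."""
    plat = cfg["platform"]
    lcores = {plat["master_thread_id"], plat["latency_thread_id"]}
    for dual_if in plat["dual_if"]:
        lcores.update(dual_if["threads"][:dp_cores_per_dual_if])
    mask = 0
    for lcore in lcores:
        mask |= 1 << lcore  # one bit per lcore
    return mask


print(check_loopback(CFG))     # []
print(hex(core_mask(CFG)))     # 0x7  (lcores 0, 1, 2)
print(hex(core_mask(CFG, 4)))  # 0x3f (lcores 0..5)
```

With one DP core per dual interface, only thread 2 of `[2,3,4,5]` is used, which gives binary 111 = 0x7 and matches the `-c 0x7` passed to DPDK.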
./dpdk_setup_ports.py -s
Network devices using DPDK-compatible driver
============================================
<none>
Network devices using kernel driver
===================================
0000:00:03.0 'Virtio network device' if=eth0 drv=virtio-pci unused=virtio_pci,igb_uio,vfio-pci,uio_pci_generic *Active*
0000:00:07.0 'Ethernet Controller X710 for 10GbE SFP+' if=ens7 drv=i40e unused=igb_uio,vfio-pci,uio_pci_generic
0000:00:08.0 'Ethernet Controller X710 for 10GbE SFP+' if=ens8 drv=i40e unused=igb_uio,vfio-pci,uio_pci_generic
0000:00:09.0 'Ethernet Controller X710 for 10GbE SFP+' if=ens9 drv=i40e unused=igb_uio,vfio-pci,uio_pci_generic
0000:00:0a.0 'Ethernet Controller X710 for 10GbE SFP+' if=ens10 drv=i40e unused=igb_uio,vfio-pci,uio_pci_generic
ethtool -i ens7
driver: i40e
version: 2.1.14-k
firmware-version: 5.05 0x80002aab 255.65535.255
expansion-rom-version:
bus-info: 0000:00:07.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
TRex command:
./t-rex-64 -f cap2/http_simple.yaml -v 100
Trying to bind to igb_uio ...
/bin/python dpdk_nic_bind.py --bind=igb_uio 0000:00:07.0 0000:00:08.0
The ports are bound/configured.
Starting TRex v2.57 please wait ...
Using configuration file /etc/trex_cfg.yaml
port limit : not configured
port_bandwidth_gb : 10
if_mask : None
is low-end : 0
stack type :
thread_per_dual_if : 1
if : 00:07.0, 00:08.0,
enable_zmq_pub : 1
zmq_pub_port : 4500
m_zmq_rpc_port : 4501
src : 00:e0:ed:73:a0:90
dest : 00:e0:ed:73:a0:91
src : 00:e0:ed:73:a0:91
dest : 00:e0:ed:73:a0:90
memory per 2x10G ports
MBUF_64 : 16380
MBUF_128 : 8190
MBUF_256 : 8190
MBUF_512 : 8190
MBUF_1024 : 8190
MBUF_2048 : 4095
MBUF_4096 : 128
MBUF_9K : 512
TRAFFIC_MBUF_64 : 65520
TRAFFIC_MBUF_128 : 32760
TRAFFIC_MBUF_256 : 8190
TRAFFIC_MBUF_512 : 8190
TRAFFIC_MBUF_1024 : 8190
TRAFFIC_MBUF_2048 : 32760
TRAFFIC_MBUF_4096 : 128
TRAFFIC_MBUF_9K : 512
MBUF_DP_FLOWS : 524288
MBUF_GLOBAL_FLOWS : 5120
master thread : 0
rx thread : 1
dual_if : 0
socket : 0
[ 2 3 4 5 ]
CTimerWheelYamlInfo does not exist
flags : 8010c00
write_file : 0
verbose : 4
realtime : 1
flip : 0
cores : 1
single core : 0
flow-flip : 0
no clean close : 0
zmq_publish : 1
vlan mode : 0
client_cfg : 0
mbuf_cache_disable : 0
cfg file : cap2/http_simple.yaml
mac file :
out file :
client cfg file :
duration : 3600
factor : 1
mbuf_factor : 1
latency : 0 pkt/sec
zmq_port : 4500
telnet_port : 4501
expected_ports : 2
tw_bucket_usec : 20.000000 usec
tw_buckets : 1024 usec
tw_levels : 3 usec
port : 0 dst:00:e0:ed:73:a0:91 src:00:e0:ed:73:a0:90
port : 1 dst:00:e0:ed:73:a0:90 src:00:e0:ed:73:a0:91
port : 2 dst:00:00:00:01:00:00 src:00:00:00:00:00:00
port : 3 dst:00:00:00:01:00:00 src:00:00:00:00:00:00
port : 4 dst:00:00:00:01:00:00 src:00:00:00:00:00:00
port : 5 dst:00:00:00:01:00:00 src:00:00:00:00:00:00
port : 6 dst:00:00:00:01:00:00 src:00:00:00:00:00:00
port : 7 dst:00:00:00:01:00:00 src:00:00:00:00:00:00
port : 8 dst:00:00:00:01:00:00 src:00:00:00:00:00:00
port : 9 dst:00:00:00:01:00:00 src:00:00:00:00:00:00
port : 10 dst:00:00:00:01:00:00 src:00:00:00:00:00:00
port : 11 dst:00:00:00:01:00:00 src:00:00:00:00:00:00
port : 12 dst:00:00:00:01:00:00 src:00:00:00:00:00:00
port : 13 dst:00:00:00:01:00:00 src:00:00:00:00:00:00
port : 14 dst:00:00:00:01:00:00 src:00:00:00:00:00:00
port : 15 dst:00:00:00:01:00:00 src:00:00:00:00:00:00
port : 16 dst:00:00:00:01:00:00 src:00:00:00:00:00:00
port : 17 dst:00:00:00:01:00:00 src:00:00:00:00:00:00
port : 18 dst:00:00:00:01:00:00 src:00:00:00:00:00:00
port : 19 dst:00:00:00:01:00:00 src:00:00:00:00:00:00
port : 20 dst:00:00:00:01:00:00 src:00:00:00:00:00:00
port : 21 dst:00:00:00:01:00:00 src:00:00:00:00:00:00
port : 22 dst:00:00:00:01:00:00 src:00:00:00:00:00:00
port : 23 dst:00:00:00:01:00:00 src:00:00:00:00:00:00
Total Memory :
MBUF_64 : 81900
MBUF_128 : 40950
MBUF_256 : 16380
MBUF_512 : 16380
MBUF_1024 : 16380
MBUF_2048 : 36855
MBUF_4096 : 1024
MBUF_DP_FLOWS : 524288
MBUF_GLOBAL_FLOWS : 5120
get_each_core_dp_flows : 524288
Total memory : 248.40 Mbytes
core_mask 7
sockets : 0
active sockets : 1
ports_sockets : 1
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
phy | virt
2 1
DPDK args
xx -c 0x7 -n 4 --log-level 5 --master-lcore 0 -w 0000:00:07.0 -w 0000:00:08.0 --legacy-mem
EAL: No available hugepages reported in hugepages-1048576kB
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable clock cycles !
EAL: Invalid NUMA socket, default to 0
>> At this point the host crashes.
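The `EAL: No available hugepages reported in hugepages-1048576kB` line only says that the 1 GB hugepage pool is empty. EAL can still use other pools (typically 2 MB pages), so this message is usually informational as long as another pool has free pages. Below is a small sketch to dump per-pool counts from the standard `/sys/kernel/mm/hugepages` sysfs directory inside the VM. The function name is hypothetical.

```python
import os
from typing import Dict

HUGEPAGE_SYSFS = "/sys/kernel/mm/hugepages"


def hugepage_summary(base: str = HUGEPAGE_SYSFS) -> Dict[str, Dict[str, int]]:
    """Map each pool directory (e.g. 'hugepages-2048kB') to its total and free page counts."""
    summary = {}
    for pool in sorted(os.listdir(base)):
        counts = {}
        for field in ("nr_hugepages", "free_hugepages"):
            with open(os.path.join(base, pool, field)) as f:
                counts[field] = int(f.read().strip())
        summary[pool] = counts
    return summary


if __name__ == "__main__" and os.path.isdir(HUGEPAGE_SYSFS):
    for pool, counts in hugepage_summary().items():
        print(pool, counts)
```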
Console logs on the host, leading up to and including the crash:
[ 669.089822] vfio-pci 0000:3b:00.0: Masking broken INTx support
[ 669.095717] vfio_ecap_init: 0000:3b:00.0 hiding ecap 0x19@0x1d0
[ 669.210730] vfio-pci 0000:3b:00.1: Masking broken INTx support
[ 669.324518] vfio-pci 0000:3b:00.2: Masking broken INTx support
[ 669.438246] vfio-pci 0000:3b:00.3: Masking broken INTx support
[ 670.837433] vfio-pci 0000:3b:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xdead
[ 670.849437] vfio-pci 0000:3b:00.1: Invalid PCI ROM header signature: expecting 0xaa55, got 0xdead
[ 670.860675] vfio-pci 0000:3b:00.2: Invalid PCI ROM header signature: expecting 0xaa55, got 0xdead
[ 670.871799] vfio-pci 0000:3b:00.3: Invalid PCI ROM header signature: expecting 0xaa55, got 0xdead
[ 1069.806831] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[ 1069.815075] {1}[Hardware Error]: event severity: fatal
[ 1069.820202] {1}[Hardware Error]: Error 0, type: fatal
[ 1069.825327] {1}[Hardware Error]: section_type: PCIe error
[ 1069.830887] {1}[Hardware Error]: port_type: 0, PCIe end point
[ 1069.836793] {1}[Hardware Error]: version: 3.0
[ 1069.841314] {1}[Hardware Error]: command: 0x0506, status: 0x4010
[ 1069.847479] {1}[Hardware Error]: device_id: 0000:3b:00.3
[ 1069.852952] {1}[Hardware Error]: slot: 1
[ 1069.857040] {1}[Hardware Error]: secondary_bus: 0x00
[ 1069.862169] {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x1572
[ 1069.868766] {1}[Hardware Error]: class_code: 000002
[ 1069.873806] Kernel panic - not syncing: Fatal hardware error!
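The `command: 0x0506, status: 0x4010` values in the APEI record appear to be the device's PCI Command (config offset 0x04) and Status (config offset 0x06) registers. Decoding them bit by bit with the standard PCI bit assignments (only a subset of bits is listed here) shows what function 0000:3b:00.3 was reporting:

```python
# Subset of the PCI Command and Status register bit definitions.
COMMAND_BITS = {
    0: "I/O Space",
    1: "Memory Space",
    2: "Bus Master",
    6: "Parity Error Response",
    8: "SERR# Enable",
    10: "Interrupt Disable",
}
STATUS_BITS = {
    3: "Interrupt Status",
    4: "Capabilities List",
    8: "Master Data Parity Error",
    11: "Signaled Target Abort",
    12: "Received Target Abort",
    13: "Received Master Abort",
    14: "Signaled System Error",
    15: "Detected Parity Error",
}


def decode(value: int, names: dict) -> list:
    """List the names of the set bits in a register value, lowest bit first."""
    return [name for bit, name in sorted(names.items()) if value & (1 << bit)]


print(decode(0x0506, COMMAND_BITS))
# ['Memory Space', 'Bus Master', 'SERR# Enable', 'Interrupt Disable']
print(decode(0x4010, STATUS_BITS))
# ['Capabilities List', 'Signaled System Error']
```

So the function had SERR# reporting enabled and had set Signaled System Error. That is consistent with the platform firmware escalating the event to a fatal error, which then triggered the panic.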
Console logs on the VM:
[ 365.789978] igb_uio: loading out-of-tree module taints kernel.
[ 365.790969] igb_uio: module verification failed: signature and/or required key missing - tainting kernel
[ 365.792736] igb_uio: Use MSIX interrupt by default
[ 365.904489] i40e 0000:00:07.0: i40e_ptp_stop: removed PHC on ens7
[ 366.069460] igb_uio 0000:00:07.0: uio device registered with irq 18
[ 366.070420] igb_uio 0000:00:07.0: mapping 1K dma=0x232cbf000 host=ffff988df2cbf000
[ 366.071541] igb_uio 0000:00:07.0: unmapping 1K dma=0x232cbf000 host=ffff988df2cbf000
[ 366.086736] i40e 0000:00:08.0: i40e_ptp_stop: removed PHC on ens8
[ 366.239363] igb_uio 0000:00:08.0: uio device registered with irq 19
[ 366.240322] igb_uio 0000:00:08.0: mapping 1K dma=0x35ee0000 host=ffff988bf5ee0000
[ 366.241427] igb_uio 0000:00:08.0: unmapping 1K dma=0x35ee0000 host=ffff988bf5ee0000
[ 370.142508] Bits 55-60 of /proc/PID/pagemap entries are about to stop being page-shift some time soon. See the linux/Documentation/vm/pagemap.txt for details.