X710 in pass-through crashes the host

kit...@gmail.com

Sep 18, 2019, 3:49:59 PM
to TRex Traffic Generator
Hi,

I'm trying to figure out why starting TRex on a VM with X710 NICs in pass-through crashes the host. The same setup works perfectly fine with 100G Mellanox NICs.

Let me know if you have any tips :)

On the host
Network devices using kernel driver
===================================
0000:3b:00.0 'Ethernet Controller X710 for 10GbE SFP+' if=p1p1 drv=i40e unused=vfio-pci
0000:3b:00.1 'Ethernet Controller X710 for 10GbE SFP+' if=p1p2 drv=i40e unused=vfio-pci
0000:3b:00.2 'Ethernet Controller X710 for 10GbE SFP+' if=p1p3 drv=i40e unused=vfio-pci
0000:3b:00.3 'Ethernet Controller X710 for 10GbE SFP+' if=p1p4 drv=i40e unused=vfio-pci

ethtool -i p1p1
driver: i40e
version: 2.9.21
firmware-version: 5.05 0x80002aab 255.65535.255
expansion-rom-version:
bus-info: 0000:3b:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

On the VM:
cat /etc/trex_cfg.yaml
### Config file generated by dpdk_setup_ports.py ###

- version: 2
  interfaces: ['00:07.0', '00:08.0']
  port_info:
      - dest_mac: 00:e0:ed:73:a0:91 # MAC OF LOOPBACK TO IT'S DUAL INTERFACE
        src_mac:  00:e0:ed:73:a0:90
      - dest_mac: 00:e0:ed:73:a0:90 # MAC OF LOOPBACK TO IT'S DUAL INTERFACE
        src_mac:  00:e0:ed:73:a0:91

  platform:
      master_thread_id: 0
      latency_thread_id: 1
      dual_if:
        - socket: 0
          threads: [2,3,4,5]
./dpdk_setup_ports.py -s

Network devices using DPDK-compatible driver
============================================
<none>

Network devices using kernel driver
===================================
0000:00:03.0 'Virtio network device' if=eth0 drv=virtio-pci unused=virtio_pci,igb_uio,vfio-pci,uio_pci_generic *Active*
0000:00:07.0 'Ethernet Controller X710 for 10GbE SFP+' if=ens7 drv=i40e unused=igb_uio,vfio-pci,uio_pci_generic
0000:00:08.0 'Ethernet Controller X710 for 10GbE SFP+' if=ens8 drv=i40e unused=igb_uio,vfio-pci,uio_pci_generic
0000:00:09.0 'Ethernet Controller X710 for 10GbE SFP+' if=ens9 drv=i40e unused=igb_uio,vfio-pci,uio_pci_generic
0000:00:0a.0 'Ethernet Controller X710 for 10GbE SFP+' if=ens10 drv=i40e unused=igb_uio,vfio-pci,uio_pci_generic

ethtool -i ens7
driver: i40e
version: 2.1.14-k
firmware-version: 5.05 0x80002aab 255.65535.255
expansion-rom-version:
bus-info: 0000:00:07.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
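
The two `ethtool -i` dumps above can be compared mechanically. The sketch below uses a hypothetical helper (`parse_ethtool_i` is not part of TRex or ethtool) to parse the `key: value` lines copied from this post and list the fields that differ between host and guest. It only surfaces the difference; nothing in this thread establishes that the driver version mismatch is related to the crash.

```python
def parse_ethtool_i(text):
    """Parse `ethtool -i` output into a dict; empty values become ''."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so PCI addresses stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info

# Values copied from the host (p1p1) and guest (ens7) output in this post.
HOST_OUT = """\
driver: i40e
version: 2.9.21
firmware-version: 5.05 0x80002aab 255.65535.255
expansion-rom-version:
bus-info: 0000:3b:00.0
"""
GUEST_OUT = """\
driver: i40e
version: 2.1.14-k
firmware-version: 5.05 0x80002aab 255.65535.255
expansion-rom-version:
bus-info: 0000:00:07.0
"""

host_info = parse_ethtool_i(HOST_OUT)
guest_info = parse_ethtool_i(GUEST_OUT)

# Fields whose values are not identical on both sides.
differing = sorted(k for k in host_info if host_info[k] != guest_info.get(k))
for key in differing:
    print(f"{key}: host={host_info[key]!r} guest={guest_info.get(key)!r}")
```

The firmware version is identical on both sides, as expected for a passed-through physical function; only the kernel driver version and the PCI address differ.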

TRex command:
./t-rex-64 -f cap2/http_simple.yaml -v 100
Trying to bind to igb_uio ...
/bin/python dpdk_nic_bind.py --bind=igb_uio 0000:00:07.0 0000:00:08.0
The ports are bound/configured.
Starting  TRex v2.57 please wait  ...
Using configuration file /etc/trex_cfg.yaml
 port limit     :  not configured
 port_bandwidth_gb    :  10
 if_mask        : None
 is low-end : 0
 stack type : 
 thread_per_dual_if      : 1
 if        :  00:07.0, 00:08.0,
 enable_zmq_pub :  1
 zmq_pub_port   :  4500
 m_zmq_rpc_port    :  4501
 src     : 00:e0:ed:73:a0:90
 dest    : 00:e0:ed:73:a0:91
 src     : 00:e0:ed:73:a0:91
 dest    : 00:e0:ed:73:a0:90
 memory per 2x10G ports 
 MBUF_64                                   : 16380
 MBUF_128                                  : 8190
 MBUF_256                                  : 8190
 MBUF_512                                  : 8190
 MBUF_1024                                 : 8190
 MBUF_2048                                 : 4095
 MBUF_4096                                 : 128
 MBUF_9K                                   : 512
 TRAFFIC_MBUF_64                           : 65520
 TRAFFIC_MBUF_128                          : 32760
 TRAFFIC_MBUF_256                          : 8190
 TRAFFIC_MBUF_512                          : 8190
 TRAFFIC_MBUF_1024                         : 8190
 TRAFFIC_MBUF_2048                         : 32760
 TRAFFIC_MBUF_4096                         : 128
 TRAFFIC_MBUF_9K                           : 512
 MBUF_DP_FLOWS                             : 524288
 MBUF_GLOBAL_FLOWS                         : 5120
 master   thread  : 0 
 rx  thread  : 1 
 dual_if : 0
    socket  : 0 
   [   2   3   4   5     ] 
CTimerWheelYamlInfo does not exist 
 flags           : 8010c00
 write_file      : 0
 verbose         : 4
 realtime        : 1
 flip            : 0
 cores           : 1
 single core     : 0
 flow-flip       : 0
 no clean close  : 0
 zmq_publish     : 1
 vlan mode       : 0
 client_cfg      : 0
 mbuf_cache_disable  : 0
 cfg file        : cap2/http_simple.yaml
 mac file        : 
 out file        : 
 client cfg file : 
 duration        : 3600
 factor          : 1
 mbuf_factor     : 1
 latency         : 0 pkt/sec
 zmq_port        : 4500
 telnet_port     : 4501
 expected_ports  : 2
 tw_bucket_usec  : 20.000000 usec
 tw_buckets      : 1024 usec
 tw_levels       : 3 usec
 port : 0 dst:00:e0:ed:73:a0:91  src:00:e0:ed:73:a0:90
 port : 1 dst:00:e0:ed:73:a0:90  src:00:e0:ed:73:a0:91
 port : 2 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 3 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 4 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 5 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 6 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 7 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 8 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 9 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 10 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 11 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 12 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 13 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 14 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 15 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 16 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 17 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 18 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 19 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 20 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 21 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 22 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 23 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 Total Memory :
 MBUF_64                                   : 81900
 MBUF_128                                  : 40950
 MBUF_256                                  : 16380
 MBUF_512                                  : 16380
 MBUF_1024                                 : 16380
 MBUF_2048                                 : 36855
 MBUF_4096                                 : 1024
 MBUF_DP_FLOWS                             : 524288
 MBUF_GLOBAL_FLOWS                         : 5120
 get_each_core_dp_flows                    : 524288
 Total memory                              :     248.40 Mbytes 
 core_mask  7 
 sockets : 0 
 active sockets : 1
 ports_sockets : 1
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
 phy   |   virt  
 2      1  
DPDK args
 xx  -c  0x7  -n  4  --log-level  5  --master-lcore  0  -w  0000:00:07.0  -w  0000:00:08.0  --legacy-mem 
EAL: No available hugepages reported in hugepages-1048576kB
 EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable clock cycles !
EAL:   Invalid NUMA socket, default to 0
>> host crashes..


Console logs on the host, before and after the crash happens:
[  669.089822] vfio-pci 0000:3b:00.0: Masking broken INTx support
[  669.095717] vfio_ecap_init: 0000:3b:00.0 hiding ecap 0x19@0x1d0
[  669.210730] vfio-pci 0000:3b:00.1: Masking broken INTx support
[  669.324518] vfio-pci 0000:3b:00.2: Masking broken INTx support
[  669.438246] vfio-pci 0000:3b:00.3: Masking broken INTx support
[  670.837433] vfio-pci 0000:3b:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xdead
[  670.849437] vfio-pci 0000:3b:00.1: Invalid PCI ROM header signature: expecting 0xaa55, got 0xdead
[  670.860675] vfio-pci 0000:3b:00.2: Invalid PCI ROM header signature: expecting 0xaa55, got 0xdead
[  670.871799] vfio-pci 0000:3b:00.3: Invalid PCI ROM header signature: expecting 0xaa55, got 0xdead
[ 1069.806831] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[ 1069.815075] {1}[Hardware Error]: event severity: fatal
[ 1069.820202] {1}[Hardware Error]:  Error 0, type: fatal
[ 1069.825327] {1}[Hardware Error]:   section_type: PCIe error
[ 1069.830887] {1}[Hardware Error]:   port_type: 0, PCIe end point
[ 1069.836793] {1}[Hardware Error]:   version: 3.0
[ 1069.841314] {1}[Hardware Error]:   command: 0x0506, status: 0x4010
[ 1069.847479] {1}[Hardware Error]:   device_id: 0000:3b:00.3
[ 1069.852952] {1}[Hardware Error]:   slot: 1
[ 1069.857040] {1}[Hardware Error]:   secondary_bus: 0x00
[ 1069.862169] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x1572
[ 1069.868766] {1}[Hardware Error]:   class_code: 000002
[ 1069.873806] Kernel panic - not syncing: Fatal hardware error!
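
The `command: 0x0506, status: 0x4010` line in the hardware error record can be decoded by hand. The sketch below assumes these two fields are the device's standard PCI Command and Status registers, which is what the firmware error record's PCIe section normally carries; the `decode` helper and the bit tables are illustrative and cover only the commonly used bits.

```python
# Bit positions in the PCI configuration-space Command register.
COMMAND_BITS = {
    0: "I/O Space Enable",
    1: "Memory Space Enable",
    2: "Bus Master Enable",
    6: "Parity Error Response",
    8: "SERR# Enable",
    10: "Interrupt Disable",
}

# Bit positions in the PCI configuration-space Status register.
STATUS_BITS = {
    3: "Interrupt Status",
    4: "Capabilities List",
    8: "Master Data Parity Error",
    11: "Signaled Target Abort",
    12: "Received Target Abort",
    13: "Received Master Abort",
    14: "Signaled System Error",
    15: "Detected Parity Error",
}

def decode(value, names):
    """Return the names of all set bits listed in `names`, lowest bit first."""
    return [name for bit, name in sorted(names.items()) if value & (1 << bit)]

# Register values copied from the host console log above.
command_flags = decode(0x0506, COMMAND_BITS)
status_flags = decode(0x4010, STATUS_BITS)

print("command:", ", ".join(command_flags))
print("status: ", ", ".join(status_flags))
```

Under that assumption, the device at 0000:3b:00.3 had bus mastering and SERR# reporting enabled and had signaled a system error, which the platform firmware then reported as fatal. This narrows down what the panic message means but does not identify what triggered the error.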

Console logs on the VM:
[  365.789978] igb_uio: loading out-of-tree module taints kernel.
[  365.790969] igb_uio: module verification failed: signature and/or required key missing - tainting kernel
[  365.792736] igb_uio: Use MSIX interrupt by default
[  365.904489] i40e 0000:00:07.0: i40e_ptp_stop: removed PHC on ens7
[  366.069460] igb_uio 0000:00:07.0: uio device registered with irq 18
[  366.070420] igb_uio 0000:00:07.0: mapping 1K dma=0x232cbf000 host=ffff988df2cbf000
[  366.071541] igb_uio 0000:00:07.0: unmapping 1K dma=0x232cbf000 host=ffff988df2cbf000
[  366.086736] i40e 0000:00:08.0: i40e_ptp_stop: removed PHC on ens8
[  366.239363] igb_uio 0000:00:08.0: uio device registered with irq 19
[  366.240322] igb_uio 0000:00:08.0: mapping 1K dma=0x35ee0000 host=ffff988bf5ee0000
[  366.241427] igb_uio 0000:00:08.0: unmapping 1K dma=0x35ee0000 host=ffff988bf5ee0000
[  370.142508] Bits 55-60 of /proc/PID/pagemap entries are about to stop being page-shift some time soon. See the linux/Documentation/vm/pagemap.txt for details.


Yaroslav Brustinov

Sep 18, 2019, 5:00:34 PM
to John H, TRex Traffic Generator
Hi,

Could you share output of command below from within host and VM:
cat /proc/cmdline

Thanks,
Yaroslav.

--
You received this message because you are subscribed to the Google Groups "TRex Traffic Generator" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trex-tgn+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/trex-tgn/2bad1899-7658-49f6-aedc-c558d3dcc779%40googlegroups.com.

kit...@gmail.com

Sep 18, 2019, 5:20:13 PM
to TRex Traffic Generator
Host:
BOOT_IMAGE=/vmlinuz-3.10.0-862.el7.x86_64 root=/dev/mapper/VolGrp-Vol1 ro crashkernel=auto rd.lvm.lv=VolGrp/Vol1 rd.lvm.lv=VolGrp/Vol0 console=ttyS0,115200n8 isolcpus=0-39 rcu_nocbs=0-39 nohz_full=0-39 intel_iommu=on default_hugepagesz=1G hugepagesz=1G hugepages=64 hugepagesz=2M hugepages=2048

VM:
BOOT_IMAGE=/vmlinuz-3.10.0-862.el7.x86_64 root=/dev/mapper/VolGrp-Vol1 ro crashkernel=auto rd.lvm.lv=VolGrp/Vol1 rd.lvm.lv=VolGrp/Vol2 console=ttyS0,115200n8 LANG=en_US.UTF-8
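
To make the two command lines easier to compare, here is a minimal sketch that splits each one into parameters. `parse_cmdline` is a hypothetical helper, and the choice of which parameters to print (IOMMU and hugepage settings) is an assumption about what is relevant to pass-through and DPDK, not something confirmed in this thread.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {key: [values]}.

    Bare flags such as `ro` map to [True]; repeated keys keep every
    value in the order given.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params.setdefault(key, []).append(value if sep else True)
    return params

# Command lines copied from the host and the VM in this post.
host = parse_cmdline(
    "BOOT_IMAGE=/vmlinuz-3.10.0-862.el7.x86_64 root=/dev/mapper/VolGrp-Vol1 ro "
    "crashkernel=auto rd.lvm.lv=VolGrp/Vol1 rd.lvm.lv=VolGrp/Vol0 "
    "console=ttyS0,115200n8 isolcpus=0-39 rcu_nocbs=0-39 nohz_full=0-39 "
    "intel_iommu=on default_hugepagesz=1G hugepagesz=1G hugepages=64 "
    "hugepagesz=2M hugepages=2048"
)
guest = parse_cmdline(
    "BOOT_IMAGE=/vmlinuz-3.10.0-862.el7.x86_64 root=/dev/mapper/VolGrp-Vol1 ro "
    "crashkernel=auto rd.lvm.lv=VolGrp/Vol1 rd.lvm.lv=VolGrp/Vol2 "
    "console=ttyS0,115200n8 LANG=en_US.UTF-8"
)

# Parameters of interest (an assumption, see the note above).
for key in ("intel_iommu", "iommu", "hugepagesz", "hugepages"):
    print(f"{key:12} host={host.get(key)} guest={guest.get(key)}")
```

On these inputs the host has `intel_iommu=on` with 1G and 2M hugepages reserved, while the VM command line sets no hugepage parameters at all, which matches the `No available hugepages reported in hugepages-1048576kB` message in the TRex output.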

Yaroslav Brustinov

Sep 18, 2019, 5:43:09 PM
to John H, TRex Traffic Generator
Looks like something similar here:

Is it qemu-kvm?
If so, what is the version? Could you try updating to latest?


kit...@gmail.com

Sep 20, 2019, 4:00:14 PM
to TRex Traffic Generator
Originally I had 1.5.3 but I updated to 2.12.0 (qemu-kvm-ev from repo centos-qemu-ev), and I'm seeing the same crash.

I've also tried CentOS 7.6 on the VM. I previously had 7.5 but no luck :(.

At this point I might try Ubuntu, if there are no other options.

kit...@gmail.com

Sep 25, 2019, 4:52:53 PM
to TRex Traffic Generator
Ubuntu didn't do the trick either, but Mellanox 10G works so we're just going to switch to those.

Thanks!