T-Rex in AWS failing to bring up on ENA Adapter


Mike Alering

Aug 16, 2023, 12:06:22 PM
to TRex Traffic Generator
Hello,

I'm running into an issue getting T-Rex to come up on any AWS instance (regardless of size or flavor of Linux). I have tried CentOS, Red Hat, Debian, and Ubuntu and run into similar issues on each, and I am wondering what I am doing wrong here, as this works great on my bare-metal hosts. Any assistance would be greatly appreciated, as I have been pulling my hair out over this for the last few days.

What *appears* to be happening is that the instance wants to use the ENA driver, but DPDK binds the adapter to the 'uio_pci_generic' driver. The instance then tries to reach into the adapter using the ENA driver, which times out because the port is bound to the generic driver (this is the best I can come up with from the errors presented). I was hoping to get some guidance on how to move this forward, as our team is attempting to use T-Rex for DUT tests within AWS, with our product as the DUT, in a client/server model with a single dummy interface on each side to start.

Setup is as follows (a rough sketch of the exact commands follows the list):

1) I git clone the repo into /tmp
2) ./b configure
3) ./b build
4) Then I try to run either interactive mode or just send traffic out the interface using cap2/dns.yaml
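
For reference, the commands I run look roughly like this (the clone URL and directory names are from memory, so treat this as a sketch rather than a verified recipe):

cd /tmp
git clone https://github.com/cisco-system-traffic-generator/trex-core.git
cd trex-core/linux_dpdk
./b configure                                    # configure the native build
./b build                                        # builds t-rex-64 and the helper scripts
cd ../scripts
sudo ./t-rex-64 -f cap2/dns.yaml -m 10 -d 20     # stateful run with the sample DNS profile
# or, for interactive ASTF server-only mode:
sudo ./t-rex-64 -i --astf --astf-server-only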



The error that has been plaguing me across all the distros is below (I've disabled LRO since I was getting hit with an error message about it):

[ec2-user@ip-172-31-12-184 scripts]$ sudo ./t-rex-64 -i --astf --astf-server-only --lro-disable
The ports are bound/configured.
Starting  TRex v3.03 please wait  ...
EAL: Error reading from file descriptor 20: Input/output error
 set driver name net_ena
 driver capability  : TCP_UDP_OFFLOAD  SLRO
Warning SLRO is supported and asked to be disabled by user
 set dpdk queues mode to ONE_QUE
 Number of ports found: 2 (dummy among them: 1)
Warning LRO is supported and asked to be disabled by user
zmq publisher at: tcp://*:4500
[ENA_COM: ena_com_wait_and_process_admin_cq_interrupts]Timeout waiting for comp_ctx->wait_event
[ENA_COM: ena_com_wait_and_process_admin_cq_interrupts]The ena device sent a completion but the driver didn't receive a MSI-X interrupt (cmd 8), autopolling mode is OFF
[ENA_COM: ena_com_get_feature_ex]Failed to submit get_feature command 26 error: -62
ena_configure_aenq(): Cannot configure AENQ groups, rc=-62
Port0 dev_configure = -62
EAL: Error - exiting with code: 1
  Cause: Cannot configure device: err=-62, port=0

________________________________________Output of dpdk_setup_ports.py____________________________________

[root@ip-172-31-12-184 scripts]# ./dpdk_setup_ports.py -s

Network devices using DPDK-compatible driver
============================================
0000:00:06.0 'Elastic Network Adapter (ENA)' drv=uio_pci_generic unused=ena,igb_uio,vfio-pci

Network devices using kernel driver
===================================
0000:00:05.0 'Elastic Network Adapter (ENA)' if=eth0 drv=ena unused=igb_uio,vfio-pci,uio_pci_generic *Active*

Other network devices
=====================
<none>


______________________________________Output of t-rex-64-debug with verbose output______________________________________

[root@ip-172-31-12-184 scripts]# ./t-rex-64-debug -i --astf --astf-server-only --lro-disable -c 6 -v 7
The ports are bound/configured.
Starting  TRex v3.03 please wait  ...
Using configuration file /etc/trex_cfg.yaml
 port limit     :  not configured
 port_bandwidth_gb    :  10
 port_speed           :  0
 port_mtu             :  0
 if_mask        : None
 is low-end : 0
 stack type :  
 thread_per_dual_if      : 1
 if        :  00:06.0, dummy,
 enable_zmq_pub :  1
 zmq_pub_port   :  4500
 m_zmq_rpc_port    :  4501
 src     : 00:00:00:00:00:00
 dest    : 00:00:00:00:00:00
 memory per 2x10G ports  
 MBUF_64                                   : 16380
 MBUF_128                                  : 8190
 MBUF_256                                  : 8190
 MBUF_512                                  : 8190
 MBUF_1024                                 : 8190
 MBUF_2048                                 : 4095
 MBUF_4096                                 : 128
 MBUF_9K                                   : 512
 TRAFFIC_MBUF_64                           : 65520
 TRAFFIC_MBUF_128                          : 32760
 TRAFFIC_MBUF_256                          : 8190
 TRAFFIC_MBUF_512                          : 8190
 TRAFFIC_MBUF_1024                         : 8190
 TRAFFIC_MBUF_2048                         : 32760
 TRAFFIC_MBUF_4096                         : 128
 TRAFFIC_MBUF_9K                           : 512
 MBUF_DP_FLOWS                             : 524288
 MBUF_GLOBAL_FLOWS                         : 5120
 master   thread  : 0  
 rx  thread  : 1  
 dual_if : 0
    socket  : 0  
   [   2   3   4   5   6   7   8   9   10   11   12   13   14   15     ]  
CTimerWheelYamlInfo does not exist  
 flags           : 8060f00
 write_file      : 0
 verbose         : 7
 realtime        : 1
 flip            : 0
 cores           : 6
 single core     : 0
 flow-flip       : 0
 no clean close  : 0
 zmq_publish     : 1
 vlan mode       : 0
 client_cfg      : 0
 mbuf_cache_disable  : 0
 cfg file        :  
 mac file        :  
 out file        :  
 client cfg file :  
 duration        : 0
 factor          : 1
 mbuf_factor     : 1
 latency         : 0 pkt/sec
 zmq_port        : 4500
 telnet_port     : 4501
 expected_ports  : 2
 tw_bucket_usec  : 20.000000 usec
 tw_buckets      : 1024 usec
 tw_levels       : 3 usec
 port : 0 dst:00:00:00:00:00:00  src:00:00:00:00:00:00
 port : 1 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 2 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 3 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 4 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 5 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 6 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 7 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 8 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 9 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 10 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 11 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 12 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 13 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 14 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 15 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 16 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 17 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 18 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 19 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 20 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 21 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 22 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 23 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 24 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 25 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 26 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 27 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 28 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 29 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 30 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 31 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 Total Memory :
 MBUF_64                                   : 81900
 MBUF_128                                  : 40950
 MBUF_256                                  : 16380
 MBUF_512                                  : 16380
 MBUF_1024                                 : 16380
 MBUF_2048                                 : 36855
 MBUF_4096                                 : 6144
 MBUF_DP_FLOWS                             : 524288
 MBUF_GLOBAL_FLOWS                         : 5120
 get_each_core_dp_flows                    : 87381
 Total memory                              :     269.34 Mbytes  
 core_list : 0,1,2,3,4,5,6,7
 sockets : 0  
 active sockets : 1
 ports_sockets : 1
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
 phy   |   virt  
 2      1  
 3      2  
 4      3  
 5      4  
 6      5  
 7      6  
DPDK args
 xx  -l  0,1,2,3,4,5,6,7  -n  4  --log-level  8  --main-lcore  0  -a  0000:00:06.0  --legacy-mem  
EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 1
EAL: Static memory layout is selected, amount of reserved memory can be adjusted with -m or --socket-mem
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: Probe PCI driver: net_ena (1d0f:ec20) device: 0000:00:06.0 (socket 0)
EAL: Error reading from file descriptor 40: Input/output error
TELEMETRY: No legacy callbacks, legacy socket not created
                 input : [00:06.0, dummy]
                 dpdk : [0000:00:06.0]
             pci_scan : [0000:00:06.0]
                  map : [ 0, 255]
 TRex port mapping
 -----------------
 TRex vport: 0 dpdk_rte_eth: 0
 TRex vport: 1 dpdk_rte_eth: 255
 set driver name net_ena
 driver capability  : TCP_UDP_OFFLOAD  SLRO
Warning SLRO is supported and asked to be disabled by user
 set dpdk queues mode to MULTI_QUE
 DPDK devices 1 : 1
-----
 0 : vdev 0000:00:06.0
-----
 Number of ports found: 2 (dummy among them: 1)


if_index : 0
driver name : net_ena
min_rx_bufsize : 64
max_rx_pktlen  : 9234
max_rx_queues  : 8
max_tx_queues  : 8
max_mac_addrs  : 1
rx_offload_capa : 0x200e
tx_offload_capa : 0x800e
rss reta_size   : 128
flow_type_rss   : 0xc30
tx_desc_max     : 1024
tx_desc_min     : 128
rx_desc_max     : 16384
rx_desc_min     : 128
Warning LRO is supported and asked to be disabled by user
zmq publisher at: tcp://*:4500
 rx_data_q_num : 0
 rx_drop_q_num : 0
 rx_dp_q_num   : 6
 rx_que_total : 6
 --  
 rx_desc_num_data_q   : 512
 rx_desc_num_drop_q   : 4096
 rx_desc_num_dp_q     : 512
 total_desc           : 3072
 --  
 tx_desc_num     : 1024
port 0 desc: Elastic Network Adapter (ENA)
[ENA_COM: ena_com_wait_and_process_admin_cq_interrupts]Timeout waiting for comp_ctx->wait_event
[ENA_COM: ena_com_wait_and_process_admin_cq_interrupts]The ena device sent a completion but the driver didn't receive a MSI-X interrupt (cmd 8), autopolling mode is OFF
[ENA_COM: ena_com_get_feature_ex]Failed to submit get_feature command 26 error: -62
ena_configure_aenq(): Cannot configure AENQ groups, rc=-62
Port0 dev_configure = -62
EAL: Error - exiting with code: 1
  Cause: Cannot configure device: err=-62, port=0


________________________________________The T-Rex configuration file I am trying to use____________________________________


 [root@ip-172-31-12-184 scripts]# cat /etc/trex_cfg.yaml
### Config file generated by dpdk_setup_ports.py ###

- version: 2
  interfaces: ['00:06.0', 'dummy']
  port_info:
      - ip: 172.31.7.153
        default_gw: 172.31.0.1

  platform:
      master_thread_id: 0
      latency_thread_id: 1
      dual_if:
        - socket: 0
          threads: [2,3,4,5,6,7,8,9,10,11,12,13,14,15]
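
(For completeness: the config above came out of the setup script itself. From memory I created it roughly like the following, and then kept just the single DPDK port plus the 'dummy' second interface, so treat this as a sketch:)

sudo ./dpdk_setup_ports.py -i     # interactive config creation; writes /etc/trex_cfg.yaml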

Mike Alering

Aug 16, 2023, 4:44:50 PM
to TRex Traffic Generator
So I have made a little progress on this today.
I did an 'rmmod uio_pci_generic' to unbind the NIC from that driver, and I can see from the output of dpdk_setup_ports.py that it switches back to the ENA driver. However, whenever I launch TRex, DPDK forces the NIC back onto the generic driver while TRex still tries to use the ENA driver (see below). Is this a bug in TRex, or possibly DPDK? Or am I just missing something? Another item here is that if I use dpdk_nic_bind.py to force the NIC onto the ENA driver, that holds until I launch T-Rex, which then pushes the NIC onto the generic driver again.
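
For reference, the rebinding sequence I describe above looks roughly like this (the PCI address is taken from the dpdk_setup_ports.py output in my first post):

rmmod uio_pci_generic                          # release the DUT NIC from the generic UIO driver
./dpdk_nic_bind.py --bind=ena 0000:00:06.0     # force it back onto the kernel ena driver
./dpdk_setup_ports.py -s                       # port now shows under the kernel-driver section with drv=ena
./t-rex-64 -f cap2/dns.yaml -m 10 -d 20        # ...and launching TRex re-binds it to uio_pci_generic again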

Thanks in advance for any guidance you can provide.


[root@ip-172-31-12-159 scripts]# lsmod
Module                  Size  Used by
uio_pci_generic        16384  0
uio                    28672  1 uio_pci_generic
tls                   131072  0
rfkill                 36864  1
intel_rapl_msr         20480  0
intel_rapl_common      32768  1 intel_rapl_msr
isst_if_common         20480  0
vfat                   20480  1
fat                    86016  1 vfat
nfit                   77824  0
libnvdimm             225280  1 nfit
rapl                   24576  0
ppdev                  24576  0
parport_pc             49152  0
pcspkr                 16384  0
i2c_piix4              28672  0
parport                77824  2 parport_pc,ppdev
drm                   581632  0
fuse                  176128  1
xfs                  2048000  2
libcrc32c              16384  1 xfs
nvme                   57344  3
crct10dif_pclmul       16384  1
nvme_core             180224  4 nvme
crc32_pclmul           16384  0
crc32c_intel           24576  1
nvme_common            24576  1 nvme_core
ena                   126976  0
ghash_clmulni_intel    16384  0
t10_pi                 16384  1 nvme_core
serio_raw              20480  0
dm_mirror              28672  0
dm_region_hash         24576  1 dm_mirror
dm_log                 24576  2 dm_region_hash,dm_mirror
dm_mod                204800  2 dm_log,dm_mirror

[root@ip-172-31-12-159 scripts]# rmmod uio_pci_generic

[root@ip-172-31-12-159 scripts]# ./t-rex-64 -f cap2/dns.yaml -m 10 -d 20
Trying to bind to vfio-pci ...
Trying to bind to try_bind_to_uio_pci_generic ...
/bin/python3 dpdk_nic_bind.py --bind=uio_pci_generic 0000:00:06.0

The ports are bound/configured.
Starting  TRex v3.03 please wait  ...
EAL: Error reading from file descriptor 20: Input/output error
 set driver name net_ena

 driver capability  : TCP_UDP_OFFLOAD  SLRO
 set dpdk queues mode to ONE_QUE
 Number of ports found: 2 (dummy among them: 1)
zmq publisher at: tcp://*:4500
[ENA_COM: ena_com_wait_and_process_admin_cq_interrupts]Timeout waiting for comp_ctx->wait_event
[ENA_COM: ena_com_wait_and_process_admin_cq_interrupts]The ena device sent a completion but the driver didn't receive a MSI-X interrupt (cmd 8), autopolling mode is OFF
[ENA_COM: ena_com_get_feature_ex]Failed to submit get_feature command 26 error: -62
ena_configure_aenq(): Cannot configure AENQ groups, rc=-62

Port0 dev_configure = -62
EAL: Error - exiting with code: 1
  Cause: Cannot configure device: err=-62, port=0

Mike Alering

Aug 17, 2023, 1:13:11 PM
to TRex Traffic Generator

OK, I think I have this figured out, for anyone else who runs into this issue and/or finds this thread.

It looks like the issue is that T-Rex forces the NIC onto the 'uio_pci_generic' driver, while the active NIC (where your SSH session terminates) continues to use the ENA driver. What APPEARS to be happening is that TRex sees that driver in use and, instead of using the generic driver it just set, tries to access the NIC via the ENA driver; since the port has been bound to the generic driver, the ENA driver times out and TRex fails to start.

The fix for this is quite simple: don't use an instance type that uses the ENA adapter (c5, etc.).

I was able to load TRex on a c4.2xlarge instance using the same config and process as above, and it started up just fine. It looks like the c4 instances use the Intel 82599 virtual function driver instead.

[ec2-user@ip-172-31-10-151 ~]$ lspci | grep Ethernet
00:03.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
00:04.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
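
If it helps anyone, one way to check up front whether an instance type exposes ENA at all (I have not scripted this myself, so treat the query as a sketch of the AWS CLI call):

aws ec2 describe-instance-types --instance-types c4.2xlarge c5n.2xlarge \
    --query 'InstanceTypes[].[InstanceType,NetworkInfo.EnaSupport]' --output table
# c4.2xlarge should report 'unsupported' (it uses the Intel 82599 VF instead),
# while c5n.2xlarge reports 'required' (ENA only).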


Hope this helps someone. 

Mike Alering

Aug 21, 2023, 7:02:02 PM
to TRex Traffic Generator
I was able to touch this again today. For whatever reason, it appears that TRex is unable to resolve ARP for the AWS subnet gateway. See below: I'm sending out the ARP request, and I can see the reply come in from the gateway, but TRex just isn't catching it. The other item here is that the reply comes back 802.1Q-tagged, but if I add my own 802.1Q tag to the initial ARP request, I don't get a response at all. Any help with this would be VERY appreciated.

trex>arp

Resolving destination on port(s) [0]:                        [FAILED]


arp - Could not resolve following ports: [0]

trex>


#2 Port: 0 ──▶ TX


    Type: ARP, Size: 42 B, TS: 5.45 [sec]


   ###[ Ethernet ]###
      dst       = ff:ff:ff:ff:ff:ff
      src       = 0a:9e:9e:41:c2:75
      type      = ARP
    ###[ ARP ]###
         hwtype    = 0x1
         ptype     = IPv4
         hwlen     = 6
         plen      = 4
         op        = who-has
         hwsrc     = 0a:9e:9e:41:c2:75
         psrc      = 172.31.7.153
         hwdst     = 00:00:00:00:00:00
         pdst      = 172.31.0.1

#3 Port: 0 ◀── RX

    Type: ARP, Size: 60 B, TS: 5.45 [sec]


   ###[ Ethernet ]###
      dst       = 0a:9e:9e:41:c2:75
      src       = 0a:ca:6e:d2:4a:1f
      type      = n_802_1Q
    ###[ 802.1Q ]###
         prio      = 0
         id        = 0
         vlan      = 2080
         type      = ARP
    ###[ ARP ]###
            hwtype    = 0x1
            ptype     = IPv4
            hwlen     = 6
            plen      = 4
            op        = is-at
            hwsrc     = 0a:ca:6e:d2:4a:1f
            psrc      = 172.31.0.1
            hwdst     = 0a:9e:9e:41:c2:75
            pdst      = 172.31.7.153
    ###[ Padding ]###
               load      = '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'



It's looking more and more like TRex is no longer supported on AWS. Can we please get some confirmation on this? On the instance using the ENA driver, I'm using a c5n.2xlarge instance type, as recommended, along with RHEL. Any help, or confirmation that TRex does NOT work in AWS, would be highly appreciated.

