Hello,
I am trying to run a TRex application (v3.02) in a privileged pod with dedicated CPUs on an OCP 4.12 cluster.
The pod is connected to 2 secondary SR-IOV 700 Series (154c) NICs bound to the vfio-pci driver.
I am attempting to create a simple traffic test using the `trex_cfg.yaml` and `testpmd.py` files (both attached). However, I am experiencing issues with oerrors and no traffic being generated. After running `start -f /opt/tests/testpmd.py -m 1mpps -p 0 -d 10` in the trex-console, I received the following stats:
```
trex>stats
Global Statistics
connection : localhost, Port 4501 total_tx_L2 : 0 bps
version : STL @ v2.87 total_tx_L1 : 0 bps
cpu_util. : 0.08% @ 6 cores (6 per dual port) total_rx : 145.86 Kbps
rx_cpu_util. : 0.0% / 0 pps total_pps : 0 pps
async_util. : 0.04% / 9.68 Kbps drop_rate : 0 bps
total_cps. : 0 cps queue_full : 0 pkts
Port Statistics
port | 0 | 1 | total
-----------+-------------------+-------------------+------------------
owner | root | root |
link | UP | UP |
state | IDLE | IDLE |
speed | 10 Gb/s | 10 Gb/s |
CPU util. | 0.0% | 0.0% |
-- | | |
Tx bps L2 | 0 bps | 0 bps | 0 bps
Tx bps L1 | 0 bps | 0 bps | 0 bps
Tx pps | 0 pps | 0 pps | 0 pps
Line Util. | 0 % | 0 % |
--- | | |
Rx bps | 73.33 Kbps | 72.53 Kbps | 145.86 Kbps
Rx pps | 113.74 pps | 112.5 pps | 226.24 pps
---- | | |
opackets | 0 | 0 | 0
ipackets | 3394 | 3395 | 6789
obytes | 0 | 0 | 0
ibytes | 272778 | 272860 | 545638
tx-pkts | 0 pkts | 0 pkts | 0 pkts
rx-pkts | 3.39 Kpkts | 3.4 Kpkts | 6.79 Kpkts
tx-bytes | 0 B | 0 B | 0 B
rx-bytes | 272.78 KB | 272.86 KB | 545.64 KB
----- | | |
oerrors | 10,000,002 | 0 | 10,000,002
ierrors | 0 | 0 | 0
```
The opackets count is 0 while oerrors is 10,000,002, so every packet TRex attempted to send was counted as an output error instead of being transmitted.
When I checked on the node's side, the packets do not show up as dropped there either, which suggests they are being dropped before they ever leave the pod.
```
# ip -s -s link show ens2f1
7: ens2f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether f8:f2:1e:b7:8d:d1 brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped missed mcast
784846071 11707845 0 0 0 7139019
RX errors: length crc frame fifo overrun
0 0 0 0 0
TX: bytes packets errors dropped carrier collsns
317430 4123 0 0 0 0
TX errors: aborted fifo window heartbt transns
0 0 0 0 4
```
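In case it helps narrow things down, these are the node-side checks around the same interface; the interface name and PCI addresses are taken from the outputs in this mail, and this is only a sketch of the commands, not additional captured output:

```shell
# PF counters on the node (same command as above)
ip -s -s link show ens2f1

# Per-VF MAC, spoof-checking and trust settings as reported by the PF
ip link show ens2f1 | grep -E '^[[:space:]]*vf '

# Confirm the VFs are still bound to vfio-pci
readlink /sys/bus/pci/devices/0000:5e:0a.5/driver
readlink /sys/bus/pci/devices/0000:5e:0a.7/driver
```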
As a result, I am unable to generate traffic.
The most perplexing part is that on roughly one out of ten retries the TRex pod does produce traffic and everything appears to work. If I repeat the same process a minute later, however, traffic fails to generate again.
I would greatly appreciate your assistance in figuring out what is happening here. I have included additional information for those who require it:
- VF data from the node: `lspci -v -nn -mm -k -s {5e:0a.5,5e:0a.7}`:
```
sh-5.1# lspci -v -nn -mm -k -s 5e:0a.5
Slot: 5e:0a.5
Class: Ethernet controller [0200]
Vendor: Intel Corporation [8086]
Device: Ethernet Virtual Function 700 Series [154c]
SVendor: Intel Corporation [8086]
SDevice: Device [0000]
Rev: 02
Driver: vfio-pci
lspci: Unable to load libkmod resources: error -13
NUMANode: 0
IOMMUGroup: 170
sh-5.1# lspci -v -nn -mm -k -s 5e:0a.7
Slot: 5e:0a.7
Class: Ethernet controller [0200]
Vendor: Intel Corporation [8086]
Device: Ethernet Virtual Function 700 Series [154c]
SVendor: Intel Corporation [8086]
SDevice: Device [0000]
Rev: 02
Driver: vfio-pci
lspci: Unable to load libkmod resources: error -13
NUMANode: 0
IOMMUGroup: 172
```
- trex_cfg.yaml attached
- testpmd.py attached
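For readers without the attachments, the `trex_cfg.yaml` follows the usual TRex layout; the sketch below is illustrative only (the PCI addresses match the VFs above, the core IDs are placeholders) and is not the attached file:

```yaml
# Illustrative sketch only -- NOT the attached trex_cfg.yaml.
- version: 2
  interfaces: ['5e:0a.5', '5e:0a.7']   # vfio-pci-bound VF PCI addresses
  port_limit: 2
  c: 6                                 # DP cores; matches "6 per dual port" in the stats
  platform:
    master_thread_id: 0
    latency_thread_id: 1
    dual_if:
      - socket: 0
        threads: [2, 3, 4, 5, 6, 7]
```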
If there is any more info you require, please tell me and I'll do my best to provide it.
Thank you in advance for your assistance.
Ram Lavi
Senior SW Engineer at Red Hat.