Intel 710 XXV SRIOV VF Bonding issue - due to i40evf DPDK driver bug


Haklat Haklat

Aug 19, 2021, 8:11:57 AM8/19/21
to TRex Traffic Generator
Hi,
I have tried the TRex DPDK bonding implementation using an Intel 710 XXV NIC with SR-IOV VFs. I believe there is a known Intel SR-IOV VF DPDK driver issue in the DPDK 21.02 version used by TRex. It is most likely the same as Intel Premier Support case 00593140. That case was reported against 20.11.0, but the bug was also present in 18.11 and 19.11 (though not 17.11), so it is most likely present in the 21.02 release as well. According to the Intel case it is supposed to be fixed in the latest DPDK 20.11.2 release.
Even if the bond can come up (with error messages) in some cases, slow convergence is to be expected in some link-break scenarios because of this driver bug.

The fix for the OP_DEL_ETHER_ADDRESS problem in DPDK LTS release 20.11.0 is simple (it should already be fixed in upstream DPDK 20.11.2), as per the IPS case comments below:

"""
The change is as follows:

In module: ../drivers/net/i40e/i40e_ethdev_vf.c

function: i40evf_set_default_mac_addr()

You can add the following lines:

static int
i40evf_set_default_mac_addr(struct rte_eth_dev *dev,
			    struct rte_ether_addr *mac_addr)
{
	struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);

	if (!rte_is_valid_assigned_ether_addr(mac_addr)) {
		PMD_DRV_LOG(ERR, "Tried to set invalid MAC address.");
		return -EINVAL;
	}

+	if (rte_is_same_ether_addr((struct rte_ether_addr *)hw->mac.addr, mac_addr))
+		return 0;

	i40evf_del_mac_addr_by_addr(dev, (struct rte_ether_addr *)hw->mac.addr);

	if (i40evf_add_mac_addr(dev, mac_addr, 0, 0) != 0)
		return -EIO;

	rte_ether_addr_copy(mac_addr, (struct rte_ether_addr *)hw->mac.addr);
	return 0;
}
"""

I used the commands below to patch testpmd 20.11.0 successfully. Note that the second sed inserts its line above the first, so the guard ends up before the return. The line numbers are most likely different in DPDK 21.02.
 
sed -i '2850i                 return 0;' /usr/src/dpdk-20.11/drivers/net/i40e/i40e_ethdev_vf.c;
sed -i '2850i         if (rte_is_same_ether_addr((struct rte_ether_addr *)hw->mac.addr, mac_addr))' /usr/src/dpdk-20.11/drivers/net/i40e/i40e_ethdev_vf.c;

Logs from starting up trex with bond configuration:

[root@trex-sriov-intel-bonding-0 v2.91]# ./t-rex-64 -i
Starting Scapy server.... Scapy server is started
The ports are bound/configured.
Starting  TRex v2.91 please wait  ...
 set driver name net_bonding
 driver capability  : TCP_UDP_OFFLOAD  TSO
 set dpdk queues mode to ONE_QUE
 Number of ports found: 2 (dummy among them: 1)
zmq publisher at: tcp://*:4500
__eth_bond_slave_add_lock_free(464) - Slave device is already a slave of a bonded device
bond_ethdev_configure(3643) - Failed to add port 0 as slave to bonded device net_bonding0
__eth_bond_slave_add_lock_free(464) - Slave device is already a slave of a bonded device
bond_ethdev_configure(3643) - Failed to add port 1 as slave to bonded device net_bonding0
i40evf_switch_queue(): fail to switch TX 0 on
i40evf_dev_tx_queue_start(): Failed to switch TX queue 0 on
i40evf_start_queues(): Fail to start queue 0
i40evf_dev_start(): enable queues failed
i40evf_add_del_all_mac_addr(): fail to execute command OP_DEL_ETHER_ADDRESS
slave_configure(1842) - rte_eth_dev_start: port=0, err (-1)
bond_ethdev_start(2015) - bonded port (2) failed to reconfigure slave device (0)
i40evf_del_mac_addr_by_addr(): fail to execute command OP_DEL_ETHER_ADDRESS
i40evf_add_mac_addr(): fail to execute command OP_ADD_ETHER_ADDRESS
mac_address_slaves_update(1524) - Failed to update port Id 0 MAC address
i40evf_handle_aq_msg(): command mismatch,expect 0, get 8
i40evf_handle_aq_msg(): command mismatch,expect 0, get 11
i40evf_handle_aq_msg(): command mismatch,expect 0, get 11
i40evf_handle_aq_msg(): command mismatch,expect 0, get 10
 wait 4 sec ....
port : 0

[root@trex-sriov-intel-bonding-0 v2.91]# cat /etc/trex_cfg.yaml
- version: 2
  interfaces    : ['--vdev=net_bonding0,mode=2,slave=0000:d8:02.4,slave=0000:d8:0a.5', 'dummy']   # list of the interfaces to bind run ./dpdk_nic_bind.py --status
  prefix        : trex1
  limit_memory  : 2048
  rx_desc: 4096
  tx_desc: 4096
  port_info     :  # set e.g. ip,gw,vlan,eth mac addr

                 - ip         : 172.80.40.12
                   default_gw : 172.80.40.1
                   vlan       : 840
  platform:
      master_thread_id: 3
      latency_thread_id: 5
      dual_if:
        - socket: 1
          threads: [43,45]
BR//Håkan

hanoh haim

Aug 19, 2021, 8:38:42 AM8/19/21
to Haklat Haklat, TRex Traffic Generator
Hi Haklat, thanks for reporting this issue. Could you create a PR?

thanks
Hanoh

Haklat Haklat

Aug 20, 2021, 2:23:46 AM8/20/21
to TRex Traffic Generator
Hi,
by PR do you mean a patch request? Is that filing a new issue in the issue tracker?

I have also contacted the colleagues who, I believe, contributed the DPDK bond driver addition to TRex;
they said they would have a look at this.

BR//Håkan

Besart Dollma

Aug 20, 2021, 7:08:13 AM8/20/21
to TRex Traffic Generator

Hi, 
He means a Pull Request in our GitHub repo.

Haklat Haklat

Aug 26, 2021, 3:22:01 AM8/26/21
to TRex Traffic Generator

hanoh haim

Aug 26, 2021, 6:48:19 AM8/26/21
to Haklat Haklat, TRex Traffic Generator
Hi, 
I've pushed the fix into master. 

Thanks
Hanoh

Haklat Haklat

Sep 16, 2021, 11:03:57 AM9/16/21
to TRex Traffic Generator
Hi, thanks for the fix.

Here is some feedback:
I have now tested it with TRex 2.92 in STL mode. It looks basically OK, but I think there are some counter issues.

In my case I can live with this for now (working around it by using the flow_stats counters instead of the drop counters), but it is maybe worth mentioning anyway.
  • At link break/repair of latency flows, I cannot trust the "pgid err_cntrs dropped" counter (not sure about the latency counters). The drop counters always seem to be far off for one of the two latency flows. I think the issue probably occurs for the flow that arrives back on a different physical port (from the one it was arriving on before the break/repair) at link down or link up. Maybe this is not surprising if pgids have to be coordinated somehow between the two NICs. I see the same issue with or without software mode (started with ./t-rex-64 -i -c 4 or ./t-rex-64 -i -c 4 --software).

I can work around the dropped-counter issue by using the simple flow_stats rx_pkts/tx_pkts counters instead. With that, convergence at link down and link up, now with the bond fix, is very similar to what I get with the native DPDK testpmd application (i.e. on par with the DPDK reference application, which is great).
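The workaround can be sketched roughly like this. The dict mirrors the (abbreviated) flow_stats structure from the printout below; `flow_loss` is a hypothetical helper for illustration, not a TRex API:

```python
def flow_loss(flow_stats, pgid):
    """Estimate packet loss for one packet-group id as tx_pkts - rx_pkts,
    instead of relying on the latency 'dropped' error counter."""
    entry = flow_stats[str(pgid)]
    return entry["tx_pkts"]["total"] - entry["rx_pkts"]["total"]

# Abbreviated flow_stats totals taken from the counter printout below.
flow_stats = {
    "1010":  {"tx_pkts": {"total": 119905}, "rx_pkts": {"total": 119863}},
    "10010": {"tx_pkts": {"total": 119905}, "rx_pkts": {"total": 119899}},
}

print(flow_loss(flow_stats, 1010))   # → 42
print(flow_loss(flow_stats, 10010))  # → 6
```

Note this only measures total unaccounted packets over the run, not where in time the loss happened, but it is stable across the link break where the dropped counter is not.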


Below is a printout of the TRex counters when breaking a link (comments mark the interesting values):

Per flow latency (microsec) statistics:

{
  "latency": {
    "10010": {
      "err_cntrs": {
        "dup": 0,
        "seq_too_high": 1,
        "dropped": 6,
        "seq_too_low": 0,
        "out_of_order": 0
      },
      "latency": {
        "last_max": 0,
        "total_max": 924,
        "average": 31.0,
        "histogram": {
          "900": 1,
          "40": 12602,
          "10": 7,
          "50": 10,
          "20": 104032,
          "60": 3192,
          "30": 55
        },
        "jitter": 7,
        "total_min": 13
      }
    },
    "global": {
      "bad_hdr": 0,
      "old_flow": 0
    },
    "1010": {
      "err_cntrs": {
        "dup": 0,
        "seq_too_high": 1,
        "dropped": 7914,
        "seq_too_low": 0,
        "out_of_order": 0
      },
      "latency": {
        "last_max": 54,
        "total_max": 989,
        "average": 26.0,
        "histogram": {
          "900": 1,
          "70": 5,
          "40": 63,
          "10": 290,
          "50": 64377,
          "20": 7923,
          "60": 8,
          "30": 47196
        },
        "jitter": 27,
        "total_min": 13
      }
    }
  },
  "flow_stats": {
    "10010": {
      "rx_bps":    {"0": 0.0, "total": 0.0},
      "rx_pps":    {"0": 0.0, "total": 0.0},
      "rx_pkts":   {"0": 119899, "total": 119899},
      "rx_bytes":  {"0": 8153132, "total": 8153132},
      "tx_pkts":   {"0": 119905, "total": 119905},
      "tx_pps":    {"0": 0.0, "total": 0.0},
      "tx_bps":    {"0": 0.0, "total": 0.0},
      "tx_bytes":  {"0": 8153540, "total": 8153540},
      "rx_bps_l1": {"0": 0.0, "total": 0.0},
      "tx_bps_l1": {"0": 0.0, "total": 0.0}
    },
    "global": {
      "rx_err": {"0": 0},
      "tx_err": {"0": 0}
    },
    "1010": {
      "rx_bps":    {"0": 0.0, "total": 0.0},
      "rx_pps":    {"0": 0.0, "total": 0.0},
      "rx_pkts":   {"0": 119863, "total": 119863},
      "rx_bytes":  {"0": 8150684, "total": 8150684},
      "tx_pkts":   {"0": 119905, "total": 119905},   #### tx-rx = 38 but drop_counters=7914 (the difference is sometimes some x*100k packets – send rate is 1000 PPS)
      "tx_pps":    {"0": 0.0, "total": 0.0},
      "tx_bps":    {"0": 0.0, "total": 0.0},
      "tx_bytes":  {"0": 8153540, "total": 8153540},
      "rx_bps_l1": {"0": 0.0, "total": 0.0},
      "tx_bps_l1": {"0": 0.0, "total": 0.0}
    }
  },
  "ver_id": {
    "10011": 17,
    "10010": 19,
    "1010": 18,
    "1011": 16
  }
}


BR//Håkan


hanoh haim

Sep 19, 2021, 8:58:42 AM9/19/21
to Haklat Haklat, TRex Traffic Generator
Hi Haklat, 
The "dropped" counter is just an estimate, based on the meta-data in the packets. In some cases those counters are not accurate, as the meta-data could be corrupted or there may be too many out-of-order packets.
Thanks
Hanoh

Haklat Haklat

Sep 22, 2021, 3:38:44 AM9/22/21
to TRex Traffic Generator
Hi Hanoch,
thanks for answering. It still looks a bit suspicious to me.

I am fine with the fact that I maybe cannot use the error counters in this case (for calculating the outage at link break/repair); I just want to understand what the error counters actually mean here.

- The statistics report shows a single seq_too_high for PGID 1010 and dropped=7914 (all other error counters are 0).    #### I interpret this as a single burst of packet loss
- tx-rx packets in the same flow = 38                            #### but the drop counter shows 7914

What could be the reason? The latency flow send rate is 1000 PPS. Could it, for example, be that lots of packets (the packets between the last packet received before the seq_too_high packet) are held back somehow and then arrive in sequence later? But I would guess that should show some seq_too_low packets as well?

Could it be that packets arrive, for example, in this order (packet numbers):
1 7916 39 40...7915 7917 7918  ### Would this give something like dropped=7914 and only a single seq_too_high but no seq_too_low? I would expect more seq_too_high and seq_too_low events

Another thing that makes me suspect something could be wrong with the counters in this bond case: when I run the same scenario with the TRex traffic generator (not directly affected by the break) monitoring testpmd with a bond link break/repair, I do not see the same behavior.


BR//Håkan

hanoh haim

Sep 22, 2021, 3:52:00 AM9/22/21
to Haklat Haklat, TRex Traffic Generator
I think it is something like this:
1, 2, 3, 1000, 4, 5, 1001, 6, 7, ...
In the case of the first 1000, the drop counter will be incremented by 1000-3; the same for 1001 (1001-5).
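One plausible model of that counting (an illustrative sketch only, not the actual TRex implementation) is that the receiver adds the whole sequence gap to "dropped" on each seq_too_high event, and later arrivals from inside the gap count as seq_too_low rather than correcting the estimate:

```python
def estimate(seqs):
    """Toy model of a receiver-side drop estimator driven by per-packet
    sequence numbers (illustrative only, not TRex's real logic)."""
    expected = 1
    stats = {"dropped": 0, "seq_too_high": 0, "seq_too_low": 0}
    for s in seqs:
        if s == expected:
            expected += 1
        elif s > expected:
            # Jump ahead: count the whole gap as dropped.
            stats["dropped"] += s - expected
            stats["seq_too_high"] += 1
            expected = s + 1
        else:
            # Late packet from inside an earlier gap.
            stats["seq_too_low"] += 1
    return stats

print(estimate([1, 2, 3, 1000, 4, 5, 1001, 6, 7]))
# → {'dropped': 996, 'seq_too_high': 1, 'seq_too_low': 4}
```

Under this particular model, after the jump to 1000 the packet 1001 arrives exactly at the new expected value, so only one seq_too_high event is recorded, though the in-gap packets would then show up as seq_too_low, which does not match the all-zero seq_too_low in Håkan's printout.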

However, you are right: I would expect more error events, not just one.

I think the bond interface has an inherent out-of-order issue, so there should be some errors.


We need to look into the counters again.

Thanks
Hanoh
