Error Launching Trex on a 4 NIC, 6 Core ESXi VM


Mark Wittling

May 16, 2022, 2:59:36 PM
to TRex Traffic Generator
I am trying to use more cores on my VM so I can scale traffic, but I am getting the following error when configuring TRex to do so. Hopefully someone can help me.

First, the invocation of Trex and the error message I am getting:
# ./t-rex-64 -i --astf --software --cfg /etc/trex_cfg.yaml
The ports are bound/configured.
Starting  TRex v2.97 please wait  ...
ERROR: Maximum threads in platform section of config file is 1, unable to run with -c 4.

Before I get into the config file, it probably makes sense to describe the VM itself. The VM is specified as follows:
Cores: 6
Cores per Socket: 1
Memory: 8G
Adaptors: 4 x VMXNET3 (TRex binds these to the igb_uio driver when you launch)
Hypervisor: 24 core x hyperthreading = 48 cores, 512G RAM

The Trex Config File is:
### Config file generated by dpdk_setup_ports.py ###

- version: 2
  interfaces: ['0b:00.0', '13:00.0', '1b:00.0', '04:00.0']
  port_limit: 4
  c: 4
  port_bandwidth_gb: 10
  port_info:
      - dest_mac: 00:50:56:8a:16:de
        src_mac:  00:50:56:8a:af:03
        ip: 192.168.2.2
        default_gw: 192.168.2.4

      - dest_mac: 00:50:56:8a:19:e8
        src_mac:  00:50:56:8a:03:c7
        ip: 192.168.3.2
        default_gw: 192.168.3.4

      - dest_mac: 00:50:56:8a:41:87
        src_mac:  00:50:56:8a:32:68
        ip: 192.168.2.3
        default_gw: 192.168.2.5

      - dest_mac: 00:50:56:8a:38:16
        src_mac:  00:50:56:8a:c3:0d
        ip: 192.168.3.3
        default_gw: 192.168.3.5

  platform:
      master_thread_id: 0
      latency_thread_id: 5
      dual_if:
        - socket: 0
          threads: [1]

        - socket: 1
          threads: [2]

        - socket: 2
          threads: [3]

        - socket: 3
          threads: [4]

Any clue on how to make this work?
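For what it's worth, the error itself seems consistent with how the platform section is interpreted: `c` is the number of cores dedicated to each dual-port pair, and every `dual_if` entry above lists only one thread. A rough Python sketch of that consistency check (a hypothetical `validate_platform` helper, not TRex's actual code):

```python
# Hypothetical sketch of the check TRex appears to apply: each dual_if
# entry must list at least `c` threads, since `c` cores are assigned to
# every port pair.

def validate_platform(c, dual_if):
    """Return an error string mimicking TRex's message, or None if the config is OK."""
    max_threads = max(len(entry["threads"]) for entry in dual_if)
    if max_threads < c:
        return ("Maximum threads in platform section of config file is %d, "
                "unable to run with -c %d." % (max_threads, c))
    return None

# The platform section from the failing config: one thread per dual_if entry.
dual_if = [{"socket": s, "threads": [t]} for s, t in zip(range(4), range(1, 5))]
print(validate_platform(4, dual_if))
```

Under this reading, `c: 4` would require each `dual_if` entry to carry four thread IDs, which the 6-core VM cannot supply across four port pairs.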

hanoh haim

May 17, 2022, 4:24:32 AM
to Mark Wittling, TRex Traffic Generator
Try to use virtual interface that support RSS.

Mark Wittling

May 17, 2022, 12:10:42 PM
to TRex Traffic Generator
Thank you. Let me look into that.

I am using VMXNET3 adaptors on these VMs, but because I am running a CentOS 7 VM, and the VMXNET3 driver gets swapped out for a DPDK driver (I believe it is the igb_uio.ko kernel module), I was confused as to whether I am truly using VMXNET3. VMware (ESXi/vSphere) still sees the adaptor type as VMXNET3, of course, so from the hypervisor's perspective, in terms of adaptor emulation, it is still VMXNET3.

But I am not familiar with Receive Side Scaling. Let me dig in and see how to check and/or configure it.

Mark Wittling

May 17, 2022, 12:29:39 PM
to TRex Traffic Generator
Okay, having looked into Receive Side Scaling, I see that it does in fact appear to be supported. From a Linux VM perspective, the kernel version is 3.x, and if I run an ethtool check on an adaptor that the Linux OS can see, I see the following:

# ethtool -i eth0
driver: vmxnet3
version: 1.4.17.0-k-NAPI
firmware-version:
expansion-rom-version:
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no

But I have 5 adaptors on this VM. The first one is used to access the VM, so its driver is vmxnet3. For the adaptors TRex uses, though, TRex takes them out of vmxnet3 and binds them to igb_uio.ko (the DPDK kernel module) instead. So I am confused as to how or why I need to set RSS on the VM adaptors that TRex is using.

Mark Wittling

May 17, 2022, 12:44:13 PM
to TRex Traffic Generator
Hanoch, I think you may be assuming that I am running TRex against a DUT VM that uses regular VMXNET3 adaptors, in which case I can see that RSS settings would matter. But I am running tests between two TRex VMs, each sending data to the other simultaneously through the console. In that circumstance, is there some way of setting RSS in TRex itself? It is using the Linux DPDK kernel drivers, which override the vmxnet3 driver and take the adaptors away from the Linux network stack.

hanoh haim

May 18, 2022, 6:55:37 AM
to Mark Wittling, TRex Traffic Generator
The DPDK driver for VMXNET3 probably does not support RSS, and this is the reason you are limited to 1 core per dual port.
The Linux kernel is not relevant, as it is not used.

Thanks
Hanoh
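To illustrate Hanoh's point: RSS lets the NIC hash each packet's flow tuple to choose an RX queue, so several cores can poll one port in parallel; without RSS there is a single queue, hence one receive core per port pair. A toy Python sketch of the idea (using CRC32 as a stand-in for the Toeplitz hash real NICs use):

```python
import zlib

def rss_queue(src_ip, dst_ip, src_port, dst_port, n_queues):
    """Hash the flow tuple to pick an RX queue; one flow always maps to one queue."""
    key = f"{src_ip},{dst_ip},{src_port},{dst_port}".encode()
    return zlib.crc32(key) % n_queues

# With a single queue (in effect no RSS, like the ONE_QUE mode seen in the
# TRex startup log), every flow lands on queue 0, so only one core per port
# can receive.
flows = [("192.168.2.2", "192.168.2.4", p, 80) for p in range(1000, 1016)]
print({rss_queue(*f, n_queues=1) for f in flows})   # {0}
print({rss_queue(*f, n_queues=4) for f in flows})   # flows spread across queues
```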

Mark Wittling

May 19, 2022, 10:39:03 AM
to TRex Traffic Generator
You mean the igb_uio driver? It gets loaded when you start TRex; if it doesn't support RSS, is there an alternative driver that does?

# ./dpdk_nic_bind.py -s
Network devices using DPDK-compatible driver
============================================
0000:04:00.0 'VMXNET3 Ethernet Controller' drv=igb_uio unused=vmxnet3
0000:0b:00.0 'VMXNET3 Ethernet Controller' drv=igb_uio unused=vmxnet3
0000:13:00.0 'VMXNET3 Ethernet Controller' drv=igb_uio unused=vmxnet3
0000:1b:00.0 'VMXNET3 Ethernet Controller' drv=igb_uio unused=vmxnet3

Network devices using kernel driver
===================================
0000:03:00.0 'VMXNET3 Ethernet Controller' if=eth0 drv=vmxnet3 unused=igb_uio *Active*

Other network devices
=====================
<none>

Mark Wittling

May 19, 2022, 11:23:38 AM
to TRex Traffic Generator
I noticed this when I start:
# ./t-rex-64 -i --astf

The ports are bound/configured.
Starting  TRex v2.97 please wait  ...
 set driver name net_vmxnet3
 driver capability  : TSO  LRO
 set dpdk queues mode to ONE_QUE
 Number of ports found: 4
zmq publisher at: tcp://*:4500

Mark Wittling

May 19, 2022, 11:34:13 AM
to TRex Traffic Generator
When I change the number of sockets used in the trex_cfg.yaml file, I get a hugepages error (even though there are plenty):
Original:
  platform:
      master_thread_id: 0
      latency_thread_id: 1
      dual_if:
        - socket: 0 --> there are 4 vCPUs, each on own socket (4 sockets). we choose socket 0 here!
          threads: [2,3]

        - socket: 0 --> there are 4 vCPUs, each on own socket (4 sockets). we choose socket 0 here!
          threads: [4,5]

This config works...Trex comes up.

Changed:
  platform:
      master_thread_id: 0
      latency_thread_id: 1
      dual_if:
        - socket: 0 --> there are 4 vCPUs, each on own socket (4 sockets). we choose socket 0 here!
          threads: [2]

        - socket: 1 --> there are 4 vCPUs, each on own socket (4 sockets). we choose socket 1 here!
          threads: [3]

        - socket: 2 --> there are 4 vCPUs, each on own socket (4 sockets). we choose socket 2 here!
          threads: [4]

        - socket: 3 --> there are 4 vCPUs, each on own socket (4 sockets). we choose socket 3 here!
          threads: [5]

We get an error re: hugepages here and Trex fails.

# ./t-rex-64 -i --astf
The ports are bound/configured.
Starting  TRex v2.97 please wait  ...
 set driver name net_vmxnet3
 driver capability  : TSO  LRO
 set dpdk queues mode to ONE_QUE
 Number of ports found: 4
zmq publisher at: tcp://*:4500
 ERROR there is not enough huge-pages memory in your system
EAL: Error - exiting with code: 1
  Cause: Cannot init mbuf pool small-pkt-const

We have plenty of memory, and plenty of HugePages.

# free -m
              total        used        free      shared  buff/cache   available
Mem:          15884        8945        6763          25         175        6665
Swap:           511           0         511

# sysctl -a | grep vm.nr_hugepages
sysctl: reading key "net.ipv6.conf.all.stable_secret"
sysctl: reading key "net.ipv6.conf.default.stable_secret"
sysctl: reading key "net.ipv6.conf.eth0.stable_secret"
sysctl: reading key "net.ipv6.conf.lo.stable_secret"
vm.nr_hugepages = 4096
vm.nr_hugepages_mempolicy = 4096
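One hedged guess about this: `vm.nr_hugepages` is a global count, but DPDK allocates each mempool on a specific NUMA socket, so what matters is the per-node availability. On a VM presented as four single-core sockets, all the pages may sit on node 0, leaving nothing for mempools requested on sockets 1-3. A small Python sketch that reads the standard kernel sysfs per-node counters:

```python
import glob
import os

def hugepages_per_node(root="/sys/devices/system/node", size_kb=2048):
    """Return {node_name: nr_hugepages} for each NUMA node under `root`."""
    counts = {}
    for node_dir in sorted(glob.glob(os.path.join(root, "node[0-9]*"))):
        path = os.path.join(node_dir, "hugepages",
                            f"hugepages-{size_kb}kB", "nr_hugepages")
        try:
            with open(path) as f:
                counts[os.path.basename(node_dir)] = int(f.read())
        except OSError:
            # Node directory exists but has no hugepage pool of this size.
            counts[os.path.basename(node_dir)] = 0
    return counts

if __name__ == "__main__":
    # A node showing 0 pages here would explain a per-socket mempool failure
    # even though the global total looks plentiful.
    print(hugepages_per_node())
```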

Mark Wittling

May 19, 2022, 12:05:11 PM
to TRex Traffic Generator
I looked into the code, where it is printing "ERROR there is not enough huge-pages memory in your system".

rte_mempool_t *res = rte_mempool_create(buffer, n,
                                        elt_size, cache_size,
                                        sizeof(struct rte_pktmbuf_pool_private),
                                        rte_pktmbuf_pool_init, NULL,
                                        rte_pktmbuf_init, NULL,
                                        socket_id, flags);
if (res == NULL) {
    print_alloc_err(is_hugepages);
    rte_exit(EXIT_FAILURE, "Cannot init mbuf pool %s\n", name);
}
return res;


void print_alloc_err(bool is_hugepages) {
    if (is_hugepages) {
        printf(" ERROR there is not enough huge-pages memory in your system\n");
        /* --> assumptive? let's see what the call to rte_mempool_create returns */
    } else {
        printf(" ERROR could not allocate memory for mbufs\n");
        printf(" Either:\n");
        printf(" * Check free memory in your system\n");
        printf(" * Add 'limit_memory' to trex_cfg.yaml (units are MB)\n");
        printf(" * Add --mbuf-factor to CLI\n");
    }
}


I looked at the DPDK documentation for the rte_mempool_create function call.

This is what you get back from that call, and the possible error codes:
Returns: the pointer to the newly allocated mempool on success, or NULL on error with rte_errno set appropriately. Possible rte_errno values include:
  • E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
  • E_RTE_SECONDARY - function was called from a secondary process instance
  • EINVAL - cache size provided is too large or an unknown flag was passed
  • ENOSPC - the maximum number of memzones has already been allocated
  • EEXIST - a memzone with the same name already exists
  • ENOMEM - no appropriate memory area found in which to create memzone
Conclusion: The message implying that there are not enough hugepages is assumptive, as the linux_dpdk code is not explicitly checking the error code from the pool-create call.

In other words, just because you can't create the memory pool does not necessarily mean that there are not enough hugepages.
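Following that reasoning, a more precise diagnostic might map the documented `rte_errno` values to their meanings rather than assuming a hugepage shortage. A hypothetical Python sketch of such a helper (the two DPDK-specific codes, E_RTE_NO_CONFIG and E_RTE_SECONDARY, have no standard-errno equivalent and are omitted):

```python
import errno

# Meanings taken from the rte_mempool_create documentation quoted above.
RTE_ERRNO_HINTS = {
    errno.EINVAL: "cache size provided is too large or an unknown flag was passed",
    errno.ENOSPC: "the maximum number of memzones has already been allocated",
    errno.EEXIST: "a memzone with the same name already exists",
    errno.ENOMEM: "no appropriate memory area found in which to create memzone"
                  " (only this case suggests a hugepage shortage)",
}

def diagnose_mempool_failure(rte_errno):
    """Translate a documented rte_errno value into a human-readable hint."""
    return RTE_ERRNO_HINTS.get(rte_errno, f"unrecognized rte_errno {rte_errno}")

print(diagnose_mempool_failure(errno.ENOMEM))
```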

Mark Wittling

May 19, 2022, 12:09:46 PM
to TRex Traffic Generator
I just changed my VM to use 6 CPU, 2 cores per socket (3 sockets).

Changed the file to specify two of the 3 sockets, which looks like this:

  platform:
      master_thread_id: 0
      latency_thread_id: 1
      dual_if:
        - socket: 0
          threads: [2,3]

        - socket: 1
          threads: [4,5]

And, same issue: a hugepages error.
# ./t-rex-64 -i --astf
The ports are bound/configured.
Starting  TRex v2.97 please wait  ...
 set driver name net_vmxnet3
 driver capability  : TSO  LRO
 set dpdk queues mode to ONE_QUE
 Number of ports found: 4
zmq publisher at: tcp://*:4500
 ERROR there is not enough huge-pages memory in your system
EAL: Error - exiting with code: 1

  Cause: Cannot init mbuf pool small-pkt-const

So the ONLY way TRex will work, apparently, is with a single socket specified. That doesn't even fully line up with what I see when I run TRex, because I actually see 2 cores get saturated (1 per interface, it appears), and if the VM receives, another core lights up.

So I am quite confused about how this config file works now, and even more confused about how to throw traffic from a VM with a lot of cores on it.

Mark Wittling

May 20, 2022, 3:17:11 PM
to TRex Traffic Generator
As a follow-up, if I run the stateless test, then all ports are in use:
start -p 0 1 -f stl/imix.py -m 100% --pin

Port Statistics

   port    |         0         |         1         |       total
-----------+-------------------+-------------------+------------------
owner      |              root |              root |
link       |                UP |                UP |
state      |      TRANSMITTING |      TRANSMITTING |
speed      |           10 Gb/s |           10 Gb/s |
CPU util.  |            99.11% |            99.33% |
--         |                   |                   |
Tx bps L2  |          1.7 Gbps |          1.7 Gbps |         3.41 Gbps
Tx bps L1  |          1.8 Gbps |          1.8 Gbps |          3.6 Gbps
Tx pps     |       594.07 Kpps |       594.07 Kpps |         1.19 Mpps
Line Util. |           17.98 % |           17.97 % |
---        |                   |                   |
Rx bps     |       ▼▼ 537 Mbps |    ▲▲▲ 253.47 bps |       ▼▼ 537 Mbps
Rx pps     |      ▼ 142.6 Kpps |           0.5 pps |      ▼ 142.6 Kpps
----       |                   |                   |
opackets   |          16516524 |          16516364 |          33032888
ipackets   |          33447423 |                 3 |          33447426
obytes     |        5923710806 |        5924579184 |       11848289990
ibytes     |       15943405101 |               192 |       15943405293
tx-pkts    |       16.52 Mpkts |       16.52 Mpkts |       33.03 Mpkts
rx-pkts    |       33.45 Mpkts |            3 pkts |       33.45 Mpkts
tx-bytes   |           5.92 GB |           5.92 GB |          11.85 GB
rx-bytes   |          15.94 GB |             192 B |          15.94 GB
-----      |                   |                   |
oerrors    |                 0 |                 0 |                 0
ierrors    |             2,359 |                 0 |             2,359

Once again, I see an enormous drop rate and queue_full, which I don't know how to solve.
trex>stats
Global Statistics

connection   : localhost, Port 4501                       total_tx_L2  : 3.41 Gbps
version      : STL @ v2.97                                total_tx_L1  : 3.6 Gbps
cpu_util.    : 99.22% @ 4 cores (4 per dual port)         total_rx     : 537 Mbps ▼▼
rx_cpu_util. : 0.0% / 0 pps                               total_pps    : 1.19 Mpps
async_util.  : 0% / 18.6 bps                              drop_rate    : 2.87 Gbps
total_cps.   : 0 cps                                      queue_full   : 99,629,435 pkts
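As a sanity check, the reported drop_rate is consistent with total TX minus total RX, so the missing traffic really is being dropped rather than miscounted:

```python
# Values copied from the global-statistics dump above.
total_tx_l2_gbps = 3.41
total_rx_gbps = 0.537            # 537 Mbps
drop_rate_gbps = total_tx_l2_gbps - total_rx_gbps
print(round(drop_rate_gbps, 2))  # matches the reported drop_rate of 2.87 Gbps
```

The large queue_full counter suggests the software transmit path could not enqueue packets as fast as the schedule demanded, which commonly accompanies heavy drops at `-m 100%`.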

Niloofar Toorchi

Nov 28, 2022, 9:24:50 PM
to TRex Traffic Generator
Hi Mark, did you manage to solve the hugepage memory error on your VM? I am facing the same problem, and changing the number and size of hugepages is not helping.
Thanks,
Niloofar
