Errors while running on CentOS 7.6 Mellanox NIC ConnectX-5 EX

Arvind Narayanan

Dec 20, 2018, 3:11:44 AM
to TRex Traffic Generator
Hi,

I know you folks recommend installing TRex on CentOS 7.4 with MLNX OFED 4.3.

However, I wanted to check whether my error is caused by something other than a dependency issue.


## Environment details
trex v2.49
CentOS 7.6
MLNX_OFED_LINUX-4.5
dpdk-NOT-Installed

I am trying to run a simple configuration where two machines equipped with Mellanox 100G NIC ConnectX-5 EX are connected back to back.

(Machine0-Port0 connected to Machine1-Port0)
(Machine0-Port1 connected to Machine1-Port1)

The configuration file generated using the interactive setup mode looks like:

### Config file generated by dpdk_setup_ports.py ###

- version: 2
  interfaces: ['af:00.0', 'af:00.1']
  port_info:
      - dest_mac: machine1_port0_mac
        src_mac: machine0_port0_mac
      - dest_mac: machine1_port1_mac
        src_mac: machine0_port1_mac

  platform:
      master_thread_id: 0
      latency_thread_id: 2
      dual_if:
        - socket: 1
          threads: [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47]


I allocate hugepages via a kernel parameter, and I made sure there were free hugepages before running. However, I am getting an error.
kernel params => isolcpus=24-47 intel_idle.max_cstate=0 processor.max_cstate=0 intel_pstate=disable nohz_full=24-47 rcu_nocbs=24-47 rcu_nocb_poll default_hugepagesz=1G hugepagesz=1G hugepages=64 audit=0 nosoftlockup

Here are the command-line results:

## BEFORE RUNNING (ensuring there is free memory in hugepages)
[arvind@ved ~]$ grep -i huge /proc/meminfo
AnonHugePages: 26624 kB
HugePages_Total: 64
HugePages_Free: 64
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB


## RUNNING TREX
[arvind@ved ~]$ cd /opt/trex/v2.49/
[arvind@ved v2.49]$ sudo ./t-rex-64 -f /etc/trex_cfg.yaml -d 10
Warning: Mellanox NICs where tested only with RedHat/CentOS 7.4
Correct usage with other Linux distributions is not guaranteed.
WARNING: hugepages config file (/sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages) does not exist!
The ports are bound/configured.
Starting TRex v2.49 please wait ...
./t-rex-64: line 80: 29169 Segmentation fault (core dumped) ./_$(basename $0) $INPUT_ARGS $EXTRA_INPUT_ARGS


## HUGEPAGES AFTER THE ERROR
[arvind@ved v2.49]$ grep -i huge /proc/meminfo
AnonHugePages: 26624 kB
HugePages_Total: 64
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
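
To compare the two snapshots above, here is a minimal Python sketch that pulls the hugepage counters out of /proc/meminfo-style text. The function name parse_hugepages is made up for illustration and is not part of TRex; the sample text is copied from the output above. On a live system the same function could be fed the contents of /proc/meminfo.

```python
def parse_hugepages(meminfo_text):
    """Return the HugePages_* counters and the page size (kB) from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key: value" line
        key = key.strip()
        if key.startswith("HugePages_") or key == "Hugepagesize":
            # The value looks like "64" or "1048576 kB"; keep only the number.
            fields[key] = int(value.split()[0])
    return fields


# Snapshot taken after the segmentation fault (copied from this post).
AFTER_CRASH = """\
AnonHugePages: 26624 kB
HugePages_Total: 64
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
"""

stats = parse_hugepages(AFTER_CRASH)
in_use = stats["HugePages_Total"] - stats["HugePages_Free"]
in_use_gib = in_use * stats["Hugepagesize"] / (1024 * 1024)
print(f"{in_use} hugepages not free ({in_use_gib:.0f} GiB)")
```

With the numbers above, all 64 pages of 1 GiB each are reported as not free once the process has crashed.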

Can someone please confirm whether this error is due to the dependencies pointed out in the documentation, or is it something solvable?


Thanks,
Arvind


Bruce Jorgens

Jan 29, 2019, 10:36:57 PM
to TRex Traffic Generator
I'm seeing the same error with TRex v2.51, CentOS 7.6, Kernel 3.10.0-957.1.3.el7.x86_64 and MLNX_OFED_5.4-1.

Running with verbose level 7, I see the following just before the segmentation fault:

DPDK args
 xx  -d  libmlx5-64.so  -d  libmlx4-64.so  -c  0x4003  -n  4  --log-level  8  --master-lcore  0  -w  0000:86:00.0  -w  0000:86:00.1  --legacy-mem
EAL: Detected 56 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Probing VFIO support...
EAL: PCI device 0000:86:00.0 on NUMA socket 1
EAL:   probe driver: 15b3:1017 net_mlx5
net_mlx5: tunnel offloading disabled due to old OFED/rdma-core version
net_mlx5: MPLS over GRE/UDP tunnel offloading disabled due to old OFED/rdma-core version or firmware configuration
./t-rex-64: line 80: 52422 Segmentation fault      ./_$(basename $0) $INPUT_ARGS $EXTRA_INPUT_ARGS

Do I need to make changes to rdma-config or firmware configuration?   If so, what?

Is this solvable without a full reinstall using only the recommended settings?
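
As a side note on reading the DPDK args above: the value passed to -c is a hexadecimal bitmask of the lcores DPDK may use. A minimal Python sketch for decoding it follows; the helper name cores_from_mask is hypothetical and only meant for illustration.

```python
def cores_from_mask(mask):
    """Return the sorted lcore indices whose bit is set in a DPDK -c core mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# Core mask taken from the "DPDK args" line in this post.
print(cores_from_mask(0x4003))  # [0, 1, 14]
```

So 0x4003 selects lcores 0, 1 and 14.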

hanoh haim

Jan 30, 2019, 12:47:58 AM
to Bruce Jorgens, TRex Traffic Generator
There is ONE tuple of distro/OFED/TRex versions that works! This is a Mellanox driver dependency issue.
Have a look in the manual for the working tuple.


Hanoh

--
You received this message because you are subscribed to the Google Groups "TRex Traffic Generator" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trex-tgn+u...@googlegroups.com.
To post to this group, send email to trex...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/trex-tgn/422ed368-46f1-40c1-9952-5c114f9cb434%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Yaroslav Brustinov

Jan 30, 2019, 2:40:39 AM
to hanoh haim, Bruce Jorgens, TRex Traffic Generator
Hi,

Please try the following:
  1. Run with gdb and show the traceback:
    sudo ./t-rex-64-debug-gdb ...

  2. Use 2M hugepages, not 1G

  3. Disable anonymous hugepages

  4. In isolcpus etc., you probably want to keep task scheduling off the cores that TRex uses, not off other cores.

Thanks,
Yaroslav.
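
To illustrate point 4, here is a minimal Python sketch that compares an isolcpus value with the cores listed in a TRex config. The numbers are copied from the first post in this thread (isolcpus=24-47, master_thread_id 0, latency_thread_id 2, odd-numbered dual_if threads). The helper parse_cpu_list is hypothetical and handles only plain numbers and simple ranges, not every form the kernel accepts.

```python
def parse_cpu_list(spec):
    """Expand a simple CPU list such as "24-47" or "0,2,4-6" into a set of ints."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


# Values copied from the first post in this thread.
isolated = parse_cpu_list("24-47")
# master_thread_id, latency_thread_id, and the dual_if threads 1,3,...,47
trex_cores = {0, 2} | set(range(1, 48, 2))

not_isolated = sorted(trex_cores - isolated)
isolated_unused = sorted(isolated - trex_cores)
print("TRex cores not isolated:", not_isolated)
print("Isolated cores TRex does not use:", isolated_unused)
```

With those values, half of the TRex cores fall outside the isolated range, while the even-numbered cores 24 to 46 are isolated but never used by TRex. Whether the master thread needs isolating at all is a separate judgment call.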



Bruce Jorgens

Jan 30, 2019, 5:56:46 PM
to Yaroslav Brustinov, hanoh haim, TRex Traffic Generator
Yaroslav,

I modified my setup to use 2M hugepages. I was using 1G pages before, as was Arvind.

Here are the last few lines before the segmentation fault, running with gdb enabled and verbose level 7. Note the message indicating that no free 1G hugepages are available:

 Total Memory :
 MBUF_64                                   : 81900
 MBUF_128                                  : 40950
 MBUF_256                                  : 16380
 MBUF_512                                  : 16380
 MBUF_1024                                 : 16380
 MBUF_2048                                 : 36855
 MBUF_4096                                 : 1024
 MBUF_DP_FLOWS                             : 524288
 MBUF_GLOBAL_FLOWS                         : 5120
 get_each_core_dp_flows                    : 524288
 Total memory                              :     248.40 Mbytes
 core_mask  4003
 sockets : 1
 active sockets : 1
 ports_sockets : 1
1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
 phy   |   virt
 14      1
DPDK args
 xx  -d  libmlx5-64-debug.so  -c  0x4003  -n  4  --log-level  8  --master-lcore  0  -w  0000:86:00.0  -w  0000:86:00.1  --legacy-mem
EAL: Detected 56 lcore(s)
EAL: Detected 2 NUMA nodes
[New Thread 0x7ffff56e2700 (LWP 28824)]
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
[New Thread 0x7ffff4ee1700 (LWP 28825)]
EAL: No free hugepages reported in hugepages-1048576kB
 EAL: Probing VFIO support...
[New Thread 0x7ffbf45ff700 (LWP 28826)]
[New Thread 0x7ffbf3dfe700 (LWP 28827)]
EAL: PCI device 0000:86:00.0 on NUMA socket 1
EAL:   probe driver: 15b3:1017 net_mlx5
net_mlx5: mlx5.c:847: mlx5_dev_spawn(): tunnel offloading disabled due to old OFED/rdma-core version
net_mlx5: mlx5.c:859: mlx5_dev_spawn(): MPLS over GRE/UDP tunnel offloading disabled due to old OFED/rdma-core version or firmware configuration

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff5d8f6cc in mlx5_post_srq_ops () from /lib64/libmlx5.so.1
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7.x86_64 libgcc-4.8.5-36.el7.x86_64 libibverbs-45mlnx1-1.45101.x86_64 libnl3-3.2.28-4.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb)


Regards,
Bruce
--
Bruce T. Jorgens
btjo...@gmail.com

JS

Feb 6, 2019, 9:16:21 AM
to TRex Traffic Generator
Seeing the same issue with a CX5 on CentOS 7.6.

I can confirm the card worked on CentOS 7.4 and 7.5 (no matter the OFED driver version). Another side note I got from Mellanox: there is some odd issue when using an updated kernel, e.g. on CentOS, 3.10.0-957 vs. 3.10.0-957.1.3. Sometimes the Mellanox drivers do not play well with non-default kernels, even if you compile them with --add-kernel-support.

Bruce Jorgens

Feb 7, 2019, 12:00:40 PM
to JS, TRex Traffic Generator
I was able to get the CX5 card working fine with TRex using the following environment:
Distro: CentOS Linux release 7.5.1804 (Core) - Fresh install
Kernel: 3.10.0-862.el7.x86_64
OFED: 4.4.2.0.7.1
2M hugepages

