4M CPS with Trex ASTF

AxaY Dorwat

Feb 3, 2021, 3:29:13 PM
to TRex Traffic Generator
Hi,

I am trying to reach 4M CPS with TRex in ASTF mode on two hosts connected directly to each other. Both hosts use Mellanox network adapters. The TRex client runs on Host1 and the TRex server runs on Host2. I am not able to reach 4M CPS in this setup; CPS maxes out at 3.5M. I am observing ierrors (9619342) on the TRex server.

Trex Server Port stats
-Per port stats table
      ports |               1
 ----------------------------------------------------------------
   opackets |       519669149
     obytes |     39583220772
   ipackets |       523746420
     ibytes |     40073878323
    ierrors |         9139454
    oerrors |               0
      Tx Bw |       6.44 Gbps
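
To put the counters above in proportion, here is a minimal Python sketch that relates ierrors to ipackets and derives the average packet size. It uses only the numbers from the table; the helper name `ratio` is made up for illustration, and the sketch does not assume anything about whether ipackets already includes the errored packets.

```python
# Counters copied verbatim from the per-port stats table in the post.
stats = {
    "opackets": 519669149,
    "obytes": 39583220772,
    "ipackets": 523746420,
    "ibytes": 40073878323,
    "ierrors": 9139454,
}


def ratio(numerator, denominator):
    """Plain division with a guard so an idle port does not raise ZeroDivisionError."""
    return numerator / denominator if denominator else 0.0


ierror_share = ratio(stats["ierrors"], stats["ipackets"])
avg_tx_bytes = ratio(stats["obytes"], stats["opackets"])
avg_rx_bytes = ratio(stats["ibytes"], stats["ipackets"])

print(f"ierrors / ipackets : {ierror_share:.2%}")
print(f"avg Tx packet size : {avg_tx_bytes:.1f} bytes")
print(f"avg Rx packet size : {avg_rx_bytes:.1f} bytes")
```

With these counters the ierrors count is roughly 1.7% of ipackets, and the average packet is about 76 bytes in both directions, so the port is handling a very high rate of small packets.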

To improve performance, I have set the host to High Performance mode, disabled Hyper-Threading, and mapped 64 huge pages of 1 GB each.

Also, if I send http_req with 72 bytes of data, I see the ierrors count go up faster. Any help here is highly appreciated.

Regards,
Akshay
Trex Client Command:
./t-rex-64 -c 22 -i  --astf --astf-client-mask 0x1 --prom -v 10

Trex Server Command
./t-rex-64 -c 17 -i  --astf --astf-server-only

Trex Configuration 
- port_limit      : 2
  version         : 2
  rx_desc         : 192
  #rx_desc_drop    : 512
  tx_desc         : 128
  interfaces: ['d8:00.0', 'dummy']
  port_bandwidth_gb : 50
  port_info      :
    - dest_mac        :   00:ae:10:02:00:01
      src_mac         :   00:ae:10:02:00:02
    - dest_mac        :   00:00:00:00:00:00
      src_mac         :   00:00:00:00:00:00

  platform:
      master_thread_id: 0
      latency_thread_id: 12
      dual_if:
        - socket: 1
          threads: [13,14,15,16,17,18,19,20,21,22,23,1,2,3,4,5,6,7,8,9,10,11]

  memory    :
         #mbuf_64     : 131072
         mbuf_64     : 524288
         #mbuf_128    : 131072
         mbuf_128    : 524288
         mbuf_256    : 512
         mbuf_512    : 256
         mbuf_1024   : 256
         mbuf_2048   : 128
         mbuf_4096   : 64
         mbuf_8192   : 64
         mbuf_9216   : 32
         traffic_mbuf_64     : 131072
         #traffic_mbuf_64     : 4194304
         traffic_mbuf_128    : 131072
         #traffic_mbuf_128    : 4194304
         traffic_mbuf_256    : 512
         traffic_mbuf_512    : 256
         traffic_mbuf_1024   : 256
         traffic_mbuf_2048   : 128
         traffic_mbuf_4096   : 64
         traffic_mbuf_8192   : 64
         traffic_mbuf_9216   : 32
         dp_flows            : 40000000
         active_flows        : 40000000
         dp_max_flows        : 40000000

Trex Profile
    def create_profile(self, res_size):
        prog_c = ASTFProgram()
        prog_c.send(http_req)

        prog_s = ASTFProgram()
        prog_s.recv(len(http_req))

        cps = 4 * 1000 * 1000
        start_ip = ipaddress.ip_address(u"192.1.0.1")
        rmap_per_subnet = 1024
        lmap_per_subnet = 1024
        count = 8
        templates = [0] * count
        temp_c    = [0] * count
        temp_s    = [0] * count
        ip_gen_c  = [0] * count
        ip_gen_s  = [0] * count
        ip_gen    = [0] * count

        for i in range(count):
            # ip generator
            local_start_ip = start_ip
            ip_gen_c[i] = ASTFIPGenDist(ip_range=[str(local_start_ip), str(local_start_ip + lmap_per_subnet - 1)], distribution="seq")

            remote_start_ip = local_start_ip + lmap_per_subnet
            ip_gen_s[i] = ASTFIPGenDist(ip_range=[str(remote_start_ip), str(remote_start_ip + rmap_per_subnet - 1)], distribution="seq")
            ip_gen[i] = ASTFIPGen(glob=ASTFIPGenGlobal(ip_offset="1.0.0.0"),
                                  dist_client=ip_gen_c[i],
                                  dist_server=ip_gen_s[i])
            # template: each of the `count` templates gets an equal share of the target CPS
            temp_c[i] = ASTFTCPClientTemplate(program=prog_c,  ip_gen=ip_gen[i], cps=(cps/count), port=i)
            temp_s[i] = ASTFTCPServerTemplate(program=prog_s, assoc=ASTFAssociationRule(i))  # server template is matched on destination port i
            templates[i] = ASTFTemplate(client_template=temp_c[i], server_template=temp_s[i])
            start_ip += 65536 # increment start ip to 192.2.0.1 and so on
        # profile
        profile = ASTFProfile(default_ip_gen=ip_gen[0], templates=templates)
        return profile
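
The address arithmetic in the loop above can be checked without TRex. The following is a minimal sketch using only the standard `ipaddress` module; the function name `address_plan` and its parameter names are made up for illustration, and the values are the ones from the posted profile.

```python
import ipaddress


def address_plan(start="192.1.0.1", count=8, per_subnet=1024,
                 stride=65536, total_cps=4_000_000):
    """Recompute the client/server ranges and CPS share the posted loop produces."""
    start_ip = ipaddress.ip_address(start)
    plan = []
    for i in range(count):
        c_first = start_ip
        c_last = c_first + per_subnet - 1
        s_first = c_first + per_subnet          # servers follow the client block
        s_last = s_first + per_subnet - 1
        plan.append({
            "port": i,
            "clients": (str(c_first), str(c_last)),
            "servers": (str(s_first), str(s_last)),
            "cps": total_cps / count,            # true division, so a float
        })
        start_ip += stride                       # 65536 addresses: 192.1.x.x -> 192.2.x.x
    return plan


plan = address_plan()
for entry in plan:
    print(entry["port"], entry["clients"], entry["servers"], entry["cps"])
```

Each template therefore draws from 1024 client and 1024 server addresses and is asked for 500000 connections per second; note that `cps/count` yields a float under Python 3.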

Host CPU configuration
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              24
On-line CPU(s) list: 0-23
Thread(s) per core:  1
Core(s) per socket:  12
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
Stepping:            7
CPU MHz:             1000.468
BogoMIPS:            4400.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            16896K
NUMA node0 CPU(s):   0-11
NUMA node1 CPU(s):   12-23
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin mba tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local ibpb ibrs stibp dtherm ida arat pln pts pku ospke avx512_vnni arch_capabilities
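
The NUMA lines above can be cross-checked against the `threads` list in the TRex configuration. This is a minimal sketch, assuming that `socket: 1` in `dual_if` refers to the same numbering as NUMA node 1 in lscpu; the helper `parse_cpu_list` is made up for illustration. Whether the placement matters for this workload is not established in this thread.

```python
def parse_cpu_list(spec):
    """Expand an lscpu-style CPU list such as "0-11" or "0-3,8" into a set of ints."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


# Values copied from the post: lscpu NUMA lines and the platform section.
numa_nodes = {0: parse_cpu_list("0-11"), 1: parse_cpu_list("12-23")}
nic_socket = 1  # assumption: same numbering as the lscpu NUMA nodes
threads = [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
           1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

local = [t for t in threads if t in numa_nodes[nic_socket]]
remote = [t for t in threads if t not in numa_nodes[nic_socket]]

print(f"threads on node {nic_socket}: {local}")
print(f"threads on other nodes: {remote}")
```

With the posted values, 11 of the 22 configured threads (13 to 23) are on node 1 and the other 11 (1 to 11) are on node 0.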

Host Memory configuration
# cat /proc/meminfo
MemTotal:       131692552 kB
MemFree:        62410264 kB
MemAvailable:   62241100 kB
Buffers:           57232 kB
Cached:           431196 kB
SwapCached:            0 kB
Active:           431276 kB
Inactive:         261644 kB
Active(anon):     204808 kB
Inactive(anon):      440 kB
Active(file):     226468 kB
Inactive(file):   261204 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       2097148 kB
SwapFree:        2097148 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:        204344 kB
Mapped:           172236 kB
Shmem:              2376 kB
Slab:             275440 kB
SReclaimable:      97524 kB
SUnreclaim:       177916 kB
KernelStack:        7728 kB
PageTables:        10220 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    34388992 kB
Committed_AS:    1705676 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:      64
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
DirectMap4k:      328508 kB
DirectMap2M:    11915264 kB
DirectMap1G:    121634816 kB
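
The huge page figures above can be turned into totals with a few lines of Python. This is a minimal sketch that parses a `/proc/meminfo`-style text; the function name `parse_meminfo` is made up for illustration, and the sample contains only the four lines needed from the output above.

```python
def parse_meminfo(text):
    """Return {field: integer value}; sizes stay in kB as printed by the kernel."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


# Lines copied from the meminfo output in the post.
sample = """MemTotal:       131692552 kB
HugePages_Total:      64
HugePages_Free:        0
Hugepagesize:    1048576 kB"""

info = parse_meminfo(sample)
reserved_kb = info["HugePages_Total"] * info["Hugepagesize"]
reserved_gib = reserved_kb / (1024 * 1024)
share_of_ram = reserved_kb / info["MemTotal"]

print(f"huge pages reserved: {reserved_gib:.0f} GiB ({share_of_ram:.0%} of MemTotal)")
print(f"huge pages free    : {info['HugePages_Free']}")
```

The 64 pages of 1 GiB add up to 64 GiB, about half of MemTotal, and none of them were free when the snapshot was taken.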



trex_start_up_logs.rtf

hanoh haim

Feb 4, 2021, 1:31:37 AM
to AxaY Dorwat, TRex Traffic Generator

Please try with the console and send a TUI snapshot.


Hanoh

AxaY Dorwat

Feb 4, 2021, 2:19:41 PM
to hanoh haim, TRex Traffic Generator
Hi Hanoh,
Thanks for the quick response. I have attached the new configuration and TUI outputs. The goal is to run http_get.pcap with realistic data at 4M CPS.
Things I have tried:
1. As soon as I add more data packets to the ASTF client and server programs, I observe more drops associated with ierrors. TCP then comes into the picture and adds the overhead of retransmissions.
2. When I increase the descriptor count, performance suffers as well.

Regards,
Akshay Dorwat



Screen Shot 2021-02-04 at 10.57.45 AM.png

Screen Shot 2021-02-04 at 10.54.47 AM.png

Screen Shot 2021-02-04 at 10.55.05 AM.png

Screen Shot 2021-02-04 at 11.13.29 AM.png


hanoh haim

Feb 4, 2021, 6:22:22 PM
to AxaY Dorwat, TRex Traffic Generator
I don’t think it is possible to get to 4M CPS with this profile. Try a simpler one without data.
The bottleneck is the allocation/deallocation of flows, not the driver.


Thanks
Hanoh
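
To give a sense of scale for the flow allocation/deallocation cost mentioned above, here is a minimal back-of-the-envelope sketch in Python. It assumes flows are spread evenly over the cores given with `-c` (22 on the client, 17 on the server) and that every core runs at the nominal 2.2 GHz from the lscpu model name; both are simplifying assumptions, and the function name `per_core_budget` is made up for illustration.

```python
def per_core_budget(total_cps, cores, clock_hz):
    """Return (flows per second per core, microseconds per flow, cycles per flow)."""
    flows_per_core = total_cps / cores
    return flows_per_core, 1e6 / flows_per_core, clock_hz / flows_per_core


CLOCK_HZ = 2.2e9  # assumption: nominal clock, ignores frequency scaling

for role, cores in (("client", 22), ("server", 17)):
    for cps in (3.5e6, 4.0e6):
        flows, usec, cycles = per_core_budget(cps, cores, CLOCK_HZ)
        print(f"{role:6s} {cps / 1e6:.1f}M CPS: {flows:9.0f} flows/s per core, "
              f"{usec:.2f} us and {cycles:6.0f} cycles per flow")
```

At 4M CPS this leaves about 5.5 microseconds per flow on each client core and about 4.25 microseconds on each server core. That budget has to cover flow setup, every packet of the connection, and flow teardown.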