Untransmitted packets


joe.r...@gmail.com

Jul 22, 2017, 8:40:37 PM
to TRex Traffic Generator
I'm running into an issue with what appears to be untransmitted packets using the http_simple.yaml traffic profile.

This is running on a Dell R630 (2x 10-core E5-2640 2.4GHz, 64GB RAM), CentOS 7.3, with a Mellanox ConnectX-4 dual-port 100G NIC running version 4.1-1.0.2.0 of the OFED driver.

I'm running v2.28 of TRex (though I've tried 2.27 as well) with this trex_cfg.yaml file:
- port_limit: 2
  version: 2
  interfaces: ['03:00.0', '03:00.1']
  port_info:
    - ip: 10.1.125.10
      default_gw: 10.1.125.100
    - ip: 10.1.125.100
      default_gw: 10.1.125.10
  platform:
    master_thread_id: 0
    latency_thread_id: 1
    dual_if:
      - socket: 0
        threads: [2,4,6,8,10,12,14,16,18]

The threads are configured to match the NUMA zone of the NIC.
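For anyone reproducing this setup: the NIC's NUMA node can be read from sysfs (e.g. cat /sys/bus/pci/devices/0000:03:00.0/numa_node) and compared against the lscpu CPU-to-node mapping. A minimal sketch of that thread selection (hypothetical helper, assuming a layout where even-numbered CPUs sit on node 0, which is common on two-socket boxes but does vary):

```python
def pick_worker_threads(cpu_to_node, nic_node, reserved, count):
    """Pick `count` CPU ids on the NIC's NUMA node for TRex worker threads,
    skipping ids already reserved for the master and latency threads."""
    candidates = [cpu for cpu, node in sorted(cpu_to_node.items())
                  if node == nic_node and cpu not in reserved]
    return candidates[:count]

# Hypothetical 2-socket, 40-CPU layout where even CPUs are on node 0
cpu_to_node = {cpu: cpu % 2 for cpu in range(40)}
# master_thread_id=0 and latency_thread_id=1 are reserved in trex_cfg.yaml
print(pick_worker_threads(cpu_to_node, nic_node=0, reserved={0, 1}, count=9))
# -> [2, 4, 6, 8, 10, 12, 14, 16, 18], matching the threads list above
```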

The two NIC ports are running at 100Gbps connected to a Nexus 3232C with the ports in access mode on the same VLAN to support back-to-back TRex testing.

I'm using the stock http_simple.yaml traffic config file with this command line to run it:

./t-rex-64 -f cap2/http_simple.yaml -m 1 -d 60 -c 9

I have a 2nd 100G-attached server set up to span the two TRex server NIC's so that I can run captures to look at what's being transmitted. The monitor session on the Nexus is configured with:

monitor session 1
source interface Ethernet1/2 rx
source interface Ethernet1/5 rx
destination interface Ethernet1/4
no shut

Ethernet1/2 is the first TRex NIC; Ethernet1/5 is the second. With this config I should see the entire HTTP conversation between the two NICs. I've verified that there are no queuing drops or interface errors on the Nexus ('show interface counters errors'), and that no drops are reported on the capture server NIC (netstat -i).

When I run a tshark capture of the traffic being sent by TRex, I occasionally see "TCP ACKed unseen segment" for several packets in a given http connection. Here's an example of one http connection that was captured:

1 0.000000000 16.0.0.103 → 48.0.64.102 TCP 60 61161 → 80 [SYN] Seq=0 Win=32768 Len=0 MSS=1460
2 0.011999399 48.0.64.102 → 16.0.0.103 TCP 60 80 → 61161 [SYN, ACK] Seq=0 Ack=1 Win=32768 Len=0 MSS=1460
3 0.022999163 16.0.0.103 → 48.0.64.102 TCP 60 61161 → 80 [ACK] Seq=1 Ack=1 Win=32768 Len=0
4 0.023001426 16.0.0.103 → 48.0.64.102 HTTP 303 GET /3384 HTTP/1.1
5 0.034000300 48.0.64.102 → 16.0.0.103 TCP 1514 [TCP segment of a reassembled PDU]
6 0.034002393 48.0.64.102 → 16.0.0.103 TCP 1514 [TCP segment of a reassembled PDU]
7 0.044999507 16.0.0.103 → 48.0.64.102 TCP 60 61161 → 80 [ACK] Seq=250 Ack=2921 Win=32768 Len=0
8 0.056000247 48.0.64.102 → 16.0.0.103 TCP 1514 [TCP segment of a reassembled PDU]
9 0.056002184 48.0.64.102 → 16.0.0.103 TCP 1514 [TCP segment of a reassembled PDU]
10 0.056003857 48.0.64.102 → 16.0.0.103 TCP 1514 [TCP segment of a reassembled PDU]
11 0.066999508 16.0.0.103 → 48.0.64.102 TCP 60 61161 → 80 [ACK] Seq=250 Ack=5841 Win=32768 Len=0
12 0.078000305 48.0.64.102 → 16.0.0.103 TCP 1514 [TCP segment of a reassembled PDU]
13 0.078002438 48.0.64.102 → 16.0.0.103 TCP 1514 [TCP segment of a reassembled PDU]
14 0.078004128 48.0.64.102 → 16.0.0.103 TCP 1514 [TCP segment of a reassembled PDU]
15 0.088999822 16.0.0.103 → 48.0.64.102 TCP 60 61161 → 80 [ACK] Seq=250 Ack=8761 Win=32768 Len=0
16 0.089001906 16.0.0.103 → 48.0.64.102 TCP 60 61161 → 80 [ACK] Seq=250 Ack=11681 Win=32768 Len=0
17 0.100000339 48.0.64.102 → 16.0.0.103 TCP 1514 [TCP segment of a reassembled PDU]
18 0.100002475 48.0.64.102 → 16.0.0.103 TCP 1514 [TCP segment of a reassembled PDU]
19 0.100003839 48.0.64.102 → 16.0.0.103 TCP 1514 [TCP segment of a reassembled PDU]
20 0.100005325 48.0.64.102 → 16.0.0.103 TCP 1514 [TCP segment of a reassembled PDU]
21 0.100006655 48.0.64.102 → 16.0.0.103 TCP 1514 [TCP segment of a reassembled PDU]
22 0.100008082 48.0.64.102 → 16.0.0.103 TCP 1514 [TCP segment of a reassembled PDU]
23 0.110999876 16.0.0.103 → 48.0.64.102 TCP 60 61161 → 80 [ACK] Seq=250 Ack=14601 Win=32768 Len=0
24 0.111002073 16.0.0.103 → 48.0.64.102 TCP 60 61161 → 80 [ACK] Seq=250 Ack=17521 Win=32768 Len=0
25 0.111003146 16.0.0.103 → 48.0.64.102 TCP 60 61161 → 80 [ACK] Seq=250 Ack=20441 Win=32768 Len=0
26 0.122007767 48.0.64.102 → 16.0.0.103 TCP 1514 [TCP segment of a reassembled PDU]
27 0.122009143 48.0.64.102 → 16.0.0.103 TCP 1514 [TCP segment of a reassembled PDU]
28 0.122010600 48.0.64.102 → 16.0.0.103 TCP 1514 [TCP segment of a reassembled PDU]
29 0.132001392 16.0.0.103 → 48.0.64.102 TCP 60 61161 → 80 [ACK] Seq=250 Ack=23361 Win=32768 Len=0
30 0.132003258 16.0.0.103 → 48.0.64.102 TCP 60 [TCP ACKed unseen segment] 61161 → 80 [ACK] Seq=250 Ack=26281 Win=32768 Len=0
31 0.132004355 16.0.0.103 → 48.0.64.102 TCP 60 [TCP ACKed unseen segment] 61161 → 80 [ACK] Seq=250 Ack=29201 Win=32768 Len=0
32 0.132005388 16.0.0.103 → 48.0.64.102 TCP 60 [TCP ACKed unseen segment] 61161 → 80 [RST, ACK] Seq=250 Ack=32095 Win=32768 Len=0

The capture ran for several tens of seconds before and after the HTTP connection shown above, to be sure that all packets TRex sent for that connection were captured and that this wasn't simply out-of-order delivery.

When compared against the avl/delay_10_http_browsing_0.pcap capture file referenced by http_simple.yaml, the last burst of packets from the server to the client should have been 8 packets long:

26 0 22.0.0.3 -> 21.0.0.7 TCP 1518 [TCP segment of a reassembled PDU]
27 0 22.0.0.3 -> 21.0.0.7 TCP 1518 [TCP segment of a reassembled PDU]
28 0 22.0.0.3 -> 21.0.0.7 TCP 1518 [TCP segment of a reassembled PDU]
29 0 22.0.0.3 -> 21.0.0.7 TCP 1518 [TCP segment of a reassembled PDU]
30 0 22.0.0.3 -> 21.0.0.7 TCP 1518 [TCP segment of a reassembled PDU]
31 0 22.0.0.3 -> 21.0.0.7 TCP 1518 [TCP segment of a reassembled PDU]
32 0 22.0.0.3 -> 21.0.0.7 TCP 1518 [TCP segment of a reassembled PDU]
33 0 22.0.0.3 -> 21.0.0.7 HTTP 1492 HTTP/1.1 200 OK (text/html)

But in the capture from the test, that burst was only three packets long. And I've verified in Wireshark that the packets being ACKed (up to relative sequence numbers 26281, 29201, and 32095) aren't anywhere in the capture of packets from the server.
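Wireshark's "ACKed unseen segment" flag can be reproduced offline from the relative sequence numbers in the trace above. A small sketch (hypothetical helper, fed the numbers from this capture) that flags ACKs acknowledging data beyond anything captured from the server:

```python
def unseen_acks(seen_segments, acks):
    """Return client ACK numbers that acknowledge bytes beyond any
    captured server segment (Wireshark's "ACKed unseen segment")."""
    highest_seen = max((seq + length for seq, length in seen_segments), default=1)
    return [ack for ack in acks if ack > highest_seen]

# 17 server data segments of 1460 bytes appear in the capture
# (relative sequence numbering starts at 1)
segments = [(1 + 1460 * i, 1460) for i in range(17)]
acks = [2921, 5841, 8761, 11681, 14601, 17521, 20441, 23361,
        26281, 29201, 32095]
print(unseen_acks(segments, acks))
# -> [26281, 29201, 32095], the ACKs flagged in frames 30-32
```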

I've tried to rule out packet loss anywhere else in the network (Nexus, capture server, etc.). And since this test is only sending ~700 kbps, it's not like I'm really stressing the 100G links, NICs, or the servers. So I hope I'm just missing some obvious TRex configuration option.

Thanks


hanoh haim

Jul 23, 2017, 3:15:03 AM
to joe.r...@gmail.com, TRex Traffic Generator
Hi, 
We have a CX-4 setup (one NIC in loopback) connected back to back.

I ran a short experiment at a lower speed, capturing the traffic with TRex itself.

Here is the diff to make TRex print the received packets:
////////////////////////////////////////////////////
git diff
diff --git a/src/main_dpdk.cpp b/src/main_dpdk.cpp
index dcbd5b2..6619c46 100644
--- a/src/main_dpdk.cpp
+++ b/src/main_dpdk.cpp
@@ -6027,6 +6027,7 @@ int main_test(int argc , char * argv[]){
     for (int i = 0; i < g_trex.m_max_ports; i++) {
         CPhyEthIF *_if = &g_trex.m_ports[i];
         _if->stop_rx_drop_queue();
+        _if->set_port_rcv_all(true);
     }
 
     if ( CGlobalInfo::m_options.is_latency_enabled()
diff --git a/src/stateful_rx_core.cpp b/src/stateful_rx_core.cpp
index a63864c..9b2f996 100644
--- a/src/stateful_rx_core.cpp
+++ b/src/stateful_rx_core.cpp
@@ -746,7 +746,7 @@ void CLatencyManager::handle_rx_pkt(CLatencyManagerPerPort * lp,
                                     rte_mbuf_t * m){
     CRx_check_header *rxc = NULL;
 
-#if 0
+#if 1
     /****************************************/
     uint8_t *p=rte_pktmbuf_mtod(m, uint8_t*);
     uint16_t pkt_size=rte_pktmbuf_pkt_len(m);
////////////////////////////////////////////////////


I used this command on v2.28:

$sudo ./t-rex-64 -f cap2/http_simple.yaml -m 0.1 -d 5 -c 1 -l 1 --iom 0 --software --nc > _http.pcap

The capture file, with a flow filter applied (_http2.pcap), is exactly the same as the original flow.
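In case it helps with cross-checking such self-captures without Wireshark, here is a minimal sketch that walks classic pcap record headers using only the Python standard library (assumes little-endian classic pcap, not pcapng; the demo file is synthetic):

```python
import struct

PCAP_MAGIC_LE = 0xa1b2c3d4  # classic pcap, little-endian, usec timestamps

def packet_lengths(data):
    """Return the captured length of each record in classic pcap bytes."""
    magic, = struct.unpack_from('<I', data, 0)
    if magic != PCAP_MAGIC_LE:
        raise ValueError('not a little-endian classic pcap file')
    lengths, off = [], 24            # skip the 24-byte global header
    while off + 16 <= len(data):     # 16-byte per-record header
        _, _, incl_len, _ = struct.unpack_from('<IIII', data, off)
        lengths.append(incl_len)
        off += 16 + incl_len
    return lengths

# Tiny synthetic example: a global header plus two 60-byte records
header = struct.pack('<IHHiIII', PCAP_MAGIC_LE, 2, 4, 0, 0, 65535, 1)
rec = struct.pack('<IIII', 0, 0, 60, 60) + b'\x00' * 60
print(packet_lengths(header + rec + rec))  # -> [60, 60]
```

Comparing the per-packet length lists of _http.pcap and the template pcap this way would show any missing segments immediately.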

Can you verify that it works for you with -m 0.1, and that it only fails at higher rates?

thanks
Hanoh



--
You received this message because you are subscribed to the Google Groups "TRex Traffic Generator" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trex-tgn+unsubscribe@googlegroups.com.
To post to this group, send email to trex...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/trex-tgn/22c63a23-dc75-4527-a15c-a47aeb7869cd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Hanoh
Sent from my iPhone
_http.pcap
_http2.pcap

hanoh haim

Jul 23, 2017, 3:37:02 AM
to joe.r...@gmail.com, TRex Traffic Generator
Another test I ran on vanilla v2.28:

$sudo ./t-rex-64 -f cap2/http_simple.yaml -m 10000 -d 1000 -c 1 -l 1 --rx-check 8

This command sends a random sample of flows (1/8) to software for verification, and you can see that the flow size is exactly 37 packets, without any errors.



-Per port stats table 
      ports |               0 |               1 
 -----------------------------------------------------------------------------------------
   opackets |        10710714 |        17595409 
     obytes |       903906962 |     25627162158 
   ipackets |        17593754 |        10711719 
     ibytes |     25624752466 |       903991322 
    ierrors |               0 |               0 
    oerrors |               0 |               0 
      Tx Bw |     262.38 Mbps |       7.43 Gbps 

-Global stats enabled 
 Cpu Utilization : 23.0  %  66.8 Gb/core 
 Platform_factor : 1.0  
 Total-Tx        :       7.69 Gbps  
 Total-Rx        :       7.71 Gbps  
 Total-PPS       :       1.03 Mpps  
 Total-CPS       :      27.78 Kcps  

 Expected-PPS    :       1.03 Mpps  
 Expected-CPS    :      27.76 Kcps  
 Expected-BPS    :       7.68 Gbps  

 Active-flows    :     3663  Clients :      255   Socket-util : 0.0228 %    
 Open-flows      :   767541  Servers :    65535   Socket :     3663 Socket/Clients :  14.4 
 drop-rate       :       0.00  bps   
 current time    : 30.9 sec  
 test duration   : 969.1 sec  

-Latency stats enabled 
 Cpu Utilization : 1.7 %  
 if|   tx_ok , rx_ok  , rx check ,error,       latency (usec) ,    Jitter          max window 
   |         ,        ,          ,     ,   average   ,   max  ,    (usec)                     
 ---------------------------------------------------------------------------------------------------------------- 
 0 |       28,      28,   2196604,    0,          3  ,      24,       0      |  3  3  4  4  3  3  3  3  3  3  3  3  3 
 1 |       28,      28,   1337258,    0,          2  ,      25,       1      |  2  4  2  3  4  2  2  2  2  4  2  2  2 

-Rx Check stats enabled 
------------------------------------------------------------------------------------------------------------
rx check:  avg/max/jitter latency,       36  ,    1018,       10      |  104  384  104  104  104  104  106  364  103  330  104  104  103 
---
 active flows:      429, fif:    95791,  drop:        0, errors:        0 
-------------------------------------------------------------------------------



 m_total_rx                              : 12836965 
 m_lookup                                : 12836965 
 m_found                                 : 12490020 
 m_fif                                   : 346945 
 m_add                                   : 346945 
 m_remove                                : 346945 
 
12836965 / 346945 = 37 packets per flow, the same as the template.
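A quick sanity check of those counters (m_fif counts first-in-flow packets, i.e. the number of flows the rx-check saw):

```python
# Counters from the rx-check output above
m_total_rx = 12836965   # total packets seen by rx-check
m_fif = 346945          # first-in-flow events, i.e. flows seen

assert m_total_rx % m_fif == 0          # divides evenly
print(m_total_rx // m_fif)              # -> 37, the template flow length
```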


thanks,
Hanoh

hanoh haim

Jul 23, 2017, 3:40:26 AM
to joe.r...@gmail.com, TRex Traffic Generator
BTW, we have a different OFED version, same OS:

[csi-trex-07]> ofed_info 
MLNX_OFED_LINUX-4.0-1.0.1.0 (OFED-4.0-1.0.1):


Joe Rogers

Jul 23, 2017, 10:56:29 PM
to hanoh haim, TRex Traffic Generator
Thanks for the quick response and the help! I recreated your test CLIs in our environment and saw the same missing-packet issues with the external capture I was running. So, rather than trust that, I turned to the Nexus that sits between the two test NICs and just looked at the packet counters for traffic coming from each NIC. The ports are entirely idle when no tests are running, since this is a dedicated test network. Those counters were spot on: they matched perfectly with the number of flows transmitted and the 37 packets that should have been seen in each flow. It didn't matter whether I was sending 1Mbps or 60Gbps; the counters matched "total # flows * 37". So it looks like the missing-packet problem is with the external capture setup. I'll dig further into that.

I apologize for wasting your time with something that turned out not to be a TRex issue. I should have looked at the switch counters to validate the Wireshark results first.

Thanks again!
output.txt