ASTF test not shutting down cleanly


jsea...@gmail.com

Mar 22, 2018, 5:19:35 PM
to TRex Traffic Generator
Trex devs,
I am trying to test a terminating TCP proxy. We are running TRex in ASTF mode; the profile is posted below.
While under load, with the DUT dropping some frames, TRex does not shut down cleanly and continues to run forever. There are actually two issues. 1) TRex sends keepalives client -> server and the server answers with resets, but the client does not accept those resets. In this case the TRex packets are forwarded through the DUT unmodified. 2) The DUT somehow still has open/established sessions at the end of the test (probably because reset packets from TRex were lost), but when the DUT retransmits data, no reset is received from TRex.

Here is a capture for issue 1). Packet 56 is the TRex server sending the DUT a reset. Packet 57 is the reset the DUT sends out to the TRex client; the DUT closes with SO_LINGER on at 0s to generate a 'forward' reset. That reset is out of window for the TRex client because previous data was lost, so the client rightly drops it. At that point the DUT has no sockets and drops into 'forward' mode. In packet 58 the TRex client sends a TCP keepalive, which the DUT forwards to the TRex server in packet 59. The TRex server responds with a reset in packet 60, which our DUT just forwards. This continues forever.

I haven't captured issue 2 but it seems similar to 1 in that it's related to unclean shutdown in the face of packet loss through terminating proxy DUT.


47 13.266257 16.0.0.181 -> 48.0.0.10 TCP 66 [TCP Dup ACK 41#3] 46254 > http [ACK] Seq=925417152 Ack=1792275948 Win=32768 Len=0 TSval=41 TSecr=12874
48 13.266261 48.0.0.10 -> 16.0.0.181 TCP 1514 [TCP Fast Retransmission] http > 46254 [ACK] Seq=1792275948 Ack=925417152 Win=29631 Len=1448 TSval=13280 TSecr=41[Reassembly error, protocol TCP: New fragment overlaps old data (retransmission?)]
49 13.266265 16.0.0.181 -> 48.0.0.10 TCP 66 [TCP Dup ACK 41#4] 46254 > http [ACK] Seq=925417152 Ack=1792275948 Win=32768 Len=0 TSval=41 TSecr=12874
50 13.266272 16.0.0.181 -> 48.0.0.10 TCP 66 [TCP Dup ACK 41#5] 46254 > http [ACK] Seq=925417152 Ack=1792275948 Win=32768 Len=0 TSval=41 TSecr=12874
51 13.535897 16.0.0.181 -> 48.0.0.10 TCP 66 [TCP Dup ACK 41#6] 46254 > http [ACK] Seq=925417152 Ack=1792275948 Win=32768 Len=0 TSval=41 TSecr=12874
52 13.535913 48.0.0.10 -> 16.0.0.181 TCP 1514 [TCP Retransmission] http > 46254 [ACK] Seq=1792290428 Ack=925417152 Win=29631 Len=1448 TSval=13549 TSecr=41[Reassembly error, protocol TCP: New fragment overlaps old data (retransmission?)]
53 13.535930 16.0.0.181 -> 48.0.0.10 TCP 66 [TCP Dup ACK 41#7] 46254 > http [ACK] Seq=925417152 Ack=1792275948 Win=32768 Len=0 TSval=41 TSecr=12874
54 13.569837 16.0.0.181 -> 48.0.0.10 TCP 66 [TCP Dup ACK 41#8] 46254 > http [ACK] Seq=925417152 Ack=1792275948 Win=32768 Len=0 TSval=42 TSecr=12874
55 13.569850 48.0.0.10 -> 16.0.0.181 TCP 1514 [TCP Retransmission] http > 46254 [PSH, ACK] Seq=1792293324 Ack=925417152 Win=29631 Len=1448 TSval=13583 TSecr=42[Reassembly error, protocol TCP: New fragment overlaps old data (retransmission?)]
56 16.093041 48.0.0.10 -> 16.0.0.181 TCP 60 http > 46254 [RST, ACK] Seq=1792298554 Ack=925417152 Win=32768 Len=0
57 16.163179 48.0.0.10 -> 16.0.0.181 TCP 66 http > 46254 [RST, ACK] Seq=1792297668 Ack=925417152 Win=29631 Len=0 TSval=4287027506 TSecr=42
58 19.529302 16.0.0.181 -> 48.0.0.10 TCP 60 [TCP Keep-Alive] 46254 > http [<None>] Seq=925417151 Win=32768 Len=0
59 19.529312 16.0.0.181 -> 48.0.0.10 TCP 54 [TCP Keep-Alive] 46254 > http [<None>] Seq=925417151 Win=32768 Len=0
60 19.738891 48.0.0.10 -> 16.0.0.181 TCP 60 http > 46254 [RST, ACK] Seq=0 Ack=925417152 Win=32768 Len=0
61 19.738916 48.0.0.10 -> 16.0.0.181 TCP 54 http > 46254 [RST, ACK] Seq=0 Ack=925417152 Win=32768 Len=0
62 25.419347 16.0.0.181 -> 48.0.0.10 TCP 60 [TCP Keep-Alive] 46254 > http [<None>] Seq=925417151 Win=32768 Len=0
63 25.419368 16.0.0.181 -> 48.0.0.10 TCP 54 [TCP Keep-Alive] 46254 > http [<None>] Seq=925417151 Win=32768 Len=0
64 25.419740 48.0.0.10 -> 16.0.0.181 TCP 60 http > 46254 [RST, ACK] Seq=0 Ack=925417152 Win=32768 Len=0
65 25.419761 48.0.0.10 -> 16.0.0.181 TCP 54 http > 46254 [RST, ACK] Seq=0 Ack=925417152 Win=32768 Len=0
66 31.410592 16.0.0.181 -> 48.0.0.10 TCP 60 [TCP Keep-Alive] 46254 > http [<None>] Seq=925417151 Win=32768 Len=0
67 31.410612 16.0.0.181 -> 48.0.0.10 TCP 54 [TCP Keep-Alive] 46254 > http [<None>] Seq=925417151 Win=32768 Len=0
68 31.410981 48.0.0.10 -> 16.0.0.181 TCP 60 http > 46254 [RST, ACK] Seq=0 Ack=925417152 Win=32768 Len=0
69 31.411002 48.0.0.10 -> 16.0.0.181 TCP 54 http > 46254 [RST, ACK] Seq=0 Ack=925417152 Win=32768 Len=0
70 37.400676 16.0.0.181 -> 48.0.0.10 TCP 60 [TCP Keep-Alive] 46254 > http [<None>] Seq=925417151 Win=32768 Len=0
71 37.400698 16.0.0.181 -> 48.0.0.10 TCP 54 [TCP Keep-Alive] 46254 > http [<None>] Seq=925417151 Win=32768 Len=0
72 37.401043 48.0.0.10 -> 16.0.0.181 TCP 60 http > 46254 [RST, ACK] Seq=0 Ack=925417152 Win=32768 Len=0
73 37.401062 48.0.0.10 -> 16.0.0.181 TCP 54 http > 46254 [RST, ACK] Seq=0 Ack=925417152 Win=32768 Len=0

# Example for creating your program by specifying buffers to send, without relying on a pcap file

from trex_astf_lib.api import *


def get_running_ascii(size):
    # build a string of `size` characters cycling through 'A'..'Z'
    s = ''
    c = 65
    for i in range(0, size):
        s += chr(c)
        c += 1
        if c == 91:
            c = 65
    return s


class Prof1():
    def __init__(self):
        pass  # tunables

    def create_profile(self, size=1):
        # we can send either Python bytes type as below:
        http_req = b'GET /3384 HTTP/1.1\r\nHost: 22.0.0.3\r\nConnection: Keep-Alive\r\nUser-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)\r\nAccept: */*\r\nAccept-Language: en-us\r\nAccept-Encoding: gzip, deflate, compress\r\n\r\n'
        # or we can send Python string containing ascii chars, as below:
        length = 24 + size  # 24 = len('<html><pre>') + len('</pre></html>')
        http_response = 'HTTP/1.1 200 OK\r\nServer: Microsoft-IIS/6.0\r\nContent-Type: text/html\r\nContent-Length: ' + str(length) + '\r\n\r\n<html><pre>' + get_running_ascii(size) + '</pre></html>'

        # client commands
        prog_c = ASTFProgram()
        prog_c.connect()
        prog_c.send(http_req)
        prog_c.recv(len(http_response))

        # server commands
        prog_s = ASTFProgram()
        prog_s.recv(len(http_req))
        prog_s.send(http_response)
        prog_s.wait_for_peer_close()

        # ip generator
        ip_gen_c = ASTFIPGenDist(ip_range=["16.0.0.0", "16.0.0.255"], distribution="seq")
        ip_gen_s = ASTFIPGenDist(ip_range=["48.0.0.0", "48.0.0.10"], distribution="seq")
        ip_gen = ASTFIPGen(glob=ASTFIPGenGlobal(ip_offset="1.0.0.0"),
                           dist_client=ip_gen_c,
                           dist_server=ip_gen_s)

        info = ASTFGlobalInfo()
        info.scheduler.accurate = 1

        # template
        temp_c = ASTFTCPClientTemplate(program=prog_c, ip_gen=ip_gen)
        temp_s = ASTFTCPServerTemplate(program=prog_s)  # using default association
        template = ASTFTemplate(client_template=temp_c, server_template=temp_s)

        # profile
        profile = ASTFProfile(default_ip_gen=ip_gen, templates=template,
                              default_c_glob_info=info,
                              default_s_glob_info=info)
        #profile = ASTFProfile(default_ip_gen=ip_gen, templates=template)
        return profile

    def get_profile(self, **kwargs):
        size = kwargs.get('size', 1)
        return self.create_profile(size)


def register():
    return Prof1()

Thank you,
John Searles
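
A quick way to sanity-check the response built by the profile above, without TRex installed, is to confirm that the Content-Length header matches the HTML body. The 24 in "length = 24 + size" has to equal the combined length of the '<html><pre>' and '</pre></html>' wrappers. The sketch below is standalone Python; build_response and declared_length are hypothetical helpers (not part of the TRex API) that only mirror the string-building from the profile.

```python
def get_running_ascii(size):
    """Return `size` characters cycling through A..Z, like the profile's helper."""
    return ''.join(chr(65 + i % 26) for i in range(size))


def build_response(size):
    """Hypothetical helper: build header and body the same way the profile does."""
    body = '<html><pre>' + get_running_ascii(size) + '</pre></html>'
    header = ('HTTP/1.1 200 OK\r\nServer: Microsoft-IIS/6.0\r\n'
              'Content-Type: text/html\r\n'
              'Content-Length: ' + str(24 + size) + '\r\n\r\n')
    return header, body


def declared_length(header):
    """Extract the Content-Length value from the header text."""
    for line in header.split('\r\n'):
        if line.lower().startswith('content-length:'):
            return int(line.split(':', 1)[1])
    raise ValueError('no Content-Length header')


# The declared length should match the real body length for any size.
for size in (1, 26, 1000):
    header, body = build_response(size)
    assert declared_length(header) == len(body)
```

If the wrapper tags in the response are ever changed, the constant 24 has to change with them, otherwise the client-side recv(len(http_response)) and the advertised Content-Length would disagree.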

hanoh haim

Mar 22, 2018, 5:46:18 PM
to jsea...@gmail.com, TRex Traffic Generator
Hi John,
Interesting..
We have extensive tests on the BSD TCP stack, and this scenario is simulated in our unit tests.

Could you help by providing the following:

1. Send the capture in pcap file format.
2. Tell us on which side it was captured.
3. Send the output of the TCP counters every 5 min at termination time.

In case of any error, the keepalive should close the flow on both sides after a few seconds.

In case of high drop rates, TRex will wait for all flows to terminate (active-flow should be zero).


Thanks
Hanoh

--
You received this message because you are subscribed to the Google Groups "TRex Traffic Generator" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trex-tgn+u...@googlegroups.com.
To post to this group, send email to trex...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/trex-tgn/bb5195f1-9117-42af-bb0b-3ecd1714f506%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Hanoh
Sent from my iPhone

hanoh haim

Mar 22, 2018, 6:13:46 PM
to jsea...@gmail.com, TRex Traffic Generator
I think I understand the issue.
The server does not have a flow, so it just responds to the keepalive with a RST with seq+1.

However, the client sends the keepalive with seq+1, so the RST is not in the window. The client drops it *BUT* restarts the keepalive timer.

This creates the endless loop.

I need to look into the unit test to check this out.

Thanks,
Hanoh
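
To make the loop described above concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not TRex or BSD code: rcv_nxt and rcv_wnd are taken from the client's Ack and Win fields in the capture earlier in the thread, the RST uses Seq=0 as in packet 60, and the limit of 5 unanswered probes is an illustrative value. The flag reset_idle_on_dropped_rst is hypothetical; it models "the dropped RST restarts the keepalive timer" versus one possible way of breaking the loop. The actual fix is the one tracked in the TRex issue and is not reproduced here.

```python
MOD = 2 ** 32


def in_window(seq, rcv_nxt, rcv_wnd):
    """True if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd), with 32-bit wraparound."""
    return (seq - rcv_nxt) % MOD < rcv_wnd


def simulate(reset_idle_on_dropped_rst, max_probes=5, rounds=20):
    """Return the keepalive round in which the client gives up, or None if never."""
    rcv_nxt, rcv_wnd = 1792275948, 32768  # client's Ack / Win from the capture
    unanswered = 0
    for rnd in range(1, rounds + 1):
        # Keepalive timer fires: give up if too many probes went unanswered.
        if unanswered >= max_probes:
            return rnd
        # Client sends a probe; the server has no flow and answers RST, Seq=0.
        rst_seq = 0
        if in_window(rst_seq, rcv_nxt, rcv_wnd):
            return rnd  # RST accepted, flow closed
        # RST is out of window and is dropped.
        if reset_idle_on_dropped_rst:
            unanswered = 0   # modeled bug: the dropped RST counts as activity
        else:
            unanswered += 1  # modeled alternative: the probe stays unanswered
    return None


print(simulate(reset_idle_on_dropped_rst=True))   # never gives up within 20 rounds
print(simulate(reset_idle_on_dropped_rst=False))  # gives up once the probe limit is hit
```

In this model the only difference between an endless loop and a bounded shutdown is whether a segment that fails the window check is allowed to count as proof that the peer is alive.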

hanoh haim

Mar 23, 2018, 9:36:26 AM
to sbar...@gmail.com, TRex Traffic Generator
Hi,
I have enough information to fix this.
I'm OOO this week so it will take me a few days.

At close time the logic is the same. It just waits for active-flow to be zero, and it is not.
There is a way to capture in TRex at slow rates using "-v 7 --iom 0".

Thanks,
Hanoh

On Fri, 23 Mar 2018 at 6:21 <sbar...@gmail.com> wrote:
Hi Hanoh,

We've been capturing on our DUT which, unfortunately, grabs very confusing packet captures because it's both to DUT and from DUT on the same interface. Is there a way (with ASTF) to capture packets? If not we can figure something out.

We'll also grab TCP counters as well.

At close time does trex ASTF do anything different from normal runtime in regards to the TCP stack?


Thanks,
Stefan



hanoh haim

Apr 15, 2018, 10:50:15 AM
to Stefan Baranoff, TRex Traffic Generator
Hi Stefan and John,
I had time to look into it today, and it was easy to reproduce with the simulator. A big thanks for finding this issue.
Actually, there was a test that failed due to this issue and I had masked it.

Anyhow, more details can be found here, along with a fix:

https://trex-tgn.cisco.com/youtrack/issue/trex-524

I would appreciate it if you could test it on your system (with tcp.blackhole=0 and tcp.blackhole=2).

I will push it to master.
thanks,
Hanoh



John Searles

Apr 17, 2018, 1:58:00 PM
to hanoh haim, Stefan Baranoff, TRex Traffic Generator
Hanoh,

I have been doing some testing with your branch and it seems to be better with the default tcp.blackhole setting.   It is sending resets correctly.  Thank you for looking into it/fixing it.

However, I can generate a case where under high load (so dropping occurs) trex thinks it closes the connection, but our application has it still open.  The application will resend data to trex client, but since trex does not know about the flow it will just drop it and increment err_cwf.  I would expect trex to generate a reset packet saying it does not know about the connection.  This would cause our application to terminate the flow and stop trying to resend to trex.

Thanks,
John Searles


hanoh haim

Apr 17, 2018, 2:01:27 PM
to John Searles, Stefan Baranoff, TRex Traffic Generator
Hi John,
Yes, you are right, the code is not symmetric.
I was thinking about this issue. Will send another patch for that.

Thanks,
Hanoh


hanoh haim

Apr 18, 2018, 3:42:29 AM
to John Searles, Stefan Baranoff, TRex Traffic Generator


John Searles

Apr 18, 2018, 5:19:13 PM
to hanoh haim, Stefan Baranoff, TRex Traffic Generator
Hanoh,

I applied this  diff to your branch and ran it.  The reset being generated has the same ip/port for both the source and dest address.  So this reset packet goes nowhere.

Thanks,
John Searles

hanoh haim

Apr 20, 2018, 2:57:41 AM
to John Searles, Stefan Baranoff, TRex Traffic Generator
Hi John, 

This should solve it; you should apply it on top of the old patch:
https://github.com/cisco-system-traffic-generator/trex-core/commit/27879cdf9b57b7e871477b8b43efd38e6b77ac6a

Hanoh

John Searles

Apr 23, 2018, 7:48:59 PM
to hanoh haim, Stefan Baranoff, TRex Traffic Generator
Hanoh,

I tested the patch and that one works much better. It generates reset packets with the right source/dest address. Related to keeping sessions open, I just saw a case where there were still active connections after the test had finished. I hit T and saw the client tcps_keeptimeo incrementing, but no keep-alive packets were being sent. I confirmed this both by looking at the stats (no other field was incrementing) and with tcpdump, which showed no traffic on the link. This was running on your branch with the reset patches applied.

Is there a way to know the 5-tuples of the active connections? This would make looking for them in packet captures easier.

Thanks,
John 

hanoh haim

Apr 24, 2018, 1:15:38 PM
to John Searles, Stefan Baranoff, TRex Traffic Generator
Hi John,
The keepalive counter "tcps_keeptimeo" goes up every 0.5 sec (in case the flow is stalled). Every 10 ticks (~5 sec) you should see a keepalive packet. After 5 packets (depending on the state of the TCP connection) the connection will be dropped.

So after ~30 seconds all the flows should be closed. Isn't this the case?

Thanks,
Hanoh
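
The ~30 second figure follows from the numbers quoted above, and can be written out as a small worked calculation in Python. The constant names are made up for illustration, and the assumption that one full idle interval passes before the first probe is mine (it is what makes the total come to ~30 sec rather than 25 sec), so treat this as a sketch of the arithmetic rather than a description of the TRex timers.

```python
TICK_SEC = 0.5            # tcps_keeptimeo tick, per the message above
TICKS_PER_PROBE = 10      # a keepalive packet every 10 ticks
PROBES_BEFORE_DROP = 5    # connection dropped after 5 keepalive packets

# Time between two keepalive packets.
probe_interval = TICK_SEC * TICKS_PER_PROBE

# Time spent sending the 5 probes.
probing_time = probe_interval * PROBES_BEFORE_DROP

# Assumption: one idle interval elapses before the first probe is sent.
expected_close_time = probe_interval + probing_time

print(probe_interval, probing_time, expected_close_time)
```

Any stalled flow that is still open well beyond this bound is therefore not being handled by the normal keepalive path.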


John Searles

Apr 24, 2018, 2:31:15 PM
to hanoh haim, Stefan Baranoff, TRex Traffic Generator
Hanoh,

That is not what I was seeing. I have waited 2-3 mins and nothing changes. No keep-alive packets are being sent and the keepalive probe counter is not incrementing; only tcps_keeptimeo is incrementing. I could not trigger this behaviour with a test running 30 seconds, but I could trigger it with a test running 60 seconds. I have attached two stats snapshots from 't', taken about 30 seconds apart, from a test with 300 or so active connections stuck about 2 mins after the test should have finished.

    tcps_connattempt |     359401 |          0 |  connections initiated
        tcps_accepts |          0 |     254768 |  connections accepted
       tcps_connects |     242854 |     235716 |  connections established
         tcps_closed |     358793 |     254768 |  conn. closed (includes drops)
      tcps_segstimed |     633870 |     601051 |  segs where we tried to get rtt
     tcps_rttupdated |     344480 |     762174 |  times we succeeded
         tcps_delack |      48739 |          0 |  delayed acks sent
       tcps_sndtotal |    2217975 |    3014434 |  total packets sent
        tcps_sndpack |     242854 |    1682192 |  data packets sent
        tcps_sndbyte |   60470646 | 2506401030 |  data bytes sent by application
     tcps_sndbyte_ok |   60470646 | 2408934851 |  data bytes sent by tcp
        tcps_sndctrl |    1196649 |     316984 |  control (SYN|FIN|RST) packets sent
        tcps_sndacks |     662200 |     446464 |  ack-only packets sent
        tcps_rcvpack |     477402 |     153555 |  packets received in sequence
        tcps_rcvbyte |  685802251 |   31021665 |  bytes received in sequence
     tcps_rcvackpack |     252555 |     762174 |  rcvd ack packets
     tcps_rcvackbyte |   55092495 | 2087961289 |  tx bytes acked by rcvd acks
  tcps_rcvackbyte_of |      31300 |     264464 |  tx bytes acked by rcvd acks - overflow acked
        tcps_preddat |     306922 |       3665 |  times hdr predict ok for data pkts
          tcps_drops |     211558 |     225973 |  connections dropped
      tcps_conndrops |     116547 |          0 | *embryonic connections dropped
    tcps_timeoutdrop |       4844 |        215 | *conn. dropped in rxmt timeout
     tcps_rexmttimeo |     123869 |     158789 | *retransmit timeouts
 tcps_rexmttimeo_syn |     624042 |      78027 | *retransmit SYN timeouts
      tcps_keeptimeo |     600647 |     460911 | *keepalive timeouts
      tcps_keepprobe |     254226 |     250289 | *keepalive probes sent
      tcps_keepdrops |     317302 |     209523 | *connections dropped in keepalive
  tcps_sndrexmitpack |     116272 |     568794 | *data packets retransmitted
  tcps_sndrexmitbyte |   28951728 |  522840465 | *data bytes retransmitted
     tcps_rcvduppack |     134984 |     165497 | *duplicate-only packets received
     tcps_rcvdupbyte |   16105606 |    2382432 | *duplicate-only bytes received
 tcps_rcvpartduppack |         59 |          0 | *packets with some duplicate data
 tcps_rcvpartdupbyte |      83090 |          0 | *dup. bytes in part-dup. packets
      tcps_rcvoopack |     173697 |          0 | *out-of-order packets received
      tcps_rcvoobyte |  250136479 |          0 | *out-of-order bytes received
tcps_rcvpackafterwin |         29 |          0 | *packets with data after window
      tcps_rcvdupack |      10286 |      32513 | *rcvd duplicate acks
  tcps_rcvacktoomuch |       1184 |         30 | *rcvd acks for unsent data
      tcps_rcvwinupd |          0 |       9996 | *rcvd window update packets
       tcps_pawsdrop |          1 |       1246 | *segments dropped due to PAWS
      tcps_reasalloc |      16589 |          0 | *allocate tcp reasembly ctx
       tcps_reasfree |      16589 |          0 | *free tcp reasembly ctx
                   - |        --- |        --- |
                 UDP |        --- |        --- |
                   - |        --- |        --- |
                   - |        --- |        --- |
          Flow Table |        --- |        --- |
                   - |        --- |        --- |
             err_cwf |      94879 |          0 | *client pkt without flow
          err_no_syn |          0 |      89602 | *server first flow packet with no SYN
      redirect_rx_ok |          1 |          1 |  redirect to rx OK
    err_rx_throttled |        297 |        278 |  rx thread was throttled


10 seconds or so later



    tcps_connattempt |     359401 |          0 |  connections initiated
        tcps_accepts |          0 |     254768 |  connections accepted
       tcps_connects |     242854 |     235716 |  connections established
         tcps_closed |     358793 |     254768 |  conn. closed (includes drops)
      tcps_segstimed |     633870 |     601051 |  segs where we tried to get rtt
     tcps_rttupdated |     344480 |     762174 |  times we succeeded
         tcps_delack |      48739 |          0 |  delayed acks sent
       tcps_sndtotal |    2217975 |    3014434 |  total packets sent
        tcps_sndpack |     242854 |    1682192 |  data packets sent
        tcps_sndbyte |   60470646 | 2506401030 |  data bytes sent by application
     tcps_sndbyte_ok |   60470646 | 2408934851 |  data bytes sent by tcp
        tcps_sndctrl |    1196649 |     316984 |  control (SYN|FIN|RST) packets sent
        tcps_sndacks |     662200 |     446464 |  ack-only packets sent
        tcps_rcvpack |     477402 |     153555 |  packets received in sequence
        tcps_rcvbyte |  685802251 |   31021665 |  bytes received in sequence
     tcps_rcvackpack |     252555 |     762174 |  rcvd ack packets
     tcps_rcvackbyte |   55092495 | 2087961289 |  tx bytes acked by rcvd acks
  tcps_rcvackbyte_of |      31300 |     264464 |  tx bytes acked by rcvd acks - overflow acked
        tcps_preddat |     306922 |       3665 |  times hdr predict ok for data pkts
          tcps_drops |     211558 |     225973 |  connections dropped
      tcps_conndrops |     116547 |          0 | *embryonic connections dropped
    tcps_timeoutdrop |       4844 |        215 | *conn. dropped in rxmt timeout
     tcps_rexmttimeo |     123869 |     158789 | *retransmit timeouts
 tcps_rexmttimeo_syn |     624042 |      78027 | *retransmit SYN timeouts
      tcps_keeptimeo |     601749 |     460911 | *keepalive timeouts
      tcps_keepprobe |     254226 |     250289 | *keepalive probes sent
      tcps_keepdrops |     317302 |     209523 | *connections dropped in keepalive
  tcps_sndrexmitpack |     116272 |     568794 | *data packets retransmitted
  tcps_sndrexmitbyte |   28951728 |  522840465 | *data bytes retransmitted
     tcps_rcvduppack |     134984 |     165497 | *duplicate-only packets received
     tcps_rcvdupbyte |   16105606 |    2382432 | *duplicate-only bytes received
 tcps_rcvpartduppack |         59 |          0 | *packets with some duplicate data
 tcps_rcvpartdupbyte |      83090 |          0 | *dup. bytes in part-dup. packets
      tcps_rcvoopack |     173697 |          0 | *out-of-order packets received
      tcps_rcvoobyte |  250136479 |          0 | *out-of-order bytes received
tcps_rcvpackafterwin |         29 |          0 | *packets with data after window
      tcps_rcvdupack |      10286 |      32513 | *rcvd duplicate acks
  tcps_rcvacktoomuch |       1184 |         30 | *rcvd acks for unsent data
      tcps_rcvwinupd |          0 |       9996 | *rcvd window update packets
       tcps_pawsdrop |          1 |       1246 | *segments dropped due to PAWS
      tcps_reasalloc |      16589 |          0 | *allocate tcp reasembly ctx
       tcps_reasfree |      16589 |          0 | *free tcp reasembly ctx
                   - |        --- |        --- |
                 UDP |        --- |        --- |
                   - |        --- |        --- |
                   - |        --- |        --- |
          Flow Table |        --- |        --- |
                   - |        --- |        --- |
             err_cwf |      94879 |          0 | *client pkt without flow
          err_no_syn |          0 |      89602 | *server first flow packet with no SYN
      redirect_rx_ok |          1 |          1 |  redirect to rx OK
    err_rx_throttled |        297 |        278 |  rx thread was throttled
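
Spotting the one counter that moved between two dumps like these is easier with a small script. The following Python sketch parses rows of the form "name | value | value | description" and reports per-column deltas. Only a handful of rows from the two snapshots above are embedded to keep it short, and the reading of the two numeric columns as client and server is an inference from the post (the client tcps_keeptimeo is the one said to be incrementing). The function names are hypothetical.

```python
SNAPSHOT_1 = """
tcps_connattempt | 359401 | 0 | connections initiated
tcps_closed | 358793 | 254768 | conn. closed (includes drops)
tcps_keeptimeo | 600647 | 460911 | *keepalive timeouts
tcps_keepprobe | 254226 | 250289 | *keepalive probes sent
tcps_keepdrops | 317302 | 209523 | *connections dropped in keepalive
Flow Table | --- | --- |
"""

SNAPSHOT_2 = SNAPSHOT_1.replace("600647", "601749")  # value from the second dump


def parse_counters(text):
    """Map counter name -> (client, server); skip separator and section rows."""
    counters = {}
    for line in text.strip().splitlines():
        cols = [c.strip() for c in line.split("|")]
        if len(cols) < 3 or not (cols[1].isdigit() and cols[2].isdigit()):
            continue
        counters[cols[0]] = (int(cols[1]), int(cols[2]))
    return counters


def changed_counters(before, after):
    """Return {name: (client_delta, server_delta)} for counters that moved."""
    deltas = {}
    for name, (c1, s1) in before.items():
        c2, s2 = after.get(name, (c1, s1))
        if (c1, s1) != (c2, s2):
            deltas[name] = (c2 - c1, s2 - s1)
    return deltas


print(changed_counters(parse_counters(SNAPSHOT_1), parse_counters(SNAPSHOT_2)))
```

For the rows included here, the only difference is the first-column tcps_keeptimeo value, which is consistent with the description in the post: keepalive timeouts keep firing while tcps_keepprobe and tcps_keepdrops stay flat.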




hanoh haim

Apr 24, 2018, 4:18:54 PM
to John Searles, Stefan Baranoff, TRex Traffic Generator
I've looked into the code.
It seems that the BSD guys have chosen to protect the flow with keepalive only when the flow is in the *established* state (need to check the RFC).
This could create zombie flows when the flow is in one of the FIN/ACK states.

Could you try this:

In this line, remove the condition of <= TCPS_CLOSE_WAIT:
            tp->t_state <= TCPS_CLOSE_WAIT) {

Thanks,
Hanoh



hanoh haim

Apr 25, 2018, 8:21:28 AM
to John Searles, Stefan Baranoff, TRex Traffic Generator
Hi John, 
I've verified the solution on the TRex simulator and it works.

Let me explain the keepalive hole in the BSD implementation, which explains the counters you sent.

First, if any packet sent by the client or server is not ACKed, the flow will be dropped after 5*5 sec (~25 sec). This is the TCP retry timer.

So let's take the default case:

FIN-ACK ->
   <-FIN-ACK
ACK->

If any of those packets is dropped, there is no need for keepalive. The retry or 2MSL timer will handle this.

However, in this case:

FIN-ACK->
<-ACK
<-FIN-ACK
->ACK

If the server-side <-FIN/ACK is somehow dropped, the client flow will stay in the half-closed state forever, not guarded by the keepalive timer.

So this is the *only* way to get into this scenario.

The fix is simple:

[csi-sceasr-b81]> git diff
diff --git a/doc/images/icons/Thumbs.db b/doc/images/icons/Thumbs.db
index a348790..5273f4a 100755
Binary files a/doc/images/icons/Thumbs.db and b/doc/images/icons/Thumbs.db differ
diff --git a/src/44bsd/tcp_timer.cpp b/src/44bsd/tcp_timer.cpp
index 6c46847..cd2d117 100644
--- a/src/44bsd/tcp_timer.cpp
+++ b/src/44bsd/tcp_timer.cpp
@@ -221,8 +221,7 @@ tcp_timers(CTcpPerThreadCtx * ctx,struct tcpcb *tp, int timer){
         INC_STAT(ctx,tcps_keeptimeo);
         if (tp->t_state < TCPS_ESTABLISHED)
             goto dropit;
-        if (tp->m_socket.so_options & US_SO_KEEPALIVE &&
-            tp->t_state <= TCPS_CLOSE_WAIT) {
+        if (tp->m_socket.so_options & US_SO_KEEPALIVE) {
                 if (tp->t_idle >= ctx->tcp_keepidle + ctx->tcp_maxidle)


BTW, in the Linux TCP implementation this issue does not exist.

thanks,
Hanoh
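
The effect of the one-line change in the diff above can be illustrated with a toy model of the keepalive branch. This is a Python sketch, not the TRex code. The numeric ordering of the states is assumed to follow the classic 4.4BSD tcp_fsm.h (TRex's own header may differ), and the "rearm_only" outcome is my reading of the surrounding diff context, where the timer is simply set back to tcp_keepidle when the condition is false. The function keepalive_action and its patched flag are hypothetical names.

```python
from enum import IntEnum


class TcpState(IntEnum):
    # Assumed ordering, as in the classic 4.4BSD tcp_fsm.h.
    CLOSED = 0
    LISTEN = 1
    SYN_SENT = 2
    SYN_RECEIVED = 3
    ESTABLISHED = 4
    CLOSE_WAIT = 5
    FIN_WAIT_1 = 6
    CLOSING = 7
    LAST_ACK = 8
    FIN_WAIT_2 = 9
    TIME_WAIT = 10


def keepalive_action(state, keepalive_on=True, patched=False):
    """What the keepalive timer does for a flow in `state` (toy model)."""
    if state < TcpState.ESTABLISHED:
        return "drop"
    if keepalive_on and (patched or state <= TcpState.CLOSE_WAIT):
        return "probe_or_drop"   # send a probe, or drop once idle too long
    return "rearm_only"          # timer restarts; no probe, no drop


unguarded = [s.name for s in TcpState if keepalive_action(s) == "rearm_only"]
unguarded_patched = [s.name for s in TcpState
                     if keepalive_action(s, patched=True) == "rearm_only"]

print(unguarded)          # states where the timer fires but never probes or drops
print(unguarded_patched)  # empty once the state condition is removed
```

Under these assumptions, a flow stuck in any of the closing states would increment the keepalive timeout counter on every timer expiry without ever sending a probe or being dropped, which matches the counters posted earlier in the thread.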

hanoh haim

Apr 25, 2018, 8:27:15 AM
to John Searles, Stefan Baranoff, TRex Traffic Generator

hanoh haim

Apr 26, 2018, 10:45:58 AM
to John Searles, Stefan Baranoff, TRex Traffic Generator
Hi John, 



FIN-ACK->

  <-ACK

<-FIN-ACK

->ACK

 

Thinking more about the *lost* server FIN-ACK packet that could create the keepalive hole: the only way to create this state is for the server to "crash" exactly when (just before) it sends the FIN-ACK packet. Otherwise, it will keep retransmitting it. The probability that *all* the FIN-ACK packets are dropped by chance is very low. So this means that your DUT terminates the server using RST (wrongly?).

Is that possible?


thanks

Hanoh

John Searles

Apr 26, 2018, 10:55:24 AM
to hanoh haim, Stefan Baranoff, TRex Traffic Generator
Hanoh,

With the change you suggested, I have not yet had a case where active connections fail to close out. I still have more testing to do to confirm that.

In regards to your question, is there a way to tell which 5-tuples are considered active? This will allow me to look through the pcap and get a better understanding of what is happening.

Thanks,
John

hanoh haim

Apr 26, 2018, 11:11:36 AM
to John Searles, Stefan Baranoff, TRex Traffic Generator

A quick solution would be to print the tuples of the flows that the keepalive timer terminates in this state (> CLOSE_WAIT).

You should add the code here:

Try this code with "--iom 0" in the CLI (it will show you the tuples at the end of the test).


$> git diff
diff --git a/src/44bsd/tcp_timer.cpp b/src/44bsd/tcp_timer.cpp
index cd2d117..79e7566 100644
--- a/src/44bsd/tcp_timer.cpp
+++ b/src/44bsd/tcp_timer.cpp
@@ -244,6 +244,10 @@ tcp_timers(CTcpPerThreadCtx * ctx,struct tcpcb *tp, int timer){
             tp->t_timer[TCPT_KEEP] = ctx->tcp_keepidle;
         break;
     dropit:
+        if (tp->t_state > TCPS_CLOSE_WAIT) {
+            CFlowTemplate * tm=&tp->m_flow->m_template;
+            printf("keepliave termination : [d:%x:s:%x:%x:%x] state: %s \n",tm->get_dst_ipv4(),tm->get_src_ipv4(),tm->get_dst_port(),tm->get_src_port(),tcp_get_tcpstate()[tp->t_state]);
+        }
         INC_STAT(ctx,tcps_keepdrops);
         tp = tcp_drop_now(ctx,tp, TCP_US_ETIMEDOUT);
         break;
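
Because the printf in the diff above prints the addresses and ports with %x, the output is hexadecimal and awkward to match against a pcap. The Python sketch below decodes such a line into dotted-quad addresses and decimal ports. It assumes the getters return host-order integers and that the state is printed as a single word; the sample line is hypothetical (built from the 16.0.0.181:46254 / 48.0.0.10:80 flow in the capture earlier in the thread), not real TRex output, and parse_termination_line is a made-up helper name.

```python
import ipaddress
import re

# Matches the "[d:%x:s:%x:%x:%x] state: %s" part of the debug line.
LINE_RE = re.compile(
    r"\[d:([0-9a-fA-F]+):s:([0-9a-fA-F]+):([0-9a-fA-F]+):([0-9a-fA-F]+)\]"
    r"\s+state:\s+(\S+)"
)


def parse_termination_line(line):
    """Decode one debug line into readable fields, or return None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    dst_ip, src_ip, dst_port, src_port = (int(g, 16) for g in m.groups()[:4])
    return {
        "dst_ip": str(ipaddress.IPv4Address(dst_ip)),
        "src_ip": str(ipaddress.IPv4Address(src_ip)),
        "dst_port": dst_port,
        "src_port": src_port,
        "state": m.group(5),
    }


# Hypothetical sample line, spelled the way the printf format string spells it.
sample = "keepliave termination : [d:3000000a:s:100000b5:50:b4ae] state: FIN_WAIT_2 "
print(parse_termination_line(sample))
```

The decoded addresses and ports can then be used directly in a capture filter to pull out the flow in question.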



John Searles

Apr 26, 2018, 3:01:15 PM
to hanoh haim, Stefan Baranoff, TRex Traffic Generator
Hanoh,

This is testing under load and DUT is dropping a large number of packets (running a debug with verbose logging version of software). There are three states we're seeing:

LAST_ACK: Looks like trex server is usually in this state. The DUT client side capture shows us sending a FIN/ACK and two retransmissions at +2 and +4 seconds, receiving a keep-alive, sending 3 more re-transmissions, and after the last one receiving a RST.
  -- It is possible that for some reason every FIN/ACK back from trex server to us is dropped on these flows but I would expect near the end of the test after load has dropped that they would get through. Clearly trex thinks it sent its FIN but we never saw it - is it possible that FIN/ACK retransmission isn't happening?

FIN_WAIT1: Looks like trex client is usually in this state. The DUT server side capture shows us getting the FIN/ACK and responding with an ACK, a FIN/ACK, and then two retransmissions of the FIN/ACKs. Then at last FIN/ACK + 6s we see a RST from trex client.
 -- Interestingly enough on some of these flows near the end of test when DUT client side sends the FIN/ACK there is no response from the trex server. After a few re-transmissions one of them elicits a RST in response. Other flows that are mid-test do not have this.

FIN_WAIT2: Again, the trex client is usually in this state. Spot checking these, it appears similar to FIN_WAIT1, but the other half of the session (trex server/DUT client) sees a RST from the trex server and proxies that through to the trex client to kill the session (which then returns a RST because trex already tore down the session).

Thanks,
Stefan/John


hanoh haim

Apr 26, 2018, 5:02:31 PM
to John Searles, Stefan Baranoff, TRex Traffic Generator
Hi,
Quick answer.
In the current implementation TRex should not drop packets on the Tx side, but there could be drops on the Rx side (there should be a global port ierror counter in this case).

So in the first case, there should be a retry timer for the FIN/ACK in case the TRex server didn't get an ACK for the FIN-ACK (I've actually reviewed this while looking for the keepalive hole).

I have a random drop test that verifies that all L7 data is valid under many drop percentages (it does not simulate the keepalive hole, because that requires simulating a server *crash* at a *specific* packet).

If you can send a pcap with something that does not make sense, I would be able to simulate it.

Thanks,
Hanoh