Bug of RTO calculation in tcp-socket-base.cc

394 views
Skip to first unread message

Zhou Charles

unread,
Jan 20, 2015, 3:54:32 PM1/20/15
to ns-3-...@googlegroups.com
Hi everyone:

Actually I am not sure if it is a bug or it is just I misunderstand the program. When doing TCP performance evaluation I found that the RTO is always 1s, which the actual value should be varying according to the  RTT and the estimation error (RFC 2988).

After looking the code, I found the problem is that although after every new ACK, the method to estimate RTO is called (m_rto = m_rtt->RetransmitTimeout (), the method is implemented in rtt-estimator.cc) , the value of m_variance in the method RetransmitTimeout() is never changed. According to RttMeanDeviation::RetransmitTimeout (), the estimation of RTO is m_currentEstimatedRtt.ToInteger (Time::MS) + 4 * m_variance.ToInteger (Time::MS). As m_variance is always 0, there is no change in estimation of RTO. (I printed part of results as followings)



The value of m_variance is changed in method RttMeanDeviation::Measurement (Time m). I think  method Measurement (m) should be called every new ACK arrives, so that we could re-estimate the value of RTO based on the current estimation and the error in current estimation. 

I am not sure whether I am right or not. Any suggestion would be a great help. Thanks!!!

Best
Fan


Tommaso Pecorella

unread,
Jan 20, 2015, 4:17:18 PM1/20/15
to ns-3-...@googlegroups.com, Natale Patriciello
Hi,

if you're using a very recent ns-3 version, then you're missing one thing: the TimeStamp TCP Option.

    .AddAttribute ("Timestamp", "Enable or disable Timestamp option",
                   
BooleanValue (true),
                   
MakeBooleanAccessor (&TcpSocketBase::m_timestampEnabled),
                   
MakeBooleanChecker ())

If this option is used, then a few bytes for each packet are lost, but the RTT is not estimated, it's measured:

void
TcpSocketBase::EstimateRtt (const TcpHeader& tcpHeader)
{
 
Time nextRtt;
 
if (m_timestampEnabled)
   
{
      nextRtt
= TcpOptionTS::ElapsedTimeFromTsValue (m_lastEchoedTime);
   
}
 
else
   
{
     
// Use m_rtt for the estimation. Note, RTT of duplicated acknowledgement
     
// (which should be ignored) is handled by m_rtt.
      nextRtt
=  m_rtt->EstimateRttFromSeq (tcpHeader.GetAckNumber () );
   
}

 
//nextRtt will be zero for dup acks.  Don't want to update lastRtt in that case
 
//but still needed to do list clearing that is done in EstimateRttFromSeq.
 
if(nextRtt != Time (0))
 
{
    m_lastRtt
= nextRtt;
    NS_LOG_FUNCTION
(this << m_lastRtt);
 
}
}


You could argue that even if the TimeStamp option is used the RTT should be smoothed in some way, but right now it's not the case. 

Hope this helps,

T.

Zhou Charles

unread,
Jan 20, 2015, 4:48:49 PM1/20/15
to ns-3-...@googlegroups.com, natale.pa...@unimore.it
Hi Tommaso:

Thank you very much for your prompt response!

However, the problem is not actually about the RTT estimation. I understand that the RTT is measured rather than estimated, which is of course more accurate. 

The problem is the value of RTO. Every time senders send a new packet, sender should estimate the value of RTO to decide "how long should I wait for the ACK?". The information I know is the exact value of RTT of last round (m) and the estimation value of RTT of last round (m_currentEstimatedRtt). From these two values we could calculate m_variance, which is average of the error between m and m_currentEstimatedRtt. So we could predict RTO in next round using m_currentEstimatedRtt + 4*m_variance.  These can be refereed from RFC 2988

However, in the program of tcp-socket-base.cc, the m_variance is never calculated, which cause the m_currentEstimatedRtt always be its initial value (1s). 

Did I miss anything?

Best
Fan

在 2015年1月20日星期二 UTC-5下午4:17:18,Tommaso Pecorella写道:

Tommaso Pecorella

unread,
Jan 21, 2015, 12:46:11 AM1/21/15
to ns-3-...@googlegroups.com, natale.pa...@unimore.it, Tom Henderson
Gotcha !
And you found a quite horrible bug too.

Let me explain. When TimeStamp is used, the RTT estimator is made useless. As is, it is not tracking anymore the RTT. That's why you see only a "1" value and no variance. The object is there, but it's not used.
However, and THIS is the real bug, the RTO is calculated in various places, but in a couple of them...
   m_rto = m_rtt->RetransmitTimeout ();

Of course this is a bug. m_rtt wasn't feed with any value, so it's just returning the initial values.

Cheers,

T.

Tommaso Pecorella

unread,
Jan 21, 2015, 12:48:18 AM1/21/15
to ns-3-...@googlegroups.com, natale.pa...@unimore.it, tomh...@gmail.com
For the records, I opened a bug in Bugzilla: https://www.nsnam.org/bugzilla/show_bug.cgi?id=2041

T.

Zhou Charles

unread,
Jan 21, 2015, 12:51:26 AM1/21/15
to ns-3-...@googlegroups.com
Thank you very much. This problem has really bothered me for a little while.

I will try to fix it, if there is any progress I would be glad to tell you.  

Best
Fan

--
You received this message because you are subscribed to a topic in the Google Groups "ns-3-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/ns-3-users/YR78eyVkg08/unsubscribe.
To unsubscribe from this group and all its topics, send an email to ns-3-users+...@googlegroups.com.
To post to this group, send email to ns-3-...@googlegroups.com.
Visit this group at http://groups.google.com/group/ns-3-users.
For more options, visit https://groups.google.com/d/optout.

Aniesh Chawla

unread,
Jan 21, 2015, 2:31:18 AM1/21/15
to ns-3-...@googlegroups.com
Hi tommaso and Zhou,
I don't think there is bug in the code:
look at this piece of line :
at line 297 of rtt-estimator.cc file of ns-3.21

if (temp < m_minRto.ToInteger (Time::MS))
    {
      temp = m_minRto.ToInteger (Time::MS);
    }


so min rto is set as 1 sec and that is why this is there. Please let me know if my understanding is correct

Aniesh

Nat P

unread,
Jan 21, 2015, 3:42:03 AM1/21/15
to ns-3-...@googlegroups.com


Il giorno mercoledì 21 gennaio 2015 08:31:18 UTC+1, Aniesh Chawla ha scritto:
Hi tommaso and Zhou,
I don't think there is bug in the code:
look at this piece of line :
at line 297 of rtt-estimator.cc file of ns-3.21

if (temp < m_minRto.ToInteger (Time::MS))
    {
      temp = m_minRto.ToInteger (Time::MS);
    }


so min rto is set as 1 sec and that is why this is there. Please let me know if my understanding is correct

Aniesh

I wish you were right, but in fact the bug is alive (and kicking us behind). Let me explain a bit more: the logic of RTT and RTO are in the same place (rtt-estimator.cc). When using Timestamp, some functions inside rtt-estimator.cc are not called (replaced by call to timestamp ones). This way, the RTO is never updated, and so the same value is returned along the entire communications. 1 line patch (easy):


TcpSocketBase::EstimateRtt (const TcpHeader& tcpHeader)
{
  Time nextRtt;

  if (m_timestampEnabled)
    {
      nextRtt = TcpOptionTS::ElapsedTimeFromTsValue (m_lastEchoedTime);
+    m_rtt->EstimateRttFromSeq (tcpHeader.GetAckNumber () );

    }
  else
    {
      // Use m_rtt for the estimation. Note, RTT of duplicated acknowledgement
      // (which should be ignored) is handled by m_rtt.
      nextRtt =  m_rtt->EstimateRttFromSeq (tcpHeader.GetAckNumber () );
    }

But, at this point, I think the time for a little refactoring (splitting out RTT logic from RTO logic) is arrived.

Can we do it before the release?

Nat

Tommaso Pecorella

unread,
Jan 21, 2015, 5:14:48 AM1/21/15
to ns-3-...@googlegroups.com
Hi Nat,

I don't know if we have enough time for a full refactory, plus the testing.

The 1-line patch could work, but it kinda miss the point of having the TimeStamp option. I'd rather feed the RTT estimator with the measured RTT, e.g.,
m_rtt->SetCurrentEstimate (nextRtt);
Mind that even the values from TimeStamp option are measures (just like the ones derived from the Ack Rx time), so they can be imprecise, have a variance, change over time and so on.
As a consequence, using the "standard" way of smoothing makes sense. 

I don't have the time right now to check either solution, but a check is mandatory.

Cheers,

T.

Nat P

unread,
Jan 21, 2015, 6:17:57 AM1/21/15
to ns-3-...@googlegroups.com


Il giorno mercoledì 21 gennaio 2015 11:14:48 UTC+1, Tommaso Pecorella ha scritto:
Hi Nat,

I don't know if we have enough time for a full refactory, plus the testing.

The 1-line patch could work, but it kinda miss the point of having the TimeStamp option. I'd rather feed the RTT estimator with the measured RTT, e.g.,
m_rtt->SetCurrentEstimate (nextRtt);
Mind that even the values from TimeStamp option are measures (just like the ones derived from the Ack Rx time), so they can be imprecise, have a variance, change over time and so on.
As a consequence, using the "standard" way of smoothing makes sense. 


The only solution that, without refactoring the rtt/rto code (IMHO m_rtt->GetRTO () is a non-sense) will resolve this problem is calling in some way Measurement (m), where m is the rtt that we have. Actually, it is called only inside EstimateRTT; 

Using this line


m_rtt->EstimateRttFromSeq (tcpHeader.GetAckNumber () );

will have the effect of only "feeding" the rto calculation, but the rtt used would be the one obtained with timestamp (nextRtt is updated from the Timestamp option).

Tommaso Pecorella

unread,
Jan 21, 2015, 7:30:33 AM1/21/15
to ns-3-...@googlegroups.com
Not entirely correct: m_lastRtt may be updated with the "correct" value, but m_lastRtt is not used in the RTT estimation.
It is used in the TcpWestwood::EstimateRtt, which isn't really used to estimate the RTT, it's an expansion of the TcpSocketBase::EstimateRtt and it is used to update the bandwidth estimation.

Summarizing: changing m_lastRtt doesn't affect how the RTT estimator is doing its calcs, unless you push the value into it. Calling EstimateRttFromSeq will use the old method, discarding the TimeStamp data.

Anyway, the point isn't who's right. The point is how to fix the bug and improve the code.
 
T.

Zhou Charles

unread,
Jan 21, 2015, 7:46:52 PM1/21/15
to ns-3-...@googlegroups.com
Hi Tommaso:

A quick fix to this problem is to move m_rto = m_rtt->RetransmitTimeout () from NewAck function to ReceivedAck function, and add m_rtt->Measurement(m_lastRtt) and m_rto = m_rtt->RetransmitTimeout () in front of it. 

The revised ReceivedAck function is shown as followings:

void
TcpSocketBase::ReceivedAck (Ptr<Packet> packet, const TcpHeader& tcpHeader)
{
  NS_LOG_FUNCTION (this << tcpHeader);

  // Received ACK. Compare the ACK number against highest unacked seqno
  if (0 == (tcpHeader.GetFlags () & TcpHeader::ACK))
    { // Ignore if no ACK flag
    }
  else if (tcpHeader.GetAckNumber () < m_txBuffer.HeadSequence ())
    { // Case 1: Old ACK, ignored.
      NS_LOG_LOGIC ("Ignored ack of " << tcpHeader.GetAckNumber ());
    }
  else if (tcpHeader.GetAckNumber () == m_txBuffer.HeadSequence ())
    { // Case 2: Potentially a duplicated ACK
      if (tcpHeader.GetAckNumber () < m_nextTxSequence && packet->GetSize() == 0)
        {
          NS_LOG_LOGIC ("Dupack of " << tcpHeader.GetAckNumber ());
          DupAck (tcpHeader, ++m_dupAckCount);
        }
      // otherwise, the ACK is precisely equal to the nextTxSequence
      NS_ASSERT (tcpHeader.GetAckNumber () <= m_nextTxSequence);
    }
  else if (tcpHeader.GetAckNumber () > m_txBuffer.HeadSequence ())
    { // Case 3: New ACK, reset m_dupAckCount and update m_txBuffer
      NS_LOG_LOGIC ("New ack of " << tcpHeader.GetAckNumber ());
      EstimateRtt (tcpHeader);
      m_rtt->Measurement(m_lastRtt);
      // On recieving a "New" ack we restart retransmission timer .. RFC 2988
      m_rto = m_rtt->RetransmitTimeout ();
      NewAck (tcpHeader.GetAckNumber ());
      m_dupAckCount = 0;
    }
  // If there is any data piggybacked, store it into m_rxBuffer
  if (packet->GetSize () > 0)
    {
      ReceivedData (packet, tcpHeader);
    }
}


I did some tests and the results look fine:

(Red line is the measured RTT in last round, green line is the estimated RTO in next round)

The idea is pretty straightforward: before calculating RTO, first get the measured RTT in last round, than using Measure(Time:time) function update m_variance. 

Of course more tests are required to prove that this simple remedy would not cause backfire. 

Besides, I found another problem in RTO setting here, and would like to receive your suggestions. According to the RFC2988:

"When an ACK is received that acknowledges new data, restart the retransmission timer so that it will expire after RTO seconds"

 In NS3, after each timeout, the m_multiplier will exponentially increase until it reaches maximum (default 64). However, it seems that when an new ACK is received, the m_multiplier is not properly restarted to 1 after recovering from a timeout. The outcome is that the m_multiplier will continue increasing for not only consecutive timeout event but also for discrete timeout event. I guess this is against with the RFC2988. 

To illustrates this, I did a test. I set up a simple network: A --> B --> C. A is sending data to C, B is the intermediate node. Every 2 seconds B will stop transmitting for 0.2s (packets sent from A is dropped). So we create timeout event once every 2 seconds. The whole simulation will run for 100 seconds. And the following is the simulated cwnd of A:

It is clear that although the timeout happens discreetly, the  m_multiplier keeps increasing until it reach 64.

The easy way to fix this, is to reset m_multiplier every time a new ACK arrives. 

Thanks and best regards
Fan


在 2015年1月21日星期三 UTC-5上午7:30:33,Tommaso Pecorella写道:

在 2015年1月21日星期三 UTC-5上午7:30:33,Tommaso Pecorella写道:

Tommaso Pecorella

unread,
Jan 22, 2015, 2:14:47 AM1/22/15
to ns-3-...@googlegroups.com
Hi Zhou,

Natale sent me this patch. It's a bit longer than what you're proposing, but it may solve also the multiplier problem.
Mind to check it ? Alternatively, mind sharing the scripts you're using for the tests ?

Thanks,

T.
rtt.patch

Nat P

unread,
Jan 22, 2015, 3:34:46 AM1/22/15
to ns-3-...@googlegroups.com


Il giorno giovedì 22 gennaio 2015 01:46:52 UTC+1, Zhou Charles ha scritto:
Hi Tommaso:

A quick fix to this problem is to move m_rto = m_rtt->RetransmitTimeout () from NewAck function to ReceivedAck function, and add m_rtt->Measurement(m_lastRtt) and m_rto = m_rtt->RetransmitTimeout () in front of it. 

Hi Zhou, your patch doesn't remove a side effect that me and Tommaso (mmm, only Tommaso) discovered yesterday, which is the monotonic increase of m_history (in m_rtt). However, it could be useful if you can produce a diff file, and post that diff on the bugtracker.. Thank you !

Zhou Charles

unread,
Jan 22, 2015, 9:27:57 AM1/22/15
to ns-3-...@googlegroups.com
Hi Tommaso:

I would be glad to check the patch. BTW, you mean script for RTO (the first) test or m_multiplier test? 

Thanks and best regards
Fan

--

Zhou Charles

unread,
Jan 22, 2015, 9:29:40 AM1/22/15
to ns-3-...@googlegroups.com
Hi Nat:

I am sure you are right. I did not check the fast fix deeply so there might be some side effects.

Would you mind telling me what do you mean by diff file?

Best
Fan

--

Zhou Charles

unread,
Jan 22, 2015, 10:36:05 AM1/22/15
to ns-3-...@googlegroups.com
Hi Tommaso, Natale, Nat:

The patch works perfect! I haven't deeply check each line of the code, but I did some test. All tests show now the RTO estimation works fine.

Test1: RTT in last round (red line) and RTO estimation (green line), x_axis is time

Test2: RTO with multiple discreet timeout event


It seems that both bugs have been fixed! THANKS A LOT Tommaso, Natale and Nat! (actually I am not sure are Natale and Nat the same person?)

Best
Fan


在 2015年1月22日星期四 UTC-5上午2:14:47,Tommaso Pecorella写道:

Tommaso Pecorella

unread,
Jan 25, 2015, 9:02:07 AM1/25/15
to ns-3-...@googlegroups.com
Hi Zhou,

Nat and Natale are the same guy.
However I have a thing to ask: mind checking this new patch ?
It is quite huge, as it addresses a number of issues in both RTT and RTO calcs, plus it moves a large part of code around. Since it's an intrusive patch, we need all the testing we can before pushing it.

Thanks,

T.
rtt.patch

Zhou Charles

unread,
Jan 25, 2015, 10:41:27 AM1/25/15
to ns-3-...@googlegroups.com
Hi Tommaso:

Sure, I would be very glad. But I am afraid that I cannot start until the Feb. Now I am very busy as there is an important deadline in 31 Jan.

BTW, I have implemented a new TCP protocol (TCP Compound) in ns3, would you interest in the possibility of add it into the NS3 family? I understand a lot of tests must have been done before that, but I hope that I can make some contributions to the open source project.

Thanks and best regards
Fan


--

Tommaso Pecorella

unread,
Jan 25, 2015, 12:10:40 PM1/25/15
to ns-3-...@googlegroups.com
Hi,

no problem, when you can it's ok.
About TCP Compound, sure thing, we'll be happy to have another TCP flavour. You may want to check the ns-3 contribution process, it's in the ns-3 website.

Cheers,

T.

Eneko Atxutegi Narbona

unread,
Mar 11, 2015, 12:17:11 PM3/11/15
to ns-3-...@googlegroups.com
Hi Zhou,

I'm currently evaluating the performance of various emulated/simulated TCP flavours in ns-3. Have you finally made your compound TCP code available somewhere or do you plan to do so in the near future?

Thanks in advance,

Eneko
University of the Basque Country (UPV/EHU)

Zhou Charles

unread,
Mar 23, 2015, 5:50:43 PM3/23/15
to ns-3-...@googlegroups.com
Hi Eneko:

Sorry for late reply. I will submit the code of CTCP to my github tonight, while it was done I will send you address. 

But I would like to say that I have not really thoroughly tested it, so if you find any bugs in it I would appreciate it if you let me know. 

Best
Fan

Zhou Charles

unread,
Mar 23, 2015, 9:33:43 PM3/23/15
to ns-3-...@googlegroups.com
Hi Eneko:


Thanks

Nat P

unread,
Mar 24, 2015, 6:56:31 AM3/24/15
to ns-3-...@googlegroups.com


Il giorno martedì 24 marzo 2015 02:33:43 UTC+1, Zhou Charles ha scritto:
Hi Eneko:


Thanks

Nice, could you please make a patch on codereview.appspot.com ? I'd like to review your code.

Thank you!

Eneko Atxutegi Narbona

unread,
Mar 24, 2015, 10:54:46 AM3/24/15
to ns-3-...@googlegroups.com
Hi Zhou.

Thank you for your time and collaboration. For sure, I will use your code. I appreciate it.

Best wishes,

Eneko

Zhou Charles

unread,
Mar 24, 2015, 6:10:42 PM3/24/15
to ns-3-...@googlegroups.com
Hi Eneko:

You are welcome. I realized today that some problems might occur when you try to use this code in ns3-3.22. 

Which TCP version are you using?

Hi Nat:

I would be glad to submit a patch and I really appreciate your review. 

I will did some reviews first myself and submit it, after this is done I will let you know.

Thanks 
Fan

Eneko Atxutegi Narbona

unread,
Mar 25, 2015, 7:25:45 AM3/25/15
to ns-3-...@googlegroups.com
Hi Zhou,

I'm using ns3-3.21 and your code works in it. However, I have noticed that, the Compound congestion avoidance phase only happen after a loss/drop event. I don't know whether the issue is being caused by a non-proper estimation of RTT, conflict/incompatibility between both congestion avoidance phases or something else.
I will try to figure out. Any suggestion will be gratefully received.

Thanks again.

Eneko

Zhou Charles

unread,
Mar 25, 2015, 1:52:27 PM3/25/15
to ns-3-...@googlegroups.com
Hi Eneko:

I would like to say when writing the code I did not really consider submiting it to ns3 - that is why I have not done very strict review. Since now I put that in my github I will review it carefully. Hopefully this could be done in this week. 

Thanks for any further suggestion. 
Best
Fan 

Zhou Charles

unread,
May 7, 2015, 8:15:29 PM5/7/15
to ns-3-...@googlegroups.com
Hi Nat:

Sorry for the delay and thanks a lot for helping review the code.  

I just made a patch file using git diff. However I do not know how to upload it using upload.py to codereview.appspot.com. Would you mind kindly give me an example of how you use upload.py? Or could I send the patch file to you directly?

Thanks
Fan

--

Tommaso Pecorella

unread,
May 8, 2015, 2:13:06 AM5/8/15
to ns-3-...@googlegroups.com
Hi Charles,

we just pushed some patches, and we're preparing for the 3.23 release. If you have an important and urgent bug to fix, forward it to me and Tom Henderson.

For code review, the best thing is to use mercurial and "upload.py". The exact instructions are here:

Cheers,

T.

Zhou Charles

unread,
May 8, 2015, 2:57:22 PM5/8/15
to ns-3-...@googlegroups.com
Hi Tommoaso:

It is not an urgent patch. I am just try to add another TCP variant (compound) to existent NS3 simulator. 

Unfortunately I am unable to use upload.py provided in the link, it keeps giving me authentication error, while I am pretty sure that my username and password is correct (I am using the same username and password to log into https://codereview.appspot.com/)

The error information was like:

The following files are not added to version control:
patch
upload.py
upload.py~
Are you sure to continue?(y/N) y
Upload server: codereview.appspot.com (change with -s/--server)
Loaded authentication cookies from /home/charles/.codereview_upload_cookies
New issue subject: tcpcompound
Email (login for uploading to codereview.appspot.com) [charles...@gmail.com]: charles...@gmail.com
Password for charles...@gmail.com
Invalid username or password.

Would you mind kindly shed me some light upon how to solve this?

Thanks and best regards
Fan

Tommaso Pecorella

unread,
May 8, 2015, 5:56:49 PM5/8/15
to ns-3-...@googlegroups.com
Hi,

very strange, but it happened to me too once. Sadly, I don't remember how I fixed it. Perhaps it's a problem with their server, whait a couple of days and see if it's fixed.
In case the problems persists, let us know.

Cheers,

T.

Fan Zhou

unread,
May 9, 2015, 11:11:22 PM5/9/15
to ns-3-...@googlegroups.com
Thanks, I will try again later.

Zhou Charles

unread,
May 13, 2015, 11:06:58 PM5/13/15
to ns-3-...@googlegroups.com
Hi Tommaso:

Sorry after several days I still cannot using upload.py. I upload my patch here and I was wondering if it is possible for you to kindly upload it for me?

Thanks and best regards
Fan

 
tcpcompound.patch
Reply all
Reply to author
Forward
0 new messages