TCP Send Buffer Overflow

b.s...@gmail.com

Nov 3, 2017, 2:13:35 PM
to ns-3-users
Good day again,

Still working on MANET simulations, may have run into an overflow in the TCP send buffer.  

As an attempt to simulate background traffic on the network, I've modified the UDP Echo Server and On/Off applications into TCP Reply Server and TCP Traffic Generator applications.  Basically, the Traffic Generator runs for a few seconds sending packets with its Node ID and a string, plus padding to get me to my desired packet size of 256kB.  The echo server crafts a reply of the same size and fires it back.  This helps me keep track of who's talking to whom in the PCAPs.  I also added a HandleRead method for the traffic generator and registered its callback so that the generator will properly receive the replies from the server.  The generator then goes to sleep for several more seconds before popping back up.  I'm using random variables for the on/off periods along with start times, and assigning these applications to various groups of nodes throughout the MANET.  My end goal is to have different parts of the network under different traffic loads at different times.  From examining the PCAPs, this appears to be successful.

I am running simulations using AODV and OLSR.  To account for this, I have moved the creation and opening of the socket from StartApplication to StartSending, so that connections occur a few seconds in and after most initial routes are established.  I have also added code to check if the socket has been created but is not connected, and to attempt to reconnect if the socket reports ERROR_NOTCONN or ERROR_NOROUTETOHOST.  This seems to be more of an issue with OLSR than with AODV.

I have reached the point now where everything seems to be working.  I can run five minutes on a fairly static network of 35 nodes...I have a moving network of 136 nodes running at the moment, about 145 seconds into a five-minute run and over a minute past the point where it usually explodes.  In the PCAPs, TCP traffic continues periodically right until the current point in the simulation.  My next big test will be a network of 545 nodes.

But in the course of working on this,  I have run into two odd issues:

The first has to do with using the AddPaddingAtEnd method for the packets.  Viewed in Wireshark, the padding was not zeros.  It seemed to be pulling stuff off the stack, as I could sometimes see strings I use elsewhere in the applications, sometimes junk data.  The method was called simply as packet->AddPaddingAtEnd(m_pktMaxSize - packet->GetSize() ).  I worked around this issue by creating an empty packet of the size I needed and using the AddAtEnd method to add it to my message packet.
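
Another way to guarantee zero padding is to build the whole payload in a buffer before creating the packet. The following is a minimal standalone sketch, not ns-3 code: BuildPaddedPayload is a hypothetical helper name, and truncating an over-long message is an assumption made here to avoid the unsigned wrap-around that a bare (size - length) subtraction would produce.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: returns exactly targetSize bytes, with the message
// first and explicit zero bytes after it. A message longer than targetSize
// is truncated rather than letting (targetSize - message.size ()) wrap.
std::vector<uint8_t> BuildPaddedPayload (const std::string &message, std::size_t targetSize)
{
  std::vector<uint8_t> payload (targetSize, 0); // every byte starts as zero
  const std::size_t copyLen = message.size () < targetSize ? message.size () : targetSize;
  for (std::size_t i = 0; i < copyLen; ++i)
    {
      payload[i] = static_cast<uint8_t> (message[i]);
    }
  assert (payload.size () == targetSize);
  return payload;
}
```

In the application, the resulting buffer could presumably be passed to Create<Packet> (payload.data (), payload.size ()), the same constructor form the snippet further down uses for m_fillData.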

The second and far more troublesome issue has to do with what seems to be overflows in the TCP transmission buffer.  I was experiencing crashes 15-30 seconds into the simulation, usually accompanied by sudden spikes in memory usage. The only other clues from debugging and logging were that it was terminating due to std::bad_alloc, and the issue seemed to involve tcp-tx-buf.  I worked around this by checking if the TCP send buffer has enough space for my packet before I try sending.  If it doesn't, I do not send the packet and I increment an error counter.  After ten errors, the application gives up for the current ON cycle.  A successful send resets the error counter, and two full cycles without a successful send (twenty errors) will cause the socket to be closed and a new one created.  As I said, this appears to work.
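
The retry policy described above can be sketched as a small standalone state machine. This is only an illustration under stated assumptions: SendRetryPolicy, StartCycle and Decide are hypothetical names that do not exist in ns-3, the thresholds are passed in rather than hard-coded, and "two full cycles without a successful send" is read as two consecutive cycles.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the back-off policy; not part of ns-3.
class SendRetryPolicy
{
public:
  enum Action { SEND, SKIP, GIVE_UP };

  SendRetryPolicy (uint32_t maxErrorsPerCycle, uint32_t maxFailedCycles)
    : m_maxErrors (maxErrorsPerCycle), m_maxCycles (maxFailedCycles),
      m_errorCount (0), m_failedCycles (0), m_started (false), m_sentThisCycle (false) {}

  // Call at the start of every ON period. Returns true when the caller
  // should close the socket and create a new one.
  bool StartCycle ()
  {
    if (m_started && !m_sentThisCycle)
      {
        ++m_failedCycles;   // previous cycle ended without a single send
      }
    else
      {
        m_failedCycles = 0; // first cycle, or the previous one made progress
      }
    m_started = true;
    m_sentThisCycle = false;
    m_errorCount = 0;
    if (m_failedCycles >= m_maxCycles)
      {
        m_failedCycles = 0;
        return true;
      }
    return false;
  }

  // Call before every send attempt with the free buffer space and packet size.
  Action Decide (uint32_t txAvailable, uint32_t packetSize)
  {
    if (m_errorCount >= m_maxErrors)
      {
        return GIVE_UP;     // stop trying for the rest of this ON period
      }
    if (txAvailable >= packetSize)
      {
        m_errorCount = 0;   // a successful send resets the error counter
        m_sentThisCycle = true;
        return SEND;
      }
    ++m_errorCount;
    return SKIP;
  }

private:
  uint32_t m_maxErrors;
  uint32_t m_maxCycles;
  uint32_t m_errorCount;
  uint32_t m_failedCycles;
  bool m_started;
  bool m_sentThisCycle;
};
```

With SendRetryPolicy policy (10, 2), ten consecutive SKIP results are followed by GIVE_UP, and the third StartCycle call after two such cycles returns true to request a new socket.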

The reason I bring this up is because I would expect some sort of error-handling in this situation.  I did see the bit about packets not being removed from the buffer until they are ACK'd by the recipient...and I would imagine that because I'm using a MANET more ACKs than usual are being lost or delayed.  (Going through the PCAPs generated so far of the current run, it appears that delayed ACKs and packet re-transmission are the culprits.)
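
To illustrate why lost or delayed ACKs shrink the space available to the sender, here is a toy model of a bounded send buffer that only releases bytes when they are acknowledged. It is a simplified sketch, not the ns-3 TcpTxBuffer: SendBufferModel and its methods are hypothetical names, and the capacity and packet sizes in the usage note are arbitrary.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: bytes stay in the buffer until they are ACKed.
class SendBufferModel
{
public:
  explicit SendBufferModel (uint32_t capacity)
    : m_capacity (capacity), m_unacked (0) {}

  // Free space, playing the role of the GetTxAvailable check.
  uint32_t Available () const
  {
    return m_capacity - m_unacked;
  }

  // Accept the data only if it fits; otherwise the caller must back off.
  bool Enqueue (uint32_t bytes)
  {
    if (bytes > Available ())
      {
        return false;
      }
    m_unacked += bytes;
    return true;
  }

  // An ACK is the only thing that frees space.
  void Ack (uint32_t bytes)
  {
    assert (bytes <= m_unacked);
    m_unacked -= bytes;
  }

private:
  uint32_t m_capacity;
  uint32_t m_unacked;
};
```

With a capacity of 1000 bytes and 256-byte packets, three packets fit, the fourth is refused, and it is accepted only after an ACK for one earlier packet arrives. If ACKs stop arriving, every further Enqueue fails, which is the condition the application has to handle.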

I have tried increasing the size of the buffer with "Config::Set("ns3::TcpSocket/SndBufSize", UintegerValue(262144));" during my simulation setup, but I'll admit I'm not all that great with the configuration system.  I'm not sure if it works...

A couple code snippets from the Traffic Generator...AKA the modified On/Off Application.  I know, it needs to be cleaned up and have a few things fixed, but I've been adding bits and pieces as I was troubleshooting.



// Event handlers
void TrafficGenerator::StartSending ()
{
  NS_LOG_FUNCTION (this);
  // Reset our error count from the last cycle so we try again this cycle.
  m_errorCount = 0;
  // If we've already had two error loops, we need to recreate the socket.
  if (m_errorLoops >= 2)
    {
      NS_LOG_INFO (Simulator::Now ().GetSeconds () << ": Max errors exceeded on Node " << GetNode ()->GetId () << ". Resetting socket.");

      m_errorLoops = 0;
      m_socket->Close ();
      m_socket = NULL;
      // The replacement socket has not connected yet. This assumes
      // ConnectionSucceeded sets m_connected, as in the stock OnOffApplication.
      m_connected = false;
    }
  // Create the socket if not already
  if (!m_socket)
    {
      m_socket = Socket::CreateSocket (GetNode (), m_tid);
      if (Inet6SocketAddress::IsMatchingType (m_peer))
        {
          if (m_socket->Bind6 () == -1)
            {
              NS_FATAL_ERROR ("Failed to bind socket");
            }
        }
      else if (InetSocketAddress::IsMatchingType (m_peer) ||
               PacketSocketAddress::IsMatchingType (m_peer))
        {
          if (m_socket->Bind () == -1)
            {
              NS_FATAL_ERROR ("Failed to bind socket");
            }
        }
      m_socket->Connect (m_peer);
      NS_LOG_INFO (Simulator::Now ().GetSeconds () << ": Creating new socket on Node " << GetNode ()->GetId () << "...");
      m_socket->SetAllowBroadcast (true);

      m_socket->SetRecvCallback (MakeCallback (&TrafficGenerator::HandleRead, this));
      m_socket->SetConnectCallback (
        MakeCallback (&TrafficGenerator::ConnectionSucceeded, this),
        MakeCallback (&TrafficGenerator::ConnectionFailed, this));
      m_cbrRateFailSafe = m_cbrRate;
    }
  else if (m_socket->GetErrno () == Socket::ERROR_NOTERROR && !m_connected)
    {
      // The socket exists and reports no error, but never finished connecting.
      NS_LOG_INFO (Simulator::Now ().GetSeconds () << ": Node " << GetNode ()->GetId () << " not connected at On Cycle...");
      NS_LOG_INFO (Simulator::Now ().GetSeconds () << ": Requesting reconnection on Node " << GetNode ()->GetId () << "...");
      m_socket->Connect (m_peer);
    }

  // If the socket error is concerning a lack of route or just not being connected, try to connect.
  if (m_socket->GetErrno () && !m_connected)
    {
      NS_LOG_INFO (Simulator::Now ().GetSeconds () << ": Socket State:" << m_socket->GetErrno () << " on Node " << GetNode ()->GetId () << ".");

      if ((m_socket->GetErrno () == Socket::ERROR_NOTCONN) || (m_socket->GetErrno () == Socket::ERROR_NOROUTETOHOST))
        {
          NS_LOG_INFO (Simulator::Now ().GetSeconds () << ": Requesting reconnection on Node " << GetNode ()->GetId () << "...");
          m_socket->Connect (m_peer);
        }
    }
  m_lastStartTime = Simulator::Now ();

  ScheduleNextTx ();  // Schedule the send packet event
  ScheduleStopEvent ();
}


void TrafficGenerator::SendPacket ()
{
  NS_LOG_FUNCTION (this);
  // If we're not connected, or this ON cycle has already failed ten times, don't even bother.
  if (!m_connected || m_errorCount >= 10)
    {
      m_errorLoops++;
      return;
    }
  NS_ASSERT (m_sendEvent.IsExpired ());
  Ptr<Packet> packet = Create<Packet> ((uint8_t*) m_fillData.c_str (), m_fillData.size ());
  // Pad with an empty packet. The size check avoids unsigned wrap-around
  // if the fill data is already longer than the target size.
  if (m_pktSize > m_fillData.size ())
    {
      Ptr<Packet> padding = Create<Packet> (m_pktSize - m_fillData.size ());
      packet->AddAtEnd (padding);
    }
  if (m_socket->GetTxAvailable () >= packet->GetSize ())
    {
      // Only trace and count packets that are actually handed to the socket.
      m_txTrace (packet);
      int bytesSent = m_socket->Send (packet);
      if (bytesSent < 0)
        {
          NS_LOG_INFO (Simulator::Now ().GetSeconds () << ": Traffic generator socket returned error state on Node " << GetNode ()->GetId () << ".");
          m_connected = false;
        }
      else
        {
          // A successful send clears both counters, so the socket is only
          // recreated after two cycles in a row without a successful send.
          m_errorCount = 0;
          m_errorLoops = 0;
          m_totBytes += bytesSent;
          if (InetSocketAddress::IsMatchingType (m_peer))
            {
              /*NS_LOG_INFO ("At time " << Simulator::Now ().GetSeconds ()
                               << "s on-off application sent "
                               <<  packet->GetSize () << " bytes to "
                               << InetSocketAddress::ConvertFrom(m_peer).GetIpv4 ()
                               << " port " << InetSocketAddress::ConvertFrom (m_peer).GetPort ()
                               << " total Tx " << m_totBytes << " bytes");*/
            }
          else if (Inet6SocketAddress::IsMatchingType (m_peer))
            {
              NS_LOG_INFO ("At time " << Simulator::Now ().GetSeconds ()
                               << "s on-off application sent "
                               <<  packet->GetSize () << " bytes to "
                               << Inet6SocketAddress::ConvertFrom (m_peer).GetIpv6 ()
                               << " port " << Inet6SocketAddress::ConvertFrom (m_peer).GetPort ()
                               << " total Tx " << m_totBytes << " bytes");
            }
        }
    }
  else
    {
      // Not enough room in the TCP send buffer: skip this packet and count the error.
      //NS_LOG_INFO (Simulator::Now ().GetSeconds () << ": Generator TCP Tx Buffer Full on Node " << GetNode()->GetId());
      m_errorCount++;
    }

  m_lastStartTime = Simulator::Now ();
  m_residualBits = 0;
  ScheduleNextTx ();
}


b.s...@gmail.com

Nov 3, 2017, 2:17:13 PM
to ns-3-users
Forgot to mention I am using version 3.27.  I was using 3-dev, but reverted when I hit trouble.
 

Tommaso Pecorella

Nov 4, 2017, 2:00:24 PM
to ns-3-users
Hi

thanks for the detailed report.

Could you please post a complete script so that we can reproduce the problem?
It would be best to have TWO scripts, one for the AddPaddingAtEnd problem, and one for the buffer overflow.
Of course, try to remove any "unnecessary" part, so we can concentrate on the issues.

As a side note, the buffer overflow could also be completely normal behaviour, but it's best to check.

Cheers,

T.

b.s...@gmail.com

Nov 6, 2017, 1:57:27 PM
to ns-3-users
Just as an update, the issue is still ongoing.  I made several small changes over the weekend, and I have a few simulations running while I work on another project.

Oddly enough, a simulation that ran fine for five minutes at home crashed after two minutes on my workstation at school.  It definitely seems to be a buffer overrun, as the dmesg output indicates that the IP was at 000000000001 (segfault error 14).  

The 545-node simulation I had running over the weekend crashed 85 seconds in...I have it running again (with minor tweaks) in a debugger, so will hopefully get better data out.

Putting together a stripped-down version with only the traffic generation and trying to reproduce the problem is on my 'to do' list.  Will send that when it gets done.

Thanks!

