Strange/dire Windows TCP Performance
Bernard Brooks  
Newsgroups: alt.winsock.programming, comp.os.ms-windows.networking, comp.protocols.tcp-ip
From: bernardb...@hotmail.com (Bernard Brooks)
Date: 22 Oct 2002 09:38:23 -0700
Local: Tues, Oct 22 2002 12:38 pm
Subject: Strange/dire Windows TCP Performance
I've seen a lot of news articles about TCP going slowly, but none that look
similar to this.  Has anyone else seen this problem, or can anyone say why
it's happening?

A problem in WinSock2 appears to cause a severe drop in throughput.  The problem
shows when the application uses send() to send a large buffer of data, followed
by a send of a small buffer of data, e.g.:

    for (;;) {
        send(sd, buf, 12000, 0);
        send(sd, buf, 4, 0);
    }

Of course TCP is a byte stream and the concept of message boundaries should be
irrelevant, but the way WinSock2 appears to implement socket buffers, it seems
that they assume send buffers to imply message boundaries!  If a small send
buffer is passed following a large send buffer, WinSock2 will wait until *all*
the data from the large buffer is acknowledged before starting to transmit the
small buffer.  At least, this is the way it seems.  What's more, disabling
Nagle doesn't change this.

This only happens if a large send is followed immediately by a small send.  The
definitions of a large and a small send are as follows:

A large send is defined as any send both greater than the current tcp window
size, and greater than or equal to the current SO_SNDBUF size.  Note: I'm only
talking about unidirectional data flow, so "tcp window size" refers to the
window size advertised by the receiver.

A small send seems to be: insufficient data to fill 2 full packets.  Actually,
I'm not 100% sure about this...

For example: on an Ethernet connection with a TCP MSS of 1460, a WinSock2
server is sending data to a Solaris client.  Typically the Solaris client (data
consumer) advertises a TCP window size of 8760 bytes.  The default SO_SNDBUF
size on the WinSock2 server (data producer) is 8192 bytes...

Then, sending a 12000 byte buffer followed by a 4 byte buffer, repeatedly, will
show this problem: 12000 bytes sends 8 full packets (1460 bytes each), and
leaves a 320 byte remainder.  Therefore, (2 * 1460) - 320 = 2600.  Any send of
less than 2600 bytes following a send of 12000 bytes will show this problem.
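
The arithmetic above can be checked with a few lines of C. This is only a
sketch of the rule of thumb described in this post; the struct and function
names are made up for illustration and are not part of any Winsock API.

```c
#include <stddef.h>

/* How one send() of `len` bytes splits into TCP segments of size `mss`.
   All names here are hypothetical, chosen for this illustration. */
struct send_split {
    size_t full_segments;   /* segments carrying exactly mss bytes */
    size_t remainder;       /* bytes left over for a final short segment */
    size_t small_threshold; /* a following send below this size leaves
                               less than two full segments of trailing data */
};

static struct send_split split_send(size_t len, size_t mss)
{
    struct send_split s;
    s.full_segments = len / mss;
    s.remainder = len % mss;
    /* Rule of thumb from the post: remainder + next send < 2 * MSS. */
    s.small_threshold = 2 * mss - s.remainder;
    return s;
}
```

With len = 12000 and mss = 1460 this gives 8 full segments, a 320 byte
remainder and a threshold of 2600 bytes, matching the figures above.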

What happens is this:

WinSock2 sends the 8 full sized 1460 byte packets.  It coalesces the 320 byte
remainder and the 4 byte buffer (I left Nagle enabled in this test) and sends
them together in a 324 byte packet.  It then waits until all 12004 bytes of
data have been acknowledged before indicating (via select or a blocking send
call) that buffer space is now available.  Since the last packet sent (324
bytes) is a small packet, the peer delays the ACK (for approx 100ms), leading
to a massive drop in performance.

A snoop of such an exchange:
     0.00044 a -> b Ack=536882553 Seq=421461919 Len=1460 Win=64240
     0.00011 a -> b Ack=536882553 Seq=421463379 Len=1460 Win=64240
     0.00009 b -> a Ack=421464839 Seq=536882553 Len=0    Win=8760
     0.00003 a -> b Ack=536882553 Seq=421464839 Len=1460 Win=64240
     0.00012 a -> b Ack=536882553 Seq=421466299 Len=1460 Win=64240
     0.00007 b -> a Ack=421467759 Seq=536882553 Len=0    Win=8760
     0.00005 a -> b Ack=536882553 Seq=421467759 Len=1460 Win=64240
     0.00011 a -> b Ack=536882553 Seq=421469219 Len=1460 Win=64240
     0.00007 b -> a Ack=421470679 Seq=536882553 Len=0    Win=8760
     0.00005 a -> b Ack=536882553 Seq=421470679 Len=1460 Win=64240
     0.00012 a -> b Ack=536882553 Seq=421472139 Len=1460 Win=64240
     0.00002 a -> b Ack=536882553 Seq=421473599 Len=324  Win=64240
     0.00005 b -> a Ack=421473599 Seq=536882553 Len=0    Win=8760
        <delayed ack>
     0.09862 b -> a Ack=421473923 Seq=536882553 Len=0    Win=8760
        <all data ack'd, only now does WinSock2 send the next data>
     0.00061 a -> b Ack=536882553 Seq=421473923 Len=1460 Win=64240

You get a similar trace with Nagle disabled (ie TCP_NODELAY set).
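
To put a number on the stall, the gaps in the trace can simply be added up.
The sketch below assumes that the first column of the snoop output is the
time in seconds since the previous packet (the post does not say so
explicitly), and the function names are invented for this example.

```c
#include <stddef.h>

/* Gaps (seconds) copied from the trace, from the gap before the second
   data segment up to the gap before the first segment of the next burst.
   ASSUMPTION: the first trace column is the delta since the previous packet. */
static const double trace_gaps[] = {
    0.00011, 0.00009, 0.00003, 0.00012, 0.00007, 0.00005, 0.00011,
    0.00007, 0.00005, 0.00012, 0.00002, 0.00005, 0.09862, 0.00061
};

/* Time from the first segment of one burst to the first of the next. */
static double cycle_seconds(void)
{
    double total = 0.0;
    size_t i;
    for (i = 0; i < sizeof trace_gaps / sizeof trace_gaps[0]; i++)
        total += trace_gaps[i];
    return total;
}

/* Sustained rate if every cycle carries bytes_per_cycle bytes. */
static double bytes_per_second(size_t bytes_per_cycle)
{
    return (double)bytes_per_cycle / cycle_seconds();
}
```

Under that assumption one cycle of 12004 bytes takes about 0.1 seconds,
nearly all of it the delayed ACK, so bytes_per_second(12004) comes out on the
order of 120 KBytes/s.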

Keep in mind that WinSock2 tends to advertise much larger TCP windows, so if
you're trying this out sending data Windows to Windows, you have to use larger
numbers to see this - but it's still visible.  Try sending 64241 bytes followed
by 1000 bytes, repeatedly.

There seems to be a timing issue involved as well...  On our Windows to Windows
tests, I was using Ethereal software running on the Windows server (the data
producer) to trace the TCP traffic.  Sending 64241 bytes followed by 1000 bytes
*without* running Ethereal, the transfer rate was low (as described above).  As
soon as I started tracing on the server, the transfer rate went back to normal
(high)!

Astonishingly, using WSASend with two buffers, one of 12000 bytes and one of 4
bytes exhibits the same problem!  You might expect WinSock2 to treat the two
buffers as one (in the style of Unix writev), but it doesn't.
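
For comparison, this is roughly what the Unix writev style looks like: both
buffers are handed over in a single gather-write call. It is a POSIX sketch
only, with a hypothetical helper name; it does not claim to change the
WinSock2 behaviour described above, where WSASend with two buffers still
shows the problem.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Pass a large and a small buffer to the kernel in ONE call, so the stack
   sees them as a single run of bytes. Returns the number of bytes written,
   or -1 on error. The helper name is made up for this example. */
static ssize_t send_two_buffers(int fd, const void *big, size_t big_len,
                                const void *small, size_t small_len)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)big;    /* writev does not modify the data */
    iov[0].iov_len  = big_len;
    iov[1].iov_base = (void *)small;
    iov[1].iov_len  = small_len;

    return writev(fd, iov, 2);
}
```

A real program would still have to loop on short writes, since writev may
accept fewer bytes than requested.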

Bernard Brooks


 
David Schwartz  
Newsgroups: alt.winsock.programming, comp.os.ms-windows.networking, comp.protocols.tcp-ip
From: David Schwartz <dav...@webmaster.com>
Date: Tue, 22 Oct 2002 12:47:57 -0700
Local: Tues, Oct 22 2002 3:47 pm
Subject: Re: Strange/dire Windows TCP Performance

Bernard Brooks wrote:
> A problem in WinSock2 appears to cause a severe drop in throughput.  The problem
> shows when the application uses send() to send a large buffer of data, followed
> by a send of a small buffer of data. eg

>     for (;;) {
>         send(sd, buf, 12000, 0);
>         send(sd, buf, 4, 0);
>     }

        Okay, first, this program is broken. You would expect pathological
behavior from a program like this.

        When you send data to TCP, you must use a sensible buffer flushing
strategy and the one shown above is nonsensical. There is absolutely no
reason to do a 4-byte send when you have more than 10,000 bytes ready to
go at that time.

        You must either pass a reasonably large buffer (2Kb or more) or all the
data that is ready to go at that time on each call to send. If you don't
do this, TCP performance will suck.

> Of course TCP is a byte stream and the concept of message boundaries should be
> irrelevant, but the way WinSock2 appears to implement socket buffers, it seems
> that they assume send buffers to imply message boundaries!  If a small send
> buffer is passed following a large send buffer, WinSock2 will wait until *all*
> the data from the large buffer is acknowledged before starting to transmit the
> small buffer.  At least, this is the way it seems.  What's more, disabling
> Nagle doesn't change this.

        Winsock does not have ESP. It has to make a decision whether or not to
send a packet and it has to make it when you call 'send'. Otherwise, it
can set a timer. It has no other options. If Winsock were to send data
immediately in your 4 byte send, what happens if it's followed later by
32 4-byte sends? Should they all go in their own packet, with data
efficiency dropping in the toilet as headers exceed data by huge
factors?

        Winsock could handle this better, I admit. But this is definitely a
"then don't do that". If you care about TCP throughput and latency, you
*must* implement sensible buffer flushing. Disabling Nagle is not a
reasonable shortcut.

        DS


 
Bernard Brooks  
Newsgroups: alt.winsock.programming, comp.os.ms-windows.networking, comp.protocols.tcp-ip
From: bernardb...@hotmail.com (Bernard Brooks)
Date: 23 Oct 2002 06:08:42 -0700
Local: Wed, Oct 23 2002 9:08 am
Subject: Re: Strange/dire Windows TCP Performance
Hi David.  Thanks for the reply.

> >     for (;;) {
> >         send(sd, buf, 12000, 0);
> >         send(sd, buf, 4, 0);
> >     }

> Okay, first, this program is broken. You would expect pathological
> behavior from a program like this.

I absolutely agree that the program is poorly written, and the solution,
as you point out, is to employ a sensible buffering strategy.

But the code snippet above is purely pedagogical.  Regardless of the poor
coding, I still think there's a genuine problem with WinSock2 that's
worth looking at.  Neither Linux nor Solaris TCP/IP stacks suffer from
this problem.

Look at these two examples - both send a large buffer followed by a small
buffer: if you replace the 12000 and 4 with
     8760 bytes and 2 bytes  =   over 10 MBytes/s
     8761           1        =   under 100 KBytes/s
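
The one-byte difference between those two cases lines up with the "large
send" definition from the first post (greater than the peer's window and at
least SO_SNDBUF). A minimal sketch of that definition, with a made-up
function name and no claim about how WinSock2 is actually implemented:

```c
#include <stdbool.h>
#include <stddef.h>

/* "Large send" as defined in the first post: strictly greater than the
   window advertised by the receiver, and at least the SO_SNDBUF size.
   The function name is hypothetical. */
static bool is_large_send(size_t len, size_t peer_window, size_t sndbuf)
{
    return len > peer_window && len >= sndbuf;
}
```

With a peer window of 8760 and SO_SNDBUF of 8192, a send of 8760 bytes is
not "large" but a send of 8761 bytes is, which is where the throughput
falls off.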

You can't just explain this away by saying that the application's
buffering strategy is poor.  Perversely, you only become subject to
these problems when you DO buffer!

> Winsock does not have ESP. It has to make a decision whether or not to
> send a packet and it has to make it when you call 'send'.

As I understand it, the algorithm on whether to send a packet is well
defined.

> If Winsock were to send data
> immediately in your 4 byte send, what happens if it's followed later by
> 32 4-byte sends? Should they all go in their own packet, with data
> efficiency dropping in the toilet as headers exceed data by huge
> factors?

 + With Nagle disabled, that's exactly what it should do.  And yes,
   it's inefficient.

 + With Nagle enabled, it'll send one 4 byte packet, and then a sequence
   of full packets (since the computer is easily fast enough to
   coalesce a full packet before receiving the ACK from the previous
   packet).  This is the whole point of the Nagle algorithm.
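
The two cases above follow from the usual statement of the Nagle rule, which
can be sketched as a single decision function. This is a simplified version
that ignores the receiver's window and congestion control, and the function
name and parameters are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified Nagle decision: may the sender transmit `queued` bytes now?
   `unacked` is the amount of data sent but not yet acknowledged.
   Window limits are deliberately left out of this sketch. */
static bool nagle_may_send(size_t queued, size_t mss,
                           size_t unacked, bool nodelay)
{
    if (queued == 0)
        return false;        /* nothing to send */
    if (nodelay)
        return true;         /* Nagle disabled: never hold data back */
    if (queued >= mss)
        return true;         /* a full segment may always go out */
    return unacked == 0;     /* a small segment only if nothing is in flight */
}
```

So the first 4 byte send goes out at once (nothing in flight), later small
sends are held while that packet is unacknowledged, and anything that has
grown to a full MSS is sent regardless.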

But... the problem I'm showing is quite different.  [ No I'm not talking
about the well discussed problem of the deadlock introduced by Nagle
and the Delayed Ack. ]

This problem seems to be in the WinSock2 socket buffering layer.  It
seems that the WinSock2 socket buffer is not allowing data to be
presented to the TCP/IP stack in a timely fashion.

My tests seem to show that it's the socket buffering layer itself that's
blocking until *all* data has been acknowledged.  It's blocking to such
an extent that it's not allowing the program to present any more data,
which prevents it from using the Nagle algorithm to coalesce it.

The first send(12000) gets presented to the TCP/IP stack.  The next
send(4) gets presented, and the 4 bytes + the 320 bytes (remaining
from the 12000) get coalesced.  The socket buffer is now full, so you'd
expect the next send(12000) to block - which is fine.

Then WinSock2 starts to receive ACK's from the peer.  As the ACK's
arrive, you'd expect it to release buffer space, and maybe it does...
who can tell?  What needs explaining is that (as you can see from the
trace) even after receiving ACK's for 11680 bytes of data (in my mind
leaving _plenty_ of space in the socket buffer), it still doesn't
unblock the socket!  Why?

The explanation seems to be that WinSock2 socket buffering layer treats
individual send buffers as *messages*, and not as a byte stream as it
should.  I'm sure this bug is affecting a lot of other people's code too.

Bernard Brooks


 
phn  
Newsgroups: alt.winsock.programming, comp.os.ms-windows.networking, comp.protocols.tcp-ip
From: p...@icke-reklam.ipsec.nu
Date: 23 Oct 2002 14:36:19 GMT
Local: Wed, Oct 23 2002 10:36 am
Subject: Re: Strange/dire Windows TCP Performance
In comp.protocols.tcp-ip Bernard Brooks <bernardb...@hotmail.com> wrote:

Any system that is more complex than a stack of punched cards must be
understood and used properly to be reasonably efficient.

The above is like driving a car in first gear only and complaining about
low gas mileage.

Conclusion: do not expect inexperienced programmers to write
good code. Education and measurement are needed.
--
Peter Håkanson        
        IPSec  Sverige      ( At Gothenburg Riverside )
           Sorry about my e-mail address, but i'm trying to keep spam out,
           remove "icke-reklam" if you feel for mailing me. Thanx.


 
Rufus V. Smith  
Newsgroups: alt.winsock.programming, comp.protocols.tcp-ip
From: "Rufus V. Smith" <nos...@nospam.net>
Date: Wed, 23 Oct 2002 12:17:59 -0400
Local: Wed, Oct 23 2002 12:17 pm
Subject: Re: Strange/dire Windows TCP Performance

IT'S JUST AN EXAMPLE, PETER!  NOT ACTUAL CODE!

The same behavior and performance hit could happen if two objects
were serializing themselves in sequence out the same socket and one
happened to be large and one small.

Or two threads might use the same socket and be completely oblivious
to what the other thread is sending out.  One just happened to have a
large block and the other a small one.

Why are you fixated on the form or quality of the example????

Rufus


 
Fernando Gont  
Newsgroups: alt.winsock.programming, comp.os.ms-windows.networking, comp.protocols.tcp-ip
From: arielg...@softhome.net (Fernando Gont)
Date: Wed, 23 Oct 2002 21:32:00 GMT
Subject: Re: Strange/dire Windows TCP Performance
On Tue, 22 Oct 2002 12:47:57 -0700, David Schwartz
<dav...@webmaster.com> wrote:
>> Of course TCP is a byte stream and the concept of message boundaries should be
>> irrelevant, but the way WinSock2 appears to implement socket buffers, it seems
>> that they assume send buffers to imply message boundaries!  If a small send
>> buffer is passed following a large send buffer, WinSock2 will wait until *all*
>> the data from the large buffer is acknowledged before starting to transmit the
>> small buffer.  At least, this is the way it seems.  What's more, disabling
>> Nagle doesn't change this.
>    Winsock does not have ESP. It has to make a decision whether or not to
>send a packet and it has to make it when you call 'send'.

Sorry, what does "ESP" stand for?

--
Fernando Gont
e-mail: ferna...@ANTISPAM.gont.com.ar

[To send a personal reply, please remove the ANTISPAM tag]


 
Barry Margolin  
Newsgroups: alt.winsock.programming, comp.os.ms-windows.networking, comp.protocols.tcp-ip
From: Barry Margolin <bar...@genuity.net>
Date: Wed, 23 Oct 2002 21:47:02 GMT
Local: Wed, Oct 23 2002 5:47 pm
Subject: Re: Strange/dire Windows TCP Performance
In article <3db5f5a2.2551...@News.CIS.DFN.DE>,
Fernando Gont <arielg...@softhome.net> wrote:
>On Tue, 22 Oct 2002 12:47:57 -0700, David Schwartz
><dav...@webmaster.com> wrote:
>>        Winsock does not have ESP. It has to make a decision whether or not to
>>send a packet and it has to make it when you call 'send'.

>Sorry, what does "ESP" stand for?

Extra-Sensory Perception, i.e. mind-reading.

--
Barry Margolin, bar...@genuity.net
Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.


 
Bernard Brooks  
Newsgroups: alt.winsock.programming, comp.protocols.tcp-ip, comp.os.ms-windows.networking
From: bernardb...@hotmail.com (Bernard Brooks)
Date: 24 Oct 2002 03:53:14 -0700
Local: Thurs, Oct 24 2002 6:53 am
Subject: Re: Strange/dire Windows TCP Performance
Thanks again for the responses.  Please let's not get hung up on the
example programs.  It's the underlying problem with WinSock2 that needs
to be addressed.

I'll give another example which I think might interest you more.  Using
David Schwartz's advice:

> You must either pass a reasonably large buffer (2Kb or more) or all the
> data that is ready to go at that time on each call to send.

Then you'll be surprised to hear that
    for (;;) {
        send(sd, buf, 12000, 0);
        send(sd, buf, 2500, 0);
    }

achieves an appalling 140 KBytes/s.  This must surely be affecting many
people?

Has anyone else seen this performance problem or know if Microsoft knows
anything about it?  Better still, can anyone explain what's going on in
WinSock2 to cause it?

Bernard Brooks

The small print... as per my first posting, my examples depend on a
uni-directional data flow from a WinSock2 machine; SO_SNDBUF=8192 (the
default); MSS=1460 (ethernet); to a machine offering a TCP window size    
of 8760 (eg solaris).  If your setup is different (ie sending to another
Windows machine), then you just have to plug in different numbers to
demonstrate the problem.  Also, my tests were done with Nagle ENABLED,
though a similar problem occurs with Nagle disabled.


 
phn  
Newsgroups: alt.winsock.programming, comp.protocols.tcp-ip
From: p...@icke-reklam.ipsec.nu
Date: 24 Oct 2002 14:59:29 GMT
Local: Thurs, Oct 24 2002 10:59 am
Subject: Re: Strange/dire Windows TCP Performance
Rufus V. Smith <nos...@nospam.net> wrote:

Serializing ON THE SAME socket needs to be done with knowledge
about how tcp works.

> Or two threads might use the same socket and be completely oblivious
> to what the other thread is sending out.  One just happened to have a
> large block and the other a small one.

Threads are no cure for everything. In fact they might screw up
stuff, like using the same socket for two independent communication
channels. Again, they might blow your performance. Again, the cure
is knowledge about tcp and skillful programming.

> Why are you fixated on the form or quality of the example????

I'm not.

> Rufus

--
Peter Håkanson        
        IPSec  Sverige      ( At Gothenburg Riverside )
           Sorry about my e-mail address, but i'm trying to keep spam out,
           remove "icke-reklam" if you feel for mailing me. Thanx.

 
Fernando Gont  
Newsgroups: alt.winsock.programming, comp.os.ms-windows.networking, comp.protocols.tcp-ip
From: arielg...@softhome.net (Fernando Gont)
Date: Thu, 24 Oct 2002 18:22:53 GMT
Local: Thurs, Oct 24 2002 2:22 pm
Subject: Re: Strange/dire Windows TCP Performance
On 22 Oct 2002 09:38:23 -0700, bernardb...@hotmail.com (Bernard
Brooks) wrote:
>Of course TCP is a byte stream and the concept of message boundaries should be
>irrelevant, but the way WinSock2 appears to implement socket buffers, it seems
>that they assume send buffers to imply message boundaries!  If a small send
>buffer is passed following a large send buffer, WinSock2 will wait until *all*
>the data from the large buffer is acknowledged before starting to transmit the
>small buffer.  At least, this is the way it seems.  What's more, disabling
>Nagle doesn't change this.

Do you know whether using TCP_NODELAY option with Winsock *really*
disables the Nagle algorithm or not?

--
Fernando Gont
e-mail: ferna...@ANTISPAM.gont.com.ar

[To send a personal reply, please remove the ANTISPAM tag]


 
David Schwartz  
Newsgroups: alt.winsock.programming, comp.os.ms-windows.networking, comp.protocols.tcp-ip
From: David Schwartz <dav...@webmaster.com>
Date: Thu, 24 Oct 2002 19:17:12 -0700
Local: Thurs, Oct 24 2002 10:17 pm
Subject: Re: Strange/dire Windows TCP Performance

Barry Margolin wrote:

> In article <3db5f5a2.2551...@News.CIS.DFN.DE>,
> Fernando Gont <arielg...@softhome.net> wrote:
> >On Tue, 22 Oct 2002 12:47:57 -0700, David Schwartz
> ><dav...@webmaster.com> wrote:
> >>      Winsock does not have ESP. It has to make a decision whether or not to
> >>send a packet and it has to make it when you call 'send'.

> >Sorry, what does "ESP" stand for?

> Extra-Sensory Perception, i.e. mind-reading.

        In this case, it can't predict the future. It doesn't know whether your
4 byte send is the first of 100 such sends, to be followed immediately
by another 4 byte send, or the last byte of data you'll send for a week.
So it *can't* do the right thing all the time.

        DS


 
David Schwartz  
Newsgroups: alt.winsock.programming, comp.protocols.tcp-ip
From: David Schwartz <dav...@webmaster.com>
Date: Thu, 24 Oct 2002 21:13:53 -0700
Local: Fri, Oct 25 2002 12:13 am
Subject: Re: Strange/dire Windows TCP Performance

"Rufus V. Smith" wrote:
> IT'S JUST AN EXAMPLE, PETER!  NOT ACTUAL CODE!

        It's an example of bad code, and so it works badly.

> The same behavior and performance hit could happen if two objects
> were serializing themselves in sequence out the same socket and one
> happened to be large and one small.

        Sure, that's why you have to handle those cases properly.

        You could, for example, accumulate small writes into an application
buffer until it either hits 2Kb or 100 milliseconds pass with no data
being written.
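
A minimal sketch of that kind of accumulating buffer follows. The 2 KB and
100 millisecond figures come from the paragraph above; everything else (the
names, the callback, passing the clock in as a parameter) is an assumption
made to keep the example self-contained. A real program would call send()
from the flush callback and drive coalescer_tick() from a timer.

```c
#include <stddef.h>
#include <string.h>

#define COALESCE_LIMIT   2048  /* flush once this many bytes are pending */
#define COALESCE_IDLE_MS 100   /* flush after this long with no new writes */

typedef void (*flush_fn)(const char *data, size_t len, void *ctx);

struct coalescer {
    char     buf[COALESCE_LIMIT];
    size_t   used;
    long     last_write_ms;   /* time of the most recent write */
    flush_fn flush;           /* where flushed data goes, e.g. a send() wrapper */
    void    *ctx;
};

static void coalescer_init(struct coalescer *c, flush_fn flush, void *ctx)
{
    c->used = 0;
    c->last_write_ms = 0;
    c->flush = flush;
    c->ctx = ctx;
}

static void coalescer_flush(struct coalescer *c)
{
    if (c->used > 0) {
        c->flush(c->buf, c->used, c->ctx);
        c->used = 0;
    }
}

/* Queue len bytes at time now_ms, flushing each time the buffer fills. */
static void coalescer_write(struct coalescer *c, const char *data,
                            size_t len, long now_ms)
{
    while (len > 0) {
        size_t room = COALESCE_LIMIT - c->used;
        size_t n = len < room ? len : room;
        memcpy(c->buf + c->used, data, n);
        c->used += n;
        data += n;
        len -= n;
        if (c->used == COALESCE_LIMIT)
            coalescer_flush(c);
    }
    c->last_write_ms = now_ms;
}

/* Call periodically: flush if nothing was written for the idle period. */
static void coalescer_tick(struct coalescer *c, long now_ms)
{
    if (c->used > 0 && now_ms - c->last_write_ms >= COALESCE_IDLE_MS)
        coalescer_flush(c);
}

/* Example flush target that only records what would have been sent. */
struct flush_stats { size_t calls; size_t bytes; };

static void record_flush(const char *data, size_t len, void *ctx)
{
    struct flush_stats *s = ctx;
    (void)data;
    s->calls++;
    s->bytes += len;
}
```

Two 4 byte writes followed by 100 ms of silence end up as one 8 byte flush
instead of two tiny sends. Note that this costs up to 100 ms of extra
latency on the last piece of data.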

> Or two threads might use the same socket and be completely oblivious
> to what the other thread is sending out.  One just happened to have a
> large block and the other a small one.

        Sure, that's why you have to handle those cases properly. Users of a
socket *can't* be oblivious to other socket users. If you need to do
this, you need to write a sensible multiplexer with a sensible buffer
flushing strategy.

> Why are you fixated on the form or quality of the example????

        Because the example shows why you have to handle those cases properly.
In all of these cases, the programmer has more information than the TCP
stack, does the wrong thing, and expects the TCP stack to magically fix
it.

        And you know what cracks me up completely? The one thing that could
have helped an application with a poor buffer flushing strategy to still
work reasonably, which is Nagle's algorithm, was the thing that was
disabled first.

        DS


 
Fernando Gont  
Newsgroups: alt.winsock.programming, comp.os.ms-windows.networking, comp.protocols.tcp-ip
From: arielg...@softhome.net (Fernando Gont)
Date: Fri, 25 Oct 2002 08:14:30 GMT
Local: Fri, Oct 25 2002 4:14 am
Subject: Re: Strange/dire Windows TCP Performance
On Tue, 22 Oct 2002 12:47:57 -0700, David Schwartz
<dav...@webmaster.com> wrote:
>    Winsock does not have ESP. It has to make a decision whether or not to
>send a packet and it has to make it when you call 'send'. Otherwise, it
>can set a timer. It has no other options. If Winsock were to send data
>immediately in your 4 byte send, what happens if it's followed later by
>32 4-byte sends? Should they all go in their own packet, with data
>efficiency dropping in the toilet as headers exceed data by huge
>factors?

Supposing Nagle was enabled, his data should be sent (at least) when
he gets MSS bytes in the socket send buffer, as the idea of Nagle is
"do not send *small* packets when.....".

With Nagle disabled, I see no reason for not sending the data, when
you have MSS bytes available to be sent.

--
Fernando Gont
e-mail: ferna...@ANTISPAM.gont.com.ar

[To send a personal reply, please remove the ANTISPAM tag]


 
Rufus V. Smith  
Newsgroups: alt.winsock.programming, comp.protocols.tcp-ip
From: "Rufus V. Smith" <nos...@nospam.net>
Date: Thu, 24 Oct 2002 12:40:17 -0400
Local: Thurs, Oct 24 2002 12:40 pm
Subject: Re: Strange/dire Windows TCP Performance

<p...@icke-reklam.ipsec.nu> wrote in message news:ap91sh$8uf$1@nyheter.crt.se...
> Rufus V. Smith <nos...@nospam.net> wrote:

> > IT'S JUST AN EXAMPLE, PETER!  NOT ACTUAL CODE!

> > The same behavior and performance hit could happen if two objects
> > were serializing themselves in sequence out the same socket and one
> > happened to be large and one small.

> Serializing ON THE SAME socket needs to be done with knowledge
> about how tcp works.

Why is that?  A socket should be handled like any other byte stream.
When I serialize something out, I just pump bytes into a stream that
are formatted such that I can serialize back into an equivalent of the object.

> > Or two threads might use the same socket and be completely oblivious
> > to what the other thread is sending out.  One just happened to have a
> > large block and the other a small one.

> Threads are no cure for everything. In fact they might screw up
> stuff, like using the same socket for two independent communication
> channels. Again, they might blow your performance. Again, the cure
> is knowledge about tcp and skillful programming.

My mistake to mention threads, to cause you to take issue with
threads.

If the problem described  is a "feature" of TCP, rather than a bug in Winsock2,
why do the other implementations not exhibit the "problem"?

Is your knowledge about tcp complete enough to explain this behavior?  If
you did explain it in a prior posting, I must have missed it somehow.  I'll look
back.

It sounds to me like the kind of cure you are talking about (skillful programming
and knowledge of tcp) would suggest that they accept the problem and
"skillfully" program a workaround, perhaps by adding another layer of buffering
to ensure the buffer handling described doesn't happen.  That's like building
in delay loops in an output driver because your I/O card can't handle full
speed I/O.  Sure it works, but it doesn't solve the base problem.  Or as they
call it around here: the "root cause".

Rufus


 
phn  
Newsgroups: alt.winsock.programming, comp.protocols.tcp-ip
From: p...@icke-reklam.ipsec.nu
Date: 25 Oct 2002 14:39:11 GMT
Local: Fri, Oct 25 2002 10:39 am
Subject: Re: Strange/dire Windows TCP Performance
Rufus V. Smith <nos...@nospam.net> wrote:

Get a copy of "TCP Illustrated Vol1" by R Stevens (isbn 0-201-63346-9);
pages 223 to 357 are devoted to the basics of tcp.

There is also a companion "Unix network programming" isbn 0-13-490012-x
that deals with issues on the socket layer that a programmer should
think about.

And yes, writing on a TCP socket has to be done carefully so the
built-in features won't hurt performance.

You might consider UDP ( if you never can saturate your network)

> Rufus

--
Peter Håkanson        
        IPSec  Sverige      ( At Gothenburg Riverside )
           Sorry about my e-mail address, but i'm trying to keep spam out,
           remove "icke-reklam" if you feel for mailing me. Thanx.

 
Bernard Brooks  
Newsgroups: alt.winsock.programming, comp.os.ms-windows.networking, comp.protocols.tcp-ip
From: bernardb...@hotmail.com (Bernard Brooks)
Date: 25 Oct 2002 08:33:16 -0700
Local: Fri, Oct 25 2002 11:33 am
Subject: Re: Strange/dire Windows TCP Performance

arielg...@softhome.net (Fernando Gont) wrote in message <news:3db7fd89.1001698@News.CIS.DFN.DE>...
> On 22 Oct 2002 09:38:23 -0700, bernardb...@hotmail.com (Bernard
> Brooks) wrote:

> >Of course TCP is a byte stream and the concept of message boundaries should be
> >irrelevant, but the way WinSock2 appears to implement socket buffers, it seems
> >that they assume send buffers to imply message boundaries!  If a small send
> >buffer is passed following a large send buffer, WinSock2 will wait until *all*
> >the data from the large buffer is acknowledged before starting to transmit the
> >small buffer.  At least, this is the way it seems.  What's more, disabling
> >Nagle doesn't change this.

> Do you know whether using TCP_NODELAY option with Winsock *really*
> disables the Nagle algorithm or not?

I have evidence to show that TCP_NODELAY does disable Nagle on WinSock2.
BUT - the problem I'm talking about happens with Nagle ENABLED (and also
with it disabled).  The problem has nothing to do with Nagle.
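
For anyone who wants to repeat the test with Nagle off, setting the option
and reading it back looks roughly like this. The sketch uses the POSIX
sockets API and a made-up helper name; Winsock has the same setsockopt and
getsockopt calls but takes the option value as a char pointer. Reading the
option back only shows that the stack stored it. Whether small segments
really go out undelayed still has to be checked with a packet trace.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <unistd.h>

/* Set TCP_NODELAY on a TCP socket and read the option back.
   Returns 1 if the stack reports it enabled, 0 if not, -1 on error.
   The helper name is hypothetical. */
static int enable_and_check_nodelay(int fd)
{
    int on = 1;
    int value = 0;
    socklen_t len = sizeof value;

    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on) != 0)
        return -1;
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) != 0)
        return -1;
    return value != 0;
}
```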

Bernard Brooks


 
Bernard Brooks  
Newsgroups: alt.winsock.programming, comp.os.ms-windows.networking, comp.protocols.tcp-ip
From: bernardb...@hotmail.com (Bernard Brooks)
Date: 25 Oct 2002 08:35:58 -0700
Local: Fri, Oct 25 2002 11:35 am
Subject: Re: Strange/dire Windows TCP Performance

arielg...@softhome.net (Fernando Gont) wrote in message <news:3db849e3.4241698@News.CIS.DFN.DE>...
> On Tue, 22 Oct 2002 12:47:57 -0700, David Schwartz
> <dav...@webmaster.com> wrote:

> >       Winsock does not have ESP. It has to make a decision whether or not to
> >send a packet and it has to make it when you call 'send'. Otherwise, it
> >can set a timer. It has no other options. If Winsock were to send data
> >immediately in your 4 byte send, what happens if it's followed later by
> >32 4-byte sends? Should they all go in their own packet, with data
> >efficiency dropping in the toilet as headers exceed data by huge
> >factors?

> Supposing Nagle was enabled, his data should be sent (at least) when
> he gets MSS bytes in the socket send buffer, as the idea of Nagle is
> "do not send *small* packets when.....".

Absolutely true - and yet it doesn't.  This (imho) is a bug.

> With Nagle disabled, I see no reason for not sending the data, when
> you have MSS bytes available to be sent.

Also absolutely true - and yet again, it doesn't.  Also a bug.

Bernard Brooks


 
Bernard Brooks  
Newsgroups: alt.winsock.programming, comp.protocols.tcp-ip
From: bernardb...@hotmail.com (Bernard Brooks)
Date: 25 Oct 2002 09:03:58 -0700
Local: Fri, Oct 25 2002 12:03 pm
Subject: Re: Strange/dire Windows TCP Performance

David Schwartz <dav...@webmaster.com> wrote in message <news:3DB8C501.F105E5FB@webmaster.com>...
> "Rufus V. Smith" wrote:

> > IT'S JUST AN EXAMPLE, PETER!  NOT ACTUAL CODE!

>    It's an example of bad code, and so it works badly.

According to you (and I agree), my later example was an example of Good code:
    for (;;) {
        send (sd, buf, 12000, 0);
        send (sd, buf, 2500, 0);
    }

and yet it still suffers from the same appalling problem.

>    You could, for example, accumulate small writes into an application
> buffer until it either hits 2Kb or 100 milliseconds pass with no data
> being written.

If the application is presented with 12000 bytes of data, would you split
that into five 2Kb sends, and buffer the rest?  Of course you wouldn't.
You'd think, this data is more than 2Kb, I'll pass it directly to send().
If you are then presented with 512 small writes of 4 bytes, you might
accumulate them into a single 2Kb buffer and send that next... The result
    send (sd, buf, 12000, 0);
    send (sd, buf, 2048, 0);

and crap performance...  Explain that.

>    And you know what cracks me up completely? The one thing that could
> have helped an application with a poor buffer flushing strategy to still
> work reasonably, which is Nagle's algorithm, was the thing that was
> disabled first.

Actually I have said quite explicitly in all my posts that these tests are
done with Nagle ENABLED.

Let me restate the salient points of this problem
   + the problem has NOTHING to do with Nagle

   + the size of the "small data buffer" is IRRELEVANT - I used 4 bytes
     just to demonstrate, but as I pointed out, even buffers of 2Kb or more
     can show this problem

   + the problem occurs immediately after you use a LARGE buffer
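
The buffering strategy discussed above (pass a large write straight through, accumulate small writes until 2Kb is reached) can be sketched roughly as follows. Everything here is hypothetical: the coalescer type, the callback signature and the size_log recorder are invented for illustration, and the 100 millisecond idle timer is left out. The recorder stands in for send() so the resulting buffer sizes can be inspected.

```c
#include <stddef.h>
#include <string.h>

#define COALESCE_CAP 2048   /* the "2Kb" threshold from the discussion */

/* Hypothetical output callback; a real program would wrap send() here. */
typedef int (*flush_fn)(void *ctx, const char *data, size_t len);

struct coalescer {
    char     buf[COALESCE_CAP];
    size_t   used;
    flush_fn out;
    void    *ctx;
};

void coalescer_init(struct coalescer *c, flush_fn out, void *ctx)
{
    c->used = 0;
    c->out = out;
    c->ctx = ctx;
}

/* Push out whatever is pending. Returns 0 on success, -1 on error. */
int coalescer_flush(struct coalescer *c)
{
    if (c->used == 0)
        return 0;
    if (c->out(c->ctx, c->buf, c->used) != 0)
        return -1;
    c->used = 0;
    return 0;
}

/* Large writes bypass the buffer; small writes accumulate until it is full. */
int coalescer_write(struct coalescer *c, const char *data, size_t len)
{
    if (len >= COALESCE_CAP) {
        /* Flush pending bytes first so the stream stays in order. */
        if (coalescer_flush(c) != 0)
            return -1;
        return c->out(c->ctx, data, len);
    }
    if (c->used + len > COALESCE_CAP && coalescer_flush(c) != 0)
        return -1;
    memcpy(c->buf + c->used, data, len);
    c->used += len;
    if (c->used == COALESCE_CAP)
        return coalescer_flush(c);
    return 0;
}

/* Stand-in for send(): records the size of each buffer it is handed. */
struct size_log {
    size_t sizes[16];
    size_t count;
};

int log_sizes(void *ctx, const char *data, size_t len)
{
    struct size_log *rec = ctx;

    (void)data;
    if (rec->count < 16)
        rec->sizes[rec->count++] = len;
    return 0;
}
```

Feeding this one 12000 byte write followed by 512 writes of 4 bytes hands the callback exactly two buffers, 12000 and 2048 bytes, which is the pattern that shows the slowdown.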

Bernard Brooks


 
Bernard Brooks  
Newsgroups: alt.winsock.programming, comp.os.ms-windows.networking, comp.protocols.tcp-ip
From: bernardb...@hotmail.com (Bernard Brooks)
Date: 25 Oct 2002 09:43:22 -0700
Local: Fri, Oct 25 2002 12:43 pm
Subject: Re: Strange/dire Windows TCP Performance

David Schwartz <dav...@webmaster.com> wrote in message <news:3DB8A9A8.AF008D52@webmaster.com>...

>    In this case, it can't predict the future. It doesn't know whether your
> 4 byte send is the first of 100 such sends, to be followed immediately
> by another 4 byte send, or the last byte of data you'll send for a week.
> So it *can't* do the right thing all the time.

The algorithm is the Nagle algorithm:  It should store and coalesce data
while there's outstanding unacknowledged data, or until it coalesces
enough to fill a segment (subject, of course, to the TCP window size and
the congestion window).

So you see it *can* do the right thing.

If you were to send a thousand 4 byte buffers, it'll send one 4 byte
packet, and then a sequence of full packets (since the computer is easily
fast enough to coalesce a full packet before receiving the ACK from the
previous packet).
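
As a rough illustration of the rule just described, here is a simplified sketch of the sender's decision. It is not any stack's actual code: the function name and parameters are invented, the receive window and congestion window are folded into a single window value, and silly window avoidance and timers are ignored.

```c
#include <stddef.h>

/*
 * Simplified Nagle decision: how many bytes may go out in the next segment?
 *   queued  = bytes written by the application but not yet transmitted
 *   unacked = bytes transmitted but not yet acknowledged
 *   window  = min(advertised receive window, congestion window)
 */
size_t nagle_next_segment(size_t queued, size_t unacked,
                          size_t mss, size_t window)
{
    size_t usable = window > unacked ? window - unacked : 0;

    if (queued >= mss && usable >= mss)
        return mss;     /* a full segment may always be sent */
    if (unacked == 0 && queued > 0)
        return queued < usable ? queued : usable;   /* idle: send small data now */
    return 0;           /* data in flight and less than a segment: hold it */
}
```

With an MSS of 1460 and a window of 8760, the first 4 byte write on an idle connection goes out at once, later 4 byte writes are held while that packet is unacknowledged, and as soon as 1460 bytes have accumulated a full segment is allowed.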

But why are we arguing about the 4 byte buffer?  It's irrelevant, and
I wish I'd never used it in my example.

You seem to have missed two key points from my original post
    1) This problem occurs with Nagle enabled.

    2) Replace the 4 byte buffer with a 2Kb buffer if you like...
       It'll still go slowly.

Send this sequence of buffers from a Windows box to a Solaris box and
time it.  You'll find it takes roughly 1 second to send these 20 buffers!
Surely this isn't right?  You can read it faster than Windows can
send it!
    12000,2500,12000,2500,12000,2500,12000,2500,12000,2500,
    12000,2500,12000,2500,12000,2500,12000,2500,12000,2500
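
A minimal timing harness for a sequence like this might look as follows. It assumes a connected stream socket and the POSIX sockets API; send_all and time_sends are hypothetical helper names. It measures only how long the send() calls take to return, which is where the delay shows up on the sending side.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Send exactly len bytes, looping over partial sends. Returns 0 or -1. */
static int send_all(int sd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = send(sd, buf, len, 0);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send one buffer per entry in sizes[] and return the elapsed time in
 * seconds, or -1.0 on error. */
double time_sends(int sd, const size_t *sizes, size_t count)
{
    struct timespec t0, t1;
    size_t i, max = 1;
    char *buf;

    for (i = 0; i < count; i++)
        if (sizes[i] > max)
            max = sizes[i];
    buf = calloc(max, 1);
    if (buf == NULL)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < count; i++) {
        if (send_all(sd, buf, sizes[i]) != 0) {
            free(buf);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    free(buf);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

To repeat the experiment, fill sizes[] with the twenty alternating values 12000 and 2500 and pass the connected socket.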

Bernard Brooks


 
Alun Jones  
Newsgroups: alt.winsock.programming, comp.os.ms-windows.networking, comp.protocols.tcp-ip
From: a...@texis.com (Alun Jones)
Date: Fri, 25 Oct 2002 18:49:21 GMT
Local: Fri, Oct 25 2002 2:49 pm
Subject: Re: Strange/dire Windows TCP Performance
In article <a7458e5b.0210250733.76602...@posting.google.com>,

bernardb...@hotmail.com (Bernard Brooks) wrote:
>I have evidence to show that TCP_NODELAY does disable Nagle on WinSock2.
>BUT - the problem I'm talking about happens with Nagle ENABLED (and also
>with it disabled).  The problem has nothing to do with Nagle.

Most of the 'problems' laid at Nagle's door likewise have nothing to do with
Nagle.  Coalescence of outgoing data happens even in the absence of Nagle.  
Delayed sending also occurs in the absence of Nagle (for instance, when the
send buffer size is larger than the negotiated window).

Alun.
~~~~

[Please don't email posters, if a Usenet response is appropriate.]
--
Texas Imperial Software   | Try WFTPD, the Windows FTP Server. Find us at
1602 Harvest Moon Place   | http://www.wftpd.com or email a...@texis.com
Cedar Park TX 78613-1419  | VISA/MC accepted.  NT-based sites, be sure to
Fax/Voice +1(512)258-9858 | read details of WFTPD Pro for XP/2000/NT.


 
Alun Jones  
Newsgroups: alt.winsock.programming, comp.os.ms-windows.networking, comp.protocols.tcp-ip
From: a...@texis.com (Alun Jones)
Date: Fri, 25 Oct 2002 18:49:30 GMT
Local: Fri, Oct 25 2002 2:49 pm
Subject: Re: Strange/dire Windows TCP Performance
In article <a7458e5b.0210250735.45a02...@posting.google.com>,

bernardb...@hotmail.com (Bernard Brooks) wrote:
>arielg...@softhome.net (Fernando Gont) wrote in message
>> With Nagle disabled, I see no reason for not sending the data, when
>> you have MSS bytes available to be sent.

>Also absolutely true - and yet again, it doesn't.  Also a bug.

Have you run a network trace to determine what is, and isn't being negotiated
and sent?  Have you checked the buffer size that's being given to your
application?  What about the window size?

Alun.
~~~~

[Please don't email posters, if a Usenet response is appropriate.]
--
Texas Imperial Software   | Try WFTPD, the Windows FTP Server. Find us at
1602 Harvest Moon Place   | http://www.wftpd.com or email a...@texis.com
Cedar Park TX 78613-1419  | VISA/MC accepted.  NT-based sites, be sure to
Fax/Voice +1(512)258-9858 | read details of WFTPD Pro for XP/2000/NT.


 
Bernard Brooks  
Newsgroups: alt.winsock.programming, comp.os.ms-windows.networking, comp.protocols.tcp-ip
From: bernardb...@hotmail.com (Bernard Brooks)
Date: 26 Oct 2002 09:11:44 -0700
Local: Sat, Oct 26 2002 12:11 pm
Subject: Re: Strange/dire Windows TCP Performance

> >> With Nagle disabled, I see no reason for not sending the data, when
> >> you have MSS bytes available to be sent.

> >Also absolutely true - and yet again, it doesn't.  Also a bug.

> Have you run a network trace to determine what is, and isn't being negotiated
> and sent?  Have you checked the buffer size that's being given to your
> application?  What about the window size?

Yes, I've traced it, and if you're thinking that the delay is because    
the TCP window size is being reduced, or because of congestion - the
trace shows that neither of these is the case.

The setup:  A local ethernet (no routers, etc).  MSS is 1460 bytes; no
window scaling negotiated; Solaris is advertising a receive window of
8760 bytes; SO_SNDBUF on the Windows box is the default (8192 bytes).
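
For reference, the send buffer size can be read from the socket itself. This is a minimal sketch using the BSD sockets form of getsockopt(); the helper name get_send_buffer_size is made up for illustration.

```c
#include <sys/socket.h>

/* Returns the socket's send buffer size in bytes, or -1 on error. */
int get_send_buffer_size(int sd)
{
    int size = 0;
    socklen_t len = sizeof size;

    if (getsockopt(sd, SOL_SOCKET, SO_SNDBUF, &size, &len) != 0)
        return -1;
    return size;
}
```

On WinSock2 the same option exists, with the value passed through a char pointer.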

The trace is of traffic from a Windows box to a Solaris box, produced by
    for (;;) {
        send (sd, buf, 12000, 0);
        send (sd, buf, 2500, 0);
    }

    0.00056 a -> b Ack=1          Seq=4144714689 Len=1460 Win=64240
    0.00009 a -> b Ack=1          Seq=4144716149 Len=1460 Win=64240
    0.00008 a -> b Ack=1          Seq=4144717609 Len=1460 Win=64240
    0.00009 a -> b Ack=1          Seq=4144719069 Len=1460 Win=64240
    0.00010 b -> a Ack=4144717609 Seq=1          Len=0    Win=8760
    0.00008 a -> b Ack=1          Seq=4144720529 Len=1460 Win=64240
    0.00006 b -> a Ack=4144720529 Seq=1          Len=0    Win=8760
    0.00002 a -> b Ack=1          Seq=4144721989 Len=1460 Win=64240
    0.00021 a -> b Ack=1          Seq=4144723449 Len=1460 Win=64240
  p 0.00015 a -> b Ack=1          Seq=4144724909 Len=1460 Win=64240
    0.00001 b -> a Ack=4144723449 Seq=1          Len=0    Win=8760
  q 0.00007 a -> b Ack=1          Seq=4144726369 Len=1460 Win=64240
    0.00007 b -> a Ack=4144726369 Seq=1          Len=0    Win=8760
  X 0.00005 a -> b Ack=1          Seq=4144727829 Len=1360 Win=64240
    0.09829 b -> a Ack=4144729189 Seq=1          Len=0    Win=8760
    0.00046 a -> b Ack=1          Seq=4144729189 Len=1460 Win=64240
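
The segment lengths in the trace are consistent with one pass of the loop being sent as a single byte stream: nine 1460 byte segments followed by the 1360 byte segment marked 'X' add up to 12000 + 2500 = 14500 bytes. A small sketch of that arithmetic (the struct and function names are illustrative only):

```c
#include <stddef.h>

struct segmentation {
    size_t full;    /* number of MSS-sized segments */
    size_t tail;    /* bytes left over for a final short segment */
};

/* Split total bytes into MSS-sized segments plus a shorter final one. */
struct segmentation segment_stream(size_t total, size_t mss)
{
    struct segmentation s;

    s.full = total / mss;
    s.tail = total % mss;
    return s;
}
```

segment_stream(12000 + 2500, 1460) gives 9 full segments and a 1360 byte tail, matching the trace.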

The packet I've marked with an 'X' is the first suspicious event.  At
this point in the conversation there's outstanding unacknowledged data.
The ACK immediately before packet 'X' acknowledges all data up to and
including packet 'p'.  Packet 'q' is unacknowledged.
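
The amount of unacknowledged data at any point can be read off the trace by subtracting the highest ACK seen from the next sequence number to be sent. A minimal sketch (the function name is hypothetical; unsigned 32-bit subtraction takes care of sequence number wraparound):

```c
#include <stdint.h>

/* Bytes sent but not yet acknowledged. Unsigned subtraction gives the
 * right answer even when the sequence number has wrapped past 2^32. */
uint32_t bytes_in_flight(uint32_t next_seq, uint32_t last_ack)
{
    return next_seq - last_ack;
}
```

Just before 'X' is sent, the next sequence number is 4144727829 and the highest ACK is 4144726369, so 1460 bytes (packet 'q') are in flight. Once 'X' has gone out, the figure is 2820 bytes.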

The Nagle algorithm says that if there's outstanding unacknowledged    
data, then it should NOT send data unless it's got enough to fill a
packet.  So why does it send the 1360 byte packet?  It appears (from
other tests I've done) that the bug is not in the WinSock2
implementation of the Nagle algorithm.

What actually happens is that the socket buffering layer within
WinSock2 tells the application (via select for write, or via a blocking
send() call) that there's insufficient space in the buffer, and won't
allow the next send() to present the data buffer it has.  But we know
from the ACKs received, that there should be plenty of space in the
buffer.  This seems to be where the bug is.

In fact (as you can see from the trace) it waits until the ACK arrives
for the 1360 byte packet before it unblocks the socket - causing these
appalling delays.

Bernard Brooks


 
David Schwartz  
Newsgroups: alt.winsock.programming, comp.os.ms-windows.networking, comp.protocols.tcp-ip
From: David Schwartz <dav...@webmaster.com>
Date: Sat, 26 Oct 2002 14:36:10 -0700
Local: Sat, Oct 26 2002 5:36 pm
Subject: Re: Strange/dire Windows TCP Performance

Fernando Gont wrote:
> On Tue, 22 Oct 2002 12:47:57 -0700, David Schwartz
> <dav...@webmaster.com> wrote:
> >       Winsock does not have ESP. It has to make a decision whether or not to
> >send a packet and it has to make it when you call 'send'. Otherwise, it
> >can set a timer. It has no other options. If Winsock were to send data
> >immediately in your 4 byte send, what happens if it's followed later by
> >32 4-byte sends? Should they all go in their own packet, with data
> >efficiency dropping in the toilet as headers exceed data by huge
> >factors?
> Supposing Nagle was enabled, his data should be sent (at least) when
> he gets MSS bytes in the socket send buffer, as the idea of Nagle is
> "do not send *small* packets when.....".

        Right, that's why disabling Nagle didn't help.

> With Nagle disabled, I see no reason for not sending the data, when
> you have MSS bytes available to be sent.

        There are any number of reasons, even without Nagle, when you might not
send data even if you have an MSS worth. See, for example, RFC1122. With
Nagle disabled, the 4 byte send is much more likely to result in a
packet being sent, thus robbing the stack of the chance to use that
opportunity to send more data. TCP pacing only allows so many
opportunities.

        DS


 
Vernon Schryver  
Newsgroups: alt.winsock.programming, comp.os.ms-windows.networking, comp.protocols.tcp-ip
From: v...@calcite.rhyolite.com (Vernon Schryver)
Date: 26 Oct 2002 16:19:15 -0600
Local: Sat, Oct 26 2002 6:19 pm
Subject: Re: Strange/dire Windows TCP Performance
In article <3DBB0ACA.9C56C...@webmaster.com>,
David Schwartz  <dav...@webmaster.com> wrote:

> ...
>> With Nagle disabled, I see no reason for not sending the data, when
>> you have MSS bytes available to be sent.

>    There are any number of reasons, even without Nagle, when you might not
>send data even if you have an MSS worth. See, for example, RFC1122. With
>Nagle disabled, the 4 byte send is much more likely to result in a
>packet being sent, thus robbing the stack of the chance to use that
>opportunity to send more data. TCP pacing only allows so many
>opportunities.

Which part of RFC 1122 are you referring to?

Given the sample code with the Nagle Algorithm enabled, how can the
4-byte send() result in sending a segment that is not maximum sized
or otherwise rob the stack of a chance to send more data?

Your advice to write application code that knows the MSS and RTT of
the network can result in code that is much worse than the ugly and
inefficient infinite loop of alternating big and tiny writes.  Unless
the network is very much faster than the host and probably also has
20K MTU, the only unnecessary cost of those alternate writes is in
CPU cycles in the host.  Your advice to wire "buffer flushing" into
the application can produce code that also has problems on the wire.

Vernon Schryver    v...@rhyolite.com


 