I’m new to TCP/IP. Anyway, quite a few things about TCP interactive
data flow confuse me.
1. We enable interactive data flow by setting the PSH flag to 1.
a) So if PSH = 1, does that mean that if the app writes 10 bytes of
data to the TCP send buffer with a single write(), those 10 bytes will
be sent immediately to the other end, each in its own segment ( thus 10
segments )? Or does it mean that each write() from the application will
be sent immediately as a single segment ( thus, in our example, a
single segment carrying 10 bytes of data )?
b) Can there be a situation where two successive writes to the TCP send
buffer ( say the first write() writes 2 bytes and the second writes
9 ) are sent in a single segment ( containing 11 bytes of data ), or
does TCP somehow prevent this?
c) How many packets can the sender transmit before it has to stop and
wait for an ACK ( assuming Nagle’s algorithm is disabled )? If just
one, why?
d) Provided we have a fast connection between the two hosts, would
interactive data transfer be faster or slower if TCP could transmit
several packets before waiting for an acknowledgement?
2. Say we have a client application that can send text-based commands
to the server via a console window. For a command to be sent, the user
must first type it into the console window and then press Enter. Only
then is the command transferred to the TCP send buffer and then to the
server.

Now, should we set the PSH flag on, and thus enable interactive data
flow, or not? Is there a possibility that if the PSH flag is not set to
1, TCP won’t send that data from the send buffer immediately, but will
instead wait for more data? I would assume that to be the case only if
the command is 1 or 2 bytes long, but if it is 3 bytes or more, the
data will be sent immediately?!
3. With bulk data flow, I assume that if there are only a few bytes in
the TCP send buffer, TCP will wait for a certain amount of time in case
more data becomes available, so that more data can be sent in a single
segment? How long will it wait?
cheers
> 1. We enable interactive data flow by setting the PSH flag to 1.
>
> a) So if PSH = 1, does that mean that if the app writes 10 bytes of
> data to the TCP send buffer with a single write(), those 10 bytes will
> be sent immediately to the other end, each in its own segment ( thus 10
> segments )? Or does it mean that each write() from the application will
> be sent immediately as a single segment ( thus, in our example, a
> single segment carrying 10 bytes of data )?
It means neither. TCP is a byte-stream protocol that does not preserve
application message boundaries.
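(To make the byte-stream point concrete, here is a minimal C sketch,
with error handling trimmed and the newline framing convention invented
for illustration: the sender's two write() calls may arrive at the
receiver as one chunk, several chunks, or split at any byte, so the
receiver has to impose its own boundaries.)

/* Sketch only: TCP preserves byte order, not write() boundaries. */
#include <string.h>
#include <unistd.h>

static void send_command(int sock)
{
    const char hdr[]  = "CMD ";      /* 4 bytes */
    const char body[] = "hello\n";   /* 6 bytes */
    write(sock, hdr,  strlen(hdr));  /* may be coalesced with ...  */
    write(sock, body, strlen(body)); /* ... this, or split further */
}

static ssize_t read_line(int sock, char *buf, size_t len)
{
    /* The receiver cannot assume one read() == one write(); it
     * accumulates bytes until it sees its own delimiter ('\n'). */
    size_t used = 0;
    while (used < len - 1) {
        ssize_t n = read(sock, buf + used, 1);
        if (n <= 0)
            return n;                /* error or EOF */
        if (buf[used++] == '\n')
            break;
    }
    buf[used] = '\0';
    return (ssize_t)used;
}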
> b) Can there be a situation where two successive writes to the TCP send
> buffer ( say the first write() writes 2 bytes and the second writes
> 9 ) are sent in a single segment ( containing 11 bytes of data ), or
> does TCP somehow prevent this?
Absolutely that can happen.
> c) How many packets can the sender transmit before it has to stop and
> wait for an ACK ( assuming Nagle’s algorithm is disabled )? If just
> one, why?
It can send many, until the window is full. There is no specific
limit, as it will depend on how much data winds up in each packet.
> d) Provided we have a fast connection between the two hosts, would
> interactive data transfer be faster or slower if TCP could transmit
> several packets before waiting for an acknowledgement?
It can, so this question doesn't make sense. Waiting for ACKs that
often would be a disaster if an ACK got dropped. You'd have to time out
just to send more data.
> 2. Say we have a client application that can send text-based commands
> to the server via a console window. For a command to be sent, the user
> must first type it into the console window and then press Enter. Only
> then is the command transferred to the TCP send buffer and then to the
> server.
>
> Now, should we set the PSH flag on, and thus enable interactive data
> flow, or not? Is there a possibility that if the PSH flag is not set to
> 1, TCP won’t send that data from the send buffer immediately, but will
> instead wait for more data? I would assume that to be the case only if
> the command is 1 or 2 bytes long, but if it is 3 bytes or more, the
> data will be sent immediately?!
The PSH flag is a flag on the wire. It's not used to communicate
between an application and its local TCP implementation. So it cannot
affect when data is transmitted.
> 3. With bulk data flow, I assume that if there are only a few bytes in
> the TCP send buffer, TCP will wait for a certain amount of time in case
> more data becomes available, so that more data can be sent in a single
> segment? How long will it wait?
No, it won't. A bulk data sender will, presumably, transfer data in
bulk if possible. If it only sends a few bytes, that means it only has
a few bytes to send. (Why would it deliberately send 8 bytes if it had
450 to send?!) Why would the implementation assume the application is
broken?
DS
Now I'm confused. From your replies it seems that there is no
difference between interactive and bulk data flow ( thus it makes no
difference whether we set the PSH flag on or not ). I must be missing
something. So what sets interactive and bulk data flows apart?
PSH is merely a "hint" that the receiving application should go ahead
and be notified of data arrival if it hasn't been already. As far as
how virtually all (?) TCP stacks today behave, PSH is (mostly) a noop.
Data arrives, app is notified, PSH or no PSH. (Modulo what
interaction, if any, there may be between PSH and an application
setting a watermark on the socket).
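(For reference, the "watermark" mentioned above is the socket receive
low-water mark. A sketch of how an application might set it follows;
whether, and how, a given stack honours it for TCP, and how it
interacts with PSH, varies, so treat this as purely illustrative.)

#include <sys/socket.h>

/* Ask the stack not to mark the socket readable (for select/poll)
 * until at least 128 bytes have accumulated.  TCP support for
 * SO_RCVLOWAT differs between stacks. */
static int set_rcv_lowat(int sock)
{
    int lowat = 128;
    return setsockopt(sock, SOL_SOCKET, SO_RCVLOWAT,
                      &lowat, sizeof(lowat));
}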
TCP setting PSH at the end of each "send" by the application is still
convenient to see in a packet trace - it gives the person reading the
trace _some_ idea of how the application was presenting data to TCP,
but one probably cannot _really_ count on that.
Going back to the days of "long ago and far away", there was at least
one TCP stack which abused the PSH bit as a message boundary - that was
the "NS Transport" of MPE/V and MPE/XL (and IX). However that was
contrary to the specs for TCP and was only exposed via the NetIPC
rather than BSD Sockets interface on those OSes.
rick jones
--
oxymoron n, Hummer H2 with California Save Our Coasts and Oceans plates
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
It’s a bit confusing. In no way am I questioning your authority on the
subject, but after some googling I found the TCP/IP Guide ( an online
book ) that contradicts some of the stuff you say ( most likely I'm
just misinterpreting all of this ).
1.
a)
> > 3. With bulk data flow, I assume that if there are only a few bytes in
> > the TCP send buffer, TCP will wait for a certain amount of time in case
> > more data becomes available, so that more data can be sent in a single
> > segment? How long will it wait?
>
> No, it won't.
Excerpt from TCP/IP guide:
"TCP includes a special “push” function to handle cases where data
given to TCP needs to be sent immediately. An application can send
data to its TCP software and indicate that it should be pushed. This
tells the sending TCP to immediately “push” all the data it has to the
recipient's TCP as soon as it is able to do so, without waiting for
more data. The segment will be sent right away rather than being
buffered. The pushed segment’s PSH control bit will be set to one to
tell the receiving TCP that it should immediately pass the data up to
the receiving application."
According to the above text, when PSH is not set, TCP does buffer data,
waiting for more data to arrive in the TCP send buffer.
b)
> The destination device's TCP software, seeing this bit sent, will know that it
> should not just take the data in the segment it received and buffer it, but rather
> push it through directly to the application.
>
> PSH is merely a "hint" that the receiving application should go ahead
> and be notified of data arrival if it hasn't been already.
* What do we mean by the term notifying an app of received data? I
assume by that we mean immediately passing data to the app?
* If so, then the only difference between PSH = 1 and PSH = 0 is/was
that when read() is issued by an app, data is read a bit more quickly
if PSH = 1, since it was already sent to the app from the receive
buffer and thus took less time to be read?
c)
> Data arrives, app is notified, PSH or no PSH.
So PSH was useful when lots of TCP stacks didn’t immediately notify
an app when new data arrived?
But the impression I got from the excerpts from the TCP/IP Guide is
that TCP stacks still behave that way even today, meaning that if
PSH = 0 they will wait for more data before sending it.
d)
> > c) How many packets can the sender transmit before it has to stop and wait
> > for an ACK ( assuming Nagle’s algorithm is disabled )? If just one, why?
> It can send many, until the window is full. There is no specific limit, as it will
> depend on how much data winds up in each packet.
Assuming Nagle’s algorithm is disabled, then there is virtually no
difference between packets with PSH = 1 and packets with PSH = 0 when
it comes to the number of packets transmitted before waiting for an
ACK?
2. Is PSH of no use even when the X Window System is used, where small
mouse movements need to be transmitted in “real time” in order to keep
the system responsive to the user?
3. Is Nagle’s algorithm enabled only when PSH=1?
But it doesn't say that. It just says that TCP should PSH the data when
the PSH bit is set. It doesn't say that it shouldn't when it isn't. And
what you've been told here is that PSH is mostly a no-op. I've certainly
never seen an implementation where it isn't.
> Assuming Nagle’s algorithm is disabled, then there is virtually no
> difference between packets with PSH = 1 and packets with PSH = 0 when
> it comes to the number of packets transmitted before waiting for an
> ACK?
PSH has nothing to do with this.
> 3. Is Nagle’s algorithm enabled only when PSH=1?
The question doesn't make sense. Nagle's algorithm is enabled/disabled
on a per-socket basis. PSH is set on a per-send basis.
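(For the record, the per-socket control in BSD sockets terms is the
TCP_NODELAY option; a minimal sketch, error handling omitted. Note that
it controls Nagle, not the PSH bit.)

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle's algorithm on this one socket only; every other
 * connection in the process keeps the default (Nagle enabled). */
static int disable_nagle(int sock)
{
    int on = 1;
    return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}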
> The question doesn't make sense. Nagle's algorithm is
> enabled/disabled on a per-socket basis. PSH is set on a per-send
> basis.
Perhaps misreading the tea-leaves, but while I suppose it is
conceivable that PSH (is there actually a user-space option to set it
explicitly in (m)any stacks?) could override Nagle, it certainly
shouldn't override the congestion window, nor the classic TCP receive
window. So, an application would have to "deal" with it not "working"
in those cases which ends-up pretty much meaning an application has to
be able to work as if PSH never existed.
rick jones
--
firebug n, the idiot who tosses a lit cigarette out his car window
> Excerpt from TCP/IP guide:
> "TCP includes a special “push” function to handle cases where data
> given to TCP needs to be sent immediately. An application can send
> data to its TCP software and indicate that it should be pushed.
This is confusing, because it's using the same word to mean two
different things. Yes, an application can indicate that the data
should be pushed, but not with the PSH flag. The PSH flag is a TCP
flag sent on the wire in a data packet. It can't be sent between an
application and its local TCP implementation.
> This
> tells the sending TCP to immediately “push” all the data it has to the
> recipient's TCP as soon as it is able to do so, without waiting for
> more data.
Right, so this is some communication between the application and its
local TCP stack.
> The segment will be sent right away rather than being
> buffered. The pushed segment’s PSH control bit will be set to one to
> tell the receiving TCP that it should immediately pass the data up to
> the receiving application."
Okay, so when the application does this, its local TCP stack tries to
send the data immediately. (Note that this is only one reason data
might be sent immediately and only one reason the PSH bit might be
set. In fact, it's one of the rarest reasons either of things could or
would happen.)
> According to the above text, when PSH is not set, TCP does buffer data,
> waiting for more data to arrive in the TCP send buffer.
Again, you are confusing some 'push' option between an application and
its local TCP stack and the PSH flag on the wire between TCP stacks.
TCP may buffer data regardless of what the application says, and this
certainly has nothing to do with any PSH flags it may receive from the
other end.
> b)
> > The destination device's TCP software, seeing this bit sent, will know that it
> > should not just take the data in the segment it received and buffer it, but rather
> > push it through directly to the application.
> > PSH is merely a "hint" that the receiving application should go ahead
> > and be notified of data arrival if it hasn't been already.
> * What do we mean by the term notifying an app of received data? I
> assume by that we mean immediately passing data to the app?
The stack can't immediately pass data to the app. If the application
doesn't call 'receive' there's nothing the stack can do. In practice,
it doesn't matter; receiving stacks *always* pass received data along
immediately. They never wait for more data to accumulate in their
receive buffers.
> * If so, then the only difference between PSH = 1 and PSH = 0 is/was
> that when read() is issued by an app, data is read a bit more quickly
> if PSH = 1, since it was already sent to the app from the receive
> buffer and thus took less time to be read?
Perhaps, though probably not. If the application is blocked in 'read',
a modern stack will unblock it the instant it can give it more than
zero bytes of data.
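(Which is why, PSH or no PSH, the application does its own framing. A
sketch assuming a hypothetical length-prefixed message format; read()
returns whatever bytes happen to be available, so the reader loops
until it has a whole message.)

#include <arpa/inet.h>
#include <stdint.h>
#include <unistd.h>

/* read() may return any number of bytes >= 1; keep calling until the
 * requested amount has arrived.  The PSH bit is never visible here. */
static int read_full(int sock, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(sock, p, len);
        if (n <= 0)
            return -1;               /* error or connection closed */
        p   += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical framing: 4-byte network-order length, then the body. */
static int read_message(int sock, char *body, size_t max)
{
    uint32_t len;
    if (read_full(sock, &len, sizeof(len)) < 0)
        return -1;
    len = ntohl(len);
    if (len > max)
        return -1;
    return read_full(sock, body, len) < 0 ? -1 : (int)len;
}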
> c)
>
> > Data arrives, app is notified, PSH or no PSH.
Exactly.
> So PSH was useful when lots of TCP stacks didn’t immediately notify
> an app when new data arrived?
Right.
> But the impression I got from the excerpts from the TCP/IP Guide is
> that TCP stacks still behave that way even today, meaning that if
> PSH = 0 they will wait for more data before sending it.
Before *sending* it?! You are again confusing the PSH bit in a TCP
packet with some other communication mechanism between an application
and its local TCP stack.
> d)
>
> > > c) How many packets can the sender transmit before it has to stop and wait
> > > for an ACK ( assuming Nagle’s algorithm is disabled )? If just one, why?
> > It can send many, until the window is full. There is no specific limit, as it will
> > depend on how much data winds up in each packet.
> Assuming Nagle’s algorithm is disabled, then there is virtually no
> difference between packets with PSH = 1 and packets with PSH = 0 when
> it comes to the number of packets transmitted before waiting for an
> ACK?
> 2. Is PSH of no use even when the X Window System is used, where small
> mouse movements need to be transmitted in “real time” in order to keep
> the system responsive to the user?
>
> 3. Is Nagle’s algorithm enabled only when PSH=1?
Nagle's algorithm is always enabled unless an application chooses to
disable it. Responsiveness over low-latency links is fine with or
without Nagle, and Nagle doesn't significantly worsen latency for high-
latency links. (Assuming the applications are reasonably smart.)
DS
> Perhaps misreading the tea-leaves, but while I suppose it is
> conceivable that PSH (is there actually a user-space option to set it
> explicitly in (m)any stacks?) could override Nagle, it certainly
> shouldn't override the congestion window, nor the classic TCP receive
> window. So, an application would have to "deal" with it not "working"
> in those cases which ends-up pretty much meaning an application has to
> be able to work as if PSH never existed.
Everybody who uses TCP *MUST* understand this: Fundamentally, TCP is a
byte-stream protocol that does not preserve message boundaries. Full
stop. All attempts to "teach" TCP to do different tricks fail.
DS
At the outset, TCP is probably a bad choice for a data flow that must
be "interactive" and "where small mouse movements need to be
transmitted in “real time” in order to keep the system responsive to
the user". TCP is a stream-based protocol, with a ton of built-in
latencies, such as sliding window, congestion, re-transmit back-off,
Nagle, delayed ACK, etc. With all these latencies, you might end up
regretting a choice to use TCP where you truly need and rely on
"interactive" and "real-time".
"real-time" is correct: TCP *specifically* provides reliability at the
expense of timeliness. That's its point.
"interactive" I disagree with. Interactive *demands* reliability, and
TCP, at least as implemented in modern systems, provides ways to allow
an application supporting interactive exchanges to avoid the latencies
that TCP uses to be more efficient in bulk transfers. Well-designed
applications can, in fact, use TCP to take advantage of both ends of
the spectrum: being highly efficient when there's a lot of data, and
still being reasonably timely when the data is coming in little
bursts.
-don
>At the outset, TCP is probably a bad choice for a data flow that must
>be "interactive" and "where small mouse movements need to be
>transmitted in “real time” in order to keep the system responsive to
>the user". TCP is a stream-based protocol, with a ton of built-in
>latencies, such as sliding window, congestion, re-transmit back-off,
>Nagle, delayed ACK, etc. With all these latencies, you might end up
>regretting a choice to use TCP where you truly need and rely on
>"interactive" and "real-time".
Maybe so, but the fact that the only practical alternative in many
cases is UDP might be the reason why TCP is used for many interactive
applications, including some where small mouse movements need to be
transmitted in "real time", such as the X Window System.
Are there any common UNIX window systems that do not use TCP when
the I/O devices and the program are on different computers?
Note that talk about "sliding window, congestion, re-transmit back-off"
as intolerable "built-in latencies" is at best off the mark. No transport
protocol for non-trivial networks can do without equivalents of all of
those features. For example, an interactive application is unlikely
to be able to tolerate many lost mouse movements, which implies that
whatever network protocol is used is likely to involve retransmissions.
No network protocol that involves retransmissions can do without
re-transmit back-offs and some form of congestion control and avoidance.
Vernon Schryver v...@rhyolite.com
Sorry to keep dragging this topic on.
1.
> > So PSH was useful when lots of TCP stacks didn’t immediately notify
> > an app when new data arrived?
> Right.
I will ask this again just to be sure I haven’t misunderstood
anything:
So in the days of old TCP stacks ( when Stevens wrote his book ):
a) most TCP stacks didn’t immediately notify an app when new data
arrived?
b) when bulk data flow was enabled ( PSH = 0 ), TCP would buffer data
in hopes of receiving more data, but now most TCP stacks immediately
send received data?
c) Due to the reasons above, setting PSH to 1 really made a difference,
but not anymore?
2.
> TCP may buffer data regardless of what the application says, and this
> certainly has nothing to do with any PSH flags it may receive from the other
> end.
a) So in today’s TCP stacks, TCP would buffer received data ( and thus
not immediately deliver it to the app ) only if the app hasn’t issued a
read() call yet?
b) In what circumstances does TCP not try to send data immediately
( besides network problems or waiting for ACKs )?
3.
> > Excerpt from TCP/IP guide:
> > "TCP includes a special “push” function to handle cases where data given to
> > TCP needs to be sent immediately. An application can send data to its TCP
> > software and indicate that it should be pushed.
> This is confusing, because it's using the same word to mean two different
> things. Yes, an application can indicate that the data should be pushed, but
> not with the PSH flag. The PSH flag is a TCP flag sent on the wire in a data
> packet. It can't be sent between an application and its local TCP
> implementation.
a) How then does an app tell TCP to push data?
b) However an app might indicate to TCP that data should be pushed, is
TCP’s reaction to this request always to set PSH to 1 and then act
accordingly ( whatever "accordingly" may be in today’s stacks )?!
4.
> > According to the above text, when PSH is not set, TCP does buffer data,
> > waiting for more data to arrive in the TCP send buffer.
> But it doesn't say that. It just says that TCP should PSH the data when
> the PSH bit is set. It doesn't say that it shouldn't when it isn't. And what you've
> been told here is that PSH is mostly a no-op. I've certainly never seen an
> implementation where it isn't.
It doesn’t say that explicitly, but if it says that PSH set to 1 makes
TCP push data ( which a reader interprets as some action happening
sooner ), then that would implicitly imply that PSH = 0 doesn’t push
data ( which a reader would interpret as some action happening more
slowly than when pushed ). Otherwise, why didn’t he just say that
setting PSH to 1 has the same effect as PSH = 0?
5.
> > d) Provided we have a fast connection between the two hosts, would
> > interactive data transfer be faster or slower if TCP could transmit
> > several packets before waiting for an acknowledgement?
> It can, so this question doesn't make sense. Waiting for ACKs that often would
> be a disaster if an ACK got dropped. You'd have to time out just to send more data.
If Nagle’s algorithm is enabled by default, then the TCP stack waits
for an ACK after each packet sent, and only then sends another packet.
So why isn’t it disabled by default if it is such a performance killer?
6.
> > 3. Is Nagle’s algorithm enabled only when PSH=1?
> The question doesn't make sense. Nagle's algorithm is enabled/disabled
> on a per-socket basis. PSH is set on a per-send basis.
Stevens talks about Nagle’s algorithm only in the context of TCP
interactive data flow and since interactive flow is enabled only when
PSH=1, I assumed …
I really appreciate it
> If Nagle’s algorithm is enabled by default, then the TCP stack waits
> for an ACK after each packet sent, and only then sends another packet.
> So why isn’t it disabled by default if it is such a performance killer?
Umm, no. You completely misunderstand Nagle's algorithm. First,
Nagle's algorithm only delays a packet if, and so long as, the stack
cannot send a full segment.
DS
The issue is that the receiving TCP always buffers data up to some
locally determined limit. There is often some way of notifying the
application that data is available (beyond the typical recv and
select), but that's all. It's hard to see what an application would
really do with knowing that there was data *and* a PSH available.
It's not like the application can wait to issue a recv until the PSH
arrives - it can't know how much buffer the TCP/IP stack has
available, and so cannot depend on there being enough room for the TCP
stack to receive and buffer the data before the PSH. So it's mostly
pointless at the receiving end.
> 2.
> a) So in today’s TCP stacks, TCP would buffer received data ( and thus
> not immediately deliver it to the app ) only if the app hasn’t issued a
> read() call yet?
As above.
> b) In what circumstances does TCP not try to send data immediately
> ( besides network problems or waiting for ACKs )?
TCP is (obviously) limited to sending no faster than the network can
support, so you can get substantial buffering there. And TCP cannot
send unless the other side has opened the window (by sending back a
non-zero window size), because the receiver has to be able to put the
data someplace once it arrives.
> 3.
> a) How then does an app tell TCP to push data?
On most TCP/IP stacks, it can't. There’s just no exposed API for it.
> b) However an app might indicate to TCP that data should be pushed, is
> TCP’s reaction to this request always to set PSH to 1 and then act
> accordingly ( whatever "accordingly" may be in today’s stacks )?!
Most stacks assume that the sender wants its data to get to the
receiver sooner rather than later, and so, unless there's a reason not
to (send window, network bandwidth, Nagle), it'll send. Note that
those reasons for not sending are very significant, and do lead to
significant combining of consecutive sends, especially in cases where
you're transmitting a lot of data quickly, or the link is slow or
busy. IOW, in most typical bulk flows, you *will* see the sending TCP
build full size segments simply because the sending application is
generating data faster than it can actually be sent.
> 4.
> It doesn’t say that explicitly, but if it says that PSH set to 1 makes
> TCP push data ( which a reader interprets as some action happening
> sooner ), then that would implicitly imply that PSH = 0 doesn’t push
> data ( which a reader would interpret as some action happening more
> slowly than when pushed ). Otherwise, why didn’t he just say that
> setting PSH to 1 has the same effect as PSH = 0?
As I mentioned, PSH has little practical effect at the receiver. On
the sender, it might well be useful in the sense of increasing the
aggregation of consecutive sends. For example, the native APIs (APPC
and CPI-C) for LU6.2 provide for just that sort of thing (actually,
it's the other way around - the normal "send" only buffers, until you
explicitly "push" - or the stack needs to send because it's got a full
"segment").
But in a practical sense, if the network is *not* busy, it's not that
big a deal. If the network is busy, you'll get buffering and big
segments with aggregated data from multiple sends. And if the network
is not busy, why not use it to get the data to the other side a little
faster? There's really a fairly small range where explicitly
controlling buffering on a full duplex link (like TCP) will make much
practical difference. And in those cases the application can usually
help by doing only large sends (doing its own buffering, if
necessary).
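(A sketch of that kind of application-side buffering, with an invented
header/payload split: build the logically associated pieces into one
buffer and hand them to TCP in a single send(), so the stack has the
chance to put them in one segment, subject to the MSS, the windows and
Nagle as usual.)

#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Instead of write(header) followed by write(payload), coalesce in
 * the application and make one send() call. */
static ssize_t send_message(int sock, const void *hdr, size_t hlen,
                            const void *payload, size_t plen)
{
    char buf[2048];                  /* assumes hlen + plen fits */
    if (hlen + plen > sizeof(buf))
        return -1;
    memcpy(buf, hdr, hlen);
    memcpy(buf + hlen, payload, plen);
    return send(sock, buf, hlen + plen, 0);
}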
> On Jul 10, 5:31 pm, Sru...@gmail.com wrote:
>
> > Excerpt from TCP/IP guide:
>
> > "TCP includes a special ³push² function to handle cases where data
> > given to TCP needs to be sent immediately. An application can send
> > data to its TCP software and indicate that it should be pushed.
>
> This is confusing, because it's using the same word to mean two
> different things. Yes, an application can indicate that the data
> should be pushed, but not with the PSH flag. The PSH flag is a TCP
> flag sent on the wire in a data packet. It can't be sent between an
> application and its local TCP implementation.
Although if someone reads RFC 793 they could be excused for confusing
the API with the protocol. The sample API in the RFC makes the TCP
flags visible as parameters in the SEND() function. It describes a
TCP-specific API, rather than a more modern, general-purpose API.
--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE don't copy me on replies, I'll read them in the group ***
> TCP setting PSH at the end of each "send" by the application is still
> convenient to see in a packet trace - it gives the person reading the
> trace _some_ idea of how the application was presenting data to TCP,
> but one probably cannot _really_ count on that.
Rick, thanks for bringing this point up.
It's always been my understanding that the PSH bit is (*not*) a bit
that can be "set" by an application, it's an inference drawn up by the
TCP/IP stack when the application presumably has finished some kind of
"write()" to the socket - I'll stop here because I'm not really a
programmer, and stack implementations vary greatly, unfortunately. I
often go through traces looking for the PSH bit from the stack as an
indication of how the application is chunking or attempting to chunk
data. This has been very useful for me to figure out "application
stutter" - where the application is producing data to the network, but
some process behind the emitting application is latent in delivering
data and where the PSH bit is missing, the application hasn't finished
"talking", and you can inference problems from that. Not a 100%
troubleshooting diagnosis, and one that needs to be taken with a grain
of salt, but when you see things like:
- (customer data in packets)
- (long delay)
- (other data from a sql query)
- (rest of data from application)
- (packet with PSH bit set)
You can infer that the SQL query was latent behind the scenes.
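(If it helps anyone doing the same kind of trace reading, a capture
filter along these lines should isolate the segments carrying PSH; the
interface name here is just an example:)

tcpdump -i eth0 -n 'tcp[tcpflags] & tcp-push != 0'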
/dmfh
--
_ __ _
__| |_ __ / _| |_ 01100100 01101101
/ _` | ' \| _| ' \ 01100110 01101000
\__,_|_|_|_|_| |_||_| dmfh(-2)dmfh.cx
They always did, at least within living memory, although maybe not when
the RFC was written, or maybe the RFC was written to allow that
behaviour and it was never implemented.
> b) when bulk data flow was enabled ( PSH = 0 ), TCP would buffer data
> in hopes of receiving more data, but now most TCP stacks immediately
> send received data?
See above.
> c) Due to the reasons above, setting PSH to 1 really made a
> difference, but not anymore?
Never, see above.
> a) So in today’s TCP stacks, TCP would buffer received data ( and thus
> not immediately deliver it to the app ) only if the app hasn’t issued a
> read() call yet?
In the absence of a read that's all it can do, isn't it?
> b) In what circumstances does TCP not try to send data immediately
> ( besides network problems or waiting for ACKs )?
When Nagle is on, which is the default, and the appropriate conditions
hold for it to take effect.
> a) How then does an app tell TCP to push data?
It can't. See Stevens I #20.5.
> b) However an app might indicate to TCP that data should be pushed, is
> TCP’s reaction to this request always to set PSH to 1 and then act
> accordingly ( whatever "accordingly" may be in today’s stacks )?!
See above. You've asked this question several times here. The answer
continues to be the same every time. The statement from the 'online TCP
Guide' you quoted: 'an application can send data to its TCP software and
indicate that it should be pushed' is incorrect. Don't rely on these
hobby sites: rely on Stevens. There's a lot of misinformation out there.
Whole websites full of it. I could point you at a few more ...
> Otherwise, why didn’t he just say that setting
> PSH to 1 has the same effect as PSH = 0?
He does. See Stevens I #20.5.
> If Nagle’s algorithm is enabled by default, then the TCP stack waits
> for an ACK after each packet sent, and only then sends another packet.
> So why isn’t it disabled by default if it is such a performance killer?
But it isn't a performance killer. It's a performance *improver*. You
haven't stated the conditions correctly here. See Stevens I #19.4.
> Stevens talks about Nagle’s algorithm only in the context of TCP
> interactive data flow and since interactive flow is enabled only when
> PSH=1, I assumed …
As a matter of fact, Stevens only talks about the PSH flag in the
context of TCP Bulk Data Flow! Your reasoning isn't very rigorous ...
>
> > If Nagle’s algorithm is enabled by default, then the TCP stack waits
> > for an ACK after each packet sent, and only then sends another packet.
> > So why isn’t it disabled by default if it is such a performance killer?
>
> But it isn't a performance killer. It's a performance *improver*. You
> haven't stated the conditions correctly here. See Stevens I #19.4.
>
Aha, so Nagle’s algorithm only kicks in when the packet to be sent is
very small.
thank you all
For some definition of very small, usually < the TCP MSS (Maximum
Segment Size). In broad handwaving terms, Nagle (should) work like
this (a rough code sketch follows the list):
1) Is this send() by the user, plus any queued, untransmitted data, >=
MSS? If yes, then transmit the data now, modulo constraints like
congestion or receiver window. If no, go to question 2.
2) Is the connection otherwise "idle?" That is, is there no
transmitted but not yet ACKnowledged data outstanding on the
connection? If yes, transmit the data now, modulo constraints like
congestion or receiver window. If no, go to 3.
3) Queue the data until:
a) The application provides enough data to get >= the MSS
b) The remote ACK's the currently unACKed data
c) The retransmission timer for currently unACKed data (if any)
expires and there is room for (some of) the queued data in the
segment to be retransmitted.
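That decision, as a C-style sketch with invented field names (real
stacks are considerably hairier, and a "yes" here is still subject to
the congestion and receive windows):

#include <stddef.h>

/* Hypothetical sketch of the Nagle test described above. */
struct conn {
    size_t queued_unsent;   /* bytes buffered but not yet transmitted */
    size_t unacked;         /* bytes sent but not yet ACKed           */
    size_t mss;             /* maximum segment size                   */
    int    nodelay;         /* TCP_NODELAY set by the application     */
};

static int nagle_says_send_now(const struct conn *c, size_t new_bytes)
{
    if (c->nodelay)
        return 1;                           /* Nagle disabled        */
    if (c->queued_unsent + new_bytes >= c->mss)
        return 1;                           /* 1) a full segment     */
    if (c->unacked == 0)
        return 1;                           /* 2) connection is idle */
    return 0;                               /* 3) queue and wait     */
}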
rick jones
--
web2.0 n, the dot.com reunion tour...
I'm not sure I understand what you were trying to say here.
> 1) Is this send() by the user, plus any queued, untransmitted data, >=
> MSS? If yes, then transmit the data now, modulo constraints like
> congestion or receiver window. If no, go to question 2.
>
> 2) Is the connection otherwise "idle?" That is, is there no
> transmitted but not yet ACKnowledged data outstanding on the
> connection? If yes, transmit the data now, modulo constraints like
> congestion or receiver window. If no, go to 3.
>
> 3) Queue the data until:
> a) The application provides enough data to get >= the MSS
> b) The remote ACK's the currently unACKed data
> c) The retransmission timer for currently unACKed data (if any)
> expires and there is room for (some of) the queued data in the
> segment to be retransmitted.
>
So basically the size limit is MSS, where anything smaller is buffered
until there are no unacknowledged data? But why is MSS the limit? MSS
can be greater than 1000 bytes, which in my opinion is not a tinygram.
So why handle packets with 1 byte of data the same as packets of say
400 bytes of data?
> So basically the size limit is MSS, where anything smaller is buffered
> until there are no unacknowledged data? But why is MSS the limit? MSS
> can be greater than 1000 bytes, which in my opinion is not a tinygram.
> So why handle packets with 1 byte of data the same as packets of say
> 400 bytes of data?
Because in either case waiting is more efficient than sending. If you
have at least one MSS, you are going to send the same packet no matter
what.
Also, if the stack were willing to send at even one byte less than the
MSS, you could get repeatable degenerate behavior. For example, suppose
the MSS is 768 bytes. Suppose an application has a huge amount of data
to send, but chooses to send it in 3,838-byte chunks (it has to use
some chunk size, right?). You can send four 768-byte chunks
immediately, and you have 766 bytes left over. The application is about
to call 'send' again. Which is better? To wait a split second and send
a full segment?
Or to repeatedly and inefficiently send unfull segments with no
possible application workaround? (Since the app doesn't know the
MSS.)
DS
You keep overanalyzing this. With Nagle, data is buffered if there's
less than a packet's worth (MSS) to send, and there is already sent
data waiting for an acknowledgement from the other end. The idea is
to transmit as few packets as possible by making them as large as
possible (which results in the most efficient utilization of the
network), while still keeping interactive traffic prompt. Thus the
maximum amount of buffering time is about one round trip plus the
delayed ACK timeout (at most half a second; 200ms for most stacks).
But the question is really the opposite of yours - why *not* a full
packet (MSS)? It's obvious why you'd not want Nagle to buffer more
than a packet's (MSS) worth of data (because a full packet is actually
the goal of Nagle; there’s nothing left to accomplish but to transmit
the thing). But what would you gain from capping the buffering at
some lower limit? Remembering that it's for a rather limited time
interval. And just how many applications would actually be better off
because only (say) 200 bytes got buffered, rather than 1500?
If you go back in time to the initial Nagle paper/RFC, MSSes were
"typically" in the 536-byte range.
The MSS is the "best" TCP can do for the ratio of data to data+headers.
I'm not sure about your question wrt 1 byte vs 400 bytes. Are you
asking why the Nagle limit isn't based on a constant rather than the
MSS? In some stacks, IIRC, one can configure the value against which the
user's send is compared. It generally defaults to the MSS for the
connection. And yes, as MTUs and thus MSSes increase in size that
does start to look a little, well, odd... :)
rick jones
--
firebug n, the idiot who tosses a lit cigarette out his car window
Interestingly enough, TCP stacks trying to make use of TSO in the NIC
(Transport/TCP Segmentation Offload) have just that issue - when/if to
wait until there is even more than one MSS-worth of data before
shipping data down the stack.
rick jones
--
The computing industry isn't as much a game of "Follow The Leader" as
it is one of "Ring Around the Rosy" or perhaps "Duck Duck Goose."
- Rick Jones
not much I presume, but still, communication between two apps where at
least one of them buffered 200 bytes instead of MSS-1 would be just a
wee faster, provided this app ( one that buffers only up to 200
bytes ) would send lots of data of size greater than 200 but smaller
than MSS.
thank you all for your help
kind regards
Not necessarily. Here we have an example of something sending 200
bytes at a time, leaving Nagle enabled, and then that same 200 byte
send, with Nagle disabled (the nodelay case)
manny:~# netperf -H moe -c -C -- -m 200
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to moe.west (10.208.0.15) port 0 AF_INET
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
87380 16384 200 10.02 941.38 19.62 12.84 6.828 4.469
manny:~# netperf -H moe -c -C -- -m 200 -D
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to moe.west (10.208.0.15) port 0 AF_INET : nodelay
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
87380 16384 200 10.00 318.76 24.99 22.68 25.687 23.311
manny:~#
Notice that in the nodelay (Nagle off) case there is a significantly
higher demand placed on the CPU of the system - in this case a four
core system, which is why the CPU util caps at 25% since a single TCP
connection will not (generally) make use of the services of more than
one core. The increase is between 4x and 6x CPU consumed per KB
transferred.
rick jones
--
portable adj, code that compiles under more than one compiler
Those numbers were for a unidirectional test over a GbE LAN. If the
test is request/response then we start having "races" between
standalone ACK timers, RTT's and how many requests or responses will
be put into the connection at one time by the application. What
follows is the ./configure --enable-burst mode of netperf with a
TCP_RR test and a 200 byte request/response size. Again first is with
defaults, second is with nagle disabled. I've stripped the socket
buffer, request/response size and time columns to better fit in 80
columns:
manny:~# for i in 0 1 2 3 4 5 6 7 8 9 10; do netperf $HDR -t TCP_RR -H moe -c -C -B "burst $i" -- -r 200 -b $i; HDR="-P 0"; done
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to moe.west (10.208.0.15) port 0 AF_INET : first burst 0
Trans. CPU CPU S.dem S.dem
Rate local remote local remote
per sec % S % S us/Tr us/Tr
8657.38 3.61 3.47 16.681 16.044
9247.23 3.21 3.57 13.882 15.463 burst 1
10324.20 4.69 4.27 18.152 16.550 burst 2
11371.37 4.08 4.31 14.340 15.150 burst 3
13726.78 2.51 3.03 7.305 8.823 burst 4
16007.27 4.82 8.12 12.052 20.283 burst 5
18231.57 3.30 3.43 7.230 7.529 burst 6
20235.90 2.98 3.01 5.893 5.950 burst 7
22214.26 3.99 3.24 7.184 5.837 burst 8
24002.79 4.00 2.99 6.663 4.984 burst 9
25778.28 4.46 3.58 6.918 5.562 burst 10
...
67198.41 7.11 6.29 4.229 3.745 burst 20
98375.44 9.76 9.00 3.967 3.659 burst 30
132360.98 11.86 12.00 3.583 3.627 burst 40
173646.81 15.43 14.87 3.554 3.424 burst 50
204709.83 18.38 17.15 3.591 3.351 burst 60
235860.77 20.81 19.94 3.529 3.382 burst 70
manny:~# HDR="-P 1";for i in 0 1 2 3 4 5 6 7 8 9 10; do netperf $HDR -t TCP_RR -H moe -c -C -B "burst $i" -- -r 200 -b $i -D; HDR="-P 0"; done
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to moe.west (10.208.0.15) port 0 AF_INET : nodelay : first burst 0
Trans. CPU CPU S.dem S.dem
Rate local remote local remote
per sec % S % S us/Tr us/Tr
8523.55 3.78 3.52 17.720 16.509
17714.38 5.16 4.94 11.652 11.161 burst 1
18660.94 5.92 5.59 12.697 11.978 burst 2
27373.66 8.78 8.53 12.828 12.462 burst 3
34303.27 10.22 10.67 11.914 12.436 burst 4
41652.40 11.34 10.39 10.891 9.973 burst 5
42222.80 12.43 12.81 11.778 12.135 burst 6
45601.75 13.03 12.76 11.430 11.196 burst 7
48737.80 13.58 13.47 11.142 11.052 burst 8
52505.19 14.43 14.25 10.994 10.858 burst 9
56406.20 14.95 14.40 10.602 10.209 burst 10
...
101401.90 24.74 24.35 9.761 9.605 burst 20
102946.48 24.99 24.75 9.711 9.619 burst 30
104170.04 24.99 24.72 9.595 9.493 burst 40
I stopped at 40 in the Nagle disabled case because it was pretty clear
things had maxed-out - again one of the four cores was saturated.
So, for smaller numbers of transactions outstanding at one time, the
transaction rate for the RR test is higher with Nagle disabled, but as
you increase the concurrent transactions, having Nagle enabled yields
a higher transaction rate because it allows several transactions to be
carried in a single TCP segment. As before, this is reflected in the
lower service demand figures for the Nagle enabled case.
rick jones
--
The computing industry isn't as much a game of "Follow The Leader" as
it is one of "Ring Around the Rosy" or perhaps "Duck Duck Goose."
- Rick Jones
> not much I presume, but still, communication between two apps where at
> least one of them buffered 200 bytes instead of MSS-1 would be just a
> wee faster, provided this app ( one that buffers only up to 200
> bytes ) would send lots of data of size greater than 200 but smaller
> than MSS.
No. Because such an application would be sending 200 bytes when no
unacknowledged data is pending and so wouldn't trigger Nagle. The data
replies from the other side would have ACKs piggy-backed on them, so
Nagle would never delay any transmissions.
Most people who criticize Nagle don't understand it.
DS
I was going to start a new thread on this, but after searching for
information on nagling and delayed acks, found this thread and decided
to hang my questions on here.
I work with some application architecture where timely notification of
data is far more important than max throughput (or at least, we do not
have any issues with max throughput).
The apps in question generally establish a connection with third party
servers over whose software we have no control of course. However,
it's not helpful to think in terms of client / server because
communication flows both ways. IOW either can initiate flow of
information at any time. That's irrelevant anyway. Transmission is
usually in small chunks of data that are likely to be sporadic and
almost always below the MSS. A typical flow (real data, not tcp/ip
acks) may look something like:
OurApp -> submit request 112 bytes -> 3rdParty
3rdParty -> acknowledge request 89 bytes -> OurApp
3rdParty -> dataX 140 bytes -> OurApp
The problem came in because 3rdParty was implementing nagling, and
OurApp was delaying tcp/ip acks, so we were getting delays between
"acknowledge request" and "dataX" of approximately 200ms. I have
WireShark captures if anyone really cares, but basically with tcp/ip
(simplified) included it looked something like this.
OurApp -> PSH "submit request" 112 bytes -> 3rdParty
3rdParty -> immediate ACK
3rdParty -> 40ms later PSH "acknowledge request" 89 bytes -> OurApp
(at this point 3rdParty has dataX available virtually instantly after
the above is sent, and I assume it tries to send, however due to
nagling, it is waiting for the ACK to the previous PSH).
OurApp -> delays ack: 200ms later ACK
3rdParty -> immediate PSH "dataX" 140 bytes -> OurApp
From our point of view, that delay of 200ms to receive "dataX" was
unacceptable.
I apologise in advance for the great simplification of what is
happening, or if I misunderstood the situation, but with a recent
upgrade to the 3rdParty server software, nagling has been disabled and
we see immediate resolution of this problem with so far no negative
consequences.
We are now seeing similar latency from a different 3rd party, and are
about to start investigations. In the previous example, disabling
delayed acks on the boxes OurApp ran on pretty much resolved the
problem. We saw a slight round trip latency because 3rdParty was still
waiting for the tcp/ip ACK, but it was acceptable. The better solution
was disabling nagling on their side since "dataX" is sent immediately,
incurring no round trip.
So if the case with the latest problem turns out to be the same, and
assuming we can't rely on 3rd party2 to disable nagling, what negative
effects will disabling delayed acks on our boxes have?
If it were just my application on the box I'd have no concerns, but
they are production machines we share server real estate with any
number of other applications. I wouldn't want to degrade their
performance in some way, say consuming more CPU for example.
Any corrections to my (mis)understanding are welcome.
Gah. I forgot to add, the connection and communication to 3rdParty and
OurApp is done via an API provided by 3rdParty (lib/dll). We have no
control over setting no ack delay at a socket level. It would have to be
at machine level, unless someone knows ways around this.
Turning off delayed acks will (usually slightly) increase the CPU load
on both ends of the conversation(s), and will result in more send
traffic from your host. It will usually help response time, at some
cost in bandwidth.
In broad terms, whenever an application does a send() call, the logic
of the Nagle algorithm is supposed to go something like this:
1) Is the quantity of data in this send, plus any queued, unsent data,
greater than the MSS (Maximum Segment Size) for this connection? If
yes, send the data in the user's send now (modulo any other
constraints such as receiver's advertised window and the TCP
congestion window). If no, go to 2.
2) Is the connection to the remote otherwise idle? That is, is there
no unACKed data outstanding on the network. If yes, send the data in
the user's send now. If no, queue the data and wait. Either the
application will continue to call send() with enough data to get to a
full MSS-worth of data, or the remote will ACK all the currently sent,
unACKed data, or our retransmission timer will expire.
Now, where applications run into trouble is when they have what might
be described as "write, write, read" behaviour, where they present
logically associated data to the transport in separate 'send' calls
and those sends are typically less than the MSS for the connection.
It isn't so much that they run afoul of Nagle as they run into issues
with the interaction of Nagle and the other heuristics operating on
the remote. In particular, the delayed ACK heuristics.
When a receiving TCP is deciding whether or not to send an ACK back to
the sender, in broad handwaving terms it goes through logic similar to
this:
a) is there data being sent back to the sender? if yes, piggy-back the
ACK on the data segment.
b) is there a window update being sent back to the sender? if yes,
piggy-back the ACK on the window update.
c) has the standalone ACK timer expired?
Window updates are generally triggered by the following heuristics:
i) would the window update be for a non-trivial fraction of the window
- typically somewhere at or above 1/4 the window, that is, has the
application "consumed" at least that much data? if yes, send a
window update. if no, check ii.
ii) would the window update be for at least 2*MSS worth of data
"consumed" by the application? if yes, send a window update; if no,
wait. (A rough sketch of this receiver-side decision follows the list.)
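Something like this, in C-style pseudocode with invented names (real
stacks have more cases):

#include <stddef.h>

/* Hypothetical sketch of the receiver-side ACK decision above. */
struct rcv {
    int    reply_data_queued;  /* data going back to the sender     */
    size_t consumed;           /* bytes the app has read since the
                                  last window update                */
    size_t window;             /* our advertised receive window     */
    size_t mss;
};

static int send_ack_now(const struct rcv *r)
{
    if (r->reply_data_queued)
        return 1;                /* a) piggy-back on data            */
    if (r->consumed >= r->window / 4 || r->consumed >= 2 * r->mss)
        return 1;                /* b) piggy-back on a window update
                                       (heuristics i and ii)         */
    return 0;                    /* c) wait for the standalone ACK
                                       timer (50-200 ms)             */
}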
Now, going back to that write, write, read application, on the sending
side, the first write will be transmitted by TCP via logic rule 2 -
the connection is otherwise idle. However, the second small send will
be delayed as there is at that point unACKnowledged data outstanding
on the connection.
At the receiver, that small TCP segment will arrive and will be passed
to the application. The application does not have the entire app-level
message, so it will not send a reply (data to TCP) back. The typical
TCP window is much much larger than the MSS, so no window update would
be triggered by heuristic i. The data just arrived is < 2*MSS, so no
window update from heuristic ii. Since there is no window update, no
ACK is sent by heuristic b.
So, that leaves heuristic c - the standalone ACK timer. That ranges
anywhere between 50 and 200 milliseconds depending on the TCP stack in
use.
If you've read this far :) now we can take a look at the effect of
various things touted as "fixes" to applications experiencing this
interaction. We take as our example a client-server application where
both the client and the server are implemented with a write of a small
application header, followed by application data. First, the
"default" case which is with Nagle enabled (TCP_NODELAY _NOT_ set) and
with standard ACK behaviour:
Client Server
Req Header ->
<- Standalone ACK after Nms
Req Data ->
<- Possible standalone ACK
<- Rsp Header
Standalone ACK ->
<- Rsp Data
Possible standalone ACK ->
For two "messages" we end-up with at least six segments on the wire.
The possible standalone ACKs will depend on whether the server's
response time, or client's think time is longer than the standalone
ACK interval on their respective sides. Now, if TCP_NODELAY is set we
see:
Client Server
Req Header ->
Req Data ->
<- Possible Standalone ACK after Nms
<- Rsp Header
<- Rsp Data
Possible Standalone ACK ->
In theory, we are down to four segments on the wire, which seems good,
but frankly we can do better. First though, consider what happens
when someone disables delayed ACKs
Client Server
Req Header ->
<- Immediate Standalone ACK
Req Data ->
<- Immediate Standalone ACK
<- Rsp Header
Immediate Standalone ACK ->
<- Rsp Data
Immediate Standalone ACK ->
Now we definitely see 8 segments on the wire. It will also be that way
if both TCP_NODELAY is set and delayed ACKs are disabled.
How about if the application did the "right" thing in the first place?
That is, sent the logically associated data at the same time:
Client Server
Request ->
<- Possible Standalone ACK
<- Response
Possible Standalone ACK ->
We are down to two segments on the wire.
For "small" packets, the CPU cost is about the same regardless of data
or ACK. This means that the application which is making the proper
gathering send call will spend far fewer CPU cycles in the networking
stack.
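(In BSD sockets terms the gathering send is writev(); a sketch, error
handling omitted: the header and data are presented to TCP in one call,
without an extra copy in the application.)

#include <sys/types.h>
#include <sys/uio.h>

/* Present the logically associated header and data to TCP in a
 * single gathering call instead of two separate send()s. */
static ssize_t send_gathered(int sock, const void *hdr, size_t hlen,
                             const void *data, size_t dlen)
{
    struct iovec iov[2];
    iov[0].iov_base = (void *)hdr;
    iov[0].iov_len  = hlen;
    iov[1].iov_base = (void *)data;
    iov[1].iov_len  = dlen;
    return writev(sock, iov, 2);
}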
--
denial, anger, bargaining, depression, acceptance, rebirth...
where do you want to be today?
A pretty complete description of the problem, and seems to be exactly as
I understood it. Thanks for that.
Thank you. I guess the real answer is to look at the performance of the
boxes in question and see how much give we have. It's not trivial since
as I said, other applications share server real estate. But it seems
that I have understood the problem correctly.
My pleasure. You should be able to plug in your message sizes and the
sizes for a standalone TCP ACK segment (plus IP header and link-layer
header) and arrive at an estimate for the differences in maximum
network bandwidth achievable.
rick jones
--
Wisdom Teeth are impacted, people are affected by the effects of events.
Interesting in theory, but not applicable to me in practice. I have a
few options
1) Leave things as they are - not really acceptable, we have clients
complaining about these 100-200ms latencies.
2) Try turning off delayed ack on a production machine - possible but
I'm very worried about the negative impacts on other applications
3) Hope the 3rd party comes out with a solution with nagling disabled on
their side.
As I mentioned, I have no control at a socket level on the tcp/ip
communication since this is done through their own provided API.
> 1) Leave things as they are - not really acceptable, we have clients
> complaining about these 100-200ms latencies.
> 2) Try turning off delayed ack on a production machine - possible
> but I'm very worried about the negative impacts on other
> applications
That was one of the reasons for doing the packet size overhead
calculation. If we are talking about an "ethernet like" thing, there
is 14 bytes worth of link-layer header, 20 bytes of IPv4 header and
then, assuming timestamps are on in TCP, 32 bytes of TCP header. So,
headers for any packet on the wire will be at least 14+20+32 or 66
bytes. You can then use your known application-level message and ack
sizes. That could tell you the effect at the network bandwidth level.
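(Treating the message sizes mentioned earlier in the thread as
representative, a rough back-of-the-envelope for one exchange might
look like this:

  application bytes per exchange: 112 + 89 + 140 = 341
  headers for those 3 data frames: 3 * 66        = 198
  wire bytes with coalesced/piggy-backed ACKs    = 539
  each extra standalone ACK: another 66-byte frame
    (one or two per exchange adds roughly 12-25%)

so the bandwidth cost is visible but usually modest on a fast LAN; the
CPU cost is the part that needs measuring, as below.)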
Effect at the CPU util level would require gathering some fundamental
performance figures for your system(s) and stacks(s) with something
like netperf. Perhaps using a test system if you have one.
> 3) Hope the 3rd party comes out with a solution with nagling
> disabled on their side.
An application-layer ACK implies an application-layer retransmission
mechanism. Is there one? Any idea what those timers happen to be and
whether the application can implement application-layer delayed ACKs?
Then it could piggy-back its ACKs on replies and avoid the Nagle bit.
rick jones
--
firebug n, the idiot who tosses a lit cigarette out his car window
> 3) Hope the 3rd party comes out with a solution with nagling disabled on
> their side.
You left out:
4) Hope the 3rd party comes out with a *proper* solution on their
side, sending all the data in a single write call like they're
supposed to.
5) Disabling Nagle on your side and dribbling data in the delay
interval to give your ACKs something to piggyback on.
DS
Thanks but you lack understanding of the problem space. I'm reluctant
to actually say what it is we're doing due to sensitivities, but
needless to say you should accept that the situation I described is as
it is for a reason (I don't mean the nagling, I mean sending the
"acknowledge" and "dataX" in separate write calls).
> 5) Disabling Nagle on your side and dribbling data in the delay
> interval to give your ACKs something to piggyback on.
Firstly, I don't think this would solve the problem, since data can be
sporadic and therefore we'd still see the delay in many cases. Which is
not a solution. Secondly, as I mentioned, I do not have control over
the underlying tcp/ip communication which is done via an api provided.
> > 4) Hope the 3rd party comes out with a *proper* solution on their
> > side, sending all the data in a single write call like they're
> > supposed to.
>
> Thanks but you lack understanding of the problem space. I'm reluctant
> to actually say what it is we're doing due to sensitivities, but
> needless to say you should accept that the situation I described is as
> it is for a reason (I don't mean the nagling, I mean sending the
> "acknowledge" and "dataX" in separate write calls).
If the two sends are for a reason, then the 200ms delay is for a
reason. One is a direct consequence of the other.
> > 5) Disabling Nagle on your side and dribbling data in the delay
> > interval to give your ACKs something to piggyback on.
>
> Firstly, I don't think this would solve the problem, since data can be
> sporadic and therefore we'd still see the delay in many cases. Which is
> not a solution. Secondly, as I mentioned, I do not have control over
> the underlying tcp/ip communication which is done via an api provided.
Then wrap both ends of the connection with your own proxy. If you
can't even do that, then I'd say your problem is so specialized and
secret that nobody can give you advice with just what you've
disclosed.
DS
Well, it is difficult to help optimize with our hands tied behind our
backs that way :(
> If the two sends are for a reason, then the 200ms delay is for a
> reason. One is a direct consequence of the other.
I would rather see the application-layer ack and the response in the
same send, but _if_ they are not logically associated sends, then in
broad terms it is "ok" (not great) to disable Nagle. Rather, but
not entirely, like how it might be considered OK for X11 to do that to
keep mouse movements flowing.
rick jones
--
denial, anger, bargaining, depression, acceptance, rebirth...
where do you want to be today?
> 4) Hope the 3rd party comes out with a *proper* solution on their
> side, sending all the data in a single write call like they're
> supposed to.
Indeed, there are too many applications out there nowadays, and the
accompanying attitude among their developers that TCP sockets are one
"big happy pipe" you can keep dumping stuff into.
This may seem like whacking a fly with a baseball bat, but if you have
a really picayune problem, where you can't or don't want to change
stack behavior because your application is co-located with others and
those other applications might not play nice with your tweaks, consider
one of the following:
- Move the application to its own box - if it's a special case, then
isolate it out so any tweaking for that third-party application can be
done in peace. If I were in the OP's shoes @ his company and management
said no, I'd tell them to be happy with being short-sighted, and enjoy
the consequences, etc.
- Create a "split" on the existing box where you engage the use of a
ToE (TCP Offload Engine) card, and have the application bind to / use
that interface / address combination on the existing hardware, after a
thorough understanding of the ToE card's tweakables for network stack
settings, etc. In this case you're "installing two TCP/IP stacks" on
the box - the one the OS uses, and the one the ToE NIC uses, etc. The
application is the root cause, indeed, but the extra hardware could
provide you the tweak you might want or need. I'm not interested in
hearing any flames from this suggestion about how ToE cards are evil or
how we should just write TCP/IP stacks correctly, etc. ToE cards, like
any hardware are good for some problems and not good for others. I've
personally solved some bad application issues with them.
- Probably the best solution here isn't even the technological one -
issues like this fester and become worse - it's likely more important
to consider the functions of what you're trying to do, and if the
current 3rd party application isn't getting it done, replace it. If the
3rd party application is linked to specific hardware and isn't getting
the job done, evaluate a competitor. If there is no competitor, I'm
sure there are many companies hungry for new business that would like
to become one.
/dmfh
--
_ __ _
__| |_ __ / _| |_ 01100100 01101101
/ _` | ' \| _| ' \ 01100110 01101000
\__,_|_|_|_|_| |_||_| dmfh(-2)dmfh.cx
> I’m new at TCP/IP. Anyway, quite a few things about TCP interactive
> data flow confuse me.
<snip>
I just wanted to say thanks to all of you for this discussion, and many
similar ones as well. As another TCP/IP newby, I find postings like these
quite helpful. I've saved many of the posts for future reference.
Later,
Charlie Carothers
--
To email me, eradicate obfuscate and remove dot invalid!