
Turning off Nagle's Algorithm


Tony Moran

May 4, 1999
Hi, could anyone point me to an executable for MS/Wintel machines to turn
Nagle's Algorithm off and on? Source code with it would be a major plus.

Thanks, Tony


Dmytro Myasnykov

May 4, 1999
Hi
What does MS/Wintel mean?
Normally there is a function setsockopt() for the socket, or something like it.
You turn on the TCP_NODELAY option and that's it - Nagle's algorithm is
disabled.
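
A minimal sketch of that call (BSD-style sockets assumed; under Winsock it is
the same setsockopt() call, only the headers and the SOCKET type differ, and
error handling is omitted here):

#include <sys/socket.h>    /* setsockopt() */
#include <netinet/in.h>    /* IPPROTO_TCP */
#include <netinet/tcp.h>   /* TCP_NODELAY */

/* enable = 1 sets TCP_NODELAY (Nagle off); enable = 0 clears it (Nagle on). */
int set_nodelay(int sock, int enable)
{
    return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY,
                      (const char *)&enable, sizeof(enable));
}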

Regards,
Dmitriy

Alun Jones

May 4, 1999
In article <372EC97E...@eur.3com.com>, Tony Moran <tony_...@eur.3com.com> wrote:
> Hi could anyone point me to an executable for MS/Wintel machines to turn
> off and on
> Nagles Algorithm. Source code with it would be a major plus..

_If_ you really want to turn off the Nagle algorithm, you can do so on a
_per-socket_ basis only, in the source code for that socket's app, by
calling setsockopt with TCP_NODELAY.

However, as I note _every_ time someone asks this question, the fact that
you are asking how to turn off the Nagle algorithm very likely indicates
that you are not aware of how it, or the underlying TCP stack, works, and
that you are desperate for any performance gain you can get, while not
understanding how to get the best performance gain possible.

The Nagle algorithm's aim is simply to avoid clogging the network with
unnecessary short packets. Its standard example is that of a telnet client,
where a typist generates 1 character at every keypress. Without the Nagle
algorithm, this results in one packet of 41 bytes hitting the network every
time a key is pressed - an overhead of 4000%. With the Nagle algorithm,
small packets are held until a larger packet can be built and sent, and the
overhead is reduced - as in the original RFC, to as little as 1500%, which
is quite a dramatic saving. It's rather like suddenly doubling the bandwidth
of your network - but it speeds up _other_ applications, without noticeably
affecting the telnet session.

What are you doing in your case that you feel the need to disable this
coalescing of smaller packets into larger ones? Often, it can be shown that
a redesign of your protocol, or your implementation of it, can achieve a far
better saving on your time than disabling the Nagle algorithm, and will not
adversely affect the network as will disabling Nagle. Please go see the
Winsock FAQ for more explanation, or come back and post what it is that you
feel requires you to disable the Nagle algorithm, and maybe we can offer
suggestions as to better ways to improve your program's performance without
being a network hog.

Alun.
~~~~

--
Texas Imperial Software | Try WFTPD, the Windows FTP Server. Find it
1602 Harvest Moon Place | at web site http://www.wftpd.com or email
Cedar Park TX 78613 | us at al...@texis.com. VISA / MC accepted.
Fax +1 (512) 378 3246 | NT based ISPs, be sure to read details of
Phone +1 (512) 378 3246 | WFTPD Pro, NT service version - $100.
*WFTPD and WFTPD Pro now available as native Alpha versions for NT*

Thomas R. Truscott

May 4, 1999
The Nagle algorithm has become hopelessly obsolete.
We should get rid of it.
At the least it should be off by default.


> However, as I note _every_ time someone asks this question, the fact that
> you are asking how to turn off the Nagle algorithm very likely indicates
> that you are not aware of how it, or the underlying TCP stack, works, and
> that you are desperate for any performance gain you can get, while not
> understanding how to get the best performance gain possible.

Leaving aside the question of awareness, you have the conclusion
completely backwards. Turning off the Nagle algorithm
can easily result in a 100x performance boost.
In contrast, "a redesign of your protocol" to avoid triggering
Nagle will typically result in a minor additional speedup.

Even people who are quite familiar with Nagle get bitten by it.
I have a friend who does performance trouble-shooting at one of the
leading network-management companies.
A typical scenario is low throughput, and low cpu utilization.
She tells the authors that Nagle is biting them.
They don't believe it. I mean the authors KNOW this stuff, right?
So she sets TCP_NODELAY on a socket or two and poof, the software
runs 5x faster. Management is happy and they cut her a bonus check.
She has gotten several checks, just for flipping a bit!


> The Nagle algorithm's aim is simply to avoid clogging the network with
> unnecessary short packets. Its standard example is that of a telnet client,

Please use a different example, as this one is no longer relevant.
Networks are so fast now that
user keystrokes are sent one at a time regardless of Nagle.
And I suspect that telnet traffic is a tiny blip on the Internet.

In contrast almost every TCP RPC-type program
is susceptible to Nagle. The penalty can be phenomenal.

Tom Truscott

Alun Jones

May 5, 1999
In article <7gnus3$ots$1...@hal.cs.duke.edu>, t...@cs.duke.edu (Thomas R. Truscott) wrote:
> Leaving aside the question of awareness, you have the conclusion
> completely backwards. Turning off the Nagle algorithm
> can easily result in a 100x performance boost.
> In contrast, "a redesign of your protocol" to avoid triggering
> Nagle will typically result in a minor additional speedup.

Bull.

> Even people who are quite familiar with Nagle get bitten by it.
> I have a friend who does performance trouble-shooting at one of the
> leading network-management companies.
> A typical scenario is low throughput, and low cpu utilization.
> She tells the authors that Nagle is biting them.
> They don't believe it. I mean the authors KNOW this stuff, right?
> So she sets TCP_NODELAY on a socket or two and poof, the software
> runs 5x faster. Management is happy and they cut her a bonus check.
> She has gotten several checks, just for flipping a bit!

And as such, she is contributing to bandwidth abuse.

> Please use a different example, as this one is no longer relevant.
> Networks are so fast now that
> user keystrokes are sent one at a time regardless of Nagle.
> And I suspect that telnet traffic is a tiny blip on the Internet.

Okay, let's pick X-Windows, which uses TCP_NODELAY so that it can send mouse
movements without jitteriness caused by the Nagle algorithm. Why are mouse
movements required to be sent through TCP? Sure, the click points are
required to get sent without a loss, but the use of UDP to send absolute
mouse position rather than relative movements would result in a mouse that
didn't jitter, which followed the user's requirements, and which didn't clog
up the network with unnecessary header repetition.

> In contrast almost every TCP RPC-type program
> is susceptible to Nagle. The penalty can be phenomenal.

As I don't know anything about RPC, I'll decline to comment on that.
However, I will point you to Deja News to do a search on the TCP_NODELAY
option, and note that the majority of people posting questions regarding
disabling the Nagle algorithm do so because they:
a) don't understand the Nagle algorithm enough to avoid being hurt by it,
and
b) naively notice a speed improvement (or are told of a speed improvement) when
disabling Nagle.

Let's face it, I can get a speed improvement in my car by sticking a jet
engine to the roof, but as one of the recent Darwin Awards winners found,
it's not exactly a smart idea, if you don't understand what you are doing.

There _is_ work underway to provide a slightly adjusted Nagle algorithm to
take advantage of the manner in which many TCP apps are currently written,
and you can read more about it in the Internet drafts; however, the Nagle
algorithm does reduce network bandwidth waste, at the comparatively low
price of asking programmers to _think_ a little about how they use (or
abuse) the network. It's similar in my mind to people who claimed that
their spreadsheet worked wonderfully fast on Windows 3.1, and yet I found it
unusable, because it prevented _all_ other applications from running while
it was calculating. Both approaches provide a speed improvement, and both
are not "playing nicely" with other users of the system.

Rick Jones

May 5, 1999
The example I like to use to demonstrate an inappropriate use of
TCP_NODELAY is that of an email client (or server) where the client
writes the email header in one send, then the message body in another,
and the server cannot respond until it has the whole message.

If TCP_NODELAY is not set (generally the default), no more than about
five messages per second will be exchanged - this assumes a standalone
ACK timer of 200 ms.

If TCP_NODELAY is set, that exchange will be three TCP segments - one
email header, one message body (we assume a small message), and one
email ack.

If instead the client application is written (imo correctly) to
present all logically associated data (ie the whole email message) to
the transport in one gathering send (ie writev) it will only be two
messages.

Three trips up and down the protocol stack versus two.
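
A minimal sketch of such a gathering send (POSIX writev() assumed; the
function name is illustrative, and error/short-write handling is omitted):

#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Present one logical message (header + body) to the transport in a single
 * call, so TCP is free to build full-sized segments from the combined data. */
ssize_t send_whole_message(int sock, const char *header, const char *body)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)header;
    iov[0].iov_len  = strlen(header);
    iov[1].iov_base = (void *)body;
    iov[1].iov_len  = strlen(body);

    return writev(sock, iov, 2);   /* one trip down the stack */
}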

Now, some might say "so what, I've got plenty of cycles on my client."
One needs to consider the poor email server, which is sitting there
trying to serve hundreds, if not thousands of email clients. It needs
all the cycles it can get. With small exchanges, most of the CPU costs
are per-packet.

Long ago and far away, various web servers behaved like that mythical
email client. They set TCP_NODELAY and sent http headers separate from
URL data. As I recall, fixing that to be a gathering send with no
TCP_NODELAY was at least 10% on web server benchmarks. The memory is a
bit fuzzy - this was back in the second half of 1996...

rick jones
http://www.netperf.org/

--
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, or post, but please do not do both...
my email address is raj in the cup.hp.com domain...

Patrick McManus

May 5, 1999
On 5 May 1999 05:41:48 GMT, Rick Jones wrote:
>If TCP_NODELAY is set, that exchange will be three TCP segments - one
>email header, one message body (we assume a small message), and one
>email ack.

>If instead the client application is written (imo correctly) to
>present all logically associated data (ie the whole email message) to
>the transport in one gathering send (ie writev) it will only be two
>messages.
>
>Three trips up and down the protocol stack versus two.
>
>Now, some might say "so what, I've got plenty of cycles on my client."
>One needs to consider the poor email server, which is sitting there
>trying to serve hundreds, if not thousands of email clients. It needs
>all the cycles it can get. With small exchanges, most of the CPU costs
>are per-packet.

right on! and to go a little further..

not only is this less work on both the client and server stacks, it's
less work on every router in between, and it's a lot less consumed
overall bandwidth (40 bytes overhead on every packet)..

the only time an app should turn off Nagle is when it is generating
sub-segment-sized data at intervals shorter than max(delayed-ack, rtt) and
wants to stream it.. (i.e. creates 500 bytes every 100ms).. it makes sense
there. (note that this is actually telnet! and the cost was deemed too
great to handle, thus the aggregation.. so care must be taken in this
case to make sure the data introduced into the network will still be
reasonable.)


>Long ago and far away, various web servers behaved like that mythical
>email client. They set TCP_NODELAY and sent http headers separate from
>URL data. As I recall, fixing that to be a gathering send with no
>TCP_NODELAY was at least 10% on web server benchmarks. The memory is a
>bit fuzzy - this was back in the second half of 1996...

I guess the 1 packet reduction is helpful as small HTTP/1.0
transactions can be as small as 8 or 9 packets total, even including
the 3 way handshake, so this is plausible.

but to be fair, this application didn't often suffer from the delay problem
we constantly see misunderstood, because the 2nd (3rd, 4th, etc..)
packets weren't sub-MSS sized and so Nagle didn't apply to them.


--
Patrick R. McManus - AppliedTheory Communications - Software Engineering
http://pat.appliedtheory.com/~mcmanus Lead Developer
mcm...@AppliedTheory.com 'Prince of Pollywood' Standards, today!
*** - You Kill Nostalgia, Xenophobic Fears. It's Now or Neverland. - ***

Thomas R. Truscott

May 5, 1999
> The example I like to use to demonstrate an inappropriate use of
> TCP_NODELAY is that of an email client (or server) where the client
> writes the email header in one send, then the message body in another,
> and the server cannot respond until it has the whole message.

As you point out, whether TCP_NODELAY is on or off
makes no difference in the number of packets.
The only difference is that this exchange takes about
200 milliseconds by default, and one millisecond with TCP_NODELAY.

As you also point out, if the email client were
to manually buffer its data and write it all at once
then the exchange might take as little as 0.5 milliseconds.

So flipping on NODELAY gives a 200x speedup.
Overhauling the email client gives an additional 2x speedup.

Why not just have NODELAY active all the time?
It dramatically speeds up "badly written" clients,
and has no effect whatsoever on "well written" clients!!


> Long ago and far away, various web servers behaved like that mythical
> email client. They set TCP_NODELAY and sent http headers separate from
> URL data. As I recall, fixing that to be a gathering send with no
> TCP_NODELAY was at least 10% on web server benchmarks. The memory is a
> bit fuzzy - this was back in the second half of 1996...

Yes, 10% is a nice speedup (and more realistic than my idealized 2x).
And now it does not matter whether or not NODELAY is on, right?


Do you ever log onto a remote host
and use "more" or some such to scan through a text file,
and notice that the display is not entirely "snappy"?
That is, the screen painting sometimes pauses in the middle?
(It is quite subtle.)
Do you chalk it up to a network glitch, or timesharing delay?
I used to, until I put up two windows to a remote system
and paged through the same file in each window.
One of the connections was over an experimental protocol
that happened not to implement the Nagle algorithm.

Tom Truscott

Alun Jones

May 5, 1999
In article <7gpme9$7br$1...@hal.cs.duke.edu>, t...@cs.duke.edu (Thomas R. Truscott) wrote:
> So flipping on NODELAY gives a 200x speedup.
> Overhauling the email client gives an additional 2x speedup.
>
> Why not just have NODELAY active all the time?
> It dramatically speeds up "badly written" clients,
> and has no effect whatsoever on "well written" clients!!

Because the goal is not simply to speed up that one client - it is to make
more effective use of the network bandwidth in general.

Writing your code correctly saves on network bandwidth, as well as running
faster. Disabling Nagle runs faster, at the expense of increasing the
bandwidth required (i.e. wasting some of it).

You might view this as punishing bad programming - but is that really a bad
thing to do? Most people don't write good programs as a feel-good exercise
- really good programmers do, but the two-bit hacks of the world, who'll
(for instance) loop tightly around a select call to wait for their
connection to succeed, in the hopes of getting their data faster, and thus
decrease their benchmark timings, could do with an occasional shove every
now and again to do things _right_. These are the 'other programs' that
edge 'our programs' out of the processor/network/memory/resources that we
would like to use, for apparently very little purpose.

At this point, I'd like to start my usual rant about the general malaise
afflicting average programmers lately, who assume that their program is the
only one that the user (and the OS) should be interested in - Internet
Explorer, for instance, which forces itself to the foreground at least three
distinct times (probably more) during its startup, killing off any menus you
might be trying to activate in other apps; or QuickMail, which tells you via
a _system_modal_ dialog box that you've got mail; or the project management
package that locks down 4MB of _physical_ memory so that it can perform its
calculations in memory (and thereby cut off its own nose to spite its face - on
the 8MB platform that was its target, the OS had to be swapped in and out
repeatedly, thus more than killing any performance gain that might be had).

I'm not suggesting that we shouldn't release software until we've reached
godhood, but I do wish that people would write their programs as right as
they know how, rather than accepting the quick hack. TCP_NODELAY is the
quick hack; it's rude to other users of the network (i.e. other apps, other
PCs, and the people sat controlling them), and it disables a rather
elegantly simple fix to the problem of overcongestion of network resources.

Thomas R. Truscott

May 5, 1999
> Writing your code correctly saves on network bandwidth, as well as running
> faster. Disabling Nagle runs faster, at the expense of increasing the
> bandwidth required (i.e. wasting some of it).

I agree (substituting "optimally" for "correctly").
And the tradeoffs are strongly in favor of disabling Nagle.
An "optimal" code runs the same whether or not Nagle is disabled.
A "suboptimal" code runs several times faster when Nagle is disabled.
Yes, the bandwidth required *might* increase because slightly
more packets are sent than would be otherwise,
but mostly the bandwidth required increases because
the code is not being pointlessly crippled!


> You might view this as punishing bad programming -
> but is that really a bad thing to do?

Yes, when the punishment is delivered without warning
and in such a way that people often do not "get it".

In a previous note I pointed out the "Nagle glitch"
that plagues "more" and other full-screen text display programs.
You might say that such programs should disply the entire page
with a single output call, and that those which do not
are incorrect and written by bad programmers.
But I would say that the tradeoffs are complex
and it is difficult to achieve optimality.
Suppose "more" is obtaining text to display
from a pipeline or other unpredictable source.
How long should "more" wait before displaying what it has collected?

Tom Truscott

Thomas R. Truscott

May 5, 1999
>> Long ago and far away, various web servers behaved like that mythical
>> email client. They set TCP_NODELAY and sent http headers separate from
>> URL data. As I recall, fixing that to be a gathering send with no
>> TCP_NODELAY was at least 10% on web server benchmarks. ...

TCP_NODELAY is still necessary, as explained in this comment
in http_main.c in the Apache HTTP Server:

/* The Nagle algorithm says that we should delay sending partial
* packets in hopes of getting more data. We don't want to do
* this; we are not telnet. There are bad interactions between
* persistent connections and Nagle's algorithm that have very severe
* performance penalties. (Failing to disable Nagle is not much of a
* problem with simple HTTP.)

The Nagle algorithm was a boon, once, for telnet.
It is a disaster for just about everything else.

Tom Truscott

Alun Jones

May 5, 1999
In article <7gq677$b8k$1...@hal.cs.duke.edu>, t...@cs.duke.edu (Thomas R. Truscott) wrote:
> I agree (substituting "optimally" for "correctly").
> And the tradeoffs are strongly in favor of disabling Nagle.
> An "optimal" code runs the same whether or not Nagle is disabled.
> A "suboptimal" code runs several times faster when Nagle is disabled.
> Yes, the bandwidth required *might* increase because slightly
> more packets are sent than would be otherwise,
> but mostly the bandwidth required increases because
> the code is not being pointlessly crippled!

This is not a good analysis of the improvement. The bandwidth required
_does_ increase because more packets are sent than would be otherwise, and
each packet contains at least 40 bytes of header. Hence, replacing one
packet with three adds 80 bytes that are unnecessary. On any network
topologies that require a frame to be filled, of course, the bandwidth used
is directly proportional to the number of packets sent, and the wastage is
even greater. With more packets, you also have more chance of collision on
a non-switched network, and thus retransmission.

In pretty much any network-based application, the network is the bottleneck.
Better use of the network, and reducing that bottleneck, is going to
improve not just your program's performance, but every other program that
hits that bottleneck. Disabling the Nagle algorithm makes your program go a
little faster at the expense of other programs. Working with the Nagle
algorithm in mind makes your program go even faster than that, without
adversely affecting any other programs.

> Yes, when the punishment is delivered without warning
> and in such a way that people often do not "get it".

It is not delivered without warning. I challenge you to find any book about
programming TCP/IP that does not mention the Nagle algorithm and how to work
with it. You might as well claim that TCP's ignorance of message boundaries
is also a punishment delivered without warning - it punishes people that
don't bother to know anything about TCP.

> In a previous note I pointed out the "Nagle glitch"
> that plagues "more" and other full-screen text display programs.
> You might say that such programs should disply the entire page
> with a single output call, and that those which do not
> are incorrect and written by bad programmers.
> But I would say that the tradeoffs are complex
> and it is difficult to achieve optimality.
> Suppose "more" is obtaining text to display
> from a pipeline or other unpredictable source.
> How long should "more" wait before displaying what it has collected?

Oh, gee, let's see - about one or two tenths of a second would be in line
with the way that the Nagle algorithm works. Actually, "more" shouldn't
bother to wait before displaying - it's not a network-aware application.
The telnet-server, however, _is_.

No, the Nagle algorithm is not always a perfect solution. However, more
often than not, those people that ask how to disable it are unaware that a
simple adjustment to their programs or a better design of their protocol
will not only avoid a hit from the Nagle algorithm, but will also run faster
than simply disabling the Nagle algorithm. As I mentioned, please do a
search on DejaNews for TCP_NODELAY, and you'll find _plenty_ of postings
where the problem was fixed by a simple matter of programming, and maybe one
or two where the protocol needed redesign. I can only think of one
particular person whose requirements made it truly difficult to rework his
protocol or his application to not want the Nagle algorithm disabled.

What this means is that _most_ applications benefit from the Nagle algorithm
- or at least other users of the network benefit from those applications
being handled by the Nagle algorithm. A few applications benefit from
disabling the Nagle algorithm. This sounds like it means the default should
be that the Nagle algorithm is enabled, and the option presented to disable
it, if you are aware of what the effects will be.

And, surprise surprise, this is exactly what happens.

Those that came before you in designing the operation of TCP/IP truly knew
what they were doing. Learn some, and start analysing network traffic as
deeply as they did, before you decide that they were "obviously" wrong.

Alun Jones

May 5, 1999
In article <7gq71v$bfg$1...@hal.cs.duke.edu>, t...@cs.duke.edu (Thomas R. Truscott) wrote:
> TCP_NODELAY is still necessary, as explained in this comment
> in http_main.c in the Apache HTTP Server:
>
> /* The Nagle algorithm says that we should delay sending partial
> * packets in hopes of getting more data. We don't want to do
> * this; we are not telnet. There are bad interactions between
> * persistent connections and Nagle's algorithm that have very severe
> * performance penalties. (Failing to disable Nagle is not much of a
> * problem with simple HTTP.)

And this is why there is a draft proposing a different coalescing algorithm
that works better for such a common scenario, without dropping entirely the
benefit provided by Nagle.

> The Nagle algorithm was a boon, once, for telnet.
> It is a disaster for just about everything else.

Okay - you've found _one_ protocol where the designers thought it necessary
to disable Nagle, and this means that Nagle is a disaster for "just about
everything". Look around at the other TCP-based protocols, such as SMTP,
FTP, NNTP, etc, which all benefit from the correct attention being paid to
the Nagle algorithm, and none of which have yet been proposed as a reason to
disable Nagle.

As I said, go study those that came before you, look at their work, and
analyse network traffic as carefully as they did. Then, and only then, you
can say that they were wrong.

The Nagle algorithm is something that every TCP programmer should be aware
of as much as they are aware that, say, TCP is a streams-based protocol that
runs over a packet-based protocol. If they learn one and not the other,
then they put the book down too early, or left the course before the first
week was over.

Rick Jones

May 5, 1999
Thomas R. Truscott (t...@cs.duke.edu) wrote:
: Why not just have NODELAY active all the time?
: It dramatically speeds up "badly written" clients,
: and has no effect whatsoever on "well written" clients!!

Because it tries to protect the network from badly written clients. It
forces a badly written client (or server) to take explicit action
before it can do nasty things to the network and the systems.

: Yes, 10% is a nice speedup (and more realistic than my idealized
: 2x). And now it does not matter whether or not NODELAY is on,
: right?

I believe it goes back to a (the?) fundamental maxim of the Internet -
"Be conservative in what you send and liberal in what you accept."

TCP_NODELAY on by default is not "conservative in what you send."

Part of the difficulty is that some stacks (IMO) have implemented
Nagle "wrong" - they interpret Nagle on a segment-by-segment basis
instead of per-send. If an application makes a 4096 byte send on an
ethernet, all three segments should be transmitted without
Nagle-induced delay.

Would it be a good idea to have the default for stdio be unbuffered?

rick jones

Rick Jones

May 5, 1999
: >Long ago and far away, various web servers behaved like that
: >mythical email client. They set TCP_NODELAY and sent http headers
: >separate from URL data. As I recall, fixing that to be a gathering
: >send with no TCP_NODELAY was at least 10% on web server
: >benchmarks. The memory is a bit fuzzy - this was back in the second
: >half of 1996...

: I guess the 1 packet reduction is helpful as small HTTP/1.0
: transactions can be as small as 8 or 9 packets total, even including
: the 3 way handshake, so this is plausible.

The SPECweb96 benchmark has 35% of the requests for URL's distributed
between 102 and 916 bytes, and 50% in the 1024 to 9KB range.

Rick Jones

May 5, 1999
Thomas R. Truscott (t...@cs.duke.edu) wrote:
: In a previous note I pointed out the "Nagle glitch" that plagues
: "more" and other full-screen text display programs. You might say
: that such programs should display the entire page with a single
: output call, and that those which do not are incorrect and written
: by bad programmers. But I would say that the tradeoffs are complex
: and it is difficult to achieve optimality. Suppose "more" is
: obtaining text to display from a pipeline or other unpredictable
: source. How long should "more" wait before displaying what it has
: collected?

One "hole" that was pointed-out by others in another forum is that it
would be quite helpful were there a flush mechanism for a TCP
socket. You leave nagle on, and then when the application has finished
dribbling data into the socket - say more knows it has finished a
screen's worth of data - it say's "please flush" much like the fflush
of stdio.

Perhaps (and I'm guessing here, not being intimately familiar with the
inner workings of more - it might already be doing this) in the
specific case of more, more knows how many characters would fit in one
screen, so it could wait until it has that many characters. If there
are not going to be that many characters, the pipe will close on
it. That screen is going to sit half-painted in the case of short data
either way.

In the other case of something like more, those subtle delays might
also be dealt with by a more appropriate selection of the socket/TCP
window size by the underlying telnet/rsh connection - some fraction
(perhaps 1/1) of the screen size. That way window updates are
triggered in a more timely fashion, the ACK's get piggy-backed on the
window updates, and the delays decrease.

An 80x24 window is something like 1920 bytes of data - it does not
seem as though there is a particularly good reason to have an 8192 or
32768 byte TCP window below that (which is an outgrowth of making the
defaults larger and larger so application developers do not have to
think about what window size they need - feels like a trend
:).

Telnet/rsh see the SIGWINCH/etc and could start with a window of 1920
or maybe 3840, and then adjust the socket buffer/window as the screen
size is changed.
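
A hypothetical sketch of that idea (POSIX sockets assumed; the function name
and sizing policy are illustrative, not what any existing telnet client does):

#include <sys/socket.h>

/* Size the receive buffer - and hence the offered TCP window - to roughly two
 * screens of data. Setting this before connect() is the most reliable way to
 * influence the window the peer sees. */
int size_window_for_screen(int sock, int rows, int cols)
{
    int rcvbuf = 2 * rows * cols;   /* e.g. 2 * 24 * 80 = 3840 bytes */
    return setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));
}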

rick jones
an ounce of thought is worth a pound of kludge

Michael Wojcik

May 5, 1999

[followups set to comp.protocols.tcp-ip]

In article <7gpme9$7br$1...@hal.cs.duke.edu>, t...@cs.duke.edu (Thomas R. Truscott) writes:

> [attribution to Rick Jones lost in Truscott's post]


> > The example I like to use to demonstrate an inappropriate use of
> > TCP_NODELAY is that of an email client (or server) where the client
> > writes the email header in one send, then the message body in another,
> > and the server cannot respond until it has the whole message.

> As you point out, whether TCP_NODELAY is on or off
> makes no difference in the number of packets.
> The only difference is that this exchange takes about
> 200 milliseconds by default, and one millisecond with TCP_NODELAY.

> As you also point out, if the email client were
> to manually buffer its data and write it all at once
> then the exchange might take as little as 0.5 milliseconds.

Where did you get those times? They're certainly not in Rick's
post (except for the 200ms timer).

> So flipping on NODELAY gives a 200x speedup.
> Overhauling the email client gives an additional 2x speedup.

If the "overhaul" is done correctly, the performance gain for the
client will be the same as turning Nagle off. The client and the
server will be alternating sends, there will be no sends with
outstanding packets, and Nagle won't come into effect.

> Why not just have NODELAY active all the time?
> It dramatically speeds up "badly written" clients,
> and has no effect whatsoever on "well written" clients!!

You still don't understand Nagle. Disabling it would have an adverse
effect on well-written clients - and on poorly written ones as well -
because programs that do multiple small sends would be wasting
bandwidth.

> > Long ago and far away, various web servers behaved like that mythical
> > email client. They set TCP_NODELAY and sent http headers separate from
> > URL data. As I recall, fixing that to be a gathering send with no
> > TCP_NODELAY was at least 10% on web server benchmarks. The memory is a
> > bit fuzzy - this was back in the second half of 1996...
>

> Yes, 10% is a nice speedup (and more realistic than my idealized 2x).
> And now it does not matter whether or not NODELAY is on, right?

Wrong. Nagle still protects from multiple small sends consuming
excessive bandwidth. It's not a question of "how much does Nagle
affect my application's sends": the answer to that is *always*
for the better *if the application is written correctly*. We
*want* telnet keystrokes to be buffered by the stack, so that
everyone else's data can get through expediently. Even if there's
nothing on the network except telnet traffic, we want to prevent
those telnets from interfering with each other excessively.

(And skip the "telnet is an outdated example", unless you have an
actual argument to support that claim.)

> Do you ever log onto a remote host
> and use "more" or some such to scan through a text file,
> and notice that the display is not entirely "snappy"?
> That is, the screen painting sometimes pauses in the middle?

Nope. I can cat files across my 10Mb/s Ethernet, and they're
smooth as silk and faster than I can read (which is pretty damn
fast). They're probably even better over the 16Mb/s Token Ring,
but I can't tell the difference.

> (It is quite subtle.)

It would have to be.

> Do you chalk it up to a network glitch, or timesharing delay?
> I used to, until I put up two windows to a remote system
> and paged through the same file in each window.
> One of the connections was over an experimental protocol
> that happened not to implement the Nagle algorithm.

Which demonstrates approximately nothing about the Nagle
algorithm. Nagle isn't there to make your screen displays
"snappy". It's there to keep your screen displays from
congesting the line.

I've seen plenty of applications sped up by disabling Nagle.
I could do the same using QOS, on a stack that supported it,
or by hacking the stack to prioritize my app's traffic over
everything else. Would that be a good thing to do in the
general case?


Michael Wojcik michael...@merant.com
AAI Development, MERANT (block capitals are a company mandate)
Department of English, Miami University

This year's runner-up in the All-Usenet Creative Use Of English In A
Quasi-Legal But Probably Completely Ineffectual Signature Statement:

Disclaimer : I am a free denizen of this world and statements are of mine
and solly mine. Nobody dare sue me as you may end up even loosing your
attorney fees.
-- Sridhar (holag...@hotmail.com)

Eric A. Hall

May 5, 1999, to Thomas R. Truscott

> The Nagle algorithm has become hopelessly obsolete.
> We should get rid of it.
> At the least it should be off by default.

No, no and no.

There are only two scenarios to look at when discussing Nagle. If your
application is *ALWAYS* sending small blocks of data in a chatty
exchange with the remote system, then you probably ought to turn it off
for your connection, since queuing won't help (no more data is coming).
But if you are going to be exchanging more data than will fit within a
single frame -- regardless of how much data you are writing to the
network -- you should leave Nagle off.

Most applications (mail, database, web, etc.) fall into the latter camp,
constantly generating 2-200k size chunks of data. The problems start
when developers in the latter group only write data in 512 byte chunks
and "see" that disabling Nagle gives a big boost in speed, since the
data isn't being queued up. What they don't see is that they're
generating three times as many packets.

Big deal? Probably not for one or two apps. But turning it off by
default (your suggestion) would mean that all of the apps on the network
would start generating a significantly greater number of packets, which
would cumulatively cause problems on a good number of networks. Most
networks already have utilization problems on their backbones, and this
would only worsen the problem dramatically.

--
Eric A. Hall eh...@ehsco.com
+1-650-685-0557 http://www.ehsco.com

Eric A. Hall

May 5, 1999, to Thomas R. Truscott

> But if you are going to be exchanging more data than will fit within a
> single frame -- regardless of how much data you are writing to the
> network -- you should leave Nagle off.

"ON". You should leave Nagle ON.

Eric (who is going for another cup of coffee)

Eric A. Hall

May 5, 1999

> The Nagle algorithm has become hopelessly obsolete.
> We should get rid of it.
> At the least it should be off by default.

No, no and no.

There are only two scenarios to look at when discussing Nagle. If your
application is *ALWAYS* sending small blocks of data in a chatty
exchange with the remote system, then you probably ought to turn it off
for your connection, since queuing won't help (no more data is coming).

But if you are going to be exchanging more data than will fit within a
single frame -- regardless of how much data you are writing to the
network -- you should leave Nagle on.

Most applications (mail, database, web, etc.) fall into the latter camp,
constantly generating 2-200k size chunks of data. The problems start
when developers in the latter group only write data in 512 byte chunks
and "see" that disabling Nagle gives a big boost in speed, since the
data isn't being queued up. What they don't see is that they're
generating three times as many packets.

Big deal? Probably not for one or two apps. But turning it off by
default (your suggestion) would mean that all of the apps on the network
would start generating a significantly greater number of packets, which
would cumulatively cause problems on a good number of networks. Most
networks already have utilization problems on their backbones, and this
would only worsen the problem dramatically.


Rick Jones

May 6, 1999
Thomas R. Truscott (t...@cs.duke.edu) wrote:
: TCP_NODELAY is still necessary, as explained in this comment
: in http_main.c in the Apache HTTP Server:

: /* The Nagle algorithm says that we should delay sending partial
: * packets in hopes of getting more data. We don't want to do
: * this; we are not telnet. There are bad interactions between
: * persistent connections and Nagle's algorithm that have very severe
: * performance penalties. (Failing to disable Nagle is not much of a
: * problem with simple HTTP.)

As I recall, a number of the experiments with Apache were on stacks
that (IMO) mis-implemented Nagle as per-segment rather than
per-send. That indeed can have a really nasty effect on apps that
would otherwise not be affected - those already sending > MSS worth of
bytes. So, I'm not quite prepared to dismiss Nagle.

If the connection is persistent (not pipelined), and nagle is
implemented per-send (from the user) rather than per-segment, there is
no issue for an http server. Each send - the client's or the server's
is into an "idle" connection.

If the connection is pipelined, it starts to look more and more like a
bulk transfer, in which case we do want Nagle to buffer the small
sends. The client's first pipelined request (if written to the
connection "properly" as one send/writev/etc call) goes out onto an
idle connection. The other requests are now in a race between the
client generating them and the server responding to the first
request. If the client is first, the queued requests will wait for the
server's first response (which will ACK the client's first
request). If the server is first, we essentially start again from the
initial state and the client's next request goes into an idle
connection. The standalone ACK delays (which are the delays that
someone might notice with Nagle, not the RTTs - perhaps informed opinion
rather than observed fact) are what tempt people to dismiss or disable
Nagle.

So, I think that 99% of the "problems" with Nagle are problems of
implementation - both in the applications, and in some stacks.

rick jones

I suspect a good way to find such broken stacks (though I am not
certain) would be to run a netperf TCP_RR test between two machines
with request/response sizes of MSS+1 byte. If Nagle is implemented
"properly" the transaction rate will be decent. If it is implemented
improperly, there will be a transaction limit based on the standalone
ACK timers on either side. So, if one side is broken and the other has a
200 ms standalone ACK timer, the transaction rate reported by netperf
will be ~5 per second. If both sides are broken, it would be
~2.5. Adjust as per the settings on your stack(s).

Here is HP-UX 10.20 talking to HP-UX 11.00:

$ ./netperf -t TCP_RR -H lag -- -r 1461,1461
TCP REQUEST/RESPONSE TEST to lag
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

32768  32768  1461     1461    10.01    111.88
32768  32768

nice and zippy :)

D. J. Bernstein

May 6, 1999
Thomas R. Truscott <t...@cs.duke.edu> wrote:
> How long should "more" wait before displaying what it has collected?

Flush the output buffer immediately before calling read().

This rule, unlike stdio's crude ``flush at the end of lines if stdout is
a tty,'' works extremely well in practice. It almost never needs to be
supplemented by manual flushing, yet it rarely flushes so often as to
cause the noticeable packet overhead that motivated Nagle's algorithm.
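
A minimal sketch of that rule, assuming a hand-rolled output buffer rather
than stdio (the buffer size and function names are illustrative):

#include <string.h>
#include <unistd.h>

static char   outbuf[4096];
static size_t outlen;

/* Queue output; only write when the buffer fills. */
void queue_output(int fd, const char *data, size_t len)
{
    if (outlen + len > sizeof(outbuf)) {
        write(fd, outbuf, outlen);      /* real code would check for short writes */
        outlen = 0;
    }
    if (len >= sizeof(outbuf)) {        /* oversized chunk: send it directly */
        write(fd, data, len);
        return;
    }
    memcpy(outbuf + outlen, data, len);
    outlen += len;
}

/* The rule: flush whatever is pending immediately before blocking in read(). */
ssize_t read_input(int out_fd, int in_fd, char *buf, size_t len)
{
    if (outlen > 0) {
        write(out_fd, outbuf, outlen);
        outlen = 0;
    }
    return read(in_fd, buf, len);
}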

---Dan

Dmytro Myasnykov

May 6, 1999
I guess it has nothing to do with the size of the data. It is more connected
with the question: how often do you want to see new data on the other side?
For example, if I write measuring/controlling software, it is very important
to see data values every 50 ms (20 Hz). In this case I set Nagle "off". If I
write an email program, these small delays make no difference.

Dmitriy


Eric A. Hall wrote:

> > But if you are going to be exchanging more data than will fit within a
> > single frame -- regardless of how much data you are writing to the

Alun Jones

May 6, 1999
In article <7gqbfj$21bc...@news.io.com>, al...@texis.com (Alun Jones) wrote:
[Responding to Truscott]

> > * performance penalties. (Failing to disable Nagle is not much of a
> > * problem with simple HTTP.)
>
> And this is why there is a draft proposing a different coalescing algorithm
> that works better for such a common scenario, without dropping entirely the
> benefit provided by Nagle.

A few people have emailed me asking for details of this draft. It's
available at ftp://ftp.isi.edu/internet-drafts/draft-minshall-nagle-00.txt

I haven't given it more than a cursory glance, and as I've said before,
there's other people that have done more analysis and can comment far better
on it than I can.

Thomas R. Truscott

May 6, 1999
>> TCP_NODELAY is still necessary, as explained in this comment
>> in http_main.c in the Apache HTTP Server:
>> ...

>
> Okay - you've found _one_ protocol where the designers thought it necessary
> to disable Nagle, ...

The Samba designers also thought it necessary to disable Nagle,
as it is off by default in newer versions.

<http://us1.samba.org/samba/ftp/docs/textdocs/Speed.txt>
Many people report that adding "socket options = TCP_NODELAY"
doubles the read performance of a Samba drive.

Would you like more examples? Here are a few programs
running on machines where I happened to have an "enhanced netstat"
which reports the needed protocol flags info:

X Window protocol
NFS-over-TCP
Informix Database server
IBM DB2 database server

There were a dozen others, but were either vendor hackery
or were using port numbers that I couldn't easily decipher.

I can't resist mentioning one particular hack, though.
Hewlett-Packard (and they are probably not alone)
has disabled the Nagle algorithm in the telnet daemon.
A simple and safe way to fix the full-screen "Nagle glitch".
But isn't it delicious? Disabling the algorithm
in the very program for which the algorithm was invented!

Tom Truscott

Eric A. Hall

May 6, 1999, to Thomas R. Truscott

> <http://us1.samba.org/samba/ftp/docs/textdocs/Speed.txt>
> Many people report that adding "socket options = TCP_NODELAY"
> doubles the read performance of a Samba drive.

And doubles network utilization, too. If they were filling the queue
with big segments, then there would still be the fast performance, with
lower levels of utilization.

You should be mad they are doing this, rather than gloating that they
found a way to boost performance at the expense of other network apps.

> X Window protocol

X is a well-known exception to the rule, since mouse movements must not
be delayed. This is documented everywhere.

> NFS-over-TCP
> Informix Database server
> IBM DB2 database server

As I said in my earlier post:

Most applications (mail, database, web, etc.) fall into the latter camp,
constantly generating 2-200k size chunks of data. The problems start
when developers in the latter group only write data in 512 byte chunks
and "see" that disabling Nagle gives a big boost in speed, since the
data isn't being queued up. What they don't see is that they're
generating three times as many packets.

Writing bigger blocks would have the same result, with much less
utilization. I don't get why you're happy they do this.

Thomas R. Truscott

May 7, 1999
> ... Look around at the other TCP-based protocols, such as SMTP,
> FTP, NNTP, etc, which all benefit from the correct attention being paid to
> the Nagle algorithm, and none of which have yet been proposed as a reason to
> disable Nagle.

This is your list of protocols, and I was tempted to dismiss them
as examples where Nagle has little effect either way.
But I will jump out on a limb and claim that all three
that you mentioned are hurt by Nagle to some degree,
and would benefit if it were disabled.

I did not have time to study all of them
and so I only examined ftp, telnet's venerable cousin.
It is so old that it pre-dates the Nagle algorithm,
and indeed RFC 896 examines file transfer scenarios
in which the worst case penalty turns out to be a mere 1.6%.

I looked at current ftp implementations, and guess what?

In most current ftp implementations,
the minimum time to get or put a file is one Nagle.

A typical Nagle is 0.2 seconds,
but systems such as Solaris have a dynamic penalty
which in my tests was about 0.07 seconds.
I found this penalty on Windows NT 4.0, HP-UX,
Sun Solaris, IBM AIX, Compaq Tru64 (Digital) Unix, and SGI IRIX.

Linux and UnixWare 7 both seem to avoid the problem;
I do not know why, and prefer not to speculate here.


A DEMO OF THE FTP GLITCH
(This is for Unix, but works on NT too if you have ksh :-)
# create a directory with 100 small files
mkdir foo
cd foo
for i in 0 1 2 3 4 5 6 7 8 9
do
    for j in 0 1 2 3 4 5 6 7 8 9
    do
        touch $i$j
    done
done

# switch to an empty directory
cd ..
mkdir bar
cd bar
# I think loopback shows the problem,
# but to be safe you could do this from another box.
ftp <yourself>
cd foo
binary
prompt
mget *

The mget * will suffer 100 Nagles, one per file.
This typically means a little over 20 seconds.

On Linux the mget * takes about 3 seconds.
Of course, zero-length files are silly,
so I created 100 copies of /etc/passwd (888799 bytes)
in a directory on HP-UX, and ran ftp on Linux
to download them. The mget * took about 15 seconds.
I think this demonstrates that the Nagle algorithm
results in an ftp penalty larger than 1.6%.


THE CAUSE OF THE FTP GLITCH
The glitch is in ftpd, and is quite simple.
When the connection is opened, ftpd reports
150 Opening ... data connection ...
When the transfer is finished ftpd reports
226 Transfer complete.
These reports are on the same socket and so
the second message, and hence the file transfer,
can be delayed by as much as a Nagle.


THE SIGNIFICANCE OF THE FTP GLITCH
The ftp protocol is one of the oldest and most heavily
used in the world, transferring billions of files every year.
Yet many (perhaps most) of those transfers suffer a Nagle.
Do the math, this is not a trivial penalty.

How could we have let this happen?
Are we so rooted in the past that we overlook the present?

Tom Truscott

Rick Jones

May 7, 1999
: > Okay - you've found _one_ protocol where the designers thought it
: > necessary to disable Nagle, ...

: The Samba designers also thought it necessary to disable Nagle,
: as it is off by default in newer versions.

: <http://us1.samba.org/samba/ftp/docs/textdocs/Speed.txt>
: Many people report that adding "socket options = TCP_NODELAY"
: doubles the read performance of a Samba drive.

I've been meaning to look into why that is the case, but I've not had
the time. I am suspicious that the reason might be that the typical SMB
block size is not an integral multiple of the MSS, and much
of the running of Samba has been on stacks that interpret Nagle
per-segment rather than per-send.

: Would you like more examples? Here are a few programs running on
: machines where I happened to have an "enhanced netstat" which
: reports the needed protocol flags info:

: X Window protocol
: NFS-over-TCP
: Informix Database server
: IBM DB2 database server

I do not think that anyone disputes that disabling Nagle _can_
speed up some apps. The disagreement seems to be more that some of us
are saying it implies the app is written poorly, and others are saying
that it means Nagle is unnecessary.

As is probably clear, I'm in the camp that would feel that simply
because an app might run faster with TCP_NODELAY does not mean that
was the right way to go about coding the app.

I think that somewhere in here, or in another group, there was a
discussion with someone saying how Nagle had to be disabled for a
database because SQL queries were configured to send 512 bytes of data
at a time. It was stated that waiting for more data was impractical
because it is not known in advance how much data would be forthcoming.

I believe the gist of the response was to say that using the
connection's MSS as the buffering size would be a big improvement,
and something larger than the MSS better still. If the query
finishes before filling that buffer, the completion of the query is
the implicit/explicit signal that there is no more data coming, so
write what residue exists.
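
A minimal sketch of that buffering approach (the names, buffer size, and the
assumption of small result chunks are illustrative, not any particular
database's code):

#include <string.h>
#include <unistd.h>

#define SEND_BUF 8192                   /* comfortably larger than a typical MSS */

struct result_sender {
    int    sock;
    size_t used;
    char   buf[SEND_BUF];
};

static void flush_results(struct result_sender *s)
{
    size_t off = 0;
    while (off < s->used) {
        ssize_t n = write(s->sock, s->buf + off, s->used - off);
        if (n <= 0)
            break;                      /* real code would handle errors properly */
        off += (size_t)n;
    }
    s->used = 0;
}

/* Called for each small (e.g. 512-byte) chunk of query results; assumes each
 * chunk is smaller than the buffer. */
void queue_result(struct result_sender *s, const char *data, size_t len)
{
    if (s->used + len > sizeof(s->buf))
        flush_results(s);               /* one big send instead of many tiny ones */
    memcpy(s->buf + s->used, data, len);
    s->used += len;
}

/* Completion of the query is the implicit signal that no more data is coming. */
void query_complete(struct result_sender *s)
{
    flush_results(s);
}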

I've seen some of the "networking code" written by database
vendors. At least when they started, they knew *far* more about
writing databases than networked applications.

: There were a dozen others, but were either vendor hackery
: or were using port numbers that I couldn't easily decipher.

: I can't resist mentioning one particular hack, though.
: Hewlett-Packard (and they are probably not alone) has disabled the
: Nagle algorithm in the telnet daemon. A simple and safe way to fix
: the full-screen "Nagle glitch". But isn't it delicious? Disabling
: the algorithm in the very program for which the algorithm was
: invented!

Simple yes. Safe, I'm not sure I would agree...

It is a perfectly reasonable thing to point out. There are places
where that disabling of the Nagle Algorithm is causing the very
problems that led to its invention. (I am in email contact with the
HP-UX telnet project...sadly, not early enough to prevent this from
happening. They have been asking about getting rid of the TCP_NODELAY
setting in the telnetd).

I wonder how many telnet clients simply accept the ever increasing
default socket buffer/window sizes even though a 32768 byte window is
grossly (?) oversized for telnet and probably precludes window-updates
that would piggy-back ACKs which would resolve issues with
Nagle... Customer rings-up the system vendor (it couldn't _possibly_
be a problem with their telnet client or network...) and complains
that telnet is too slow, thus putting non-trivial pressure on the
server vendor to employ a gross kludge to give the customer the
appearance of improvement. Then, that same customer tries to squeeze a
zillion telnet sessions through a fractional T1...

2048 bytes is probably quite enough window for telnet.

rick jones

go to root cause, don't treat the symptom...

Rick Jones

May 7, 1999
Thomas R. Truscott (t...@cs.duke.edu) wrote:
: THE CAUSE OF THE FTP GLITCH

: The glitch is in ftpd, and is quite simple.
: When the connection is opened, ftpd reports
: 150 Opening ... data connection ...
: When the transfer is finished ftpd reports
: 226 Transfer complete.
: These reports are on the same socket and so
: the second message, and hence the file transfer,
: can be delayed by as much as a Nagle.

: THE SIGNIFICANCE OF THE FTP GLITCH
: The ftp protocol is one of the oldest and most heavily used in the world,
: transferring billions of files every year. Yet many (perhaps most)
: of those transfers suffer a Nagle. Do the math, this is not a
: trivial penalty.

: How could we have let this happen?
: Are we so rooted in the past that we overlook the present?

Perhaps because many (most?) of those billions of files transferred
every year take > 200 milliseconds to transfer, so the standalone ACK
timer on the ftp control connection expires before the transfer is
complete on the data connection; ACK'ing the 150 message and thus the
226 message goes into an idle connection and never knows Nagle
existed.

rick jones

Packet traces are always good.

Rick Jones

May 7, 1999
A couple of toss-up questions to ponder when deciding whether or not
Nagle should be disabled by default:

*) should stdio be unbuffered by default?

*) should every filesystem write made by an application be sent
straight to disc without trying to fill a complete filesystem
block?

rick jones

Mark Summerfield

May 7, 1999
Ultimately we are going to need policing on IP backbones to protect
us from cowboys like Tom!

I'm just kidding -- I don't mean any offense by that epithet -- but it
does seem to me that the whole debate is the result of people having
completely different views of what's important. On the one hand, the
pro-Nagle lobby make the very valid point that Nagle is designed to
avoid the kind of congestion problems that can occur if large amounts
of data are sent in lots of small packets instead of a smaller number of
large packets. The anti-Nagle lobby contends that in practice there
are many situations in which certain applications inherently generate
small transactions over TCP which are going to get sent in small packets
anyway, whether they go immediately or after a "Nagle-delay" and which
are therefore taking a pointless performance hit.

The solution is *not* to have Nagle disabled by default, because then
many badly-designed or implemented applications may just congest the
network, and make life more miserable for the rest of us. A well-
designed Internet application is bandwidth-efficient regardless of
whether Nagle is used or not. Since anyone can hack a TCP/IP stack to
try to get better performance at the expense of others, what we really need
to do is penalise people for being bad net.citizens. Anyone who sends
many small packets in rapid succession should be sent a FIN by the first
(closest) router that detects it, which should then block all further
packets on that connection. The programmer faced with having his or
her connections cut off this way would have two choices:

1) (the default) simply enable the Nagle algorithm. There may be a
performance penalty, but at least the connection won't be cut off!
2) redesign and/or recode the application to be more net.friendly.

Applications (like telnet) which genuinely require TCP_NODELAY in
order to maintain interactive responsiveness will be unaffected because
although they generate small packets, they don't do so consistently in
rapid succession. Many of the other applications which Tom has
suggested benefit from disabling Nagle (SMB, X Win protocol, NFS over
TCP etc.) would also be safe, because they are primarily used within
localised subnets, where there need not be policing routers in place.

The real answer is to have proper quality-of-service provisions for
TCP/IP, which hopefully the new generation of Internet Protocols will
eventually give us. Then it won't just be a question of "to Nagle or
not to Nagle" (which will never have a definitive "right" answer).
And programmers will *have* to design and implement their applications
carefully, because performance will depend upon matching the requirements
of the application to the quality-of-service parameters requested when
connections are established.

Of course, this opinion is worth exactly what you paid for it (or less,
if you have an expensive ISP ;-)

Mark

Dmytro Myasnykov

May 7, 1999
Hi
Under some circumstances: YES. In other cases: NO.
Why does everybody think only "yes" OR "no"? I say: "yes" and "no".
It depends heavily on the application, the system, and the target.
Dmitriy

John Hascall

unread,
May 7, 1999, 3:00:00 AM5/7/99
to
Rick Jones <f...@bar.baz> wrote:
}Thomas R. Truscott (t...@cs.duke.edu) wrote:
}: THE CAUSE OF THE FTP GLITCH
}: The glitch is in ftpd, and is quite simple.
}: When the connection is opened, ftpd reports
}: 150 Opening ... data connection ...
}: When the transfer is finished ftpd reports
}: 226 Transfer complete.
}: These reports are on the same socket and so
}: the second message, and hence the file transfer,
}: can be delayed by as much as a Nagle.

...

}Perhaps because many (most?) of those billions of files transfered
}every year take > 200 milliseconds to transfer, so the standalone ACK
}timer on the ftp control connection expires before the transfer is
}complete on the data connection; ACK'ing the 150 message and thus the
}226 message goes into an idle connection and never knows Nagle
}existed.

As network & disk speeds increase you can transfer
more and more in 200 msec. (On a 100Mb/s network,
in theory, up to 2.5 MB).

What is the distribution of file sizes transferred
by FTP? I'm guessing, a significant portion may
well be smaller than that.

Perhaps the real problem is 200 msec is now too long
for modern networks/disks/cpus.

John
--
John Hascall, Software Engr. Shut up, be happy. The conveniences you
ISU Computation Center demanded are now mandatory. -Jello Biafra
mailto:jo...@iastate.edu
http://www.cc.iastate.edu/staff/systems/john/index.html <=- the usual crud

Barry Margolin

unread,
May 7, 1999, 3:00:00 AM5/7/99
to
In article <37327D41...@ee.mu.oz.au>,

Mark Summerfield <m.summ...@ee.mu.oz.au> wrote:
>Ultimately we are going to need policing on IP backbones to protect
>us from cowboys like Tom!

I've been getting a weird kind of deja vu from this thread. Does anyone
remember a couple of years ago when someone posted an advertisement to lots
of comp.protocols.* groups for his company's "improved" TCP/IP stack,
boasting significant performance improvements? IIRC, it was just a TCP
implementation that ignored most of the rules for congestion avoidance and
recovery. Everyone in this group deputized themselves as Internet
policemen and pointed out his folly. Yes, his stack would improve
performance, as long as its users were in the minority; if his design
spread widely, congestion would increase and everyone (including his users)
would suffer (although I suppose users of traditional TCP implementations
would suffer more).

--
Barry Margolin, bar...@bbnplanet.com
GTE Internetworking, Powered by BBN, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Rick Jones

unread,
May 7, 1999, 3:00:00 AM5/7/99
to
John Hascall (jo...@iastate.edu) wrote:
: As network & disk speeds increase you can transfer more and more
: in 200 msec. (On a 100Mb/s network, in theory, upto 2.5MB).

: What is the distribution of file sizes transfered by FTP? I'm
: guessing, a significant portion may well be smaller than that.

Isn't it more important to know whether or not those "< 200 ms" files
are being transferred in an mget with prompts disabled or some other
mode without human intervention?

: Perhaps the real problem is 200 msec is now too long for modern
: networks/disks/cpus.

All we need for that discussion is a distribution of ACK times
(skews?) with a very large fraction below the proposed new standalone
ACKtimer, and a big spike out by the current standalone ACK
timer. And, of course, knowledge that all those standalone ACK's are
not the result of a poorly written application :) And whether or not
those standalone ACK's were in the time critical path of the
application... And... I'm sure there are lots of other questions
involved. It would make for an interesting discussion.

Rick Jones

unread,
May 7, 1999, 3:00:00 AM5/7/99
to
bandwidth controls/limitations - outside of the control of the
application itself - would indeed provide an impetus to application
designers to make sure their apps were as network efficient as
possible - if and only if those bandwidth limits *included* the
protocol headers...

Alun Jones

unread,
May 7, 1999, 3:00:00 AM5/7/99
to
In article <bMCY2.493$jw4.35822@burlma1-snr2>, Barry Margolin <bar...@bbnplanet.com> wrote:
> I've been getting a weird kind of deja vu from this thread. Does anyone
> remember a couple of years ago when someone posted an advertisement to lots
> of comp.protocols.* groups for his company's "improved" TCP/IP stack,
> boasting significant performance improvements? IIRC, it was just a TCP
> implementation that ignored most of the rules for congestion avoidance and
> recovery. Everyone in this group deputized themselves as Internet
> policemen and pointed out his folly. Yes, his stack would improve
> performance, as long as its users were in the minority; if his design
> spread widely, congestion would increase and everyone (including his users)
> would suffer (although I suppose users of traditional TCP implementations
> would suffer more).

As I believe I noted, this is a common trend with newer computer programmers
(and I just mean those new to the game - we had several of them years ago,
as well), who are of the opinion that _their_ app, _their_ machine, etc, is
_the_ most important of all. The same kind of people that wrote Windows 3.1
spreadsheet calculation routines to not yield to the OS; that grabbed 'real'
memory in Win3.1 to ensure their app wasn't slowed down by being swapped to
disk; that place system modal dialog boxes for mail notifications; that
cause Internet Explorer to repeatedly force itself to the front of the
Z-order; run a tight loop around a select call with a zero timeout; etc,
etc, etc.

Sadly these kind of programmers now seem to be arriving faster than the 'old
hands' can mentor them out of this behaviour.

jerry_f...@my-dejanews.com

unread,
May 7, 1999, 3:00:00 AM5/7/99
to
In article <7gvbot$k0mc...@news.io.com>,

al...@texis.com (Alun Jones) wrote:
> In article <bMCY2.493$jw4.35822@burlma1-snr2>, Barry Margolin
<bar...@bbnplanet.com> wrote:

>
> As I believe I noted, this is a common trend with newer computer programmers
> (and I just mean those new to the game - we had several of them years ago,
> as well), who are of the opinion that _their_ app, _their_ machine, etc, is
> _the_ most important of all. The same kind of people that wrote Windows 3.1
> spreadsheet calculation routines to not yield to the OS; that grabbed 'real'
> memory in Win3.1 to ensure their app wasn't slowed down by being swapped to
> disk; that place system modal dialog boxes for mail notifications; that
> cause Internet Explorer to repeatedly force itself to the front of the
> Z-order; run a tight loop around a select call with a zero timeout; etc,
> etc, etc.
>
> Sadly these kind of programmers now seem to be arriving faster than the 'old
> hands' can mentor them out of this behaviour.
>
> Alun.
> ~~~~
>

This is becoming off topic but what the hell - you are just describing young
programmers - there will always be some, and there will always be older
programmers to shake their heads wearily, muttering "When I was a {boy|
whippersnapper|beginner} we {did|didn't|wouldn't dream of}
{doing|programming|ignoring} that". It is the function and duty of the older
programmers to mentor the younger ones. If the younger ones seem to have a
weaker grasp of the underlying resources they are exploiting, it's probably a
result of the current programming environments (Windows etc.) which alienate
the programmer from the hardware to a much greater extent than was normal when
we were beginners. The fact that industry markets programs written by such
programmers is witness to the fact that programmers (unlike system analysts,
software architects, and IT consultants) are an increasingly rare resource, and
companies have to make do with what they've got.

Jerry

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own

Alun Jones

unread,
May 7, 1999, 3:00:00 AM5/7/99
to
In article <7gvif2$tlq$1...@nnrp1.deja.com>, jerry_f...@my-dejanews.com wrote:
> This is becoming off topic but what the hell - you are just describing young
> programmers - there will alway be some and there will always be older
> programmers to shake their heads wearily muttering that " When I was a {boy|
> whippernapper |beginner} we {did|didn't|wouldn't dream of}
> {doing|programming|ignoring} that".

Sadly too many young programmers are aging without turning into older
programmers :-)

Why, if older programmers were truly in charge, we'd never have this Y2K
problem now, would we? :-)

Thomas R. Truscott

unread,
May 7, 1999, 3:00:00 AM5/7/99
to
> A couple of toss-up questions to ponder when deciding whether or not
> Nagle should be disabled by default:
>
> *) should stdio be unbuffered by default?

A fascinating question. The analogy between Stdio and TCP is a complex one.
I will try to develop this analogy by making some changes
to the usual implementation of both buffered and unbuffered stdio.
And present it by telling the story of a young programmer.

ACT ONE. BUFFERED STDIO.

Joe was a new hire at XYZ Corp., fresh from college.
His first assignment was to debug a "legacy code"
that was needed for the "WD-40" project.
The code used to work, but then a team of XYZ programmers
determined that the use of raw display terminal I/O calls
was hurting performance.
So they developed a new protocol, "Standard I/O".
Operating system kernel support was added for this protocol,
resulting in a very fast I/O path from the code
to the display terminal. Analysis showed a potential 25% speedup.
Standard I/O would also be used for accessing disk files,
but alas would be no faster than the old raw disk I/O calls.
Before the legacy code was fully revised, all the team members quit
to form a Stdio Society which promoted their new invention.
Joe was to complete the task.

Joe fired up WD-40 and all it did was print a single letter:
B
He pondered that for a while. Then more characters appeared:
Beginning Computation.
Aha! Joe guessed that this was due to the two-minute
Standard Buffer Timeout (SBT), an innovative technology used
to avoid transmitting less-than-full buffers.
Joe knew from his coursework that some protocols
used "flush" technology, and asked colleagues if Stdio did too.
They winced at him as if he had made a crude joke,
and explained that if Joe sent a message which was
an exact multiple of the Standard Buffer Size (SBS)
then he would get the behavior he wanted.
They also explained that the Standard SBS was 576 bytes.
So Joe created a fancier message of that length,
and the program was off and running!

Unfortunately the very next message had the same problem.
He padded that one out too, but it still displayed strangely.
Then a colleague pointed out that Stdio has some overhead bytes
that must be accounted for in the SBS.
"How much overhead?", Joe asked. "It depends,", they replied.
Joe did some experimentation to figure all this out,
and with the adjusted padding both messages displayed properly.
But it did not work properly on newer display terminals.
The colleagues (somewhat annoyed by now) explained
that the SBS differs from one display terminal to another,
and it was up to the programmer to figure out the value.

Joe decided to get rid of all the padding
and let the users cope with the unusual display behavior.
Joe knew that padding hurt performance,
and performance was what the users wanted.
But there were other problems.
The old legacy code took two hours to run,
and the new code took three days!
And the printer output feature did not work at all.

Joe's boss was beginning to doubt his hiring decision,
and suggested that Joe take one of the new three-month
courses being offered by the Stdio Society.
But during a heroic weekend Joe debugged the "three day" problem.
The Standard Buffers in the kernel were a precious commodity,
and to avoid having to allocate so many
the kernel would delay a new request for one if the
code already had a partially-full one in use.
The delay could be as long as one SBT,
followed by buffer flush and switch.

On Monday Joe had a heated argument with colleagues.
"Why can't the kernel just allocate another buffer?
It is only 576 bytes?!", Joe asked.
"Hey, we have a hundred display terminals,
that buffer space adds up!" the colleagues replied.
Frustrated, Joe spent many hours rearranging the display terminal
output messages so they were grouped together.
This reduced the number of times an SBT would occur,
and reduced the run time to six hours.

An old hand at XYZ was able to deduce the printer problem.
"Probably the old buffer/spork problem.", she said.
She patiently explained to Joe that the legacy code
probably had an unflushed buffer when the Hot-Print 800
library routines were called, and Hot-Print was
probably using the spork kernel service.
This was a Stdio protocol violation, and a serious one at that.

Joe added a two-minute pause to the code just before
it did printer output, and the users revolted.
Joe was given an ultimatum: Get the printer working
without the pause, and get the time down to three hours.
Or get another job.

Joe went off to a local bar to examine his shattered career.
A high school student in a trenchcoat came over to Joe,
and whispered in his ear that in exchange for the WD-40 formula
he would reveal the dark secret of "Unbuffered Stdio".


ACT TWO. UNBUFFERED AND UNLEASHED.

Joe returned to work, hopeful but disappointed that
the secret of Unbuffered Stdio was to call "setbuf".
He had heard of the evil setbuf, of course,
and there were rumors that using it was a firable offense.
So he hid it well, leaving no comment or other trace of its presence.
With trembling hand he started the legacy code. And it worked.

And it was as slow as a dog!!
The "Beginning Computation" message crawled out painfully.
Then Joe remembered that message printing was done with
a loop of "putchar()", since everyone knew that was best.
On a lark he changed the loop into a single call to printf().
And with this change the message flashed onto the screen.
Not just the usual 'B', but the entire message!
It occurred to him that perhaps printf() was
using a sophisticated "local buffering technique."

Joe quickly replaced the other slow messages.
One slow spot with many "if"s and "else"s and "printf"s
was not so easy. So Joe took a deep breath and tried
writing a local buffering technique himself.
And it turned out not to be so difficult.
Besides fixing a few bugs that he found along the way,
he discovered that since the text was now collected together
it could be centered properly on the page.
The users had been requesting that for years.

The new run time was two hours and five minutes.
He could hardly believe it.
He tried the printer and that worked too.

Joe pondered his accomplishment.
Was setbuf as evil as people said it was?
Was there something evil about the new legacy code?
Joe tried to analyze the run time,
and concluded that even "perfect buffering"
would reduce the run time by only five minutes.
That just didn't seem like a big deal.
Was it hurting the operating system?
It was called more often, but only a little more often.
He thought about doing more "local buffering" to save more time,
but it seemed like misdirected effort.
The users were complaining that the program was giving
the wrong answers. Wasn't that more important?

Joe started looking at the rest of the legacy code.
He tried out a "Good C Compiler" on it,
and was amazed at all the problems GCC reported.
He fixed three uninitialized variables in one routine alone.
He found some code that was copying file data but never using it.
He deleted those lines, and the run time dropped to one hour.
And five minutes. It gnawed at him.

The Stdio Society was a big success.
File I/O was trendy, and companies fought
each other to obtain popular file names.
There was a setback when a NASA y2k fix added two bytes to a message,
and the Shuttle's landing gear deployed two minutes late.
But the Stdio Society recovered by issuing a preemptive
report expressing deep regret for the tragedy
that was caused by NASA's incompetent QA department.
And it announced a new task force to review Stdio change proposals,
the most promising being a reduction of the SBT to one minute.

Ironically, Joe was losing all interest in Stdio at the same time
that people at XYZ Corp were asking him for help with it.
He had become an unwilling expert on Stdio.
He knew how to use the Stdio debugger to ferret out
the deep secrets in programs,
and was startled to learn that even some of the
most vocal opponents of setbuf were using it.

Joe pondered this. Could Stdio buffering actually be evil?
That just didn't seem right.
Perhaps the Stdio team had quit too soon,
and everything could be made fine
if only they would stop pretending that it already was.

Tom Truscott

Rick Jones

unread,
May 8, 1999, 3:00:00 AM5/8/99
to
Thomas R. Truscott (t...@cs.duke.edu) wrote:
: ACT ONE. BUFFERED STDIO.

I think you missed your calling as a satirist :)

D. J. Bernstein

unread,
May 8, 1999, 3:00:00 AM5/8/99
to
Eric A. Hall <eh...@ehsco.com> wrote:
> Most networks already have utilization problems on their backbones,
> and this would only worsen the problem dramatically.

``Dramatically''? How dramatically?

How much bandwidth are they using now? Exactly what percentage was saved
by Nagle's algorithm? Where are your measurements?

In all the bandwidth-saving examples that Nagle fans typically cite, one
can save _even more_ bandwidth by delaying the first packet too. Exactly
how much more bandwidth could these networks have saved that way?

Some web servers generate small packets too slowly for Nagle's algorithm
to kick in. In these cases, one can save _even more_ bandwidth by adding
buffering beyond the RTT---let's say to 10xRTT. Exactly how much more
bandwidth could these networks have saved that way?

---Dan

Eric A. Hall

unread,
May 8, 1999, 3:00:00 AM5/8/99
to D. J. Bernstein

> ``Dramatically''? How dramatically?

App writes data in 512 byte blocks. Network MSS is 1460 bytes. When
Nagle is on, writes are delayed until 1460 bytes are sent. When Nagle is
disabled, 512 byte segments are sent. For an exchange of 200 KB, Nagle
on is 141 segments, while Nagle off is 400 segments. If this were the
default behavior, there would indeed be a dramatic penalty.
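
(For anyone who wants to check the arithmetic, here is a rough sketch in C;
the 200 KB transfer, the 512-byte writes, and the 1460-byte MSS are simply
the figures from the example above, and ACK traffic is ignored.)

#include <stdio.h>

int main(void)
{
    long total = 200L * 1024;   /* 204800 bytes, as in the example      */
    int  mss   = 1460;          /* network MSS from the example         */
    int  wsize = 512;           /* application write size               */

    long coalesced = (total + mss - 1) / mss;      /* Nagle on:  141    */
    long per_write = (total + wsize - 1) / wsize;  /* Nagle off: 400    */

    printf("Nagle on:  %ld segments\n", coalesced);
    printf("Nagle off: %ld segments\n", per_write);
    return 0;
}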

Thomas R. Truscott

unread,
May 8, 1999, 3:00:00 AM5/8/99
to
> App writes data in 512 byte blocks. Network MSS is 1460 bytes. When
> Nagle is on, writes are delayed until 1460 bytes are sent. When Nagle is
> disabled, 512 byte segments are sent. For an exchange of 200 KB, Nagle
> on is 141 segments, while Nagle off is 400 segments. If this were the
> default behavior, there would indeed be a dramatic penalty.

By "an exchange" do you mean one side does 400 512 byte "send()"s,
then the other side ACKs that?

I tried that and found the exchange rate was one per Nagle.
(But I must admit the data was erratic,
sometimes the average exchange rate was about half that.
Is there some reason that an exchange might take two Nagles?)

Then I tried adding TCP_NODELAY (uh, I took the "#if 0" back out)
and the data was much cleaner, just about two exchanges per Nagle.
That is only 2x to 4x faster, and I would hardly call it dramatic.

Do you mean that with TCP_NODELAY the network suffers a penalty
because the exchange rate is higher?
If your network doesn't like that much traffic,
you could ask your users not to exchange so much data.
Or maybe you should consider a different network vendor?

I am not quite sure what you mean by "MSS".
Isn't the network supposed to take care of that?
I thought there was some protocol where the network
sent the data as fast as it could, until it had
to wait for "ACKs" (hey, thats what they call it).
Then the out-going data starts backing up.
But finally the ACKs arrive and things smooth out.

As a programmer, I recommend you ignore the "MSS",
and just concentrate on your program.
It is doing 512 byte send()s, which is a strange number.
Wouldn't 1024 be more natural and simplify the arithmetic?
I personally use much larger numbers, but to each his own.

In the future perhaps you could post your timing results.
Maybe yours are different, or just possibly you have a bug.
(My apologies, your programs probably do not have bugs,
but I confess that I find bugs in my programs all the time!
That is why I like to keep them simple.)

I hope this is of some help to you with your problem,
Tom Truscott

Eric A. Hall

unread,
May 8, 1999, 3:00:00 AM5/8/99
to Thomas R. Truscott

> By "an exchange" do you mean one side does 400 512 byte "send()"s,
> then the other side ACKs that?

TCP's delayed acknowledgment strategy requires an ACK for every second
full-sized segment. Depending on a bunch of stuff, this may mean an ACK
for every sixth segment (512*6=3072, which is > 1460*2=2920).

> If your network doesn't like that much traffic,
> you could ask your users not to exchange so much data.
> Our maybe you should consider a different network vendor?

Better to just keep Nagle on by default so that I don't have these
problems in the first place.

It was your proposal of turning Nagle off by default for everything that
started this thread. I'm just trying to show you why it's a bad idea and
why nobody does it. You can either accept it or not. It really doesn't
matter to me either way.

> I am not quite sure what you mean by "MSS".
> Isn't the network supposed to take care of that?
> I thought there was some protocol where the network
> sent the data as fast as it could, until it had
> to wait for "ACKs" (hey, thats what they call it).
> Then the out-going data starts backing up.
> But finally the ACKs arrive and things smooth out.

Sliding window. If you don't know that then you probably shouldn't be
writing TCP apps.

Good day sir.

D. J. Bernstein

unread,
May 8, 1999, 3:00:00 AM5/8/99
to
Hall claims that Nagle's algorithm provides ``dramatic'' bandwidth
savings on ``most networks.''

I asked for measurements---which, naturally, Hall doesn't have. Hall
responded with a hypothetical example of writing 204800 bytes of data
``in 512 byte blocks.'' Again he claimed ``dramatic'' savings.

Let's pretend for the sake of argument that most Internet traffic
consists of 200K writes. Here's the actual network use measured in
several experiments with different write() sizes:

| Nagle | TCP_NODELAY
-------------------+-------+------------
512-byte buffer | 213K | 220K
8192-byte buffer | 211K | 212K
204800-byte buffer | 208K | 208K

Hall claims that 213K is a ``dramatic'' improvement over 220K, and that
this ``dramatic'' benefit for common 512-byte ``apps'' is saving ``most
networks'' from ``utilization problems.''

Why, then, is he not screaming that 208K is an even more ``dramatic''
improvement? This doesn't require any changes to his 512-byte ``apps'';
the networking stack can automatically collect data into 65536-byte
buffers, and defer each write for a few microseconds just in case
there's more data. This could save the entire Internet from destruction!

---Dan

Eric A. Hall

unread,
May 8, 1999, 3:00:00 AM5/8/99
to D. J. Bernstein

> | Nagle | TCP_NODELAY
> -------------------+-------+------------
> 512-byte buffer | 213K | 220K
> 8192-byte buffer | 211K | 212K
> 204800-byte buffer | 208K | 208K

Congratulations on finally seeing the light. As you point out, sending
small blocks as discrete units is a fruitless waste of bandwidth.

> Hall claims that 213K is a ``dramatic'' improvement over 220K, and
> that this ``dramatic'' benefit for common 512-byte ``apps'' is saving
> ``most networks'' from ``utilization problems.''

Actually, I was talking about frame rates. 400 segments is substantially
higher than 141 segments, even before you start looking at the effects
of this when all of the nodes on your network do it with everything all
day long. As I am sure someone as esteemed as yourself already knows, a
large number of small packets will cause more contention problems than a
medium number of large packets, particularly on congested links. Then
again, perhaps you are one of the lucky few who actually has infinite
bandwidth and never has congestion at any point in the network.

These arguments have grown tiresome. Nagle is on by default because it
has clear benefits. If you want to turn it off on all your machines,
then knock yourself out, but I don't think you'll keep it off for long.

EOT

Eric A. Hall

unread,
May 8, 1999, 3:00:00 AM5/8/99
to Thomas R. Truscott, D. J. Bernstein

I realize now that I have been wrong the entire time. Thom and DJ are
absolutely correct; Nagle is not at all necessary now that we all have
infinite bandwidth, and it should be turned off on all of the machines.
In fact, here are some other things that can be done to increase
performance. I suggest you incorporate all of these changes on all of
your hosts simultaneously, for maximum effect.

NOTE: If you don't have a perfect network, do not do these things, as
they will surely fuck you up. These suggestions are only for Thom, DJ,
and whoever else might have a perfect network.

1) Disable slow-start. Back in the OLD DAYS, we used to worry about
network congestion causing problems with TCP. Now that this isn't
a problem any longer, you should just disable this unnecessary
feature and let the sender transmit all of the segments
immediately. This will boost your performance by at least 30%.

2) Set the default receive window to the maximum of 65k. Back in the
OLD DAYS, we used to worry about having to recover from lost
segments when the window was bigger than the pipe (retransmissions
used to get stuck behind the original packets on slow links), but
now that we all have perfect networks with no congestion or loss
and with equal forwarding rates, this concern is moot. Just set
the receive window on all your systems to 65k instead. When this
is combined with (1) above, your systems will send 65k of data
right off the bat, without even bothering to test the network's
capacity first. This will add another 15% boost.

NOTE: If possible, make use of the TCP Window Scale option and
set it the maximum of 1 gb. That will REALLY boost performance,
particularly with very large datasets!

3) Disable the use of delayed acknowledgements. Back in the OLD DAYS
we used to try to conserve overhead by cutting down on ACKs. But
since you have infinite processing power and bandwidth, you should
just go ahead and turn this feature off. All segments will get
acknowledged immediately, allowing the window to get adjusted
immediately as well.

4) Minimize the default retransmission timers to 5ms. Since you
never have any loss, you shouldn't have to worry about this, but
what the heck. Better safe than sorry right? Back in the OLD DAYS
we used to worry about detecting lost segments quickly, and a
medium-sized timer allowed this to work with different links. Now
that you've got blistering fast networks, you should just set the
default retransmission time down to 5ms or less. If you ever do
lose a packet, it will get noticed right away and a retransmission
will be sent immediately. This won't directly improve your
throughput, but it will make for faster recovery, particularly
when connecting to far-away sites.

There are other things you can do as well, but these are the easiest
(after disabling Nagle on a system-wide basis). Please let us know how
it all works out.

Cheers!

D. J. Bernstein

unread,
May 9, 1999, 3:00:00 AM5/9/99
to
Eric A. Hall <eh...@ehsco.com> wrote:
> 400 segments is substantially higher than 141 segments,

But what actually happens is more like 520 packets versus 320 packets---
and that's on top of 204800 bytes of actual data being transferred.

This is roughly 3% less traffic, in a hypothetical situation that you
deliberately selected to emphasize the ``dramatic'' benefits of Nagle's
algorithm, never mind how the network is used in the real world.

I explained a different buffering strategy that (without the latency
problems of Nagle's algorithm) would produce only 200 packets in the
same situation, saving an extra few percent of the traffic. Why aren't
you demanding that we all use _that_ strategy?

> To: "D. J. Bernstein" <d...@koobera.math.uic.edu>

I don't understand. After you've spent so much time wildly exaggerating
the importance of minor bandwidth hacks, how could you possibly be
sending people redundant copies of your postings? Have you forgotten
about the ``utilization problems'' on ``most networks''?

---Dan

Rick Jones

unread,
May 9, 1999, 3:00:00 AM5/9/99
to
I'm not sure if this will really help the discussion, but here are a
number of netperf TCP_STREAM tests between a pair of systems on a
100BT network.

First, a test sending 256 bytes at a time, without
TCP_NODELAY.

# ./netperf -v 2 -H ftpcli3 -c 1.79544e+08 -C 1.79544e+08 -- -m 256
TCP STREAM TEST to ftpcli3
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % I % I us/KB us/KB

32768 32768 256 10.01 70.01 100.00 47.36 117.008 55.412

Alignment Offset Bytes Bytes Sends Bytes Recvs
Local Remote Local Remote Xfered Per Per
Send Recv Send Recv Send (avg) Recv (avg)
8 8 0 0 8.757e+07 256.00 342089 1552.41 56412

Sends and receives here are at the application level. Netperf cannot
count TCP segments...

Next, that same test with TCP_NODELAY set:

# ./netperf -v 2 -H ftpcli3 -c 1.79544e+08 -C 1.79544e+08 -- -m 256 -D
TCP STREAM TEST to ftpcli3 : nodelay
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % I % I us/KB us/KB

32768 32768 256 10.01 43.33 100.00 97.70 189.046 184.703

Alignment Offset Bytes Bytes Sends Bytes Recvs
Local Remote Local Remote Xfered Per Per
Send Recv Send Recv Send (avg) Recv (avg)
8 8 0 0 5.422e+07 256.00 211808 980.13 55322

and then finally, a test with 1024-byte sends:

# ./netperf -v 2 -H ftpcli3 -c 1.79544e+08 -C 1.79544e+08 -- -m 1024
TCP STREAM TEST to ftpcli3
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % I % I us/KB us/KB

32768 32768 1024 10.01 92.79 41.47 34.40 36.610 30.370

Alignment Offset Bytes Bytes Sends Bytes Recvs
Local Remote Local Remote Xfered Per Per
Send Recv Send Recv Send (avg) Recv (avg)
8 8 0 0 1.161e+08 1024.00 113379 16125.01 7200

Dmytro Myasnykov

unread,
May 10, 1999, 3:00:00 AM5/10/99
to
Hi
I guess the key with the Nagle method is not buffer size. It is how fast you
send your data.
I have a data flow of 200 KB/s, and if I enable Nagle, the TCP/IP buffers
overflow and I can't deliver my data. Normally data comes to the TCP/IP layer
MSS-sized or even larger. The point is that if I send several TCP segments,
the peer accepts them all with one ACK (on the last data packet) if everything
is OK. If I have Nagle on, I see delayed writes, and in that case the peer has
to send more ACKs. Also, the TCP/IP stack needs about 512 KB of buffer memory
to process a 200 KB/s flow...

But when you do not have a real-time data stream - like email, or the ftp/ssh
protocols - it is more convenient and better to use Nagle.

And once more: a 64 KB buffer will not save the Internet - it will kill the
Internet :-)
That's why ATM technology goes to small packet sizes....

Do you think it is reasonable to do it? Maybe you know other solutions?
Until now, I haven't found anything better.

Dmitriy


D. J. Bernstein wrote:

> Hall claims that Nagle's algorithm provides ``dramatic'' bandwidth
> savings on ``most networks.''
>
> I asked for measurements---which, naturally, Hall doesn't have. Hall
> responded with a hypothetical example of writing 204800 bytes of data
> ``in 512 byte blocks.'' Again he claimed ``dramatic'' savings.
>
> Let's pretend for the sake of argument that most Internet traffic
> consists of 200K writes. Here's the actual network use measured in
> several experiments with different write() sizes:
>

Patrick McManus

unread,
May 10, 1999, 3:00:00 AM5/10/99
to
On 08 May 1999 16:59:11 PDT, Eric A. Hall wrote:
>
>I realize now that I have been wrong the entire time. Thom and DJ are
>absolutely correct; Nagle is not at all necessary now that we all have
>infinite bandwidth, and it should be turned off on all of the machines.
>In fact, here are some other things that can be done to increase
>performance. I suggest you incorporate all of these changes on all of
>your hosts simultaneously, for maximimum effect.

*giggle*.. kudos on a nice tongue-in-cheek piece.. there is one point
that's probably worthy of discussion though, even if just for my own
education.


[Eric sarcastically writes..]


> 2) Set the default receive window to the maximum of 65k. Back in the
> OLD DAYS, we used to worry about having to recover from lost
> segments when the window was bigger than the pipe (retransmissions
> used to get stuck behind the original packets on slow links)


Last year's SIGCOMM had a paper: "Automatic TCP Buffer Tuning" by
Jeffrey Semke, Jamshid Mahdavi, and Matthew Mathis
(http://www.psc.edu/networking/papers/auto_abstract.html) that I'm
pretty sure you've read. Among their conclusions is that there is no
real advantage to using complex algorithms to try and tune the receive
window to approximate 2*cwnd, as a max rwin is just as effective
applications that are not latency sensitive. And latency sensitive
applications are going to have to manually setsockopt() the buffer
anyhow to indicate just what their tolerance is (which would seem to
depend on what kind of data they are shuffling back and
forth).. networks exhibiting real loss characteristics are going to be
inhibited by cwnd far before this limit is hit anyhow, right?
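
(For what it's worth, that manual tuning is just a setsockopt() call made
before the connection is established - a minimal sketch, with the 64 KB
figure picked purely for illustration and error checking omitted.)

#include <sys/types.h>
#include <sys/socket.h>

/* Ask for an explicit receive buffer instead of the stack's default.
 * On most stacks this must happen before connect()/listen(), i.e.
 * before the window is first advertised. */
int set_rcvbuf(int s, int bytes)
{
    return setsockopt(s, SOL_SOCKET, SO_RCVBUF,
                      (char *)&bytes, sizeof(bytes));
}

/* e.g. set_rcvbuf(s, 64 * 1024); */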

-P

--
Patrick R. McManus - AppliedTheory Communications - Software Engineering
http://pat.appliedtheory.com/~mcmanus Lead Developer
mcm...@AppliedTheory.com 'Prince of Pollywood' Standards, today!
*** - You Kill Nostalgia, Xenophobic Fears. It's Now or Neverland. - ***

coo...@ix.netcom.com

unread,
May 10, 1999, 3:00:00 AM5/10/99
to
I think you are looking for a registry key/setting

Try the Windows NT FAQ at www.ntfaq.com

That will lead you to this Microsoft doc

http://support.microsoft.com/support/kb/articles/q120/6/42.asp

If it's not there, you'll probably have to call setsockopt(...) from your
own code.
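
Something along these lines should do it - just a sketch, assuming the
socket s already exists and skipping the error checks:

#include <winsock2.h>

/* Disable (or re-enable) the Nagle algorithm on a single socket.
 * Pass 1 to disable Nagle, 0 to turn it back on. */
int set_nodelay(SOCKET s, int on)
{
    return setsockopt(s, IPPROTO_TCP, TCP_NODELAY,
                      (const char *)&on, sizeof(on));
}

Remember this is per-socket, not system-wide.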

Marc


Dmytro Myasnykov <dim...@noir.crocodile.org> wrote in message
news:372EDDF9...@noir.crocodile.org...
> Hi
> What does it means MS/Wintel?
> Normally there is a function setsockopt() for socket or something like
this.
> You turn on option TCP_NODELAY and that's it - Nagle's algorithm is
> disabled.....
>
> Regards,
> Dmitriy
>
> Tony Moran wrote:
>
> > Hi could anyone point me to an executable for MS/Wintel machines to turn
> > off and on
> > Nagles Algorithm. Source code with it would be a major plus..
>
>
>

Thomas R. Truscott

unread,
May 10, 1999, 3:00:00 AM5/10/99
to
>> <http://us1.samba.org/samba/ftp/docs/textdocs/Speed.txt>
>> Many people report that adding "socket options = TCP_NODELAY"
>> doubles the read performance of a Samba drive.
>
> And doubles network utilization, too. ...
> You should be mad they are doing this, ....

I was hoping a Samba author would respond to this accusation.
Perhaps they do not read this newsgroup, or need encouragement.

The NODELAY option is wired into the Samba source code, right after
the comment "Dave XXX thinks we should default to TCP_NODELAY".
I would like for Dave to speak up on this subject.
It is up to you. I will not "out" you or anyone else.
But I feel like I am out here all alone, and would like company.

Dave, why didn't you update the documentation to explain this?
Surely you knew it was documented. The smb.conf.5 file
says TCP_NODELAY can cause Samba to fail completely,
and the Speed.txt file says that for users this is the
option that "seems to make the biggest single difference".
I agree, and if this newsgroup is any clue
then most of your users want it off.

Dave, why didn't you tell your users
how they can turn this option back off?

Tom Truscott

Eric A. Hall

unread,
May 10, 1999, 3:00:00 AM5/10/99
to mcm...@appliedtheory.com

> forth).. networks exhibiting real loss characteristics are going to be
> inhibited by cwnd far before this limit is hit anyhow, right?

Yes, that's an accurate assessment, but it is not a good justification
for increasing rwin to the max. Their theory is based on systems that
don't necessarily reflect the real world. For example, Windows will
allocate memory for the socket, so if you were to set the default to max
then you'd suck up a lot of memory for nothing.

Overall, I strongly recommend against that strategy, although they
obviously feel the opposite. Whatever.

Rick Jones

unread,
May 11, 1999, 3:00:00 AM5/11/99
to
Eric A. Hall (eh...@ehsco.com) wrote:
: For example, windows will allocate memory for the socket, so if you

A socket buffer in Windows is an allocation, not a limit?!? No wonder
so many people think that large socket buffers suck-down RAM. On HP-UX
at least, it is a limit, not an actual allocation.

Jon Snader

unread,
May 11, 1999, 3:00:00 AM5/11/99
to
Rick Jones wrote:
>
> Eric A. Hall (eh...@ehsco.com) wrote:
> : For example, windows will allocate memory for the socket, so if you
>
> A socket buffer in Windows is an allocation, not a limit?!? No wonder
> so many people think that large socket buffers suck-down RAM. On HP-UX
> at least, it is a limit, not an actual allocation.
>

Same with the traditional BSD stack.

Jon Snader

John Hascall

unread,
May 11, 1999, 3:00:00 AM5/11/99
to
Rick Jones <f...@bar.baz> wrote:
}Eric A. Hall (eh...@ehsco.com) wrote:
}: For example, windows will allocate memory for the socket, so if you
}A socket buffer in Windows is an allocation, not a limit?!? No wonder
}so many people think that large socket buffers suck-down RAM. On HP-UX
}at least, it is a limit, not an actual allocation.

So how does HP-UX protect against the:

Hey a packet, oh heck, I'm out of memory,
I guess my advertised window was just a lie,
sorry.

situation?

Thomas R. Truscott

unread,
May 11, 1999, 3:00:00 AM5/11/99
to
> }On HP-UX at least, it is a limit, not an actual allocation.
>
> So how does HP-UX protect against the:
>
> Hey a packet, oh heck, I'm out of memory,
> I guess my advertised window was just a lie,
> sorry.
>
> situation?

When the memory runs out the packet is dropped,
and the other end must re-transmit.
This is fine so long as the number of times
this happens is small (e.g. when compared to the number
of packets dropped due to checksum error).

As it happens HP-UX (at least up to 10.20) suffers from this
more than most because of ancillary quirks.
Under heavy paging conditions it runs way low on free memory
and the packets start dropping.
I think a lot of the problem is that whoever did the HP
disk buffer cache thought they should hoard the memory
and refuse to give it back quickly enough when memory runs low.
But even under the heaviest paging conditions
the penalty due to dropped TCP packets is trivial
compared to the Nagle penalty.
(Speaking of which, isn't buffer space off topic?)
Of course, what the users really notice
is the slowdown due to all the process thrashing.
Fortunately memory is cheap.
Escaping from the Nagle penalty box is not so simple.

As the bandwidth*delay goes up,
pre-allocating space seems increasingly dubious.
But that is just my opinion.
I just think that memory allocation should be balanced
in a global fashion, and since TCP can easily survive
brief memory outages we can exploit that
by employing the memory elsewhere as appropriate.

Tom Truscott

Rick Jones

unread,
May 11, 1999, 3:00:00 AM5/11/99
to
John Hascall (jo...@iastate.edu) wrote:
: Rick Jones <f...@bar.baz> wrote:
: } On HP-UX at least, it is a limit, not an actual allocation.

: So how does HP-UX protect against the:

: Hey a packet, oh heck, I'm out of memory,
: I guess my advertised window was just a lie,
: sorry.

: situation?

It is protected because the storage was allocated when the NIC driver
posted the buffer to the card for inbound DMA. That buffer goes up the
stack, and then gets queued to the socket. If there was no memory, the
driver would have no buffer for the NIC, which would not be able to
DMA the packet into the host and it would be treated just like any
other lost packet.

When I say protected, I mean in the sense that TCP will not ACK a
segment that will subsequently be dropped.

Uri Raz

unread,
May 13, 1999, 3:00:00 AM5/13/99
to
John Hascall (jo...@iastate.edu) wrote:
> Rick Jones <f...@bar.baz> wrote:
> }Thomas R. Truscott (t...@cs.duke.edu) wrote:
> }: THE CAUSE OF THE FTP GLITCH
> }: The glitch is in ftpd, and is quite simple.
> }: When the connection is opened, ftpd reports
> }: 150 Opening ... data connection ...
> }: When the transfer is finished ftpd reports
> }: 226 Transfer complete.
> }: These reports are on the same socket and so
> }: the second message, and hence the file transfer,
> }: can be delayed by as much as a Nagle.
>
> ...
>
> }Perhaps because many (most?) of those billions of files transfered
> }every year take > 200 milliseconds to transfer, so the standalone ACK
> }timer on the ftp control connection expires before the transfer is
> }complete on the data connection; ACK'ing the 150 message and thus the
> }226 message goes into an idle connection and never knows Nagle
> }existed.
>
> As network & disk speeds increase you can transfer
> more and more in 200 msec. (On a 100Mb/s network,
> in theory, upto 2.5MB).
>
> What is the distribution of file sizes transfered
> by FTP? I'm guessing, a significant portion may
> well be smaller than that.
>
> Perhaps the real problem is 200 msec is now too long
> for modern networks/disks/cpus.
>
Though LAN & WAN speeds have increased, I think you're too optimistic, as :

1. People don't usually run FTP on a LAN - NFS is the natural choice.

2. Even when there's a wide pipe between the client and the server,
it's usually shared. If someone downloads a file from tucows,
he has to share the bandwidth with many others and will not get
even 10% of the bandwidth.

3. There's still a huge number of users who dial up to the Internet,
and most of them have either an analog line (56Kbps or less) or
an ISDN line (64Kbps or 128Kbps)

The RTT from my PC connected to a POP at Haifa to the Technion's
hosts at Haifa rarely goes below 150ms. It was actually 6,000ms
early this week, to give an extreme example.

--
+---------+---------------------------+---------------------------------+
| Uri Raz | mailto:ur...@iil.intel.com | I speak for myself, not Intel. |
| Work is what employees do while managers lay down the work plans. ;-) |
| My home page <URL:http://www.private.org.il> |
+-----------------------------------------------------------------------+

Thomas R. Truscott

unread,
May 13, 1999, 3:00:00 AM5/13/99
to
The Nagle algorithm has a potential benefit (smaller packet count)
and a potential cost (larger packet latency),
and much of the debate has been which is larger and whether it matters.
I propose that someone conduct an experiment.
It would make a nice paper to present at a networking conference.

THE EXPERIMENT
Instrument the stack to measure Nagle effects on a per-socket basis.
[definition: append == a user does a socket "send" of outgoing data]

Whenever a packet is transmitted increment a "packet_sent" counter.
Whenever an append is delayed due to Nagle, note that
and for each subsequent append that is coalesced into the first one
increment a "packet_saved" counter.
(If the delayed append has been transmitted, stop incrementing.)
These two counters permit calculation of the Nagle benefit.

Accumulate time_active as the total time
during which the socket has un-ACKed transmitted data.
Whenever the socket has Nagle-delayed data
and receives a bare ACK that causes data to be transmitted
accumulate into time_delay the time since the most recent append.
These two counters permit calculation of the Nagle cost.

There should be a set of these counters for when the socket
has TCP_NODELAY on, and another set for when it is off
(since the bit can be flipped at any time, dynamically).
Additionally the port#s and perhaps other info
should be recorded so the results can be broken down by protocol.
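
(Very roughly, the per-socket bookkeeping might look something like the
sketch below - the field names are made up, and this says nothing about
where in the stack the hooks would actually go.)

/* Hypothetical per-socket counters; keep one copy for when TCP_NODELAY
 * is set and one for when it is clear. */
struct nagle_stats {
    unsigned long  packets_sent;   /* incremented per transmitted segment     */
    unsigned long  packets_saved;  /* appends coalesced into a delayed one    */
    double         time_active;    /* total time with un-ACKed data out       */
    double         time_delay;     /* Nagle-induced wait, per the rule above  */
    unsigned short local_port;     /* so results can be split by protocol     */
    unsigned short remote_port;
};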

===============

I am sure that the experiment as described above is flawed,
but I am confident that someone else could patch it up.
I have not thought out what to do with the data,
but again I think someone could find interesting things.

A bit of testing with programs having known characteristics
could be used to shake things out.
Then it could be turned loose on production applications.
It might be interesting to try turning TCP_NODELAY
on or off on programs such as Apache and NFS-over-TCP.
(Beware that e.g. newer versions of Samba have a placebo
control knob which keeps the users happy but has no effect.
Fortunately the instrumented stack will know the truth.)

As far as I know this experiment has never been conducted.

Tom Truscott

Thomas R. Truscott

unread,
May 13, 1999, 3:00:00 AM5/13/99
to
I've got to share this: One of the best programmers at the
company where I work just walked in and asked why
his FOO application was running slowly.
I asked him if it was using sockets. "Yes".
I asked him if he was getting five FOOs per second. "Yes".

This is so pathetically typical, and it makes me so angry.
We have created this camouflaged spike pit right
in the middle of our information super-highway,
and every single day someone gets the Nagle death penalty.
And all that the Masters of the Internet do is
claim that the attrition rate isn't so bad.
And they make fun of the dead. "Bad programmer".

Most of the time, though, I just sit back and smile.
Like a week ago when the guy from Cygnus was making
fun of a competitor's slow ... you know the drill.
From time to time it has been delightfully hilarious.
What a wonderfully wacky world we live in.
Nagle Madness belongs in a Dilbert episode somewhere.

Tom Truscott

Rick Jones

unread,
May 14, 1999, 3:00:00 AM5/14/99
to
Thomas R. Truscott (t...@cs.duke.edu) wrote:
: I've got to share this: One of the best programmers at the company
: where I work just walked in and asked why his FOO application was
: running slowly. I asked him if it was using sockets. "Yes". I
: asked him if he was getting five FOOs per second. "Yes".

: This is so pathetically typical, and it makes me so angry.

What was your suggested course of action?

Will the programmer go see if perhaps the application is presenting
logically associated data in separate send() calls and fix it?

Will the programmer go see if perhaps the underlying transport is
broken and interpreting Nagle per-segment rather than per-send? (The
netperf TCP_RR test with a request one byte larger than the MSS should
be an easy way to test)
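
(To be concrete about the first of those fixes, here is one sketch - the
names are invented - of presenting logically associated data in a single
write, using writev() to gather the pieces.)

#include <sys/types.h>
#include <sys/uio.h>

/* Instead of send(s, hdr, hdrlen, 0) followed by send(s, body, bodylen, 0),
 * hand both pieces to the kernel in one call. */
int send_request(int s, const void *hdr, size_t hdrlen,
                        const void *body, size_t bodylen)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)hdr;   iov[0].iov_len = hdrlen;
    iov[1].iov_base = (void *)body;  iov[1].iov_len = bodylen;
    return (int)writev(s, iov, 2);
}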

: We have created this camouflaged spike pit right in the middle of
: our information super-highway, and every single day someone gets the
: Nagle death penalty. And all that the Masters of the Internet do is
: claim that the attrition rate isn't so bad. And they make fun of
: the dead. "Bad programmer".

Given that just about every book on TCP/IP programming discusses the
Nagle algorithm and its implications, how is it camouflaged?

Given the origins of "foo" (from fubar), perhaps describing it as "his
FOO application" was correct, and your freudian slip showing :)

I'd put this incident in the same space as someone not realizing that
TCP is a byte stream and expecting that 1024-byte sends at one end
result in 1024-byte receives at the other, or someone not taking into
account that the remote closing the connection will result in a
zero-byte return from recv(). Or forgetting to use ntoh/hton on IP
addresses and ports. All examples of diving into a space without first
studying how that space works.

Thomas R. Truscott

unread,
May 14, 1999, 3:00:00 AM5/14/99
to
> Given that just about every book on TCP/IP programming discusses the
> Nagle algorithm and its implications, how is it camoflaged?

This is an excellent opportunity for the pro-Nagle camp to contribute
their first scintilla of empirical evidence to the discussion.
Unlike previous missed opportunities,
this one does not even require a computer.

You claim "just about every book ...". PROVE IT.
Show me the data. Don't give me more "theory".

==========
A sample experiment:
Collect an UNBIASED sample of books on TCP/IP programming,
e.g. any that have code samples using "connect" and "accept".
For each book note the title, date, author, and answers to:

(1) Does the book even *mention* the Nagle algorithm?
(2) Does it mention TCP_NODELAY?
If either of the above, then:
(3) Does it generally indicate there are performance implications?
(4) Does it specifically say that huge performance penalties
(e.g. in excess of 100-fold) can and do occur?
(5) Does it specifically refer to the "5 FOOs per second"
phenomenon as a diagnostic indication of the Nagle trap?
==========

Modify this experiment as you see fit,
but understand that selecting for books
which have the answers you want invalidates the experiment.
Report your results.

I eagerly await them.
Tom Truscott

Alun Jones

unread,
May 14, 1999, 3:00:00 AM5/14/99
to
In article <7hfbsj$qoe$1...@hal.cs.duke.edu>, t...@cs.duke.edu (Thomas R. Truscott) wrote:
> I've got to share this: One of the best programmers at the
> company where I work just walked in and asked why
> his FOO application was running slowly.
> I asked him if it was using sockets. "Yes".
> I asked him if he was getting five FOOs per second. "Yes".

Ya know, the other day, someone came up to me and said "I lose data
continually on TCP". I asked him if he was expecting to read the same
number of bytes at the receiver as he had stuck into the pipe at the sender.
"Yes."

> This is so pathetically typical, and it makes me so angry.

> We have created this camouflaged spike pit right
> in the middle of our information super-highway,
> and every single day someone gets the Nagle death penalty.

This is so pathetically typical, and it makes me so angry.

We have created this camouflaged spike pit right
in the middle of our information super-highway,
and every single day someone gets the streams death penalty.

> And all that the Masters of the Internet do is
> claim that the attrition rate isn't so bad.
> And they make fun of the dead. "Bad programmer".

And all that the Masters of the Internet do is claim that the developer
should understand what he's doing. And they make fun of the dead. "Slept
through TCP/IP 101".

You just don't seem to get the concept here - developers can learn how to
program TCP/IP in many ways. They can attend classes, they can read books,
or they can dive right in. Those that dive right in are likely to find
themselves out of their depth, and even the lifeguards get a chance to laugh
at them once they've been rescued. Being hit with the "Nagle penalty" (once
again, I apologise to John Nagle, since this is really a result of a
collision between _two_ algorithms, only one of which is his) is something
that indicates that you weren't properly taught (either by the class you
attended or by the book you read, etc) how to program TCP/IP. It's not an
indication that the stack needs a redesign.

This is a truly lame discussion of an acknowledged and well-explained bump
in the road. Its only camouflage is the yellow paint and warning stickers
all around it. Get a clue, get a life, and get a grip. Suggest to your
coworker that he goes and actually _learns_ how to program TCP/IP rather
than fumbling around in the dark. Suggest that to anyone else who pops up
with the 'five foos per second' that you're talking about. But don't
blithely tell everyone to disable Nagle. Let's face it, I can drive much
faster through a traffic jam if I use the shoulder, but it's still not a
good idea, and it doesn't mean that all the other car drivers are stupid.

Alun.
~~~~

--
Texas Imperial Software | Try WFTPD, the Windows FTP Server. Find it
1602 Harvest Moon Place | at web site http://www.wftpd.com or email
Cedar Park TX 78613 | us at al...@texis.com. VISA / MC accepted.
Fax +1 (512) 378 3246 | NT based ISPs, be sure to read details of
Phone +1 (512) 378 3246 | WFTPD Pro, NT service version - $100.
*WFTPD and WFTPD Pro now available as native Alpha versions for NT*

Alun Jones

unread,
May 14, 1999, 3:00:00 AM5/14/99
to
In article <7hhajb$eaj$1...@hal.cs.duke.edu>, t...@cs.duke.edu (Thomas R. Truscott) wrote:
> (1) Does the book even *mention* the Nagle algorithm?
> (2) Does it mention TCP_NODELAY?
> If either of the above, then:
> (3) Does it generally indicate there are performance implications?
> (4) Does it specifically say that huge performance penalties
> (e.g. in excess of 100-fold) can and do occur?
> (5) Does it specifically refer to the "5 FOOs per second"
> phenomenon as a diagnostic indication of the Nagle trap?

May I suggest you go search for Nagle's own comments regarding the 200ms
delayed ACK? Hit DejaNews for a while, and you'll find that, in Mr Nagle's
opinion at least, the setting of the ACK delay at a fixed 200ms is a
mistake.

Then, go look at some of the responses either side of those discussing the
Nagle algorithm. They usually start with your noted "5 foos per second"
(which is a symptom not of the Nagle algorithm, but of the interaction
between Nagle and delayed ACK fixed at 200ms), and end with "thank you so
much for showing me how to improve my performance", without suggesting in
between that TCP_NODELAY be set.
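
(The pattern those posts almost always contain looks roughly like the
sketch below - a request split across two small send()s, with the reply
wanted only after both pieces arrive; the names are invented, and the
comments assume the usual 200ms delayed-ACK timer on the peer.)

#include <sys/types.h>
#include <sys/socket.h>

/* The classic symptom-producing sequence. */
int do_transaction(int s, const char *hdr, size_t hdrlen,
                          const char *body, size_t bodylen,
                          char *reply, size_t replylen)
{
    send(s, hdr,  hdrlen,  0);  /* first small segment goes out at once    */
    send(s, body, bodylen, 0);  /* Nagle holds this until the first is
                                 * ACKed, but the peer won't reply until it
                                 * has the whole request, so its delayed-ACK
                                 * timer (up to ~200ms) has to expire first */
    return (int)recv(s, reply, replylen, 0);  /* hence ~5 transactions/sec */
}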

As for the books, let's look at the bibles for this field. Since your post
includes comp.os.ms-windows.networking.tcp-ip, let's look at the main
Winsock programming book:

Windows Sockets Network Programming, by Quinn & Shute:
1. Yes - it has several mentions in the index, including one entry for
'Nagle, John'.
2. Yes - in each instance, at the same point as discussing the Nagle
algorithm (well, duh!)
3. Yes, as well as outlining both the kinds of applications that might need
TCP_NODELAY, and those methods that can be used "to eliminate any negative
effects of the Nagle algorithm".
4. 100-fold is _your_ data. The performance penalty is that packets are
frequently limited to one every 200ms. Yes, _that_ is discussed. (Note the
mathematical difference here - the performance _ratio_ you get depends
entirely on your available bandwidth, the performance _difference_ you get
depends only on the ACK delay)
5. No - this is an instruction manual, not a repair book. If you're
actually paying attention as you read the book, you won't be programming
that way anyway, and you'll have read the notes about TCP_NODELAY before you
get into possibly hitting it as a problem.

You're boring me, and as long as you continue to avoid contributing anything
useful or intelligent to this discussion, I shall not bother to reply
further. Please do not take my silence as some vague indication of your
argument's validity. It is merely an indication of your argument's vacuity.

Thomas R. Truscott

unread,
May 14, 1999, 3:00:00 AM5/14/99
to
> As for the books, let's look at the bibles for this field.

You ignored my experiment in favor of choosing a *single*
book as evidence that all TCP/IP books warn of the Nagle trap.
This invalidates the experiment.


But since you made such a terrible choice, I will play along :-)

> Windows Sockets Network Programming, by Quinn & Shute:

On page 307 the book speaks glowingly of the Nagle algorithm.
On page 308 in the section "Which Applications Need TCP_NODELAY"
it says that that "two types of applications can benefit":

1) an application that does two or more sends of small amounts
of data and expects immediate responses from each.

This is flat wrong! Such an application does not fall into the trap.
The author surely meant "an application that does two or more
small sends of logically associated data and expects a response after
the server has read all the associated data".

2) an application that needs to receive a steady flow of data ...

The "classic example" exhibited is X-Server mouse tracking.
But anyone remotely familiar with X Windows knows that clients
set TCP_NODELAY too and for reasons that go far beyond mouse tracking.

Later on page 308 it lists suggestions for how to
"keep your network administrators happy and ... your users
happy as well." The first one is a gem:

*) Don't do writes and reads in lock-step, but allow some overlap
between consecutive operations.

What?!! The safest way to avoid the Nagle trap (besides NODELAY)
when doing RPC is short lock-step write/read pairs.
The above advice is vague, but it sounds like a suggestion
which would trigger the trap for sure.
The X Window protocol does this kind of asynchronous RPC overlap,
which is the true reason why it must set TCP_NODELAY.
The mouse tracking excuse is just that, an excuse.
It placates the Nagle worshipers.

This book suggests (again page 308) you "might find an increase
in performance" from TCP_NODELAY. No where does it suggest
that the increase might be a many-fold speedup
as it so typcally is (Apache, Samba, ftp, the FOO application, ...)

Nowhere does this book point out the "5 ops per second"
that is the hallmark of the Nagle death penalty.

Experiment or no, this "bible" fails horribly.

Tom Truscott

Thomas R. Truscott

unread,
May 14, 1999, 3:00:00 AM5/14/99
to
>Ya know, the other day, someone came up to me and said "I lose data
>continually on TCP". I asked him if he was expecting to read the same
>number of bytes at the receiver as he had stuck into the pipe at the sender.
>"Yes."
>
>> This is so pathetically typical, and it makes me so angry.

Okay, you are angry about this. What shall we do about it?

The need to check return codes on I/O calls is so universally necessary
that programmers who do not ... well, they should check.
This is not a TCP, pipe, disk, tape, or terminal issue.
There are numerous different ways that I/O calls on any of these
might unexpectedly fail or return a short count.
I think it is unlikely that a good programmer
will suddenly stop doing so when confronted with TCP,
no matter how unfamiliar with TCP they might be.
They still know that read/write/send/recv/whatever
are I/O calls with return codes, and so they will check.
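
(A minimal sketch of the kind of checking being described, assuming POSIX
write(); the helper name is illustrative only.)

    #include <errno.h>
    #include <unistd.h>

    /* write() may fail, or may accept fewer bytes than asked for, so loop
     * until everything has been handed to the kernel or a real error
     * occurs.  Returns 0 on success, -1 on error (errno is set). */
    static int write_all(int fd, const char *buf, size_t len)
    {
        while (len > 0) {
            ssize_t n = write(fd, buf, len);

            if (n < 0) {
                if (errno == EINTR)
                    continue;        /* interrupted; just retry */
                return -1;           /* genuine error */
            }
            buf += n;
            len -= (size_t)n;
        }
        return 0;
    }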

Perhaps your anger is that a good programmer might on rare occasion
accidentally fail to check (or handle) a return code properly.
I consider that an entirely valid concern,
and I believe there exist programming languages which go
a long way in trapping such problems.
Unfortunately, most people do not use those languages.

This is not a TCP-specific issue.
It is a general issue which can and should be handled in a general way.

Tom Truscott

John Nagle

unread,
May 14, 1999, 3:00:00 AM5/14/99
to
t...@cs.duke.edu (Thomas R. Truscott) writes:
>I've got to share this: One of the best programmers at the
>company where I work just walked in and asked why
>his FOO application was running slowly.
>I asked him if it was using sockets. "Yes".
>I asked him if he was getting five FOOs per second. "Yes".

>This is so pathetically typical, and it makes me so angry.


>We have created this camouflaged spike pit right
>in the middle of our information super-highway,
>and every single day someone gets the Nagle death penalty.

>And all that the Masters of the Internet do is
>claim that the attrition rate isn't so bad.
>And they make fun of the dead. "Bad programmer".

Agreed.

The real problem is that the interaction between the
Nagle algorithm and delayed ACKs is awful. Both went in
around the same time, and when I developed the Nagle
algorithm, I was doing it on an implementation that didn't
have delayed ACKs. So it never delayed things by more than
one RTT, and it got rid of the awful case where someone is
writing to a socket one byte at a time, generating a full
IP packet for each byte and choking the link.

Delayed ACKs are badly designed; that fixed 200ms timer
is stupid. It's a human-scale timer, reflecting human response
times. I never would have done it that way. Previous
attempts to solve the tinygram problem had used an accumulation
timer to prevent the sending of too many tiny packets. Accumulation
timers had been used in X.25 and Tymnet, and they added delay even
when the link was fast. My approach only added one RTT at worst,
so on fast links, there was no visible delay, and on slow links,
you got the packet consolidation needed to avoid congestion collapse.

Worse, a delayed ACK is a bet. Delaying an ACK is done in hopes that
the local application will reply before the timer runs out. But
TCP doesn't keep score, and will happily delay ACKs forever even
though it's losing the bet every time.

I've been talking to Vern Paxson at LBL about fixing this in
the TCP spec. It might happen.

It's worth noting, though, that if you hit this problem, you're
doing something moderately dumb in your application protocol. The
usual problem is doing write-write-read to a socket; the first
write goes out immediately, the second one gets delayed, and the
read causes the application to block until both writes make it
through, which requires one ACK delay plus one RTT. If you can
buffer those two writes into one, the problem goes away.
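
(A hedged sketch of the write-write-read pattern and the coalescing fix;
the request/reply framing is invented for illustration and error handling
is omitted.)

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Problematic pattern: header and body leave as two small segments,
     * then we block on the reply.  With Nagle on, the second segment is
     * held until the first is ACKed, and the peer's delayed ACK can add
     * up to ~200ms. */
    static void request_write_write_read(int sock,
                                         const char *hdr, size_t hlen,
                                         const char *body, size_t blen,
                                         char *reply, size_t rlen)
    {
        send(sock, hdr, hlen, 0);    /* small packet #1 */
        send(sock, body, blen, 0);   /* held by Nagle until #1 is ACKed */
        recv(sock, reply, rlen, 0);  /* waits out the delayed ACK */
    }

    /* Fix: gather header and body into a single send (writev avoids an
     * extra copy), so the request goes out in one segment and the Nagle
     * algorithm never has anything to hold back. */
    static void request_single_write_read(int sock,
                                          const char *hdr, size_t hlen,
                                          const char *body, size_t blen,
                                          char *reply, size_t rlen)
    {
        struct iovec iov[2] = {
            { (void *)hdr,  hlen },
            { (void *)body, blen }
        };

        writev(sock, iov, 2);
        recv(sock, reply, rlen, 0);
    }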

There's also a completely unrelated problem with Windows apps
that don't provide enough receive buffers. The symptoms look
similar, but the cause is having only one receive buffer,
forcing the sender to send one packet at a time.

I should point out that I haven't worked on protocol design
for over a decade; I'm doing physically-based animation now.
But I try to point people in the right direction now and then.

John Nagle
www.animats.com

Joe Doupnik

unread,
May 14, 1999, 3:00:00 AM5/14/99
to
----------
Well put as usual, John.
We don't know what suggestions are being considered, and I'm not
inquiring at this point, but I would like to add one to the mix. It is the
modern use of the TCP PUSH bit. These days it is taken to mean "my transmit
buffer has been emptied with this transmission" rather than the original
intent of "please let your app know these bytes have arrived." So, if we can
use the modern interpretation a delayed ACK should become an immediate ACK
if the PSH bit is set.
Joe D.

D. J. Bernstein

unread,
May 15, 1999, 3:00:00 AM5/15/99
to
John Nagle <na...@netcom.com> wrote:
> Delayed ACKs are badly designed; that fixed 200ms timer is stupid.

The 200ms timer for delayed ACKs is arguably too _small_. If the client
won't be retransmitting for (say) 1 second, why shouldn't the server's
ACKs wait (say) half a second for a window update? Nobody would be
asking for a smaller timeout if ACKs weren't being misused as GAs.

Anyway, even with immediate ACKs, your algorithm is intolerable for
interactive work over a modem.

> the awful case where someone is writing to a socket one byte at a time,

If a program is writing short lines through stdio, there are going to be
some annoying delays, whether or not your algorithm is used. Fix: change
stdio to flush before read(), rather than at the ends of tty lines. (Or,
under UNIX, feed the program's output through cat.)
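
(One application-level reading of that fix, sketched with ordinary stdio;
the prompt/read framing is invented for illustration.)

    #include <stdio.h>

    /* Make sure buffered output has actually been handed to the kernel
     * before blocking for input, rather than relying on line-at-a-time
     * flushing. */
    static char *ask(const char *prompt, char *line, int len)
    {
        fputs(prompt, stdout);
        fflush(stdout);              /* flush before the blocking read */
        return fgets(line, len, stdin);
    }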

If a user is typing characters one by one, then your algorithm changes a
tolerable, consistent delay into an incredibly annoying, variable delay.
For some reason users don't appreciate learning that this is saving them
as much as a whopping 1% of their modem capacity.

---Dan

Eric A. Hall

unread,
May 15, 1999, 3:00:00 AM5/15/99
to Thomas R. Truscott

> The Nagle algorithm has a potential benefit (smaller packet count)
> and a potential cost (larger packet latency), and much of the debate
> has been which is larger and whether it matters. I propose that
> someone conduct an experiment.

I agree that this would be a nice experiment. However, it should also be
combined with data about the effects on a global basis, rather than on
an isolated network. Most modern LANs at Fortune 500 companies could probably
handle the additional traffic without much of a problem, but the effect
on the Internet in general would likely be a different story.

Congestion is self-replicating. Congestion causes more congestion.
Congestion on a web site that gets thousands or millions of hits is
a totally different issue from congestion on a web site that gets a few
hundred, for example, and when combined with oversubscribed ISP links
and backbone loss, the difference becomes very dramatic.

There has been a lot of discussion on one of the mailing lists recently
about the major periods of congestion collapse on the open Internet.
These collapses were not that long ago, and I seriously doubt that the
overall end-to-end nature of the Internet has changed so much that these
problems would not happen again if given the chance. Check out the very
bottom part of ftp://ftp.isi.edu/end2end/end2end-interest.mail for some
of these points, and keep them in mind when you do your testing.

Otherwise, I would like to reiterate that nobody cares if you turn it
off on your network, but suggesting that the default behavior should be
"off" is not appropriate.

D. J. Bernstein

unread,
May 15, 1999, 3:00:00 AM5/15/99
to
Eric A. Hall <eh...@ehsco.com> wrote:
> Congestion is self-replicating. Congestion causes more congestion.
[ ... ]

> congestion collapse on the open Internet.
[ ... ]

> suggesting that the default behavior should be "off" is not appropriate.

If you really believe your own rhetoric, why aren't you demanding that
the _first_ packet be delayed?

See http://pobox.com/~djb/sarcasm/modest-proposal.txt for some other
ideas that just might save the Internet from ``congestion collapse.''

Did you realize, for example, that TCP is actually an _eight-bit_
protocol? How many of your ``apps'' are using, at best, 87.5% of the
network's capacity?

And have you ever tried compressing your email before sending it? A good
algorithm can squeeze your ``congestion ... congestion ... congestion''
down to a few bytes, like ``mmmph.'' Surely _that_ would reduce mmmph!

---Dan

Marc Slemko

unread,
May 15, 1999, 3:00:00 AM5/15/99
to
In <7gq71v$bfg$1...@hal.cs.duke.edu> t...@cs.duke.edu (Thomas R. Truscott) writes:

>>> Long ago and far away, various web servers behaved like that mythical
>>> email client. They set TCP_NODELAY and sent http headers separate from
>>> URL data. As I recall, fixing that to be a gathering send with no
>>> TCP_NODELAY was at least 10% on web server benchmarks. ...

>TCP_NODELAY is still necessary, as explained in this comment
>in http_main.c in the Apache HTTP Server:

Don't believe everything you read.

> /* The Nagle algorithm says that we should delay sending partial
> * packets in hopes of getting more data. We don't want to do
> * this; we are not telnet. There are bad interactions between
> * persistent connections and Nagle's algorithm that have very severe
> * performance penalties. (Failing to disable Nagle is not much of a
> * problem with simple HTTP.)

In this case, I believe this comment is outdated and no longer accurate.

The only studies on this topic that I have seen are with web servers
that send the response headers in a separate packet from the response
body, as many silly servers still do. If you do that, then you can run
into problems.

I am not aware of any fundamental reason why Nagle has to be disabled
by a HTTP server and, in fact, am aware of numerous situations where
Apache's disabling of it hurts, sometimes quite badly, especially when
combined with Apache 1.3's default of "unbuffered" CGIs.

On a mildly related note, from a network perspective a setsockopt()
similar to TCP_NOPUSH can be useful to let your application be sure
that a less than maximum sized packet is never sent over the network
until the application says "hey, I'm done with this logical chunk of
data".


>The Nagle algorithm was a boon, once, for telnet.
>It is a disaster for just about everything else.

That's funny.

You give example after example of people who don't know what they are
doing writing broken applications where Nagle ends up helping in the
big picture, not hurting.

Problem: "I can only transfer 5 bytes per second with my program. Looking
at the network, there are only 10 packets per second! Surely we can do
more than that?"

A1: "Oh, that is because Nagle is horrible. If you disable it, things will
be far better and you will be able to get much better transfer rates.
Nagle sucks. Look, now you can do 10000 packets/sec and get 10000 bytes/sec."

A2: "Well, lets look at your application and what you are doing a bit more
closely. Hmm, why are you doing one byte write()s? You really shouldn't
do that because of x, y, z. Fix that and your problems should go away.
Look, now you can do 10000 packets/sec and get 10000000 bytes/sec"
"

Even if both answers result in similar "good enough" performance when someone
tests the app in isolated conditions, A1 results in horrible overheads
on the network level. Saying that Nagle is bad because it can impact
application authors who don't understand TCP is bogus.
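
(A hedged sketch of what answer A2 amounts to in practice; the outbuf
structure and names are invented for illustration and error handling is
omitted.)

    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    /* Accumulate small pieces of output in a user-space buffer and hand
     * them to the kernel in reasonably sized writes, instead of issuing
     * one tiny send() per field or per byte. */
    struct outbuf {
        int    sock;
        size_t used;
        char   data[4096];
    };

    static void outbuf_flush(struct outbuf *b)
    {
        if (b->used > 0) {
            send(b->sock, b->data, b->used, 0);   /* one larger segment */
            b->used = 0;
        }
    }

    static void outbuf_put(struct outbuf *b, const void *p, size_t n)
    {
        if (n >= sizeof(b->data)) {       /* large pieces go straight out */
            outbuf_flush(b);
            send(b->sock, p, n, 0);
            return;
        }
        if (b->used + n > sizeof(b->data))
            outbuf_flush(b);
        memcpy(b->data + b->used, p, n);
        b->used += n;
    }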

Joe Doupnik

unread,
May 15, 1999, 3:00:00 AM5/15/99
to
---------

Rather than trying to move the mountain I repeat my suggestion to
enable TCP receivers to emit a prompt ACK when a data-carrying segment
arrives with the PUSH bit set. So far as I am aware this is totally
compliant with all the rules, stated and implied, works with existing
transmitters (they already do the PUSH), and should cause no difficulty
while solving the delayed ACK (+ Nagle delay) problem.
This is so simple to implement that someone with tools handy can
run an experiment to give numbers to those who wish them.
Joe D.

Thomas R. Truscott

unread,
May 17, 1999, 3:00:00 AM5/17/99
to
> You give example after example of people who don't know what they are
> doing writing broken applications where Nagle ends up helping in the
> big picture, not hurting.

My Nagle-free ftp can download one-megabyte files faster than
your ftp can download cookie recipes.
Are you saying that ftp was written by clueless people?

I have indeed given example after example. It is your turn now.

The pro-Nagle camp has failed to demonstrate even a single example
of a reasonably popular program which actually benefits from Nagle.

Don't tell me that I am wrong. PROVE that I am wrong.
Show me your data. Stop spouting your "theory".

Put up or shut up.

Tom Truscott

P.S. I am aware that tcpperf-type programs can be configured
to demonstrate a Nagle benefit. But those are testing tools!
To prove Nagle helps "the big picture" you must find something else.

Rick Jones

unread,
May 18, 1999, 3:00:00 AM5/18/99
to
Thomas R. Truscott (t...@cs.duke.edu) wrote:
: "... just about every book on TCP/IP programming discusses

: the Nagle algorithm and its implications"

I think I was the source of that one, and I will now state that my
assertion was incorrect. I just spent about one hour at Computer
Literacy (www.fatbrain.com) going through a bunch of books on TCP/IP.

My search was limited to the index of the book.

About 3/4's of the books (broad handwaving) had no references in their
indices for Nagle or TCP_NODELAY. I cannot say that those books did
not discuss the topic, just that it did not appear in the index. I was
not going to spend more than an hour searching books on Thomas' behalf
- I'm sure that he can continue the research at a bookstore near him
:)

: How easy must I make this for you?
: Please demonstrate the existence of even a single book
: on TCP/IP programming that does both of the following:

: 1. Warns the reader that the Nagle algorithm can in some cases
: cause a huge performance penalty (e.g. in excess of 100-fold).

: 2. Alerts the reader to the "5 ops per second" phenomenon
: that is the hallmark of the Nagle spike pit.

I would put forth two texts for this. The first would be "Unix Network
Programming, Volume 1, 2nd Edition" by W. Richard Stevens. The second
would be "Windows Sockets Network Programming" by Quinn and
Shute. Both had decent discussions of the Nagle algorithm, which types
of applications _need_ to disable it, and what application coding
practices are best for the others to avoid running into Nagle.

There were briefer, more cursory discussions of this topic in Comer's
"Introductions to TCP/IP, Volume1, 3rd Edition" and Pat Boner's
"Network Programming with WIndows Sockets."

Mark Summerfield

unread,
May 18, 1999, 3:00:00 AM5/18/99
to
"Thomas R. Truscott" wrote:
> Collect an UNBIASED sample of books on TCP/IP programming,
> e.g. any that have code samples using "connect" and "accept".

Why should anybody do this? In every field I have ever been involved
with (and I've taught in a couple of fairly disparate examples, on
top of my own education and wider reading), there are dozens of books
to choose from, of which a handful (perhaps as few as two or three)
are actually any good (i.e. worthy of recommendation to students as
the book they should have if they only have one book; worthy of debate
in academic staff meetings as the "set text" for a course).

In short, your "unbiased sample" would contain too many *bad* books that
simply do not deserve consideration. Some of them would mention Nagle.
Some of them might even have a decent discussion of it. But this is
not relevant to the proposition. If you are a professional in some
field, your goal should not be to have an "unbiased" sample of books
on your shelf (unless your goal is mediocrity), but to have a small
number of high-quality books which make available to you the information
you need to be productive and do quality work.

We could just as easily argue that any book that does not adequately
discuss the Nagle algorithm, its purpose, and its implications, is
a priori a "bad" book. (Which is not to say that any book that *does*
discuss these issues is necessarily "good" -- this is not the only
criterion!) In reality, this might be a bit harsh -- no book ever
covers everything as well as possible, and so long as you have at
least one book in your library with a really top-notch discussion of
Nagle, you should be OK. As with the law, ignorance is no excuse --
would you apply the same standards to surgeons that you're applying
to programmers?!

Mark

Mark Summerfield

unread,
May 18, 1999, 3:00:00 AM5/18/99
to
"D. J. Bernstein" wrote:
> And have you ever tried compressing your email before sending it? A good
> algorithm can squeeze your ``congestion ... congestion ... congestion''
> down to a few bytes, like ``mmmph.'' Surely _that_ would reduce mmmph!

You are showing your ignorance.

Sure, there's lots of data on the internet that could be compressed,
which would reduce the total amount of traffic, and hence make room for
other traffic. But that's a separate issue which has absolutely nothing
to do with congestion collapse (note, no scare quotes!) Congestion
collapse is real, and can (and will) eventually occur in any network
in which congestion can result in the generation of _extra_ traffic
(e.g. congestion causes loss, loss causes retransmission + additional
overhead => the result of congestion is the generation of more traffic
than the level which caused the congestion in the first place).

Congestion is not a result of application data. It is a protocol issue.
Feel free to compress your email as much as you like -- your internet
bills will be lower, if nothing else ;-)

If you don't understand the difference, and if you think congestion
collapse is nothing but a buzz-phrase or a catch-cry, then you don't
belong in this discussion. There are real experts contributing here
(and I certainly don't count myself amongst them -- John Nagle himself
has contributed his two cents' worth). And there are real issues to
be debated, but crying "compression" does not constitute a contribution.
It's actually completely irrelevant.

Mark

Mark Summerfield

unread,
May 18, 1999, 3:00:00 AM5/18/99
to
"Thomas R. Truscott" wrote:
> The pro-Nagle camp has failed to demonstrate even a single example
> of a reasonably popular program which actually benefits from Nagle.

The "pro-Nagle camp" has, in fact, *never* said that any individual
program "benefits" from Nagle. Nagle is not there to benefit individual
programs. It's obvious that if you and I were the only two people on
the internet, we'd have no need for Nagle. We could send as many packets
as we liked. Nobody would care. There would be nobody *to* care!!!
In fact, we probably wouldn't even need TCP. With nobody else to get
in our way, we would hardly ever lose packets. And if we did, we could
just resend them. Hey, we could just send everything twice, or three
times, to be sure, to be sure, to be sure... We wouldn't need slow
start, we wouldn't need sliding windows. We could build our own, perfectly
workable, and astronomically efficient stream-oriented transport
protocol right on top of UDP. Or IP. But then, if there were only
you and me out there, why would we even need IP?!

Nagle benefits the network. Individual programs benefit indirectly,
because Nagle helps to prevent the network from becoming congested so
that nobody can get any data through. And applications can be
designed and written in such a way that they don't even trigger
Nagle. And, finally, Nagle *can* be disabled in those remaining
cases where it really does cause an unnecessary performance hit.

But, as people keep telling you (and you, it seems, keep missing the
point), this is *no argument* for disabling Nagle by default. Nagle can
not be "disabled" until it is replaced with something better. I don't
think that anybody who really knows what they're talking about would
argue that something better could not be devised (or has not been
devised). The practical problem is updating millions of implementations
overnight if we did decide to change.

> Don't tell me that I am wrong. PROVE that I am wrong.
> Show me your data. Stop spouting your "theory".

We're not even talking about the same thing. The real problem is that
there is no common understanding here regarding the actual problem that
Nagle was designed to solve. You think in terms only of individual
applications; the people you have lumped into the "pro-Nagle camp"
are thinking about the whole network. There is a difference between
"writing programs" and "designing protocols" (even though many protocols
end up being implemented by programmers).

> Put up or shut up.

Hmmmm.

Mark

D. J. Bernstein

unread,
May 18, 1999, 3:00:00 AM5/18/99
to
Mark Summerfield <m.summ...@ee.mu.oz.au> wrote:
> You are showing your ignorance.

Arrogance doesn't work too well when you make a fool of yourself.

You claim that compression is ``a separate issue which has absolutely
nothing to do with congestion collapse.''

That's blatantly incorrect. The percentage of time that a network is
congested depends heavily on the total amount of data that users are
trying to transmit.

It is, of course, possible to hurt a network with a flood of small
packets. (That's how I accidentally crashed the NSFNET-NYSERNet link
many years ago.) But most cases of congestion involve large packets,
and simply wouldn't have happened if the data had been compressed.

---Dan

John Hascall

unread,
May 18, 1999, 3:00:00 AM5/18/99
to

Large packets simply don't happen if the data is compressed?!?

Thomas R. Truscott

unread,
May 18, 1999, 3:00:00 AM5/18/99
to
>> Collect an UNBIASED sample of books on TCP/IP programming,
>
> Why should anybody do this?

Because the pro-Nagle camp continues to make statements such as:

"... just about every book on TCP/IP programming discusses
the Nagle algorithm and its implications"

"Its only camouflage is the yellow paint


and warning stickers all around it."

I am asking the Nagle worshipers to prove they
have more than empty rhetoric. Is it too much to ask?

How easy must I make this for you?
Please demonstrate the existence of even a single book
on TCP/IP programming that does both of the following:

1. Warns the reader that the Nagle algorithm can in some cases
cause a huge performance penalty (e.g. in excess of 100-fold).

2. Alerts the reader to the "5 ops per second" phenomenon
that is the hallmark of the Nagle spike pit.

Tom Truscott

Uri Raz

unread,
May 18, 1999, 3:00:00 AM5/18/99
to
John Hascall (jo...@iastate.edu) wrote:
> D. J. Bernstein <d...@koobera.math.uic.edu> wrote:
>> Mark Summerfield <m.summ...@ee.mu.oz.au> wrote:
>>> You are showing your ignorance.
>> Arrogance doesn't work too well when you make a fool of yourself.
>> You claim that compression is ``a separate issue which has absolutely
>> nothing to do with congestion collapse.''
>> That's blatantly incorrect. The percentage of time that a network is
>> congested depends heavily on the total amount of data that users are
>> trying to transmit.

[snip]

> Large packets simply don't happen if the data is compressed?!?

The less data travelling in the network (e.g. due to compression), the
smaller the chance that the network will get congested.

The fact that less data is sent is orthogonal to the question of
how to avoid congestion in general.

Mark Summerfield

unread,
May 19, 1999, 3:00:00 AM5/19/99
to
"D. J. Bernstein" wrote:
>
> That's blatantly incorrect. The percentage of time that a network is
> congested depends heavily on the total amount of data that users are
> trying to transmit.

Obviously. But when it comes to *protocol design* you're not the slightest
bit interested in what application data the packets contain. You predict
that under certain conditions, problems will arise, and you design your
protocol to try to avoid or mitigate those conditions. Whether it occurs
due to a small number of users sending large amounts of data, or a larger
number of users sending the same types of data in a compressed format
simply is not the issue.

> It is, of course, possible to hurt a network with a flood of small
> packets. (That's how I accidentally crashed the NSFNET-NYSERNet link
> many years ago.) But most cases of congestion involve large packets,
> and simply wouldn't have happened if the data had been compressed.

Given the same total amount of data to be pushed through the network,
congestion is more likely to occur if it's sent through as a larger
number of smaller packets than a smaller number of larger packets.
The additional overhead of the extra packet headers may only be small,
but it's only part of the problem. The extra packets increase the
processing load of the routers, and increase the probability of contention
for output ports (which may result in packet losses even though the
average data rate is within the capacity of the outgoing link -- it
will depend on buffer sizes).

Even if you consider only a simple, shared local Ethernet, the more packets
per second you send, the more collisions will occur. Every collision
results in the loss and retransmission of (at least) two Ethernet
packets. If the network is very heavily loaded, the probability that
these retransmissions will result in further collisions is high.
Thus congestion collapse can occur even on a single Ethernet bus, carrying
either small or large packets. (Note that in an Ethernet network, the
probability of collision is essentially independent of packet length --
once the 64-byte minimum has been sent, the probability of a subsequent
collision during that packet on a well-configured Ethernet network is
zero.)

Mark

John Hascall

unread,
May 19, 1999, 3:00:00 AM5/19/99
to
Perhaps this has already been mentioned (our newsfeed is somewhat
lacking since our news guy ran off to make his fortune as an
internet mortgage broker):

http://www.acme.com/software/thttpd/benchmarks.html

has a chart comparing the performance of various web servers.
Of note is a comment at the bottom:

That nice diagonal line is very interesting. A bunch of very different
servers follow it exactly for the first part of their curve.
Its slope is 5 hits/second per user, indicating that each hit takes
a minimum of 1/5th second to handle.

But other servers don't follow it, showing that it's not an inherent
limit in the test setup. thttpd-2.00 was following the line, while
mathopd, a very similar server, does not, so I made some changes to
thttpd to try and get its latency to be more like mathopd's.

Turns out the change that made the difference was sending the response
headers and the first load of data as a single packet, instead of as
two separate packets. Apparently this avoids triggering TCP's
"delayed ACK", a 1/5th second wait to see if more packets are coming in.

0 new messages