
FTP data transfer rate data: NT, VMS, Linux


mat...@seqaxp.bio.caltech.edu

Nov 4, 1999, 3:00:00 AM
Since we've got a new DS10 which isn't yet in service I took the
opportunity to do some data transfer rate tests under varying conditions.

The system configurations tested were:

SAF05_SLOW: Asus P2BDS, dual PII 400 MHz, 10/100 Mbit 3Com 905B Ethernet
            card, data stored/read from a FAT partition on half of a 4.3 GB
            U2W drive, NT 4 SP5, plugged into a 10BaseT jack.
SAF05_FAST: Same machine, plugged into a 100BaseT jack.
VMS5UCX     DS10, VMS 7.2-1, TCP/IP Services 5.0A, 100BaseT connection,
            files to/from an ODS5 volume. The machine has an Intraserver U2W
            card and three 18 GB U2W disks.
VMS5MULTI   Same machine after TCP/IP Services was removed and
            Multinet 4.2a plus all patches available as of 11/3/99 was
            installed; read/write to an ODS5 volume.
VMS2MULTI   Same config as VMS5MULTI, but read/write to an ODS2 volume.
LINUX       Dual PII 400 MHz, 100BaseT, 9 GB U2W drive. (The
            controller in this machine is very similar to the Intraserver
            controller.) Red Hat 5.2 as modified by VA Research.

All systems were basically idling during the tests. FTP client is the
command line program on the VMS and Linux boxes, and WS_FTP95LE on the NT
box. The test file was 3416905 bytes in size. (Yes, that's a funny size;
it was just something I had around.) In some cases the destination file
was deleted in between transfers, in others it wasn't. It didn't seem to
make much difference one way or the other. Each time shown is a single
experimental transfer of the file in the direction indicated.

Server  Dir  Client      Time (s)  Notes

VMS5UCX -> SAF05_SLOW 9.8 B
VMS5UCX <- SAF05_SLOW 4.2 A
VMS5UCX -> SAF05_SLOW 6.9 B
VMS5UCX -> SAF05_SLOW 9.8 B
VMS5UCX <- SAF05_SLOW 4.0 A
VMS5UCX <- SAF05_SLOW 4.0 A transfer rate in this direction is
VMS5UCX <- SAF05_SLOW 4.0 A consistent
VMS5UCX -> SAF05_SLOW 6.6 B Note the times in these 6 transfers:
VMS5UCX -> SAF05_SLOW 5.0 B the pattern repeats 6 5 3 6 5 3
VMS5UCX -> SAF05_SLOW 3.1 B
VMS5UCX -> SAF05_SLOW 6.3 B
VMS5UCX -> SAF05_SLOW 5.0 B
VMS5UCX -> SAF05_SLOW 3.1 B

Move the plug from the 10BaseT to the 100BaseT socket and continue:

VMS5UCX -> SAF05_FAST 46.0

Strange, eh? NT apparently doesn't handle a speed change on the
socket very well. Reboot the NT machine while it stays plugged
into the 100BaseT socket and then see:

VMS5UCX -> SAF05_FAST 0.4 D
VMS5UCX -> SAF05_FAST 0.4 D
VMS5UCX <- SAF05_FAST 4.3 C
VMS5UCX <- SAF05_FAST 4.3 C

LINUX <- SAF05_FAST 4.0 C
LINUX <- SAF05_FAST 4.0 C
LINUX -> SAF05_FAST 0.3 D
LINUX -> SAF05_FAST 0.3 D

VMS5UCX <- LINUX 4.24 C
VMS5UCX -> LINUX .46 D
VMS5UCX -> LINUX .46 D
VMS5UCX <- LINUX 4.28 C
VMS5UCX <- LINUX 4.32 C

LINUX <- VMS5UCX 3.66
LINUX -> VMS5UCX 4.34
LINUX -> VMS5UCX 4.41
LINUX -> VMS5UCX 0.37 directed output to nla0:
LINUX -> VMS5UCX 0.64 first set rms/extend=600, then FTP
LINUX -> VMS5UCX 4.8 first set rms/extend=60, then FTP
LINUX -> VMS5UCX 4.8 first set rms/extend=60, then FTP
LINUX -> VMS5UCX 0.40 first set rms/extend=1000, then FTP
LINUX -> VMS5UCX 0.34 first set rms/extend=1000, then FTP

VMS5MULTI -> SAF05_FAST 0.3
VMS5MULTI <- SAF05_FAST 60.8
VMS2MULTI <- SAF05_FAST 64.4
VMS2MULTI <- SAF05_FAST 60.8
VMS2MULTI -> SAF05_FAST 0.3

VMS5MULTI <- LINUX 7.73
VMS5MULTI <- LINUX 3.9
VMS5MULTI <- LINUX 3.9
VMS5MULTI -> LINUX .453
VMS5MULTI -> LINUX .359

VMS5MULTI -> VMS2MULTI 1.5 (estimated, no transfer time is displayed)
VMS5MULTI <- VMS2MULTI 1.5 (")
VMS5MULTI -> VMS2MULTI 1.5 (") first set rms/extend=1000, then FTP
VMS5MULTI -> VMS2MULTI 0.4 (") to nla0:

Lastly, I modified

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
on SAF05 by adding "TcpWindowSize" with a value of 58400, as suggested in
a previous related thread by L. Bohan, rebooted, verified that the change
had "stuck" and found no change:

VMS2MULTI <- SAF05_FAST 60.8
VMS2MULTI -> SAF05_FAST 0.3

Disk to disk copy on each system:

LINUX command: cp /u1/test /u2/test
too fast to clock (<.1 seconds)
SAF05 Only one disk in system, so copy/paste.
too fast to time by hand. Subjectively, instantaneous.
VMS5MULTI command: copy dka100:[dir]test.zip dka200:[dir]
command: copy/alloc=6705 dka100:[dir]test.zip dka200:[dir]
1.5 seconds with either type of COPY. I tried a variety
of SET RMS commands and couldn't get this down to a reasonable
time when copy was used.

We already knew about the problems with window sizes and ACK, and the
Process folks have presented a "per FTP account" workaround.

New in this data is the result that the SET RMS values are important
for the UCX FTP client. Conversely, they don't seem to do anything at all
for the Multinet FTP client. When the RMS extend values are set high
enough the UCX ftp client seems to be able to write the data to disk faster
than you can do it with COPY from disk to disk! In any case, TCP/IP
services was within spitting distance of Linux on every test, at least
if the FTP client's RMS dependence is not weighted against it.

The Multinet FTP client goes at the same rate as a disk to disk copy and is
independent of the RMS default values. Its default is better than what the
TCP/IP Services client does, but the latter can be tuned to go faster.

I don't have a clue where the bizarre cyclical pattern in the transfer
rates indicated by "B" above comes from. The other direction ("A") was
the same for all attempts.

I think the folks at Multinet have some work to do on their FTP server as
it is roughly 15 times slower than the UCX or Linux servers for uploads
from an NT system. I don't think a per user logical is an adequate
solution in this case - better that the FTP process, or the TCP/IP stack
itself, detect that things are timing out and automagically adjust for the
type of client which is connected. Downloads work great though.

Lastly, anyone know why COPY is so incredibly slow on this system compared
to the Linux and NT boxes? All three have nearly identical hardware in
terms of SCSI adapters and disks. Hard to come up with an exact number
but the VMS system appears to be roughly 10 to 15 times slower at disk to
disk copies than is either Linux or NT.

Regards,

David Mathog
mat...@seqaxp.bio.caltech.edu
Manager, sequence analysis facility, biology division, Caltech

Dave Pickles

Nov 4, 1999, 3:00:00 AM
mat...@seqaxp.bio.caltech.edu wrote:

{fascinating results snipped}

>Lastly, anyone know why COPY is so incredibly slow on this system compared
>to the Linux and NT boxes? All three have nearly identical hardware in
>terms of SCSI adapters and disks. Hard to come up with an exact number
>but the VMS system appears to be roughly 10 to 15 times slower at disk to
>disk copies than is either Linux or NT.

The times measured are not necessarily the time taken for the entire task of
copying 3 MB of data from one disk to another. In the case of Linux, if the file
and the directories were in buffer memory from an earlier test, all the OS has
to do is create the appropriate directory entry *in memory*, twiddle the
attributes of the buffer holding the file and mark the pages 'dirty' - the
actual copy to disk will only happen when the memory is needed for other uses.
Under VMS you are seeing the entire time to copy the file (maybe from VIO
Cache), update directories and receive notification from the disks that all the
writes have completed successfully. If you have ACP_DATACHECK set to default
then the file headers and directories are re-read to confirm that the file
structure changes have been successful.

Dave Pickles


Mike Sullenberger

Nov 4, 1999, 3:00:00 AM

With the Multinet FTP client you can get transfer times by turning on statistics
printing (the STAT command).

>VMS5MULTI <- VMS2MULTI 1.5 (")
>VMS5MULTI -> VMS2MULTI 1.5 (") first set rms/extend=1000, then FTP
>VMS5MULTI -> VMS2MULTI 0.4 (") to nla0:
>

Did you ever wonder why Multinet <--> Multinet was fast and Microsoft --> Multinet
was slow? Could it be that Microsoft picked a small send window size and then
implemented a very poor send-packet algorithm, so that they get caught by Nagle
and delayed ACK when working with a system that has a large receive window?



>Lastly, I modified
>
>HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
>on SAF05 by adding "TcpWindowSize" with a value of 58400, as suggested in
>a previous related thread by L. Bohan, rebooted, verified that the change
>had "stuck" and found no change:
>
>VMS2MULTI <- SAF05_FAST 60.8
>VMS2MULTI -> SAF05_FAST 0.3
>

I wish people would learn how FTP works before they start trying to do performance
testing. Your change of the "TcpWindowSize" on the NT system does ABSOLUTELY
NOTHING. Changing the receive window on the sending system won't do ANYTHING at
all. You need to match the send window on the sending system with the receive
window on the receiving system. Try dropping the Multinet Receive window for FTP to
8K.

$ define/system/exec multinet_ftp_window_size 8192

Now do your tests again.

>Disk to disk copy on each system:
>
>LINUX command: cp /u1/test /u2/test
> too fast to clock (<.1 seconds)
>SAF05 Only one disk in system, so copy/paste.
> too fast to time by hand. Subjectively, instantaneous.
>VMS5MULTI command: copy dka100:[dir]test.zip dka200:[dir]
> command: copy/alloc=6705 dka100:[dir]test.zip dka200:[dir]
> 1.5 seconds with either type of COPY. I tried a variety
> of SET RMS commands and couldn't get this down to a reasonable
> time when copy was used.
>
>We already knew about the problems with window sizes and ACK, and the
>Process folks have presented a "per FTP account" workaround.

It is obvious from the tests above that you don't really understand about window
sizes, etc.

You can set the logical name system wide.

>
>New in this data is the result that the SET RMS values are important
>for the UCX FTP client. Conversely, they don't seem to do anything at all
>for the Multinet FTP client. When the RMS extend values are set high
>enough the UCX ftp client seems to be able to write the data to disk faster
>than you can do it with COPY from disk to disk! In any case, TCP/IP
>services was within spitting distance of Linux on every test, at least
>if the FTP client's RMS dependence is not weighted against it.
>
>The Multinet FTP client goes at the same rate as a disk to disk copy and is
>independent of the RMS default values. Its default is better than what the
>TCP/IP Services client does, but the latter can be tuned to go faster.
>
>I don't have a clue where the bizarre cyclical pattern in the transfer
>rates indicated by "B" above comes from. The other direction ("A") was
>the same for all attempts.
>
>I think the folks at Multinet have some work to do on their FTP server as
>it is roughly 15 times slower than the UCX or Linux servers for uploads
>from an NT system. I don't think a per user logical is an adequate
>solution in this case - better that the FTP process, or the TCP/IP stack
>itself, detect that things are timing out and automagically adjust for the
>type of client which is connected. Downloads work great though.
>

>Lastly, anyone know why COPY is so incredibly slow on this system compared
>to the Linux and NT boxes? All three have nearly identical hardware in
>terms of SCSI adapters and disks. Hard to come up with an exact number
>but the VMS system appears to be roughly 10 to 15 times slower at disk to
>disk copies than is either Linux or NT.
>

Could it be that VMS considers it a good thing to be able to guarantee that your
data is safely stored on disk, rather than just stashed in memory? Do you ever
wonder what happens when your NT system crashes right after you have finished a
transfer, but it hasn't had time to put it onto the disk? Yes, that's right: it
is gone, never to be found again.

If you want to compare TCP stack transfer rates then you should transfer your file
to NLA0: on the VMS, since that is going to be approximately the same as what NT and
LINUX do with the file (stash it in memory).

If you actually want to compare disk to disk transfer rates over FTP then you need
to force NT and LINUX to flush the file all the way to disk instead of letting them
leave it in memory.
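
As a rough sketch of what that would take on the Linux side (an illustration
only, not tested against the systems above; the paths are made up from the
copy test earlier in the thread), the receiving program would fsync() the
file, and its directory, before stopping the clock:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* sketch only: force the received file to stable storage before
   stopping the clock, so the Linux timing covers the physical write
   just as the VMS COPY timing does; file names are hypothetical */
int main(void)
{
    int fd = open("/u2/test", O_WRONLY);  /* hypothetical destination */
    if (fd < 0) { perror("open file"); return 1; }
    /* ... the FTP server would write the received data here ... */
    if (fsync(fd) < 0)                    /* flush file data + metadata */
        perror("fsync file");
    close(fd);

    int dfd = open("/u2", O_RDONLY);      /* flush the directory entry too */
    if (dfd >= 0) { fsync(dfd); close(dfd); }
    return 0;
}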

Also, the file transferred is far too small to give you good data. You need a file
that takes on the order of 100 seconds to transfer to get good data. Try a file
that is about 60 to 100 Mbytes; then any differences in startup (file creation ...)
will be a small part of the transfer time.

Mike.

+----------------------------------------------------------------------------+
| || || |
| Mike Sullenberger || || |
| m...@cisco.com |||| |||| |
| Escalation Team |||||||| |||||||| |
| Customer Advocacy ..:|||||||||||||||||||||:.. |
| C i s c o S y s t e m s |
|----------------------------------------------------------------------------|
| Any Y2K related information provided herein is covered by United States |
| Public Law 105-271, The Year 2000 Information and Readiness Disclosure Act |
+----------------------------------------------------------------------------+

John Macallister

Nov 5, 1999, 3:00:00 AM
Before comparing FTP (or network COPY) throughput rates you ought to check
what (FTP) window size is actually being used; TCPDUMP will show you.
The load on and condition of the network are also major factors. After
checking what are probably the two most important factors affecting
throughput you can worry about processor and disk power, system buffering,
etc.

However, who really cares whether a single copy operation takes 6 seconds or
12 seconds? If you're copying large datasets elapsed time will be a factor
but you're more likely to leave it in the background and do something else
while waiting rather than stare at the screen...

The transfer of files between different types of systems is more likely to
take longer as a result of problems with file attributes or data formats
than with data throughput. Like system processing speed, data throughput is
in general becoming less of an issue for mundane tasks. It's more important
to get the job done than to eke out that last 2% performance from the system
or network.

John


Name: John B. Macallister E-mail: j.macal...@physics.ox.ac.uk
Post: Nuclear and Astrophysics Laboratory, Keble Road, Oxford OX1 3RH,UK
Phone: +44-1865-273388 (direct) 273333 (reception) 273418 (Fax)

mat...@seqaxp.bio.caltech.edu

Nov 5, 1999, 3:00:00 AM
In article <9911041643...@Cisco.COM>, Mike Sullenberger <M...@cisco.com> writes:
>>Lastly, I modified
>>
>>HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
>>on SAF05 by adding "TcpWindowSize" with a value of 58400, as suggested in
>>a previous related thread by L. Bohan, rebooted, verified that the change
>>had "stuck" and found no change:
>>
>>VMS2MULTI <- SAF05_FAST 60.8
>>VMS2MULTI -> SAF05_FAST 0.3
>>
>
>I wish people would learn how FTP works before they start trying to do performance
>testing. Your change of the "TcpWindowSize" on the NT system does ABSOLUTELY
>NOTHING. Changing the receive window on the sending system won't do ANYTHING at
>all.

Well, that is what the results showed. I didn't say that I understood it,
just that I was reporting it. I had thought this parameter set the send
size as well. There is nothing in the name to indicate that it only
affects the receive window, and nothing on the web page where I saw it to
indicate that either. I tried it because somebody had posted it here,
but it didn't work.

>
> $ define/system/exec multinet_ftp_window_size 8192
>

No, I don't think so. I'll accept that the MS TCP/IP stack sucks in the
way that it works but since 90 something percent of all machines use it
it's up to the Multinet stack to deal with it gracefully in the default
configuration.

>It is obvious from the tests above that you don't really understand about window
>sizes, etc.

Why should I have to? It's up to the TCP/IP stack vendor to deal with that
level. From a strictly user level perspective my tests show that the
default configuration of Multinet 4.2a stinks at receiving uploads of a
typical file from Windows clients, being a full 15 times slower than TCP/IP
services or Linux. I did these timings in the first place because people
had been reporting that there were differences in various upload times
going to VMS, and I wanted to see for myself how it worked. The
measurements were for FTP only but I suspect that they are not atypical for
"uploads" of largish (>> window size) files for other protocols which run
over TCP/IP too.

I may not understand the details of the transport but it does seem
reasonable to me that the receiving stack might detect that it was

A: timing out on every read from a given address
B: only receiving N kbytes on each time out

and adjust the window size for that address downwards dynamically. At the
cost of a few packets worth of delay it would obviate the need for user
tuning of the window size.

>
>If you want to compare TCP stack transfer rates then you should transfer your file
>to NLA0: on the VMS, since that is going to be approximately the same as what NT and
>LINUX does with the file (stash it in memory).

That's true, and also false. My measurements are representative of the FTP
transfer times a real user would see moving a real sized file to/from the
various systems when they are lightly loaded. The user generally doesn't
know or care why something takes 4 seconds, only that it does. In any
case, the file copy time on VMS was only 1.5 seconds, and that's in the
noise for some of the measurements. Summarized another way, the data shows
(I think we can agree on this much):

1. network transfer time for the test data is .3 to .4 seconds
2. Out of the box, Linux and TCP/IP services add about 3.5 seconds of some
type of overhead for this size upload.
3. Out of the box, Multinet adds 59.5 seconds of some type of overhead for
this size upload.
4. Physical write time of the file to disk is 1.5 seconds
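
To put effective rates on those numbers, here is the arithmetic over the
figures above (the little C program is just an illustration of it):

#include <stdio.h>

int main(void)
{
    const double bytes  = 3416905.0;             /* the test file       */
    const double secs[] = { 0.35, 4.0, 60.8 };   /* items 1-3 above     */
    const char  *what[] = { "raw network time ",
                            "Linux/UCX upload ",
                            "Multinet upload  " };

    for (int i = 0; i < 3; i++)
        printf("%s %5.2f s -> %6.2f Mbit/s effective\n",
               what[i], secs[i], bytes * 8.0 / secs[i] / 1.0e6);
    return 0;
}

That works out to roughly 78, 6.8, and 0.45 Mbit/s respectively.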



>Also, the file transferred is far too small to give you good data. You need a file
>that takes on the order of 100 seconds to transfer to get good data. Try a file
>that is about 60 to 100 Mbytes; then any differences in startup (file creation ...)
>will be a small part of the transfer time.

We never move files of that size between Windows and VMS. I agree that
moving a single huge file would have given a better steady state throughput
number. But that number is irrelevant for our usage pattern, which is
bursty and consists only of smaller files.

Aaron Leonard

Nov 5, 1999, 3:00:00 AM
When last I looked (in the 3.5 / 4.0 days), MultiNet's
TCP receiver had an issue as far as acking too stingily.
Here's a note I wrote up on the subject; I dunno whether
anything's changed since then.

Cheers,

Aaron

---
TCP has a problem in the case where the sender's SNDBUF is significantly
smaller (<35%) than the receiver's RCVBUF, and where the data flow
is unidirectional. In this case, the receiver is disinclined to
send window-update-triggered ACKs, and because the receiver is not
piggybacking ACKs onto responses, the ACKs must await the firing of
the delayed ACK timer (tied to the TCP fast timer, every 200 ms.)

The result is that, when the above conditions pertain, performance
degenerates such that throughput is on the order of one SNDBUF of
data getting thru every 200ms.

I believe that the correct fix for this is a modification to the
"want to send a window update" section of tcp_output(), such
that we take the size of the sender's SNDBUF into account [1].

At present, tcp_output() (for example, when called upon after a
PRU_RCVD tcp_userreq()) will base its decision to send a window
update upon:

/*
* Compare available window to amount of window
* known to peer (as advertised window less
* next expected input). If the difference is at least two
* max size segments or at least 35% of the maximum possible
* window, then want to send a window update to peer.
*/

[2]

I propose that we send the window update when at least one of the
following THREE conditions holds:

- available window unknown to peer is >= 2 * tp->t_maxseg
- available window is >= 35% of so->so_rcv.sb_hiwat
- available window is >= 50-to-75% of tp->max_sndwnd

I believe that this will NOT result in unnecessarily large numbers
of ACKs, and I believe that it will solve commonly encountered
throughput problems.
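
[ For concreteness, here is a stand-alone C sketch of the proposed
  three-way test - an illustration with stub types, NOT the censored
  tcp_output() diff, and the 50% figure is just one point in the
  50-to-75% range above. The demo values pick a case that the two
  existing rules miss but the new rule catches. ]

#include <stdio.h>

/* stub stand-ins for the 4.4BSD fields the test consults */
struct tcpcb { long t_maxseg;      /* max segment size                */
               long max_sndwnd; }; /* largest window peer has offered */
struct sorcv { long sb_hiwat; };   /* our RCVBUF high-water mark      */

static int want_window_update(long adv, const struct tcpcb *tp,
                              const struct sorcv *rcv)
{
    if (adv >= 2 * tp->t_maxseg)          /* existing: two full segments  */
        return 1;
    if (100 * adv >= 35 * rcv->sb_hiwat)  /* existing: 35% of our RCVBUF  */
        return 1;
    if (tp->max_sndwnd > 0 &&             /* proposed: ~50% of the peer's */
        100 * adv >= 50 * tp->max_sndwnd) /* inferred SNDBUF (note [1])   */
        return 1;
    return 0;
}

int main(void)
{
    /* 32KB RCVBUF receiver, 4KB-SNDBUF peer; the app has just drained
       2KB, so 2KB of window could be re-advertised */
    struct tcpcb tp  = { 1448, 4096 };
    struct sorcv rcv = { 32768 };

    /* rules 1 and 2 fail (2048 < 2896, 2048 < ~11469): without rule 3
       this ACK waits on the 200 ms delayed-ACK timer */
    printf("send window update now? %d\n",
           want_window_update(2048, &tp, &rcv));
    return 0;
}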

[ Will this really work OK? If the receiver has a 32KB read
posted, and 4KB shows up, the data will get passed up to the
app with no delay, right? Which should trigger the tcp_output()
check for windowsize update, no? ]

[ Another approach would be to modify the "fast path" thru
tcp_input(), ("this is a pure, in-sequence data packet"),
where rather than always setting the TF_DELACK flag, we
would sometimes set TF_ACKNOW and then immediately call
tcp_output(). However, I think it would be harder to
keep this approach from generating too many ACKs. ]

To give a good example of typical behavior that demonstrates the
problem, consider an FTP sender sending data to our FTP implementation.
By default, our FTP uses a SNDBUF and RCVBUF of 32KB.
If the sender has a SNDBUF of < .35 * 32KB, then our TCP will
delay ACKs till the 200ms fast timer fires. This produces results
as follows:

Sender SNDBUF Throughput (Kbps) Comments
------------- ----------------- -------------------------------
32768 2074 No DELACK. Most ACKs every
11584 bytes (8*1148 segs),
about every 30-50 ms [3]
16384 2042 Identical to 32768 SNDBUF
12000 1400 Just a few DELACKs. Mostly
open window ACKs.
10000 472 Like 8192 SNDBUF
8192 450 Most ACKs delayed, some
triggered by fully open window
6144 330 like 8192 SNDBUF
4096 170 [4]
2048 65 Almost all ACKs delayed; open
window ACKs about 1/sec
1024 22 Really degenerate throughput

As you can see, the throughput collapses when the sender SNDBUF
goes below the .35 * RCVBUF mark. It approaches SNDBUF*5 Bps
- e.g. at SNDBUF=4096, we can expect 4096*8*5 bps throughput,
which is indeed about what we observe.
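
[ To make the SNDBUF*5 arithmetic explicit, a small stand-alone C model
  (an illustration, not kernel code): below the ~35% mark each SNDBUF of
  data waits on the 200 ms delayed-ACK timer, i.e. five send-buffer
  flights per second. It gives only the asymptote; the measured numbers
  above are somewhat higher because some ACKs still go out early when
  the window fully drains. ]

#include <stdio.h>

int main(void)
{
    const long rcvbuf = 32768;   /* receiver's RCVBUF, as in the tests above */
    const long sndbufs[] = { 32768, 16384, 12000, 10000, 8192,
                             6144, 4096, 2048, 1024 };

    for (int i = 0; i < (int)(sizeof sndbufs / sizeof sndbufs[0]); i++) {
        long s = sndbufs[i];
        if (100 * s >= 35 * rcvbuf)
            printf("SNDBUF %5ld: above ~35%% of RCVBUF, no collapse\n", s);
        else
            /* one SNDBUF per 200 ms => 5 per second, times 8 bits/byte */
            printf("SNDBUF %5ld: ~%ld Kbps asymptote\n", s, s * 5 * 8 / 1000);
    }
    return 0;
}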

A conceivable workaround would be just to shrink the RCVBUF
size in anticipation of small-SNDBUF senders. For example,
$ DEFINE /SYSTEM MULTINET_FTP_WINDOW_SIZE 4096. This would
result in our FTP receiver performing better across a range
of sender SNDBUF sizes (assuming low-latency connections.)
However, this means that now WE become a small-SNDBUF sender
when talking to a default-RCVBUF MultiNet system, and also that
we lose when going over pipes with high bandwidth*delay product.

---

Notes.

[1]. Ideally, the TCP receiver would know how big its peer's SNDBUF
is, in order to tell when the peer has sent all the data it can, and
therefore deserves a prompt ACK. Unfortunately, this information is
not directly communicated. Therefore I propose to employ the heuristic
that RCV.MAXWND, the "Largest Window Peer has offered", is better than
nothing - i.e. the size of the peer's RCVBUF is probably the best
guess we can get at its SNDBUF.

[2]. The relevant code from tcp_output():

[ censored ]

[3]. TCPDUMP of 32KB sender SNDBUF, 32KB receiver RCVBUF

17:51:40.62 161.44.192.51.20 > 161.44.192.53.3348: S 0:0(0) win 16384 <mss 1448,nop,wscale 0,nop,nop,timestamp 458752:0> (DF)
17:51:40.63 161.44.192.53.3348 > 161.44.192.51.20: S 0:0(0) ack 1 win 32768 <mss 1460,nop,wscale 0,nop,nop,timestamp 720896:0> (DF)
17:51:40.63 161.44.192.51.20 > 161.44.192.53.3348: . ack 1 win 16384
17:51:40.67 161.44.192.53.3348 > 161.44.192.51.20: P 1:49(48) ack 1 win 32768
17:51:40.73 161.44.192.51.20 > 161.44.192.53.3348: . ack 49 win 32720

Slow start setup. Now sender can start slamming data.

17:51:40.77 161.44.192.53.3348 > 161.44.192.51.20: . 49:1497(1448) ack 1 win 32768
17:51:40.77 161.44.192.53.3348 > 161.44.192.51.20: . 1497:2945(1448) ack 1 win 32768
17:51:40.77 161.44.192.53.3348 > 161.44.192.51.20: . 2945:4393(1448) ack 1 win 32768
17:51:40.77 161.44.192.53.3348 > 161.44.192.51.20: . 4393:5841(1448) ack 1 win 32768
17:51:40.77 161.44.192.53.3348 > 161.44.192.51.20: . 5841:7289(1448) ack 1 win 32768
17:51:40.77 161.44.192.53.3348 > 161.44.192.51.20: . 7289:8737(1448) ack 1 win 32768
17:51:40.78 161.44.192.53.3348 > 161.44.192.51.20: . 8737:10185(1448) ack 1 win 32768
17:51:40.78 161.44.192.53.3348 > 161.44.192.51.20: . 10185:11633(1448) ack 1 win 32768
17:51:40.78 161.44.192.51.20 > 161.44.192.53.3348: . ack 11633 win 32768

In the midst of the data onslaught, we see the receiver send
his window update ACK. This makes sure that the data keeps
flowing without interruption.

17:51:40.78 161.44.192.53.3348 > 161.44.192.51.20: . 11633:13081(1448) ack 1 win 32768
17:51:40.78 161.44.192.53.3348 > 161.44.192.51.20: . 13081:14529(1448) ack 1 win 32768
17:51:40.78 161.44.192.53.3348 > 161.44.192.51.20: . 14529:15977(1448) ack 1 win 32768
17:51:40.78 161.44.192.53.3348 > 161.44.192.51.20: P 15977:16433(456) ack 1 win 32768
17:51:40.81 161.44.192.53.3348 > 161.44.192.51.20: . 16433:17881(1448) ack 1 win 32768
17:51:40.81 161.44.192.53.3348 > 161.44.192.51.20: . 17881:19329(1448) ack 1 win 32768
17:51:40.81 161.44.192.53.3348 > 161.44.192.51.20: . 19329:20777(1448) ack 1 win 32768
17:51:40.81 161.44.192.53.3348 > 161.44.192.51.20: . 20777:22225(1448) ack 1 win 32768
17:51:40.81 161.44.192.53.3348 > 161.44.192.51.20: . 22225:23673(1448) ack 1 win 32768
17:51:40.81 161.44.192.51.20 > 161.44.192.53.3348: . ack 23673 win 32768

Another window update ACK, without delay.

[4]. TCPDUMP of 4KB sender SNDBUF, 32KB receiver RCVBUF

17:41:15.45 161.44.192.51.20 > 161.44.192.53.3347: S 0:0(0) win 16384 <mss 1448,nop,wscale 0,nop,nop,timestamp 458752:0> (DF)
17:41:15.46 161.44.192.53.3347 > 161.44.192.51.20: S 0:0(0) ack 1 win 4096 <mss 1460,nop,wscale 0,nop,nop,timestamp 720896:0> (DF)

Receiver has 16KB RCVBUF. (This later jumps to 32KB due to an
application programming bug in our FTP implementation). Sender
has 4KB RCVBUF (also 4KB SNDBUF, which we can't see, but can infer
from the RCVBUF.)

17:41:15.46 161.44.192.51.20 > 161.44.192.53.3347: . ack 1 win 16384
17:41:15.49 161.44.192.53.3347 > 161.44.192.51.20: P 1:49(48) ack 1 win 4096
17:41:15.51 161.44.192.51.20 > 161.44.192.53.3347: . ack 49 win 32720

Slow start. Note that the DELACK costs us 0.2 sec here.

17:41:15.57 161.44.192.53.3347 > 161.44.192.51.20: . 49:1497(1448) ack 1 win 4096
17:41:15.57 161.44.192.53.3347 > 161.44.192.51.20: . 1497:2945(1448) ack 1 win 4096
17:41:15.57 161.44.192.53.3347 > 161.44.192.51.20: P 2945:4145(1200) ack 1 win 4096

Sender sends full 4KB window. With my proposal, the window update
ACK would be sent after the second or third segment is received.

17:41:15.71 161.44.192.51.20 > 161.44.192.53.3347: . ack 4145 win 28624

Instead, we wait on the fast timer (.14 sec) before sending our delayed ACK.

17:41:15.71 161.44.192.53.3347 > 161.44.192.51.20: . 4145:5593(1448) ack 1 win 4096
17:41:15.71 161.44.192.53.3347 > 161.44.192.51.20: . 5593:7041(1448) ack 1 win 4096
17:41:15.71 161.44.192.53.3347 > 161.44.192.51.20: P 7041:8241(1200) ack 1 win 4096

Another 4KB send window.

17:41:15.91 161.44.192.51.20 > 161.44.192.53.3347: . ack 8241 win 24528

Another delayed ACK.

17:41:15.91 161.44.192.53.3347 > 161.44.192.51.20: . 8241:9689(1448) ack 1 win 4096
17:41:15.91 161.44.192.53.3347 > 161.44.192.51.20: . 9689:11137(1448) ack 1 win 4096
17:41:15.91 161.44.192.51.20 > 161.44.192.53.3347: . ack 11137 win 32768

Sometimes the receiver will issue a non-delayed ACK when its
receive window is fully drained. I'm not quite sure when this
happens, but if not for this effect, we would truly see the
SNDBUF*5 BPS throughput.

17:41:15.91 161.44.192.53.3347 > 161.44.192.51.20: P 11137:12337(1200) ack 1 win 4096
17:41:15.92 161.44.192.53.3347 > 161.44.192.51.20: . 12337:13785(1448) ack 1 win 4096
17:41:15.92 161.44.192.53.3347 > 161.44.192.51.20: P 13785:15233(1448) ack 1 win 4096
17:41:16.11 161.44.192.51.20 > 161.44.192.53.3347: . ack 15233 win 28672

Back to delayed ACK mode.

Bug History : *** NOTES 10/15/96 17:12:29 aaron
Supposedly Ken A. fixed this problem in the V4.1 kernel.
I'll ask for a V4.0 build of this kernel.


*** NOTES 10/31/96 08:07:27 aaron
New ECO KERNEL-UPDATE-040 includes the Ken A. fix. This fix
supposedly addresses the ack starvation problem seen by [ censored ]
in testing large windowsize FTPs between systems over 100baseT.
However, it does NOT address the starvation problem when the
sender's send window is significantly smaller than the receiver's
receive window.

Mike Sullenberger

Nov 5, 1999, 3:00:00 AM
>>>Lastly, I modified
>>>
>>>HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
>>>on SAF05 by adding "TcpWindowSize" with a value of 58400, as suggested in
>>>a previous related thread by L. Bohan, rebooted, verified that the change
>>>had "stuck" and found no change:
>>>
>>>VMS2MULTI <- SAF05_FAST 60.8
>>>VMS2MULTI -> SAF05_FAST 0.3
>>>
>>
>>I wish people would learn how FTP works before they start trying to do
>>performance testing. Your change of the "TcpWindowSize" on the NT system does
>>ABSOLUTELY NOTHING. Changing the receive window on the sending system won't
>>do ANYTHING at all.
>
>Well, that is what the results showed. I didn't say that I understood it,
>just that I was reporting it. I had thought this parameter set the send
>size as well. There is nothing in the name to indicate that it only
>affects the receive window, and nothing on the web page where I saw it to
>indicate that either. I tried it because somebody had posted it here,
>but it didn't work.
>

That is Microsoft for you; I don't think that they really understand TCP very
well either.

>>
>> $ define/system/exec multinet_ftp_window_size 8192
>>
>
>No, I don't think so.

Excuse me, I give you a workaround for Microsoft's stupidity, and you won't
use it.

>I'll accept that the MS TCP/IP stack sucks in the
>way that it works but since 90 something percent of all machines use it
>it's up to the Multinet stack to deal with it gracefully in the default
>configuration.

I disagree; just because Microsoft did it wrong doesn't mean others have to
follow and clean up after their crap. Microsoft could easily correct the
problem themselves on those 90% of systems.

>>It is obvious from the tests above that you don't really understand about
>>window sizes, etc.
>
>Why should I have to? It's up to the TCP/IP stack vendor to deal with that
>level. From a strictly user level perspective my tests show that the
>default configuration of Multinet 4.2a stinks at receiving uploads of a
>typical file from Windows clients, being a full 15 times slower than TCP/IP
>services or Linux. I did these timings in the first place because people
>had been reporting that there were differences in various upload times
>going to VMS, and I wanted to see for myself how it worked. The
>measurements were for FTP only but I suspect that they are not atypical for
>"uploads" of largish (>> window size) files for other protocols which run
>over TCP/IP too.

This is the part where we disagree: if you just said that it took longer
uploading to MultiNet from NT in this scenario, then I would agree with you. BUT
you keep placing blame on the MultiNet system when the blame properly lies with
the NT system.

If you are going to make claims and lay blame about performance, or anything for
that matter, then you need to understand what you are talking about.

>I may not understand the details of the transport but it does seem
>reasonable to me that the receiving stack might detect that it was
>
>A: timing out on every read from a given address
>B: only receiving N kbytes on each time out
>
>and adjust the window size for that address downwards dynamically. At the
>cost of a few packets worth of delay it would obviate the need for user
>tuning of the window size.

Sorry, you cannot do that; it is illegal to reduce the size of a receive window
after the connection is initiated. Also, there are a bunch of other things that
could be happening in the network that are transitory and could look similar
but are not this issue. You don't want to change your window size every time
there is some network glitch; that could destroy your performance even after the
network glitch corrects itself.

>>
>>If you want to compare TCP stack transfer rates then you should transfer your
>>file to NLA0: on the VMS, since that is going to be approximately the same as
>>what NT and LINUX do with the file (stash it in memory).
>>
>That's true, and also false. My measurements are representative of the FTP
>transfer times a real user would see moving a real sized file to/from the
>various systems when they are lightly loaded. The user generally doesn't
>know or care why something takes 4 seconds, only that it does. In any
>case, the file copy time on VMS was only 1.5 seconds, and that's in the
>noise for some of the measurements. Summarized another way, the data shows
>(I think we can agree on this much):
>
>1. network transfer time for the test data is .3 to .4 seconds
>2. Out of the box, Linux and TCP/IP services add about 3.5 seconds of some
> type of overhead for this size upload.

>3. Out of the box, Multinet adds 59.5 seconds of some type of overhead for
> this size upload.

This is where I disagree: Multinet may be adding about .1 sec, and VMS is adding
about 3.5 seconds of disk-file overhead. Multinet is patiently waiting for the
NT to send data; it is just that the NT is refusing to send more data in a timely
manner.

>4. Physical write time of the file to disk is 1.5 seconds

>>Also, the file transferred is far too small to give you good data. You need a
>>file that takes on the order of 100 seconds to transfer to get good data.
>>Try a file that is about 60 to 100 Mbytes; then any differences in startup
>>(file creation ...) will be a small part of the transfer time.
>
>We never move files of that size between Windows and VMS. I agree that
>moving a single huge file would have given a better steady state throughput
>number. But that number is irrelevant for our usage pattern, which is
>bursty and consists only of smaller files.

Then have your VMS system administrator set the one Multinet logical name, so
that the Multinet system is tuned for your usage pattern (files being
transferred from Microsoft systems).

I am sure that you noticed that the Linux system didn't have any trouble
transferring files to the Multinet system. The Linux system doesn't use a large
send window either; it is just that they did their send-buffer handling
properly, so that it will send multiple send buffers' worth of data as
full-size packets.

Mike Sullenberger

Nov 5, 1999, 3:00:00 AM
>When last I looked (in the 3.5 / 4.0 days), MultiNet's
>TCP receiver had an issue as far as acking too stingily.
>Here's a note I wrote up on the subject; I dunno whether
>anything's changed since then.

I guess that I disagree with this; I feel that in a properly tuned and
well-functioning network, you only need to see one ACK per received buffer's
worth of data (okay, maybe 2 ACKs). The sending application has more data to
send; it is just that the sending TCP layer isn't coded to double-buffer the
send buffer so that the application can be writing data to the buffer at the
same time that TCP is taking data from the buffer and sending it on the wire.

If this were programmed correctly then the sending TCP could always have a
full-size segment to send and wouldn't need to stop until either it filled the
receive window or emptied the send buffer (application slower than TCP
sending). If you can't do this then use a larger send buffer; there is NO
downside to using a larger send buffer. Maybe 10 years ago when memory was
tighter it made a difference between using a 4KB vs 16KB send buffer, but now
I need 64MB on my PC to make it run anyway, so the difference is minuscule.
Also, the send buffer size can be controlled by the application. So those
applications that do bulk data transfer should increase their send buffer size
and those that don't should leave their send buffer size at the smaller size.

Eventually send and receive windows are going to need to be increased anyway.

If you connect two systems together on a 100 Mbit/sec Ethernet and say we have
a 1 millisecond RTT, then you would need a >12 Kbyte window to get the full
data rate (12.5 Mbytes/sec); with a 4KB window the transfer would top out at
4 Mbytes/sec.

If you are going over a satellite link at 1 Mbit/sec with a .5 sec RTT then a
4 Kbyte window would give you a top transfer rate of 8 Kbytes/sec.

Even going across the country at T1 rates (1.5 Mbit/sec) with a delay of 50
milliseconds, a 4 Kbyte window gives a top transfer rate of 80 Kbytes/sec,
which is about 2/5 of the T1.

I would hate to transfer a movie (~80 Mbytes) over the Internet and be limited
to 80 Kbytes/sec just because of a small send window size.
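
To make the window/RTT arithmetic reproducible, here is a small C
calculator (an illustration using the same three link figures as above;
max rate = window / RTT, and full rate needs window >= bandwidth * RTT,
the bandwidth-delay product):

#include <stdio.h>

struct link { const char *name; double mbps; double rtt; };

int main(void)
{
    const double window = 4096.0;  /* the 4KB send window discussed above */
    const struct link links[] = {
        { "100BaseT LAN, 1 ms RTT ", 100.0, 0.001 },
        { "satellite, 0.5 s RTT   ",   1.0, 0.5   },
        { "cross-country T1, 50 ms",   1.5, 0.05  },
    };

    for (int i = 0; i < 3; i++) {
        double cap = window / links[i].rtt / 1000.0;             /* KB/s */
        double bdp = links[i].mbps * 1e6 / 8.0 * links[i].rtt / 1000.0;
        printf("%s limited to %7.1f KB/s; full rate needs a %5.1f KB window\n",
               links[i].name, cap, bdp);
    }
    return 0;
}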

So small window sizes are not a good thing in general.

HOEF...@dcmir.med.umn.edu

Nov 5, 1999, 3:00:00 AM
>>I'll accept that the MS TCP/IP stack sucks in the
>>way that it works but since 90 something percent of all machines use it
>>it's up to the Multinet stack to deal with it gracefully in the default
>>configuration.

>I disagree, just because the Microsoft did it wrong doesn't mean others have to
>follow and clean up after their crap. Microsoft could easily correct the
>problem themselves on those 90% of systems.

Strangely enough, I've heard of a vendor claiming to be ***Microsoft
Compliant***. Really scary!

Ed Hoeffner
1-271 BSBE
312 Church St. SE
Mpls, MN 55455
hoef...@dcmir.med.umn.edu
612-625-2115
612-625-2163 fax

John Macallister

Nov 6, 1999, 3:00:00 AM
On our network, on a quick check without accounting for network or system
loads, I see Multinet-Multinet rates of 5 Mbits/sec and NT->Multinet rates of
3.5 Mbits/sec. I did notice that the NT system had negotiated use of 8 Kbyte
windows whereas the Multinet-Multinet systems agreed to use 32 Kbyte
windows.

In the past I have seen huge discrepancies, sometimes in one direction only,
between pairs of systems, sometimes of the same type, sometimes different.
The explanation was usually to be found in the FTP window size used, system
load, network errors, or "misbehaving" network equipment, assuming the systems
are reasonably configured.

Larry D Bohan, Jr

Nov 6, 1999, 3:00:00 AM
On Thu, 4 Nov 1999 16:43:23 -0800, Mike Sullenberger <M...@cisco.com>
wrote:

>I wish people would learn how FTP works before they start trying
>to do performance testing. Your change of the "TcpWindowSize" on
>the NT system does ABSOLUTELY NOTHING. Changing the receive
>window on the sending system won't do ANYTHING at all. You need
>to match the send window on the sending system with the receive
>window on the receiving system. Try dropping the Multinet
>Receive window for FTP to 8K.

in all fairness, you might be jumping on David Mathog's case a bit
quickly.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters

TcpWindowSize:REG_DWORD:0xe420

tweaking this, i find, does affect the *SEND* window on the NT side
(0xE420 = 58,400 decimal).

$ mu tcpdump /prom/verb/hex/snap:128 host X.X.X.X and port 20
...[snip]...
07:06:01.99 X.X.X.X.1056 > Y.Y.Y.Y.ftp-data:
P 132949:134409(1460) ack 1 win 58400 (DF) (ttl 128, id 36746)

this would mostly be of use for folks that didn't want to (can't?)
tweak MULTINET_FTP_WINDOW_SIZE lower (e.g. to 8192) on the VMS side
(whether via process, group, or system logical). t'was really
the only reason for suggesting its use.

on TCPIP V5.0a, one can do a TCPIP> set protocol TCP /NODELAY_ACK
apparently on the fly.

Does Multinet have a similar tweak (say, some, MU SET/KERNEL blah) ?

tweaking the VMS IP stacks to accommodate NT makes me gag;
i'll be the first to admit ...

on a final note, TCPIP v5.0a seems to handle NT xfers well,
at least via the bog-standard DOS-box FTP.

It makes me wonder if they (DEC/CPQ) had implemented the sort
of things suggested by Aaron Leonard in an earlier thread.

I found that most NT ftp clients vary wrt. their window usages,
and it's often worthwhile to play with:
$ tcpip set prot /quota=(send:nnnn,rec:nnnn) /[no]delay_ack

C:\>ftp
ftp> open X.X.X.X
Connected to X.X.X.X.
220 abc.com FTP Server (Version 5.0) Ready.
User (X.X.X.X:(none)): JOEBLOGGS
331 Username JOEBLOGGS requires a Password
Password:
230 User logged in.
ftp> cd VDA3:[000000]
250-CWD command successful.
250 New default directory is VDA3:[000000]
ftp> binary
200 TYPE set to IMAGE.
ftp> put TMP.BIN
200 PORT command successful.
150 Opening data connection for VDA3:[000000]tmp.bin;
(y.y.y.y,1541)
226 Transfer complete.
31233936 bytes sent in 27.34 seconds (1142.43 Kbytes/sec)
ftp> bye
221 Goodbye.

