We modified the rsync code to set the socket send and receive buffers to 512k
and can verify this with pfiles on the rsync process. While it's running,
`netstat -nP tcp' on the sending side shows the send window varying
between about 49k and 61k, and only between 24k and 64k on the
receiving host.
Although probably not needed, system-wide TCP parameters on both the
sender and receiver have also been modified: tcp_recv_hiwat and
tcp_xmit_hiwat are 524288. tcp_conn_hash_size on the sender is also set to
8192 in case it's relevant. tcp_max_buf is left at the 1M default. MTS is
1500 according to ifconfig.
The sender happens to be Solaris 8 and the receiver Solaris 10, but I can find
other systems to test.
Yong Huang
Thanks. We use Gigabit-Ethernet. Interface name is ge0 and /etc/
hostname.ge0 exists. There's no /kernel/drv/ge.conf.
Some books seem to say that increasing the TCP buffer size beyond the MTU
(I said MTS earlier; I meant MTU) is useless. Others discuss buffer size
without reference to the Ethernet MTU. I think I trust the latter more.
Our goal is to transfer a few very large files as fast as possible.
Yong Huang
It is safe to increase the MTU for Gigabit. Here is some good reading:
* http://sd.wareonearth.com/~phil/jumbo.html
* http://www.aarnet.edu.au/engineering/networkdesign/mtu/size.html
- Russell
TCP buffer size is unrelated (mostly) to ethernet MTU.
-frank
At any one time, TCP will have no more bytes outstanding on the
connection than the minimum of the receiver's advertised window, the
sender's calculated congestion window, and the sender's "SO_SNDBUF"
(socket send buffer) size. (The last one is there because TCP needs
to keep a reference to the data it has sent until it is ACKed by the
remote). Notice that there are components from _both_ ends of the
connection there - it is not sufficient to simply increase the TCP
receive window on the receiver without increasing the "SO_SNDBUF" on
the sender.
The window differs from the TCP MSS (Maximum Segment Size), which
controls the size of the segments (i.e. packets) TCP sends to get that
data outstanding on the connection. The TCP MSS is
calculated/exchanged based on what the first-hop IP MTU happens to be,
any Path MTU discovery information (and whether PMTU discovery is
enabled), and what the remote sent for its MSS.
rick jones
--
denial, anger, bargaining, depression, acceptance, rebirth...
where do you want to be today?
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
Thank you and all others. We hardcoded SO_SNDBUF and SO_RCVBUF to 512k
inside the rsync code, which is used on both the sender and receiver hosts.
Since we see rather small TCP windows (49k to 61k in `netstat -nP
tcp' on the sender, and 24k to 64k on the receiver), something must be
cutting them down. Latency is not an issue: on the sender, `ping -s
receiver' shows 0 or 1 ms. Any advice on what we can do? Thanks.
Yong Huang
An addition: even though ping is fast, traceroute shows several hops in
between. So I found two other hosts on the same network for another
test. They're both Solaris 8. This time, one host shows about a 512k TCP
window (525624 or 525616). That is, if I run `rsync file B:/tmp' on A,
netstat on A shows Swind 512k and netstat on B shows Rwind 512k.
(There's another socket that must be used by rsync for other
purposes.) The transfer speed is up to 14MB/sec now.
bash-2.03$ while true; do
> netstat -nP tcp | grep 10.4.143.77; sleep 5
> done
10.4.143.118.514     10.4.143.77.549     525616      0  65160      1024 ESTABLISHED
10.4.143.118.622     10.4.143.77.548      64240      0  24820         0 ESTABLISHED
Perhaps we need to check router send and receive pipe size for the
first test to go well.
Yong Huang
I trust that means you make calls to setsockopt() (OK, I _am_ paranoid :)
> Since we see rather small TCP windows (49k to 61k seen in `netstat -nP
> tcp' on sender, and 24k to 64k on receiver),
Is that showing a window size, or actually showing what is queued to
the socket?
Are there ndd settings controlling whether or not the systems will use
window scaling?
Do you see the same things running a basic netperf TCP_STREAM test?
rick jones
--
The computing industry isn't as much a game of "Follow The Leader" as
it is one of "Ring Around the Rosy" or perhaps "Duck Duck Goose."
- Rick Jones
Yes.
> > Since we see rather small TCP windows (49k to 61k seen in `netstat -nP
> > tcp' on sender, and 24k to 64k on receiver),
>
> Is that showing a window size, or actually showing what is queued to
> the socket?
Swind and Rwind, not Send-Q, Recv-Q, in netstat output.
> Are there ndd settings controlling whether or not the systems will use
> window scaling?
Doesn't look like it. We may be onto something here. Which parameter
controls that? My reference is http://www.sean.de/Solaris/soltune.html
in addition to docs on sun.com.
> Do you see the same things running a basic netperf TCP_STREAM test?
Will research. I guess that benchmark eliminates some irrelevant
factors. Like what?
Thanks very much.
Yong Huang
> Will research. I guess that benchmark eliminates some irrelevant
> factors. Like what?
Well, it makes no use of the filesystem; that was what I was thinking
about when I asked about the queue versus the window.
Also, netperf makes a getsockopt() call after the setsockopt() to
catch situations where setsockopt() has silently truncated the
setting or, as often happens on Linux, made it even larger.
Finally, being the netperf contributing editor I am compelled to
promote it whenever I can :)
rick jones
Another question is whether the window shown by netstat is the
remote's receive window or the locally calculated congestion window.
If the latter, then one might wonder about congestion window clamps
and/or the effect of retransmitted data.
--
web2.0 n, the dot.com reunion tour...