
TCP Window Scaling not happening -sometimes-


Dan Stromberg

Dec 19, 2005, 6:36:17 PM

Hi folks.

I'm trying to get an AIX 5.1 ML 4 box talking fast over a gigabit network
to a variety of Linux hosts. The Linux hosts run Fedora, Red Hat
Enterprise Linux and CentOS.

I'm finding that only -some- of the time do connections between this AIX
box and the various Linux boxes get TCP Window Scaling (RFC 1323) enabled.
Other times they do not, and there isn't a clear pattern in the
differences yet, other than that they have always involved the AIX box so far.

This happens despite ostensibly having RFC 1323 and SACK enabled on both
ends of the connection.

I have a more complete writeup of what we're seeing at:

http://dcs.nac.uci.edu/~strombrg/optiputer-discussion/

The link about TCP windows on the above page is probably the most relevant.

I'm guessing this might be an interoperability problem that Linux somehow
fixed or worked around, since Fedora 4 doesn't have the problem, and RH 4
didn't seem to either, but RH 3 and CentOS do.

Or is it possible that some router or switch between the two machines is
sometimes messing up the RFC 1323 options?

Any suggestions?

Thanks!

PS: TCP Window Scaling allows your TCP windows to scale past 64K, which was
the max in earlier TCP due to a 16 bit window field. A TCP window, in turn,
limits how much unacknowledged data can be in flight; the sender buffers
that data in case it needs to be retransmitted because it was not
acknowledged soon enough, or ever.
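
As a rough illustration of the arithmetic in the PS, here is a minimal Python sketch. The function name is made up for this example; the only facts it relies on are that the window field is 16 bits and that RFC 1323 limits the shift count to 14.

```python
MAX_SHIFT = 14  # RFC 1323 caps the window scale shift count at 14


def effective_window(raw_window: int, shift: int) -> int:
    """Return the window in bytes implied by a 16-bit window field and a shift.

    effective_window is a hypothetical helper name, not part of any library.
    """
    if not 0 <= raw_window <= 0xFFFF:
        raise ValueError("the TCP window field is only 16 bits wide")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    # The advertised value is simply left-shifted by the negotiated count.
    return raw_window << shift


# Largest window each shift count can express.
for shift in (0, 1, 14):
    print(shift, effective_window(0xFFFF, shift))
```

With a shift of 0 the ceiling is the classic 65535 bytes; every extra bit of shift doubles it.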

Allen McIntosh

Dec 19, 2005, 11:10:41 PM
> Or is it possible that some router or switch between the two machines is
> messing up the RFC1323 Sometimes?
> Any suggestions?

I looked at the web page, and I couldn't tell easily if
1) window scaling is advertised, but it's zero
or
2) window scaling is not being advertised.

You need to look at the relevant SYN or SYN/ACK packets with something
like ethereal in order to see exactly what is going on.

If you are seeing (1), it's because the default buffer allocations
aren't big enough. I've been able to get 2.4.21-20.0.1.EL (CentOS 3.3)
to do window scaling, but only after some tuning. IIRC if you google
the relevant /etc/sysctl.conf entries you can find lots of advice, some
of which is useful :-)

If you are seeing (2) with 2.4.21-37.EL, then I'm baffled.
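
To make the difference between cases (1) and (2) concrete, here is a small Python sketch that walks the options of a raw TCP header and reports the window scale shift. It returns None when the option is absent (case 2) and 0 when it is present but zero (case 1). The function names, ports and window value are invented for the illustration; in practice ethereal shows the same information in the SYN and SYN/ACK decode.

```python
import struct
from typing import Optional


def window_scale_from_tcp_header(tcp: bytes) -> Optional[int]:
    """Return the window scale shift in a TCP header, or None if not advertised."""
    if len(tcp) < 20:
        raise ValueError("truncated TCP header")
    header_len = (tcp[12] >> 4) * 4  # data offset is in 32-bit words
    if header_len < 20 or header_len > len(tcp):
        raise ValueError("bad data offset")
    options = tcp[20:header_len]
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:  # end of option list
            break
        if kind == 1:  # no-operation padding, a single byte
            i += 1
            continue
        if i + 1 >= len(options):
            break
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break  # malformed option, stop parsing
        if kind == 3 and length == 3:  # window scale option
            return options[i + 2]
        i += length
    return None


def build_syn(options: bytes) -> bytes:
    """Build a fake SYN header for testing (made-up ports and window)."""
    padded = options + b"\x00" * (-len(options) % 4)
    offset_words = (20 + len(padded)) // 4
    header = struct.pack("!HHIIBBHHH", 40000, 80, 0, 0,
                         offset_words << 4, 0x02, 5840, 0, 0)
    return header + padded


mss = bytes([2, 4, 0x05, 0xB4])  # MSS option, 1460 bytes
print(window_scale_from_tcp_header(build_syn(mss + bytes([1, 3, 3, 0]))))
print(window_scale_from_tcp_header(build_syn(mss)))
```

The first call models a SYN that advertises scaling with a zero shift, the second a SYN that does not advertise it at all.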

Dan Stromberg

Dec 20, 2005, 1:31:51 PM
On Mon, 19 Dec 2005 23:10:41 -0500, Allen McIntosh wrote:

>
> I looked at the web page, and I couldn't tell easily if
> 1) window scaling is advertised, but it's zero
> or
> 2) window scaling is not being advertised.
>
> You need to look at the relevant SYN or SYN/ACK packets with something
> like ethereal in order to see exactly what is going on.
>
> If you are seeing (1), it's because the default buffer allocations
> aren't big enough. I've been able to get 2.4.21-20.0.1.EL (CentOS 3.3)
> to do window scaling, but only after some tuning. IIRC if you google
> the relevant /etc/sysctl.conf entries you can find lots of advice, some
> of which is useful :-)
>
> If you are seeing (2) with 2.4.21-37.EL, then I'm baffled.

I checked the communication between the AIX box and an RHEL 3 system that
tcptrace was saying had TCP window problems, inspecting the SYN and
SYN/ACK as suggested, using the same tcpdump output file as what I fed to
tcptrace - this time in ethereal.

In both cases, it appears that TCP Window Scaling is enabled, with one
host wanting 2**1, and the other wanting 2**5. Both also had SACK
enabled, though as far as I know SACK is a separate option that is
negotiated independently of TCP Window Scaling.

The Y's and N's in the table (at the previously-posted URL) are what
tcptrace is outputting. I do not know if it is getting those
(fallaciously?) from the SYN and SYN/ACK, or by watching the
characteristics of the connection, as I hear is done for the congestion
window size.

I don't suppose you have a good google keyword at your fingertips? :)

Thanks for the reply!

prg

Dec 20, 2005, 2:36:14 PM

Well, not knowing the setup -- TCP variables set -- or the hardware
encountered along the path makes even guessing pretty much a pointless
pastime.

Seems:
-- the netstat stats are for an extended period (longer than the
tcpdump) and what stats belong to what traffic is, well, difficult to
discern :-)
-- tcptrace seems not to agree with netstat stats re: windowscaling,
etc.
-- your email chart does not agree with tcptrace output -- presume
chart data was collected from other traces

Loading the tcpdump file into ethereal is a _very_ good way to watch
the connection setup (where window scaling is negotiated), and
ethereal's filtering system is _very_ good for picking out
particular packets (eg. watching for tcp options, cwnd, ECN, etc.)

I have not had a chance to get RHEL/CentOS on my system yet, but did
reconfirm that some of this code in the stack has been changing. Thus
different kernel versions _may_ behave differently. BTW, CentOS makes
no changes to the functional RHEL source code: just trademark
cleaning.

As Allen McIntosh said, you will need to at least _look_ at the tcp
variables set in the /proc fs. If you can't get to the box, ask
someone to send you the info. Without it you're pretty much in the
dark. You may have to tweak the values (or have them changed) to
achieve acceptable GigE performance, though it's much better now than
several years ago.
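
As a rough guide to how large the buffers need to be, the usual rule of thumb is the bandwidth-delay product. The Python sketch below is only an illustration: the round-trip times are made-up examples (none were measured in this thread) and the function names are hypothetical.

```python
def required_window_bytes(bandwidth_bps: int, rtt_us: int) -> int:
    """Bandwidth-delay product in bytes, rounded up.

    bandwidth_bps is in bits per second, rtt_us in microseconds.
    Integer arithmetic avoids floating point rounding surprises.
    """
    bits_in_flight_x1e6 = bandwidth_bps * rtt_us
    return -(-bits_in_flight_x1e6 // 8_000_000)  # ceiling division


def min_shift_for_window(window_bytes: int) -> int:
    """Smallest window scale shift (0..14) whose ceiling covers window_bytes."""
    shift = 0
    while (0xFFFF << shift) < window_bytes and shift < 14:
        shift += 1
    return shift


GIGABIT = 1_000_000_000
for rtt_us in (1_000, 10_000):  # assumed RTTs of 1 ms and 10 ms
    window = required_window_bytes(GIGABIT, rtt_us)
    print(rtt_us, window, min_shift_for_window(window))
```

Even a 1 ms round trip at gigabit speed needs more than the unscaled 64K window, so without working window scaling the link cannot be kept full.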

Also note that at GigE speeds, hardware along the path can make a _big_
difference. That _very_ much includes NICs and (especially) the NIC
driver code.

In summary, without the same OS/kernel, OS settings, hardware, and
pathway you are solving for way too many simultaneous variables if you
don't have a good and complete set of data -- both the measured kind,
like ping and traceroute, and the OS tcp settings. Tcpdump is _really_
valuable to distinguish between host problems and pathway problems.

Here are some good links:

TCP Variables
http://ipsysctl-tutorial.frozentux.net/
/usr/share/doc/kernel-doc-2.[-X-]/networking/ip-sysctl.txt
/usr/src/linux-2.[-X-]/Documentation/networking/ip-sysctl.txt

TCP Perf Links
http://www.psc.edu/networking/projects/tcptune/
http://www-didc.lbl.gov/TCP-tuning/TCP-tuning.html
http://www-didc.lbl.gov/TCP-tuning/linux.html
http://www.uninett.no/tcpperf/
http://www.infosyssec.net/infosyssec/netprot1.htm
http://ltp.sourceforge.net/tooltable.php
http://www.csm.ornl.gov/~dunigan/netperf/netlinks.html
http://www.web100.org/

Kernel Networking Notes: (as far back as 7/8/04)
NAPI performance
http://lwn.net/Articles/139208/
Pluggable congestion avoidance modules
http://lwn.net/Articles/128062/
TCP window scaling and broken routers
http://lwn.net/Articles/91976/

hth,
prg

Allen McIntosh

Dec 20, 2005, 7:04:50 PM
> I don't suppose you have a good google keyword at your fingertips? :)
tcp_rmem
Also, the reference given by prg:
http://www.psc.edu/networking/projects/tcptune/
seems to have the Linux story straight.
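
For anyone who wants to check the setting from a script, here is a minimal Python sketch that parses the three tcp_rmem values (min, default, max). It assumes a Linux host exposing /proc/sys/net/ipv4/tcp_rmem; the helper names are made up, and the sample string is only an illustrative value, not one taken from the machines in this thread.

```python
from typing import NamedTuple


class TcpMem(NamedTuple):
    minimum: int
    default: int
    maximum: int


def parse_tcp_mem(text: str) -> TcpMem:
    """Parse the 'min default max' triple used by tcp_rmem and tcp_wmem."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError("expected three whitespace-separated integers")
    return TcpMem(*(int(field) for field in fields))


def read_tcp_rmem(path: str = "/proc/sys/net/ipv4/tcp_rmem") -> TcpMem:
    """Read the live setting; only works where the /proc file exists."""
    with open(path) as handle:
        return parse_tcp_mem(handle.read())


def exceeds_unscaled_limit(mem: TcpMem) -> bool:
    """True if the maximum buffer is larger than a 16-bit window can express."""
    return mem.maximum > 0xFFFF


sample = parse_tcp_mem("4096\t87380\t174760\n")  # illustrative sample only
print(sample, exceeds_unscaled_limit(sample))
```

If the maximum is at or below 65535 bytes, a nonzero window scale would buy nothing, which fits the "advertised, but it's zero" case.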