I'm testing a server I'm writing under heavy load and I'm getting reset problems
in the clients. I don't understand why. They connect fine without error, but
when I send data, they return a WSAECONNRESET.
What happens when the server's backlog gets full? I would think a client
connect should return WSAECONNREFUSED. I had this problem at first, but then
turned on SynAttackProtect and that helped quite a bit. But now with me pushing
it further, I'm getting loads of resets en masse on ALL connections.
Is it a behavior of the Microsoft stack to dump all connections that came in on
a listening socket when the backlog becomes full? This note I found seems to
imply this:
http://www.web-polygraph.org/mail-archive/users/200007/0092.html
Special registry settings are:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
SynAttackProtect = 1
TcpMaxHalfOpen = 0x100
TcpMaxHalfOpenRetried = 0x80
TcpMaxConnectResponseRetransmissions = 2
Whether I have SynAttackProtect at 1 or 2, the dumping still happens.
--
David Gravereaux <davy...@pobox.com>
[species: human; planet: earth,milkyway(western spiral arm),alpha sector]
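The registry values listed in the post above can be read back to confirm what the stack is actually configured with. A minimal sketch, assuming Python and its standard `winreg` module (neither is mentioned in the thread); the helper name `read_tcpip_params` is made up for illustration, and it simply returns an empty dict when not run on Windows:

```python
import sys

# Key and value names are taken from the post above.
TCPIP_PARAMS = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
NAMES = (
    "SynAttackProtect",
    "TcpMaxHalfOpen",
    "TcpMaxHalfOpenRetried",
    "TcpMaxConnectResponseRetransmissions",
)


def read_tcpip_params(names=NAMES):
    """Return {name: value or None} for the given values; {} when not on Windows."""
    if sys.platform != "win32":
        return {}
    import winreg  # only available on Windows

    found = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TCPIP_PARAMS) as key:
        for name in names:
            try:
                value, _value_type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                value = None  # value is not set in the registry
            found[name] = value
    return found


if __name__ == "__main__":
    for name, value in read_tcpip_params().items():
        shown = hex(value) if isinstance(value, int) else value
        print(name, "=", "(not set)" if value is None else shown)
```

Reading only; changing these values and the reboot that usually goes with it is left to regedit.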
What OS is the server running? WSAECONNRESET *should* mean that the
client sent a RST to the server, and therefore it must be the client's
fault. You should snoop on the TCP connections and confirm that this is
what's happening. If a client reboots or drops its connection and its IP
gets reused, the new machine will deny all knowledge of previous
connections.
DS
WinXP personal. I'll try this on Win2K adv. server tomorrow to see if it makes a
difference.
This is all in a test environment. All the clients (on the four client test
machines) get valid connects. It's when they send that they get an RST back.
This is the server's fault -- non-graceful dump! Why? The server is at 100%
CPU under this load doing around 100 transactions per second. Without any work
asked of the server and nothing sent by the clients, I can clock around 4000 per
second. I'm kinda thinking I might be getting a starvation problem in the
completion thread, but it doesn't explain the RST issue.
If I throttle back the clients and add more machines hosing the server, I still
get the reset problem, too.
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\WinSock
UseDelayedAcceptance = 0
I don't see that listed in
http://www.microsoft.com/windows2000/docs/tcpip2000.doc
>All the clients (on the four client test
>machines) get valid connects. It's when they send that they get an RST back.
>This is the server's fault -- non-graceful dump! Why?
http://www.cctec.com/maillists/nanog/historical/9806/msg00317.html
"An NT machine (presumably what is running on Microsoft's www and ftp
sites) will issue an RST on an incoming connection when the socket
queue is full."
Not exactly my issue, but close. I get a valid connect, but when the client
sends, the server replies with an RST.
Then in that case, the server is the one closing down those connections.
There is no behaviour in the Windows TCP stack that I'm aware of that
unexpectedly resets an established connection. I'd run a sniffer just to be
certain that the reset is created at the server - it could come from a
firewall or other intervening device, or from two machines with the same IP
address, or some similar situation.
Alun.
~~~~
[Please don't email posters, if a Usenet response is appropriate.]
--
Texas Imperial Software | Try WFTPD, the Windows FTP Server. Find us at
1602 Harvest Moon Place | http://www.wftpd.com or email al...@texis.com
Cedar Park TX 78613-1419 | VISA/MC accepted. NT-based sites, be sure to
Fax/Voice +1(512)258-9858 | read details of WFTPD Pro for XP/2000/NT.
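To see what a server-side abort looks like from the client, the situation can be reproduced on loopback. A minimal sketch, assuming Python (not used anywhere in the thread); `abortive_close` and `demo_reset` are hypothetical names. It relies on SO_LINGER with a zero timeout, which makes close() abort the connection with an RST rather than a FIN. It only illustrates an application-level abort; it does not reproduce whatever SynAttackProtect does inside the stack.

```python
import socket
import struct
import sys
import threading


def abortive_close(conn):
    """Enable SO_LINGER with a zero timeout, then close: the peer sees an RST."""
    # struct linger is two shorts on Windows and two ints on most Unix systems.
    fmt = "hh" if sys.platform == "win32" else "ii"
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack(fmt, 1, 0))
    conn.close()


def demo_reset():
    """Accept one loopback connection, abort it, and report what the client sees."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(5)
    port = listener.getsockname()[1]

    def serve_once():
        conn, _addr = listener.accept()
        abortive_close(conn)

    worker = threading.Thread(target=serve_once, daemon=True)
    worker.start()

    # The connect itself succeeds, as in the reports above.
    client = socket.create_connection(("127.0.0.1", port), timeout=5)
    worker.join()
    try:
        client.sendall(b"hello")
        client.recv(1)
        outcome = "no error"
    except (ConnectionResetError, ConnectionAbortedError, BrokenPipeError) as exc:
        outcome = type(exc).__name__
    finally:
        client.close()
        listener.close()
    return outcome


if __name__ == "__main__":
    print(demo_reset())
```

Which exception the client gets, and whether it surfaces on the send or on the following recv, depends on the platform and on timing. A sniffer trace is still the only way to tell which host really generated the RST.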
>In article <5qrf7v8h265t9qqu3...@4ax.com>, David Gravereaux
><davy...@pobox.com> wrote:
>>Not exactly my issue, but close. I get a valid connect, but when the client
>>sends, the server replies with an RST.
>
>Then in that case, the server is the one closing down those connections.
>There is no behaviour in the Windows TCP stack that I'm aware of that
>unexpectedly resets an established connection. I'd run a sniffer just to be
>certain that the reset is created at the server - it could come from a
>firewall or other intervening device, or from two machines with the same IP
>address, or some similar situation.
Hi Alun,
I'm wondering if this is part of the SynAttackProtect logic? Perhaps AFD accepted
the connection right away but delayed handing it to Winsock, and the client sent
data before the AcceptEx completed in the server and got an RST for being too quick?
All theory, with no packet sniffer logs to back it up. Good idea, I'll see what I
can do to trace this with a sniffer.
I think I've figured this out. SynAttackProtect appears to be a bad
option to turn on. With it on I get 86 hits/sec; with it off I get 138
hits/sec passing through the server from the same client box with the
same settings (a blocking connect). When the front-end is overrun with
requests faster than the server can process (this is the test I'm
interested in, non-blocking connect), the failure with the option
"off" is normal, with the clients returning some WSAECONNREFUSED for
connects. Unfortunately, I get a few timeouts, too. About 20% of
connects fail with WSAECONNREFUSED; 1% or less are timeouts.
I guess I could keep testing this until I'm blue, but I guess if I'm
at this point, my server code must be working well.
>When the front-end is overrun with
>requests faster than the server can process (this is the test I'm
>interested in, non-blocking connect), the failure with the option
>"off" is normal with the clients returning some WSAECONNREFUSED for
>connects. Unfortunately, I get a few timeouts, too. At about 20%
>WSAECONNREFUSED, 1% or less are timeouts.
Let me say that a bit differently.
Frontend failure mode with SynAttackProtect on:
- connect() returns no error, but RST sent back for the first send() -- evil.
Frontend failure mode with SynAttackProtect off:
- connect() returns WSAECONNREFUSED, or times out because of lost SYNs -- documented.
I prefer the second :)
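For a load-test client, the two failure modes above can be told apart by where the error shows up. A minimal sketch, assuming Python (not used in the thread); the `probe` function and its labels are illustrative only. It connects, sends once, waits for one reply byte, and reports which stage failed:

```python
import socket


def probe(host, port, payload=b"ping", timeout=3.0):
    """Connect, send once, wait for one reply byte; return a label for the outcome."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "connect refused"  # WSAECONNREFUSED on Windows
    except socket.timeout:
        return "connect timed out"  # for example, SYNs that were silently dropped
    try:
        sock.sendall(payload)
        data = sock.recv(1)
        return "ok" if data else "closed by peer"
    except ConnectionResetError:
        return "reset after connect"  # WSAECONNRESET on Windows
    except socket.timeout:
        return "no reply"
    finally:
        sock.close()


if __name__ == "__main__":
    # Hypothetical target; substitute the server under test.
    print(probe("127.0.0.1", 8080))
```

Counting these labels over a run gives the same kind of breakdown as the refused/timeout percentages quoted earlier, with "reset after connect" as the case seen with SynAttackProtect on.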