Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

SQL Resets

88 views
Skip to first unread message

CQL Users

unread,
Sep 2, 2009, 9:03:02 AM9/2/09
to
We have a web environment that has multiple WEB fronts hitting a SQL 2000
Server with SP4. (Windows 2003 Server).

We are using the connection pooling, so it's not the maxuserport setting in
the registry.

From time to time, under heavy loads, we are getting resets at the TCP level
on SQL connections from the Web server.

I'll get the RST, ACK flag in the TCP packets and the web application will
throw an exception in the web app, but only under heavy loads.

My question, what would cause the RST, ACK in the TCP packet and what might
I look for to troubleshoot this further?

Erland Sommarskog

unread,
Sep 2, 2009, 6:03:27 PM9/2/09
to
Since I'm in a bit of hurry now, I will only give you a keyword to
search for on Google: SynAttackProtect. It sounds like this is your
issue.


--
Erland Sommarskog, SQL Server MVP, esq...@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

CQL Users

unread,
Sep 2, 2009, 8:40:04 PM9/2/09
to
Thanks Erland, that seems to be the general message we got earlier in the
year, but now we mostly get: Internal Connection Fatal Error.

When I captured the packets, I can see that the reset bit is set to 1 (set).
It doesn't seem to fit to a specific query. We CAN recreate it on our dev
environment, so I will test with your previous post first.

Let me know if I should look elsewhere too. I can test this first thing in
the AM EST.

CQL Users

unread,
Sep 2, 2009, 9:29:01 PM9/2/09
to
Also, is that the best method of just disabling this or should we be upping
other settings and keeping the protection? Like the Queue? or time?

CQL Users

unread,
Sep 3, 2009, 9:29:01 AM9/3/09
to
Tried applying the SynAttackProtect and that didn't seem to touch it. I
mean, we don't have to hammer the system very long, on each occurence with or
without this disabled to have it start throwing connection closures.

Other suggestions?

Ruben Garrigos

unread,
Sep 3, 2009, 10:26:56 AM9/3/09
to
Hi CQL,

Maybe it is not a SynAttack problem but using connection pooling is not always
protecting you from the maxuserport issue. The default value for MaxUserPort
is 5000 and that's pretty low if you have a good amount of connection pools
connecting to the same server. Try to increase MaxUserPort to its maximum
value (65534).

How big is your connection pool (min/max connections) per WEB front? How
many WEB fronts do you have? If your connection pools are decreasing/increasing
their size frequently or if you "overcommit" the number of free ports you
can still have problems. Try to decrease also the TcpTimeWaitDelay value
to 30 seconds to return the ports faster and check if it makes any difference.


Can you paste here the result of a "netstat -n" command on your database
server?

Regards,

Rub�n Garrig�s
Solid Quality Mentors

Blog: http://blogs.solidq.com/es/elrincondeldba

CQL Users

unread,
Sep 3, 2009, 11:05:01 AM9/3/09
to
Thx for the suggestions. I have currently 2 WEB fronts but am putting the
3rd into production shortly. I assume you are asking me to up it on the SQL
side.

When do you want the netstat -n command? During a hammer session or
anytime? I'll be giving you everything from my dev environment as we can
reproduce it on that with a single Web Front and a single SQL back end.

"Ruben Garrigos" wrote:

> Hi CQL,
>
> Maybe it is not a SynAttack problem but using connection pooling is not always
> protecting you from the maxuserport issue. The default value for MaxUserPort
> is 5000 and that's pretty low if you have a good amount of connection pools
> connecting to the same server. Try to increase MaxUserPort to its maximum
> value (65534).
>
> How big is your connection pool (min/max connections) per WEB front? How
> many WEB fronts do you have? If your connection pools are decreasing/increasing
> their size frequently or if you "overcommit" the number of free ports you
> can still have problems. Try to decrease also the TcpTimeWaitDelay value
> to 30 seconds to return the ports faster and check if it makes any difference.
>
>
> Can you paste here the result of a "netstat -n" command on your database
> server?
>
> Regards,
>

> Rubén Garrigós

Ruben Garrigos

unread,
Sep 3, 2009, 11:37:08 AM9/3/09
to
Hi CQL,

Have you tested the development environment with MaxUserPort=64534, TcpTimeWaitDelay=30
and SynAttackProtect=0 on the SQL Server side? I don't remember if a machine
restart is required.. better restart the machine to be sure that the TCP/IP
stack is reading the new parameters. If with these parameters it still happens,
please send the netstat -n results during a hammer session on the SQL Server
side.

Regards,

Rub�n Garrig�s
Solid Quality Mentors

Blog: http://blogs.solidq.com/es/elrincondeldba

> Thx for the suggestions. I have currently 2 WEB fronts but am putting


> the 3rd into production shortly. I assume you are asking me to up it
> on the SQL side.
>
> When do you want the netstat -n command? During a hammer session or
> anytime? I'll be giving you everything from my dev environment as we
> can reproduce it on that with a single Web Front and a single SQL back
> end.
>
> "Ruben Garrigos" wrote:
>
>> Hi CQL,
>>
>> Maybe it is not a SynAttack problem but using connection pooling is
>> not always protecting you from the maxuserport issue. The default
>> value for MaxUserPort is 5000 and that's pretty low if you have a
>> good amount of connection pools connecting to the same server. Try to
>> increase MaxUserPort to its maximum value (65534).
>>
>> How big is your connection pool (min/max connections) per WEB front?
>> How many WEB fronts do you have? If your connection pools are
>> decreasing/increasing their size frequently or if you "overcommit"
>> the number of free ports you can still have problems. Try to decrease
>> also the TcpTimeWaitDelay value to 30 seconds to return the ports
>> faster and check if it makes any difference.
>>
>> Can you paste here the result of a "netstat -n" command on your
>> database server?
>>
>> Regards,
>>

>> Rub�n Garrig�s

CQL Users

unread,
Sep 3, 2009, 11:49:01 AM9/3/09
to
I just finished testing with the WEB having maxuserport=40000 and
tcptimedwaitdelay=30 seconds. I did NOT set the SQL Side.

After about 7,000 - 13,000 packets, we started to see resets.

That's about how long it takes when I didn't have those entries as well.

I've also tried without setting the WEB side and only setting the SQL side.

I have removed the entries and rebooted the server so it's back to it's
original config and I'll hammer again and get you a netstat.

"Ruben Garrigos" wrote:

> Hi CQL,
>

> Have you tested the development environment with MaxUserPort=64534, TcpTimeWaitDelay=30
> and SynAttackProtect=0 on the SQL Server side? I don't remember if a machine
> restart is required.. better restart the machine to be sure that the TCP/IP
> stack is reading the new parameters. If with these parameters it still happens,
> please send the netstat -n results during a hammer session on the SQL Server
> side.
>
> Regards,
>

> Rubén Garrigós

> >> Rubén Garrigós

CQL Users

unread,
Sep 3, 2009, 12:22:01 PM9/3/09
to
Well, we are actually trying to test two different apps right now to see if
it's codebased, but...

Basically, I have captured several nbtstat's with us drilling different
apps. On one site, I had 10 active/established connections to the DB an no
errors were thrown after over 50,000 packets.

On Site 2, I had about 20 active/established connections and threw errors
after about 7,000 packets.

On Site 3, I had about 15 active/established connections and threw errors
after about 4,000 packets.

TCP 192.168.1.10:445 172.40.124.137:61478 ESTABLISHED
TCP 192.168.1.10:445 172.40.124.167:2158 ESTABLISHED
TCP 192.168.1.10:1262 192.168.1.51:1025 ESTABLISHED
TCP 192.168.1.10:1320 192.168.1.51:135 ESTABLISHED
TCP 192.168.1.10:1321 192.168.1.51:1025 ESTABLISHED
TCP 192.168.1.10:1433 172.40.124.135:1777 ESTABLISHED
TCP 192.168.1.10:1433 172.40.124.135:1819 ESTABLISHED
TCP 192.168.1.10:1433 172.40.124.135:4747 ESTABLISHED
TCP 192.168.1.10:1433 172.40.124.135:4806 ESTABLISHED
TCP 192.168.1.10:1433 172.40.124.142:1864 ESTABLISHED
TCP 192.168.1.10:1433 172.40.124.142:2505 ESTABLISHED
TCP 192.168.1.10:1433 172.40.124.142:2541 ESTABLISHED
TCP 192.168.1.10:1433 172.40.124.142:2543 ESTABLISHED
TCP 192.168.1.10:1433 172.40.124.142:2577 ESTABLISHED
TCP 192.168.1.10:1433 172.40.124.142:4306 ESTABLISHED
TCP 192.168.1.10:1433 172.40.124.160:51472 ESTABLISHED
TCP 192.168.1.10:1433 172.40.124.160:52874 ESTABLISHED
TCP 192.168.1.10:1433 172.40.124.160:52879 ESTABLISHED
TCP 192.168.1.10:1433 172.40.124.167:1356 ESTABLISHED
TCP 192.168.1.10:1433 172.40.124.167:1606 ESTABLISHED
TCP 192.168.1.10:1433 172.40.124.167:1607 ESTABLISHED
TCP 192.168.1.10:1433 192.168.1.48:3071 ESTABLISHED
TCP 192.168.1.10:1433 192.168.1.60:1068 ESTABLISHED
TCP 192.168.1.10:1433 192.168.1.60:1145 ESTABLISHED
TCP 192.168.1.10:1433 192.168.1.60:1230 ESTABLISHED
TCP 192.168.1.10:1433 192.168.1.60:1480 ESTABLISHED
TCP 192.168.1.10:1433 192.168.1.60:1481 ESTABLISHED
TCP 192.168.1.10:1433 192.168.1.60:1483 ESTABLISHED
TCP 192.168.1.10:1433 192.168.1.60:1517 ESTABLISHED
TCP 192.168.1.10:1433 192.168.1.60:1582 ESTABLISHED
TCP 192.168.1.10:1433 192.168.1.60:1584 ESTABLISHED
TCP 192.168.1.10:1433 192.168.1.60:1596 ESTABLISHED
TCP 192.168.1.10:1433 192.168.1.60:1597 ESTABLISHED
TCP 192.168.1.10:1433 192.168.1.60:1598 ESTABLISHED
TCP 192.168.1.10:1433 192.168.1.60:1599 ESTABLISHED
TCP 192.168.1.10:1433 192.168.1.60:1600 ESTABLISHED
TCP 192.168.1.10:1433 192.168.1.60:1602 ESTABLISHED
TCP 192.168.1.10:1433 192.168.1.60:1603 ESTABLISHED
TCP 192.168.1.10:2057 172.40.124.142:2185 ESTABLISHED
TCP 192.168.1.10:2057 172.40.124.142:2187 ESTABLISHED
TCP 192.168.1.10:2057 192.168.1.48:3133 ESTABLISHED
TCP 192.168.1.10:2057 192.168.1.48:3259 ESTABLISHED
TCP 192.168.1.10:2057 192.168.1.48:3701 ESTABLISHED
TCP 192.168.1.10:2057 192.168.1.48:3702 ESTABLISHED
TCP 192.168.1.10:2057 192.168.1.48:3703 ESTABLISHED
TCP 192.168.1.10:3389 172.40.124.137:58308 ESTABLISHED
TCP 192.168.201.105:1025 192.168.201.39:3260 ESTABLISHED


"Ruben Garrigos" wrote:

> Hi CQL,
>

> Have you tested the development environment with MaxUserPort=64534, TcpTimeWaitDelay=30
> and SynAttackProtect=0 on the SQL Server side? I don't remember if a machine
> restart is required.. better restart the machine to be sure that the TCP/IP
> stack is reading the new parameters. If with these parameters it still happens,
> please send the netstat -n results during a hammer session on the SQL Server
> side.
>
> Regards,
>

> Rubén Garrigós

> >> Rubén Garrigós

CQL Users

unread,
Sep 3, 2009, 2:55:01 PM9/3/09
to
More on the test results front.

Tested using a "Multi-User" hammer test, where we hammer the web server with
over 400 concurrent users and generate over 150,000 packets of SQL data with
no errors occuring.

We use Xenu crawler on the site and show only 1 additional user connection
and with only about 7,000 - 8,000 packets generated between the web and SQL
Server, we begin to see the [RST,ACK] occuring.

We are going to try and drop the pool limit in the connection string to 10
and see if we can force the error to happen even quicker.

"Ruben Garrigos" wrote:

> Hi CQL,
>

> Have you tested the development environment with MaxUserPort=64534, TcpTimeWaitDelay=30
> and SynAttackProtect=0 on the SQL Server side? I don't remember if a machine
> restart is required.. better restart the machine to be sure that the TCP/IP
> stack is reading the new parameters. If with these parameters it still happens,
> please send the netstat -n results during a hammer session on the SQL Server
> side.
>
> Regards,
>

> Rubén Garrigós

> >> Rubén Garrigós

CQL Users

unread,
Sep 3, 2009, 3:29:01 PM9/3/09
to
One more testing update:

We disabled connection pooling on the app and left the default 5,000 user
ports. The site ran slower, but never threw nad error. Which makes us
believe that the timing is off due to the 5,000 user ports? We have the
connection pools set to their min/max settings of 0 and 100. We've tested
with the max set to 10 and it errored at about 7000 packets as usual.

Suggestions?

"Ruben Garrigos" wrote:

> Hi CQL,
>

> Have you tested the development environment with MaxUserPort=64534, TcpTimeWaitDelay=30
> and SynAttackProtect=0 on the SQL Server side? I don't remember if a machine
> restart is required.. better restart the machine to be sure that the TCP/IP
> stack is reading the new parameters. If with these parameters it still happens,
> please send the netstat -n results during a hammer session on the SQL Server
> side.
>
> Regards,
>

> Rubén Garrigós

> >> Rubén Garrigós

Ruben Garrigos

unread,
Sep 3, 2009, 8:35:27 PM9/3/09
to
Hi CQL,

What is different on site 1? Are you using different network protocol (named
pipes, tcp...) or provider (odbc, oledb,..) to access the database?

Regards,

RubīŋŊn GarrigīŋŊs
Solid Quality Mentors

Blog: http://blogs.solidq.com/es/elrincondeldba

> Well, we are actually trying to test two different apps right now to

>> RubīŋŊn GarrigīŋŊs

>>>> RubīŋŊn GarrigīŋŊs

Ruben Garrigos

unread,
Sep 3, 2009, 8:37:17 PM9/3/09
to
Hi CQL,

If you disable connection pooling and the app ran slower maybe you aren't
hitting any limit (TCP/IP or other). Have you had a look at the SQL Server
Error Log? Maybe there you can find a clue about the reset (if it is caused
by SQL Server). Anything on the Windows Error Log? The maximum amount of
connections that SQL Server 2000 can handle is 32,767 but I think that you
are far from that when using the connection pooling. Which error receive
your application when the connection is reset?

More ideas... maybe a buggy network card driver can be the problem. Can you
check if disabling all kind of TCP offloads (TOE) on the network card helps?
Different switches/routers on each Site?

Regards,

RubīŋŊn GarrigīŋŊs
Solid Quality Mentors

Blog: http://blogs.solidq.com/es/elrincondeldba

> One more testing update:


>
> We disabled connection pooling on the app and left the default 5,000
> user ports. The site ran slower, but never threw nad error. Which
> makes us believe that the timing is off due to the 5,000 user ports?
> We have the connection pools set to their min/max settings of 0 and
> 100. We've tested with the max set to 10 and it errored at about 7000
> packets as usual.
>
> Suggestions?
>
> "Ruben Garrigos" wrote:
>
>> Hi CQL,
>>
>> Have you tested the development environment with MaxUserPort=64534,
>> TcpTimeWaitDelay=30 and SynAttackProtect=0 on the SQL Server side? I
>> don't remember if a machine restart is required.. better restart the
>> machine to be sure that the TCP/IP stack is reading the new
>> parameters. If with these parameters it still happens, please send
>> the netstat -n results during a hammer session on the SQL Server
>> side.
>>
>> Regards,
>>

>> RubīŋŊn GarrigīŋŊs

>>>> RubīŋŊn GarrigīŋŊs

CQL Users

unread,
Sep 3, 2009, 9:34:11 PM9/3/09
to
First, thanks for sticking with this.

What's different between site 1 and site 2 is different code. We've also
tried running WAPT to load one of the failing sites on a page by page bases.
Loading the total number of concurrent connections and amount of data flowing
through the sites.

Nothing.

"Ruben Garrigos" wrote:

> Hi CQL,
>

> What is different on site 1? Are you using different network protocol (named
> pipes, tcp...) or provider (odbc, oledb,..) to access the database?
>
> Regards,
>

> Rubén Garrigós

> >> Rubén Garrigós

> >>>> Rubén Garrigós

CQL Users

unread,
Sep 3, 2009, 9:39:01 PM9/3/09
to
Well, the environments are completely different on production and in
development, but we can make it happen at both. Right now, the only thing we
can initiate with is the Xenu crawler. With the default values set with the
connection pool disabled. (5000 ports) we haven't had Xenu throw the errors.

When we turn on connection pooling, it loads up pretty fast to throw the
errors. The NIC's and switches are different in both environments.

It's really strange. I think it might be throwing some form of infinite
loop and ramping up realy quick in the pool. The RST's that I am seeing are
actually on the SQL server. so I know the traffic is hitting the server. I
am not seeing errors thrown in either the SQL Logs or the Web Logs. Just the
exceptions that we are capturing saying the connection was unexpectantly
reset.

I tend to believe it has something to do with the connection pooling. Is
there more detailed logging I can set on the SQL environment?

Any other suggestions/things to try?

"Ruben Garrigos" wrote:

> Hi CQL,
>

> If you disable connection pooling and the app ran slower maybe you aren't
> hitting any limit (TCP/IP or other). Have you had a look at the SQL Server
> Error Log? Maybe there you can find a clue about the reset (if it is caused
> by SQL Server). Anything on the Windows Error Log? The maximum amount of
> connections that SQL Server 2000 can handle is 32,767 but I think that you
> are far from that when using the connection pooling. Which error receive
> your application when the connection is reset?
>
> More ideas... maybe a buggy network card driver can be the problem. Can you
> check if disabling all kind of TCP offloads (TOE) on the network card helps?
> Different switches/routers on each Site?
>
> Regards,
>

> Rubén Garrigós


> Solid Quality Mentors
>
> Blog: http://blogs.solidq.com/es/elrincondeldba
>
> > One more testing update:
> >
> > We disabled connection pooling on the app and left the default 5,000
> > user ports. The site ran slower, but never threw nad error. Which
> > makes us believe that the timing is off due to the 5,000 user ports?
> > We have the connection pools set to their min/max settings of 0 and
> > 100. We've tested with the max set to 10 and it errored at about 7000
> > packets as usual.
> >
> > Suggestions?
> >
> > "Ruben Garrigos" wrote:
> >
> >> Hi CQL,
> >>
> >> Have you tested the development environment with MaxUserPort=64534,
> >> TcpTimeWaitDelay=30 and SynAttackProtect=0 on the SQL Server side? I
> >> don't remember if a machine restart is required.. better restart the
> >> machine to be sure that the TCP/IP stack is reading the new
> >> parameters. If with these parameters it still happens, please send
> >> the netstat -n results during a hammer session on the SQL Server
> >> side.
> >>
> >> Regards,
> >>

> >> Rubén Garrigós

> >>>> Rubén Garrigós

CQL Users

unread,
Sep 3, 2009, 9:49:01 PM9/3/09
to
Part of the exception error:

System.InvalidOperationException: Internal connection fatal error.
at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior,
SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet
bulkCopyHandler, TdsParserStateObject stateObj)


"Ruben Garrigos" wrote:

> Hi CQL,
>

> If you disable connection pooling and the app ran slower maybe you aren't
> hitting any limit (TCP/IP or other). Have you had a look at the SQL Server
> Error Log? Maybe there you can find a clue about the reset (if it is caused
> by SQL Server). Anything on the Windows Error Log? The maximum amount of
> connections that SQL Server 2000 can handle is 32,767 but I think that you
> are far from that when using the connection pooling. Which error receive
> your application when the connection is reset?
>
> More ideas... maybe a buggy network card driver can be the problem. Can you
> check if disabling all kind of TCP offloads (TOE) on the network card helps?
> Different switches/routers on each Site?
>
> Regards,
>

> Rubén Garrigós


> Solid Quality Mentors
>
> Blog: http://blogs.solidq.com/es/elrincondeldba
>
> > One more testing update:
> >
> > We disabled connection pooling on the app and left the default 5,000
> > user ports. The site ran slower, but never threw nad error. Which
> > makes us believe that the timing is off due to the 5,000 user ports?
> > We have the connection pools set to their min/max settings of 0 and
> > 100. We've tested with the max set to 10 and it errored at about 7000
> > packets as usual.
> >
> > Suggestions?
> >
> > "Ruben Garrigos" wrote:
> >
> >> Hi CQL,
> >>
> >> Have you tested the development environment with MaxUserPort=64534,
> >> TcpTimeWaitDelay=30 and SynAttackProtect=0 on the SQL Server side? I
> >> don't remember if a machine restart is required.. better restart the
> >> machine to be sure that the TCP/IP stack is reading the new
> >> parameters. If with these parameters it still happens, please send
> >> the netstat -n results during a hammer session on the SQL Server
> >> side.
> >>
> >> Regards,
> >>

> >> Rubén Garrigós

> >>>> Rubén Garrigós

Ruben Garrigos

unread,
Sep 4, 2009, 4:06:06 AM9/4/09
to
Hi CQL,

Let's try another network configuration. This time we will try to increase
the backlog of the Winsock library that SQL Server uses to listen for connections.
The default backlog for SQL Server is 5. That means that the maximum pending
connection sockets you can hold is five. Pending connections shouldn't be
"on hold" for a long time but maybe the stress test push the server kernel
CPU pretty high and force the situation.

Try to increase in a small amount (5 by 5) the value of the WinsockListenBacklog
parameter at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSSQLServer\MSSQLServer\SuperSocketNetLib
or
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\<Instance Name>\MSSQLServer\SuperSocketNetLib.
You need to restart the SQL Server instance after changing this parameter.

Anyway, it is possible that the load you are pushing is too high for your
architecture and you are generating an effective DoS against your SQL Server.
The differences between the Site 1 and the others can be that Site 1 code
is "slower" hitting the database or its queries are lighter. It is OK to
tune the architecture as much as you can but you will always find a load
point that is "too much" for it. I will recommend you to add a cache layer
if you don't have one to avoid hitting the SQL Server too hard. It is a very
effective solution to increase the performance of a data access layer.


Regards,

RubīŋŊn GarrigīŋŊs
Solid Quality Mentors

Blog: http://blogs.solidq.com/es/elrincondeldba

> Part of the exception error:


>
> System.InvalidOperationException: Internal connection fatal error.
> at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior,
> SqlCommand cmdHandler, SqlDataReader dataStream,
> BulkCopySimpleResultSet
> bulkCopyHandler, TdsParserStateObject stateObj)
>
> "Ruben Garrigos" wrote:
>
>> Hi CQL,
>>
>> If you disable connection pooling and the app ran slower maybe you
>> aren't hitting any limit (TCP/IP or other). Have you had a look at
>> the SQL Server Error Log? Maybe there you can find a clue about the
>> reset (if it is caused by SQL Server). Anything on the Windows Error
>> Log? The maximum amount of connections that SQL Server 2000 can
>> handle is 32,767 but I think that you are far from that when using
>> the connection pooling. Which error receive your application when the
>> connection is reset?
>>
>> More ideas... maybe a buggy network card driver can be the problem.
>> Can you check if disabling all kind of TCP offloads (TOE) on the
>> network card helps? Different switches/routers on each Site?
>>
>> Regards,
>>

>> RubīŋŊn GarrigīŋŊs


>> Solid Quality Mentors
>> Blog: http://blogs.solidq.com/es/elrincondeldba
>>
>>> One more testing update:
>>>
>>> We disabled connection pooling on the app and left the default 5,000
>>> user ports. The site ran slower, but never threw nad error. Which
>>> makes us believe that the timing is off due to the 5,000 user ports?
>>> We have the connection pools set to their min/max settings of 0 and
>>> 100. We've tested with the max set to 10 and it errored at about
>>> 7000 packets as usual.
>>>
>>> Suggestions?
>>>
>>> "Ruben Garrigos" wrote:
>>>
>>>> Hi CQL,
>>>>
>>>> Have you tested the development environment with MaxUserPort=64534,
>>>> TcpTimeWaitDelay=30 and SynAttackProtect=0 on the SQL Server side?
>>>> I don't remember if a machine restart is required.. better restart
>>>> the machine to be sure that the TCP/IP stack is reading the new
>>>> parameters. If with these parameters it still happens, please send
>>>> the netstat -n results during a hammer session on the SQL Server
>>>> side.
>>>>
>>>> Regards,
>>>>

>>>> RubīŋŊn GarrigīŋŊs

>>>>>> RubīŋŊn GarrigīŋŊs

CQL Users

unread,
Sep 4, 2009, 5:54:01 AM9/4/09
to
I will give this a test today.

2 other thoughts. 1, we are not seeing any real loads on the SQL
environment. CPU is less than 20%, memory is holding with a low page file,
network traffic is less than 4%.

I would also expect that if server 1 is starting to throw errors, I would
expect to see errors on Server 2, but I'm not. It looks like it sticks to
one server at a time. If it were SQL that was falling short, I would expect
to see both servers failing at the same time. That's not the case. It's one
server or the other.

(unless the Xenu crawler is hitting both servers at the same time.)

"Ruben Garrigos" wrote:

> Hi CQL,
>

> Let's try another network configuration. This time we will try to increase
> the backlog of the Winsock library that SQL Server uses to listen for connections.
> The default backlog for SQL Server is 5. That means that the maximum pending
> connection sockets you can hold is five. Pending connections shouldn't be
> "on hold" for a long time but maybe the stress test push the server kernel
> CPU pretty high and force the situation.
>
> Try to increase in a small amount (5 by 5) the value of the WinsockListenBacklog
> parameter at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSSQLServer\MSSQLServer\SuperSocketNetLib
> or
> HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\<Instance Name>\MSSQLServer\SuperSocketNetLib.
> You need to restart the SQL Server instance after changing this parameter.
>
> Anyway, it is possible that the load you are pushing is too high for your
> architecture and you are generating an effective DoS against your SQL Server.
> The differences between the Site 1 and the others can be that Site 1 code
> is "slower" hitting the database or its queries are lighter. It is OK to
> tune the architecture as much as you can but you will always find a load
> point that is "too much" for it. I will recommend you to add a cache layer
> if you don't have one to avoid hitting the SQL Server too hard. It is a very
> effective solution to increase the performance of a data access layer.
>
>
> Regards,
>

> Rubén Garrigós


> Solid Quality Mentors
>
> Blog: http://blogs.solidq.com/es/elrincondeldba
>
> > Part of the exception error:
> >
> > System.InvalidOperationException: Internal connection fatal error.
> > at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior,
> > SqlCommand cmdHandler, SqlDataReader dataStream,
> > BulkCopySimpleResultSet
> > bulkCopyHandler, TdsParserStateObject stateObj)
> >
> > "Ruben Garrigos" wrote:
> >
> >> Hi CQL,
> >>
> >> If you disable connection pooling and the app ran slower maybe you
> >> aren't hitting any limit (TCP/IP or other). Have you had a look at
> >> the SQL Server Error Log? Maybe there you can find a clue about the
> >> reset (if it is caused by SQL Server). Anything on the Windows Error
> >> Log? The maximum amount of connections that SQL Server 2000 can
> >> handle is 32,767 but I think that you are far from that when using
> >> the connection pooling. Which error receive your application when the
> >> connection is reset?
> >>
> >> More ideas... maybe a buggy network card driver can be the problem.
> >> Can you check if disabling all kind of TCP offloads (TOE) on the
> >> network card helps? Different switches/routers on each Site?
> >>
> >> Regards,
> >>

> >> Rubén Garrigós


> >> Solid Quality Mentors
> >> Blog: http://blogs.solidq.com/es/elrincondeldba
> >>
> >>> One more testing update:
> >>>
> >>> We disabled connection pooling on the app and left the default 5,000
> >>> user ports. The site ran slower, but never threw nad error. Which
> >>> makes us believe that the timing is off due to the 5,000 user ports?
> >>> We have the connection pools set to their min/max settings of 0 and
> >>> 100. We've tested with the max set to 10 and it errored at about
> >>> 7000 packets as usual.
> >>>
> >>> Suggestions?
> >>>
> >>> "Ruben Garrigos" wrote:
> >>>
> >>>> Hi CQL,
> >>>>
> >>>> Have you tested the development environment with MaxUserPort=64534,
> >>>> TcpTimeWaitDelay=30 and SynAttackProtect=0 on the SQL Server side?
> >>>> I don't remember if a machine restart is required.. better restart
> >>>> the machine to be sure that the TCP/IP stack is reading the new
> >>>> parameters. If with these parameters it still happens, please send
> >>>> the netstat -n results during a hammer session on the SQL Server
> >>>> side.
> >>>>
> >>>> Regards,
> >>>>

> >>>> Rubén Garrigós

> >>>>>> Rubén Garrigós

Ruben Garrigos

unread,
Sep 4, 2009, 6:15:01 AM9/4/09
to
Hi CQL,

I think that you should check also other performance counters that are more
SQL Server specific. Maybe the CPU load is not the problem now but you can
be suffering from memory bottleneck or problems with excesive locking or...
anything else.

You can define a performance counter baseline with some interesting counters
like these:

\\ServerName\Memory\Available MBytes
\\ServerName\Memory\Page Faults/sec
\\ServerName\Memory\Page Reads/sec
\\ServerName\Memory\Page Writes/sec
\\ServerName\Memory\Pages/sec

\\ServerName\PhysicalDisk\Avg. Disk Queue Length
\\ServerName\PhysicalDisk\Avg. Disk Read Queue Length
\\ServerName\PhysicalDisk\Avg. Disk sec/Read
\\ServerName\PhysicalDisk\Avg. Disk sec/Write
\\ServerName\PhysicalDisk\Avg. Disk Write Queue Length
\\ServerName\PhysicalDisk\Current Disk Queue Length

\\ServerName\Processor(_Total)\% Interrupt Time
\\ServerName\Processor(_Total)\% Privileged Time
\\ServerName\Processor(_Total)\% Processor Time
\\ServerName\Processor(_Total)\Interrupts/sec

\\ServerName\SQLServer:Access Methods\Full Scans/sec
\\ServerName\SQLServer:Access Methods\Index Searches/sec
\\ServerName\SQLServer:Access Methods\Page Deallocations/sec
\\ServerName\SQLServer:Access Methods\Page Splits/sec
\\ServerName\SQLServer:Access Methods\Pages Allocated/sec

\\ServerName\SQLServer:Buffer Manager\Buffer cache hit ratio
\\ServerName\SQLServer:Buffer Manager\Checkpoint pages/sec
\\ServerName\SQLServer:Buffer Manager\Database pages
\\ServerName\SQLServer:Buffer Manager\Free pages
\\ServerName\SQLServer:Buffer Manager\Lazy writes/sec
\\ServerName\SQLServer:Buffer Manager\Page life expectancy
\\ServerName\SQLServer:Buffer Manager\Page lookups/sec
\\ServerName\SQLServer:Buffer Manager\Page reads/sec
\\ServerName\SQLServer:Buffer Manager\Page writes/sec

\\ServerName\SQLServer:Databases(_Total)\Active Transactions
\\ServerName\SQLServer:Databases(_Total)\Log Cache Hit Ratio
\\ServerName\SQLServer:Databases(_Total)\Transactions/sec

\\ServerName\SQLServer:General Statistics\Active Temp Tables
\\ServerName\SQLServer:General Statistics\Logins/sec
\\ServerName\SQLServer:General Statistics\Logouts/sec
\\ServerName\SQLServer:General Statistics\Temp Tables Creation Rate
\\ServerName\SQLServer:General Statistics\Temp Tables For Destruction
\\ServerName\SQLServer:General Statistics\Transactions
\\ServerName\SQLServer:General Statistics\User Connections

\\ServerName\SQLServer:Latches\Average Latch Wait Time (ms)
\\ServerName\SQLServer:Latches\Latch Waits/sec
\\ServerName\SQLServer:Latches\Total Latch Wait Time (ms)

\\ServerName\SQLServer:Locks(_Total)\Average Wait Time (ms)
\\ServerName\SQLServer:Locks(_Total)\Lock Requests/sec
\\ServerName\SQLServer:Locks(_Total)\Lock Timeouts/sec
\\ServerName\SQLServer:Locks(_Total)\Lock Wait Time (ms)
\\ServerName\SQLServer:Locks(_Total)\Lock Waits/sec
\\ServerName\SQLServer:Locks(_Total)\Number of Deadlocks/sec

\\ServerName\SQLServer:Memory Manager\Connection Memory (KB)
\\ServerName\SQLServer:Memory Manager\Memory Grants Outstanding
\\ServerName\SQLServer:Memory Manager\Memory Grants Pending
\\ServerName\SQLServer:Memory Manager\SQL Cache Memory (KB)
\\ServerName\SQLServer:Memory Manager\Target Server Memory (KB)

\\ServerName\SQLServer:Plan Cache(Temporary Tables & Table Variables)\Cache
Hit Ratio
\\ServerName\SQLServer:Plan Cache(SQL Plans)\Cache Hit Ratio
\\ServerName\SQLServer:Plan Cache(Object Plans)\Cache Hit Ratio
\\ServerName\SQLServer:Plan Cache(_Total)\Cache Hit Ratio

\\ServerName\SQLServer:SQL Errors(_Total)\Errors/sec
\\ServerName\SQLServer:SQL Statistics\Batch Requests/sec
\\ServerName\SQLServer:SQL Statistics\SQL Compilations/sec
\\ServerName\SQLServer:SQL Statistics\SQL Re-Compilations/sec

\\ServerName\SQLServer:Transactions\Longest Transaction Running Time
\\ServerName\SQLServer:Transactions\Transactions

\\ServerName\SQLServer:Wait Statistics(Average wait time (ms))\Lock waits
\\ServerName\SQLServer:Wait Statistics(Average wait time (ms))\Network IO
waits
\\ServerName\SQLServer:Wait Statistics(Average wait time (ms))\Page IO latch
waits

You can also decide to capture more performance counter categories to be
able to analyze more data after the tests. Then, stress the SQL Server and
check if some of the counters are skyrocketing comparing them with a previous
"regular" workload baseline. For example very high Lock Wait Time should
be investigated.

Regards,

Rub�n Garrig�s
Solid Quality Mentors

Blog: http://blogs.solidq.com/es/elrincondeldba

> other thoughts. 1, we are not seeing any real loads on the SQL

CQL Users

unread,
Sep 4, 2009, 3:08:06 PM9/4/09
to
Well,

We are down to the wire. I think we are going to have to open a support
case. I don't believe this to be a SQL Server side issue. I think it either
is a configuration thing on the WEB server side or in the .NET
library/communication TDS side. I really don't know how to narrow it down
any further than those errors we sent you. Can you think of any way to
capture more logging on the SQL Side to understand why it's being reset?

"Ruben Garrigos" wrote:

> Rubén Garrigós

Ruben Garrigos

unread,
Sep 4, 2009, 6:15:04 PM9/4/09
to
Hi CQL,

I think you are right. Maybe it is time to have somebody onsite from SQL
Server Support or from another SQL Server experts provider.

Regards,

Rub�n Garrig�s
Solid Quality Mentors

Blog: http://blogs.solidq.com/es/elrincondeldba

> Well,

>> Rub�n Garrig�s

CQL Users

unread,
Sep 4, 2009, 9:13:01 PM9/4/09
to
If you have any other suggestions/areas to check out, I'd appreciate the
input. Thanks for your assistance!

"Ruben Garrigos" wrote:

> Hi CQL,
>

> I think you are right. Maybe it is time to have somebody onsite from SQL
> Server Support or from another SQL Server experts provider.
>
> Regards,
>

> Rubén Garrigós

> >> Rubén Garrigós

Ruben Garrigos

unread,
Sep 5, 2009, 10:18:18 AM9/5/09
to
Hi CQL,

Unfortunately your problem seems not trivial to solve. We have tested the
more problable causes without success. More work in solving that can be done
but I think that is not something we can do through email. I think that an
onsite expert consulting work is more appropiate.

Regards,

RubīŋŊn GarrigīŋŊs
Solid Quality Mentors

Blog: http://blogs.solidq.com/es/elrincondeldba

> If you have any other suggestions/areas to check out, I'd appreciate


> the input. Thanks for your assistance!
>
> "Ruben Garrigos" wrote:
>
>> Hi CQL,
>>
>> I think you are right. Maybe it is time to have somebody onsite from
>> SQL Server Support or from another SQL Server experts provider.
>>
>> Regards,
>>

>> RubīŋŊn GarrigīŋŊs

>>>> RubīŋŊn GarrigīŋŊs

0 new messages