[AOLSERVER] Tuning Question

William Scott Jordan

May 1, 2009, 12:55:44
to AOLS...@listserv.aol.com
Hey all!

We're having an issue under high loads that's causing AOLserver to stop
taking new connections and I wanted to see if anyone could point me in
the right direction for diagnosing the problem.

We're running AOLserver 4.5 on CentOS 5.2 with the following connection
config:

ns_section "ns/server/main"
ns_param connsperthread 0
ns_param maxthreads 500
ns_param minthreads 10
ns_param threadtimeout 120
ns_param maxconnections 100
ns_param spread 20

The DB pool is set up as follows:

ns_section "ns/db/pool/main"
ns_param Connections 500
ns_param Datasource "$databaselocation:$databasename"
ns_param Driver postgres
ns_param LogSQLErrors Off
ns_param MaxIdle 600
ns_param MaxOpen 3600

Watching the system stats under load, the web server is only hitting
about 5% CPU usage and has ample free RAM. The DB server is only at
about 3% capacity.

The problem we're seeing is that the web servers are maxing out at about
200-250 simultaneous connections, and all additional connections are
being rejected.

Any guesses on why we don't have 500 connections available, even though
that's what's indicated in ns/server/main? Under 3.x, I remember that
there was an option to have queued pending connections. I don't see
that option in the docs for 4.5. Does that functionality still exist?

Any suggestions would be greatly appreciated!

-William


--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to <list...@listserv.aol.com> with the
body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: field of your email blank.

Dossy Shiobara

May 1, 2009, 13:55:57
to AOLS...@listserv.aol.com
On 5/1/09 12:55 PM, William Scott Jordan wrote:
> The problem we're seeing is that the web servers are maxing out at about
> 200-250 simultaneous connections, and all additional connections are
> being rejected.

Just a wild guess - does each request use two database handles?

Are you ensuring that database handles are ALWAYS being returned (i.e.,
[ns_db releasehandle $h])? Does your server log show any uncaught Tcl
errors that might preempt the code that would normally perform the
releasehandle?

I often use this pattern in my code:

    if {[catch {
        set db [ns_db gethandle poolname]

        # database ops here

        ns_db releasehandle $db
    } err]} {
        if {[info exists db]} {
            ns_db releasehandle $db
            unset db
        }
    }

Of course, you probably want to refactor this into a proc, but you get
the idea.
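
One way the refactoring into a proc could look is sketched below. The name with_db_handle, its arguments, and the choice to re-raise the error after releasing the handle are assumptions made for illustration (the pattern above swallows the error instead); only ns_db gethandle and ns_db releasehandle come from this thread.

```tcl
# Hypothetical helper: run a script with a handle from the given pool and
# release the handle whether or not the script raises an error.
proc with_db_handle {pool dbVar script} {
    # Expose the handle to the caller under the variable name it chose.
    upvar 1 $dbVar db
    set db [ns_db gethandle $pool]

    # Evaluate the caller's script in the caller's scope.
    set code [catch {uplevel 1 $script} result]
    if {$code == 1} {
        # Save the error details before any later command overwrites them.
        set savedInfo $::errorInfo
        set savedCode $::errorCode
    }

    # Always hand the handle back, then forget the variable.
    ns_db releasehandle $db
    unset db

    if {$code == 1} {
        # Assumption: re-raise so the failure still shows up in the server log.
        return -code error -errorinfo $savedInfo -errorcode $savedCode $result
    }
    # Non-error codes (return/break/continue) are not propagated in this sketch.
    return $result
}

# Usage (pool name "poolname" as in the pattern above):
#   with_db_handle poolname db {
#       # database ops here, using $db
#   }
```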

--
Dossy Shiobara | do...@panoptic.com | http://dossy.org/
Panoptic Computer Network | http://panoptic.com/
"He realized the fastest way to change is to laugh at your own
folly -- then you can let go and quickly move on." (p. 70)

Mark Mcgaha

May 1, 2009, 14:05:15
to AOLS...@listserv.aol.com
Your nssock module needs to have maxsock set:
ns_param maxsock 500

maxrun and maxthreads need to be set with ns_limits and ns_pools:
ns_limits set default -maxrun 500
ns_pools set default -maxthreads 500

William Scott Jordan

May 1, 2009, 18:17:26
to AOLS...@listserv.aol.com
Hi Dossy,

Thanks for the response on this. Each page uses one database connection
at most and the majority of our pages use none. Database handles are
always returned, and I don't see anything in the logs that would lead me
to believe that they aren't being released.

We use similar code to what you listed here for grabbing DB connections
and log any connection errors. We're not seeing any DB connection
errors in the logs during these higher load times. If the problem is
related to the DB, I don't think it's because AOLserver is having
trouble getting DB connections.

I think it's worth noting that when AOLserver stops accepting
connections, nslog doesn't log any of the failed requests. New
connections seem to just be rejected outright.

I'm wondering if there's something in the OS itself that's limiting
connections. As near as I can tell, there's nothing in AOLserver's
settings that should be stopping additional connections. I'll dig
around a bit in CentOS to see if I can't find the culprit there.

Thanks!

-William

Dossy Shiobara

May 1, 2009, 18:40:02
to AOLS...@listserv.aol.com
On 5/1/09 6:17 PM, William Scott Jordan wrote:
> I think it's worth noting that when AOLserver stops accepting
> connections, nslog doesn't log any of the failed requests. New
> connections seem to just be rejected outright.

Do you have nscp enabled? Is the control port still responsive to
commands? Can you capture the output of "join [ns_info threads] \n"
from the control port once AOLserver stops accepting connections?


Eduardo Santos

May 3, 2009, 12:47:27
to AOLS...@listserv.aol.com
Just another guess: do you have anything between AOLserver and the
connections, such as an Apache proxy or something similar? Most
performance problems in AOLserver we've seen are related to the
environment, not to AOLserver itself.

Tom Jackson

May 3, 2009, 13:34:57
to AOLS...@listserv.aol.com
I would guess there are a number of problems. First, the number of
threads is way too high compared to the conns per thread. You have a
5:1 ratio; I wouldn't go with anything less than 1:1, and 1:10 would be
even better.

So 10 threads, 100 conns per thread before exit.
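
Read against the ns/server/main section posted at the top of the thread, that suggestion might translate to something like the fragment below. This is only a sketch: it assumes "10 threads" applies to both minthreads and maxthreads, and that connsperthread is the parameter meant by "conns per thread". On 4.5 the thread limits may instead have to be applied with ns_pools, as suggested earlier in the thread.

```tcl
ns_section "ns/server/main"
ns_param connsperthread 100   ;# each thread exits after serving 100 connections
ns_param maxthreads 10        ;# assumption: "10 threads" is the upper bound
ns_param minthreads 10        ;# assumption: keep all 10 threads running
```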

Another problem you can't solve is the queue operation. In the CVS HEAD
version, thread wake-up is handled by using Ns_CondSignal instead of
Ns_CondBroadcast. Signal only wakes up one thread at a time, whereas
Broadcast gives every thread a chance to restart execution. In practice
what happens is that the thread which called Signal usually gets woken
up, so it starts executing again. What can happen is that slow requests
get stuck sleeping while one thread grabs quick requests. Essentially
you have one thread working the queue. It is then possible for this
thread to exit after serving all requests, just as the traffic dies
down. You end up with threads waiting to wake up, the driver waiting for
new requests and somehow nothing happening.

One thing you could do, just for testing if requests are still being
processed prior to going into the queue, is to register a prequeue
filter.

Register it to an exact URL which will never be called under normal
conditions. When your server gets stuck you can visit the URL and see if
it logs something (just use the prequeue filter to issue an ns_log and
return filter_ok). But note: never use a prequeue filter in Tcl for
anything else. It causes a massive memory leak, probably because a Tcl
interp is created and never destroyed. You leak roughly one virtual
server's worth of memory per request that matches a prequeue filter.

If something is logged, it means that the request was processed by the
driver thread and probably put into the queue to wait for a free conn
thread. If not, it means that something is blocking the driver thread,
or maybe something else is going on.
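
A minimal sketch of such a probe follows. The URL /__prequeue_probe and the proc name are made up for illustration, and the sketch assumes ns_register_filter accepts prequeue as its first argument on this AOLserver version. The proc takes args so that it does not depend on the exact arguments the server passes to filter procs.

```tcl
# Hypothetical probe: log a line when the driver thread sees the request.
proc prequeue_probe {args} {
    ns_log notice "prequeue probe hit: args=$args"
    return filter_ok
}

# Assumption: "prequeue" is a valid filter stage on this server version.
# Use a URL that normal traffic never requests, because of the memory
# leak described above.
ns_register_filter prequeue GET /__prequeue_probe prequeue_probe
```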

tom jackson
