We're expanding use of LDAP as a data source, but my expertise lies
heavily on the RDBMS side and how pooling behaves there. I did some
initial experimenting with adding a connection pool and am seeing roughly
what I expected: constant LDAP connection resets when the pool validates
idle connections.
I'm just wondering these kinds of things:
- are pools necessary to get reasonable performance on highly loaded IdPs?
- do they handle failed connections reasonably without ever surfacing them
as actual data connector failures?
- are there ways to maintain connections and avoid the timeouts from the
client end?
- is pool validation even needed, or does it just retry on failures and
handle things gracefully?
I use LDAP as a secondary source right now, with no pooling yet, and it's
basically fine, though I don't know the actual load on the AD servers.
Last thing I want is to add pooling and have it cause failures (which
poorly implemented RDBMS pools do cause).
-- Scott
LDAP calls aren't any worse to establish than connections to a
database; one round trip in most cases, more if you're using StartTLS.
So that's not really an issue. You may want to "reserve" server
resources if your LDAP server is under heavy load. So, for your case,
I'd say you should use connection pooling for LDAP for the same reason
you would use it for a database: the number of rapid successive calls.
The LDAP data connector does allow you to customize what exactly
happens when a connection is closed unbeknownst to the IdP. The
default is to just retry, I believe. I think the only time you'll
ever see a connection failure is if your servers are actually
unreachable.
In terms of the timeout, there are some LDAP control extensions that
would allow you to adjust it (i.e., the client can tell the server how
long it would like the timeout to be), but as far as I know AD does not
support that.
--
Chad La Joie
www.itumi.biz
trusted identities, delivered
>The LDAP data connector does allow you to customize what exactly
>happens when a connection is closed unbeknownst to the IdP. The
>default is to just retry, I believe. I think the only time you'll
>ever see a connection failure is if your servers are actually
>unreachable.
Is that option documented? I didn't see anything that jumped out. Not
all of the pool options are exposed in the XML, for example, so some of
the validateOnBorrow-type settings aren't available.
http://code.google.com/p/vt-middleware/wiki/vtldapProperties
--
Chad La Joie
www.itumi.biz
trusted identities, delivered
The only time I occasionally see a failure is on a test IdP that gets
accessed once every few days. It sometimes fails to return any
attributes. The retry then works as normal.
Jim
Ryan
>A setting that is not specified on that page, but is useful to set, is
>"com.sun.jndi.ldap.read.timeout". It will save you when the IdP can
>connect to the directory server host, but the directory server process
>doesn't reply to the bind (because it's dead, etc). Otherwise, the IdP
>waits until the TCP connection times out, which is way too long.
Great, that's the sort of thing I was looking for.
I added a link in the wiki page to the VT-LDAP docs because I didn't
realize that the generic property syntax supported its settings; I
thought it only handled the JDK stuff.
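For example, since the generic property syntax passes the JDK/JNDI
properties through, the read timeout Ryan mentions can be set right on
the connector; the 5000 below (milliseconds, i.e. 5 seconds) is only an
illustrative value:
<!-- Sets the JNDI read timeout so a connected-but-dead directory process
     fails fast instead of waiting for the TCP timeout; the value is in
     milliseconds. -->
<LDAPProperty name="com.sun.jndi.ldap.read.timeout" value="5000" />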
--
Chad La Joie
www.itumi.biz
trusted identities, delivered
On 9/8/11 3:52 PM, "Daniel Fisher" <dfi...@vt.edu> wrote:
>
>- are there ways to maintain connections and avoid the timeouts from the
>client end?
>
>You should be able to configure periodic validation to guarantee your
>connections are always alive.
With AD, I'm seeing the connections close on the order of minutes, if not
sooner. That doesn't seem to fit with that strategy, so perhaps AD is
different.
I'll probably have to just try it under load and see how they behave. Even
if they close fast, constant use would keep them open.
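For reference, the periodic validation Daniel describes lives on the
connector's pooling element; a rough sketch might look like the
following. The attribute names mirror the vt-ldap pool settings
discussed in this thread and are my assumptions, and the values are only
illustrative; check the connector schema for what is actually exposed.
<!-- Hypothetical sketch: keep a small pool and validate idle connections
     every 30 minutes with a cheap search, rather than discovering dead
     connections at use time. Durations are shown as ISO-8601 periods;
     the schema may expect milliseconds instead. -->
<ConnectionPool minPoolSize="3" maxPoolSize="10"
                validatePeriodically="true"
                validateTimerPeriod="PT30M"
                validateFilter="(objectClass=*)" />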
On 9/8/11 3:02 PM, "Jim Fox" <f...@washington.edu> wrote:
>
>We use connection pooling for all our accesses to LDAP. We use TLS, and
>the overhead of starting up a new session on each query seemed excessive
>to me. Our openldap servers keep the sessions open all day.
Are you using any of the validation options in the pooling element? I see
the retry count defaults to 1 inside the vt-ldap code, so I'm sure no
matter what I do, it's just going to drop the failed connection and retry.
In cases like that, the problem is whether a closed connection hangs
(very common with database pools), but these don't seem to.
That at least does not seem to happen. It's not hung when the connections
are stale; it just doesn't seem to make much use of the pool. I am
obviously much more concerned about my IdP than about the AD server, so
if they can handle the load, I don't much care.
> This was with an older version of the IdP (2.1.5, right around when
>vt-ldap was first added).
>Due to an unrelated PeopleSoft LDAP-handling bug, we ended up lowering
>the idle timeout on our IdP to on the order of 2 minutes. This
>effectively made the LB timeout handling a non-issue.
Which idle timeout are you referring to, if you don't mind?
Yeah, that was a typo. I almost replied to my own message to correct but didn't want to clutter the list if no one cared.
The timeout was really the LDAP server's base idle session timeout.
PeopleSoft has (had?) a known bug where it creates lots and lots of orphan LDAP sessions. The suggested workaround was to reduce the idle timeout on LDAP sessions at the LDAP server. So this particular bug really had nothing to do with Shibboleth except that it's the same LDAP server that Shibboleth queries.
The rest was just trying to say that since making the change (and finally upgrading to 2.3.3, yay!) we haven't noticed any performance issues on the IdP.
--- Eric
Eric Goodman
er...@ucsc.edu
Hi Scott,
just my two cents here...
> - are pools necessary to get reasonable performance on highly loaded IdPs?
We have an IdP cluster and define three OpenLDAP servers within the
attribute resolver's DataConnector (for failover reasons, i.e. each
server has the same data). Initially, we did not define a
ConnectionPool. With this setup, we did some load testing and debugging.
When performing just one login at a time, and another one some seconds
later, the attribute resolver performs a costly LDAP BIND operation for
each of these logins (taking about 50 ms), because the LDAP connection
is closed shortly after attribute resolution. This is okay, as the few
users will not notice the 50 ms, and open connections are kept to a
minimum.
Then, if we fire lots of login requests at the IdP (using "The
Grinder"), the IdP seems to reuse connections even if no pool has been
explicitly defined (there is no LDAP BIND operation for each request).
This is just fine, too, as the administrator does not need to worry
about connection pools, and each individual user still gets good
performance.
For comparison, after the LDAP BIND, a single LDAP request just takes
about 3ms here.
A bottleneck, however, appears to be in the LDAP JAAS login module at
login time (see login.config). By default, the SearchDnResolver (which
resolves the user's DN according to the specified userFilter) never does
connection pooling, so the IdP always performs an LDAP BIND here where
it could keep the connection open. We replaced the SearchDnResolver with
a static one for test purposes, and our IdP cluster then handled about
twice as many logins per second. (The IdP is not "blocked" by the LDAP
BINDs, but maybe the number of threads or network connections hits a
limit here?)
This issue has been reported at:
http://code.google.com/p/vt-middleware/issues/detail?id=118
Please remember that the LDAP AUTHN request itself (in contrast to the
SearchDnResolver issue) will always require an LDAP BIND, as it must
bind as the user who is being authenticated.
> - do they handle failed connections reasonably without ever surfacing them
> as actual data connector failures?
This has not been an issue here yet. "The Grinder" reports approximately
20 errors per 10,000 logins, but I have never caught one in the browser
myself.
-Manuel
So to be explicit about how these properties are specified in the configs, would the following configure the vt-ldap connector to retry failed connections three times with a wait of 300 ms between retries?
<resolver:DataConnector
id="myLDAP"
xsi:type="LDAPDirectory" xmlns="urn:mace:shibboleth:2.0:resolver:dc"
ldapURL="ldap://blah.blah.blah"
baseDN="o=blah,c=US"
principal="cn=blah,ou=blabbityblah,o=blahblah,c=US"
principalCredential="blahblah" >
<FilterTemplate>
<![CDATA[
(uid=$requestContext.principalName)
]]>
</FilterTemplate>
<ReturnAttributes>blah1 blah2 blah3</ReturnAttributes>
<LDAPProperty name="edu.vt.middleware.ldap.operationRetry" value="3" />
<LDAPProperty name="edu.vt.middleware.ldap.operationRetryWait" value="300" />
</resolver:DataConnector>
On 9/8/11 4:14 PM, "Daniel Fisher" <dfi...@vt.edu> wrote:
>
>That sounds very aggressive, and would certainly discourage pooling.
>Perhaps they don't want you holding connections open? We configure
>keep-alive on the servers (OpenLDAP, not AD) to encourage it.
There's also a load balancer involved (per your other comment) that could
be affecting it.
One thing that didn't make sense to me was that the expirationTime setting
in the connector is documented as causing the pool to eject stale
connections once they're unused for that length of time. If that's shorter
than the validation interval, I wouldn't expect the background validator
to even try those connections and see that they're closed, since they
should have just expired by then.
-- Scott
>I think that's correct, if I understand what you're saying. A lower
>expirationTime will cause those connections to be removed before the
>validator even runs. So the connections won't be in the pool and
>therefore won't be validated.
I didn't explain it well, but what I'm seeing is that they do get
validated, even though they're all idle, and they all should have expired.
Otherwise there wouldn't be warnings about connection failures when it
attempts to validate them.
-- Scott
On 9/9/11 1:44 PM, "Daniel Fisher" <dfi...@vt.edu> wrote:
>
>Did you set the prune timer period to some value below the expiration
>time? One controls how often the pool is pruned, the other controls
>whether a connection should be pruned.
No, it's higher than the expiration period. The expiration is 5m and the
timer is 15m.
Each time it runs, I see one or two of the connections it's validating log
a Communications Failure.
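To make Daniel's distinction concrete, a sketch of the pooling element
with both timers might look like this. Again, the attribute names are my
assumptions based on the vt-ldap pool settings named in this thread, and
the durations are only illustrative; check the connector schema for what
is actually exposed.
<!-- Hypothetical sketch, not verified against the schema: connections
     idle longer than 5 minutes become eligible for removal, the prune
     task runs every 2 minutes, and the periodic validator (every 15
     minutes) should then only see live, recently used connections.
     Durations are shown as ISO-8601 periods; the schema may expect
     milliseconds instead. -->
<ConnectionPool expirationTime="PT5M"
                pruneTimerPeriod="PT2M"
                validatePeriodically="true"
                validateTimerPeriod="PT15M" />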
Oh, ok, I didn't realize it was separate, I thought that was the same
operation.
Yes, I guess it could be coming from the close operation. It's just noisy,
I didn't think it actually hurt anything.
If what Manuel said is true and it reuses connections intrinsically even
if I don't specify a pool, I'll probably just remove them in any case.
I am spinning off a different thread about the configuration of the
vt-ldap implementation of the LDAP data connector to keep the noise down
on the original thread.
Daniel,
How are the exceptions that are caught by the vt-ldap library specified in the configs? The vt-ldap docs (http://code.google.com/p/vt-middleware/wiki/vtldapProperties#Properties) show them being specified like so:
{CommunicationException, ServiceUnavailableException}
So I am assuming it expects javax.naming.* classes. Can you specify exception classes in other packages? Also, is it simply an instanceof check, i.e., can you specify and match on superclasses or interfaces?
Admittedly, these last two questions are academic, because I just need to retry on javax.naming.TimeLimitExceededException, so I am presuming I just need to specify it like this:
<LDAPProperty name="edu.vt.middleware.ldap.operationRetryExceptions" value="{CommunicationException, ServiceUnavailableException, TimeLimitExceededException}" />
Does that sound right?
Thanks!
yuji
----
On Thu, Sep 8, 2011 at 4:18 PM, Yuji Shinozaki <ys...@virginia.edu> wrote:
> So to be explicit about how these properties are specified in the configs, would the following configure the vt-ldap connector to retry failed connections three times with a wait of 300 ms between retries?
>
> <resolver:DataConnector ... >
> ...
> <LDAPProperty name="edu.vt.middleware.ldap.operationRetry" value="3" />
> <LDAPProperty name="edu.vt.middleware.ldap.operationRetryWait" value="300" />
> </resolver:DataConnector>
On Sep 9, 2011, at 9:34 AM, Daniel Fisher wrote:
> Correct. Note that by default the only exceptions that trigger retries are CommunicationException and ServiceUnavailableException.
>
> --Daniel Fisher
----
No, you didn't, you've just changed the subject line.
-peter
> <LDAPProperty name="edu.vt.middleware.ldap.operationRetryExceptions" value="javax.naming.CommunicationException,javax.naming.ServiceUnavailableException,javax.naming.TimeLimitExceededException" />
>
Thanks. That makes more sense. That syntax is unclear in the documentation.
> Are you sure this is what you want? An operation retry closes and reopens the connection, then presumably you'd get the same TimeLimitExceededException again. Note that LimitExceededException is ignored by the search result handler. So even if it occurs, you'll still get any results that were retrieved before the exception.
>
Ok. This may be a matter of upgrading our Shibboleth implementation (it is getting quite old), as the error results in an attribute resolution error bubbling all the way up. We are getting the time limit exceeded exceptions at irregular intervals, but with increasing frequency lately, so I was hoping a properly spaced retry would work around these brown-outs. Our campus LDAP admins have reassured us that they are throwing new hardware at the general problem, but of course that won't be for the proverbial "few weeks".
Thanks for the info.
Yuji
----
P.S. To Peter: sorry about the threading faux pas: I am dumb but not stupid: I was certain that I cut and pasted my reply as a fresh new message, but apparently I was mistaken. I just wish my mailer had better In-Reply-To controls.
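Pulling the pieces from this thread together, a connector that retries
on communication failures, service unavailability, and time limit
errors, waits between attempts, and fails fast on an unresponsive
directory process might look roughly like this. Hosts, DNs, attribute
names, and timeout values are placeholders, and the read-timeout line
assumes the generic JNDI property pass-through discussed earlier.
<resolver:DataConnector
id="myLDAP"
xsi:type="LDAPDirectory" xmlns="urn:mace:shibboleth:2.0:resolver:dc"
ldapURL="ldap://blah.blah.blah"
baseDN="o=blah,c=US"
principal="cn=blah,ou=blabbityblah,o=blahblah,c=US"
principalCredential="blahblah" >
<FilterTemplate>
<![CDATA[
(uid=$requestContext.principalName)
]]>
</FilterTemplate>
<ReturnAttributes>blah1 blah2 blah3</ReturnAttributes>
<!-- Retry up to 3 times, waiting 300 ms between attempts. -->
<LDAPProperty name="edu.vt.middleware.ldap.operationRetry" value="3" />
<LDAPProperty name="edu.vt.middleware.ldap.operationRetryWait" value="300" />
<!-- Fully qualified exception class names, per Daniel's correction above. -->
<LDAPProperty name="edu.vt.middleware.ldap.operationRetryExceptions" value="javax.naming.CommunicationException,javax.naming.ServiceUnavailableException,javax.naming.TimeLimitExceededException" />
<!-- JNDI read timeout in milliseconds (5 seconds here, purely illustrative). -->
<LDAPProperty name="com.sun.jndi.ldap.read.timeout" value="5000" />
</resolver:DataConnector>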