Sporadic issues with authentication stopping

Ben Branch

Aug 2, 2016, 2:41:40 PM
to cas-...@apereo.org

Hello All,

 

Over the course of the last several months we have started to notice these errors more frequently in our environment:

 

org.springframework.ldap.UncategorizedLdapException: Uncategorized exception occured during LDAP processing; nested exception is javax.naming.NamingException: LDAP response read timed out, timeout used:3000ms.; remaining name 'dc=xxx,dc=xxxx'

 

org.springframework.ldap.ServiceUnavailableException: xxxxx.xxxx.xxxxx:636; socket closed; nested exception is javax.naming.ServiceUnavailableException: xxxx.xxxx.xxxx:636; socket closed; remaining name 'dc=xxx,dc=xxx'

 

org.springframework.ldap.CommunicationException: Connection timed out; nested exception is javax.naming.CommunicationException: Connection timed out [Root exception is java.net.SocketException: Connection timed out]; remaining name 'dc=xxx,dc=xxx'

 

 

When we get these errors, all authentication comes to a halt, which is expected given the error messages. We moved our AD environment behind a new hardware load balancer in the hope that this would resolve the issue, but it has not.

After much thought, I began to suspect an LDAP pooling issue. I reviewed the Spring LDAP pooling configuration documentation, which says we should see a NoSuchElementException in the logs when the pool has been exhausted, but we do not see that. My AD admin does not see any issues on his side when the errors occur, and our network team does not see any issues between the load balancer and AD either. I thought the load balancer might be blocking connections, but when I run `netstat`, I see the expected number ($minIdle) of connections back to AD, and they all show a state of Established.

I am in the process of rolling out our HA configuration to see if it helps, but I'm concerned that this will only lead to a 50% failure rate in authentications when the error occurs (one node failing to connect back to AD while the other may still be able to connect). While I understand this is better than 0% authentication, it still concerns me very much. My only resolution right now is to restart services. I'm at a pretty big loss as to where else to look, and I feel like I'm running out of avenues to explore. Any help would be appreciated.
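One way to separate a pooling problem from a network or directory problem would be to probe the directory from the CAS host outside of CAS, using the same 3000 ms connect and read timeouts. The following is only a minimal sketch under stated assumptions: the class name `LdapProbe`, the command-line arguments, and the bind and base DNs are hypothetical and not taken from the poster's setup. The two `com.sun.jndi.ldap.*.timeout` keys are the standard JNDI LDAP provider properties. An `ldaps://` URL also assumes the directory's certificate is trusted by the JVM running the probe.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;

// Hypothetical standalone probe; not part of CAS or Spring LDAP.
final class LdapProbe {

    // Builds a JNDI environment that mirrors the timeouts in cas.properties.
    static Hashtable<String, String> buildEnv(String url, String bindDn, String password,
                                              int connectMs, int readMs) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, url);
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, bindDn);
        env.put(Context.SECURITY_CREDENTIALS, password);
        env.put("com.sun.jndi.ldap.connect.timeout", Integer.toString(connectMs));
        env.put("com.sun.jndi.ldap.read.timeout", Integer.toString(readMs));
        return env;
    }

    // Opens a fresh (unpooled) connection, reads the base entry, and returns elapsed ms.
    static long probeOnce(Hashtable<String, String> env, String baseDn) throws NamingException {
        long start = System.nanoTime();
        DirContext ctx = new InitialDirContext(env);
        try {
            SearchControls controls = new SearchControls();
            controls.setSearchScope(SearchControls.OBJECT_SCOPE);
            ctx.search(baseDn, "(objectClass=*)", controls).close();
        } finally {
            ctx.close();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        if (args.length < 4) {
            System.err.println("usage: java LdapProbe <ldaps-url> <bind-dn> <password> <base-dn>");
            return;
        }
        Hashtable<String, String> env = buildEnv(args[0], args[1], args[2], 3000, 3000);
        try {
            System.out.println("OK in " + probeOnce(env, args[3]) + " ms");
        } catch (NamingException e) {
            System.out.println("FAILED: " + e);
        }
    }
}
```

If a fresh connection like this succeeds while CAS is failing, that would point toward the pooled connections rather than AD or the load balancer; if it fails the same way, the pool is probably not the cause.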

 

CAS Version: 3.5.2 + LPPE

OS: RHEL 6.8

JAVA: OpenJDK 1.7

JAVA App Server: Tomcat 6.0.24 (Official RHEL version)

 

LDAP Configuration Options from cas.properties file:

 

#LDAP Properties
ldap.pool.minIdle=3
ldap.pool.maxIdle=5
ldap.pool.maxSize=10

# Maximum time in ms to wait for connection to become available
# under pool exhausted condition.
ldap.pool.maxWait=10000

# Period in ms at which evictor process runs.
ldap.pool.evictionPeriod=600000

# Maximum time in ms at which connections can remain idle before
# they become liable to eviction.
ldap.pool.idleTime=1200000

# Set to true to enable connection liveliness testing on evictor
# process runs.  Probably results in best performance.
ldap.pool.testWhileIdle=true

# Set to true to enable connection liveliness testing before every
# request to borrow an object from the pool.
ldap.pool.testOnBorrow=false

# LDAP Search Results Exception
ldap.authentication.ignorePartialResultException=true

# LDAP Base Environment Properties
ldap.authentication.jndi.connect.timeout=3000
ldap.authentication.jndi.read.timeout=3000
ldap.authentication.jndi.security.level=simple

# Policy Enforcement
ldap.authentication.lppe.warnAll=false
ldap.authentication.lppe.dateFormat=AD
ldap.authentication.lppe.dateAttribute=pwdLastSet
ldap.authentication.lppe.warningDaysAttribute
ldap.authentication.lppe.validDaysAttribute=maxPwdAge
ldap.authentication.lppe.warningDays=14
ldap.authentication.lppe.validDays=90
ldap.authentication.lppe.noWarnAttribute=
ldap.authentication.lppe.noWarnValues=
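Read together, the pool settings above mean that idle connections are tested only when the evictor runs (every 600000 ms) and are not tested when borrowed. The sketch below just makes that arithmetic explicit. It is an illustration, not a diagnosis: the class and method names are hypothetical, and it assumes the evictor checks every idle connection on each run. A pool that checks only a few connections per run would leave a longer window.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper; only reads the ldap.pool.* keys shown in cas.properties above.
final class PoolValidationWindow {

    // Longest time in ms an idle connection can go without a liveness test:
    // 0 if every borrow is tested, the eviction period if only the evictor
    // tests, and -1 if nothing tests connections at all.
    static long unvalidatedWindowMs(Properties p) {
        boolean onBorrow = Boolean.parseBoolean(p.getProperty("ldap.pool.testOnBorrow", "false"));
        boolean whileIdle = Boolean.parseBoolean(p.getProperty("ldap.pool.testWhileIdle", "false"));
        long evictionPeriod = Long.parseLong(p.getProperty("ldap.pool.evictionPeriod", "-1"));
        if (onBorrow) {
            return 0;
        }
        if (whileIdle && evictionPeriod > 0) {
            return evictionPeriod;
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(
                "ldap.pool.evictionPeriod=600000\n"
              + "ldap.pool.testWhileIdle=true\n"
              + "ldap.pool.testOnBorrow=false\n"));
        System.out.println("Unvalidated window (ms): " + unvalidatedWindowMs(p));
    }
}
```

With the values from this post the window is 600000 ms, so a connection that goes bad between evictor runs could be handed to a request without being checked first. Whether that is what happens here is not established anywhere in this thread.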

 

 

Ben Branch
UNIX/Linux Administrator

University of Central Oklahoma

ITIL Foundation v3, Network+, RHCE

100 N. University Drive, Box 122

Edmond, OK 73034

D: 405.974.2649 | M: 405.550.6804 | bbranch@uco.edu | www.uco.edu

 

“I am wiser than this man, for neither of us appears to know anything great and good; but he fancies he knows something, although he knows nothing; whereas I, as I do not know anything, so I do not fancy I do. In this trifling particular, then, I appear to be wiser than he, because I do not fancy I know what I do not know.” - Socrates

 

Jeffrey Wong

Aug 22, 2016, 4:47:17 PM
to jasig-cas-user, cas-...@apereo.org, BBr...@uco.edu
Hey Ben,

I don't have answers, but I've seen something similar in my CAS install as well.

Updating to 4.2 helped resolve the listed exception (I no longer see it in the logs), but I'm now having other issues with memory management every 2-3 weeks. All of these issues require a server reload to (temporarily) resolve.

Is there any pattern that you're seeing for when the issues occur?

Riley Wills

Apr 7, 2017, 10:33:19 PM
to CAS Community
Ben,

What was the solution to your issue? We are encountering similar issues with our configuration.

- Riley Wills
