Morning all,
Last week we started seeing the below [anonymised] error (and suffixed stack trace) for some users during logon:
Oct 01 10:40:43 simplesamlphp ERROR [6447140dca] Library - LDAP search(): Failed search on base 'o=uni' for '(|(uid=LOGIN)(mail=LOGIN))'; cause: 'Can't contact LDAP server' (0xffffffff)
This doesn't happen to everyone but is often hitting the same person repeatedly and will then, suddenly, work.
Our IdP implementation is as follows:
* 2 x Solaris 10 running
- non-standard Apache 2.2.21 + PHP 5.3.8 as mod_php
- SimpleSAMLphp 1.9.2
- LDAP authsource configured as LDAPS
- memcached 1.4.10 for session sharing
- Sun LDAP Sun-Java(tm)-System-Directory/6.3.1 B2008.1121.0156 (64-bit)
- Load balanced behind a pair of Linux LVS servers in Direct Routing mode with hash based sticky routing
The LDAP instances are on the same Solaris instances and aren't exhibiting any other problems (they are also Load Balanced and the load balancer service checks are not showing any problems) nor are they logging anything obviously bad in their logs.
More details available if it would be helpful.
It started on 25th September and we've not seen it before (grep -c 0xffff simplesamlphp.log.2012-09-2* across both nodes):
simplesamlphp.log.2012-09-20:0 simplesamlphp.log.2012-09-20:0
simplesamlphp.log.2012-09-21:0 simplesamlphp.log.2012-09-21:0
simplesamlphp.log.2012-09-22:0 simplesamlphp.log.2012-09-22:0
simplesamlphp.log.2012-09-23:0 simplesamlphp.log.2012-09-23:0
simplesamlphp.log.2012-09-24:0 simplesamlphp.log.2012-09-24:0
simplesamlphp.log.2012-09-25:219 simplesamlphp.log.2012-09-25:14
simplesamlphp.log.2012-09-26:63 simplesamlphp.log.2012-09-26:143
simplesamlphp.log.2012-09-27:5 simplesamlphp.log.2012-09-27:325
simplesamlphp.log.2012-09-28:149 simplesamlphp.log.2012-09-28:36
simplesamlphp.log.2012-09-29:198 simplesamlphp.log.2012-09-29:0
The only thing that "changed" in relation to any of these systems is that memcached was incrementally restarted (to increase available memory from 32M to 256M) across both nodes but >8h (max session time) apart and we have, since, restarted Apache and LDAP on both nodes to rule out "long running oddities".
Any suggestions would be greatly appreciated!
Full example session log:
# grep 6447140dca simplesamlphp.log
Oct 01 10:40:38 simplesamlphp INFO [6447140dca] SAML2.0 - IdP.SSOService: Accessing SAML 2.0 IdP endpoint SSOService
Oct 01 10:40:39 simplesamlphp INFO [6447140dca] SAML2.0 - IdP.SSOService: Incomming Authentication request: '
https://jordan.kent.ac.uk/_sp'
Oct 01 10:40:43 simplesamlphp ERROR [6447140dca] SimpleSAML_Error_Exception: Error 2 - ldap_search() [<a href='function.ldap-search'>function.ldap-search</a>]: Search: Can't contact LDAP server
Oct 01 10:40:43 simplesamlphp ERROR [6447140dca] Backtrace:
Oct 01 10:40:43 simplesamlphp ERROR [6447140dca] 8 /www/lib/simplesamlphp-1.9.2/www/_include.php:70 (SimpleSAML_error_handler)
Oct 01 10:40:43 simplesamlphp ERROR [6447140dca] 7 [builtin] (ldap_search)
Oct 01 10:40:43 simplesamlphp ERROR [6447140dca] 6 /www/lib/simplesamlphp-1.9.2/lib/SimpleSAML/Auth/LDAP.php:203 (SimpleSAML_Auth_LDAP::search)
Oct 01 10:40:43 simplesamlphp ERROR [6447140dca] 5 /www/lib/simplesamlphp-1.9.2/lib/SimpleSAML/Auth/LDAP.php:264 (SimpleSAML_Auth_LDAP::searchfordn)
Oct 01 10:40:43 simplesamlphp ERROR [6447140dca] 4 /www/lib/simplesamlphp-1.9.2/modules/ldap/lib/ConfigHelper.php:187 (sspmod_ldap_ConfigHelper::login)
Oct 01 10:40:43 simplesamlphp ERROR [6447140dca] 3 /www/lib/simplesamlphp-1.9.2/modules/ldap/lib/Auth/Source/LDAP.php:52 (sspmod_ldap_Auth_Source_LDAP::login)
Oct 01 10:40:43 simplesamlphp ERROR [6447140dca] 2 /www/lib/simplesamlphp-1.9.2/modules/core/lib/Auth/UserPassBase.php:176 (sspmod_core_Auth_UserPassBase::handleLogin)
Oct 01 10:40:43 simplesamlphp ERROR [6447140dca] 1 /www/lib/simplesamlphp-1.9.2/modules/core/www/loginuserpass.php:49 (require)
Oct 01 10:40:43 simplesamlphp ERROR [6447140dca] 0 /www/lib/simplesamlphp-1.9.2/www/module.php:135 (N/A)
Oct 01 10:40:43 simplesamlphp ERROR [6447140dca] Library - LDAP search(): Failed search on base 'o=uni' for '(|(uid=LOGIN)(mail=LOGIN))'; cause: 'Can't contact LDAP server' (0xffffffff)
Oct 01 10:40:55 simplesamlphp NOTICE STAT [6447140dca] saml20-idp-SSO-first
https://jordan.kent.ac.uk/_sp https://sso.id.kent.ac.uk/idp NA
Oct 01 10:40:55 simplesamlphp NOTICE STAT [6447140dca] saml20-idp-SSO
https://jordan.kent.ac.uk/_sp https://sso.id.kent.ac.uk/idp NA
Oct 01 10:40:55 simplesamlphp INFO [6447140dca] Sending SAML 2.0 Response to '
https://jordan.kent.ac.uk/_sp'
--
Matthew Slowe
Server Infrastructure Team e:
m.s...@kent.ac.uk
IS, University of Kent t:
+44 (0)1227 824265
Canterbury, UK w:
www.kent.ac.uk