Feb 3 20:42:22 ip-192-168-66-56 ssp-idp[6517]: 3 [TR21b12e19] SimpleSAML_Error_Error: MEMCACHEDOWN
Feb 3 20:42:22 ip-192-168-66-56 ssp-idp[6517]: 3 [TR21b12e19] Backtrace:
Feb 3 20:42:22 ip-192-168-66-56 ssp-idp[6517]: 3 [TR21b12e19] 7 /var/simplesamlphp/lib/SimpleSAML/Memcache.php:116 (SimpleSAML_Memcache::get)
Feb 3 20:42:22 ip-192-168-66-56 ssp-idp[6517]: 3 [TR21b12e19] 6 /var/simplesamlphp/lib/SimpleSAML/Store/Memcache.php:42 (SimpleSAML_Store_Memcache::get)
Feb 3 20:42:22 ip-192-168-66-56 ssp-idp[6517]: 3 [TR21b12e19] 5 /var/simplesamlphp/lib/SimpleSAML/SessionHandlerStore.php:52 (SimpleSAML_SessionHandlerStore::loadSession)
Feb 3 20:42:22 ip-192-168-66-56 ssp-idp[6517]: 3 [TR21b12e19] 4 /var/simplesamlphp/lib/SimpleSAML/Session.php:325 (SimpleSAML_Session::getSession)
Feb 3 20:42:22 ip-192-168-66-56 ssp-idp[6517]: 3 [TR21b12e19] 3 /var/simplesamlphp/lib/SimpleSAML/Session.php:245 (SimpleSAML_Session::getSessionFromRequest)
Feb 3 20:42:22 ip-192-168-66-56 ssp-idp[6517]: 3 [TR21b12e19] 2 /var/simplesamlphp/lib/SimpleSAML/Auth/Simple.php:54 (SimpleSAML_Auth_Simple::isAuthenticated)
Feb 3 20:42:22 ip-192-168-66-56 ssp-idp[6517]: 3 [TR21b12e19] 1 /var/simplesamlphp/modules/core/www/authenticate.php:34 (require)
There were no MEMCACHEDOWN errors (but we didn't shut down all the
memcache servers).
> Did you set memcache.allow_failover = Off in php.ini?
No, I did not. phpinfo() shows it is set to 1. When I noticed that I
hadn't turned it off, I assumed it wouldn't matter, since there is
only one server in each server group and SSP handles the failover
itself. Is that the problem?
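
For reference, a rough sketch of the memcache layout described above,
as it would appear in config/config.php. The hostnames are
placeholders and the order of the groups is an assumption; the point
is only that each server sits in its own group:

```php
<?php
// Sketch only: hostnames are hypothetical placeholders, not real ones.
$config = array(
    // Each inner array(...) is one server group. SSP replicates
    // session data to every group, so any group can serve a read.
    'memcache_store.servers' => array(
        array(array('hostname' => 'old1.example.org')), // old server 1
        array(array('hostname' => 'old2.example.org')), // old server 2
        array(array('hostname' => 'new1.example.org')), // current SSP server
        array(array('hostname' => 'new2.example.org')), // current SSP server
    ),
);
```

With one server per group, memcache.allow_failover in php.ini has no
second server inside a group to fail over to, which is why I assumed
the setting was irrelevant here.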
> How long did you run all 4 memcache servers? Maybe the users that had
> issues had their sessions stored prior to you adding new servers?
Many days. The two new servers, which are now the only ones in the
load balancer for SSO, have been running for months, so both should
hold all sessions if they are set up properly.
As I said, when we turned off one old server (only used in the
memcache config) nobody noticed. When we turned off the second, users
started reporting problems. I would have expected SSP to fail over to
the third and fourth servers, which were still running, as they are
the current SSP servers themselves.