LDAP Failover


Kostas Kalevras

Nov 7, 2024, 6:18:05 AM
to CAS Community
Hello team

I am trying to understand how LDAP failover works and where I am in error in my configuration.

I am using CAS 6.6.15. I have a test Docker Compose setup with one CAS server and two LDAP servers (one "primary" and the other "secondary").

Relevant config:
cas.authn.ldap[0].ldap-url=ldap://ldap1:389 ldap://ldap2:389
cas.authn.ldap[0].connection-strategy=ACTIVE_PASSIVE

My process is as follows:
  1. Start the docker compose stack
  2. Perform an auth on CAS. I observe traffic on ldap1
  3. Inside the CAS container run route add -host <ldap1 ip> reject
  4. A telnet to ldap1 389 now returns an error as expected
  5. Perform an auth on CAS. After a few seconds I get authenticated and observe traffic on ldap2
  6. Perform an auth on CAS again. This time everything happens very fast with no problems. So far so good!
  7. Now for the main issue: Delete the route with route del -host <ldap1 ip> reject
  8. Now I can telnet to ldap1 389
  9. Yet no matter what I do, how much I wait, CAS will keep on using only ldap2
  10. I tried configuring the cas.monitor.ldap settings and explicitly setting cas.authn.ldap[0].connect-timeout=PT5S, to no avail
I seem to be stuck with failover working well when the primary LDAP server goes offline but not when it comes back online.
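
Editorial sketch: the toy model below illustrates what an active/passive ordering means at the moment a new connection is opened. The first URL in the configured list that accepts a connection is used. It is not ldaptive's code; the function name `pick_active_passive` and the `can_connect` callback are hypothetical, introduced only for illustration.

```python
from typing import Callable, Iterable, Optional


def pick_active_passive(
    urls: Iterable[str], can_connect: Callable[[str], bool]
) -> Optional[str]:
    """Return the first URL, in configured order, that accepts a connection."""
    for url in urls:
        if can_connect(url):
            return url
    return None  # no server reachable


# Same URL order as in the ldap-url property above.
URLS = ["ldap://ldap1:389", "ldap://ldap2:389"]

# Simulate the reject route: ldap1 is unreachable.
down = {"ldap://ldap1:389"}
chosen_during_outage = pick_active_passive(URLS, lambda u: u not in down)

# Simulate deleting the route: ldap1 is reachable again.
down.clear()
chosen_after_recovery = pick_active_passive(URLS, lambda u: u not in down)
```

In this model a brand-new connection made after recovery goes to ldap1 again. The selection only runs when a connection is opened, so it says nothing about connections that are already open.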

Am I missing something here?

Thanks for any help

Ray Bon

Nov 7, 2024, 9:53:22 PM
to cas-...@apereo.org
Kostas,

An alternative option may be to add a load balancer in front of ldap.

Ray

On Thu, 2024-11-07 at 00:32 -0800, Kostas Kalevras wrote:

Fisher, Daniel

Nov 7, 2024, 9:53:32 PM
to cas-...@apereo.org
The behavior you’re seeing is consistent with how this functionality works.
The code attempts to keep working connections available and ensures the minimum pool size is correct.
When ldap1 is available you should expect *new* connections to be created using that directory.
The connection to ldap2 may be removed from the pool when it has been idle too long and the number of connections in the pool exceeds the minimum.
Of course if ldap2 becomes unavailable, it will be removed from the pool.

tl;dr the connection to ldap2 will naturally cycle out of the pool over time as the pool grows and shrinks based on load.
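
Editorial sketch: the behavior described above can be modeled with a toy pool. This is a simplified illustration under stated assumptions, not ldaptive's implementation. The `ToyPool` and `Conn` classes, the field names, and the numeric limits are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Conn:
    url: str
    idle_seconds: int = 0


@dataclass
class ToyPool:
    urls: list        # ordered: active first, passive after
    min_size: int     # pool never shrinks below this
    idle_limit: int   # seconds of idleness before a connection may be pruned
    reachable: set    # URLs that currently accept connections
    conns: list = field(default_factory=list)

    def open_new(self) -> Conn:
        """New connections follow the active/passive order."""
        for url in self.urls:
            if url in self.reachable:
                conn = Conn(url)
                self.conns.append(conn)
                return conn
        raise ConnectionError("no LDAP URL reachable")

    def prune(self) -> None:
        """Drop idle connections, but never shrink below min_size."""
        for conn in list(self.conns):
            if len(self.conns) <= self.min_size:
                break
            if conn.idle_seconds >= self.idle_limit:
                self.conns.remove(conn)


pool = ToyPool(urls=["ldap1", "ldap2"], min_size=1, idle_limit=600,
               reachable={"ldap2"})
pool.open_new()                      # during the outage: connects to ldap2

pool.reachable.add("ldap1")          # ldap1 comes back
pool.prune()                         # pool is at min size, nothing is removed
urls_after_recovery = [c.url for c in pool.conns]

pool.open_new()                      # load grows: the new connection uses ldap1
pool.conns[0].idle_seconds = 900     # the ldap2 connection has gone idle
pool.prune()
urls_after_prune = [c.url for c in pool.conns]
```

In this model the working ldap2 connection survives the recovery of ldap1 because nothing marks it as unhealthy. It leaves only once the pool has grown past its minimum and the connection has been idle long enough.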

There is a new feature in development that will allow you to configure a max connection age; I expect that to be available in a future CAS release.

—Daniel Fisher

kkalev

Nov 8, 2024, 2:25:14 AM
to cas-...@apereo.org
Thanks a lot for the clarification. The planned feature will certainly be nice in order to be certain that the "passive" LDAP server will be cycled out after a certain period of time. Cycling out based on idle timeout will be hard in servers with heavy load.

I think it would be helpful to add what you wrote to the documentation for the connection-strategy LDAP option, since it is a bit counterintuitive (at least with respect to what one would expect from an "ACTIVE_PASSIVE" label). I also went through the ldaptive documentation and could not find any helpful hint there either.

The connection strategies should work as expected when not using a connection pool (and performing a new connection for every request). I guess that I was expecting automatic failover to the "ACTIVE" LDAP server due to the validation functionality of the ldaptive connection pooling: https://www.ldaptive.org/docs/guide/connections/pooling.html

--
- Website: https://apereo.github.io/cas
- List Guidelines: https://goo.gl/1VRrw7
- Contributions: https://goo.gl/mh7qDG
---
You received this message because you are subscribed to a topic in the Google Groups "CAS Community" group.
To unsubscribe from this topic, visit https://groups.google.com/a/apereo.org/d/topic/cas-user/390ZR4y345c/unsubscribe.
To unsubscribe from this group and all its topics, send an email to cas-user+u...@apereo.org.
To view this discussion visit https://groups.google.com/a/apereo.org/d/msgid/cas-user/D07463B9-4556-4DB0-ABFB-9E297CF2CFBE%40vt.edu.

Fisher, Daniel

Nov 8, 2024, 8:35:39 PM
to cas-...@apereo.org
On Nov 8, 2024, at 1:05 AM, kkalev <kka...@gmail.com> wrote:

Thanks a lot for the clarification. The planned feature will certainly be nice in order to be certain that the "passive" LDAP server will be cycled out after a certain period of time. Cycling out based on idle timeout will be hard in servers with heavy load.


You can’t prune connections during heavy load. If your entire pool is occupied with servicing requests you should expect those connections to remain until the load subsides. That is, the implementation only considers _available_ connections for pruning. Not sure if that’s what you meant, but I wanted to clarify.
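
Editorial sketch: the point that only available connections are candidates for pruning can be illustrated as follows. This is a toy model, not the ldaptive prune strategy. The function `prune_idle`, the dictionary layout, and the assumption that the minimum is compared against the total pool size are all hypothetical.

```python
def prune_idle(available, in_use, min_size, idle_limit):
    """Return (kept_available, pruned).

    Connections in `in_use` are never candidates, however long they have
    been open. Assumes min_size applies to available + in-use connections.
    """
    kept, pruned = [], []
    total = len(available) + len(in_use)
    for conn in available:
        if total > min_size and conn["idle"] >= idle_limit:
            pruned.append(conn)
            total -= 1
        else:
            kept.append(conn)
    return kept, pruned


# Heavy load: every connection is checked out, so nothing can be pruned.
busy = [{"url": "ldap2", "idle": 0} for _ in range(3)]
kept_under_load, pruned_under_load = prune_idle([], busy, min_size=1,
                                                idle_limit=600)

# Load subsides: the same connections are returned and eventually go idle.
idle = [{"url": "ldap2", "idle": 700} for _ in range(3)]
kept_after_load, pruned_after_load = prune_idle(idle, [], min_size=1,
                                                idle_limit=600)
```

Under these assumptions the pool keeps all three ldap2 connections while they are busy, and shrinks back to the minimum only after they have been returned and left idle.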

I think it would be helpful to add what you wrote to the documentation for the connection-strategy LDAP option, since it is a bit counterintuitive (at least with respect to what one would expect from an "ACTIVE_PASSIVE" label). I also went through the ldaptive documentation and could not find any helpful hint there either.


I’ll take a pass over the documentation; it can always be improved. Note that the connectionStrategy and the pruneStrategy are currently decoupled. The way that connections are created has no bearing on how they are selected for removal.

The connection strategies should work as expected when not using a connection pool (and performing a new connection for every request). I guess that I was expecting automatic failover to the "ACTIVE" LDAP server due to the validation functionality of the ldaptive connection pooling: https://www.ldaptive.org/docs/guide/connections/pooling.html


Can you elaborate a bit more on how you would prefer this type of feature to work? Specifically, would you want connections made to “passive” URLs to be pruned immediately or should they just be the first connections that are pruned if their idle timeout has occurred?

—Daniel Fisher

kkalev

Nov 13, 2024, 5:23:39 AM
to cas-...@apereo.org
On Sat, 9 Nov 2024 at 03:35, Fisher, Daniel <dfi...@vt.edu> wrote:

On Nov 8, 2024, at 1:05 AM, kkalev <kka...@gmail.com> wrote:

Thanks a lot for the clarification. The planned feature will certainly be nice in order to be certain that the "passive" LDAP server will be cycled out after a certain period of time. Cycling out based on idle timeout will be hard in servers with heavy load.


You can’t prune connections during heavy load. If your entire pool is occupied with servicing requests you should expect those connections to remain until the load subsides. That is, the implementation only considers _available_ connections for pruning. Not sure if that’s what you meant, but I wanted to clarify.

Sorry for the delayed response!

If the "active" server went offline, we transitioned to "passive", and then "active" came back online, one could argue that we want to actually use the active server and not keep connections to passive until each one reaches an idle timeout. Transitioning would be a problem if the active server was not available. I understand that this is a borderline philosophical discussion and not a purely technical one.

Imagine that the "active" server is our write master while our "passive" is a read-only replica with limited resources that sometimes lags behind the master. So we would like to always use the "active" one if possible and use the "passive" only during outages, with automatic and relatively quick failover back to the primary once it is online again.
 

I think it would be helpful to add what you wrote to the documentation for the connection-strategy LDAP option, since it is a bit counterintuitive (at least with respect to what one would expect from an "ACTIVE_PASSIVE" label). I also went through the ldaptive documentation and could not find any helpful hint there either.


I’ll take a pass over the documentation; it can always be improved. Note that the connectionStrategy and the pruneStrategy are currently decoupled. The way that connections are created has no bearing on how they are selected for removal.

The connection strategies should work as expected when not using a connection pool (and performing a new connection for every request). I guess that I was expecting automatic failover to the "ACTIVE" LDAP server due to the validation functionality of the ldaptive connection pooling: https://www.ldaptive.org/docs/guide/connections/pooling.html


Can you elaborate a bit more on how you would prefer this type of feature to work? Specifically, would you want connections made to “passive” URLs to be pruned immediately or should they just be the first connections that are pruned if their idle timeout has occurred?

If active and passive are both available it might be preferable to "open a new connection to active, prune one connection to passive until no more connections to passive".
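
Editorial sketch: the one-for-one replacement suggested above could look roughly like this. It models the poster's proposal only; it is not an existing CAS or ldaptive feature, and `fail_back` and `can_connect` are hypothetical names.

```python
def fail_back(conns, active_url, can_connect):
    """Replace passive connections one at a time while the active URL is reachable.

    `conns` is a list of the URLs that pooled connections point at.
    Returns a new list; the input is left unchanged.
    """
    result = list(conns)
    for i, url in enumerate(result):
        if url == active_url:
            continue
        if not can_connect(active_url):
            break  # active went away again: keep the remaining passive connections
        # Open a new connection to active, then drop one connection to passive.
        result[i] = active_url
    return result


pool_urls = ["ldap2", "ldap2", "ldap1"]
after_recovery = fail_back(pool_urls, "ldap1", lambda u: True)
during_outage = fail_back(pool_urls, "ldap1", lambda u: False)
```

Opening the replacement before closing the passive connection keeps the pool size constant, and stopping as soon as the active server stops responding avoids emptying the pool during a flapping outage.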
 

—Daniel Fisher
