CAS 5.1.6 TGT is destroyed early - but only during high volume


Duane Booher

Jan 18, 2018, 1:22:21 PM
to CAS Community
Hi, we have been running a new production upgrade to CAS 5.1.6 for about a week. Most things are working; however, during our peak login times, our TGT sessions do not last the expected default of two hours, and users have to log in again early. We have a two-host cluster with ehcache enabled.

We are using these defaults, which do keep TGTs alive for up to two hours, but only during low-to-medium-volume login periods.

cas.ticket.tgt.maxTimeToLiveInSeconds=28800
cas.ticket.tgt.timeToKillInSeconds=7200
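
For reference, a rough Python sketch of how these two settings are generally understood to interact: maxTimeToLiveInSeconds acts as a hard cap on ticket age, and timeToKillInSeconds as an idle timeout. The class and function names are illustrative only, not CAS code, and the exact boundary handling is an assumption.

```python
from dataclasses import dataclass

# Values taken from the two properties above.
MAX_TIME_TO_LIVE = 28800  # cas.ticket.tgt.maxTimeToLiveInSeconds (hard cap)
TIME_TO_KILL = 7200       # cas.ticket.tgt.timeToKillInSeconds (idle timeout)


@dataclass
class TgtTimes:
    """Hypothetical holder for the two timestamps that matter (in seconds)."""
    created_at: float
    last_used_at: float


def is_expired(tgt: TgtTimes, now: float) -> bool:
    """Return True if either the hard cap or the idle timeout has passed."""
    if now - tgt.created_at >= MAX_TIME_TO_LIVE:
        return True  # ticket is too old, regardless of activity
    return now - tgt.last_used_at >= TIME_TO_KILL  # idle for too long
```

Neither condition depends on login volume, so a TGT that disappears early only under load suggests looking beyond the expiration policy itself.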

We also get a TICKET_GRANTING_TICKET_DESTROYED for the TGT while the new login authentication is being processed.

Any ideas on possible misconfigured areas, or how best to debug this?

Duane

Ray Bon

Jan 18, 2018, 2:04:36 PM
to cas-...@apereo.org
Duane,

Is the problem the total number of logins or the rate of logins?
Could ehcache be 'filling up'?
I seem to recall that ehcache can be configured with a maximum cache size.

Ray
-- 
Ray Bon
Programmer analyst
Development Services, University Systems
2507218831 | CLE 019 | rb...@uvic.ca

Duane Booher

Jan 19, 2018, 11:46:37 AM
to CAS Community
This was indeed ehcache filling up to its default limit, specifically:

# cas.ticket.registry.ehcache.maxElementsInMemory=10000

I have increased this to 15000, and we have sufficient heap to handle the increase. However, we run a VIP monitor against cas/status, and as soon as we went above 10000 sessions we got this cas/status warning:

1.SessionMonitor: WARN - Session count (11798) is above threshold 10000. 825 service tickets.

This results in the HTTP status code from cas/status going from 200 to 400, which shut down our VIP :-( So I quickly disabled that monitor to keep us running. There must be another threshold that I need to increase. Any ideas?

I will submit another cas/status question just to get greater visibility.

Thanks, Duane

Duane Booher

Jan 19, 2018, 1:01:30 PM
to CAS Community
We ended up modifying our CAS monitor to accept both HTTP 200 and 400 status codes to get around this problem. This is a new prod CAS4 to CAS5 deployment and we have already used up maxElementsInMemory=15000, so we will be going to 20k next.
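
A minimal sketch of that kind of monitor change, written in Python and assuming a simple HTTP probe. The names `HEALTHY_CODES`, `is_healthy`, and `probe` are hypothetical and not part of CAS or of any particular load balancer; the real change would be made in whatever health-check configuration the VIP uses.

```python
import urllib.error
import urllib.request

# Assumption: 400 here only signals a monitor warning (e.g. session count
# above threshold), so the node should stay in the pool.
HEALTHY_CODES = {200, 400}


def is_healthy(status_code: int) -> bool:
    """Treat both a clean status and a warning status as 'up'."""
    return status_code in HEALTHY_CODES


def probe(url: str, timeout: float = 5.0) -> bool:
    """Fetch the status URL and report whether the node counts as healthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx; the code is still available on the error.
        return is_healthy(err.code)
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout: the node is down.
        return False
```

Accepting 400 this broadly also hides any other condition that produces a 400, so it is a stopgap until the warning threshold itself is raised.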

Are there any other CAS system parameters that we should be concerned with and consider increasing?

Duane

Duane Booher

Jan 22, 2018, 8:25:37 AM
to CAS Community
As for the original problem, we seem to have resolved it with:
cas.ticket.registry.ehcache.persistence=LOCALTEMPSWAP

This was originally covered in this posting:
RE: [cas-user] Cas 5 and implementing ehcache replication.

I have also posted in CAS-DEV about the continuing saga once we ran over 15000 sessions.

Duane
