Good morning. Regarding the increase in CPU usage after the upgrade: yes, we have indeed seen this. Yesterday we upgraded to 6.3.5.
From an experience perspective:
Quantitatively:


Has anyone turned off HTTP OPTIONS? When looking into this, we saw two hosts that together made 70k calls to the CAS servers using HTTP OPTIONS. We are considering disabling this.
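For anyone who wants to check their own numbers, here is a rough sketch of how the OPTIONS calls could be tallied per client. It assumes the access log uses the common log format (client address first, request line in quotes); the function name and the sample format are illustrative only, so adjust the pattern to whatever your servers actually write.

```python
import re
from collections import Counter

# Assumption: common log format, for example
# 10.0.0.5 - - [07/Jul/2021:10:12:53 -0400] "OPTIONS /cas/login HTTP/1.1" 200 0
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) ')


def options_calls_by_client(lines):
    """Count HTTP OPTIONS requests per client address (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        # Skip lines that do not parse or that use another HTTP method.
        if match and match.group(2) == "OPTIONS":
            counts[match.group(1)] += 1
    return counts
```

Calling `options_calls_by_client(open(path))` on an access log and then `.most_common(5)` on the result should show which hosts send the most OPTIONS traffic.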
We are still investigating all of this.
Thanks, Jay
________________________________
Jason Rappaport (he/him)
Identity and Access Management Analyst
Office of Information Technology
Email: jaso...@princeton.edu
Office: 609-258-8464
--
- Website: https://apereo.github.io/cas
- Gitter Chatroom: https://gitter.im/apereo/cas
- List Guidelines: https://goo.gl/1VRrw7
- Contributions: https://goo.gl/mh7qDG
---
You received this message because you are subscribed to the Google Groups "CAS Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cas-user+u...@apereo.org.
To view this discussion on the web visit https://groups.google.com/a/apereo.org/d/msgid/cas-user/75e422d7-d51f-4a69-8911-7d5005d4cc12n%40apereo.org.
Juan Quintanilla
Good afternoon. Just to add a bit more information to this.
Today we doubled the CPU and RAM on our on-prem CAS servers; they are now at 4 CPUs and 16 GB of RAM. They are now stable (the timeouts are gone for various services and the metrics endpoints are responding).
Our off-prem CAS servers are running fine with 2 CPUs and 8 GB of RAM; no change was made to them.
Juan – you mentioned Hazelcast; we use that as well, for replicating information from our on-prem CAS servers to the off-prem CAS servers. We have also encountered several instances where the CPU on our off-prem CAS servers was pegged in both our QA and PROD environments. We have little to no traffic on our QA CAS servers, and what is interesting is that both environments (QA and PROD) had pegged CPUs on the same days. When we investigated, we found that the Hazelcast cluster was constantly being reestablished.
I posted a message on the Hazelcast support community https://groups.google.com/g/hazelcast/c/UmB1VzOBm-4 and then talked to the folks at Hazelcast. Basically, what they said was that unless your virtual machine CAS servers are in the same datacenter, do not use the Hazelcast version that comes with CAS. The Hazelcast folks indicated that using TCP/IP to maintain session information between CAS servers that are too distant from each other (like having some CAS servers off-prem and some on-prem) would likely cause issues. They recommended purchasing their Hazelcast enterprise edition (which is really interesting and has a ton of cool features, but is also very expensive), which uses message queuing (MQ) technology instead of relying on TCP/IP to maintain session information. In your logs, look for “Initialized new cluster connection between”; we had 20k of these messages on one of the days when the CPUs were pegged.
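If it helps, here is a minimal sketch for counting that message per day in a log file. It assumes each log line begins with an ISO date such as 2021-07-07; that prefix and the function name are my assumptions, not something CAS or Hazelcast guarantees, so change the pattern to match your own log layout.

```python
import re
from collections import Counter

# Message text quoted from the logs described above.
MARKER = "Initialized new cluster connection between"

# Assumption: lines start with an ISO date (YYYY-MM-DD).
DATE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})")


def count_reconnects_per_day(lines):
    """Count cluster reconnect messages grouped by day (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        match = DATE_RE.match(line)
        # Lines without a recognizable date are grouped under "unknown".
        counts[match.group(1) if match else "unknown"] += 1
    return counts
```

A day with counts in the thousands would be worth lining up against your CPU graphs.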
We asked our networking team about the stability of the connection between campus and our off-campus cloud provider, and they indicated it was stable enough that we would not notice any glitches. That still doesn’t explain why QA and PROD saw pegged CPUs on the same days, as those hosts don’t talk to each other.
So doubling the CPU and RAM on our on-prem CAS servers (which handle only half the authentication traffic of our off-prem CAS servers) seems to keep us stable... for now.
Attached is a screenshot showing our CPU usage. We upgraded CAS on 7/6 (yesterday), which can be seen on the chart where the CPU is averaging about 50%. Today around 12:00pm, we doubled CPU and RAM, rebooted, and now CPU is at 1%. The gap in the data is something that happens sometimes with our data collection tools.

Thanks, Jay
From: cas-...@apereo.org <cas-...@apereo.org> On Behalf Of Juan Quintanilla
Sent: Wednesday, July 7, 2021 1:42 PM
To: CAS Community <cas-...@apereo.org>
Related: I just got the latest shib-cas-authn plugin working on Shibboleth IDP 4.1.2 so we can delegate authentication to CAS (6.3.5). When I do this and try to authenticate, I see the following log messages (45 times), and the response time from CAS is so long that our IDP times out the session.
Jul 8 10:12:53 105W user Loading SAML metadata from [/etc/cas/saml/metadata/sp_metadata.xml]
Jul 8 10:12:53 105W user No metadata signature location is defined for [/etc/cas/saml/metadata/sp_metadata.xml], so SignatureValidationFilter will not be invoked
Jul 8 10:12:53 105W user Initialized metadata resolver from [/etc/cas/saml/metadata/sp_metadata.xml]
Jul 8 10:12:53 105W user SAML metadata resolver [org.opensaml.saml.metadata.resolver.ChainingMetadataResolver] obtained from the cache is unable to produce/resolve valid metadata [/etc/cas/saml/metadata/sp_metadata.xml]. Metadata resolver cache entry with key [f2317……] has been invalidated. Retry attempt: [1]
Has anyone seen this before?