SAML functions very slow


Hayden Sartoris

Feb 27, 2020, 10:21:43 AM
to CAS Community
Morning,

I'm running a CAS 6 server that's kept pretty aggressively up to date. Since ~November 2019, SAML
functionality has been very slow, as has the /cas/samlValidate endpoint. I suspect XML parsing and
serialization is to blame, but I'm hard pressed to identify exactly where or how.

The server generally takes either ~7 or ~14 seconds to serve a response, and one CPU core is usually
maxed out while processing. Has anyone else run into this issue? It's making SAML integrations nigh
unusable.

Best,
Hayden Sartoris

bcolly

Feb 27, 2020, 11:44:19 AM
to CAS Community
Yes, I am seeing the same delays with CAS as a SAML SP.
Thanks for mentioning this.

Hayden Sartoris

Feb 27, 2020, 3:49:03 PM
to CAS Community
Sort of glad to hear that other people have this problem.

I've narrowed it down to AbstractSamlObjectBuilder, in org.apereo.cas.support.saml.util. Specifically, in constructDocumentFromXml, JDOM SAXBuilder is used to deserialize a String containing the XML data into a JDOM Document object. I have a local development instance, and I've tried a lot of things to speed this up, including disabling validation in every way possible, specifying a Xerces parser, and upgrading from JDOM 1.1 to 2.0.6. No matter what I do, the call to SAXBuilder.build(String xmlString) takes either ~6.5 seconds or almost no time at all, and very rarely anything in between.

I need to hook this up to a debugger and break during execution or something, but I don't have an appropriate Java development environment handy. This is pretty ridiculous; we're talking about ~440 characters of XML taking nearly seven seconds to parse.
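
For anyone who wants to check whether parsing a small XML string is slow on its own, outside CAS, here is a minimal timing sketch. It is an assumption-laden stand-in rather than the CAS code path: it uses the JDK's built-in DOM parser instead of JDOM's SAXBuilder, and the class name XmlParseTimer, its method names, and the sample payload are all hypothetical. If this runs in a few milliseconds on the same host, the delay is more likely coming from the surrounding application than from XML parsing as such.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlParseTimer {

    // Parse an XML string with the JDK's built-in DOM parser.
    // Assumption: this is NOT the JDOM SAXBuilder call used by CAS.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Time several consecutive parses; returns elapsed milliseconds per run.
    static long[] timeParseMillis(String xml, int runs) throws Exception {
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            parse(xml);
            elapsed[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return elapsed;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical placeholder payload; substitute the real SAML XML.
        String xml = "<Response xmlns=\"urn:example:placeholder\">"
                + "<Status>ok</Status></Response>";
        long[] times = timeParseMillis(xml, 5);
        for (int i = 0; i < times.length; i++) {
            System.out.println("run " + i + ": " + times[i] + " ms");
        }
    }
}
```

Running it several times in a row is the point: the first run includes parser start-up cost, so only the later runs are comparable to the "~6.5 seconds or almost no time" pattern described above.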

Hayden Sartoris

Feb 28, 2020, 5:10:29 PM
to CAS Community
Update: I'm not really sure why, but changing my deployment totally solved this issue, as well as other general sluggishness. I was deploying using the Spring Boot Embedded Tomcat instance, but switching to deploying to an external Tomcat instance with no embedded server has drastically increased performance. Consider trying that if you're having the same issue.

John Bond

Mar 11, 2020, 6:33:18 AM
to CAS Community

We have also observed this slowdown running CAS 6.1.*. We have been tracking our troubleshooting progress[1] but so far have not found anything concrete. However, my colleague has tracked down one pause to the following part of the spring-webflow code


We will attempt to move to an external Tomcat instance and see if that resolves the issue.


Hayden Sartoris

Mar 11, 2020, 7:53:26 AM
to cas-...@apereo.org
Interesting; I haven't had any such issues with my global principal attribute predicate script, but the delay times are similar. Also of note is that the suspicious code you've isolated, like mine, has to do with string processing (or so it seems at first blush).

Another testing route I took was running the application with the gradle bootRun task; this is what I had initial success with before moving to external Tomcat. You may need to reconfigure your gradle/springboot.gradle or gradle/tasks.gradle, I don't recall which, in order to get this task to work.

Good luck; I'll keep an eye on your troubleshooting.

--
- Website: https://apereo.github.io/cas
- Gitter Chatroom: https://gitter.im/apereo/cas
- List Guidelines: https://goo.gl/1VRrw7
- Contributions: https://goo.gl/mh7qDG
---
You received this message because you are subscribed to a topic in the Google Groups "CAS Community" group.
To unsubscribe from this topic, visit https://groups.google.com/a/apereo.org/d/topic/cas-user/iMwglmoMBPc/unsubscribe.
To unsubscribe from this group and all its topics, send an email to cas-user+u...@apereo.org.
To view this discussion on the web visit https://groups.google.com/a/apereo.org/d/msgid/cas-user/226b9165-d3ea-4f2f-8dd0-ddabe860968c%40apereo.org.
--
Hayden Sartoris
Systems Administrator
Bard College IT
(he/him/his)

John Bond

Mar 24, 2020, 9:10:15 AM
to CAS Community

Following up on this thread, it seems we have managed to reduce the lag on our infrastructure by adding the following to /etc/cas/config/cas.properties:

  spring.autoconfigure.exclude=org.springframework.boot.autoconfigure.web.embedded.EmbeddedWebServerFactoryCustomizerAutoConfiguration

I'm unsure why this fixed the issue; however, I came across the suggestion while attempting to configure a standalone WAR to work with an external Tomcat instance and hitting an error regarding a missing method.


Adding the above config fixed the issue with the external instance of Tomcat; however, it also significantly reduced the lag we observed when using the embedded WAR. If anyone is able to provide insight into why this config parameter helped, I would be interested.


Thanks

Hayden Sartoris

Mar 24, 2020, 12:03:31 PM
to cas-...@apereo.org
Successfully reproduced this here.



This result strongly suggests that either the configuration specified by this class or the very existence of the tomcatWebServerFactoryCustomizer bean (and thus some other part of the code) is the cause of this slowdown. I'll hopefully take a look at this in the near future; if anyone knows anything about this part of Spring, please chime in.

Best,
Hayden


Ocean Liu

Mar 13, 2024, 2:01:13 PM
to CAS Community, John Bond

Thank you for sharing your insights!

Though it’s been nearly 4 years since your original post, we wanted to provide an update on our progress.

We’re currently in the process of migrating from CAS 5.3 to CAS 7. During testing, we noticed an issue where CAS 7 took over 6 seconds to generate the SAMLResponse XML, with CPU usage exceeding 120% on an AWS EC2 instance with 1 vCPU.

We experimented with setting spring.autoconfigure.exclude=org.springframework.boot.autoconfigure.web.embedded.EmbeddedWebServerFactoryCustomizerAutoConfiguration.
Surprisingly, this resulted in a significant improvement, reducing response time to just 150ms and lowering CPU usage to 11%.

It’s worth noting that CAS 7 uses Spring Boot 3.2, so there may still be performance-related challenges with the embedded Tomcat auto-configuration at this time.

While we would have liked to create a minimal sample to submit to Spring Boot, our current focus is on completing the upgrade within our timeline constraints.

Best,

Ocean

John Shrader

Mar 14, 2024, 1:14:58 PM
to CAS Community, Ocean Liu
Ocean,

Thank you for this suggestion. I've been dealing with slow and CPU-intensive SAML response generation since switching to 7.0.x. Adding that to my cas.properties fixed the problem entirely.

John Shrader

Mar 15, 2024, 7:48:05 AM
to Ocean Liu, CAS Community
Thank you for the update and advice. I tested the exclusion in our dev environment and saw no noticeable issues, but the safer option is preferred. I've switched to the server.tomcat.background-processor-delay=0s property instead, and the performance issues with SAML are still resolved.

On Thu, Mar 14, 2024 at 4:35 PM Ocean Liu <li...@whitman.edu> wrote:

Hi John,

We want to let you know that we removed that configuration (which excludes EmbeddedWebServerFactoryCustomizerAutoConfiguration) in our environment.
We added the server.tomcat.background-processor-delay=0s configuration instead, and it fixed the performance issue.
This option is safer and has less impact.

From Unicon support:

If you are deploying with an embedded tomcat container, excluding that component is likely catastrophic to your deployment and a major red flag.

Without knowing what that exclusion does, this could very severely jeopardize the stability of your deployment.

I would suggest that you remove the exclusion and instead set this: server.tomcat.background-processor-delay=0s
You can follow the conversation here: https://github.com/apereo/cas/pull/5652

Cheers,



--
John Shrader
Administrator of Network Systems
Northwest State Community College
22600 State Route 34
Archbold, OH 43502
(419) 267-1299
jshr...@northweststate.edu