Deadlock on startup

Richard Frovarp

Jun 18, 2025, 12:09:28 PM
to cas-...@apereo.org
We have been able to trigger a deadlock on startup in CAS 7.1. I do not
know whether the problem exists in 7.2. I have only been able to
replicate it in production, not in a test environment. There appears to
be a race condition between HTTP accepting connections and CAS still
starting: CAS is listening on its port before startup has completed, and
if a connection comes in at the right moment, which can happen on our
busy production system, it will deadlock.

Attached is the deadlock section from jstack from when it happened. I
don't know whether it is possible to stop CAS from listening for
connections until the end of startup. We're working on keeping traffic
away from it on startup, but that is something every deployment would
need to do.
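The "Found one Java-level deadlock" section that jstack prints can also be obtained from inside a running JVM through `ThreadMXBean.findDeadlockedThreads()`. The sketch below is a generic illustration, not the CAS deadlock from the attached dump: two hypothetical threads named "startup" and "request" take the same two hypothetical locks in opposite order, which is one classic way two threads end up deadlocked, and the JVM is then polled for the resulting cycle.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class DeadlockProbe {

    public static void main(String[] args) {
        System.out.println("Deadlocked threads: " + runDemo());
    }

    // Starts two hypothetical threads, "startup" and "request", that take the
    // same two locks in opposite order, then returns the sorted names of those
    // threads once the JVM reports them as deadlocked (empty list after 5 s).
    public static List<String> runDemo() {
        Object contextLock = new Object();   // hypothetical, not a CAS class
        Object registryLock = new Object();  // hypothetical, not a CAS class
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread startup = new Thread(
                () -> lockInOrder(contextLock, registryLock, bothHoldFirstLock), "startup");
        Thread request = new Thread(
                () -> lockInOrder(registryLock, contextLock, bothHoldFirstLock), "request");
        for (Thread t : List.of(startup, request)) {
            t.setDaemon(true);  // deadlocked threads must not keep the JVM alive
            t.start();
        }

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + 5_000;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findDeadlockedThreads();  // null when no deadlock exists
            if (ids != null) {
                List<String> names = new ArrayList<>();
                for (ThreadInfo info : mx.getThreadInfo(ids)) {
                    // Keep only the two threads started above; ignore any others.
                    if (info != null && (info.getThreadId() == startup.getId()
                            || info.getThreadId() == request.getId())) {
                        names.add(info.getThreadName());
                    }
                }
                if (names.size() == 2) {
                    Collections.sort(names);
                    return names;
                }
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return List.of();
    }

    // Takes `first`, waits until the other thread holds its first lock too,
    // then blocks forever trying to take `second`.
    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Unreachable once both threads hold their first lock.
            }
        }
    }
}
```

The same `findDeadlockedThreads()` call could back a periodic self-check in a long-running service, though jstack against the stuck process gives the fuller picture, including stack traces.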

Thanks,

Richard
jstack-cas-ndsu.txt

Ray Bon

Jun 18, 2025, 10:32:53 PM
to cas-...@apereo.org
Richard,

We front our CAS cluster with a load balancer. When one of the servers goes down, it is removed from the pool; once its health check reports 'UP', it is added back into the pool.

Ray
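The pool behaviour described above can be sketched as follows, assuming the load balancer treats only an HTTP 200 from the health endpoint as healthy (Spring Boot Actuator returns 503 by default when the aggregate status is DOWN or OUT_OF_SERVICE) and counts timeouts and connection errors as unhealthy, so a node that is still starting, or stuck while starting, never receives traffic. The class name, the consecutive-success threshold, and any health URL such as `https://cas.example.org/cas/actuator/health` are hypothetical; in practice this is usually a load balancer setting rather than code.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class HealthGatedPool {
    private final int healthyChecksRequired;  // hypothetical "rise" threshold
    private final Map<String, Integer> consecutiveOk = new HashMap<>();
    private final Set<String> inPool = new TreeSet<>();

    public HealthGatedPool(int healthyChecksRequired) {
        this.healthyChecksRequired = healthyChecksRequired;
    }

    // Records one check result. Only HTTP 200 counts as healthy; anything else,
    // including -1 for a timeout or refused connection, pulls the node out at once.
    public void record(String node, int statusCode) {
        if (statusCode == 200) {
            int ok = consecutiveOk.merge(node, 1, Integer::sum);
            if (ok >= healthyChecksRequired) {
                inPool.add(node);
            }
        } else {
            consecutiveOk.put(node, 0);
            inPool.remove(node);
        }
    }

    // Nodes currently eligible for traffic, in sorted order.
    public Set<String> members() {
        return Collections.unmodifiableSet(inPool);
    }

    // Performs one real probe; a node that accepts the connection but never
    // answers (for example, one stuck during startup) times out and counts as down.
    public static int probe(HttpClient client, URI healthUrl) {
        HttpRequest request = HttpRequest.newBuilder(healthUrl)
                .timeout(Duration.ofSeconds(2))
                .GET()
                .build();
        try {
            return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
        } catch (IOException e) {
            return -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }

    public static void main(String[] args) {
        HealthGatedPool pool = new HealthGatedPool(2);
        pool.record("cas1", 200);
        pool.record("cas1", 200);  // second consecutive success: cas1 joins the pool
        pool.record("cas2", -1);   // e.g. timed out while still starting
        System.out.println(pool.members());  // prints [cas1]
    }
}
```

Requiring two consecutive successes is an arbitrary choice here; it only illustrates not admitting a node that answers once and then stalls.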

From: 'Richard Frovarp' via CAS Community <cas-...@apereo.org>
Sent: June 18, 2025 08:51
To: cas-...@apereo.org <cas-...@apereo.org>
Subject: [cas-user] Deadlock on startup
 
--
- Website: https://apereo.github.io/cas
- List Guidelines: https://goo.gl/1VRrw7
- Contributions: https://goo.gl/mh7qDG
---
You received this message because you are subscribed to the Google Groups "CAS Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cas-user+u...@apereo.org.
To view this discussion visit https://groups.google.com/a/apereo.org/d/msgid/cas-user/b879f7b3-a5ca-44c9-80e3-021f3f93ceb4%40ndsu.edu.

Florian Nari

Jun 19, 2025, 12:27:10 PM
to CAS Community, Richard Frovarp

Hello,

We're experiencing a similar problem, also on 7.1: our CAS server never
finishes starting if it receives a request during startup.

We've also noticed that the problem doesn't occur when CAS is deployed in an external Tomcat.

Richard Frovarp

Jul 8, 2025, 10:28:07 PM
to cas-...@apereo.org

We're working towards that solution. But I think it would be ideal if the system didn't open its port until it was ready to accept traffic, or at least until it could no longer deadlock. We're running a single node, so a load balancer isn't an obvious solution, and it's something every deployment would need to implement (or should implement). It's tricky because traffic arriving at just the wrong moment causes the failure, which makes it a confusing error to debug: you may get lucky and it will start, or you may get unlucky and it will deadlock.
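The ordering suggested above, finishing initialization before the listening socket exists, can be sketched with the JDK's built-in `com.sun.net.httpserver.HttpServer`. This is not how CAS or its embedded Tomcat is configured; `initialize()` is a hypothetical placeholder for the application's startup work. Because the socket is only bound after `initialize()` returns, a client connecting earlier is refused instead of being accepted into a half-started server.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class BindWhenReady {

    // Hypothetical stand-in for everything the application must finish
    // (wiring components, loading configuration, ...) before serving requests.
    static String initialize() {
        return "ready";
    }

    // Initializes first and only then binds the listening socket, so no client
    // can connect while initialization is still in progress. Port 0 picks a free port.
    public static HttpServer start(int port) throws IOException {
        String state = initialize();
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/", exchange -> {
            byte[] body = state.getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(8080);
        System.out.println("Listening on " + server.getAddress());
    }
}
```

A refused connection during startup is easy for a client or monitor to retry, whereas a connection accepted by a half-initialized server can hang, as described in this thread.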
