Failing SAML2 metadata causes outage

Richard Frovarp

Mar 14, 2023, 1:49:38 PM
to cas...@apereo.org
I'm on CAS 6.6.6. I had a SAML 2 service that was trying to pull
metadata from a remote URL. This request timed out (discovered a
firewall in the way). That ended up causing all of my other SAML 2
services to time out as well. The CAS protocol services were just fine.
In my logs I see:

2023-03-14 11:02:05,949 ERROR [org.apereo.cas.util.HttpUtils] - <Connect
to hostname failed: connect timed out
        DefaultHttpClientConnectionOperator.java:connect:151
        PoolingHttpClientConnectionManager.java:connect:376
        MainClientExec.java:establishRoute:393
>
2023-03-14 11:02:05,949 ERROR
[org.apereo.cas.support.saml.services.idp.metadata.cache.resolver.UrlResourceMetadataResolver]
- <NullPointerException
        UrlResourceMetadataResolver.java:resolve:107
        SamlRegisteredServiceMetadataResolverCacheLoader.java:lambda$load$1:66
        Unchecked.java:lambda$function$21:878


Since the connection timed out, there was no response and so no status
line to read a status code from. That caused the NPE. I see this error
several times, so I don't know whether CAS was retrying or my browser was
resending the request.
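
To illustrate the failure mode described above, here is a minimal sketch of a null-safe status check. It is not the CAS code: `Response`, `statusCode`, and `isUsable` are hypothetical names, and a `null` response stands in for "the connection timed out before any status line arrived".

```java
import java.util.Optional;

public class StatusGuard {
    /** Hypothetical stand-in for an HTTP response; a null instance models "no response at all". */
    static final class Response {
        final int code;

        Response(int code) {
            this.code = code;
        }
    }

    /** Returns the status code if a response exists, or empty when the request never completed. */
    static Optional<Integer> statusCode(Response response) {
        if (response == null) {
            // A connect timeout yields no response, so there is no status line to read.
            return Optional.empty();
        }
        return Optional.of(response.code);
    }

    /** Treats a missing response as a plain failure instead of dereferencing it. */
    static boolean isUsable(Response response) {
        return statusCode(response).map(c -> c >= 200 && c < 300).orElse(false);
    }
}
```

With a guard like this, a timed-out fetch is reported as "metadata unavailable" rather than surfacing as a NullPointerException.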

I also see a different set of errors for the same service:

2023-03-14 10:44:23,080 ERROR
[org.apereo.cas.support.saml.services.idp.metadata.SamlRegisteredServiceServiceProviderMetadataFacade]
- <No metadata resolvers could be configured for service Grouper Devel
with metadata location path
        SamlRegisteredServiceMetadataResolverCacheLoader.java:load:72
        SamlRegisteredServiceMetadataResolverCacheLoader.java:load:31
        LocalLoadingCache.java:lambda$newMappingFunction$3:197
>
2023-03-14 10:44:23,080 WARN
[org.apereo.cas.support.saml.web.idp.profile.AbstractSamlIdPProfileHandlerController]
- <No metadata could be found for [entityId]>
2023-03-14 10:44:23,080 WARN
[org.apereo.cas.util.function.FunctionUtils] - <Cannot find metadata
linked to entityId
        AbstractSamlIdPProfileHandlerController.java:verifySamlAuthenticationRequest:497
        AbstractSamlIdPProfileHandlerController.java:initiateAuthenticationRequest:315
        AbstractSamlIdPProfileHandlerController.java:lambda$handleSsoPostProfileRequest$4:652
>
2023-03-14 10:44:23,080 ERROR [org.apereo.cas.web.support.WebUtils] -
<Cannot find metadata linked to entityId
        AbstractSamlIdPProfileHandlerController.java:verifySamlAuthenticationRequest:497
        AbstractSamlIdPProfileHandlerController.java:initiateAuthenticationRequest:315
        AbstractSamlIdPProfileHandlerController.java:lambda$handleSsoPostProfileRequest$4:652

While this was happening, my other SAML 2 services also timed out. I'm
guessing it has to do with the resolver being synchronized: a connect
timeout takes a while to expire, so it would hold up the other services
even though their metadata was fine. The fix was to restart CAS. I don't
know whether the other services were failing because CAS kept retrying
and timing out, or because the NPE left things broken. This happened on
prod, so we restarted quickly. It was a service that I use and I had
caused the problem, so we detected it right away.
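
If the guess about a shared, blocking resolver is right, one way to contain the damage is to bound each load independently. The sketch below is an illustration only, not how CAS resolves metadata: `BoundedLoader` and `loadWithin` are hypothetical names, and the loader is any function that may block.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class BoundedLoader {
    // Daemon threads so a stuck load never keeps the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "metadata-loader");
        t.setDaemon(true);
        return t;
    });

    /** Runs the loader on its own thread and gives up after timeoutMillis. */
    static <T> Optional<T> loadWithin(Supplier<T> loader, long timeoutMillis) {
        Callable<T> task = loader::get;
        Future<T> future = POOL.submit(task);
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Interrupt the slow load; the caller moves on without it.
            future.cancel(true);
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        } catch (ExecutionException e) {
            // The loader itself failed; treat it like any other miss.
            return Optional.empty();
        }
    }
}
```

Because each call waits only on its own future, a load that hangs for one service returns empty after the deadline while loads for other services proceed normally.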

CAS can now pull the metadata, so things are fine. However, I'm not
thrilled that one service's metadata refresh timing out can take down the
rest of my SAML 2 services. I could build my own cache outside of CAS,
fetching the metadata into git, but for now I'd prefer that CAS handle
this.
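
For reference, the external-cache idea could look roughly like the sketch below: fetch the metadata out of band and only replace the local copy when the fetch succeeds, so a failed refresh leaves the last known good file in place. `MetadataSnapshot` and `refresh` are hypothetical names, the fetcher is whatever performs the remote call, and committing the file to git is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.Callable;

public class MetadataSnapshot {
    /**
     * Refreshes the file at target from fetcher.
     * Returns true if the file was updated, false if the old copy was kept.
     */
    static boolean refresh(Path target, Callable<String> fetcher) throws IOException {
        String body;
        try {
            body = fetcher.call();
        } catch (Exception e) {
            // Timeout, refused connection, etc.: keep serving the previous snapshot.
            return false;
        }
        if (body == null || body.trim().isEmpty()) {
            return false;
        }
        // Write next to the target first, then move into place, so readers
        // never see a partially written file.
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "metadata", ".tmp");
        Files.write(tmp, body.getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        return true;
    }
}
```

The service definition would then point at the local file instead of the remote URL, which moves the timeout risk out of the login path.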

Richard

Misagh Moayyed

Mar 24, 2023, 10:55:26 AM
to CAS Developer
Interesting. Is this something you can reproduce?

Richard Frovarp

Mar 24, 2023, 1:11:54 PM
to Misagh Moayyed, CAS Developer
I was able to reproduce it on that system. I haven't had a chance to put a "test" case together yet; I'll try to do that next week. I think the key is that the firewall silently drops the connection. If the hostname didn't exist, or if the connection were rejected, I think the call would return sooner and likely not cause problems. Since the connection is dropped, you wait out the entire timeout, then any retries and their timeouts as well.
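
The difference between a dropped and a rejected connection can be seen with a plain JDK socket. This is only a sketch for building a test case, not anything from CAS: `ConnectProbe`, `probe`, and the `Outcome` names are made up here. A rejected connection or an unknown host fails right away with an `IOException`, while a silently dropped connection only returns once the connect timeout expires.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ConnectProbe {
    enum Outcome { CONNECTED, TIMED_OUT, FAILED_FAST }

    /** Attempts a TCP connect with an explicit connect timeout and classifies the result. */
    static Outcome probe(String host, int port, int connectTimeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMillis);
            return Outcome.CONNECTED;
        } catch (SocketTimeoutException e) {
            // No answer at all within the deadline, e.g. a firewall dropping packets.
            return Outcome.TIMED_OUT;
        } catch (IOException e) {
            // Connection refused, unknown host, and similar errors return promptly.
            return Outcome.FAILED_FAST;
        }
    }
}
```

Pointing `probe` at the metadata host from the CAS server would show which of the three cases applies, and how long a single attempt blocks before any retries are added on top.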