[Shib-Users] interpretation of cacheDuration

Adam Lantos

Mar 26, 2010, 9:51:25 AM

to shibbole...@internet2.edu

Hi,

just a small question regarding
https://spaces.internet2.edu/display/SHIB2/TrustManagement and
https://spaces.internet2.edu/display/SHIB2/MetadataCorrectness...

What is the expected behavior if the metadata contains both validUntil
and cacheDuration? I understand that cacheDuration controls the reload
interval, but if the trusted source was unreachable or metadata was
corrupted, would Shibboleth continue to use the cached, still 'valid'
instance after cacheDuration expires?
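
For readers unfamiliar with the two attributes: in SAML 2.0 metadata, validUntil is an xs:dateTime and cacheDuration is an xs:duration, and both can appear on the root metadata element. The following is a minimal Python sketch of reading them. The sample document and the helper names (SAMPLE, parse_duration, read_hints) are made up for illustration, and the duration parser is an assumption that only handles day/time forms such as PT2H.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

# Hypothetical sample; the attribute values are invented for illustration.
SAMPLE = """<md:EntitiesDescriptor
    xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata"
    validUntil="2010-04-02T00:00:00Z"
    cacheDuration="PT2H"/>"""

# Day/time subset of xs:duration only (no years or months).
_DURATION = re.compile(
    r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?$"
)


def parse_duration(text):
    """Parse a simple xs:duration such as 'PT2H' or 'P1DT30M'."""
    match = _DURATION.match(text)
    if not match or text in ("P", "PT"):
        raise ValueError(f"unsupported duration: {text!r}")
    days, hours, minutes, seconds = match.groups()
    return timedelta(
        days=int(days or 0),
        hours=int(hours or 0),
        minutes=int(minutes or 0),
        seconds=float(seconds or 0),
    )


def read_hints(xml_text):
    """Return (validUntil, cacheDuration) from the root element, or None for each if absent."""
    root = ET.fromstring(xml_text)
    raw_until = root.get("validUntil")
    raw_cache = root.get("cacheDuration")
    valid_until = None
    if raw_until:
        # Assumes the UTC 'Z' form without fractional seconds.
        valid_until = datetime.strptime(
            raw_until, "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
    cache_duration = parse_duration(raw_cache) if raw_cache else None
    return valid_until, cache_duration


valid_until, cache_duration = read_hints(SAMPLE)
print(valid_until.isoformat(), cache_duration)
```

The sketch only extracts the two values; how a consumer should combine them is exactly the question being asked here.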

I see the IdP has a configuration option for this, called
'maintainExpiredMetadata', and I suspect the SP takes this approach
(i.e. 'if the metadata source is down, use the local cache until the
validUntil is expired'), but I didn't find any explicit documentation
on this. Is this behavior something which is hidden in the specs
somewhere or is it just an implementation-specific thing?

Can I rely on this robust behavior, and set short cacheDurations (1 or
2 hours) without being vulnerable to DoS attacks via bringing my
metadata hosting server down?

thanks,
Adam

Scott Cantor

Mar 26, 2010, 8:54:56 PM

to shibbole...@internet2.edu

> I understand that cacheDuration controls the reload
> interval, but if the trusted source was unreachable or metadata was
> corrupted, would Shibboleth continue to use the cached, still 'valid'
> instance after cacheDuration expires?

The SP doesn't, when it does anything with it at all. This is documented in that topic.

In retrospect it could have been useful as a hint for telling sites how often to reload, and maybe should not have impacted validity, but that's not current state.

I don't do well with concepts like "sort of valid", so I had no idea what to do with it in the context of third party signed files. Probably I should just relax the code and stop modifying validity with it, just leave it at controlling the refresh interval. But changing it again makes things even more confusing too.

> I see the IdP has a configuration option for this, called
> 'maintainExpiredMetadata', and I suspect the SP takes this approach
> (i.e. 'if the metadata source is down, use the local cache until the
> validUntil is expired'), but I didn't find any explicit documentation
> on this.

In the SP, expired means expired and there are no further options.

There are no "hidden" SP options. If it's not documented, it's not there, or I just forgot to document it.

> Can I rely on this robust behavior, and set short cacheDurations (1 or
> 2 hours) without being vulnerable to DoS attacks via bringing my
> metadata hosting server down?

No. Refresh policy should be set out of band right now.

-- Scott

Adam Lantos

Mar 27, 2010, 7:12:10 AM

to shibbole...@internet2.edu

On Sat, Mar 27, 2010 at 1:54 AM, Scott Cantor <cant...@osu.edu> wrote:
>> I understand that cacheDuration controls the reload
>> interval, but if the trusted source was unreachable or metadata was
>> corrupted, would Shibboleth continue to use the cached, still 'valid'
>> instance after cacheDuration expires?
>
> The SP doesn't, when it does anything with it at all. This is documented in that topic.
>
> In retrospect it could have been useful as a hint for telling sites how often to reload, and maybe should not have impacted validity, but that's not current state.

Well, I've run some tests with the 2.2.1 SP.

I changed reloadInterval to 60 in shibboleth2.xml, then firewalled
connections to my metadata server. After that, I can still use all
entities in the remote metadata file, and logs say that

2010-03-27 10:59:31 INFO OpenSAML.MetadataProvider.XML [1]: reloading
remote resource...
2010-03-27 10:59:34 ERROR OpenSAML.MetadataProvider.XML [1]: error
while loading configuration from (...): unable to connect socket for
URL '...'
2010-03-27 10:59:34 WARN OpenSAML.MetadataProvider.XML [1]: using
local backup of remote resource
2010-03-27 10:59:34 INFO OpenSAML.MetadataProvider.XML [1]: loaded XML
resource (...)
2010-03-27 10:59:34 DEBUG OpenSAML.MessageEncoder.SAML2Redirect [1]:
validating input
2010-03-27 10:59:34 DEBUG OpenSAML.MessageEncoder.SAML2Redirect [1]:
marshalling, deflating, base64-encoding the message
2010-03-27 10:59:34 DEBUG OpenSAML.MessageEncoder.SAML2Redirect [1]:
marshalled message:
<samlp:AuthnRequest ...

I even changed the modification time of the backing file to be in
the past, and it still uses that file.

The IdP (2.1.5) also tries to use the cached metadata instance, but
then signature validation fails (?!). According to the logs, it seems
that the cached instance gets corrupted somewhere...

12:01:02.022 - WARN
[org.opensaml.saml2.metadata.provider.FileBackedHTTPMetadataProvider:101]
- Unable to read metadata from ... attempting to read it from local
backup
[...]
12:01:03.866 - DEBUG
[org.opensaml.xml.signature.SignatureValidator:76] - Signature did not
validate against the credential's key
12:01:03.870 - DEBUG
[org.opensaml.xml.signature.impl.BaseSignatureTrustEngine:143] -
Signature validation using candidate validation credential failed

I'll try to debug this further next week, because the same signature
validates fine when the metadata is retrieved over HTTP.

> In the SP, expired means expired and there are no further options.
>
> There are no "hidden" SP options. If it's not documented, it's not there, or I just forgot to document it.
>
>> Can I rely on this robust behavior, and set short cacheDurations (1 or
>> 2 hours) without being vulnerable to DoS attacks via bringing my
>> metadata hosting server down?
>
> No. Refresh policy should be set out of band right now.

Refresh policy isn't my main concern; it's the
availability requirements on my metadata hosting server. If
cacheDuration expires metadata, then the metadata server becomes a
single point of failure and could take down our whole federation if it
becomes unavailable for half a day (well, obviously it shouldn't,
but...).

In this case, I think it's probably smarter to fully omit
cacheDuration and agree on cache duration policy out of band, but it
still sounds pretty weird to me... And obviously there's no guarantee
that out of band refresh policies don't cause service disruption.

thanks,
Adam

Scott Cantor

Mar 27, 2010, 5:13:01 PM

to shibbole...@internet2.edu

> I changed reloadInterval to 60 in shibboleth2.xml, then firewalled
> connections to my metadata server. After that, I can still use all
> entities in the remote metadata file, and logs say that

You didn't mention how small cacheDuration was, but I think the only reason it's not biting you or others is a quirk in the backup file behavior of all the released SP versions: a failure to reload a new copy results in the backup copy being loaded over top of the running copy, rather than the running copy simply being maintained.

Since cacheDuration acts as a delta against the initial time of load, this matters quite a bit, and is another reason why the SP's intended behavior is probably just wrong. I don't like changing it again, but I think it's justified in light of the fact that the reload behavior made the existing code intent irrelevant and it wasn't really affecting anybody.
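
As a rough illustration of why the reload quirk matters, here is a toy Python model, not the SP's actual code. It assumes the intended rule was "expire at the earlier of validUntil and load time plus cacheDuration"; the function name effective_expiry and all dates are hypothetical.

```python
from datetime import datetime, timedelta


def effective_expiry(loaded_at, valid_until, cache_duration):
    """Assumed intended rule: cacheDuration counts from the time of load
    and can only shorten validity relative to validUntil."""
    return min(valid_until, loaded_at + cache_duration)


valid_until = datetime(2010, 4, 2)
cache_duration = timedelta(hours=2)

# If the running copy were simply kept, it would expire two hours after
# the first load, even though validUntil is days away.
first_load = datetime(2010, 3, 27, 9, 0)
without_reload = effective_expiry(first_load, valid_until, cache_duration)

# A failed refresh loads the backup file over the running copy, which
# resets the load time and pushes the cacheDuration-based expiry forward.
backup_reload = datetime(2010, 3, 27, 10, 0)
with_reload = effective_expiry(backup_reload, valid_until, cache_duration)

print(without_reload, with_reload)
```

As long as the reload interval is shorter than cacheDuration, each backup reload moves the deadline forward again, so under this model only validUntil ever takes effect.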

> I even changed the modification time of the backing file to be in
> the past, and it still uses that file.

The backup file date isn't used for anything. The reload of the file is because of how the backup mechanism was implemented. It triggers any time there's an error in the primary load.

> The IdP (2.1.5) also tries to use the cached metadata instance, but
> then signature validation fails (?!). According to the logs, it seems
> that the cached instance gets corrupted somewhere...

I can't speak to the IdP.

> Refresh policy isn't my main concern; it's the
> availability requirements on my metadata hosting server. If
> cacheDuration expires metadata, then the metadata server becomes a
> single point of failure and could take down our whole federation if it
> becomes unavailable for half a day (well, obviously it shouldn't,
> but...).

I agree, which is why I would never advise using cacheDuration in this kind of situation.

> In this case, I think it's probably smarter to fully omit
> cacheDuration and agree on cache duration policy out of band, but it
> still sounds pretty weird to me... And obviously there's no guarantee
> that out of band refresh policies don't cause service disruption.

They can only cause disruption if the metadata supplied is bad or if your service is down for so long that the validUntil comes into play.

-- Scott

Scott Cantor

Mar 27, 2010, 5:31:29 PM

to shibbole...@internet2.edu

> Since cacheDuration acts as a delta against the initial time of load, this
> matters quite a bit, and is another reason why the SP's intended behavior is
> probably just wrong.

To clarify, what it was *supposed* to be doing seems wrong...what it's actually doing is probably the rough equivalent of what the next version would end up doing by design:

- use only validUntil to enforce expiration
- use cacheDuration to override (downward) the refresh interval
- keep using (valid) metadata if the refresh fails

The extra step it's applying to the validity now based on cacheDuration just gets wiped by the constant reloads of the backup (or successful refreshes), so it's basically a no-op.
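
The three rules listed above can be sketched in Python roughly as follows. This is an illustrative model under stated assumptions, not the SP implementation: the function names (refresh_interval, is_usable, after_refresh_attempt) are hypothetical, and metadata copies are represented as plain strings.

```python
from datetime import datetime, timedelta


def refresh_interval(configured, cache_duration=None):
    """cacheDuration may only lower the locally configured refresh interval."""
    if cache_duration is None:
        return configured
    return min(configured, cache_duration)


def is_usable(now, valid_until=None):
    """Only validUntil enforces expiration; cacheDuration plays no part."""
    return valid_until is None or now < valid_until


def after_refresh_attempt(current, fetched):
    """Keep the running copy when the refresh fails (fetched is None)."""
    return fetched if fetched is not None else current


configured = timedelta(hours=6)
print(refresh_interval(configured, timedelta(hours=2)))   # shorter hint wins
print(refresh_interval(configured, timedelta(hours=12)))  # longer hint is ignored

valid_until = datetime(2010, 4, 2)
print(is_usable(datetime(2010, 3, 27, 18, 0), valid_until))
print(after_refresh_attempt("running copy", None))
```

Under this model a short cacheDuration only makes refresh attempts more frequent; an unreachable metadata server does not invalidate the running copy until validUntil passes.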

-- Scott

Adam Lantos

Mar 27, 2010, 5:47:35 PM

to shibbole...@internet2.edu

All right, I understand it now. Something like this was my first
instinct when reading the specs, and observing Shib reload behavior
confirmed that short cacheDuration shouldn't cause disruption if
validUntil is present.

Thanks for the clarification, it is really appreciated!

cheers,
Adam

ps. please ignore my message sent from my other address...

Scott Cantor

Mar 27, 2010, 5:54:57 PM

to shibbole...@internet2.edu

> All right, I understand it now. Something like this was my first
> instinct when reading the specs, and observing Shib reload behavior
> confirmed that short cacheDuration shouldn't cause disruption if
> validUntil is present.

Ok, it's helpful if what I'm proposing (finally) matches your impression of the spec. This also aligns it to 1.3's behavior, since cacheDuration there was ignored because there was no remote access inside the software.

Thanks for prodding me on this; I might have missed a serious problem after changing all the reloading code.

-- Scott
