I'm having trouble with SimpleSAMLphp expiring IdPs before their stated
expiretime is reached. As far as I can tell, this stems from
SimpleSAML_Metadata_SAMLParser::getExpireTime() misinterpreting the
cacheDuration field in SAML metadata.
We're running into the problem with a Scandinavian federation. Their
metadata contains:
<md:EntitiesDescriptor Name="http://md.swamid.se/md/swamid-2.0.xml"
cacheDuration="PT8H" validUntil="2012-02-09T10:45:04Z">
However, when I load that metadata into simpleSAMLphp (using metarefresh),
it concludes that these entries must expire 8h from now. Since we refresh
it less often, their entries expire too early and their users cannot access
our service.
The code seems to confuse cacheDuration with expiretime, and so does the
documentation:
* If both the 'cacheDuration' and 'validUntil' attributes are present, the shorter of them
* will be returned.
As the respective names imply, *cache*Duration is about caching, and
*valid*Until is about validity. These are different, though slightly
related, processes. The SAML Metadata standard confirms this:
validUntil [Optional]
Optional attribute indicates the expiration time of the metadata
contained in the element and any contained elements.
cacheDuration [Optional]
Optional attribute indicates the maximum length of time a consumer
should cache the metadata contained in the element and any contained
elements.
Therefore, simpleSAMLphp must not conflate these two issues. In this case
the validUntil attribute must be used by simpleSAMLphp to get the correct
expiretime for the entries contained in this metadata, and cacheDuration
must not come into play there, especially because the standard talks about
"should cache", not "must cache". It is hence a hint of when to refresh, and
the standard in no way requires entries to be discarded the second
the cache has not been refreshed for the indicated time.
To summarise, getExpireTime must consider validUntil, and tools like
metarefresh can make use of cacheDuration, but no conflation needs to
happen.
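To make the difference concrete, here is a rough sketch in Python. It is not the SimpleSAMLphp code (which is PHP): the function names are made up for illustration, the duration parser only handles simple forms such as PT8H or P1DT2H30M, and the fetch time is an assumed value picked to match the example above.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"

# The element from the message above, with the namespace declared so it parses.
SNIPPET = (
    '<md:EntitiesDescriptor xmlns:md="' + MD_NS + '" '
    'Name="http://md.swamid.se/md/swamid-2.0.xml" '
    'cacheDuration="PT8H" validUntil="2012-02-09T10:45:04Z"/>'
)

# Simplified xs:duration pattern: days, hours, minutes, whole seconds only.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text):
    """Parse a simple duration such as 'PT8H' into a timedelta."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError("unsupported duration: %r" % text)
    parts = {k: int(v) for k, v in match.groupdict().items() if v is not None}
    return timedelta(**parts)


def parse_valid_until(text):
    """Parse a UTC timestamp of the form used in the example."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def expiry_shorter_of_both(elem, fetched_at):
    """The behaviour complained about: the shorter of the two wins."""
    candidates = []
    if elem.get("cacheDuration"):
        candidates.append(fetched_at + parse_duration(elem.get("cacheDuration")))
    if elem.get("validUntil"):
        candidates.append(parse_valid_until(elem.get("validUntil")))
    return min(candidates) if candidates else None


def expiry_valid_until_only(elem):
    """The proposed behaviour: only validUntil decides expiry."""
    value = elem.get("validUntil")
    return parse_valid_until(value) if value else None


root = ET.fromstring(SNIPPET)
# Assumed fetch time: 24 hours before validUntil.
fetched_at = datetime(2012, 2, 8, 10, 45, 4, tzinfo=timezone.utc)

print(expiry_shorter_of_both(root, fetched_at))  # 2012-02-08 18:45:04+00:00
print(expiry_valid_until_only(root))             # 2012-02-09 10:45:04+00:00
```

With a daily refresh, the first result means the entries disappear 16 hours before the next fetch, which is the symptom described above.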
I'd gladly write a patch for getExpireTime() to only consider validUntil.
As for metarefresh not considering cacheDuration, that seems inherent in
its design - it's called by cron so it can't influence how often it runs.
This may be documented though.
--
Thijs Kinkhorst <th...@uvt.nl> – LIS Unix
Universiteit van Tilburg – Library and IT Services
Bezoekadres > Warandelaan 2 • Tel. 013 466 3035 • G 236
Dick
--
Dick Visser
System & Networking Engineer
TERENA Secretariat
Singel 468 D, 1017 AW Amsterdam
The Netherlands
I disagree :)
My interpretation of the specification for cacheDuration is that it
states a maximum lifetime of cached data. If it is not refreshed before
the cacheDuration expires, it is no longer valid. Note that fetching
metadata using metarefresh is a form of caching. (As opposed to
pointing simpleSAMLphp to a remote metadata file - doable, but slow :)
My assumption has been that it is present to allow federations to
ensure that the metadata is regularly refreshed without forcing a
short validUntil, so that those using offline signing can still get
fast updates.
> Therefore, simpleSAMLphp must not conflate these two issues. in this case
> the validUntil attribute must be used by simpleSAMLphp to get the correct
> expiretime for the entries contained in this metadata, and cacheDuration
> must not come into play there. Especially because the standard talks about
> "should cache", not "must cache" - it's hence a hint of when to refresh and
> the standard in no way requires to immediately discard entries the second
> that cache has not been refreshed for the indicated time.
There are several ways that this sentence can be interpreted. I do not
think too much weight can be placed on the presence of "should" in it,
especially since it is in lowercase. If it does not place limits on
caching, why do they talk about "maximum time", when they could rather
write something about when the metadata should be refreshed?
I am not totally against changing the behaviour, but I would like to
see a clear description of the expected cache behaviour from some sort
of authoritative source.
Best regards,
Olav Morken
UNINETT / Feide
I've encountered this in the wild (at 10:45Z this morning; it's re-signed
every 5 min):
<md:EntitiesDescriptor Name="http://md.swamid.se/md/swamid-2.0.xml"
cacheDuration="PT8H" validUntil="2012-02-09T10:45:04Z">
There, the federation admin states that I need to cache it for 8h but that
the data is valid for 24h.
If it's the case as you state, I don't understand why he would set
validUntil to thrice the time that he sets cacheDuration to. If he intends
the validity to be 8h, then he'd just set validUntil to now+8h as well,
right?
More generally, if they are to be interpreted as the same and just the
minimum of both is taken, why are there two different attributes instead of
just one? That would just be confusing. I find it more likely that there
are two attributes instead of one because they have different semantics,
not just different syntax.
I can imagine very well why one would have a lower desired cacheDuration
even though entries do not expire directly after that duration. One example
is that you find it desirable to have changes to some metadata propagated
several times a day (e.g.: new IdP added; description changed; etc) while
you want your signatures to be valid for longer. Suppose your metadata URL
is down temporarily and can't be refreshed - your newly added IdP will
indeed not get propagated but at least existing IdPs won't fall off the
net in 8h. Compare it to DNS: cacheDuration is an RR's TTL ('really should
refetch if at all possible'), while validUntil is the Expire ('really throw
away if not refreshed').
My main concern is that this federation makes a very explicit statement
about when their metadata is set to expire:
validUntil="2012-02-09T10:45:04Z". If that statement is present, I would
always want to use that as the authoritative entity expiration, regardless
of any cacheDuration values. After all, if they meant a shorter period
they'd have shortened that time, right?
From my previous message:
> > My assumption has been that it is present to allow federations to
> > ensure that the metadata is regularily refreshed without forcing a
> > short validUntil, so that those using offline signing can still get
> > fast updates.
> That would just be confusing. I find it more likely that there
> are two attributes instead of one because they have different semantics,
> not just different syntax.
I agree that they have two different semantics, which is why you need
two attributes. However, having two different semantics when publishing
does not automatically preclude the one that parses the metadata from
unifying those two values into a single value.
> I can imagine very well why one would have a lower desired cacheDuration
> even though entries do not expire directly after that duration. One example
> is that you find it desirable to have changes to some metadata propagated
> several times a day (e.g.: new IdP added; description changed; etc) while
> you want your signatures to be valid for longer. Suppose your metadata URL
> is down temporarily and can't be refreshed - your newly added IdP will
> indeed not get propagated but at least existing IdP's won't fall off the
> net in 8h. Compare it to DNS: changeDuration is a RR's TTL ('really should
> refetch if at all possible'),
Just to be picky: If the TTL expires, the data is typically dropped
from the cache. I do not think servers such as BIND willfully distribute
expired data. You could say that the implementation in simpleSAMLphp
follows the DNS TTL behaviour :)
You would be better off comparing it to the "refresh" value in the
SOA record.
> while validUntil is the Expire ('really throw
> away if not refreshed').
>
> My main concern is that this federation makes a very explicit statement
> about when their metadata is set to expire:
> validUntil="2012-02-09T10:45:04Z". If that statement is present, I would
> always want to use that as the authoritative entity expiration, regardless
> of any cacheDuration values. Afterall, if they meant a shorter period
> they'd have shortened that time, right?
See my above text.
To summarize: We disagree on the interpretation of the following text
in the standard:
The maximum length of time a consumer should cache the metadata
contained in the element and any contained elements.
Your interpretation of that line is:
The maximum length of time until a cached copy of this metadata
SHOULD be refreshed. (And implicitly: If a refresh of the metadata
fails, a consumer MAY continue using the stale metadata.)
My interpretation is:
The maximum length of time a consumer can cache this metadata. (And
implicitly: Stale metadata MUST NOT be used.)
Now, as I mentioned in my previous mail, I am not against changing the
code. (In fact changing the code would simplify it a great deal, which
is always nice :) But before that, I want some sort of authoritative
text about how those should be interpreted.
That can be anything really: notes from the discussion where they
decided to add two different attributes, a clarification afterwards,
a "metadata consumer best practice document", etc. I'd probably even
accept a documented description of how other metadata consumers interpret
it.
I'd suggest taking that to the REFEDS list and letting Scott answer it
there (as this is where identity federations relying on SAML and SAML
MDIOP are present).
Personally I was always a bit unclear on the exact use of
cacheDuration (in my mind it certainly wasn't as simple as Thijs
presented here) and so I only ever produced metadata with just
validUntil (and no cacheDuration).
-peter
I'm not sure if this qualifies, but I've found a discussion about what
seems to me the exact same thing pertaining to the Shibboleth SP
implementation, where Scott Cantor expresses what he thinks is the desired
behaviour:
http://groups.google.com/group/shibboleth-users/browse_thread/thread/e2998e1ad9d829b0/ff8c3c69f0a93ab1?#ff8c3c69f0a93ab1
Scott Cantor's thoughts on this topic are documented in the shib wiki:
https://wiki.shibboleth.net/confluence/display/SHIB2/TrustManagement
According to Scott, validUntil and cacheDuration are best thought of
as mutually exclusive approaches to metadata distribution. Your
mileage may vary of course.
FWIW, the InCommon Federation uses validUntil only.
Tom
So relying on cacheDuration implies every system processing SAML
metadata does full PKIX trust path verification on the TLS/SSL
certificate of the webserver serving up the metadata -- is that a
reasonable expectation? (I don't think it is.)
But it doesn't really say anything regarding the case when both
attributes are present. Maybe a question for the SAML TC (whose chair,
Nate, should even be on this list),
-peter
> My interpretation is:
>
> The maximum length of time a consumer can cache this metadata. (And
> implicitly: Stale metadata MUST not be used.)
My interpretation is the same as Olav's.
I think it is obvious that you should not cache data longer than the maximum duration you are allowed to cache it :)
Thijs says:
> More generally, if they are to be interpreted as the same and just the
> minimum of both is taken, why are there two different attributes instead of
> just one?
In a federation, I believe there is a need for a time period after which you are guaranteed that no one relies on your metadata. This can be used for certificate rollover, revoking "bad" entities, maximum delay of introducing new services/entities etc.
Say you want this time window to be set to 8 hours.
If you generate the metadata export on the fly, you can typically always include a validUntil of now() + 8 hours. In many situations metadata is generated more manually, and even involves signing using offline equipment (for increased security). I believe the Haka federation is an example of this. In this case you would like to reduce how often you sign your metadata. The solution here could be to sign metadata with a validUntil of now() + 14 days, and sign it once a week. Then, set the cacheDuration to 8 hours.
It is important to always use validUntil, even if you use cacheDuration. If not, you may end up in a situation where really old metadata (several years old) may be injected and accepted. This is because there are no serial numbers in the metadata, and often metadata distribution is deployed with a limited trust of the transport (such as http without ssl, or https with "browser"-ca trustchain).
Still, there are use cases that are a bit more complex (and unclear), involving metadata re-publishing (common for cross federations). If you re-publish a metadata document that involves a cacheDuration property, I think you also should adjust the validUntil attribute accordingly. If not, there is no way for the receiver to know if the document is valid according to the initial expiration limitations.
I've tried to target this in Basic Metadata Aggregation Profile, section 4.2.3, Calculating entity expiration time:
http://dl.dropbox.com/u/2381403/SAML/BasicMetadataAggregationProfile-0.4.pdf
Andreas
FWIW, I agree with this conclusion. If the metadata refresh process
runs outside simpleSAMLphp, the software has no choice but to ignore
cacheDuration since the time at which caching took place is unknown.
This suggests the software shouldn't be paying attention to
cacheDuration in the first place.
Moreover, a metadata client needs to be smarter in the presence of
cacheDuration. A client that typically issues HTTP Conditional GET
requests SHOULD NOT issue a Conditional GET request for a metadata
resource beyond its cacheDuration.
On the flip side, producers need to be smart about their use of
cacheDuration as well. For instance, if a new metadata instance is
produced once a day, any cacheDuration value less than one day is,
well, not advised. In fact, I don't really see the benefit of
cacheDuration if the server supports HTTP Conditional GET. The latter
is easier to implement (on both the client and the server) and just as
effective.
Tom
If people want a definitive call on what the standard says, and I think everyone admits at this stage that the standard plus errata are getting fairly hard to interpret, then they need to ask for a formal interpretation from the SAML TC. But here's my take, as someone who was involved in the discussions around what Shibboleth *should* do.
The cacheDuration in the metadata is intended as a hint to the client as to when it should attempt to fetch new metadata. Metadata which you've held beyond its cacheDuration is stale, but may still be valid if its validUntil is still in the future. Metadata which is stale but valid may still be used, if you have no alternative.
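The three states described here (fresh, stale but valid, expired) can be sketched as a small Python function. This is only an illustration of the reading given above, not Shibboleth's or simpleSAMLphp's actual code; the function name and the fetch time are assumptions, and the values are taken from the SWAMID example earlier in the thread.

```python
from datetime import datetime, timedelta, timezone


def metadata_state(now, fetched_at, cache_duration, valid_until):
    """Classify a cached copy as 'fresh', 'stale' or 'expired'.

    'stale' means a refresh is due but the copy may still be used;
    'expired' means validUntil has passed and the copy must be dropped.
    """
    if valid_until is not None and now >= valid_until:
        return "expired"
    if cache_duration is not None and now >= fetched_at + cache_duration:
        return "stale"
    return "fresh"


# Assumed fetch time, 24 hours before validUntil.
fetched_at = datetime(2012, 2, 8, 10, 45, 4, tzinfo=timezone.utc)
cache_duration = timedelta(hours=8)
valid_until = datetime(2012, 2, 9, 10, 45, 4, tzinfo=timezone.utc)

for hours in (1, 9, 25):
    now = fetched_at + timedelta(hours=hours)
    print(hours, metadata_state(now, fetched_at, cache_duration, valid_until))
# 1 fresh
# 9 stale
# 25 expired
```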
Your client's configuration may have additional parameters to say how it should react to finding that it has stale metadata, but obviously the expectation is that it should have fetched new metadata perhaps a little before the cacheDuration expires. In the Shibboleth implementation, I think we eventually ended up with something rather like DHCP leases and start looking for new metadata at some percentage of the cacheDuration. Shibboleth will now, however, retain stale metadata well beyond its cacheDuration if the validUntil has not been reached, as will happen if the metadata server goes down for an extended period.
Again in the Shibboleth implementation, conditional GET is always used when stale metadata is being refreshed. If the conditional GET says "not changed" then the cacheDuration can simply be reset.
If the attempt to fetch new metadata *fails*, then the client tries again fairly soon, and repeatedly. There's an algorithm involving some parameters and gradual back-off which is designed to allow the client to get new metadata quickly in the case of a temporary outage at the publisher, but not to end up with hundreds or thousands of clients all mobbing the publisher if it goes offline for an extended period. As someone from the federation side of the argument, I thought this was a really important characteristic to have in theory, although of course in practice extended outages in metadata servers are really rare.
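As a rough idea of what such a retry schedule could look like, here is a generic capped exponential back-off with jitter in Python. It is not the Shibboleth algorithm, whose parameters are not given here; the function name and all default values (60 second base, doubling, one hour cap, 10% jitter) are assumptions chosen for illustration.

```python
import random


def retry_delays(base=60.0, factor=2.0, cap=3600.0, jitter=0.1,
                 attempts=8, rng=None):
    """Yield delays in seconds between successive retry attempts.

    The delay grows geometrically up to 'cap'. The random jitter spreads
    clients out so they do not all hit the publisher at the same moment.
    """
    rng = rng or random.Random()
    delay = base
    for _ in range(attempts):
        spread = delay * jitter
        yield max(0.0, delay + rng.uniform(-spread, spread))
        delay = min(cap, delay * factor)


# Without jitter the schedule is deterministic:
print(list(retry_delays(jitter=0.0)))
# [60.0, 120.0, 240.0, 480.0, 960.0, 1920.0, 3600.0, 3600.0]
```

The early retries are close together, so a short outage is recovered from quickly, while the cap keeps the load on the publisher bounded during a long one.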
So, how does this compare with what people are saying here?
On 8 Feb 2012, at 21:21, Tom Scavo wrote:
> A client that typically issues HTTP Conditional GET
> requests SHOULD NOT issue a Condition GET request for a metadata
> resource beyond its cacheDuration.
The fact that the cache duration on a copy of a resource has expired does not mean that the resource will always be changed. So it is entirely reasonable to perform a conditional GET when the cache duration expires: if the conditional GET indicates that the document is unchanged, you can reactivate the cached data as if you had just fetched it.
So I think I'd recommend exactly the opposite: clients SHOULD always issue conditional GET requests whenever they have an existing copy of a resource. The mechanisms are independent.
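A minimal sketch of that refresh step, using only the Python standard library. The function names and the shape of the cache record (a plain dict) are assumptions for illustration; note that urllib reports a 304 response by raising HTTPError, which is why the "not changed" case is handled in the except branch.

```python
import urllib.error
import urllib.request
from datetime import datetime, timezone


def conditional_headers(etag=None, last_modified=None):
    """Build the validators to send along when a cached copy exists."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def refresh(url, cached=None, now=None, opener=urllib.request.urlopen):
    """Fetch metadata, reusing the cached body on '304 Not Modified'.

    'cached' is None or a dict with the keys body, etag, last_modified
    and fetched_at. In both outcomes fetched_at is reset to 'now', so
    the cacheDuration clock starts again.
    """
    now = now or datetime.now(timezone.utc)
    headers = {}
    if cached:
        headers = conditional_headers(cached.get("etag"),
                                      cached.get("last_modified"))
    request = urllib.request.Request(url, headers=headers)
    try:
        with opener(request) as response:
            return {
                "body": response.read(),
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
                "fetched_at": now,
            }
    except urllib.error.HTTPError as err:
        if err.code == 304 and cached:
            # Unchanged: keep the body, restart the cache clock.
            return dict(cached, fetched_at=now)
        raise
```

The opener argument exists only so the function can be exercised without a network; in normal use it would be left at its default.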
> For instance, if a new metadata instance is
> produced once a day, any cacheDuration value less than one day is,
> well, not advised.
I don't think that's right, either. Cache duration acts as a hint as to *how soon after you load metadata* you should look for fresh material. If you set it to exactly the same interval as your signing interval, then if your signing interval varies at all, some unlucky clients will go two days without a refresh. So for a one-day signing interval, you should in my view *always* use a smaller value for cacheDuration.
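The "unlucky client" effect can be seen in a toy simulation. Everything in it is an assumption made up for illustration: times are in hours, the publisher signs roughly daily but is one hour late on the second day, and the client first polls half an hour after the first signing and then strictly every poll interval.

```python
def longest_gap_between_updates(signing_times, poll_interval, first_poll, horizon):
    """Longest time, in hours, between two polls that returned new metadata."""
    seen = None         # signing time of the instance the client holds
    last_update = None  # when the client last received a new instance
    longest = 0.0
    t = first_poll
    while t <= horizon:
        published = [s for s in signing_times if s <= t]
        current = max(published) if published else None
        if current is not None and current != seen:
            if last_update is not None:
                longest = max(longest, t - last_update)
            seen, last_update = current, t
        t += poll_interval
    return longest


signing_times = [0, 25, 48, 72]  # second signing is one hour late

# cacheDuration equal to the signing interval: a two-day gap.
print(longest_gap_between_updates(signing_times, 24, 0.5, 96))  # 48.0
# cacheDuration of six hours: the late signing costs far less.
print(longest_gap_between_updates(signing_times, 6, 0.5, 96))   # 30.0
```

In the first run the client polls at 24.5h, just before the late signing at 25h, and so never sees that instance at all.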
Of course this is largely academic for both you and me, as neither of our federations currently uses cacheDuration (in our case, because we know there is some software which falls over when it sees both cacheDuration and validUntil). But if we *were* to use cacheDuration, we'd probably set it to something like six hours so that in-day metadata changes reach clients in a timely fashion and in general before the start of each new working day.
> In fact, I don't really see the benefit of
> cacheDuration if the server supports HTTP Conditional GET. The latter
> is easier to implement (on both the client and the server) and just as
> effective.
Client software is best off implementing both mechanisms. Conditional GET reduces the bandwidth requirements, cacheDuration acts as a hint as to probe timing and therefore acts to reduce the number of transactions.
Olav and Andreas obviously disagree with the Shibboleth team about the interpretation of cacheDuration.
> Now, as I mentioned in my previous mail, I am not against changing the
> code. (In fact changing the code would simplify it a great deal, which
> is always nice :) But, before that I want some sort of authorative
> text about how those should be interpreted.
That's exactly the right approach. You should ask for a formal interpretation from the SAML TC if you're in doubt.
Andreas:
> In a federation, I believe there is a need for a time period after which you are guaranteed that noones relies on your metadata. This can be used for certficiate rollover, revoking "bad" entities, maximum delay of introducing new services/entities etc.
>
> Say you want this time window to be set to 8 hours.
That's clearly what validUntil is for. I don't think interpreting cacheDuration to mean "the same as validUntil but based on the time you fetch the document" is justified by the text of the standard.
> It is important to always use validUntil, even if you use cacheDuration.
Absolutely agreed, for the reasons you give. It makes me very sad to observe that even some fairly high-profile metadata publishers don't do this. eduGAIN is the most obvious case, but there are others.
A different Ian says:
> I agree that it would be good to have this discussion on REFEDS, but
> IMHO the specification is very clear about what behavior is actually
> specified.
This is really not a REFEDS discussion, except if you're using the REFEDS list as a proxy for asking Scott what he thinks. It's probably better in that case to, well, ask him directly or (better) formulate a specific question and have the SAML TC rule on it.
I happen to agree with you, by the way. I think the specification *is* clear on this, and that the only reason you'd even think of using a different interpretation is that the specification says "should" instead of "SHOULD". Having said that, the word isn't in capital letters so doesn't necessarily have the normative RFC2119 meaning. Plus, there's nothing in the published errata about this. If this ambiguity is causing enough people to do something other than intended by the TC, perhaps there should be an erratum. Or do I mean there SHOULD be one?
-- Ian
I guess it all depends on how the client is caching metadata to begin
with, but yeah, I see your point.
>> For instance, if a new metadata instance is
>> produced once a day, any cacheDuration value less than one day is,
>> well, not advised.
>
> I don't think that's right, either. Cache duration acts as a hint as to *how soon after you load metadata* you should look for fresh material.
Yes, I see that now, thanks for the clarification.
>> In fact, I don't really see the benefit of
>> cacheDuration if the server supports HTTP Conditional GET. The latter
>> is easier to implement (on both the client and the server) and just as
>> effective.
>
> Client software is best off implementing both mechanisms. Conditional GET reduces the bandwidth requirements, cacheDuration acts as a hint as to probe timing and therefore acts to reduce the number of transactions.
But it's up to the client, really, how often to refresh. In the
presence of conditional GET, a client should update often, depending
on their perception of need.
Let me put it this way. If all software behaved similarly, then
cacheDuration would be more effective than documentation in the wiki,
but since this will never happen, cacheDuration is actually harmful
(which I think is what you were trying to say).
Thanks,
Tom
[...]
Hi,
thank you for the detailed explanation of how Shibboleth interprets the
cacheDuration option, and how it is implemented in practice. Now,
obviously I and Andreas have not agreed with that interpretation. I
think we focus more on not caching beyond cacheDuration than refreshing
after cacheDuration. (After all, there is currently no connection
between caching and refreshing in simpleSAMLphp.)
But in any case, in the interest of harmony with other implementations,
and also because we obviously are in the minority with our
interpretation, I think we can change simpleSAMLphp to ignore
cacheDuration when deciding on how long metadata is valid.
Thijs Kinkhorst: I think you volunteered to create a patch in your
original mail?
Thanks :-) I'm happy to hear that.
> Thijs Kinkhorst: I think you volunteered to create a patch in your
> original mail?
I will, but I have to run now. I will send it probably after the upcoming
weekend.
> But it's up to the client, really, how often to refresh.
Absolutely.
> In the
> presence of conditional GET, a client should update often, depending
> on their perception of need.
Well, that and things like federation policy. We've had people occasionally interpret "often" as "every 15 seconds, even when not using conditional GET". Which can be amusing, if you have a particular kind of sense of humour.
But yes, one of the nice things about conditional GET is that clients can *check* for new metadata more often but actually end up downloading the data *less* than before.
> Let me put it this way. If all software behaved similarly, then
> cacheDuration would be more effective than documentation in the wiki,
> but since this will never happen, cacheDuration is actually harmful
> (which I think is what you were trying to say).
I don't think I said that cacheDuration was ever harmful, except to observe that some software has been known to fall over if it is present. It's useful in principle for a federation operator to be able to signal their refresh recommendation, and I think the way that Shibboleth treats it as an upper bound on the time before the metadata is regarded as stale, but allowing configuration to override that value downwards, makes the most sense.
On the whole though (and I think I've seen some discussion of this on one of the other lists, maybe a year or so back) both cacheDuration and validUntil are pretty big hammers when we're dealing with federation-sized aggregates. They are more likely to come into their own if and when we move to per-entity metadata queries, so that we're not talking about clients caching or invalidating everything at once.
-- Ian
> thank you for the detailed explanation of how Shibboleth interprets the
> cacheDuration option, and how it is implemented in practice.
[…]
> But in any case, in the interest of harmony with other implementations,
> and also because we obviously are in the minority with our
> interpretation, I think we can change simpleSAMLphp to ignore
> cacheDuration when deciding on how long metadata is valid.
Sounds like that will make the OP happy, and I'm glad to have been able to help.
I have suggested to Scott that it might be worth clarifying the intent that a client MAY continue to use stale-but-valid metadata by adding something in the errata. If anyone here would like to provide input to that process, it would be very welcome, particularly if you'd like to come up with specific suggested wording.
Note that the OASIS IPR rules require that suggestions of that kind come through the right channels, and that any comment you make in this forum can't be used by the TC. I think the appropriate place to provide comments to the SAML TC is:
security-ser...@lists.oasis-open.org
Cheers,
-- Ian