Maximum Merge Delay


Kat Joyce

May 2, 2018, 2:42:37 PM
to Certificate Transparency Policy
There is a discussion currently underway on the Digicert Yeti Chromium bug that is raising some good questions about what the merge delay of a Log actually means.  I think it would be worthwhile to have a proper discussion about this question on this mailing list (rather than having the conceptual discussion on the Chrome bug).

So the question is, what do we mean by merge delay, and once an SCT has been issued for a certificate, what are we expecting to occur within the maximum merge delay (MMD)?

Unfortunately, RFC 6962 makes a handful of statements on this topic that aren't entirely consistent and don't give a clear answer to this question.  The Chrome Log Policy is equally vague.

I think many would agree that the *minimum* requirement is that the certificate be incorporated into the Log tree within the MMD.  What do we mean by that?  That an STH has been signed committing the certificate to the tree.

In an ideal world, creating such an STH would result in that STH being made public (via get-sth), an inclusion proof for the certificate being available (via get-proof-by-hash), and the certificate being publicly auditable (via get-entries).
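For concreteness, the get-proof-by-hash part of that ideal flow is mechanical: given the leaf hash, the audit path, and an STH, anyone can recompute the root.  Here's a minimal sketch of the RFC 6962-style audit-path check (the names and structure are illustrative, not taken from any real client library):

```python
import hashlib

def _node(left, right):
    # Interior-node hash per RFC 6962: 0x01 prefix.
    return hashlib.sha256(b"\x01" + left + right).digest()

def verify_inclusion(leaf_hash, leaf_index, tree_size, audit_path, root_hash):
    """Recompute the tree head from a leaf hash and its audit path.

    Sketch of the RFC 6962 inclusion-proof check (the get-proof-by-hash
    response); not production code.
    """
    fn, sn, r = leaf_index, tree_size - 1, leaf_hash
    for p in audit_path:
        if sn == 0:
            return False  # path is longer than the tree allows
        if fn % 2 == 1 or fn == sn:
            r = _node(p, r)          # sibling is on the left
            if fn % 2 == 0:
                while fn % 2 == 0 and fn != 0:
                    fn >>= 1
                    sn >>= 1
        else:
            r = _node(r, p)          # sibling is on the right
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root_hash
```

The point being: an SCT alone proves nothing about incorporation; it's this check, done against a *published* STH, that does.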

The difficult questions come into play when one or more of these Log endpoints are unavailable:
  • Said STH is not retrievable, because of an issue with get-sth.
  • An inclusion proof for the certificate is not possible because of an issue with get-proof-by-hash.
  • The certificate itself is not discoverable because of an issue with get-entries.
I would argue that the first point (STH being publicly retrievable) is also part of the MMD requirement, as otherwise there is nothing to stop a Log retroactively signing an STH with a backdated timestamp, after a period of downtime.
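Put differently, an external observer can only bound the merge delay by when it *first saw* a committing STH, not by the timestamp inside the STH.  A tiny sketch of that monitor-side judgement (timestamps in seconds; the function name and the 24hr MMD are illustrative):

```python
DAY = 24 * 3600  # example MMD of 24 hours, in seconds

def mmd_compliant(sct_issued_at, sth_first_seen_at, mmd=DAY):
    # Judge compliance by when the committing STH was first observed
    # publicly, not by the STH's embedded timestamp, which a log could
    # backdate after a period of downtime.
    return sth_first_seen_at - sct_issued_at <= mmd
```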

Note that the interesting situations are when some, but not all, of these endpoints are not working.  For example:
  • if get-entries is down, but you can get an STH (and possibly a proof of inclusion) that shows that the cert has been incorporated into the Log, is this a merge delay problem, or should it be considered more of an availability issue?
  • A similar question: what happens if get-entries is up for a period of time after the STH is signed and made available, but goes down before anyone actually gets the relevant entries, and remains down for a prolonged period of time?  That sounds much more like an availability problem.
  • What about if you can get the certificate via get-entries, but get-sth is down, so you are not able to get the STH that incorporated it until after the MMD has passed?  (Admittedly this scenario is unlikely to happen, since get-sth is the simplest Log endpoint, and if that's down, chances are other things are too, but it's worth considering!)  Here, you cannot trust the entries provided by the Log until you have received an STH that commits to those entries, so we're back to the point that the STH being made public within the MMD should be a requirement.
There is an argument that the spirit of the MMD is that it's the attack window for use of a mis-issued certificate before it is made publicly detectable via CT.  On that reading, the merge delay should be the time between certificate issuance/submission to the Log and when it can be retrieved via get-entries (with an accompanying STH committing to those entries).  But the second example (get-entries is briefly up, but then goes down) is then of interest.

When it comes to potential merge delay issues from Logs (such as the current Digicert discussion), I'd imagine part of the reason the Chrome policy is intentionally vague is to be able to assess the Log's behavior on a case-by-case basis.  And it may end up being that this topic is so nuanced that that is how best to handle merge delay incidents.

I welcome your thoughts!

Ryan Hurst

May 2, 2018, 6:37:05 PM
to Kat Joyce, Certificate Transparency Policy
Kat,

My feeling on this topic is that there are really two concepts here: the MMD and the Availability of the Log.

MMD exists, as I see it, as a means to keep logs honest; it is a representation of the internal state of the log and is worthy of individual measurement.

Availability, on the other hand, has many perspectives to it; we can have temporal failures as well as perspective-dependent ones. For example, with the recent blacklisting of AWS IPs, many services were not available to those located in Russia due to the country's censorship of IP addresses owned/operated by Amazon.

I also think that, as a function of policy, a UA would respond differently to an availability issue than to an MMD issue, and having two independent metrics would be a good way to help them make such a decision.

Ryan Hurst



--
You received this message because you are subscribed to the Google Groups "Certificate Transparency Policy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ct-policy+unsubscribe@chromium.org.
To post to this group, send email to ct-p...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/ct-policy/CAO%2BqTAmC9A-wMoQdpkzDzH1Cu7eVCDVNG8KfjVsXLrK%3D3mQhVw%40mail.gmail.com.

Kat Joyce

May 3, 2018, 10:54:20 AM
to ryan....@gmail.com, Certificate Transparency Policy
If we take MMD to be *just* about the Log's internal state, then how do we (entities that are not the Log Operator, and that are monitoring for compliance) check that?

Requiring that the Log incorporate the certificate into its tree (i.e. sign an STH committing to it) within the MMD is a given.  I believe that we should also require that the STH be made public (via get-sth) within the MMD (i.e. the time between the certificate being submitted and an STH committing to its inclusion being *seen externally* - not just signed - must be less than the MMD).

If we do not *at least* have this second requirement, we are left with the following two requirements:
  • Log must incorporate a cert into its tree within the MMD.
  • Log must publish STHs no older than the MMD (from RFC6962 Section 3.5)
With only these two requirements, the following scenario would be completely valid:

Using an MMD of 24hrs for ease:
  • Time 0:  Certificate is submitted to Log, and SCT is signed and returned.
  • Time 23:59:  Log signs an STH with timestamp 23:59 that commits the certificate to the Log tree.  But Log does not yet publish this STH.
  • Time 47:58:  Log publishes the STH to the outside world, committing to adding the certificate to the Log.
Note that nothing in that scenario goes against the rules stated above, and results in the time between a certificate being added to a Log and the STH being made available to the public being just shy of 48hrs.
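The arithmetic of that scenario, spelled out (a sketch; the one-minute margins just mirror the 23:59/47:58 timeline above):

```python
from datetime import timedelta

MMD = timedelta(hours=24)

sct_issued    = timedelta(0)                    # Time 0: SCT signed and returned
sth_signed    = MMD - timedelta(minutes=1)      # 23:59: STH signed, but held back
# "Publish STHs no older than the MMD" only constrains the gap between an
# STH's own timestamp and its publication, so publication can wait almost
# another full MMD:
sth_published = sth_signed + MMD - timedelta(minutes=1)  # 47:58

worst_case = sth_published - sct_issued
assert worst_case < 2 * MMD  # just shy of 48hrs, without breaking either rule
```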

Is this fine?  I don't know.  It may be that we choose to say it is.  But it's important to note that the potential-attack-window-before-misissuance-detection becomes 2xMMD if we make the MMD *only* about the Log's internal state.

Otherwise, the MMD requirement should *at least* require that an STH committing to the inclusion of a certificate be *published* (not just signed) within the MMD after an SCT is issued for a certificate.

Kat


Ryan Hurst

May 3, 2018, 11:36:39 AM
to Certificate Transparency Policy, ryan....@gmail.com
Kat,

I think the answer to the question of "then how do we (entities that are not the Log Operator, and that are monitoring for compliance) check that?" is rooted in what the policy is and how it is enforced. It seems in this case that, after the fact, we have enough data to tell that the log did incorporate within the MMD, even though we could not measure that reliably within the MMD.

You're right, though, that this has the consequence that the time between a certificate being added to a Log and the STH being made available to the public can be just shy of 48hrs (with a 24hr MMD), which isn't acceptable in my mind either.

I guess I just want to make sure that the UAs have the data and a policy framework that empowers them to consider the nuances of a given situation. 

Ryan

Jeremy Rowley

May 3, 2018, 12:54:16 PM
to Ryan Hurst, Certificate Transparency Policy
I think there's also still a question of what availability means. Obviously, for Chrome, available should include available to Chrome's monitors. However, what about other monitors? Although some monitors showed us as down for longer than 24 hours, others did not. Which monitors do we need to be up for? One of the reasons we had issues (Rick will be posting more about this soon) is that the number of queries downloading the tree jumped from about 10 an hour to 600,000 an hour. If we block the entity requesting 600,000 downloads an hour, are we no longer available to that monitor?


Alex Cohn

May 3, 2018, 2:47:25 PM
to Jeremy Rowley, Ryan Hurst, Kat Joyce, Certificate Transparency Policy
As a thought experiment, I'd like to propose the following: 

If a malicious log wished to conceal the existence of a certificate for longer than their MMD (or even 2x their MMD), they could incorporate the certificate into their Merkle tree and publish corresponding SCTs and inclusion proofs, while returning an error for any get-entries request that included the entry they wished to hide. 

They could further attempt to distinguish between auditors and monitors that publish the contents of logs vs. monitors that are merely checking for cryptographic consistency and MMD adherence, and allow the latter to fetch the entries in question. They could therefore appear to be behaving entirely correctly from, e.g., the UA monitor's perspective.

How can we craft a policy that forbids this behavior while allowing for honest logs to occasionally experience downtime? Is this covered by the present "availability" requirements? (not if we only consider the perspective of Chromium's monitor) "Split view" requirements? (not if we interpret "split view" as "publishing different and inconsistent STHs to different parties")

Back to the issue with Yeti2018: on reflection, I was looking at the concept of the MMD with monitor-tinted glasses. Yeti2018 published STHs that proved they included every entry within their MMD, and I see the logic in only requiring an STH to prove MMD compliance. I still think this should count as misbehavior, but I am not convinced it does under the present CT log policy.

I have no problem with a log blocking or rate-limiting abusive traffic, but I note that the get-entries endpoint should be easy to horizontally scale and/or cache.

Alex


Kat Joyce

May 4, 2018, 11:48:59 AM
to Alex Cohn, Jeremy Rowley, Ryan Hurst, Certificate Transparency Policy

Responses to both Jeremy and Alex's messages inline below.
 
> I think there's also still a question of what availability means. Obviously, for Chrome, available should include available to Chrome's monitors.

Yes :)
 
> However, what about other monitors? Although some monitors showed us as down for longer than 24 hours, others did not. Which monitors do we need to be up for?

This is a good question, and I'm not sure there's a better answer than "all that you possibly can be".  Speaking for myself (i.e. this may not be the opinion of the Chrome guys), I believe that the lack of a definitive answer to this question is part of the reason Chrome currently chooses to handle incidents on a case-by-case basis, and has open discussions about what the outcome of incidents should be on the ct-policy mailing list.  By involving the CT community in these discussions, unavailability to non-Google monitors can be taken into account, but doesn't always have to mean certain doom for the Log.  The openness of the CT ecosystem means that not having a set-in-stone answer to this question is manageable.
 
> One of the reasons we had issues (Rick will be posting more about this soon) is that the number of queries downloading the tree jumped from about 10 an hour to 600,000 an hour. If we block the entity requesting 600,000 downloads an hour, are we no longer available to that monitor?

I don't think anyone would disagree with a Log temporarily blocking abusive traffic.

> As a thought experiment, I'd like to propose the following:
>
> If a malicious log wished to conceal the existence of a certificate for longer than their MMD (or even 2x their MMD), they could incorporate the certificate into their Merkle tree and publish corresponding SCTs and inclusion proofs, while returning an error for any get-entries request that included the entry they wished to hide.
>
> They could further attempt to distinguish between auditors and monitors that publish the contents of logs vs. monitors that are merely checking for cryptographic consistency and MMD adherence, and allow the latter to fetch the entries in question. They could therefore appear to be behaving entirely correctly from, e.g., the UA monitor's perspective.

As I mentioned on the Chromium bug, we do run tooling that fetches all of the entries from Logs.  That tooling is used to back the various places that we do, in fact, publish the contents of Logs (e.g. the Transparency report, the DNS frontends etc).  So, from Google's perspective at least, a Log would be unable to hide a specific entry, as we both check the behaviour of the Log, and publish the entries.

However, as mentioned on the Chromium bug relating to the Digicert Logs, our tool that gets all of the entries from a Log was not yet getting entries from Yeti, as we had only been turning that on once a Log was added to Chrome (which Yeti is right on the brink of - but I don't believe it has actually been added yet).  This incident has been a learning point for us, as it has made us realise this tool should begin getting entries from a Log when its monitoring period begins.
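The cross-check that entry-fetching tooling enables is simple in principle: recompute the RFC 6962 tree hash over the fetched entries and compare it against the root hash in a published STH - a log withholding or doctoring entries can't make the hashes line up.  A minimal sketch (illustrative only, not our actual tooling):

```python
import hashlib

def tree_hash(leaves):
    """RFC 6962 Merkle tree hash: 0x00-prefixed leaves, 0x01-prefixed nodes."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    if len(leaves) == 1:
        return hashlib.sha256(b"\x00" + leaves[0]).digest()
    # Split at the largest power of two strictly less than len(leaves).
    k = 1
    while k * 2 < len(leaves):
        k *= 2
    return hashlib.sha256(
        b"\x01" + tree_hash(leaves[:k]) + tree_hash(leaves[k:])
    ).digest()

def entries_match_sth(entries, sth_root_hash):
    # If the fetched entries don't reproduce the STH's root hash, either
    # the download is incomplete or the log is misbehaving.
    return tree_hash(entries) == sth_root_hash
```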
 

> How can we craft a policy that forbids this behavior while allowing for honest logs to occasionally experience downtime? Is this covered by the present "availability" requirements? (not if we only consider the perspective of Chromium's monitor)

As has been mentioned on other threads, we are in the process of developing a new monitor implementation that will measure the availability of all endpoints (rather than just get-sth), and publish these values.  Once this new monitor implementation is up and running, this will be covered by the Google compliance monitor.
 
"Split view" requirements? (not if we interpret "split view" as "publishing different and inconsistent STHs to different parties")

Back to the issue with Yeti2018: on reflection, I was looking at the concept of the MMD with monitor-tinted glasses. Yeti2018 published STHs that proved they included every entry within their MMD, and I see the logic in only requiring a STH to prove MMD compliance. I still think this should count as misbehavior, but I am not convinced it does under the present CT log policy.

It does under the policy itself, but, as I mentioned, development of a monitor that implements checking of the policy more fully than our current monitor is still ongoing.

The timing of the Digicert downtime is unfortunate with regards to this new monitor implementation.  If it had already been in place, the downtime of get-entries (and other endpoints) would likely have been flagged as an availability issue way before the question of MMD even arose.

Although a Log experiencing downtime is never a good thing, the Digicert incident has been useful to us, as it has shone a light on a couple of significant shortcomings of our infrastructure - some we were already aware of and are working to improve, and at least one that we had not fully considered.  Whatever the decision from the Chrome side about repercussions (if any) for the Yeti Log, I believe that the CT ecosystem will come out stronger and more robust as a result of this.
 


Andrew Ayer

May 15, 2018, 2:18:14 PM
to Kat Joyce, 'Kat Joyce' via Certificate Transparency Policy
On Wed, 02 May 2018 18:41:58 +0000
"'Kat Joyce' via Certificate Transparency Policy"
<ct-p...@chromium.org> wrote:

> I welcome your thoughts!

During the discussion about Aviator's MMD violation, I argued that MMD
violations should be treated like availability issues, even when all
endpoints stay up. If I were setting the policy, I would define an "MMD
outage" as any period of time during which any certificate submitted
more than the MMD ago is not available for retrieval, whether due to
non-functional endpoints or a failure to incorporate. Then I'd count
MMD outages against the log's 99% uptime requirement.

The reason is that violating the MMD while keeping all endpoints
accessible has practically the same impact as incorporating a certificate
and then immediately going down. In both cases, monitors receive delayed
notification about a certificate. Since the practical impact is the same,
it makes sense to treat the two cases the same. In addition, treating MMD
violations like availability issues is a bit more objective, and allows
logs to make recoverable mistakes as long as they stay above 99% uptime.
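Andrew's proposed accounting can be sketched concretely: each late certificate contributes an outage window running from its MMD deadline until the entry is first retrievable, and overlapping windows merge before counting against uptime.  A sketch under assumed inputs (the function name and data shape are mine, not from any policy document):

```python
def mmd_outage(certs, mmd):
    """Total time during which some certificate older than the MMD was
    not yet retrievable.

    certs: iterable of (submitted_at, first_retrievable_at) pairs, in any
    consistent time unit.  Illustrative sketch of the proposed "MMD
    outage" definition, not an implementation of any real policy.
    """
    # Each late certificate contributes an outage interval starting at
    # its MMD deadline; merge overlaps so time isn't double-counted.
    intervals = sorted((s + mmd, f) for s, f in certs if f > s + mmd)
    total = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total
```

With a 24hr MMD, a cert submitted at hour 0 but not retrievable until hour 30 would contribute 6 hours of outage, which would then count against the 99% uptime requirement.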

One scenario that I think needs to be called out is when a log delays
publishing an STH. As I understand it, DigiCert's logs do this. I
think that if an STH incorporating a certificate is published after the
MMD has elapsed, it should be considered an MMD violation, even if the
timestamp in the STH indicates that the STH was signed within the MMD.
The reason is that it's not possible to verify that a certificate was
actually logged until the STH is made available.

Regards,
Andrew