Proposed policy questions

Jeremy Rowley

Apr 6, 2017, 2:08:28 PM
to ct-p...@chromium.org
I mentioned on a previous thread that, from an ecosystem's
perspective, getting additional information about the log policy
during the log inclusion process would help evaluate the openness and
usefulness of a log. Here's what I'd like to see (as a CA) as
additional requirements about each log's policy:
1) Rate limits on adding certificates
2) Query limits
3) Individual rate limits (say by IP address) compared to overall rate limits
4) Other rate limits related to the public's use of the log
5) Redundancy of the log's operations, including which IP addresses
need to be whitelisted for redundancy sites
6) A list of requirements to embed a root
7) The fee charged for embedding a root/logging certificates
8) Restrictions on the certs being logged and whether any certs are
blocked from logging (such as expired certificates, code signing, etc)
9) Which roots are already included and which are planned for inclusion
10) The circumstances and process for removing a root.

Thoughts?
Jeremy

Ryan Sleevi

Apr 7, 2017, 12:35:40 PM
to Jeremy Rowley, ct-p...@chromium.org
On Thu, Apr 6, 2017 at 2:08 PM, Jeremy Rowley <rowl...@gmail.com> wrote:
I mentioned on a previous thread that, from an ecosystem's
perspective, getting additional information about the log policy
during the log inclusion process would help evaluate the openness and
usefulness of a log. Here's what I'd like to see (as a CA) as
additional requirements about each log's policy:
1) Rate limits on adding certificates

Can you clarify what you mean by this? That is, how do you propose to quantify this, and how do you propose that compliance with it be monitored?
 
2) Query limits

Same
 
3) Individual rate limits (say by IP address) compared to overall rate limits

Same
 
4) Other rate limits related to the public's use of the log

Same
 
5) Redundancy of the log's operations, including which IP addresses
need to be whitelisted for redundancy sites

Unclear what you mean here
 
6) A list of requirements to embed a root

I think this is uncontroversial
 
7) The fee charged for embedding a root/logging certificates

That assumes a fee is charged, right? How is this different than 6?
 
8) Restrictions on the certs being logged and whether any certs are
blocked from logging (such as expired certificates, code signing, etc)

Would it be better to formalize this as must?
 
9) Which roots are already included and which are planned for inclusion

It's already required they list what roots are included. The policy states as much. It also states to update when that changes.

Can you explain why it is/should be relevant to also document planned for inclusion?
 
10) The circumstances and process for removing a root.

Can you expand on what your goal with this is? For example, has this been an issue with existing log operators? What level of detail is expected? If a log operator needs to respond to changes, is it a violation if they go beyond that process? Is that conducive or harmful to the log ecosystem? 

Jeremy Rowley

Apr 10, 2017, 12:23:45 PM
to rsl...@chromium.org, ct-p...@chromium.org
I didn't realize these all had to be measurable from the outside. Some
of them I just want to know from a monitoring and logging perspective.

I'll use DigiCert as an example since it's the log I'm most familiar with.

> 1) Rate limits on adding certificates

Can you clarify what you mean by this? That is, how do you propose to
quantify this, and how do you propose compliance be monitored to this?

[JR] It's pretty easy to monitor. Either you can make the queries at
the rate specified or you can't. However, as a CA, I do want to know
how many certs I can dump into a log at once. I'd like to know whether
the rate limit is tied to the total number of certs waiting to be
merged into the tree or if there is a limit on the number I should
submit a second.

(2 was merged with this one)

>
> 3) Individual rate limits (say by IP address) compared to overall rate limits

Same

[JR] Hard to monitor, but I believe Comodo's log throttles the certs
by IP address whereas we have an overall rate limit. For DigiCert it
doesn't matter who logs, but once the bandwidth is filled up, no more
certs can be added to the log. From a CA stand-point, I'd like to know
whether I'm hitting the limit or if it's a general log use problem.
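
To illustrate the distinction between a per-client limit and an overall limit, here is a minimal sketch. It is not how Comodo's or DigiCert's log actually works; the `WindowLimiter` class, its one-second fixed window, and the result strings are all assumptions made for the example. The point is only that a log could report which limit was hit.

```python
from collections import defaultdict


class WindowLimiter:
    """Hypothetical fixed one-second window limiter with two caps:
    one per client IP and one shared by all clients."""

    def __init__(self, per_client_limit, overall_limit):
        self.per_client_limit = per_client_limit
        self.overall_limit = overall_limit
        self.window = None                    # the second currently being counted
        self.client_counts = defaultdict(int)
        self.total = 0

    def check(self, client_ip, now):
        """Return "ok", "client_limit", or "overall_limit" for one submission.

        `now` is a timestamp in seconds, passed in so the logic is testable.
        """
        window = int(now)
        if window != self.window:             # a new second started: reset counters
            self.window = window
            self.client_counts.clear()
            self.total = 0
        if self.client_counts[client_ip] >= self.per_client_limit:
            return "client_limit"             # this submitter is over its own cap
        if self.total >= self.overall_limit:
            return "overall_limit"            # the log as a whole is saturated
        self.client_counts[client_ip] += 1
        self.total += 1
        return "ok"
```

A CA that receives "client_limit" knows to slow down its own submissions, while "overall_limit" points to general load on the log.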

>
> 4) Other rate limits related to the public's use of the log


Same

[JR] If the goal is transparency, there should be transparency for log
operators. I don't expect this question to be answered, but are there
other rate limits that we haven't thought of that the log operator has
in place? No proposed monitoring on this one, I'd just like to know
what log operators are doing.

>
> 5) Redundancy of the log's operations, including which IP addresses
> need to be whitelisted for redundancy sites


Unclear what you mean here

[JR] As far as redundancy, I'd like to know whether the log will stay
up. Although Google initially monitors this, most of the external CA
logs have failed. If the log operator doesn't have redundant
operations, I'd prefer not to log to them. For the IP address issue,
we have our log operating out of three different data centers. This
means our log has three different IP addresses that the API could
resolve to. We had issues with people who didn't poke enough holes in
their firewall.


>
> 6) A list of requirements to embed a root


I think this is uncontroversial


> 7) The fee charged for embedding a root/logging certificates


That assumes a fee is charged, right? How is this different than 6?
[JR] It's not. We could merge the two.

>
> 8) Restrictions on the certs being logged and whether any certs are
> blocked from logging (such as expired certificates, code signing, etc)


Would it be better to formalize this as must?
[JR] What do you mean? Like a "must log x, y, and z?" From the CA
perspective, I want to know what I can include in the log. I think the
Google inclusion policy should specify what type of certs must be
permitted (i.e. - all non-expired SSL certs). The log operator could
then expand on those base requirements, such as permitting code
signing or expired certs.

>
> 9) Which roots are already included and which are planned for inclusion


It's already required they list what roots are included. The policy
states as much. It also states to update when that changes.

Can you explain why it is/should be relevant to also document planned
for inclusion?

[JR] Someone mentioned that they only wanted a few logs included in
Google's log store. If that is true, the selected logs should be as
broadly used as possible. If I submit a log knowing that it's only
intended for DigiCert (and only includes the DigiCert roots) then is
that valuable to the ecosystem? What if it's the WoSign log and no one
intends to log to it? If there are only a few logs permitted, we
should know whether they intend to serve the entire community or just
a small sub-set.

>
> 10) The circumstances and process for removing a root.


Can you expand on what your goal with this is? For example, has this
been an issue with existing log operators? What level of detail is
expected? If a log operator needs to respond to changes, is it a
violation if they go beyond that process? Is that conducive or harmful
to the log ecosystem?

[JR] The goal is to give CAs knowledge of when their root will be
kicked out of a log. For example, we removed a Comodo root because of
the volume of certs logged. I'm sure Comodo would have appreciated
more clear guidance on when this would happen. I know I'd like to be
able to evaluate the risk that my roots will be removed.

Pierre Phaneuf

Apr 10, 2017, 2:04:40 PM
to Jeremy Rowley, Ryan Sleevi, ct-p...@chromium.org
On Mon, Apr 10, 2017 at 5:23 PM, Jeremy Rowley <rowl...@gmail.com> wrote:

> I didn't realize these all had to be measurable from the outside. Some
> of them I just want to know from a monitoring and logging perspective.

I think the idea is that if we put them in the policy, it stands to
reason that they should be things that we are able to point and say
"yes, you are in compliance with the policy"?

>> 1) Rate limits on adding certificates
>
> Can you clarify what you mean by this? That is, how do you propose to
> quantify this, and how do you propose compliance be monitored to this?
>
> [JR] It's pretty easy to monitor. Either you can make the queries at
> the rate specified or you can't. However, as a CA, I do want to know
> how many certs I can dump into a log at once. I'd like to know whether
> the rate limit is tied to the total number of certs waiting to be
> merged into the tree or if there is a limit on the number I should
> submit a second.

In our case, it's not even a rate, but it's based on the size of the
backlog, which seems impossible to monitor?
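
As a rough illustration of why a backlog-based limit is hard to observe from outside, here is a minimal sketch. The `BacklogGate` class and its `max_backlog` parameter are hypothetical and not taken from any real log implementation.

```python
class BacklogGate:
    """Hypothetical admission control keyed on the number of entries
    waiting to be merged, rather than on a submission rate."""

    def __init__(self, max_backlog):
        self.max_backlog = max_backlog
        self.pending = 0                 # entries accepted but not yet merged

    def submit(self):
        """Accept one submission unless the backlog is already full."""
        if self.pending >= self.max_backlog:
            return False
        self.pending += 1
        return True

    def merge(self, count):
        """Merge up to `count` pending entries; return how many were merged."""
        merged = min(count, self.pending)
        self.pending -= merged
        return merged
```

Whether a given submission is accepted depends on `pending`, which an outside submitter cannot see, so the same request rate can succeed at one moment and fail at the next.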

Even for a simpler case, like an overall rate of submissions per
second, verifying compliance with that would be rather difficult? If
the compliance monitor tries to attain that advertised rate, it will
either crowd out everyone else, or it wouldn't be able to reach it,
due to other requests. Potentially, even if the advertised rate is
followed to the letter, it could leave a CA (or even the compliance
monitor itself) entirely unable to submit anything, in a way
that is impossible to tell apart from malicious behaviour!

One idea that's been thrown around is keeping an eye on growth rate
(from the STHs), but that doesn't allow verifying that when the growth
is small, requests aren't getting denied, it only provides a sort of
justification in case of a complaint (if a CA says that a log has been
dropping their requests a lot, the compliance monitor could provide
some evidence that the log was accepting a lot of other requests)?
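
The growth-rate idea could be sketched as follows. This assumes each STH is available as a dict with the `tree_size` and `timestamp` (milliseconds since the epoch) fields that RFC 6962's get-sth response carries; the `growth_rate` function name is made up for the example, and signature verification is left out entirely.

```python
def growth_rate(older_sth, newer_sth):
    """Return entries added per second between two signed tree heads.

    Each argument is a dict with "tree_size" (int) and "timestamp"
    (milliseconds since the epoch), as in an RFC 6962 get-sth response.
    """
    elapsed_ms = newer_sth["timestamp"] - older_sth["timestamp"]
    if elapsed_ms <= 0:
        raise ValueError("newer STH must have a later timestamp")
    added = newer_sth["tree_size"] - older_sth["tree_size"]
    return added / (elapsed_ms / 1000.0)
```

As noted above, this only shows how fast the tree grew. It says nothing about how many submissions were refused during the same interval.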

As a more general comment, from what I've gathered by sitting next to
the people working on compliance monitoring and reviewing their code
on occasion, things are very rarely "easy to monitor"! Quantum
mechanics and general relativity appear to be involved, on
occasions... :-)

> [JR] Hard to monitor, but I believe Comodo's log throttles the certs
> by IP address whereas we have an overall rate limit. For DigiCert it
> doesn't matter who logs, but once the bandwidth is filled up, no more
> certs can be added to the log. From a CA stand-point, I'd like to know
> whether I'm hitting the limit or if it's a general log use problem.

I totally agree with the CA point of view, though, you'd really want
to know if you're hitting a limit, if you're just having some bad luck
(trying to submit to a log that's overloaded), or if you're getting
malicious bad treatment from a log.

What I'm getting from this is that it might not be all that useful to
have some of these in the policy (mainly because they would be very
difficult to monitor/enforce), but it might make sense to say that
this kind of information (and others, such as the redundancy,
restrictions on the certs, etc) should be part of the Chromium bug.

> [JR] As far as redundancy, I'd like to know whether the log will stay
> up. Although Google initially monitors this, most of the external CA
> logs have failed. If the log operator doesn't have redundant
> operations, I'd prefer not to log to them. For the IP address issue,
> we have our log operating out of three different data centers. This
> means our log has three different IP addresses that the API could
> resolve to. We had issues with people who didn't poke enough holes in
> their firewall.

"ct.googleapis.com" resolves to 16 IPs for me right now, the
particular IPs it resolves to can change over time, and I'm not even
sure that the number is fixed! If you've had people with firewall
issues while trying to access your logs, I wonder how well they would
fare with ours? :-)
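
For anyone who wants to check how many addresses a log hostname resolves to at a given moment, a minimal sketch using Python's standard `socket.getaddrinfo` might look like this. The `resolve_addresses` helper and its injectable `resolver` parameter are assumptions for the example, not part of any CT tooling.

```python
import socket


def resolve_addresses(host, port=443, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses a hostname resolves to.

    `resolver` defaults to socket.getaddrinfo and can be swapped out
    so the function can be exercised without a network.
    """
    results = resolver(host, port, proto=socket.IPPROTO_TCP)
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({sockaddr[0] for *_, sockaddr in results})
```

Calling `resolve_addresses("ct.googleapis.com")` performs a real DNS lookup, and the answer can differ between runs and between resolvers, which is exactly why a fixed firewall allowlist of IPs is fragile.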

Ryan Sleevi

Apr 10, 2017, 2:41:02 PM
to Jeremy Rowley, Ryan Sleevi, ct-p...@chromium.org
On Mon, Apr 10, 2017 at 12:23 PM, Jeremy Rowley <rowl...@gmail.com> wrote:
I didn't realize these all had to be measurable from the outside. Some
of them I just want to know from a monitoring and logging perspective.

Like Pierre mentioned, suggesting it's part of policy suggests that you're either compliant or not compliant with the policy. So finding ways to measure that is important for the policy as a quantifiable thing; otherwise, it doesn't really serve any purpose if it can't be measured/enforced.

That's not to say I don't think these are good things to document and/or recommend, and possibly even have a policy that requires minimum X using infrastructure and "stress tests", but it's why I have an adverse reaction to normative requirements of an unquantifiable nature.
 
I'll use DigiCert as an example since it's the log I'm most familiar with.

> 1) Rate limits on adding certificates

Can you clarify what you mean by this? That is, how do you propose to
quantify this, and how do you propose compliance be monitored to this?

[JR] It's pretty easy to monitor. Either you can make the queries at
the rate specified or you can't. However, as a CA, I do want to know
how many certs I can dump into a log at once. I'd like to know whether
the rate limit is tied to the total number of certs waiting to be
merged into the tree or if there is a limit on the number I should
submit a second.

I totally understand the desire for it, but I was asking more about how to quantify it.

For example, your response seems to imply a minimum guarantee per-CA (as a form of SLO). Is it bounded by the overall set of CAs a log trusts? That is, does every CA get a 'guaranteed' X certs-per-second, but a 'burst' of Y certs-per-second? What happens if there's unused slack (e.g. a small CA that issues only 10 certs per day). Do other CAs get to leverage that QPS, or are they throttled by the log in order to ensure they don't come to rely on exceeding that throttle?
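
One possible reading of "guaranteed X with a burst of Y" is a token bucket per CA, sketched below. This is only an illustration of the question, not a proposal or any log's actual behaviour; the `TokenBucket` class and its `rate` and `burst` parameters are hypothetical.

```python
class TokenBucket:
    """Hypothetical per-CA token bucket: `rate` submissions per second
    sustained, with up to `burst` submissions allowed at once."""

    def __init__(self, rate, burst):
        self.rate = rate                 # tokens refilled per second
        self.burst = burst               # maximum tokens the bucket can hold
        self.tokens = float(burst)       # start with a full bucket
        self.last = None                 # time of the previous call

    def allow(self, now):
        """Return True if one submission is allowed at time `now` (seconds)."""
        if self.last is not None:
            elapsed = now - self.last
            self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A design like this gives each CA its own bucket, so unused capacity from a small CA is simply lost. Letting other CAs borrow that slack would need a shared pool on top, which is the part the questions above leave open.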

I totally get your goal here, and I think it's useful to understand in the overall ecosystem, but I'm unsure about what the policy should be, let alone how it should be measured.
 
>
> 3) Individual rate limits (say by IP address) compared to overall rate limits

Same

[JR] Hard to monitor, but I believe Comodo's log throttles the certs
by IP address whereas we have an overall rate limit. For DigiCert it
doesn't matter who logs, but once the bandwidth is filled up, no more
certs can be added to the log. From a CA stand-point, I'd like to know
whether I'm hitting the limit or if it's a general log use problem.

Is it your pipe bandwidth? I thought you were thinking log synchronization bandwidth, which may be a narrower pipe.

I agree, though, that there should be clearer ways of communicating and establishing this, so that we can have responsive backpressure to oversaturation.
 
Same

[JR] If the goal is transparency, there should be transparency for log
operators. I don't expect this question to be answered, but are there
other rate limits that we haven't thought of that the log operator has
in place? No proposed monitoring on this one, I'd just like to know
what log operators are doing.

I'm not disagreeing with that goal, mostly, it's what questions do we need to ask and expect answers for, are those answers "binding" (e.g. non-compliant) or simply advisory (meaning failure to uphold is not a log removal event), and what are the questions to ask? Are there limits to what can/should be shared? If the CA community (which dominates the current log operators) hasn't thought to ask the questions of log operators, what makes the log operators (which are largely CAs) more likely to know to ask them?
 
> 5) Redundancy of the log's operations, including which IP addresses
> need to be whitelisted for redundancy sites


Unclear what you mean here

[JR] As far as redundancy, I'd like to know whether the log will stay
up. Although Google initially monitors this, most of the external CA
logs have failed. If the log operator doesn't have redundant
operations, I'd prefer not to log to them.  For the IP address issue,
we have our log operating out of three different data centers. This
means our log has three different IP addresses that the API could
resolve to. We had issues with people who didn't poke enough holes in
their firewall.

Well, you don't _have_ to use distinct IPs, right? That's what BGP anycast is for? ;)

I'm trying to push on you to concretely quantify the questions you'd want asked, not why you want them asked :)

 
>
> 8) Restrictions on the certs being logged and whether any certs are
> blocked from logging (such as expired certificates, code signing, etc)


Would it be better to formalize this as must?
[JR] What do you mean? Like a "must log x, y, and z?" From the CA
perspective, I want to know what I can include in the log. I think the
Google inclusion policy should specify what type of certs must be
permitted (i.e. - all non-expired SSL certs).  The log operator could
then expand on those base requirements, such as permitting code
signing or expired certs.

Yes, I meant "You MUST log X, Y, and Z" (and possibly "You MUST NOT log A, B, C"). You described it as a description of policy, but I was asking whether it should be objectively quantified with "must support X", profiling the general mechanism from 6962/6962-bis into support for specific certificates.

> 9) Which roots are already included and which are planned for inclusion


It's already required they list what roots are included. The policy
states as much. It also states to update when that changes.

Can you explain why it is/should be relevant to also document planned
for inclusion?

[JR] Someone mentioned that they only wanted a few logs included in
Google's log store.

Yes, that was me.
 
If that is true, the selected logs should be as
broadly used as possible.

Indeed
 
If I submit a log knowing that it's only
intended for DigiCert (and only includes the DigiCert roots) then is
that valuable to the ecosystem?

I would suggest not really ;)
 
What if it's the WoSign log and no one
intends to log to it?

Why not? Are there objective reasons? Should there be (greater) community review on acceptance?
 
If there are only a few logs permitted, we
should know whether they intend to serve the entire community or just
a small sub-set.

I agree. But how is that not already accomplished by the existing policy regarding disclosure of policies? What new/additional questions would you want asked?

For example, I read your request as "The Log Operator should immediately notify the community that it's considering adding any CA's certificate", which I suspect was unintended?
 
>
> 10) The circumstances and process for removing a root.


Can you expand on what your goal with this is? For example, has this
been an issue with existing log operators? What level of detail is
expected? If a log operator needs to respond to changes, is it a
violation if they go beyond that process? Is that conducive or harmful
to the log ecosystem?

[JR] The goal is to give CAs knowledge of when their root will be
kicked out of a log. For example, we removed a Comodo root because of
the volume of certs logged. I'm sure Comodo would have appreciated
more clear guidance on when this would happen. I know I'd like to be
able to evaluate the risk that my roots will be removed.

Can you quantify this into a set of questions to be asked of all Log Operators?

My goal is consistency here - asking clear questions so that their answers are easily available.

Jeremy Rowley

Apr 10, 2017, 3:47:18 PM
to Pierre Phaneuf, Ryan Sleevi, ct-p...@chromium.org
All the information I mentioned was only intended to be part of the
initial Chromium bug for including the root, not part of the actual
policy.

Considering that the certs must be logged to a Google log, I'm not
sure why people had a problem with our IP addresses and not yours.
Perhaps they permit the entire Google block?

Ryan Sleevi

Apr 10, 2017, 3:54:36 PM
to Jeremy Rowley, Pierre Phaneuf, Ryan Sleevi, ct-p...@chromium.org
If you want to formulate precise questions, that would totally be welcome.

(The GitHub set-up is still going through internal approvals. Sorry, Big Company woes)

Jeremy Rowley

Apr 10, 2017, 4:02:01 PM
to Ryan Sleevi, ct-p...@chromium.org
Ah - these are all things I want to know about new logs and their
intended operation so I can work with them as both a CA and monitor,
not necessarily what needs to be monitored by Google for compliance
(although the confusion is definitely my fault as I started the
conversation geared towards policies about the ecosystem).

I cut some of the text:

>>> For example, your response seems to imply a minimum guarantee per-CA (as a form of SLO). Is it bounded by the overall set of CAs a log trusts? That is, does every CA get a 'guaranteed' X certs-per-second, but a 'burst' of Y certs-per-second? What happens if there's unused slack (e.g. a small CA that issues only 10 certs per day). Do other CAs get to leverage that QPS, or are they throttled by the log in order to ensure they don't come to rely on exceeding that throttle?

>>> I totally get your goal here, and I think it's useful to understand in the overall ecosystem, but I'm unsure about what the policy should be, let alone how it should be measured.

[JR] No policy here. I just want to know what I can put in my queries
without being blocked.

>>> I'm not disagreeing with that goal, mostly, it's what questions do we need to ask and expect answers for, are those answers "binding" (e.g. non-compliant) or simply advisory (meaning failure to uphold is not a log removal event), and what are the questions to ask? Are there limits to what can/should be shared? If the CA community (which dominates the current log operators) hasn't thought to ask the questions of log operators, what makes the log operators (which are largely CAs) more likely to know to ask them?

[JR] So far, people make it widely known on this mailing list when
they hit a rate limit. I'd rather know which rate limit I'm going to
hit before it happens.


>>> I'm trying to push on you to concretely quantify the questions you'd want asked, not why you want them asked :)

[JR] Ah - my mistake. I'll work on a concrete set of questions based
on your feedback so far.


>>>Yes, I meant "You MUST log X, Y, and Z" (and possibly "You MUST NOT log A, B, C"). You described it as a description of policy, but I was asking whether it should be objectively quantified with "must support X", profiling the general mechanism from 6962/6962-bis into support for specific certificates.

[JR] Yes - it should probably be in the policy. It should also be in
the policy whether log operators can make their permission set
more/less restrictive (depending on how that question is resolved).

>>>Yes, that was me.

[JR] Okay. Wasn't sure if it was you or Hurst.

> What if it's the WoSign log and no one
> intends to log to it?

>>> Why not? Are there objective reasons? Should there be (greater) community review on acceptance?

[JR] That's the question. Personally, I think there should be greater
review or commitment to use a log before it is included. What good is
a log if no one will use it?


>>> Can you quantify this into a set of questions to be asked of all Log Operators?

[JR] Yes. Let me think about it and get back to you.

Pierre Phaneuf

Apr 11, 2017, 6:53:28 AM
to Jeremy Rowley, Ryan Sleevi, ct-p...@chromium.org
On Mon, Apr 10, 2017 at 9:01 PM, Jeremy Rowley <rowl...@gmail.com> wrote:

> Ah - these are all things I want to know about new logs and their
> intended operation so I can work with them as both a CA and monitor,
> not necessarily what needs to be monitored by Google for compliance
> (although the confusion is definitely my fault as I started the
> conversation geared towards policies about the ecosystem).

Right, that's what I was getting as well. I agree that some more
information in the Chromium bug for a log could be really useful for
CAs and monitors, looking forward to your suggestions!

> [JR] So far, people make it widely known on this mailing list when
> they hit a rate limit. I'd rather know which rate limit I'm going to
> hit before it happens.

I'm not certain that first statement is true at all. Perhaps because
of the large scale for which many of our systems are designed, but we
have many layers of protection, not all of them fully under our team's
direct control (Google's network infrastructure team is, quite
sensibly, a bit protective of their infrastructure!). Our culture is
maybe more orientated towards "when overload happens" than "if
overload happens", and systems are designed to just deal with it by
themselves, as much as possible. *Excessive* rate limiting shouldn't
be happening, but I'm quite certain there's all sorts of rate limiting
happening that none of us are even aware of, because it's just
business as usual, and reporting rate limiting would just get this
list a daily email saying that "there was rate limiting" (I'd expect
details of incidents might often remain confidential).

A log running on GCP, AWS, or some other cloud provider might have
similar "hidden" rate limiting, provided by different levels of the
platforms, I would expect that this isn't unique to our internal
systems.

I also understand that this isn't very helpful for a CA who is
wondering which rate limit they might be hitting, I'm just providing
background on why this might be difficult to define clearly.

Ryan Sleevi

Apr 24, 2017, 5:13:53 PM
to Jeremy Rowley, Ryan Sleevi, ct-p...@chromium.org
On Mon, Apr 10, 2017 at 4:01 PM, Jeremy Rowley <rowl...@gmail.com> wrote:
>>> Can you quantify this into a set of questions to be asked of all Log Operators?

[JR] Yes. Let me think about it and get back to you.

Any thoughts here? I'd love to try to capture them :) 

I filed https://github.com/GoogleChrome/ct-policy/issues/5 to try and capture a minimal set of questions, but it'd be great to expand that.