Final minutes of 2025-05-01 Validation sub-committee meeting

Corey Bonnell

May 15, 2025, 12:06:00 PM

to valid...@groups.cabforum.org

Here are the final minutes of the meeting as indicated in the subject and recorded by Chris Clements and approved at the 2025-05-15 validation-sc meeting.

Meeting Date: 2025-05-01

Attendees: Aaron Gable (Let's Encrypt), Aaron Poulsen (Amazon), Ben Wilson (Mozilla), Bruce Morton (Entrust), Chris Clements (Google), Clint Wilson (Apple), Corey Bonnell (DigiCert), Corey Rasmussen (OATI), Eric Kramer (Sectigo), Gregory Tomko (GlobalSign), Gurleen Grewal (Google), Henry Birge-Lee (Private person), Jaime Hablutzel (OISTE Foundation), Johnny Reading (GoDaddy), Kiran Tummala (Microsoft), Mahua Chaudhuri (Microsoft), Martijn Katerbarg (Sectigo), Michael Slaughter (Amazon), Michelle Coon (OATI), Nargis Mannan (VikingCloud), Nate Smith (GoDaddy), Nome Huang (TrustAsia), Rebecca Kelly (SSL.com), Rollin Yu (TrustAsia), Ryan Dickson (Google), Scott Rea (eMudhra), Stephen Davidson (DigiCert), Thomas Zermeno (SSL.com), Trevoli Ponds-White (Amazon), Wayne Thayer (Fastly), Wendy Brown (US Federal PKI Management Authority), Yamian Quintero (Microsoft)

Corey Bonnell read the Note Well

Approval of Minutes:

- The 2025-03-27 F2F 64 VSC minutes were approved.

- Minutes from the previous meeting have not yet been circulated.

Agenda:

- Continued discussion on SC-85 (DNSSEC) https://github.com/cabforum/servercert/pull/579

- Continued discussion (if needed) on SC-82 redux (CA-assisted validation)

- Backlog grooming

Continued discussion on SC-85 (DNSSEC) https://github.com/cabforum/servercert/pull/579:

- Henry Birge-Lee led the conversation primarily focused on comments from GitHub and discussing changes since the last presented draft.

- The biggest text change stems from the conversation brought up by Corey: the original draft included Section 4 of RFC 4035. There are several statements in Section 4 that are not needed to meet the security goals of this ballot, and several of these statements are updated by later RFCs. Even though these are SHOULD statements, including them when they are outdated places the onus on us to include an updated SHOULD. Previously, Dimitris Zacharopoulos pointed out that we’re getting into the weeds with EDNS buffer size. At some point in the future it would be good to have a SHOULD in the BRs for that because it is a best practice, but for now we want to isolate the ballot topics.

- With this in mind, Henry reviewed the RFC, looking for a more narrowly scoped way to include the requirements. He noticed that Section 4 is about the more general concept of the security-aware resolver, but the actual algorithm needed to validate a DNSSEC record is presented in Section 5. Section 5 is also updated by fewer RFCs. Concretely, moving the citation to Section 5 lets us remove the SHOULD statement regarding the EDNS buffer size, which results in less text. More abstractly, changing to the Section 5 reference essentially says that you only have to do the DNSSEC validation algorithm - the other properties of the security-aware resolver are not really in scope for what we’re requiring.

- Henry also responded to a comment about the .cn domain, which does not conform to the current best practice for DNSSEC domains but is a perfectly validly configured domain. The type of misconfiguration that would actually prevent issuance would amount to a serious global outage of the TLD, because ~30% of all clients are using a DNSSEC-validating resolver.

- The other comment asked what happens if a domain’s DNS server rejects DNSSEC queries. The nice thing about DNSSEC is that it’s an opt-in ecosystem: if the parent TLD (or, for a non-DNSSEC TLD, the root server) tells the resolver not to expect DNSSEC here, the resolver never talks DNSSEC to that domain and just sends standard DNS queries. The current wording is optimal for addressing these two comments.

- Corey believes the changes made by Henry look good.

- Corey understands that you can take an off-the-shelf DNS resolver that does DNSSEC validation and you should be able to comply with these requirements. Henry clarified that the intent of the ballot is to be fully compatible with off-the-shelf DNS resolvers, and the RFCs referenced are common ones that resolvers publishing a list of supported RFCs typically include. The easiest way to demonstrate compliance is to turn DNSSEC validation on and then show that the resolver implements the referenced RFCs.

- RFC 6840 Section 4, referenced on line 749, includes 4 subsections that include MUST-level statements that can be considered security-related updates. RFC 6840 Section 2 had SHOULD-level requirements that were problematic, and this reference was previously removed.

- Gurleen Grewal recently surveyed resolvers that could be used and only identified 2 that they could reasonably use. She is wondering if this ballot will push everyone to use the same resolver. Henry had also done some implementation surveys and all 4 resolvers he reviewed seemed to be compliant. Gurleen is concerned that some resolvers are not production ready and could not scale. Gurleen will share a list in GitHub. Henry proposed he would also include a list in GitHub. Generally, DNSSEC slightly impacts DNS lookups at scale, but not to the extent that it’s not viable, particularly because it's completely interoperable with existing DNS caching. If you can cache a DNS result, you can cache a DNSSEC result, so it does not cause a cache to start missing all the time. The main answer to scale is that Cloudflare's 1.1.1.1 and Google’s 8.8.8.8 do this, and they’re answering much higher volumes of queries than a CA would. Also, for Let’s Encrypt and Sectigo, some of the higher-issuing CAs that have turned this on, it has not caused a significant scaling problem. Aaron Gable confirmed that DNSSEC validation has caused no scaling or performance problems for Let’s Encrypt and it has been turned on for several years. Martijn Katerbarg confirmed the same for Sectigo.

- Ben Wilson asked if there was enough runway for CAs that have not done this. Trevoli Ponds-White suggested that 2 months seems like the best possible case for this ballot to be in effect. Martijn suggested that, more generally, ballots should first have language completed, at least on the calls, and then we should start discussing the effective date. Clint Wilson asked how much is left to discuss about the ballot text, as it seems like the text is pretty much done. He suggested a February date would be fine for this ballot. Ryan Dickson stated that whenever a ballot goes into formal public discussion, changes can happen. Regarding the date, it is his understanding that when this topic was presented at the F2F, it was presented as an opportunity to close off, or reduce the likelihood of, attacks observed in the real world. The group should be thoughtful in selecting an implementation date with consideration for ongoing threats.

- In chat, Aaron offered the suggestion to update the CABF Bylaws Section 2.4(8) to say that the Chair or Vice-Chair is also allowed to update specifically-formatted date placeholders into concrete dates. Then the ballot text could literally say "Effective [the 15th day of the first month more than 6 months after the end of the IPR period], CAs MUST..." and the Chair would resolve that to "Effective Jan 15, 2026, CAs MUST...".

- Corey asked if Henry can share some of his test results so that everyone can make an informed decision on what changes might be needed. Henry will respond on GitHub with his results. This can help inform the effective date.
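
As a rough illustration of the date placeholder Aaron suggested in chat, the sketch below resolves "the 15th day of the first month more than 6 months after the end of the IPR period" into a concrete date. It is only one possible reading of that wording: it assumes "first month more than 6 months after" means the first calendar month that begins strictly after the 6-month mark. The function names and the IPR end date of 2025-06-20 are hypothetical and were not discussed in the meeting.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def resolve_effective_date(ipr_end: date, months: int = 6, day: int = 15) -> date:
    """Resolve the placeholder under the assumed interpretation described above."""
    threshold = add_months(ipr_end, months)
    # First calendar month that begins strictly after the threshold date.
    first_of_month = add_months(threshold.replace(day=1), 1)
    return first_of_month.replace(day=day)


# Hypothetical IPR end date, chosen only for illustration.
print(resolve_effective_date(date(2025, 6, 20)))
```

With the hypothetical end date of 2025-06-20, this reading resolves to 2026-01-15, the same form as the "Effective Jan 15, 2026" example. Other readings of the placeholder are possible, which is one reason the exact format would need to be specified if the Bylaws were updated.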

Continued discussion (if needed) on SC-82 redux (CA-assisted validation) https://github.com/slghtr-says/servercert/pull/3/files:

- Michael Slaughter incorporated feedback from the last few meetings into the draft and is looking for comments and additional feedback. Following this meeting, Slaughter will send out this link so that everyone can review.

- Martijn suggested being even more specific by stating that this method does not incorporate the DCV reuse policy as stated in the other section. Slaughter was considering the scenario where someone sets the TTL in the DNS TXT record to 5 years. That doesn’t seem like something we want to allow. Aaron stated we do effectively have an upper bound on it anyway, which is the validation data reuse period, so even if they set the TTL to 5 years, we still wouldn’t be able to reuse the validation data more than 398 days out. Slaughter agrees; they had previous discussions about whether or not to make it explicit that this method is excluded from that clause in the validation reuse period, but Aaron is right that the validation reuse period serves as the upper bound constraint. Martijn expressed concern that this might be read as the validation reuse period being the governing constraint even if the TTL is a lot lower. Corey shares this concern and thinks it would be good to be explicit in 4.2.1 and then also have a reference to 4.2.1 in this note to say that the TTL or the 8 hours supersedes anything in 4.2.1. Aaron thinks line 1010 is saying that there are 2 possible boundaries on when you can issue a certificate. One is the validation data reuse period, so by default you can only reuse validation data for up to 398 days. But if the TTL of the DNS TXT record is lower than the validation reuse period, then you have to abide by the TTL of the DNS TXT record, with the exception that if the TTL is unreasonably low, like one second, you still have 8 hours to complete the issuance process. Slaughter confirmed. Aaron suggested the intent is that you must respect the TTL of the DNS TXT record if it is between 8 hours and the validation data reuse period. If it is outside of that range (less than 8 hours, or more than the validation reuse period), then you use those boundaries instead. Slaughter confirmed this was the intent. Martijn suggested that a TTL of 12 hours could be interpreted as “well, the DCV reuse is still in effect, so I can reuse it for 398 days.” That’s not intended, right? Slaughter confirmed it should be 12 hours at that point. Martijn will propose some language.

- Ryan recalls Martijn posing the question of not allowing reuse at all in the last meeting and was wondering how that discussion concluded. Slaughter said this was the compromise position, which was making the reuse period as small as possible, even smaller than the end state destination of SC-081. We basically decided to anchor it to the language used for CAA records.

- Wayne Thayer thought the name “DNS Change” doesn’t feel like the right words to describe this method. He suggested “DNS Query for Static Value”. Technically you’re just looking for this value; nothing really has to change for this to work. Martijn suggested “DNS Static Value”. Aaron likes “DNS Query for…” or “DNS Record with Static Value” and noted that the naming scheme for existing methods is really bizarre: some names describe an action that an applicant takes and some describe an action that a CA takes. There’s no clear precedent for us to follow here.

- Henry offered his interpretation of “identifying the Applicant in a DNS TXT record for an Authorization Domain Name that is prefixed with a Domain Label that begins with an underscore character”: any underscore-prefixed subdomain of an authorization domain is permissible for this method. That’s the language in the original DNS change method, but it’s his interpretation that newer methods seem to be leaning towards a specific underscore-prefixed label. This is how the ACME challenge works, and there are a couple of good reasons for this; he’ll include them in GitHub.

- Henry continued: as a member of the broader community, we have a lot of interest in validation and transparency. If the CA says it validates using X, it’s really helpful to be able to query the DNS records for that website and see whether X exists or not. The broader community gains a ton of insight just by knowing what the underscore label is. A label ending in “-persistent” would be preferable (https://datatracker.ietf.org/doc/draft-sheth-identifiers-dns/). Wayne suggested that since this method doesn't involve CNAMEs, having a single label is reasonable to him, and what Henry said makes sense.

- Clint wants to especially support the point about the domain owner or operator being informed about the decision that they’re making. This is one of the main concerns that was highlighted initially and one of the challenges that this ballot creates. It’s functionally impossible to know what the domain owner is actually aware of, but if there are mechanisms that can be used to increase the likelihood that the domain owner is aware of what they’re doing - that this is persistent and that they’re providing an ongoing permission beyond just a one time random value demonstration of control - then we should add those mechanisms. Slaughter asked whether this was meant in a self-documentation kind of way. Clint suggested this can be part of it, but maybe there are other things that we can do to highlight the risk associated with this method.

- Wayne highlighted that the concern with CNAME is that you can only have one in a zone. Slaughter confirmed the CNAME would have to be the underscore-prefixed subdomain that points to a TXT record that has this value in it. Wayne needs to think through the way this mechanism would work. The concern is that having a single static underscore value would preclude the use of CNAMEs, the way it does with method 7, where we actually introduced method 21. Henry suggested that, from the DNS perspective, when the validation method itself is querying a CNAME, you can only have 1 CNAME per label, as specified in the RFC. So if I’m validating at 2 CAs and it’s a static label and I need to put a CNAME there, I can’t put both CAs’ requested CNAME values at that same label, because that’s an RFC violation. In the TXT record space you can have many TXT records at the same label, so you can just put both of them there. You can still use CNAMEs, and you can only use 1 CNAME per label, but this would be more the case of the ACME DNS challenge where you want to delegate that entire record out to a third party. Slaughter and Wayne are basically saying the same thing.

- Wayne suggested when working on the DNS account 01 method it was decided that method 7 didn’t allow for that because it creates multiple prepended labels. He sees Slaughter copied over the language in the first paragraph and believes “a” is doing a bit too much work there. If we mean one, we should say one. He thinks this is a flaw in 3.2.2.4.7.

- Wayne asked about a risk assessment of this method. In the past when we’ve introduced new methods, people have raised the need to have an assessment of the risks we’re taking on by introducing that new method. Has anything been done in this area? He understands previous work was done, but that was completed before this method evolved. Do we need to do a formal risk assessment of this method before we move forward with it? Slaughter stated Henry did an initial conceptual proof of this method or something very similar to it. Henry suggested a full ecosystem-wide risk assessment in a fully formal setting isn't completely possible because ultimately the security depends on cloud and DNS registrants, but he thinks there is a fairly strong argument that even using the existing methods, if an adversary was capable of putting in some static value for a CNAME record, they could forever continue to validate that domain, because the static CNAME would always point out to that adversary-controlled domain and the adversary can then change the value at their domain. The reasoning at a high level is that the existing DNS change method is not actually enforcing a DNS change within the zone of the domain in question, because it permits following CNAMEs. It’s really just seeing if someone is capable of putting in a static CNAME value. It’s more optimal to do this method with TXT records. He doesn’t see any security reason why a TXT record at an underscore prefix would be harder or easier to upload than a CNAME record, but essentially the point is that while we make people do all of this work to validate this temporary nonce, an adversary could bypass that challenge with a static value. The 10,000 foot analysis is that because we permit following redirects out to another zone, we’re not actually querying for a change in the requested zone. We’re actually querying for a static value in that zone that could point out to a redirect where the change actually happens. Since we are not strictly enforcing a zone change, maybe we should just vastly improve the automation and scaling options here and query a static value in the first place.

- Slaughter suggested the group needs to discuss and decide what needs to be done to incorporate some of the language changes discussed today.
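
The TTL bounds that Aaron and Slaughter agreed on for SC-82 can be sketched as a simple clamp. This is a minimal illustration of the stated intent, not proposed ballot language: the function and constant names are hypothetical, and the 398-day value is the validation data reuse period cited in the discussion, which may change.

```python
from datetime import timedelta

# Floor from the discussion: even a very low TTL leaves 8 hours to issue.
MIN_WINDOW = timedelta(hours=8)
# Assumption: the 398-day validation data reuse period cited in the meeting.
MAX_REUSE = timedelta(days=398)


def reuse_window(txt_ttl_seconds: int,
                 min_window: timedelta = MIN_WINDOW,
                 max_reuse: timedelta = MAX_REUSE) -> timedelta:
    """Return how long a validation may be relied on for a given TXT record TTL.

    The TTL is respected when it lies between the 8-hour floor and the
    reuse-period ceiling; outside that range the nearer boundary applies.
    """
    ttl = timedelta(seconds=txt_ttl_seconds)
    return max(min_window, min(ttl, max_reuse))


print(reuse_window(1))                      # unreasonably low TTL: floor applies
print(reuse_window(12 * 3600))              # 12-hour TTL is respected as-is
print(reuse_window(5 * 365 * 24 * 3600))    # 5-year TTL: ceiling applies
```

Under this reading, a 12-hour TTL yields a 12-hour window rather than 398 days, which is the outcome Slaughter confirmed was intended.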
