The API owners plan to approve increased Origin Trial limits for certain trials; feedback is welcome.
* In cases where there is a strong justification, the API owners have approved increasing the total amount of web traffic for an Origin Trial from 0.5% to a higher number, potentially as high as 20%. The three origin trials we considered were FLoC [1], Conversion Measurement [2] and Trust Tokens [3].
* The higher limits are justified by situations where the experimental goal of the trial is inherently dependent on statistically significant measurements aggregated across a number of sites
* Approval for these increased limits comes with strong additional reporting requirements to the API OWNERS and transparency to blink-dev
Details
The API OWNERS have been asked to consider relatively high per-feature Origin Trial limits for a succession of privacy features in recent months [1][2][3]. We recently met with the teams driving these APIs to help build a mutual understanding of the goals of the trials and the API OWNERS' approach to risk in evaluating trials.
This message is an attempt to capture some of that context, document agreement about trial limits, and outline next steps.
A Risk-Based Approach
The job of the API OWNERS is to ensure the Blink process is followed in spirit. The design goal of our process has been to maintain the health of the platform, acknowledging that:
* Progress requires different views to be sifted in order to find good outcomes that maximize benefits over the long haul
* Features should be able to demonstrate that they solve important problems and solve them well
* Somebody must lead in designing features. Given how frequently Chromium is out in front, that’s uncomfortable, so we raise the bar where we’re taking larger risks
* We want to ensure that when features are developed, they have (and take) every opportunity to learn and adapt while they’re still malleable, both to avoid “burn in” of regretted designs, and cast the widest net for developer feedback possible
Basically, we guide you to pack as much iteration and feedback gathering into your development process as possible, recognizing that mistakes are incredibly expensive and that we learn by listening.
Commensurately, we give maximum consideration to developer feedback, and somewhat less to other factors (working group consensus, TAG feedback, etc.) in I2S threads. They’re important quality signals, but what gives us the most confidence that we’ve solved an important problem well is the results from the field.
Complex OT Requirements
Origin Trials are one of our best ways to get feedback on the design and usefulness of features without creating new, undue risks for the ecosystem. But since these trials are in-the-wild experiments, we must be very careful to avoid burn-in or “soft launch” risks.
Some Privacy-oriented features being trialed now have several additional properties that complicate the picture:
* These APIs are being consumed by third parties who are attempting to verify ML models in concert with the signals being given through the K-anonymity algorithm the APIs provide
* Multiple variants may need to be tried
* It’s difficult to build confidence in whether these ML models work at the fractional levels of traffic we’ve green-lit in the past
Instead of treating each API as a third-party Origin Trial with one-off limits, we want to unify the story for these trials, manage them coherently, and report back to the community about how it goes.
Risks We Face
By potentially raising limits for OTs, we face risks that we want to make explicit and address head-on:
* Burn-in: Developers who come to rely on a feature may “lean on” us not to take it away as we consider alternatives, change direction, or iterate on the design. Low usage limits + single-digit milestone time limits have been the way we have addressed this in the past. Some OTs have been allowed to use well above the general limit for short periods of time to collect data across big sites, so long as total use over, say, a week remains under the threshold. This requires careful partnership and coordination.
* Reputation risk: The OWNERS acknowledge a risk to the project from being seen to “pre launch” features. Required breakage between OT and launch has been our mechanism for tempering “go fever”. Recently we modified this to add a new step for requesting a “gapless OT” as part of an Intent to Ship, which comes with a higher evidentiary bar regarding developer interest.
* Precedent: Each exception we make to policy is potential precedent. To ensure that we are not “playing favourites”, our approach has been to green-light exceptions after discussion with requesters, document them as we go, and (if they work) to perhaps change the process. Recent examples here include Gapless OT exceptions and the (conditional) TAG review exception policy.
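The burn-in bullet above describes trials that exceed the general limit briefly while staying under it over a longer window. A minimal sketch of that check follows, assuming usage is tracked as daily page load counts; the function names and the sample numbers are hypothetical and are not taken from any Chromium tooling.

```python
from typing import Sequence, Tuple

# The 0.5% general limit mentioned earlier in this message.
GENERAL_LIMIT = 0.005


def weekly_usage_share(daily_counts: Sequence[Tuple[int, int]]) -> float:
    """Return the share of page loads that used the feature over the window.

    Each entry is (page loads using the feature, total page loads) for one day.
    """
    feature_loads = sum(used for used, _ in daily_counts)
    total_loads = sum(total for _, total in daily_counts)
    if total_loads == 0:
        raise ValueError("no page loads recorded")
    return feature_loads / total_loads


def within_limit(daily_counts: Sequence[Tuple[int, int]],
                 limit: float = GENERAL_LIMIT) -> bool:
    """True if usage over the whole window stays at or under the limit."""
    return weekly_usage_share(daily_counts) <= limit


# Hypothetical week: six quiet days at 0.1% and one spike day at 2%.
week = [(1_000, 1_000_000)] * 6 + [(20_000, 1_000_000)]
print(within_limit(week))
```

In this made-up week the spike day alone is four times the limit, but the seven-day total (26,000 of 7,000,000 page loads) stays under 0.5%.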
We have discussed each of these risks with the folks running these origin trials and have come to a more nuanced understanding of the parties involved and our ability to count on the folks running the trials to prevail on partners regarding breaking changes.
As our approach is risk-based, these discussions are helpful in reducing the potential for burn-in from the API OWNERS' perspective. They may also suggest a path for others to request higher limits over longer time-scales in conversation with us.
Proposed Policy
We want to facilitate broad-scale learning across the ecosystem in a way that will shorten the eventual path to launch.
Respectful of the risks above, the tentative (straw-person) agreement we have reached is:
* Frequent (perhaps monthly) reporting of total page load usage (via UMA) of the APIs in question. This matters because the top-line traffic limit won’t necessarily map to the amount of use by third parties who will be running their own experiments.
* Experiments also gated by finch flags will report on where those limits have been set and if/when they are substantially changed.
* High nominal limits for the length of the trial. _In theory_ the OT might enable > 80% of page loads, but this would be “cut” with fractional rollout via finch to ensure that limits of ~15-20% of page loads are never breached. This is much higher than usual, and is an acknowledged risk.
* A two-release duration for each variant of the trial, with agreement to change the API in ways that ensure partners must change their systems within the trial to keep pace. The goal here is to prevent burn-in despite high usage.
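The arithmetic behind the high-nominal-limit bullet can be sketched as below. This assumes, purely for illustration, that effective usage is the product of the share of page loads the OT enables and the fraction of clients in the finch rollout; the function name is hypothetical, and real usage would still need to be confirmed through the UMA reporting described above.

```python
def finch_rollout_fraction(ot_enabled_share: float, page_load_cap: float) -> float:
    """Largest rollout fraction that keeps effective usage at or below the cap.

    Assumption: effective usage = ot_enabled_share * rollout fraction.
    """
    if not 0.0 <= ot_enabled_share <= 1.0 or not 0.0 <= page_load_cap <= 1.0:
        raise ValueError("shares must be between 0 and 1")
    if ot_enabled_share <= page_load_cap:
        return 1.0  # the OT alone already stays at or under the cap
    return page_load_cap / ot_enabled_share


# Nominal 80% enablement, cut down to the 15% and 20% caps discussed above.
for cap in (0.15, 0.20):
    fraction = finch_rollout_fraction(0.80, cap)
    print(f"cap {cap:.0%}: roll out to at most {fraction:.1%} of clients")
```

Under this assumption, an OT that nominally enables 80% of page loads needs a rollout of roughly a fifth to a quarter of clients to stay within the 15-20% range.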
As usual, we expect the developers running these trials to report back on the experience of participating developers and how it informs their APIs going forward.
Obviously, this is a departure from current practice and something we want community feedback on.
Thanks for taking the time to read this, and for your feedback in this thread.
The API OWNERS
[1] https://groups.google.com/a/chromium.org/g/blink-dev/c/MmijXrmwrJs/m/v_6uzXRVBAAJ
[2] https://groups.google.com/a/chromium.org/g/blink-dev/c/C0P7ePjITJQ/m/V_XOCbDDAAAJ
[3] https://groups.google.com/a/chromium.org/g/blink-api-owners-discuss/c/WY17UlZFG3g