Note
This supersedes and replaces the prior proposal in Intent to Experiment: TCP Socket Pool per-Top-Level-Site. Unlike that proposal, this one uses randomization to mitigate attacks instead of partitioning the socket pool by top-level-site. Like the prior approach, this new experiment offers a similar probabilistic mitigation to a new-tab-socket-observation attack. Unlike the prior approach, this new experiment offers the same (non-complete) mitigation to a new-iframe-socket-observation attack. Further, this new experiment will be significantly easier to implement as it does away with the need to track partitioning information.
Contact emails
ari...@chromium.org, dsch...@chromium.org, awi...@chromium.org, a...@google.com, mme...@chromium.org, mike...@chromium.org, la...@chromium.org
None
Summary
This experiment takes the fixed per-profile maximum of 256 and increases it by a randomized amount ranging from 1 to a chosen upper bound (we will experiment with using 64, 128, and 256) as determined by the algorithm below:
(1) Define a function NEXT_POOL_STATE(STATE, MIN, MAX, VALUE) with the constraints MIN < MAX, VALUE in range [MIN, MAX], and STATE being either ‘capped’ or ‘uncapped’. This function returns ‘capped’ or ‘uncapped’ based on some probability distribution across [MIN, MAX] which might vary depending on STATE. If VALUE == MIN the return value will always be ‘uncapped’ and if VALUE == MAX the return value will always be ‘capped’. Although the exact distribution will be experimented on, in general we want transitions from ‘uncapped’ to ‘capped’ to be likely at the upper end of the range and transitions from ‘capped’ to ‘uncapped’ to be likely at the lower end of the range.
(2) Define LOWER_LIMIT as 256.
(3) Define UPPER_LIMIT as 320, 384, or 512 depending on the experiment arm.
(4) Consider a socket pool to have a STATE that is either ‘uncapped’ or ‘capped’, and that starts as ‘uncapped’.
(5) If a socket pool is ‘uncapped’ then it still processes socket releases as before, but for socket requests:
(5a) Define X as the number of active sockets before the allocation.
(5b) If X > LOWER_LIMIT update STATE to the result of NEXT_POOL_STATE(STATE, LOWER_LIMIT, UPPER_LIMIT, X)
(5c) If the value of STATE is ‘uncapped’ allocate the socket, otherwise queue the socket request for later processing when the STATE is ‘uncapped’.
(6) If a socket pool is ‘capped’ then it is queueing all socket requests for later processing when the STATE is ‘uncapped’, but for socket releases:
(6a) Define Y as the number of active sockets after the release occurs.
(6b) Update STATE to the result of NEXT_POOL_STATE(STATE, LOWER_LIMIT, UPPER_LIMIT, Y).
(6c) Release the socket.
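The steps above can be sketched as a small state machine. This is only an illustrative sketch, not Chromium code: the names RandomizedPool and next_pool_state are hypothetical, the linear probability ramp is an assumption (the actual distribution is what the experiment will tune), and clamping the post-release count to LOWER_LIMIT is an assumption made so that VALUE stays within [MIN, MAX].

```python
import random
from collections import deque

UNCAPPED, CAPPED = "uncapped", "capped"


def next_pool_state(state, lo, hi, value, rng):
    """Hypothetical NEXT_POOL_STATE using an assumed linear ramp over [lo, hi]."""
    assert lo < hi and lo <= value <= hi
    if value == lo:
        return UNCAPPED  # always uncapped at MIN
    if value == hi:
        return CAPPED  # always capped at MAX
    frac = (value - lo) / (hi - lo)  # close to 0 near lo, close to 1 near hi
    if state == UNCAPPED:
        # uncapped -> capped is more likely toward the upper end
        return CAPPED if rng.random() < frac else UNCAPPED
    # capped -> uncapped is more likely toward the lower end
    return UNCAPPED if rng.random() < 1 - frac else CAPPED


class RandomizedPool:
    def __init__(self, lower=256, upper=320, seed=None):
        self.lower, self.upper = lower, upper
        self.state = UNCAPPED
        self.active = 0
        self.queue = deque()  # requests waiting for the pool to uncap
        self.rng = random.Random(seed)

    def _try_allocate(self):
        # Steps 5a-5c: X is the active count before the allocation.
        if self.active > self.lower:
            self.state = next_pool_state(
                self.state, self.lower, self.upper, self.active, self.rng)
        if self.state == UNCAPPED:
            self.active += 1
            return True
        return False

    def request(self, req):
        """Allocate if uncapped, otherwise queue. Returns True if allocated."""
        if self.state == UNCAPPED and self._try_allocate():
            return True
        self.queue.append(req)
        return False

    def release(self):
        """Steps 6a-6c: re-evaluate with Y, the count after the release."""
        assert self.active > 0
        self.active -= 1
        if self.state == CAPPED:
            y = max(self.active, self.lower)  # assumption: clamp into [MIN, MAX]
            self.state = next_pool_state(
                CAPPED, self.lower, self.upper, y, self.rng)
        # Drain queued requests in FIFO order while the pool stays uncapped.
        while self.state == UNCAPPED and self.queue:
            if self._try_allocate():
                self.queue.popleft()
```

For example, issuing 400 requests against RandomizedPool(lower=256, upper=320) grants somewhere between 257 and 320 of them and queues the rest; the exact count depends on the random draws.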
The feasibility of raising the per-profile limit to 512 was already studied and did not yield negative (or positive) results, so there should not be an issue with raising the limit to random numbers between 257 and 320/384/512.
This new randomized limit will be imposed independently for the WebSocket pool and the normal (HTTP) socket pool.
Limits on UDP sockets (e.g., HTTP/3), multiplexed streams for a single socket (e.g., HTTP/2), proxies, and HEv3 will not be evaluated in this experiment. In the future an experiment following this approach for them will likely be considered.
The intent is to roll this experiment directly into a full launch if no ill effects are seen. See the motivation section for more.
Blink component
TAG review
https://github.com/w3ctag/design-reviews/issues/1151
Motivation
Having a fixed pool of TCP sockets available to an entire profile allows attackers to effectively divine the number of network requests made by other tabs/frames, and learn things about them to the extent that any given site can be profiled. For example, if a site makes X network requests when the user is logged in and Y when logged out, then by saturating the TCP socket pool and watching movement after calling window.open, the state of the other site can be gleaned. This sort of attack is outlined in more detail here: https://xsleaks.dev/docs/attacks/timing-attacks/connection-pool/
In order to address this sort of attack, we randomize the point at which a socket pool is considered to be full and the point at which it can subsequently be considered sufficiently drained to allow new allocations. We don’t want sites to be able to detect the point at which the pool becomes full without triggering a drain, and vice versa. Allocating sockets probabilistically as the pool approaches the chosen upper bound, and delaying new allocations until a randomized lower point is reached, ensures this. It also makes it difficult for a site to walk all the way up to a known maximum socket count on the assumption that the final socket use will cause a drain.
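To illustrate why a randomized fill point is harder to probe, the sketch below fills a fresh pool repeatedly and records the socket count at which it first becomes capped. It is a toy model only: the function name fill_until_capped is hypothetical and the linear capping probability is an assumption, not the distribution the experiment will use.

```python
import random


def fill_until_capped(lower, upper, rng):
    """Return the active-socket count at which a fresh pool first caps."""
    active = 0
    while True:
        if active > lower:
            # Assumed linear ramp: capping is certain once active == upper.
            frac = (active - lower) / (upper - lower)
            if active == upper or rng.random() < frac:
                return active
        active += 1


rng = random.Random(0)
caps = [fill_until_capped(256, 320, rng) for _ in range(1000)]
print(min(caps), max(caps), len(set(caps)))
```

Under these assumptions every observed cap falls between 257 and the upper limit, but it differs from run to run, so an attacker cannot rely on a fixed count of sockets to saturate the pool.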
Risks
Interoperability and Compatibility
While other user agents may wish to follow the results, we only anticipate compatibility issues with local machines or remote servers when the number of available TCP sockets in the browser fluctuates up (256 -> 320/384/512) in a way Chrome did not allow before. This will be monitored carefully, and any experiment arm yielding a significant negative impact on the browsing experience will be terminated early.
Gecko: https://github.com/mozilla/standards-positions/issues/1299; current global cap of 128-900 (as allowed by OS)
WebKit: https://github.com/WebKit/standards-positions/issues/550; current global cap of 256
Debuggability
This will be gated behind the base::Feature kTcpSocketPoolLimitRandomizationTrial, so if breakage is suspected that feature can be turned off to check for impact. For how to control feature flags, see this.
Measurement
The existing SOCKET_POOL_STALLED_MAX_SOCKETS event can be tracked to see if an uptick is noticed.
The existing metric Net.TcpConnectAttempt.Latency.{Result} will be used to detect increases in overall connection failure rates.
New metrics Net.TCPSocketPoolSize.{UpperBound|LowerBound}.{Skipped|Enforced} will be added to track usage of the NEXT_POOL_STATE function.
Will this feature be supported on all six Blink platforms (Windows, Mac, Linux, ChromeOS, Android, and Android WebView)?
No, not WebView. That will have to be studied independently due to the differing constraints.
Is this feature fully tested by web-platform-tests?
No. As this is a Blink networking-focused change, browser tests or unit tests are more likely.
Flag name on about://flags
None
Finch feature name
TcpSocketPoolLimitRandomizationTrial
Rollout plan
We will never test more than 5% in each group on stable, and will stay on canary/dev/beta for a while to detect issues before testing stable.
Requires code in //chrome?
No
Tracking bug
Estimated milestones
143
Link to entry on the Chrome Platform Status