Intent to Experiment: TCP Socket Pool Limit Randomization [DRAFT]


Ari Chivukula

9:48 AM
to blink-dev, David Schinazi, Andrew Williams, a...@google.com, mmenke, Mike Taylor, Ladan Khamnian

Note

This supersedes and replaces the prior proposal, Intent to Experiment: TCP Socket Pool per-Top-Level-Site. Unlike that proposal, this one uses randomization to mitigate attacks instead of partitioning the socket pool by top-level site. Like the prior approach, this new experiment offers a comparable probabilistic mitigation of a new-tab socket-observation attack. Unlike the prior approach, it offers the same (partial) mitigation for a new-iframe socket-observation attack. Further, this new experiment will be significantly easier to implement, as it does away with the need to track partitioning information.


Contact emails

ari...@chromium.org, dsch...@chromium.org, awi...@chromium.org, a...@google.com, mme...@chromium.org, mike...@chromium.org, la...@chromium.org

Explainer/Specification

None


Summary

This experiment takes the fixed per-profile maximum of 256 sockets and increases it by a randomized amount ranging from 1 to a chosen upper bound (we will experiment with 64, 128, and 256), as determined by the algorithm below:


(1) Define a function NEXT_POOL_STATE(STATE, MIN, MAX, VALUE) with the constraints MIN < MAX, VALUE in range [MIN, MAX], and STATE being either ‘capped’ or ‘uncapped’. This function returns ‘capped’ or ‘uncapped’ based on some probability distribution across [MIN, MAX] which might vary depending on STATE. If VALUE == MIN the return value will always be ‘uncapped’, and if VALUE == MAX the return value will always be ‘capped’. Although the exact distribution will be experimented on, in general we want transitions from ‘uncapped’ to ‘capped’ to be likely at the upper end of the range and transitions from ‘capped’ to ‘uncapped’ to be likely at the lower end of the range.

(2) Define LOWER_LIMIT as 256.

(3) Define UPPER_LIMIT as 320, 384, or 512 depending on the experiment arm.

(4) Consider a socket pool to have a STATE that is either ‘uncapped’ or ‘capped’, and that starts as ‘uncapped’.

(5) If a socket pool is ‘uncapped’ then it still processes socket releases as before, but for socket requests:

(5a) Define X as the number of active sockets before the allocation.

(5b) If X > LOWER_LIMIT, update STATE to the result of NEXT_POOL_STATE(STATE, LOWER_LIMIT, UPPER_LIMIT, X).

(5c) If the value of STATE is ‘uncapped’ allocate the socket, otherwise queue the socket request for later processing when the STATE is ‘uncapped’.

(6) If a socket pool is ‘capped’ then it is queueing all socket requests for later processing when the STATE is ‘uncapped’, but for socket releases:

(6a) Define Y as the number of active sockets after the release occurs.

(6b) If Y < LOWER_LIMIT, set STATE to ‘uncapped’; otherwise update STATE to the result of NEXT_POOL_STATE(STATE, LOWER_LIMIT, UPPER_LIMIT, Y).

(6c) Release the socket.
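The steps above can be sketched as a toy Python model (not Chromium code). The linear probability ramp inside next_pool_state is an illustrative assumption, since the exact distribution is an experiment parameter; it only satisfies the stated contract for the endpoints of the range:

```python
import random

def next_pool_state(state, lower, upper, value):
    """One possible NEXT_POOL_STATE (step 1): a linear probability ramp.

    ASSUMPTION: the real distribution is an experiment parameter; this
    ramp merely satisfies the contract (value == lower always returns
    'uncapped', value == upper always returns 'capped').
    """
    assert lower < upper and lower <= value <= upper
    frac = (value - lower) / (upper - lower)  # 0.0 at lower, 1.0 at upper
    if state == "uncapped":
        # Capping grows more likely toward the upper end of the range.
        return "capped" if random.random() < frac else "uncapped"
    # Uncapping grows more likely toward the lower end of the range.
    return "uncapped" if random.random() < 1 - frac else "capped"

class SocketPool:
    """Toy model of steps (2)-(6c)."""

    def __init__(self, lower=256, upper=320):  # (2), (3)
        self.lower, self.upper = lower, upper
        self.state = "uncapped"                # (4)
        self.active = 0
        self.queue = []                        # deferred socket requests

    def request(self, req):
        """Returns True if the socket was allocated, False if queued."""
        if self.state == "uncapped":           # (5)
            x = self.active                    # (5a) active before allocation
            if x > self.lower:                 # (5b)
                self.state = next_pool_state(self.state, self.lower,
                                             self.upper, x)
            if self.state == "uncapped":       # (5c)
                self.active += 1
                return True
        self.queue.append(req)                 # capped: queue for later
        return False

    def release(self):
        if self.state == "capped":
            y = self.active - 1                # (6a) active after the release
            v = max(y, self.lower)             # keep VALUE within [MIN, MAX]
            self.state = next_pool_state(self.state, self.lower,
                                         self.upper, v)  # (6b)
        self.active -= 1                       # (6c) release the socket
        if self.state == "uncapped" and self.queue:
            pending, self.queue = self.queue, []
            for r in pending:                  # replay deferred requests
                self.request(r)
```

Note that in this model the pool always stalls somewhere strictly between LOWER_LIMIT and UPPER_LIMIT inclusive: allocations up to 256 are unconditional, and an allocation attempt at UPPER_LIMIT active sockets is guaranteed to flip the state to ‘capped’.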


The feasibility of raising the per-profile limit to 512 was already studied and yielded neither negative nor positive results, so raising the limit to randomized values between 257 and 320/384/512 should not pose a problem.


This new randomized limit will be imposed independently for the WebSocket pool and the normal (HTTP) socket pool.


Limits on UDP sockets (e.g., HTTP/3), multiplexed streams over a single socket (e.g., HTTP/2), proxies, and HEv3 will not be evaluated in this experiment. A follow-up experiment applying this approach to them will likely be considered in the future.


The intent is to roll this experiment directly into a full launch if no ill effects are seen. See the motivation section for more.


Blink component

Blink>Network


TAG review

https://github.com/w3ctag/design-reviews/issues/1151


Motivation

Having a fixed pool of TCP sockets shared by an entire profile allows attackers to effectively divine the number of network requests made by other tabs/frames, and to learn enough about them that any given site can be profiled. For example, if a site makes X network requests when logged in and Y when logged out, an attacker can saturate the TCP socket pool, call window.open, and watch pool movement to glean the other site’s login state. This sort of attack is outlined in more detail here: https://xsleaks.dev/docs/attacks/timing-attacks/connection-pool/


To address this sort of attack, we randomize the point at which a socket pool is considered full and the point at which it is subsequently considered sufficiently drained to allow new allocations. We don’t want sites to be able to detect the point at which the pool becomes full without triggering a drain, and vice versa. Probabilistically allocating sockets as the pool approaches the chosen upper bound, and delaying new allocations until the pool drains past a randomized lower bound, ensures this; it even makes it difficult for a site to walk all the way up to a known maximum socket count on the assumption that the final socket use will cause a drain.
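To illustrate why this blinds the attacker, here is a small standalone simulation (hypothetical, assuming a simple linear-ramp transition probability, which is not necessarily the shipped distribution) of an attacker opening sockets until the pool stalls. The observed stall point differs from run to run, so no fixed maximum can be learned:

```python
import random

def observed_stall_point(lower=256, upper=320, rng=random):
    """Walk up socket allocations until the pool first refuses one.

    Models an attacker probing for the cap. ASSUMPTION: the linear-ramp
    transition probability is illustrative only, not the actual rule.
    """
    active = 0
    while True:
        if active > lower:
            frac = (active - lower) / (upper - lower)
            if rng.random() < frac:   # pool flips to 'capped' here
                return active
        active += 1                   # allocation succeeded

rng = random.Random(7)
samples = [observed_stall_point(rng=rng) for _ in range(50)]
```

Every sampled stall point lands somewhere in (256, 320], but the exact value varies across runs, which is what denies the attacker a stable measurement baseline.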


Risks


Interoperability and Compatibility

While other user agents may wish to follow the results, we only anticipate compatibility issues with local machines or remote servers when the number of available TCP sockets in the browser fluctuates upward (256 -> 320/384/512) in a way Chrome did not allow before. This will be monitored carefully, and any experiment arm yielding a significant negative impact on browsing experience will be terminated early.


Gecko: https://github.com/mozilla/standards-positions/issues/1299; current global cap of 128-900 (as allowed by OS)


WebKit: https://github.com/WebKit/standards-positions/issues/550; current global cap of 256


Debuggability

This will be gated behind the base::Feature kTcpSocketPoolLimitRandomizationTrial, so if breakage is suspected the flag can be turned off to confirm the impact. For how to control feature flags, see this.


Measurement

The existing SOCKET_POOL_STALLED_MAX_SOCKETS event can be tracked to detect any uptick in pool stalls.

The existing metric Net.TcpConnectAttempt.Latency.{Result} will be used to detect increases in overall connection failure rates.

New metrics Net.TCPSocketPoolSize.{UpperBound|LowerBound}.{Skipped|Enforced} will track usage of the NEXT_POOL_STATE function.


Will this feature be supported on all six Blink platforms (Windows, Mac, Linux, ChromeOS, Android, and Android WebView)?

No, not WebView. That will have to be studied independently due to the differing constraints.


Is this feature fully tested by web-platform-tests?

No. As this is a Blink networking-focused change, browser tests or unit tests are more appropriate.


Flag name on about://flags

None


Finch feature name

TcpSocketPoolLimitRandomizationTrial


Rollout plan

We will never test on more than 5% of stable users in each group, and will stay on canary/dev/beta for a while to detect issues before experimenting on stable.


Requires code in //chrome?

No


Tracking bug

https://crbug.com/415691664


Estimated milestones

143


Link to entry on the Chrome Platform Status

https://chromestatus.com/feature/6496757559197696


Ari Chivukula

10:25 AM
to blink-dev, David Schinazi, Andrew Williams, a...@google.com, mmenke, Mike Taylor, Ladan Khamnian
Please ignore the [DRAFT] in the subject line, this is a real request and not a draft of one :-P

~ Ari Chivukula (Their/There/They're)

Rick Byers

10:30 AM
to Ari Chivukula, blink-dev, David Schinazi, Andrew Williams, a...@google.com, mmenke, Mike Taylor, Ladan Khamnian
The compat risk of this seems minor and manageable to me. The exact use of socket pools is really an implementation detail that leaks to sites, right? LGTM
