Gab,
Thanks for bringing this up! And for CC'ing net-dev@, to make sure this information gets out more widely (or in the event I'm completely wrong :D)
As we discussed previously, the concern is about a specific sequence of calls, where:
1) //net calls into 3P library (blocking API from 3P library)
2) 3P library calls back into //net via the 3P library's extension hooks (and the 3P library requires a synchronous answer)
3) //net needs to asynchronously answer (e.g. prompt for UI)
The concern stems from the fact that the 3P library may grab locks during the call in #1, thus causing any other calls into the 3P library to block until #2/#3 are completed for the original caller. If the async hop in #3 (e.g. to prompt the user) ends up calling back into #1 (which should never happen), we potentially deadlock. If the async hop in #3 ends up starved, because all available workers are also stuck in #1, then we starve, which is effectively the same deadlock.
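To make the shape concrete, here's a minimal sketch of that sequence; every name in it is invented purely for illustration (it's not //net, NSS, or any real API):

```cpp
// Illustrative only - every name below is made up to show the call shape.
namespace third_party_lib {
// Blocking call that may take the library's internal locks and that invokes
// the supplied hook synchronously before returning.
void BlockingOperation(int (*hook)());
}  // namespace third_party_lib

int WaitForAnswerFromSomeOtherThread();  // blocks the calling thread

// #2: the library calls this hook synchronously, and won't return from
// BlockingOperation() until it gets an answer.
int OnLibraryNeedsAnswer() {
  // #3: //net can only produce the answer asynchronously (e.g. by prompting
  // the user), so this thread blocks here waiting on some other thread.
  // If that other work re-enters #1, we deadlock on the library's locks; if
  // every worker that could run it is itself stuck in #1, we starve, which is
  // effectively the same deadlock.
  return WaitForAnswerFromSomeOtherThread();
}

// #1: //net calls into the 3P library via a blocking API.
void CallIntoThirdPartyLibrary() {
  third_party_lib::BlockingOperation(&OnLibraryNeedsAnswer);
}
```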
So far, the approach in //net has been to ensure that any #1-style calls end up on a dedicated WorkerPool, as that allows unbounded thread growth for #1, and thus hopefully reduces the risk of anything blocking #3. We've already made sure that the UI prompts never end up blocking on #1 again, so it's really just a question of allocation strategy.
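In code, that's basically just posting the #1-style work via base::WorkerPool with task_is_slow set - roughly like the sketch below (not an actual //net call site):

```cpp
#include "base/callback.h"
#include "base/location.h"
#include "base/threading/worker_pool.h"

// Sketch of the current pattern: anything that might hold NSS locks or end up
// waiting on the UI goes to the WorkerPool, which spins up additional threads
// as needed, so a stuck #1 task can't prevent other tasks from getting a
// thread (and thus can't starve the #3 answer).
void PostBlockingNSSTask(const base::Closure& nss_task) {
  base::WorkerPool::PostTask(FROM_HERE, nss_task, true /* task_is_slow */);
}
```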
For NSS, the concrete concern is calls that need to transit the PKCS11PasswordFunc, which is implemented using the CryptoModuleBlockingPasswordDelegate, and which needs to post to (and wait for a response from) the UI thread. Further, since the UI thread may in turn depend on any number of other threads (such as the DBus thread, or the IO thread for IPC'ing to the GPU process), we basically need to ensure that every other Chromium thread stays responsive - that is, we need to make sure any such NSS call is treated as a #1 task.
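For reference, the blocking delegate boils down to the familiar "post to the UI thread and wait on an event" shape - something like this simplified sketch, with a hypothetical helper standing in for the real UI-side plumbing (the actual code lives in crypto/ and chrome/):

```cpp
#include <string>

#include "base/bind.h"
#include "base/location.h"
#include "base/synchronization/waitable_event.h"
#include "content/public/browser/browser_thread.h"

// Hypothetical helper that shows the dialog and signals |done| once the user
// responds; it stands in for the real UI-side code.
void ShowPasswordDialogOnUIThread(std::string* password,
                                  base::WaitableEvent* done);

// Rough shape of why #2/#3 blocks: the PKCS#11 password callback is invoked
// synchronously on whatever thread made the NSS call, but the prompt has to
// happen on the UI thread, so the calling thread posts over and then waits.
std::string BlockingRequestPassword() {
  std::string password;
  base::WaitableEvent done(base::WaitableEvent::ResetPolicy::AUTOMATIC,
                           base::WaitableEvent::InitialState::NOT_SIGNALED);
  content::BrowserThread::PostTask(
      content::BrowserThread::UI, FROM_HERE,
      base::Bind(&ShowPasswordDialogOnUIThread, &password, &done));
  done.Wait();  // The NSS call - and any locks it holds - is parked here.
  return password;
}
```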
If the TaskScheduler had the ability to indicate "These are tasks which must not block other pooled tasks" - that is, that the TaskScheduler ensures they will run on one or more dedicated threads which will not block other TaskScheduler'd tasks - then I think we might be able to safely move things to the TaskScheduler, because we'd be getting the same WorkerPool semantics that we rely on.
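To illustrate what I mean (and only to illustrate - none of these names exist today), something along these lines would give us what we need:

```cpp
#include "base/callback.h"
#include "base/location.h"
#include "base/memory/ref_counted.h"
#include "base/task_runner.h"

// Hypothetical - this factory function is not a real TaskScheduler API. It
// stands in for "give me a task runner whose tasks may block indefinitely
// without ever delaying any other scheduled task", which is the WorkerPool
// semantic //net relies on for #1-style calls.
scoped_refptr<base::TaskRunner> CreateTaskRunnerThatCannotBlockOtherTasks();

void PostNSSCallViaTaskScheduler(const base::Closure& nss_call) {
  CreateTaskRunnerThatCannotBlockOtherTasks()->PostTask(FROM_HERE, nss_call);
}
```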
Having said that, and having had the opportunity to sit down with you and talk through these issues (thanks!) to better figure out exactly where the uncertainty and unease were, I'm now thinking the following three call sequences may be 'unsafe', because they represent tasks in the #1 category:
Basically, these are all potential entry points into the sequence I described (notably, they're focused on NSS files, unsurprisingly :D)
Note that while I've focused on NSS, the same concerns would exist on macOS/Windows. However, there, our integration with the OS/third-party APIs is much more limited, and thus the only chances (that I'm aware of) of ending up in the scenario I mentioned are contained in the ThreadedSSLPrivateKey, and are thus safe.
I think you and I have covered all the other places where NSS is integrated - most notably, certificate verification - and we've made sure those use a WorkerPool (and thus continue to scale for #1 tasks in a way that doesn't block the #2/#3 sequence).