OK great - so the 4.0 version is moving to a version of .NET that supports async/await. If I did fork this, it sounds like it would be helpful to approach it in a way that's useful to the 4.0 version.
Moving to async doesn't require duplicating anything; it should all just be async, since what it's doing is fundamentally asynchronous. In performance testing, the worst case - adding async to something that MIGHT return as fast as running it synchronously - does essentially no harm to performance, because of how well .NET optimizes await under the hood. In the best case, async calls that might be gone for seconds at a time avoid thread exhaustion by giving the thread back to the thread pool while they wait. So, by moving every call that makes a network call to async - which is nearly everything in Selenium - you avoid API duplication, avoid harming performance, and for many common operations (like loading a webpage), eliminate a ton of thread stranding.
The argument that you could just buy more servers is not a very good one - it's basically saying the code can be inefficient as long as you spend more money. Yes, that's true, but I have to say I'm opposed to making something slow just because you can afford it. Switching to async is pretty simple because of how much of the plumbing is already done for you in .NET. So, if it can be dramatically more efficient... it just should be. The upshot is that users of Selenium could run thousands of tests instead of hundreds on a small server, without having to buy more hardware to accommodate more threads.
As one example, one of the code paths we're using Selenium for is PDF creation. Chrome is a great place to test a webpage and see exactly how it will render before emitting it as a PDF. So we develop to our exact specifications in HTML for Chrome, then we spin up Selenium to emit relatively long PDF documents from it. These PDFs have a lot of data, so Selenium can spend a long time waiting while other code fetches that data and renders the page. In fact, during that wait, the code that's fetching and rendering the data can't use as many threads to do its work, because Selenium is locking one up. It's a server, so many of these can happen at a time. Each instance of Selenium running synchronously is one more thread in the thread pool lost, doing nothing.
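To make the thread stranding concrete, here's a rough sketch that doesn't touch Selenium at all. `LoadPageBlocking` and `LoadPageAsync` are hypothetical stand-ins I made up for a slow driver call (they are not Selenium APIs); one parks a thread pool thread with `Thread.Sleep`, the other yields it with `Task.Delay`. The operation count and wait time are arbitrary.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadStrandingSketch
{
    // Hypothetical stand-in for a synchronous driver call (e.g. a slow page load).
    // The calling thread is held for the entire wait and does no useful work.
    static void LoadPageBlocking(int waitMs) => Thread.Sleep(waitMs);

    // Hypothetical async equivalent: no thread is held while the delay is pending.
    static Task LoadPageAsync(int waitMs) => Task.Delay(waitMs);

    public static async Task Main()
    {
        const int operations = 200; // arbitrary number of concurrent "page loads"
        const int waitMs = 500;     // arbitrary simulated wait per load

        // Async version: all waits overlap without occupying pool threads.
        var stopwatch = Stopwatch.StartNew();
        await Task.WhenAll(
            Enumerable.Range(0, operations).Select(_ => LoadPageAsync(waitMs)));
        Console.WriteLine($"async waits:    {stopwatch.ElapsedMilliseconds} ms");

        // Blocking version: each wait pins one pool thread, so the work queues
        // up behind however many threads the pool is willing to create.
        stopwatch.Restart();
        await Task.WhenAll(
            Enumerable.Range(0, operations)
                      .Select(_ => Task.Run(() => LoadPageBlocking(waitMs))));
        Console.WriteLine($"blocking waits: {stopwatch.ElapsedMilliseconds} ms");
    }
}
```

The exact numbers depend on the machine and the pool's settings, so I'm not quoting any, but the blocking run should typically take noticeably longer, because the pool has to grow (slowly) to cover all the parked threads. On a server, those are the same threads the data-fetching and rendering code is competing for.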