You run the risk of deadlocks with this modification. If your thread pool contains n threads and you queue up n+1 futures, where each one derefs the future queued after it, the first n futures occupy every thread while they block, so future n+1 never starts and the entire thread pool deadlocks. So you have to be careful about the order in which futures are allocated.
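To make the failure concrete, here is a rough sketch in plain Java rather than Clojure, since both sit on the same java.util.concurrent pools. The class and method names (ChainedFutureDeadlock, deadlocks) are made up for illustration, and the CompletableFuture placeholders only stand in for the future handles so that a task can wait on one submitted after it. Each task blocks on the next task in the queue, and a timeout is used to detect the stall.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ChainedFutureDeadlock {
    // Hypothetical helper: submits `tasks` jobs to a fixed pool of `threads`,
    // where job i blocks on the result of job i + 1. Returns true if job 0
    // has not finished within the timeout, i.e. the chain is stuck.
    static boolean deadlocks(int threads, int tasks, long timeoutMillis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        // Create the result handles up front so each job can see the next one.
        List<CompletableFuture<Integer>> results = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            results.add(new CompletableFuture<>());
        }
        try {
            for (int i = 0; i < tasks; i++) {
                final int index = i;
                pool.execute(() -> {
                    try {
                        if (index + 1 < tasks) {
                            results.get(index + 1).get(); // "deref" the job queued after this one
                        }
                        results.get(index).complete(index);
                    } catch (Exception e) {
                        // Reached when shutdownNow() interrupts a blocked job.
                        results.get(index).completeExceptionally(e);
                    }
                });
            }
            results.get(0).get(timeoutMillis, TimeUnit.MILLISECONDS);
            return false; // the whole chain completed
        } catch (TimeoutException e) {
            return true; // every thread is blocked and the last job never started
        } finally {
            pool.shutdownNow(); // interrupt blocked workers so the JVM can exit
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("2 threads, 3 tasks stuck: " + deadlocks(2, 3, 500));
        System.out.println("3 threads, 3 tasks stuck: " + deadlocks(3, 3, 500));
    }
}
```

With 2 threads and 3 tasks the first two jobs hold both threads and the third stays in the queue, so the call should report a stall. Giving the pool a third thread lets the last job run, and the chain unwinds from the back.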
Something I've thought about in the past is a sort of elastic ThreadPool. Here threads would only be allocated if no future has terminated in X amount of time. There would also need to be a watchdog thread that queries the backlog of futures every X amount of time and spins up threads for those items.
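A minimal Java sketch of that idea, under my own assumptions: ElasticPool, its fields, and the check interval are hypothetical names, and "spin up a thread" is approximated by raising the core size of a standard ThreadPoolExecutor by one whenever a check finds a backlog and no task completed since the previous check. It is not a tuned implementation, just the shape of the thing.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ElasticPool {
    final ThreadPoolExecutor workers;
    private final ScheduledExecutorService watchdog =
            Executors.newSingleThreadScheduledExecutor();
    // Only touched from the single watchdog thread, so no locking is needed.
    private long lastCompleted = 0;

    ElasticPool(int initialThreads, int maxThreads, long checkMillis) {
        // With an unbounded queue the pool stays at its core size on its own,
        // so any growth comes from the watchdog below.
        workers = new ThreadPoolExecutor(initialThreads, maxThreads,
                60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        watchdog.scheduleAtFixedRate(this::check,
                checkMillis, checkMillis, TimeUnit.MILLISECONDS);
    }

    // Runs every checkMillis: if nothing finished since the last check and
    // work is still queued, add one thread (up to the maximum).
    private void check() {
        long completed = workers.getCompletedTaskCount();
        boolean stalled = completed == lastCompleted;
        lastCompleted = completed;
        boolean backlog = !workers.getQueue().isEmpty();
        int size = workers.getCorePoolSize();
        if (stalled && backlog && size < workers.getMaximumPoolSize()) {
            // Raising the core size starts a new worker for queued tasks.
            workers.setCorePoolSize(size + 1);
        }
    }

    void shutdown() {
        watchdog.shutdownNow();
        workers.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        ElasticPool pool = new ElasticPool(1, 4, 100);
        CompletableFuture<String> inner = new CompletableFuture<>();
        // The outer task takes the only thread and blocks on a task queued behind it.
        Future<String> outer = pool.workers.submit(() -> "outer saw " + inner.get());
        pool.workers.execute(() -> inner.complete("inner"));
        System.out.println(outer.get(5, TimeUnit.SECONDS));
        pool.shutdown();
    }
}
```

On a plain single-thread pool the example in main would hang; here the watchdog should notice the stall at its first check and add a second thread so the queued task can run. Two caveats with this sketch: it cannot tell a deadlock from a batch of slow tasks, and it only ever grows the pool.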
Timothy