Splitting hairs a little, the EventLoop is actually an EventLoopGroup, which is a collection of threads dedicated to handling network traffic. Since networks are usually very fast (faster than the CPU can fill them), it's okay to have just a few threads; the theoretical ideal is the same as the number of cores.
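To make the "collection of threads" idea concrete, here is a minimal JDK-only sketch. `MiniEventLoopGroup` is a hypothetical illustration, not Netty's `EventLoopGroup` API: it creates one single-threaded executor per loop, defaults to one loop per core, and hands loops out round-robin so that many connections share a few threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified illustration of an event loop group (NOT Netty's API).
public class MiniEventLoopGroup implements AutoCloseable {
    // Each "event loop" is one thread that runs its tasks in submission order.
    private final List<ExecutorService> loops = new ArrayList<>();
    private final AtomicInteger nextIndex = new AtomicInteger();

    public MiniEventLoopGroup(int nThreads) {
        for (int i = 0; i < nThreads; i++) {
            loops.add(Executors.newSingleThreadExecutor());
        }
    }

    // Default: one loop per core.
    public MiniEventLoopGroup() {
        this(Runtime.getRuntime().availableProcessors());
    }

    public int size() {
        return loops.size();
    }

    // Hands out loops round-robin; a connection would stay on the loop it is given.
    public ExecutorService next() {
        return loops.get(Math.floorMod(nextIndex.getAndIncrement(), loops.size()));
    }

    @Override
    public void close() {
        loops.forEach(ExecutorService::shutdown);
        try {
            for (ExecutorService loop : loops) {
                loop.awaitTermination(5, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        try (MiniEventLoopGroup group = new MiniEventLoopGroup(2)) {
            // Ten "connections" are served by only two threads.
            for (int conn = 0; conn < 10; conn++) {
                int id = conn;
                group.next().execute(() -> System.out.println(
                        "connection " + id + " handled on " + Thread.currentThread().getName()));
            }
        }
    }
}
```

Pinning each connection to a single loop means its events are handled in order on one thread, which is why a handful of threads can serve many connections without locking per connection.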
Some processing does happen on the event loop, such as SSL/TLS encryption, but it is usually not a bottleneck. In the Async Client, each RPC runs in a closed loop: as soon as one completes, the next one starts. This bounds the number of active RPCs at any time, so the event loops are not overloaded.
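A closed loop like the one described can be sketched roughly as follows. This is an illustration under assumptions, not the Async Client's actual code: `fakeRpc` is a hypothetical stand-in for an async stub call that simply completes after about 1 ms, and `ClosedLoopClient`, `run` and `startNext` are made-up names. Each completion callback starts the next RPC, so no more than `outstanding` calls are ever in flight.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Closed-loop load sketch: each completed RPC immediately starts the next one,
// so at most `outstanding` RPCs are ever in flight.
public class ClosedLoopClient {
    final AtomicInteger inFlight = new AtomicInteger();
    final AtomicInteger maxInFlight = new AtomicInteger();
    final AtomicInteger completed = new AtomicInteger();

    // Hypothetical stand-in for an async stub call: completes after about 1 ms.
    static CompletableFuture<Void> fakeRpc(ScheduledExecutorService timer) {
        CompletableFuture<Void> result = new CompletableFuture<>();
        timer.schedule(() -> result.complete(null), 1, TimeUnit.MILLISECONDS);
        return result;
    }

    // Sends `totalRpcs` calls using `outstanding` loops; blocks until all finish.
    void run(int totalRpcs, int outstanding) throws InterruptedException {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger notStarted = new AtomicInteger(totalRpcs);
        CountDownLatch allDone = new CountDownLatch(totalRpcs);
        for (int i = 0; i < outstanding; i++) {
            startNext(timer, notStarted, allDone); // start each loop with one RPC
        }
        allDone.await();
        timer.shutdown();
    }

    private void startNext(ScheduledExecutorService timer, AtomicInteger notStarted,
                           CountDownLatch allDone) {
        if (notStarted.getAndDecrement() <= 0) {
            return; // nothing left to send, so this loop stops
        }
        maxInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
        fakeRpc(timer).whenComplete((ignored, error) -> {
            inFlight.decrementAndGet();
            completed.incrementAndGet();
            allDone.countDown();
            startNext(timer, notStarted, allDone); // the closed loop: completion starts the next call
        });
    }

    public static void main(String[] args) throws InterruptedException {
        ClosedLoopClient client = new ClosedLoopClient();
        client.run(1_000, 10);
        // completed is 1000 and maxInFlight never exceeds 10
        System.out.println("completed=" + client.completed.get()
                + " maxInFlight=" + client.maxInFlight.get());
    }
}
```

The cap comes from the structure rather than from a semaphore: there are exactly `outstanding` chains, and each chain has at most one call in flight at a time.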
ForkJoinPool is used because its queues are sharded, so threads are not all contending on one queue. ForkJoinPool is highly parallel and implements work stealing to keep all threads busy. Compare this to ThreadPoolExecutor, which has a single blocking queue: if you have 32 threads all fighting over that queue, it slows down a lot. (The QPS tripled when I switched us over to it!)
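Below is a rough, JDK-only sketch of the kind of workload where the difference shows up. It is not the benchmark behind the QPS number above, and a single unwarmed run like this is noisy (a harness such as JMH would be more trustworthy). `ExecutorComparison`, `Hop` and `runChains` are hypothetical names, and the 1,000 × 1,000 task counts are arbitrary. Each task schedules its follow-up from inside the pool, somewhat like an RPC callback scheduling more work. With `ThreadPoolExecutor` every such submission goes through one shared `LinkedBlockingQueue`; with `ForkJoinPool` submissions are spread over multiple queues (in recent JDKs a worker's own submissions go to its local deque) and idle workers steal from busy ones.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class ExecutorComparison {
    // One step of a chain; each step submits the next step from inside the pool.
    static final class Hop implements Runnable {
        private final ExecutorService pool;
        private final int remaining;
        private final CountDownLatch done;
        private final LongAdder counter;

        Hop(ExecutorService pool, int remaining, CountDownLatch done, LongAdder counter) {
            this.pool = pool;
            this.remaining = remaining;
            this.done = done;
            this.counter = counter;
        }

        @Override
        public void run() {
            counter.increment();
            if (remaining > 1) {
                pool.execute(new Hop(pool, remaining - 1, done, counter));
            } else {
                done.countDown(); // this chain is finished
            }
        }
    }

    // Runs `chains` independent chains of `hops` tasks each; returns elapsed nanoseconds.
    static long runChains(ExecutorService pool, int chains, int hops, LongAdder counter)
            throws InterruptedException {
        CountDownLatch done = new CountDownLatch(chains);
        long start = System.nanoTime();
        for (int c = 0; c < chains; c++) {
            pool.execute(new Hop(pool, hops, done, counter));
        }
        done.await();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = 32;
        // asyncMode = true: FIFO scheduling for tasks that are never joined (event-style work).
        ExecutorService forkJoin = new ForkJoinPool(
                threads, ForkJoinPool.defaultForkJoinWorkerThreadFactory, null, true);
        // All 32 threads share a single LinkedBlockingQueue here.
        ExecutorService singleQueue = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());

        for (ExecutorService pool : new ExecutorService[] {forkJoin, singleQueue}) {
            LongAdder counter = new LongAdder();
            long nanos = runChains(pool, 1_000, 1_000, counter);
            System.out.printf("%s: %d tasks in %d ms%n",
                    pool.getClass().getSimpleName(), counter.sum(), nanos / 1_000_000);
            pool.shutdown();
        }
    }
}
```

In grpc-java, an executor like the `ForkJoinPool` above can be supplied through `ServerBuilder.executor(...)` or `ManagedChannelBuilder.executor(...)`. Timings depend heavily on the machine and JDK version, so this sketch should not be read as predicting any particular speedup.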
Carl