Long story short: I would suggest adding the following to the startup of the NodeJS server to improve its performance:
process.env.UV_THREADPOOL_SIZE = 20; // or some higher number, tuned for your workload
Having looked at synchronous threaded architectures vs asynchronous event-driven architectures in detail, the causes of performance issues are not always obvious at surface level. In developing OfficeFloor, I use a combination of both to improve performance. I will avoid going into depth here, as it is a very detailed topic. But in basic, superficial summary: asynchronous is not always faster, because of the thread context-switching overheads incurred when event handling uses multiple threads to better utilise multi-core architectures (yes, excuse the mouthful of words).
However, I was intrigued that developers have left the ease of sequential threaded code and swung to the other side, the event-driven coding of NodeJS, for the scaling benefits an asynchronous event-based architecture provides. Like yourself, I also wondered why NodeJS does not show this performance improvement over thread-per-request architectures (even granting it is interpreted). So I did a little background reading on what goes on under the hood of NodeJS, and it seems the "single threaded event loop" is an abstraction: NodeJS does actually use multiple threads under the hood (mainly to do I/O). I won't go into detail here, but below are links to two articles explaining this (they come with pictures and code examples, so explain it better than I can in words :p ):
Going on the architectures explained in these articles, NodeJS still suffers the same performance problem as thread-per-request architectures: a thread listens on I/O and then has to context switch to a worker thread to handle the request. It actually seems even worse, in that I/O with the database requires further thread context switching - possibly explaining why it is not higher up in the results.
Note the real problem I see in squeezing out performance is the management of executing the code. For the Plain Text and JSON tests there is no I/O involved, so you want the thread listening on the socket to also process the request (as the cost of servicing the request, being a few CPU instructions, is less than the overhead of a thread context switch). Rapidoid is an example of doing this.

However, when it comes to the tests involving database interaction (the other tests), the socket thread cannot be tied up waiting for the database call to complete. In this case the context switch costs less than servicing the request inline, so using multiple threads improves throughput. But too many threads creates significant overheads, and therefore tuning is required - hence my suggested fix above for NodeJS, to find the appropriate I/O thread pool size for optimal performance.

I'm no NodeJS expert, though, and will always defer to them (and put a big disclaimer on this in case I have the NodeJS architecture wrong or outdated). But I happily help good competition between frameworks to ensure we continue to see improvements and advancements in application/web server architectures :)