V8 is about as fast as it can possibly be once code is running hot, which is great for long-running node servers. In fact, in my testing, JavaScript in node is generally slightly faster than Lua in LuaJIT.
V8, however, is not optimized for process startup. It uses a lot of RAM and does a lot of up-front optimizing to get that runtime speed. All software has to make tradeoffs, and my experience with V8 and the V8 team is that they will sacrifice startup time of the VM itself and memory usage if it makes the scripts run significantly faster.
Yes, some of the blame lies with node as well. Node assumes that most of its uses are servers that start up rarely and run for a long time, so a hundred ms of extra startup time on your server doesn't matter at all.
I'm not convinced that combining all your js files into a single file will significantly affect process startup time. I'm happy to be proven wrong, but in my experience this is not the bottleneck.
One of my responsibilities when I worked on webOS was to make node startup fast enough on older smartphones that we could start node processes on demand in reaction to user input. We tried creating a fork-server, using snapshots, and composing and minifying JS code. But in the end, V8 and node were just too heavy: waiting on an on-demand node process introduced significant lag and hurt the user experience. We would have just kept the node services running all the time, but then they used too much RAM. 10 MB each was simply too much for a smartphone.
My solution was to use another VM, and it solved all my problems, but then it wasn't JavaScript anymore, and not many people were interested. (Also, the webOS project was killed by the then CEO.)
If NPK can somehow remove the need for a JavaScript engine, then it will be fast. But if you end up needing all of V8 and JS code to run, it will never help memory usage or startup time, no matter how much you combine everything into a single file. Just starting a node REPL on these devices can take over 1000 ms.