Hi,
Resin is indeed a well-tuned Java Servlet container. We selected it for these tests because (a) we've used it extensively elsewhere and (b) in Round 1 we initially ran the Servlet tests on Tomcat as well, but dropped Tomcat because it was consistently somewhat slower than Resin, and we didn't want the additional noise in the Round 1 results charts.
Since then, however, and especially now that we have a filtering user interface for the results, we have been considering re-introducing Tomcat (or perhaps WildFly, once it is released) as an additional test dimension for Servlet-based frameworks.
That said, the differential you are seeing between Play on Resin and Play on Netty is considerably more dramatic than the Tomcat-versus-Resin differential in our Servlet tests, which, if I recall correctly, was in the 10% to 15% range.
Also, if possible, I encourage you to use a multi-threaded load tool rather than ab (ApacheBench). We use Wrk, but there is also WeigHTTP, which is a good multi-threaded clone of ApacheBench. I believe WeigHTTP is available on Windows.
A multi-threaded tool is unlikely to invert the results you're seeing, but it will generally give a more realistic measurement of the server's true top-end capacity. ApacheBench runs on a single thread and can fail to sufficiently exercise a high-performance server.
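To illustrate what "multi-threaded" buys you, here is a minimal sketch of a load generator in Java. It is not how Wrk or WeigHTTP are implemented (Wrk, for instance, pairs its threads with event loops); it simply runs several worker threads, each issuing blocking HTTP requests back to back, so up to one request per thread is in flight at any moment. The class and method names (MiniLoad, run), the default URL http://localhost:8080/json, and the default thread and request counts are hypothetical. It assumes Java 16 or later (java.net.http.HttpClient and records).

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical minimal load generator: N worker threads, each sending requests back to back.
public class MiniLoad {

    // Outcome of one run: successful (HTTP 200) responses, failures, and wall-clock seconds.
    public record Result(long ok, long failed, double seconds) {
        public double requestsPerSecond() {
            return seconds > 0 ? ok / seconds : 0.0;
        }
    }

    public static Result run(String url, int threads, int requestsPerThread) throws Exception {
        // One shared client speaking HTTP/1.1; the JDK client pools and reuses connections.
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        LongAdder ok = new LongAdder();     // low-contention counters shared by all workers
        LongAdder failed = new LongAdder();

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> workers = new ArrayList<>();
        long start = System.nanoTime();
        try {
            for (int t = 0; t < threads; t++) {
                workers.add(pool.submit(() -> {
                    // Each worker blocks on its own request, so concurrency equals thread count.
                    for (int i = 0; i < requestsPerThread; i++) {
                        try {
                            HttpResponse<Void> resp =
                                    client.send(request, HttpResponse.BodyHandlers.discarding());
                            if (resp.statusCode() == 200) {
                                ok.increment();
                            } else {
                                failed.increment();
                            }
                        } catch (IOException e) {
                            failed.increment();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                            return; // stop this worker if interrupted
                        }
                    }
                }));
            }
            for (Future<?> w : workers) {
                w.get(); // wait until every worker has finished its quota
            }
        } finally {
            pool.shutdown();
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return new Result(ok.sum(), failed.sum(), seconds);
    }

    public static void main(String[] args) throws Exception {
        String url = args.length > 0 ? args[0] : "http://localhost:8080/json";
        int threads = args.length > 1 ? Integer.parseInt(args[1]) : 8;
        int perThread = args.length > 2 ? Integer.parseInt(args[2]) : 1000;
        Result r = run(url, threads, perThread);
        System.out.printf("ok=%d failed=%d rps=%.1f%n", r.ok(), r.failed(), r.requestsPerSecond());
    }
}
```

For example, java MiniLoad.java http://localhost:8080/json 8 1000 would run 8 threads of 1,000 requests each. Warm-up, latency percentiles, and pipelining are deliberately left out; a real tool such as Wrk handles far more connections per thread than this blocking design can.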
I assume the Resin-versus-Netty differential narrows if you switch to one of the more computationally intensive tests, such as the multiple-query test or the new Fortunes test (queries, collections, sorting, and server-side templates)? Are you able to run those? If not, let me know how we can help.
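As a rough illustration of why Fortunes is heavier per request than a plain JSON response, the Java sketch below walks through the steps in the parenthetical above: it reads all rows (here from an in-memory list standing in for the database query), adds one extra fortune at request time as I understand the test's requirements, sorts by message text, and renders an HTML table with escaping. The names FortunesSketch, queryAll, escapeHtml, and render are hypothetical, the StringBuilder code stands in for a real template engine, and the markup is simplified rather than the benchmark's exact required output. Assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the per-request work in a Fortunes-style test.
public class FortunesSketch {

    public record Fortune(int id, String message) {}

    // Stand-in for the database query; returns a mutable copy of every row.
    static List<Fortune> queryAll(List<Fortune> table) {
        return new ArrayList<>(table);
    }

    // Minimal HTML escaping, standing in for what a template engine would do.
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '<' -> sb.append("&lt;");
                case '>' -> sb.append("&gt;");
                case '&' -> sb.append("&amp;");
                case '"' -> sb.append("&quot;");
                case '\'' -> sb.append("&#39;");
                default -> sb.append(c);
            }
        }
        return sb.toString();
    }

    public static String render(List<Fortune> table) {
        List<Fortune> fortunes = queryAll(table);                                   // query
        fortunes.add(new Fortune(0, "Additional fortune added at request time.")); // collection
        fortunes.sort(Comparator.comparing(Fortune::message));                    // sorting
        StringBuilder html = new StringBuilder("<table><tr><th>id</th><th>message</th></tr>");
        for (Fortune f : fortunes) {                                                // templating
            html.append("<tr><td>").append(f.id()).append("</td><td>")
                .append(escapeHtml(f.message())).append("</td></tr>");
        }
        return html.append("</table>").toString();
    }
}
```

Each of these steps (copying rows into a collection, sorting strings, escaping and concatenating output) costs CPU in the application itself, which is why differences in the underlying HTTP layer tend to matter proportionally less here than in the simplest tests.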