Hi Ricardo,
Welcome to the project! Others may be able to answer some of these points more specifically, but I'll try to respond to your questions:
0. For rapid-fire testing of your implementation, my personal advice is to run your load generator directly against it. Running the full benchmark suite to test each tweak you make to your implementation may be a bit too cumbersome for development. Generally speaking, if a change improves performance in spot testing with wrk (or a similar tool), it will likely improve the final results in a real benchmark run as well; very few changes make for illusory gains in spot testing that vanish in a real run.
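For example, a quick spot test with wrk might look like this (just a sketch; the port, path, thread count, and connection count are assumptions you'd adjust to your setup):

  wrk -t 4 -c 256 -d 15s http://localhost:8080/plaintext

wrk reports requests per second at the end of the run, which is usually enough signal to tell whether a tweak helped or hurt.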
1. In that particular case, that means that Vert.x generated 656,119 plaintext HTTP responses per second on our (old) i7 physical hardware test environment. Note that the i7 hardware environment was retired after Round 8. It may help to see the raw output from wrk for that particular test in Round 8:
https://github.com/TechEmpower/TFB-Round-8/blob/master/i7/round-8-final/20131214101855/plaintext/vertx/raw

In that raw data, you can see that Vert.x performed best at 256 concurrency, producing 9,841,798 responses in 15 seconds: 9,841,798 / 15 ~= 656,119.
2. No. In our benchmark runs, load is generated from a separate computer or virtual machine and sent over the network. In the Peak environment, the network is 10 gigabit Ethernet and in the EC2 environment, it's 1 gigabit Ethernet.
3. In the plaintext test, not only is the connection kept alive, but we also allow HTTP pipelining. Pipelining lets the client send many requests on a single connection without waiting for each response, so a huge number of requests can be processed without much lock-step coordination between the client and server.
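To illustrate (this is only a sketch of the wire format, with a hypothetical host name and a pipeline depth of three), a pipelining client writes several requests back-to-back on one connection and only then reads the responses:

  GET /plaintext HTTP/1.1
  Host: server.example

  GET /plaintext HTTP/1.1
  Host: server.example

  GET /plaintext HTTP/1.1
  Host: server.example

The server answers them in order on the same connection, so most of the per-request round-trip wait disappears.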
You can read the requirements for the plaintext test and the others here:

https://www.techempower.com/benchmarks/#section=code

Let me know if you have any more questions!