In the 5.x days I played with Racket's Web servlets and found them slower than I'd expected. (My expectations were, admittedly, quite high after seeing how much better Racket performed at other tasks than your average scripting language.) I've decided to try Web servlets out again, but this time to put some rough numbers on the performance with a reproducible benchmark.
My benchmark compares Racket's stateful and stateless servlets against the SCGI package for Racket, Caddy (HTTP server written in Go), Flask (Python web microframework), GNU Guile's Web server module, Ring/Compojure (Clojure HTTP middleware/routing library), Plug (Elixir HTTP middleware), and Sinatra (Ruby web microframework). On each of these platforms the benchmark implements a trivial web application that serves around 4K of plain text. It uses ApacheBench to stress it with a configurable number of concurrent connections. The application and ApacheBench are run in separate Docker containers, which lets you tune the memory and the CPU time available to them. I've published the source code for the benchmark at https://gitlab.com/dbohdan/racket-vs-the-world/. It should be straightforward to run on Linux with Docker (but please report any difficulties!).
I've attached the results I got on a two-core VM. According to them, Racket's servlets do lag behind everything else but Sinatra. The results are for 100 concurrent connections, which is the default, but the relative differences in throughput are very similar with 20 connections and broadly similar with just one. I'd appreciate any feedback on these results (do they look reasonable to you?) and the code behind the benchmark (did I miss any crucial bits of configuration for the servlet?).
Best,
D. Bohdan
Does the Racket app make use of all CPU cores by having multiple processes?
In a Go app, there isn't any need to, because the Go runtime uses all of the CPUs available by default. The same is the case with the JVM and the Erlang VM.
--
You received this message because you are subscribed to the Google Groups "Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to racket-users+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
On Friday, September 1, 2017 at 9:38:25 PM UTC+3, Neil Van Dyke wrote:
> Thank you very much for doing this work, D. Bohdan.
You're welcome! I had fun doing it.
> This performance of Racket SCGI+nginx relative to the others you tested
> is surprising to me, since I made the Racket `scgi` package for
> particular non-performance requirements, and performance was really
> secondary.
Thanks for making the `scgi` package. I rather like the SCGI protocol. It's a pity that it isn't as widely supported as FastCGI, considering that it's much simpler to implement (second only to plain old CGI) but still has a performance profile similar to FastCGI's.
> Not to look a gift horse in the mouth, [...]
No worries. The horse is given very much with that in mind. :-) To address your specific concerns:
> errors can cause good performance numbers. Sometimes I used
> JMeter instead of `ab` to rule out that cause of bad numbers in
> performance testing (as well as to implement some testing).
I think the SCGI benchmark works correctly, judging by the data sizes that ApacheBench reports. For example, here is the request data from one run:
> Complete requests: 178572
> Failed requests: 0
> Total transferred: 755002416 bytes
> HTML transferred: 733038060 bytes
733038060 / 178572 = 4105, which is exactly the size of the text message the application serves. The same is true of other data I've examined so far (5 runs). To help detect errors, the benchmark is also programmed to abort if the first request to an application doesn't serve exactly the right text (see `run-when-ready.sh`) or if ApacheBench sees enough of nginx's status 502 pages, which are served when the SCGI server doesn't respond correctly or at all.
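For illustration, here is the same sanity check as a small Python snippet (a sketch; the request count, byte count, and 4105-byte payload size are taken from the run quoted above):

```python
# Sanity-check an ApacheBench run: "HTML transferred" divided by
# "Complete requests" should equal the payload size exactly.
complete_requests = 178572
html_transferred = 733038060  # bytes, from the "HTML transferred" line

payload_size, remainder = divmod(html_transferred, complete_requests)

# 4105 bytes is the size of the text the application serves; a nonzero
# remainder would suggest truncated or malformed responses.
assert payload_size == 4105 and remainder == 0
```
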
I'll look into using JMeter in addition to ApacheBench.
> the OS pushing into swap
Good point. I thought I'd already disabled the containers' access to swap, but apparently it didn't take effect because of a cgroups quirk. The "benchmarked" container must still have been using swap, because some applications began to run out of memory when I disabled swap on the VM itself. I've increased "benchmarked's" memory quota to 768 MiB and added a recommendation in README.md to disable swap system-wide.
> sporadic network latency (though looks like you might've controlled for
> that one)
The application and the load generator communicate through a virtual network between two Docker containers on the same host, so this should not be an issue.
> some other OS/hardware/network burp outside of your Racket
> process(es).
Such burps are possible, and even likely, because I run the VM on a machine I use for other tasks. I try to ensure no taxing tasks run alongside the benchmark and mitigate the inevitable CPU spikes by simply benchmarking every application for longer (three minutes by default).
On Friday, September 1, 2017 at 9:51:13 PM UTC+3, Neil Van Dyke wrote:
> `#:scgi-max-allow-wait`
Thanks for the suggestion. This turned out to be the key to SCGI performance. Increasing #:scgi-max-allow-wait from 1 to 4 (the default), 16, 64, and 256 gives a moderate increase in throughput (from ~2350 req/s to ~2900 req/s) but *decreases* the maximum latency dramatically (from ~50000 ms to ~250 ms). See scgi-max-allow.md in the attachments for some detailed data samples. The effect levels off at 256; there isn't an obvious difference between 256, 1024, 4096, and 16384. I've pushed an update to run the tests with #:scgi-max-allow-wait set to 256.
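The idea behind the parameter can be sketched with an ordinary TCP socket (a Python sketch; the assumption here is that #:scgi-max-allow-wait plays a role analogous to the listen backlog, which would explain why a tiny value produces huge tail latencies under 100 concurrent connections):

```python
import socket

# The second argument to listen() is the accept backlog: how many
# pending connections the kernel may queue before refusing new ones.
# With a backlog of 1, concurrent clients pile up and time out; a
# generous backlog lets bursts of connections wait to be accepted.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen(256)             # generous backlog, as in the benchmark
print(server.getsockname()[1] > 0)
server.close()
```
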
Besides scgi-max-allow.md, I've also attached the results for 1) a five-minute benchmark with one concurrent connection, 768 MiB RAM, no swap, #:scgi-max-allow-wait 4; 2) a rerun of the first benchmark with the updated settings (three minutes, 100 connections, 768 MiB RAM, no swap, #:scgi-max-allow-wait 256).
You're right. This worked for "places". I've rerun "single" and "many" along with "places".
======
results/custom-many.txt:Requests per second: 6720.43 [#/sec] (mean)
results/custom-places.txt:Requests per second: 7095.99 [#/sec] (mean)
results/custom-single.txt:Requests per second: 7609.11 [#/sec] (mean)
======
As for "many-places", I was mistaken about it running out of file descriptors. I accidentally tested "places" in its stead. As-is (https://gitlab.com/dbohdan/racket-vs-the-world/blob/97dd7858aecab9af2a66ed687d12ce45adb4899d/apps/racket-custom/lipsum.rkt), "many-places" does not send anything to incoming connections and never closes them.
On Tuesday, September 5, 2017 at 9:01:14 AM UTC+3, Jay McCarthy wrote:
> Yes, that's good.
All right.
> It is really surprising to me that the many version doesn't perform
> better, because I assumed that there would be IO delays on one
> connection and you wouldn't want to stall others while waiting to
> read/write it. Presumably this is a bit of an artifact of the
> benchmarking happening on localhost?
I was wondering about the reason myself. To tease it out, I'll try a few variations on the benchmark later.
Yes, scratch what I said. The "many-places" benchmark only fails this way for me on a particular Linux VM, which just so happened to be the one I was testing it on. Maybe I got the VM in a bad state. If the problem is meaningfully related to the benchmarked application, I'll follow up on it.
Meanwhile, here are some benchmark results for "many-places". The transferred data sizes suggest it worked correctly.
======
results/custom-many-places.txt:Requests per second: 4931.29 [#/sec] (mean)
results/custom-many.txt:Requests per second: 6449.73 [#/sec] (mean)
results/custom-places.txt:Requests per second: 7325.81 [#/sec] (mean)
results/custom-single.txt:Requests per second: 7793.91 [#/sec] (mean)
======
I'll try this again with two fixed cores available to the application container.
results/custom-many-places.txt:Requests per second: 6517.83 [#/sec] (mean)
results/custom-many.txt:Requests per second: 7949.04 [#/sec] (mean)
results/custom-places.txt:Requests per second: 7521.15 [#/sec] (mean)
results/custom-single.txt:Requests per second: 8675.64 [#/sec] (mean)
Is it common practice to spawn a thread for each request? Is it that cheap from a resource point of view? Could a thread pool be of some help here?
Racket threads are not OS threads. They're "green threads", cooperatively scheduled by the Racket runtime, and they're very cheap to create, even with a short lifespan.
While the VPS provider does impose a limit on throughput, at approximately 250 req/s * 5 KB/req = 1.25 MB/s I wasn't hitting it. The numbers were very similar for different applications because at 25 concurrent connections no application reached the maximum request rate it could sustain. I thought the memory constraints wouldn't allow for more than about 25 connections, but I was mistaken. With some tuning I was able to get the applications that ran with 25 concurrent connections to run with 50 and 100. I've rerun the benchmark with 1, 25, 50, 100, and 200 connections to show how the differences between the applications emerge.
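The back-of-the-envelope bandwidth arithmetic above looks like this (a sketch; the 250 req/s and ~5 KB/request figures come from the paragraph above):

```python
# Estimate the bandwidth the benchmark actually used, to confirm the
# provider's throughput cap wasn't the bottleneck.
requests_per_second = 250
bytes_per_request = 5 * 1000  # ~5 KB per response, headers included

bandwidth = requests_per_second * bytes_per_request  # bytes per second
print(bandwidth / 1e6)  # → 1.25 (MB/s)
```
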
======
CONNECTIONS=1
remote-results/caddy.txt:Requests per second: 12.57 [#/sec] (mean)
remote-results/compojure.txt:Requests per second: 12.41 [#/sec] (mean)
remote-results/custom-many-places.txt:Requests per second: 12.62 [#/sec] (mean)
remote-results/custom-many.txt:Requests per second: 12.55 [#/sec] (mean)
remote-results/custom-places.txt:Requests per second: 12.56 [#/sec] (mean)
remote-results/custom-single.txt:Requests per second: 12.58 [#/sec] (mean)
remote-results/flask.txt:Requests per second: 12.44 [#/sec] (mean)
remote-results/guile.txt:Requests per second: 12.53 [#/sec] (mean)
remote-results/plug.txt:Requests per second: 12.57 [#/sec] (mean)
remote-results/scgi.txt:Requests per second: 12.46 [#/sec] (mean)
remote-results/sinatra.txt:Requests per second: 12.08 [#/sec] (mean)
remote-results/stateful.txt:Requests per second: 12.42 [#/sec] (mean)
remote-results/stateless.txt:Requests per second: 12.41 [#/sec] (mean)
======
======
CONNECTIONS=25
remote-results/caddy.txt:Requests per second: 311.19 [#/sec] (mean)
remote-results/compojure.txt:Requests per second: 309.69 [#/sec] (mean)
remote-results/custom-many-places.txt:(Killed) Total of 9153 requests completed
remote-results/custom-many.txt:Requests per second: 309.63 [#/sec] (mean)
remote-results/custom-places.txt:(Killed) Total of 13085 requests completed
remote-results/custom-single.txt:Requests per second: 308.02 [#/sec] (mean)
remote-results/flask.txt:Requests per second: 310.91 [#/sec] (mean)
remote-results/guile.txt:Requests per second: 310.28 [#/sec] (mean)
remote-results/plug.txt:Requests per second: 313.60 [#/sec] (mean)
remote-results/sinatra.txt:Requests per second: 287.03 [#/sec] (mean)
remote-results/stateful.txt:Requests per second: 298.05 [#/sec] (mean)
remote-results/stateless.txt:Requests per second: 295.90 [#/sec] (mean)
======
======
CONNECTIONS=50
remote-results/caddy.txt:Requests per second: 594.78 [#/sec] (mean)
remote-results/compojure.txt:Requests per second: 604.64 [#/sec] (mean)
remote-results/custom-many-places.txt:(Killed) Total of 9444 requests completed
remote-results/custom-many.txt:Requests per second: 598.88 [#/sec] (mean)
remote-results/custom-places.txt:(Killed) Total of 13088 requests completed
remote-results/custom-single.txt:Requests per second: 591.44 [#/sec] (mean)
remote-results/flask.txt:Requests per second: 605.75 [#/sec] (mean)
remote-results/guile.txt:Requests per second: 612.28 [#/sec] (mean)
remote-results/plug.txt:Requests per second: 617.95 [#/sec] (mean)
remote-results/scgi.txt:(Killed) Total of 12020 requests completed
remote-results/sinatra.txt:Requests per second: 367.58 [#/sec] (mean)
remote-results/stateful.txt:Requests per second: 530.00 [#/sec] (mean)
remote-results/stateless.txt:Requests per second: 546.76 [#/sec] (mean)
======
======
CONNECTIONS=100
remote-results/caddy.txt:Requests per second: 1016.63 [#/sec] (mean)
remote-results/compojure.txt:Requests per second: 1103.84 [#/sec] (mean)
remote-results/custom-many-places.txt:(Killed) Total of 9908 requests completed
remote-results/custom-many.txt:Requests per second: 1140.40 [#/sec] (mean)
remote-results/custom-places.txt:(Killed) Total of 13081 requests completed
remote-results/custom-single.txt:Requests per second: 1134.93 [#/sec] (mean)
remote-results/flask.txt:Requests per second: 1024.25 [#/sec] (mean)
remote-results/guile.txt:Requests per second: 1085.03 [#/sec] (mean)
remote-results/plug.txt:Requests per second: 1140.43 [#/sec] (mean)
remote-results/scgi.txt:(Killed) Total of 10969 requests completed
remote-results/sinatra.txt:Requests per second: 384.41 [#/sec] (mean)
remote-results/stateful.txt:Requests per second: 726.84 [#/sec] (mean)
remote-results/stateless.txt:Requests per second: 682.58 [#/sec] (mean)
======
======
CONNECTIONS=200
remote-results/caddy.txt:Requests per second: 1093.88 [#/sec] (mean)
remote-results/compojure.txt:Requests per second: 1157.95 [#/sec] (mean)
remote-results/custom-many-places.txt:(Killed) Total of 9728 requests completed
remote-results/custom-many.txt:Requests per second: 1219.76 [#/sec] (mean)
remote-results/custom-places.txt:(Killed) Total of 13154 requests completed
remote-results/custom-single.txt:Requests per second: 1171.37 [#/sec] (mean)
remote-results/flask.txt:Requests per second: 937.90 [#/sec] (mean)
remote-results/guile.txt:Requests per second: 1182.95 [#/sec] (mean)
remote-results/plug.txt:Requests per second: 1222.03 [#/sec] (mean)
remote-results/scgi.txt:(Killed) Total of 7857 requests completed
remote-results/sinatra.txt:Requests per second: 381.36 [#/sec] (mean)
remote-results/stateful.txt:(Killed) Total of 97039 requests completed
remote-results/stateless.txt:(Killed) Total of 24712 requests completed
======
For comparison, here is one concurrent connection over a local network. With greater computing resources and a lower round-trip time (the latter is probably far more important) you get a much higher request rate.
======
CONNECTIONS=1
remote-results/caddy.txt:Requests per second: 679.79 [#/sec] (mean)
remote-results/compojure.txt:Requests per second: 680.88 [#/sec] (mean)
remote-results/custom-many-places.txt:Requests per second: 842.54 [#/sec] (mean)
remote-results/custom-many.txt:Requests per second: 899.55 [#/sec] (mean)
remote-results/custom-places.txt:Requests per second: 841.69 [#/sec] (mean)
remote-results/custom-single.txt:Requests per second: 775.15 [#/sec] (mean)
remote-results/flask.txt:Requests per second: 513.99 [#/sec] (mean)
remote-results/guile.txt:Requests per second: 661.27 [#/sec] (mean)
remote-results/plug.txt:Requests per second: 678.93 [#/sec] (mean)
remote-results/scgi.txt:Requests per second: 606.11 [#/sec] (mean)
remote-results/sinatra.txt:Requests per second: 247.25 [#/sec] (mean)
remote-results/stateful.txt:Requests per second: 406.68 [#/sec] (mean)
remote-results/stateless.txt:Requests per second: 412.21 [#/sec] (mean)
======
> For purposes of this last humble-VPS benchmarking (if we can keep making
> more benchmarking work for you), you might get those initial numbers
> from places/many-places/racket-scgi by setting Racket's memory usage
> limit.
When I limit the memory usage in racket-custom to the total RAM on the VPS minus what the OS uses (through `custodian-limit-memory`), Racket quits with an out-of-memory error at the point where it would otherwise be killed by the OS. racket-scgi seems to behave the same, though I didn't look at how the memory usage was split between Racket and nginx when I tested it.
> For the racket-scgi + nginx setup, if nginx can't quickly be tuned to
> not be a problem itself, there are HTTP servers targeting smaller
> devices, like what OpenWrt uses for its admin interface.
But do they support SCGI?
Once again, you're welcome! See my reply to Neil Van Dyke for some reasoning about the Internet workload and more results.