[racket] Web Framework Benchmarks


adam moore

Jul 16, 2013, 3:49:57 AM
to us...@racket-lang.org
Hello there, racket users!

My first post here - and it's pretty self-serving, but so it goes.

I've recently been enjoying diving into Racket, starting with the
guide, and just getting my feet wet at the moment making some
command-line tools and small web apps.
So, anyway, I'm not sure if anyone else on the list has seen this:

http://www.techempower.com/benchmarks/

It's by no means a perfect comparison, but I think it might be
worthwhile adding a racket implementation to compare how it stacks up.
There's a benefit to be had as well - a quite high visibility example
of a "best practices" web app. It's really nice to be able to compare
simple tasks implemented in different frameworks.

I've only just started using Racket and wouldn't be confident producing
a best-of-breed example... so basically I'm sending out a (shameless)
request for someone more versed in the ways of Racket to have a go at
adding an implementation.

The five current test types are covered here (marked "Present"):

https://github.com/TechEmpower/FrameworkBenchmarks/issues/133

Thanks for having a look.

-Adam

Jay McCarthy

Jul 16, 2013, 8:26:15 AM
to adam moore, users
That's a good idea. I'll check it out.

Jay
--
Jay McCarthy <j...@cs.byu.edu>
Assistant Professor / Brigham Young University
http://faculty.cs.byu.edu/~jay

"The glory of God is Intelligence" - D&C 93

hashim muqtadir

Jun 1, 2020, 6:43:06 AM
to Racket Users
A new version of these benchmarks was just published ("Round 19"): https://www.techempower.com/benchmarks/#section=data-r19

Racket scores rather poorly (0 errors is nice, though). Does anybody have any guesses as to why this is? Racket performs well enough as far as I'm concerned, and while I don't use the web server much, it seems unlikely that it should be so far below Django, for example. Maybe they run it in a handicapped configuration or something.

Sam Tobin-Hochstadt

Jun 1, 2020, 11:13:00 AM
to hashim muqtadir, Racket Users
I think the biggest thing is that no one has looked at optimizing
these benchmarks in Racket. If you tried out running one of these
benchmarks and ran the profiler it would probably show something
interesting.

Sam

George Neuner

Jun 1, 2020, 1:35:55 PM
to racket users

On 6/1/2020 11:12 AM, Sam Tobin-Hochstadt wrote:
> I think the biggest thing is that no one has looked at optimizing
> these benchmarks in Racket. If you tried out running one of these
> benchmarks and ran the profiler it would probably show something
> interesting.
>
> Sam
The code[1] itself isn't bad.  There are a couple of minor things I
personally would tweak, but in my opinion the main problem is with the
database access.

PostgreSQL uses process parallelism - it forks a new backend process to
handle each connection.  Assuming (???) there are enough concurrent
requests to open the maximum number of DB pool connections, then 1024 is
WAY too many.  PG experts recommend no more than 5..10 backend processes
per core, but according to the environment details[2], the test machines
have only 14 HT cores (28 threads).

Mind you, I'm only guessing, but it looks to me like this application
could be losing an enormous amount of time in the starting of new
backend database processes.  And since it is running inside a Docker
container, there also is some per connection overhead there.

The requests made by the tests[3] are so trivial (needing little
additional Racket processing) that I would consider reducing the number
of PG pool connections to just a few per core and see how that goes.  My
expectation is that the time saved by spinning up many fewer PG
processes will vastly outweigh any loss in absolute request
concurrency.  [Particularly if they also are running PG itself inside
Docker.]
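
To make that concrete, here is a minimal sketch of capping the pool with
Racket's db library (the connection details and the cap of 64 are
hypothetical; the point is the #:max-connections argument):

#lang racket/base

(require db)

;; Cap the pool at a few backends per core instead of 1024.
(define pool
  (connection-pool
   (lambda ()
     (postgresql-connect #:server "tfb-database"  ; hypothetical host
                         #:user "benchmarkdbuser"
                         #:database "hello_world"))
   #:max-connections 64))

;; Lease a connection, query, and return it to the pool.
(define (query-world id)
  (define c (connection-pool-lease pool))
  (begin0
    (query-row c "SELECT id, randomNumber FROM World WHERE id = $1" id)
    (disconnect c)))  ; disconnect returns a leased connection to the pool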

YMMV,
George

[1]
https://github.com/TechEmpower/FrameworkBenchmarks/blob/master/frameworks/Racket/racket/servlet.rkt
[2]
https://www.techempower.com/benchmarks/#section=environment&hw=ph&test=fortune
[3]
https://github.com/TechEmpower/FrameworkBenchmarks/wiki/Project-Information-Framework-Tests-Overview

Bogdan Popa

Jun 1, 2020, 1:40:08 PM
to hashim muqtadir, racket...@googlegroups.com
I replied earlier today off of my Phone, but, for whatever reason
(caught in the moderation queue?), it's not showing up in this thread.

Here's what it said:

The reason for poor performance relative to the other
langs/frameworks is that there is currently no easy way to take
advantage of multiple cores using the web framework, so what's being
benchmarked is single-core performance.

This is mainly a problem for benchmarks such as this, but not really
an issue in the real world where you'd just run multiple processes
with a load balancer in front.

George Neuner

Jun 1, 2020, 2:12:19 PM
to racket...@googlegroups.com

On 6/1/2020 1:40 PM, Bogdan Popa wrote:
> I replied earlier today off of my Phone, but, for whatever reason
> (caught in the moderation queue?), it's not showing up in this thread.
>
> Here's what it said:
>
> The reason for poor performance relative to the other
> langs/frameworks is that there is currently no easy way to take
> advantage of multiple cores using the web framework, so what's being
> benchmarked is single-core performance.
>
> This is mainly a problem for benchmarks such as this, but not really
> an issue in the real world where you'd just run multiple processes
> with a load balancer in front.

Single core [by itself] doesn't explain the enormous performance
difference between Racket and Django.

I haven't looked at the Django submission - Python's (in)comprehensions
give me a headache.  But Python's DB pool is threaded, and Python's
threads are core limited by the GIL in all the major implementations
(excepting Jython).

There are a few things Python can do faster than Racket, but the VAST
difference in performance shown in the techempower tests isn't explained
by them.  My suspicion is that the Racket application is making too many
database connections and not relying enough on its open connection
pool.  Hundreds of trivial requests can be served in the time it takes
to spin up a new backend process.

YMMV,
George

Sam Tobin-Hochstadt

Jun 1, 2020, 3:40:52 PM
to George Neuner, Racket Users
I'm skeptical both of the DB explanation and the multi-core
explanation. As you say, the difference between something like Django
and Racket is much too large to be explained by that. For example, on
the "plaintext" benchmark, Racket serves about 700 req/sec (I get
similar results on my machine). Many of the benchmarks in languages
like Python and Ruby do more than 1000x better, which means that even
if we had perfect speedup on 32 cores, we'd be nowhere close.
Additionally, the "plaintext" benchmark doesn't touch the DB at all. I
tried commenting out all of the DB code entirely, and it did not
change the results.

My guess is that the web server is just doing a lot of per-response
work that would need to be optimized.

Sam

George Neuner

Jun 1, 2020, 4:02:14 PM
to racket users

On 6/1/2020 3:40 PM, Sam Tobin-Hochstadt wrote:
> I'm skeptical both of the DB explanation and the multi-core
> explanation. As you say, the difference between something like Django
> and Racket is much too large to be explained by that. For example, on
> the "plaintext" benchmark, Racket serves about 700 req/sec (I get
> similar results on my machine). Many of the benchmarks in languages
> like Python and Ruby do more than 1000x better, which means that even
> if we had perfect speedup on 32 cores, we'd be nowhere close.
> Additionally, the "plaintext" benchmark doesn't touch the DB at all. I
> tried commenting out all of the DB code entirely, and it did not
> change the results.
>
> My guess is that the web server is just doing a lot of per-response
> work that would need to be optimized.
>
> Sam

Possibly ... I admit that I did not look at the plain text results: I
was drawn to the fact that the DB results impose a lot of non-Racket
overhead.

George

Bogdan Popa

Jun 1, 2020, 4:21:11 PM
to George Neuner, racket...@googlegroups.com

George Neuner writes:

> But Python's DB pool is threaded, and Python's threads are core
> limited by the GIL in all the major implementations (excepting
> Jython).

Python's Postgres pooling does not[1] use POSIX threads under the hood
to manage the connections if that's what you mean, nor is the
concurrency of the Python applications based on system threads. All of
the Python examples use either asyncio + fork(2) or green threads +
fork(2). This includes the django example[3].

> There are a few things Python can do faster than Racket, but the VAST
> difference in performance shown in the techempower tests isn't
> explained by them.

Here's a benchmark that doesn't touch the DB at all, showing an even
bigger difference in throughput between the two:

https://www.techempower.com/benchmarks/#section=data-r19&hw=ph&test=json

Here's the same benchmark running on my local machine where I
intentionally limited the Django app to a single CPU and I made it use
the `gevent' library for its workers so it is more comparable to the
Racket implementation:

https://www.techempower.com/benchmarks/#section=test&shareid=14ecbf16-cdb3-4501-8b7d-a2b8a549f73c&hw=ph&test=json&a=2

And here's what happens when I let it use as much parallelism as it can:

https://www.techempower.com/benchmarks/#section=test&shareid=d3ad4d79-c7a7-4ca0-b297-ffda549947c8&hw=ph&test=json&a=2

I do agree that improving the parallelism part wouldn't be enough to
catch up (clearly, there's a 2x difference even on a single core), but
it is a large factor here.

I wrote the latest implementation of the Racket code for that benchmark
and I considered doing things like bypassing the "standard"
`dispatch/servlet' implementation to avoid the overhead of all the
continuation machinery in the web server, but that felt like cheating.

Another area where the web server does more work than it should is in
generating responses: the web server uses chunked transfer encoding for
all responses, whereas all the Python web servers simply write the
response directly to the socket when the length of the content is known
ahead of time.

Another thing of note about the django implementation is that it uses
ujson, written in C with the express intent of being as fast as
possible, to generate the JSON data.


[1]: They call the default implementation a `ThreadedConnectionPool'[2],
but that's just because it uses the mutexes that the `threading' module
provides.

[2]: https://github.com/psycopg/psycopg2/blob/779a1370ceeac130de07edc0510f2c55846be1bd/lib/pool.py#L155

[3]: https://github.com/TechEmpower/FrameworkBenchmarks/blob/c49524762379a2cdf82627b0032c654f3a9eafb6/frameworks/Python/django/gunicorn_conf.py#L8-L21

Brian Adkins

Jun 1, 2020, 4:50:43 PM
to Racket Users
I may look into this in more detail later, but I ran a simple benchmark comparison on my modest AWS EC2 server (ApacheBench can behave poorly on macOS).

1) I ran ApacheBench w/ 6 processes to fetch a simple "hello world" static html file using only nginx. I got roughly 650 requests per second.

2) I then ran ApacheBench w/ 6 processes against one of my Racket web apps, using a monitoring endpoint that does a simple db query to determine the health of the server. So this went from nginx (acting as a load balancer and https handler) to the Racket processes via proxy_pass (only 2 running in parallel), which exercises my custom dispatching and a simple Postgres query. No continuations and completely stateless. I got roughly 350 requests per second.

At first glance, that doesn't appear to be that much overhead when comparing the two. In fact, I would've expected the very small static html request to be significantly more than double the req/sec of the Racket app db request.

I developed Rails apps for over a decade, and I know my Racket web apps are significantly faster in a similar "database-backed web apps" context.

I believe there is something wrong with those benchmarks at the moment.

Brian

George Neuner

Jun 1, 2020, 6:26:17 PM
to Bogdan Popa, racket users

On 6/1/2020 4:21 PM, Bogdan Popa wrote:
> George Neuner writes:
>
> > But Python's DB pool is threaded, and Python's threads are core
> > limited by the GIL in all the major implementations (excepting
> > Jython).
>
> Python's Postgres pooling does not[1] use POSIX threads under the hood
> to manage the connections if that's what you mean, nor is the
> concurrency of the Python applications based on system threads. All of
> the Python examples use either asyncio + fork(2) or green threads +
> fork(2). This includes the django example[3].

I said nothing whatsoever about POSIX.  And FYI, POSIX does not imply
any particular implementation - POSIX specifies only the API, and a
compliant implementation may provide kernel threads, user space threads,
or a combination of both.

What I did say is that Python's threads are core limited - and *that* is
true.   As a technical matter, Python *may* in fact start threads on
different cores, but the continual need to take the GIL quickly forces
every running thread in the process onto the same core.


> > There are a few things Python can do faster than Racket, but the VAST
> > difference in performance shown in the techempower tests isn't
> > explained by them.
>
> Here's a benchmark that doesn't touch the DB at all, showing an even
> bigger difference in throughput between the two:
>
> https://www.techempower.com/benchmarks/#section=data-r19&hw=ph&test=json

That one actually is expected:  Racket's JSON (de)serializer is
relatively slow.


What wasn't expected was Sam's results from the "plain text" test which
also showed Racket much slower than Python.  That does hint at a lot of
overhead in the Racket framework.


> I wrote the latest implementation of the Racket code for that benchmark
> and I considered doing things like bypassing the "standard"
> `dispatch/servlet' implementation to avoid the overhead of all the
> continuation machinery in the web server, but that felt like cheating.

To my knowledge, continuations will not be a factor unless either 1) the
application is written in the #lang web-server language (which converts
everything to CPS), or 2) the code invokes one of the send/suspend/*
functions.

FWIW: I try to avoid using client-facing continuations in my own web
applications - for my money there are too many uncertainties connected
with them.

Also the stuffer does a fair amount of work to deal with long
continuation URLs.


> Another area where the web server does more work than it should is in
> generating responses: the web server uses chunked transfer encoding for
> all responses; whereas all the Python web servers simply write the
> response directly to the socket when the length of the content is known
> ahead of time.

My understanding is that the port passed to  response/output  is the
actual socket ... so you can front-end it and write directly.  But that
might be "cheating" under your definition.


George

Bogdan Popa

Jun 2, 2020, 3:45:55 AM
to George Neuner, racket users
George Neuner writes:

> What I did say is that Python's threads are core limited - and *that*
> is true. As a technical matter, Python *may* in fact start threads
> on different cores, but the continual need to take the GIL quickly
> forces every running thread in the process onto the same core.

I was pointing out that the GIL is irrelevant in this case. All the
Python implementations in this benchmark use either green threads by
monkeypatching the standard threading and IO modules (gevent, meinheld)
or coroutines (asyncio) for concurrency and they fork subprocesses for
parallelism.


> That one actually is expected: Racket's JSON (de)serializer is
> relatively slow.

It's the same as the plaintext test, except JSON is written to the
client instead of plain text:

https://github.com/TechEmpower/FrameworkBenchmarks/blob/bfd7c1442f33e620524b6aa0751c11576b412e72/frameworks/Racket/racket/servlet.rkt#L140-L143


> What wasn't expected was Sam's results from the "plain text" test
> which also showed Racket much slower than Python. That does hint at a
> lot of overhead in the Racket framework.

Here's single-core Python vs Racket in the plaintext benchmark on my
machine:

https://www.techempower.com/benchmarks/#section=test&shareid=464938de-3ec0-4931-bc68-410000566c22&hw=ph&test=plaintext&a=2

It is surprising that Racket does worse on this benchmark than it does
on the JSON one, despite the fact that `response/json' uses
`response/output' under the hood.

I see these errors from Racket when I run the plaintext benchmark, but
they don't occur in any of the others:

racket: tcp-addresses: could not get peer address
racket: system error: Transport endpoint is not connected; errno=107
racket: context...:
racket: /usr/share/racket/collects/racket/contract/private/arrow-higher-order.rkt:375:33
racket: .../more-scheme.rkt:261:28
racket: /usr/share/racket/collects/racket/contract/private/arrow-higher-order.rkt:375:33
racket: /root/.racket/7.6/pkgs/web-server-lib/web-server/private/dispatch-server-with-connect-unit.rkt:144:4: connection-loop

I'll try to figure out what's causing these.


> To my knowledge, continuations will not be a factor unless either 1)
> the application is written in the #web-server language (which converts
> everything to CPS), or 2) the code invokes one of the send/suspend/*
> functions.

Whether you use the web interaction functions or not, servlets have to
do some bookkeeping (create new "instances", insert continuation
prompts) to support continuations:

* https://github.com/racket/web-server/blob/547b3fd736684651e94ebd78902633374be6bcae/web-server-lib/web-server/servlet-dispatch.rkt#L78-L92
* https://github.com/racket/web-server/blob/547b3fd736684651e94ebd78902633374be6bcae/web-server-lib/web-server/servlet/setup.rkt#L52-L72
* https://github.com/racket/web-server/blob/547b3fd736684651e94ebd78902633374be6bcae/web-server-lib/web-server/dispatchers/dispatch-servlets.rkt#L63-L100

Bypassing all of this is what I considered cheating, because most people
probably won't bypass it. At the same time, though, I haven't measured
what the overhead of all this stuff is. It could be minimal.
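
For reference, a bypass would look roughly like this - a minimal sketch
using the dispatch-lift dispatcher instead of `dispatch/servlet', so no
servlet instances or continuation prompts are created (illustrative, not
what the benchmark actually does):

#lang racket/base

(require web-server/web-server
         web-server/http
         (prefix-in lift: web-server/dispatchers/dispatch-lift))

(define (hello req)
  (response/full 200 #"OK" (current-seconds) #"text/plain"
                 '() (list #"Hello, World!")))

;; serve returns immediately; do-not-return keeps the process alive.
(serve #:dispatch (lift:make hello)
       #:port 8080)
(do-not-return)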


> My understanding is that the port passed to response/output is the
> actual socket ... so you can front-end it and write directly. But
> that might be "cheating" under your definition.

That's what I did in the benchmark:

https://github.com/TechEmpower/FrameworkBenchmarks/blob/bfd7c1442f33e620524b6aa0751c11576b412e72/frameworks/Racket/racket/servlet.rkt#L135-L138

Only dispatchers get direct access to the output port for the socket.
If you use `dispatch/servlet', then it takes care of taking your
`response' value and calling `output-response' on it. Unless the server
knows the connection should be closed, or the request was a HEAD
request, it outputs the response by chunking it:

https://github.com/racket/web-server/blob/547b3fd736684651e94ebd78902633374be6bcae/web-server-lib/web-server/http/response.rkt#L133-L181

Bogdan Popa

Jun 2, 2020, 6:30:56 AM
to Bogdan Popa, George Neuner, racket...@googlegroups.com

Bogdan Popa writes:

> Only dispatchers get direct access to the output port for the socket.
> If you use `dispatch/servlet', then it takes care of taking your
> `response' value and calling `output-response' on it. Unless the server
> knows the connection should be closed, or the request was a HEAD
> request, it outputs the response by chunking it:
>
> https://github.com/racket/web-server/blob/547b3fd736684651e94ebd78902633374be6bcae/web-server-lib/web-server/http/response.rkt#L133-L181

I was wrong about this. The web-server does the right thing if the
`Content-Length' header is present in the response value:

https://github.com/racket/web-server/blob/547b3fd736684651e94ebd78902633374be6bcae/web-server-lib/web-server/http/response.rkt#L120
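
For example, the difference amounts to this (a minimal sketch; the
handler and body here are made up):

#lang racket/base

(require web-server/http)

(define body #"Hello, World!")

;; With an explicit Content-Length header, the body is written as-is;
;; without it, the server falls back to chunked transfer encoding.
(define (plaintext req)
  (response/output
   (lambda (out) (void (write-bytes body out)))
   #:mime-type #"text/plain"
   #:headers (list (make-header #"Content-Length"
                                (string->bytes/utf-8
                                 (number->string (bytes-length body)))))))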

When I change the benchmark to take advantage of that, throughput doubles:

https://www.techempower.com/benchmarks/#section=test&shareid=ab930604-6b19-4ab4-a2a5-93e674a81804&hw=ph&test=plaintext&a=2

I'll make a PR against the main repo to make sure all of the responses
have content lengths.

Out of curiosity, I made the app fork through FFI calls[1]:

https://www.techempower.com/benchmarks/#section=test&shareid=0197c25f-d544-4357-b3be-a40d4a8760ef&hw=ph&test=plaintext&a=2

This worked, although multiple processes wake up when a connection can
be accepted, and the IO layer doesn't seem to handle `EAGAIN':

racket: Connection error: tcp-accept: accept from listener failed
racket: system error: Resource temporarily unavailable; errno=11
racket: context...:
racket: .../more-scheme.rkt:261:28
racket: /root/.racket/7.7/pkgs/compatibility-lib/mzlib/thread.rkt:75:14: loop


[1]: https://gist.github.com/Bogdanp/b7b72ff7845f7f2c51e64bde553128d0#file-fork-app-rkt-L232-L256


Bogdan Popa

Jun 7, 2020, 1:39:36 PM
to Bogdan Popa, racket...@googlegroups.com
Small update on this: I've updated the benchmarks[0] to run multiple Racket
processes with an Nginx load balancer in front. After some tuning[1], here
is what the results look like on my 12-core AMD Ryzen 9 3900 server:

https://www.techempower.com/benchmarks/#section=test&shareid=669bfab7-9242-4c26-8921-a4fe9ccd8530&hw=ph&test=composite&a=2

50k/s is a respectable number for the plaintext benchmark IMO and we
could get it to go higher if we could ditch Nginx or spend more time
improving the server's internals, as Sam suggested.

The `racket-perf' benchmark is for a branch[2] that I have where I've made
some small improvements to the server's internals.

[0]: https://github.com/TechEmpower/FrameworkBenchmarks/pull/5727
[1]: https://github.com/TechEmpower/FrameworkBenchmarks/pull/5737
[2]: https://github.com/racket/web-server/pull/94

Yury Bulka

Jun 8, 2020, 6:11:37 AM
to Bogdan Popa, racket...@googlegroups.com
Wow, from 695 requests per second to 49,516 is a huge improvement!

Since we were comparing to Django previously, it's now much closer to
Django (which does 78,132 rps).

Django also runs multiple worker processes (3 per CPU):
https://github.com/TechEmpower/FrameworkBenchmarks/blob/master/frameworks/Python/django/gunicorn_conf.py#L8

The other tests are also doing much better.

So do I understand correctly that the factors that contribute to the
improvement are:

1. running multiple worker processes behind nginx
2. adding content-length to all responses
3. using the CS variant of Racket
4. using Racket 7.7
5. tuning the nginx config (enabling HTTP/1.1 especially)

I'm curious what the individual contribution of each of these factors
was. In the PR regarding #5 you already stated that it gives a 4-5x
improvement.

#1 and #5 are things one would normally do anyway in a production
setup, I guess. As is #4, most likely.

#2 is something that seems to require manual work in the client code,
but maybe that can be made easier on the web-server-lib side somehow.

I'm curious how much #3 contributes.

Big thanks again to everyone investing time in investigating this.

--
Yury Bulka
https://mamot.fr/@setthemfree
#NotOnFacebook

Stephen De Gabrielle

Jun 8, 2020, 6:26:23 AM
to Bogdan Popa, racket...@googlegroups.com
Thank you Bogdan!


Bogdan Popa

Jun 8, 2020, 9:39:43 AM
to Yury Bulka, racket...@googlegroups.com

Yury Bulka writes:

> Wow, from 695 requests per second to 49,516 is a huge improvement!
>
> Since we were comparing to Django previously, it's now much closer to
> Django (which does 78,132 rps).

I expect the Racket benchmark will do even better on TechEmpower's hw
than it did on mine because their machines are more powerful and they
run the benchmarking code, the database and the servers on different
machines, so the gap should end up being even smaller.

Also worth keeping in mind that the server Django uses for these
benchmarks, meinheld, is completely written in C:

* https://github.com/mopemope/meinheld/blob/311acbc4e7bd38fa3f3d0e158b35cde9ef73f8e5/meinheld/gmeinheld.py#L11
* https://github.com/mopemope/meinheld/tree/311acbc4e7bd38fa3f3d0e158b35cde9ef73f8e5/meinheld/server

So I think it's really impressive that we're able to get this close with
just pure Racket, with the overhead of running one nginx process per
core in front of the app and with the overhead of connecting those nginx
processes to the app via TCP. Speaking of which, another thing we could
do to improve performance in this benchmark is define a custom `tcp^'
unit based on `unix-socket-lib' and have nginx connect to the backends
through unix sockets rather than TCP.
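
A full `tcp^' unit is more work, but the underlying calls are simple
enough (a sketch assuming the unix-socket-lib package; a real unit would
wrap these behind the tcp^ signature, and the socket path is made up):

#lang racket/base

(require racket/unix-socket)

;; nginx would then proxy_pass to unix:/tmp/app.sock instead of a TCP port.
(define listener (unix-socket-listen "/tmp/app.sock" 512))
(define-values (in out) (unix-socket-accept listener))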

> So do I understand correctly that the factors that contribute to the
> improvement are:
>
> 1. running multiple worker processes behind nginx
> 2. adding content-length to all responses
> 3. using the CS variant of Racket
> 4. using Racket 7.7
> 5. tuning the nginx config (enabling HTTP/1.1 especially)

I'm afraid I haven't kept notes on how much improvement each of these
changes yielded. From what I remember: I don't think #4 was much of a
factor; #1 and #5 were the biggest factors, followed by #2 and #3,
which I changed at the same time, so I couldn't say which made more of
a difference.

> #2 is something that seems to require manual work in the client code,
> but maybe that can be made easier on the web-server-lib side somehow.

The work one has to do is pretty minimal[1], and I personally like that
the default is to stream data, so I think the only thing to improve here
is awareness: the docs for `response' should be updated to mention that
unless a `Content-Length' is provided, responses will use chunked
transfer encoding.

[1]: https://github.com/TechEmpower/FrameworkBenchmarks/pull/5727/files#diff-b21f7e3ecfa09726dac9ce079f612719R47-R70

Alex Harsanyi

Jun 8, 2020, 8:05:06 PM
to Racket Users


On Monday, June 8, 2020 at 6:11:37 PM UTC+8, Yury Bulka wrote:
Wow, from 695 requests per second to 49,516 is a huge improvement!

Since we were comparing to Django previously, it's now much closer to
Django (which does 78,132 rps).

I know very little about web development, so these may be some stupid questions, but I am very curious.

I am looking at http://www.techempower.com/benchmarks/#section=data-r19&hw=ph&test=composite, and even with 78k requests/second, Django is at the bottom of that list (position 98 out of 104).  In fact, most of the web frameworks that I can recognize are at the bottom of this list, and their score is also very modest, compared to the frameworks at the top.  The highest framework that I can recognize in that list is "flask" and its performance is a mere 4.8% of the top one.

Question 1: Based on this benchmark, is there any reason to choose anything else but "drogon"?  Even if one chooses the second best on that list, which is "actix", they already lose about 6% performance, and things degrade quickly afterwards.  The framework at position 10 is already half the speed of the top one.

Question 2:  Based on Bogdan's message in this thread, it seems that most of the performance improvement for the Racket benchmark comes from the nginx configuration (which has nothing to do with Racket), and the next improvement has to do with how the user program is written (by supplying a "Content-Length" header).  So, is this benchmark really testing the Racket web server's performance, or is it testing a very specific deployment?

Alex.

Bogdan Popa

Jun 9, 2020, 3:02:40 AM
to Alex Harsanyi, racket...@googlegroups.com

Alex Harsanyi writes:

> Question 1: Based on this benchmark, is there any reason to choose anything
> else but "drogon"? Even if one chooses the second best on that list, which
> is "actix", they already lose about 6% performance, and things degrade
> quickly afterwards. The framework at position 10 is already half the speed
> of the top one.

My take on these benchmarks is that all that matters is that the
framework doesn't get in your way once you add business logic. The vast
majority of "real" web applications out there don't (and most likely
can't) do 50k rps.

You can see that in the "multiple queries" and "data updates" tests
where the results are packed closer together because the logic is closer
to what a lot of database-backed web applications do and the database
ends up being a bottleneck. A description of the requirements of each
test can be found here:

https://github.com/TechEmpower/FrameworkBenchmarks/wiki/Project-Information-Framework-Tests-Overview

If you know and are willing to deal with writing and maintaining C++
then drogon looks like it might be a great choice.

> Question 2: Based on Bogdan's message in this thread, it seems that most of
> the performance improvement for the Racket benchmark comes from the nginx
> configuration (which has nothing to do with Racket), and the next
> improvement has to do with how the user program is written (by supplying a
> "Content-Length" header). So, is this benchmark really testing the Racket
> web server's performance, or is it testing a very specific deployment?

The largest improvement comes from making the Racket application take
advantage of all the hardware threads on the machine. Because Racket
doesn't currently have a way to share TCP listeners across places, and
because fork isn't natively supported (I mentioned that it works via the
FFI earlier in the thread, but I believe it needs some support from the
runtime (handling of `EAGAIN') to work efficiently and not cause a lot
of churn), I did the next best thing: I made the benchmark run one Racket
process for each thread[1] and added nginx as a load balancer in front.

The nginx process listens on port 8080, forks one subprocess per core
(which lets the subprocesses reuse the same port) and then proxies any
incoming requests on that port to one of the Racket processes so every
single request is ultimately served by the Racket app. What this means
in terms of this benchmark is that, compared to others, we're actually
paying a toll for using nginx here because its own workers are consuming
resources on the machine, but, to my knowledge, we don't have a better
alternative at the moment.

[1]: https://github.com/TechEmpower/FrameworkBenchmarks/blob/988f052c8170da661c49dd51d1f33d500a871031/frameworks/Racket/racket/scripts/run#L15-L19

George Neuner

Jun 9, 2020, 5:23:09 AM
to racket...@googlegroups.com

On 6/9/2020 3:02 AM, Bogdan Popa wrote:
> Alex Harsanyi writes:
>
> > Question 1: Based on this benchmark, is there any reason to choose anything
> > else but "drogon"? Even if one chooses the second best on that list, which
> > is "actix", they already lose about 6% performance, and things degrade
> > quickly afterwards. The framework at position 10 is already half the speed
> > of the top one.
>
> My take on these benchmarks is all that matters is that the framework
> doesn't get in your way once you add business logic. The vast majority
> of "real" web applications out there don't (and most likely can't) do
> 50k rps.

Or anywhere close to that given more realistic database queries.

Hmm.  SSL can be tricky to set up right for Racket, but that's the only
reason I can see for using a separate HTTP server. Multiple Racket
applications *should* all be able to listen on the same port without
having been spawned from the same ancestor process.  If that isn't
working now, something has gotten hosed.

I'm not sure what you mean by "sharing" a listener,  but using a single
listener with a pool of processing places actually is possible (though
tricky).  TCP ports can be passed among places, so a single listener
instance can direct multiple processing instances.  I don't know how
passing off the port would interact with web-server response handling (I
would think the listener place could just shut down / abandon the port
and leave response to the process place but I have never actually tried
that).
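
A minimal sketch of that arrangement (assuming connected TCP ports - not
the listener - are legal place messages, and that the accepting side
should drop its copies after handing them off):

#lang racket/base

(require racket/place
         racket/tcp)

;; One worker place; a real pool would start one per core and rotate.
(define (start-worker)
  (place ch
    (let loop ()
      (define conn (place-channel-get ch))  ; a (cons in out) pair
      ;; ... hand (car conn) and (cdr conn) to the request handler ...
      (close-output-port (cdr conn))
      (close-input-port (car conn))
      (loop))))

(module+ main
  (define worker (start-worker))
  (define listener (tcp-listen 8080 512 #t))
  (let loop ()
    (define-values (in out) (tcp-accept listener))
    (place-channel-put worker (cons in out))
    ;; sending duplicates the ports; abandon our copies without
    ;; shutting down the connection
    (tcp-abandon-port in)
    (tcp-abandon-port out)
    (loop)))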

Dynamic (thread in process) places seem to have issues when there are
many instances, so for lots of cores it is better to use distributed
(parallel process) places.  Paulo Matos's  Loci  package makes using
process places much easier [and it also works on Windows if that matters].
https://pkgs.racket-lang.org/package/loci


> [1]: https://github.com/TechEmpower/FrameworkBenchmarks/blob/988f052c8170da661c49dd51d1f33d500a871031/frameworks/Racket/racket/scripts/run#L15-L19

George

Bogdan Popa

Jun 9, 2020, 7:59:10 AM
to George Neuner, racket...@googlegroups.com

George Neuner writes:

> Multiple Racket applications *should* all be able to listen on the
> same port without having been spawned from the same ancestor process.
> If that isn't working now, something has gotten hosed.

I don't know whether this used to work in the past or not, but currently
only `SO_REUSEADDR' is set on TCP sockets:

https://github.com/racket/racket/blob/60bf8f970e97caae391bfe919b78c370b2d01bdd/racket/src/rktio/rktio_network.c#L1427

I think we'd need to also set `SO_REUSEPORT', which is not available on
all platforms, to support multiple processes listening on the same port
without reusing file descriptors. Running a second instance of this
program:

#lang racket/base

(require racket/tcp)

(tcp-listen 9999 512 #t)
(sync/enable-break never-evt)

Currently fails with:

tcp-listen: listen failed
port number: 9999
system error: Address already in use; errno=48
context...:
...

Greg Hendershott pointed out to me on Slack a while back that the
`ffi/unsafe/port' module could be used to bind sockets with custom
options using the FFI and then convert them to ports, but I have yet to
try that.
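
A rough sketch of that approach (untested; the option constants below are
the Linux values, and both they and the sockaddr layout differ across
platforms):

#lang racket/base

(require ffi/unsafe
         ffi/unsafe/define)

(define-ffi-definer define-libc (ffi-lib #f))

(define AF_INET      2)
(define SOCK_STREAM  1)
(define SOL_SOCKET   1)   ; 0xffff on the BSDs
(define SO_REUSEPORT 15)  ; 0x0200 on the BSDs

(define-libc socket     (_fun _int _int _int -> _int))
(define-libc setsockopt (_fun _int _int _int (_ptr i _int) _uint -> _int))
(define-libc bind       (_fun _int _bytes _uint -> _int))
(define-libc listen     (_fun _int _int -> _int))

;; struct sockaddr_in for 0.0.0.0:<port>: 2-byte family (host order),
;; 2-byte port (network order), then INADDR_ANY plus sin_zero padding.
(define (sockaddr-in port)
  (bytes-append (integer->integer-bytes AF_INET 2 #f)
                (integer->integer-bytes port 2 #f #t)
                (make-bytes 12 0)))

(define fd (socket AF_INET SOCK_STREAM 0))
(setsockopt fd SOL_SOCKET SO_REUSEPORT 1 (ctype-sizeof _int))
(define addr (sockaddr-in 9999))
(bind fd addr (bytes-length addr))
(listen fd 512)
;; accept(2)ed descriptors could then be wrapped as Racket ports with
;; ffi/unsafe/port's unsafe-file-descriptor->port.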

> I'm not sure what you mean by "sharing" a listener, but using a
> single listener with a pool of processing places actually is possible
> (though tricky).

I mean passing a `tcp-listener?' around between places so that each
place can call `tcp-accept' on it. Something like this:

#lang racket/base

(require racket/place
         racket/tcp)

(define ch
  (place ch
    (define listener (place-channel-get ch))
    (tcp-accept listener)
    (place-channel-put ch 'ok)))

(module+ main
  (define listener
    (tcp-listen 9999 512 #t))
  (place-channel-put ch listener)
  (place-channel-get ch))

Currently, this fails with:

place-channel-put: contract violation
expected: place-message-allowed?
given: #<tcp-listener>
context...:
...

My understanding is that there's nothing preventing this from working
apart from the fact that no one's yet added support for this in the
rktio layer. As you mentioned, though, even if this was supported we
might run into other limitations when running many places at once.

Jon Zeppieri

Jun 9, 2020, 8:11:33 AM
to Bogdan Popa, George Neuner, racket users list
On Tue, Jun 9, 2020 at 7:59 AM Bogdan Popa <bog...@defn.io> wrote:
>
> I think we'd need to also set `SO_REUSEPORT', which is not available on
> all platforms, to support multiple processes listening on the same port
> without reusing file descriptors.

And even where it is available, it doesn't work the same way. The
Linux version can be used to load balance accept()s across processes,
but the BSD version (also in OS X) cannot. (FreeBSD apparently has a
variant, SO_REUSEPORT_LB, that behaves like the Linux version of
SO_REUSEPORT.)

George Neuner

Jun 9, 2020, 10:40:22 AM
to racket users

On 6/9/2020 7:59 AM, Bogdan Popa wrote:
> George Neuner writes:
>
> > Multiple Racket applications *should* all be able to listen on the
> > same port without having been spawned from the same ancestor process.
> > If that isn't working now, something has gotten hosed.
>
> I don't know whether this used to work in the past or not, but currently
> only `SO_REUSEADDR' is set on TCP sockets:
>
> https://github.com/racket/racket/blob/60bf8f970e97caae391bfe919b78c370b2d01bdd/racket/src/rktio/rktio_network.c#L1427
>
> I think we'd need to also set `SO_REUSEPORT', which is not available on
> all platforms, to support multiple processes listening on the same port
> without reusing file descriptors.

It has been discussed before, but perhaps now, with RacketCS imminent,
the socket functions should be changed / expanded to allow changing
options without resorting to FFI.

AFAICS, the functionality of SO_REUSEPORT (if not the verbatim option)
is available on all supported platforms.  Granted it is (relatively) a
recent addition to both Linux and Windows.

Android doesn't support it, but AFAIK, Android is not considered a
supported system.

Only serialized[1] functions can be passed between places ... and the
receiving place still has to have imported the libraries (if any)
necessary to understand it.  I don't know if a listener even can be
serialized, though ... AFAIK a serializable function has to be compiled
that way.

George

[1]  https://docs.racket-lang.org/web-server-internal/closure.html

George Neuner

Jun 9, 2020, 10:56:30 AM
to racket users list
And Windows has 2 options that mean essentially the same thing:
SO_REUSE_UNICASTPORT  and  SO_PORT_SCALABILITY.

SO_PORT_SCALABILITY  was introduced in Windows 7 / Server 2008.
SO_REUSE_UNICASTPORT  was introduced in Windows 10.

Whichever port-reuse option is supported by the platform, it normally is
set automagically whenever SO_REUSEADDR is specified. Only functions
that require an explicit bind need it to be set manually.



It's clear that enabling the functionality would have to be studied
carefully.

George

Bogdan Popa

Feb 10, 2021, 4:58:24 AM
to hashim muqtadir, racket...@googlegroups.com
Round 20 was recently published and Racket's score improved, as expected:

https://www.techempower.com/benchmarks/#section=data-r20&hw=ph&test=composite

We're now beating many of the popular frameworks in the composite scores
and I think there are still plenty of improvements that could be made.