Racket slower than Chez Scheme on interpreter benchmark, potential low hanging fruit?

philngu...@gmail.com

unread,

Mar 1, 2021, 2:39:29 AM3/1/21

to Racket Users

There’s this benchmark on BF interpreter where the Racket and Chez Scheme implementations are very similar, but Chez Scheme is much faster than Racket 8.0 at interpreting bench.b (3s vs 8s) and mandel.b (40s vs 136s).

There’s the “Racket (Syntax Object)” variant that directly parses BF’s syntax into Racket syntax object, which is faster (3.7s for bench, 82s for mandel), but still significantly behind Chez Scheme’s naive interpreter.

Profiling doesn’t give very illuminating results, saying most of the cost is from interpreting BF’s loop instruction.

Given that Racket is on Chez, could this benchmark reveal some low hanging fruit for improving Racket’s performance?

Sam Tobin-Hochstadt

unread,

Mar 1, 2021, 11:54:04 AM3/1/21

to philngu...@gmail.com, Racket Users

When Chez is faster than Racket CS, the usual culprits are either:
- mutable pairs
- very large code size that causes Racket CS to interpret the outer module

However, neither of those seem to be happening here.

Sam

> --
> You received this message because you are subscribed to the Google Groups "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to racket-users...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/racket-users/83b2819d-8295-4769-944d-fa0c155976dan%40googlegroups.com.

sleepnova

unread,

Mar 2, 2021, 3:37:29 AM3/2/21

to philngu...@gmail.com, Racket Users

I previously came out with a version that converts a BF program directly into a module, which has received some optimization contributions from Matthew Flatt.
Got some pretty good results.

Execute bench.b in cpu time: 1542 ms, (2.2s include compile time).

And mandel.b in cpu time: 3851 ms, (7s include compile time). But you have to manually "export PLT_CS_COMPILE_LIMIT=100000" environment variable in order to get compiled and executed properly.

https://gist.github.com/sleepnova/92d7b2a7f077e7de76da4ce31f60335e

philngu...@gmail.com <philngu...@gmail.com> 於 2021年3月1日週一下午3:39寫道：

--

You received this message because you are subscribed to the Google Groups "Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to racket-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/racket-users/83b2819d-8295-4769-944d-fa0c155976dan%40googlegroups.com.

--

- sleepnova

呼叫小黃創辦人 & CEO

Dexter Lagan

unread,

Mar 3, 2021, 1:46:02 PM3/3/21

to Racket Users

For what it's worth, I ran my own benchmark on Racket 8.0 and Racket 8.0.10 (current), and current is between 50 and 100% faster for certain operations. There must have been some optimizations done recently to current.

Dex

Sam Tobin-Hochstadt

unread,

Mar 3, 2021, 2:12:30 PM3/3/21

to philngu...@gmail.com, Racket Users

First, there's no longer a difference because yjqww6 just had a PR
merged that improves the Racket performance.

The performance difference that was there was mostly because the Chez
code was run with `--optimize-level 3` which turns off safety. If that
was changed to `--optimize-level 2` the timing became much slower.

Sam

On Mon, Mar 1, 2021 at 2:39 AM philngu...@gmail.com
<philngu...@gmail.com> wrote:
>

philngu...@gmail.com

unread,

Mar 4, 2021, 3:22:57 AM3/4/21

to Racket Users

Thanks for the tip about PLT_CS_COMPILE_LIMIT! I submitted a revision to the syntax object variant that incorporated sleepnova's and yjqww6's improvements.

Also, I never knew about `(#%declare #:unsafe)` until I saw yjqww6's pull request. It makes a noticeable difference. One unsatisfying thing is that in one place, if I replace the 4 separate define clauses with just `(match-define (cons (op o val) rst) parsed)`, the benchmarks are more than twice slower.

Sam Tobin-Hochstadt

unread,

Mar 4, 2021, 12:40:22 PM3/4/21

to philngu...@gmail.com, Racket Users

I think there are two reasons that code gets slower.

1. The `match-define` includes pair and struct checks, which are
omitted for plain accessor uses because of the unsafe declaration.
2. That use of `match` expands to `define-values` which ends up as a
`call-with-values` and a `case-lambda` at the chez layer and is not
removed.

`match` could recognize that it's being compiled with unsafe mode and
omit these checks, although it's not that straightforward. Also
schemify (or Chez) could do more to eliminate the use of multiple
values, although that's hard without eliminating the failure cases.

Sam

> To view this discussion on the web visit https://groups.google.com/d/msgid/racket-users/a5e77286-68b8-481a-8dea-0f547c5ce968n%40googlegroups.com.

philngu...@gmail.com

unread,

Mar 4, 2021, 3:29:13 PM3/4/21

to Racket Users

Oh I see. So one problem is here that `match-define` expands to `match` with an implicit error case, which at the low level, isn't distinguished from a user-written second case, and the tag check can't just be eliminated.

Sorawee Porncharoenwase

unread,

Mar 7, 2021, 9:51:45 AM3/7/21

to philngu...@gmail.com, Racket Users

With the recent improvements by Phil, the rank of the syntax object variant moves up from 26th to the second (what?!?), losing only to c++ / g++. Moreover, it's significantly faster than the third place.

To view this discussion on the web visit https://groups.google.com/d/msgid/racket-users/f771a13a-9fc4-4b71-9e47-3a83eb5290e7n%40googlegroups.com.

Reply all

Reply to author

Forward