Does a Racket interpreter exist?


zeRusski

Jul 26, 2020, 1:36:23 PM
to Racket Users
Hi all. I wonder if such a thing exists or is even possible? The question is triggered by the trade-off between "compile slowly now to run fast later" vs "start fast". Racket, like other modern(ish) Scheme derivatives, appears to have settled in the former camp. Is there anything in the language that would make interpretation infeasible (semantics?) or unreasonably slow (expansion?)? Chez sports an interpreter with some limitations. I think Gambit ships one, although I'm not sure how performant it is. IIRC Guile recently got something. What about Racket?

Thanks

Sam Tobin-Hochstadt

Jul 26, 2020, 4:17:43 PM
to zeRusski, Racket Users
Hi,

Racket BC (the non-Chez version) does use an interpreter. The pipeline
in Racket BC is

source code => expanded code => compiled bytecode => interpreter

or

source code => expanded code => compiled bytecode => JIT compiler
=> machine code

You can turn off the JIT compiler with the `-j` flag, meaning that it
always uses the interpreter. There is also an interpreter for Racket
CS, but that is a little harder to control manually and less
efficient, so I'll ignore it for these purposes.
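
(A quick way to check which of those pipelines you're on from inside Racket; this is only a sketch, and the exact values printed depend on your build:)

    #lang racket/base
    ;; Rough sketch: report which Racket VM is running and whether the
    ;; BC JIT is enabled. `system-type` and `eval-jit-enabled` are standard,
    ;; but treat the printed values as illustrative, not exhaustive.
    (printf "VM: ~a\n" (system-type 'vm))            ; 'racket for BC, 'chez-scheme for CS
    (printf "JIT enabled: ~a\n" (eval-jit-enabled))  ; #f when BC is started with -j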

The compilation time for Racket is quite small, actually, and
typically pays for itself. The "start slow" effect that you see is
mostly 3 things, in varying proportions in different settings:

1. Time to run macro expansion.
2. Time to do IO to load the standard library (such as `racket/base`)
3. Time to execute the modules in the standard library

For example, if we look at the time to run the command `racket -l
racket/base`, which just loads `racket/base` and exits, provided that
you've fully compiled all of `racket/base`, that takes no time under
bullet 1. But it's still somewhat slow, about 70 ms on my machine.
Python, executing a single print command, is an order of magnitude
faster. That's because Python (a) loads many fewer Python source files
on startup (`racket -l racket/base` looks at 234 files that are
actually there with `.rkt` in the name, Python looks at 96) and (b) I
believe the Python module loading semantics allows less work at load
time. Additionally, when Racket starts up, it executes the source of
the expander, which is stored as bytecode in the binary.

All these things add up to slower start time. For user programs, if
you time just expansion of the program (with `raco expand`) and also
compiling the program (with `raco make`) you'll see that most of the
time is expansion time. For the JIT compiler, turning it off
_increases_ startup time because JIT compilation is enough of a win
on, for example, the code in the macro expander.
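
If you want to see that split for a single module without going through `raco`, here is a rough sketch that times expansion and compilation separately ("my-program.rkt" is a placeholder path; the rest is plain `racket/base`):

    #lang racket/base
    ;; Rough sketch: read a module's source, then time expansion and
    ;; compilation separately. "my-program.rkt" is a hypothetical file.
    (define path "my-program.rkt")
    (define stx
      (parameterize ([read-accept-reader #t])    ; allow the #lang line
        (with-input-from-file path
          (lambda () (read-syntax path)))))
    (parameterize ([current-namespace (make-base-namespace)])
      (let ([expanded (time (expand stx))])      ; usually the dominant cost
        (void (time (compile expanded)))))       ; typically much smaller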

To have a "start-fast" version of Racket, we would need to pursue some
of the following directions:
1. ways of loading code using `mmap()` instead of doing IO and
parsing bytecode or compiled machine code
2. ways to indicate that certain modules don't need to do any real
execution, perhaps because they contain only definitions
3. ways to flatten all of a Racket program into something that can be
compiled and loaded as a single file, avoiding IO (this accomplishes 2
as well)
4. ways to make the macro expander substantially faster

Sam



zeRusski

Jul 27, 2020, 12:35:15 PM
to Racket Users
Thank you for this fantastic reply Sam! I now think I had a very naive model of an "interpreter" when I asked the question. My CS degree from the nowhere university has it that language interpreters walk the tree and, you know, "execute" it, be it in the host language or by generating native code. I feel a bit stupid now. Technically you're absolutely right: there is an interpreter for the bytecode (or whatever), so duh. But there are also a bunch of compilation steps in between. I am now completely lost as to what constitutes a compiler and what makes an interpreter. I always thought of interpreters as something I encountered in PLAI. I remember reading some old paper about "fast interpreters"; all of them implemented a virtual machine they "compiled" to, and I was just as lost then (how is that not a compiler?) as I am now.

Sam Tobin-Hochstadt

Jul 27, 2020, 12:54:49 PM
to zeRusski, Racket Users
A few thoughts on interpreters vs compilers:

- somewhere, there has to be an interpreter -- the x86 chip in my
laptop is interpreting the x86 code that Racket generates.
- there could certainly be a more direct AST-based interpreter for
(fully-expanded) Racket. My work on Pycket involved writing such an
interpreter, for example.

The best way to distinguish compilers from interpreters is that a
compiler takes a program and produces another program, whereas an
interpreter takes a program (along with some input) and produces an
answer.
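
As a toy illustration of that distinction (a sketch only, nothing to do with how Racket itself is implemented), here are an interpreter and a "compiler" for a tiny language of numbers and additions; the compiler's output program happens to be a Racket closure:

    #lang racket/base
    ;; Toy sketch only -- not Racket's actual pipeline.
    ;; The little language: numbers and (+ e1 e2).

    ;; interpreter: program -> answer
    (define (interp e)
      (cond [(number? e) e]
            [(eq? (car e) '+) (+ (interp (cadr e)) (interp (caddr e)))]))

    ;; "compiler": program -> another program (here, a Racket closure)
    (define (comp e)
      (cond [(number? e) (lambda () e)]
            [(eq? (car e) '+)
             (let ([l (comp (cadr e))] [r (comp (caddr e))])
               (lambda () (+ (l) (r))))]))

    (interp '(+ 1 (+ 2 3)))             ; => 6, re-examines the tree every time
    (define run (comp '(+ 1 (+ 2 3))))  ; tree examined once, up front
    (run)                               ; => 6, no source left to look at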

Sam


zeRusski

Jul 27, 2020, 2:07:43 PM
to Racket Users
> The best way to distinguish compilers from interpreters is that a
> compiler takes a program and produces another program, whereas an
> interpreter takes a program (along with some input) and produces an
> answer.

Doesn't this trivialize the difference a bit too much? Does it really come down to when you choose to compute? If I persist intermediate state, I am a compiler; if I do the exact same thing (run off the same "compiler" codebase) but ask for inputs and compute right away, I am an interpreter.

I scratched my head a bit and came up with the following: if there is a one-to-one correspondence between target and host language "blocks" (there may be multiple such pairs until you get to machine code), then it is an interpreter. If you do nasty shenanigans in the generated code in the host (semantics-altering optimisation passes and whatnot), then it is a compiler. The latter ought to affect debugging and even error reporting quite drastically. This is almost certainly naive and amateur of me. I'm desperately trying to understand why I have the impression that there ought to be a non-trivial difference between the two. That's been my impression from the literature.

I think I had this paper in mind: https://www.jilp.org/vol5/v5paper12.pdf
Oh wait, p. 4: "we do not have a precise definition for efficient interpreter" ... oh good. Although they do mention some justification for "compiling" to a VM in the following paragraphs, which boils down to "avoid the overhead of re-parsing and re-interpreting intermediate representation". I vaguely recall that Lisp in Small Pieces switched to a VM when "fast interpretation" was introduced.

I may have confused myself. Badly :(

I guess I would call Tcl and Picolisp interpreters, and Racket and most Scheme implementations of note not interpreters at all.

Hendrik Boom

Jul 28, 2020, 7:17:56 AM
to Racket Users


There is an important distinction here, and it has to do with resource
consumption when iterated.

This is most easily explained in the case(s) of a compiler and
interpreter that are written in the languages they implement.

Clearly you can compile such a compiler using itself, and similarly,
interpret such an interpreter, and you can do that n levels deep should
you want to.

But when going n levels deep, the total execution time with a compiler
is linear in n, and with an interpreter it's exponential.

That makes interpreting interpreters impractical when n gets large (even
with n around 3 or 4); whereas compiling compilers can be done even for
larger n.

This is important when the nested implementations are not identical.
This might involve interpreting different languages, one in the other.
But it might, very practically, involve compiling different versions of
the same language in a process of incremental development. The Algol 68C
compiler, for example, went through hundreds of versions, each
compiling the next one, and each providing better tools for the
implementer to use. That would have been practically impossible with a
self-interpreter.

By the way, one of their tests was to use the compiled compiler to
compile itself two levels deep and see if the two self-compiled object
codes were identical. If not, there was a problem.

Many systems are a mix of compiling and interpreting, such as many
Scheme implementations.

And ultimately, the actual machine hardware usually interprets machine
code.

Yes, there are ambiguous corner cases. Such as some of the Apple macs
that run previous hardware generations by compiling basic blocks of
old machine code one by one at run time. Or JIT compilation. Or running
Verilog code on a programmable gate array.

Some of these are situations where the self-implementation model just
doesn't really apply. Silicon is silicon, and is hardware, and isn't
interpreted. It just does what silicon does.

-- hendrik


zeRusski

Jul 29, 2020, 4:56:05 AM
to Racket Users
This is a really cool piece of history! Thank you.

I'll admit I'm somewhat fuzzy here. It may be a bit too meta for me, or perhaps I don't quite understand what you're trying to say. Isn't interpreting n levels deep also linear in n? The only difference between the two approaches I see is that a compiler lets you persist the fruits of its labor so you don't have to start from scratch every time. Couldn't you accommodate this with an interpreter (with some effort)? Although at that point it becomes a compiler, I suppose. Definitely fuzzy here.

Hendrik Boom

Jul 29, 2020, 11:43:18 AM
to Racket Users

Answer 1.

Let's say, for example, that it takes 10 instructions executed in an
interpreter to decode, take decisions, and execute an interpreted
instruction. (In real life this varies a lot between interpreters and
between interpreted instructions, but let's keep the example simple.)
(And, also to keep things simple, let's say the interpreter is
interpreting the machine language of the machine it's running on --
not an unusual technique in a debugger.)

Then interpreting a program using an interpreter running on the hardware
would take ten times as long as executing the program directly on the
same hardware.

And likewise, interpreting the program in an interpreter being
interpreted by the interpreter running on the hardware would take another
ten times as long as just interpreting the program with an interpreter
running on the hardware.

Thus 10 x 10 = 100 times as slow; this is where the exponential comes in.
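
Or, as a crude formula (using the assumed factor of 10 per level):

    #lang racket/base
    ;; Crude model of a tower of n interpreters, each adding a 10x overhead.
    (define (tower-slowdown n) (expt 10 n))
    (tower-slowdown 1)  ; 10     -- one interpreter on the hardware
    (tower-slowdown 2)  ; 100    -- an interpreter interpreting an interpreter
    (tower-slowdown 4)  ; 10000  -- why n around 3 or 4 is already impractical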

> Only difference between the
> two approaches I see is that compiler lets you persist the fruits of its
> labor so you don't have to start from scratch every time. Couldn't you
> accommodate this with an interpreter (with some effort) although at this
> point it becomes a compiler I suppose. Definitely fuzzy here.

That is exactly the difference between a compiler and an interpreter.

Answer 2:

Time to muddy the situation.

There are mixed-style language implementations.

If you only execute each piece of code once, interpreters tend to be
faster. But if you are to execute code many times, it's faster to
compile. It takes time to compile, but you get it back by saving time
in later executions.

So what these mixed implementations do is interpret the code while
keeping track of how often each piece of code (such as a function) is
called.

When the count reaches a particular threshold, the implementation pauses
interpretation and compiles that code, with the compiled code used thereafter.

There is some time penalty over compilation, because you waste time
interpreting functions several times before you compile them.
This is offset by not having to compile code that is used only a few
times.

The time behaviour of this kind of system overall is more like a
compiler than an interpreter, because the code that is executed the most
does eventually get compiled and the rest gets interpreted only a limited
number of times.

However, if it were to execute a copy of its own code, it would have to
recompile it, unless it had a mechanism to check if the newly input
program is the same as one it has already compiled. This isn't
impossible, but is not usually done.
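
Here is a toy sketch of that counting mechanism in Racket; `interpret-body` and `compile-body` are hypothetical stand-ins, not a real JIT:

    #lang racket/base
    ;; Toy sketch: run a function via its "interpreted" form while counting
    ;; calls, and switch to a "compiled" form once it has become hot.
    (define (make-counting-fn interpret-body compile-body #:threshold [threshold 3])
      (define calls 0)
      (define compiled #f)
      (lambda (x)
        (cond
          [compiled (compiled x)]               ; fast path once compiled
          [else
           (set! calls (add1 calls))
           (when (= calls threshold)
             (set! compiled (compile-body)))    ; pay the compile cost once
           (interpret-body x)])))               ; slow path until then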

>
> > But when going n levels deep, the total execution time with a compiler
> > is linear in n, and with an interpreter it's exponential.

I use this criterion to tell whether a particular implementation is more
like an interpreter or a compiler.

> >
> > That makes interpreting interpreters impractical when n gets large (even
> > with n around 3 or 4); whereas compiling compilers can be done even for
> > larger n.
> >

Answer 3.

On modern machines, it is also important to keep memory demands low.
Accessing large amounts of memory regularly tends to push other data out
of cache, or even out of RAM onto disk.

Conserving storage is a common reason for using interpreted bytecode as
the target language for a compiler. The bytecode is usually smaller
than the machine language. If the bytecode interpreter is small enough,
this is a definite win. Bytecode was first used, as far as I know, on
machines with small memories -- about 64K RAM total or even less.

Byte-code can also be portable. You just need to write a (usually
small) bytecode interpreter for each new machine.

Of course it's still possible to, at run-time, compile the most-used
byte-coded functions into actual machine code to trade memory use for
execution time.

Answer 4: An example:

The language FORTH used a *word*-code interpreter instead of a
byte-code interpreter, each word being two bytes, and containing the
address of the interpreter's machine-code routine that implemented that
instruction.

This meant each word representing an instruction could be
executed in the interpreter by a hardware function call to an
indirect address. In fact, user-coded functions could be called the
same way -- each would be a sequence of addresses preceded by the
machine code that invokes the word-code interpreter.

Still very compact on a machine with two-byte addresses.

Utterly impractical on a modern machine with 64-bit addresses, where the
machine code for an operation can often be smaller than a machine
address.
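
For flavour, the same idea sketched in Racket, with closures standing in for the machine-code routines the words point at (a toy, not how FORTH actually lays this out in memory):

    #lang racket/base
    ;; Toy sketch of threaded "word code": a user-defined word is just a
    ;; sequence of references to routines, run by calling each in turn.
    (define stack '())
    (define (push! v) (set! stack (cons v stack)))
    (define (pop!)    (begin0 (car stack) (set! stack (cdr stack))))

    ;; primitive routines, standing in for the interpreter's machine-code words
    (define (dup) (push! (car stack)))
    (define (mul) (push! (* (pop!) (pop!))))

    ;; the word-code "interpreter": follow each reference and invoke it
    (define (run word) (for ([op (in-vector word)]) (op)))

    ;; a user-defined word: SQUARE = DUP *
    (define SQUARE (vector dup mul))

    (push! 7)
    (run SQUARE)
    (pop!)   ; => 49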

-- hendrik

I hope this set of answers clarifies the distinction between an
interpreter and compiler, how the distinction gets blurred in practice,
and what the criteria are for choosing between them.

-- hendrik


zeRusski

Jul 29, 2020, 1:19:31 PM
to Racket Users
> I hope this set of answers clarifies the distinction between an
> interpreter and compiler, how the distinction gets blurred in practice,
> and what the criteria are for choosing between them.

This was detailed, insightful, and truly helpful. I can't thank you enough for taking the time, Hendrik! I'm glad I pressed the matter.

Thank you

Arthur Nunes-Harwitt

Jul 30, 2020, 10:45:01 AM
to Hendrik Boom, Racket Users
Hi,

While you're enumerating these possibilities, I think it's worthwhile to
mention a technique related to the FORTH implementation technique: Marc
Feeley-style compilation (see "Using Closures For Code Generation" in
Computer Language, Vol. 12, No. 1, pages 47-66). This idea is also
mentioned in SICP.

-Arthur



==============================================================
Arthur Nunes-Harwitt
Computer Science Department, Rochester Institute of Technology
Room GOL-3509
585-475-4916
==============================================================

"I don't know what the language of the future will be
called, but it will look like LISP."


zeRusski

Jul 31, 2020, 4:16:34 AM
to Racket Users
Thank you, Arthur. Indeed, Lisp in Small Pieces cites this paper, among two others, in the chapter about fast interpreters, but sadly in French :)