Today I learned of LLVM, the Low Level Virtual Machine framework. It is
*very* impressive: <http://llvm.cs.uiuc.edu/>
It's a BSD-style licensed ahead-of-time, just-in-time, highly optimising
compiler with a very attractive bytecode format that doesn't force a
particular object system upon bytecode generators. In addition, from the paper:
<http://llvm.cs.uiuc.edu/pubs/2004-01-30-CGO-LLVM.html>
LLVM defines a common, low-level code representation in Static Single
Assignment (SSA) form, with several novel features: a simple,
language-independent type-system that exposes the primitives commonly
used to implement high-level language features; an instruction for
typed address arithmetic; and a simple mechanism that can be used to
implement the exception handling features of high-level languages (and
setjmp/longjmp in C) uniformly and efficiently.
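To give a feel for the "typed address arithmetic" point in plain C terms (this
is just my own illustration of the idea behind LLVM's getelementptr
instruction, not LLVM syntax):

#include <stdio.h>
#include <stddef.h>

struct pair { int tag; double value; };

int main(void) {
    struct pair p[4];

    /* What a front end wants to express, directed by the types: */
    double *a = &p[2].value;

    /* What that means as raw byte arithmetic on a byte-addressed machine: */
    double *b = (double *)((char *)p
                           + 2 * sizeof(struct pair)
                           + offsetof(struct pair, value));

    printf("%s\n", a == b ? "same address" : "different addresses");
    return 0;
}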
Furthermore, it has GNU C and C++ to LLVM translators based upon GCC (this
part is GPL'd, of course). This is handy for bootstrapping a language and its
library support. However, you can completely bypass GCC by generating LLVM
bytecode that is natively or JIT compiled (and we're talking state-of-the-art
optimisation, with multiple register allocation and spiller algorithms,
peephole optimisation, frame pointer elimination, etc.). As a consequence,
from the FAQ <http://llvm.cs.uiuc.edu/docs/FAQ.html#cfe_code> ;-)
Where did all of my code go??
If you are using the LLVM demo page, you may often wonder what happened
to all of the code that you typed in. Remember that the demo script is
running the code through the LLVM optimizers, so if your code doesn't
actually do anything useful, it might all be deleted.
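For example (my own toy, not from the FAQ), the body of busywork below is fair
game for deletion, because nothing observable depends on it:

static int busywork(void) {
    int total = 0;
    for (int i = 0; i < 1000; i++)
        total += i * i;          /* no side effects */
    return total;
}

int main(void) {
    busywork();                  /* result ignored */
    return 0;                    /* optimisers may reduce main to just this */
}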
There's also an LLVM->LLVM optimiser which "takes LLVM bytecode as input,
runs the specified optimizations on it, and then outputs the optimized
LLVM bytecode" (this may assist naive bytecode generators). LLVM includes
type analysis that can prove more optimisations safe than some C compilers
can. Tests and benchmarks are run daily:
<http://llvm.cs.uiuc.edu/testresults/X86/#Program>
If you compare the GCC and CBE (LLVM's C backend) columns you will see that
its native compilation occasionally does better than GCC -O2 (and its JIT
results are not too shabby).
Lisp developers have wanted access to GCC's backend for years but
political issues have made this technically and legally infeasible:
<http://gcc.gnu.org/ml/gcc/2000-01/msg00572.html>.
In addition to X86 there are SparcV9 and PowerPC backends, plus a plain
interpreter (largely for platforms the JIT does not yet support). There is
also support for hooking in precise garbage collectors:
<http://llvm.cs.uiuc.edu/docs/LangRef.html#int_gc>
<http://llvm.cs.uiuc.edu/docs/GarbageCollection.html>
(A simple copying garbage collector example is available)
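(The URLs above describe LLVM's own GC intrinsics. As a rough sketch of the
general idea they serve, namely letting a precise collector find every live
reference, here is a hand-rolled "shadow stack" of roots in C. All names below
are invented for illustration; this is not LLVM's mechanism.)

#include <stdio.h>

#define MAX_ROOTS 1024

static void **gc_roots[MAX_ROOTS];   /* addresses of live reference variables */
static int    gc_nroots;

/* A front end would emit these around every local that holds a reference. */
#define GC_PUSH(var)  (gc_roots[gc_nroots++] = &(var))
#define GC_POP(n)     (gc_nroots -= (n))

static void gc_collect(void) {
    /* A precise collector can now enumerate (and, if it moves objects,
       update) every live reference by walking the registered roots. */
    for (int i = 0; i < gc_nroots; i++)
        printf("root %d holds %p\n", i, *gc_roots[i]);
}

static int heap_cell = 42;           /* stand-in for a heap-allocated object */

int main(void) {
    void *p = &heap_cell;            /* a live reference the collector must see */
    GC_PUSH(p);
    gc_collect();
    GC_POP(1);
    return 0;
}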
An instructive Scheme compiler has been written by Tobias Nurmiranta
<http://www.ida.liu.se/~tobnu/scheme2llvm/> in around a thousand lines of
code. As a batch compiler it converts a subset of Scheme to LLVM assembly,
which can be piped to llvm-as for bytecode generation; the bytecode can then
be run by lli (the JIT or plain interpreter) or compiled ahead of time by llc.
Note that the LLVM platform also supports in-memory code generation that
bypasses the file system. At this stage it appears the only JIT code
generation API is C++.
I hope there will soon be instructive solutions to incremental run time
compilation (or JIT compilation) without having to write a REPL in C++.
This is the most relevant discussion of JIT compilation from the
developers' mailing list:
<http://mail.cs.uiuc.edu/pipermail/llvmdev/2004-November/002566.html>
I may not be able to participate in any discussion this message generates.
I just want to pass this info on to the Lisp/Scheme community.
Regards,
Adam
Adam> Today I learned of LLVM, the Low Level Virtual Machine framework. It is
Adam> *very* impressive: <http://llvm.cs.uiuc.edu/>
Without an efficient representation of first-class continuations, it's
too weak for Scheme. I also don't see direct support for proper tail
recursion, which would also be needed to make things work well.
--
Cheers =8-} Mike
Peace, international understanding, and all that blah blah
Shriram> I dunno; I think it's the compiler's job to turn tail calls
Shriram> into iterative forms; I don't mind not having the VM do it.
The thing about continuations, though, is that there are two
basic methods with different performance tradeoffs for
implementing continuations (stack copying vs. heap-allocating
call frames for GC to reap).
If the LLVM is going to explicitly model continuations, it
needs to be possible to pick which kind.
Bear
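(A rough illustration of the second method Bear mentions: if call frames live
on the heap, a captured continuation is just a retained pointer, and nothing
stops it being resumed more than once. A toy C sketch, not LLVM-specific, with
invented names:)

#include <stdio.h>
#include <stdlib.h>

/* A continuation is "the rest of the computation": code plus saved data. */
typedef struct cont {
    void (*code)(struct cont *self, int value);
    struct cont *next;   /* the caller's continuation, also heap-allocated */
    int saved;           /* whatever this frame needs to remember */
} cont;

static cont *make_cont(void (*code)(cont *, int), cont *next, int saved) {
    cont *k = malloc(sizeof *k);
    k->code = code; k->next = next; k->saved = saved;
    return k;            /* a real system would let the GC reap these */
}

/* Final continuation: print the result. */
static void k_print(cont *self, int value) {
    (void)self;
    printf("result: %d\n", value);
}

/* "Add my saved number to the value, then continue with my caller." */
static void k_add(cont *self, int value) {
    self->next->code(self->next, value + self->saved);
}

int main(void) {
    cont *done  = make_cont(k_print, NULL, 0);
    cont *add10 = make_cont(k_add, done, 10);

    /* Because the frame is on the heap, nothing stops us resuming the
       same continuation twice; a popped stack frame could not do this. */
    add10->code(add10, 1);   /* prints 11 */
    add10->code(add10, 2);   /* prints 12 */
    return 0;
}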
Uh, how do you turn tail calls into iterative forms in the presence of
higher-order functions (i.e. when the function to call isn't
statically known)?
Lauri
If you can call such a function then you can also jump (or goto) to
the same place.
--
A. Kanawati
NO.anto...@comcast.net
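(In native code that jump is simply an indirect jmp. A compiler that goes
through portable C cannot say that directly, so a common workaround is a
trampoline: each "tail call" returns the next thing to run and a small driver
loop does the jumping. A toy sketch with invented names, running in constant
stack space:)

#include <stdio.h>

typedef struct thunk { struct thunk *(*fn)(long n); long n; } thunk;

static thunk next;                    /* the pending tail call */
static thunk *is_even(long n);
static thunk *is_odd(long n);

static thunk *is_even(long n) {       /* mutual tail recursion */
    if (n == 0) { printf("even\n"); return NULL; }
    next.fn = is_odd; next.n = n - 1;
    return &next;
}

static thunk *is_odd(long n) {
    if (n == 0) { printf("odd\n"); return NULL; }
    next.fn = is_even; next.n = n - 1;
    return &next;
}

int main(void) {
    thunk *t = &next;
    next.fn = is_even;
    next.n = 1000001;                 /* deep enough to smash a real stack */
    while (t)                         /* the trampoline: one loop, no stack growth */
        t = t->fn(t->n);
    return 0;
}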
> Adam> Today I learned of LLVM, the Low Level Virtual Machine framework. It is
> Adam> *very* impressive: <http://llvm.cs.uiuc.edu/>
>
> Without an efficient representation of first-class continuations, it's
> too weak for Scheme. I also don't see direct support for proper tail
> recursion, which would also be needed to make things work well.
The tail call documentation is unofficial and currently available from the
primary author of LLVM, Chris Lattner:
<http://nondot.org/sabre/LLVMNotes/>
<http://nondot.org/sabre/LLVMNotes/GuaranteedEfficientTailCalls.txt>
...
Efficient support for arbitrary tail calls is absolutely essential for a
broad class of languages, particularly functional languages. In
particular, the Scheme language requires, as part of its language
definition, that all implementations implement arbitrary tail calls
efficiently (even indirect calls!).
...
LLVM currently supports an optimization named 'tail recursion
elimination'. This optimization converts self recursive functions that
have simple tail calls into explicit loops. There is a large amount of
overlap between tail recursion elimination and proper tail call
support, but proper tail call support is more general: it applies to
arbitrary tail calls to arbitrary (even unknown) functions.
...
... languages that do not require all functions to be compatible with C
calling conventions should default to marking their functions as CC#0
explicitly: this will permit guaranteed predictable tail calls in all
non-varargs cases (which are all of the cases possible).
Regards,
Adam
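(To make the quoted distinction concrete, here is a plain C illustration of my
own, not LLVM output. The first function is the kind of self tail call that
tail recursion elimination already handles, shown next to its loop form; the
last is the general case, a tail call through a function pointer, where no
loop rewrite is possible and only proper tail call support helps.)

#include <stdio.h>

/* Self tail call: tail recursion elimination can rewrite this ... */
static long sum_to(long n, long acc) {
    if (n == 0) return acc;
    return sum_to(n - 1, acc + n);
}

/* ... into an explicit loop that reuses one stack frame: */
static long sum_to_loop(long n, long acc) {
    while (n != 0) { acc += n; n -= 1; }
    return acc;
}

/* The general case: a tail call through a function pointer. Only proper
   tail call support guarantees this runs in constant stack space. */
static long apply_tail(long (*f)(long, long), long n, long acc) {
    return f(n, acc);
}

int main(void) {
    printf("%ld %ld %ld\n",
           sum_to(10, 0), sum_to_loop(10, 0), apply_tail(sum_to, 10, 0));
    return 0;
}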
>[Michael Sperber wrote:]
> > Without an efficient representation of first-class continuations, it's
> > too weak for Scheme. I also don't see direct support for proper tail
> > recursion, which would also be needed to make things work well.
>
> The tail call documentation is unofficial and currently available from the
> primary author of LLVM, Chris Lattner:
> <http://nondot.org/sabre/LLVMNotes/>
> <http://nondot.org/sabre/LLVMNotes/GuaranteedEfficientTailCalls.txt>
...
> ... languages that do not require all functions to be compatible with C
> calling conventions should default to marking their functions as CC#0
> explicitly: this will permit guaranteed predictable tail calls in all
> non-varargs cases (which are all of the cases possible).
The latter quote is under a section "Proposed implementation" and the
document is dated Sep 5, 2004. IIUC, this implies that a compiler which
marks functions appropriately now will benefit from "guaranteed predictable
tail calls" when that feature is supported by LLVM.
There's another document, also dated Sep 5,
<http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt>, which
describes a way to manage stack frames in LLVM to support "garbage collected
closures". The technique seems to be to convert code to CPS, and allocate
stack frames on the heap (standard stuff). But the document ends with "If
someone is interested in support for this, guaranteed efficient tail call
support and custom calling conventions are the two features that need be
added to LLVM." Sounds like fun!
Not surprisingly, the sample Scheme->LLVM compiler doesn't support call/cc
or tail calls.
It might be possible to use Cheney on the MTA with LLVM, although I wouldn't
be surprised to find that LLVM assumptions about how the stack is used, or
the way gc should be implemented, might get in the way of this.
Anton
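(For the curious, the control-flow skeleton of Cheney on the MTA looks roughly
like this in C: calls never return, and when the C stack gets deep enough
everything is abandoned with a longjmp. This sketch leaves out the actual
collection step, where live closures allocated on the C stack would be copied
to the heap, and the stack-depth check uses the usual non-portable address
comparison. All names are invented.)

#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

#define STACK_SLICE (64 * 1024)   /* how much C stack to burn before resetting */

static jmp_buf restart;
static char *stack_base;
static void (*pending)(long);     /* continuation to resume after a reset */
static long pending_arg;

/* Every "call" goes through here. If the stack slice is exhausted, save the
   continuation and longjmp back to main; a full implementation would copy
   live stack-allocated closures to the heap at this point (the minor GC). */
static void maybe_reset(void (*k)(long), long arg) {
    char here;
    long depth = stack_base - &here;          /* non-portable, but customary */
    if (depth < 0) depth = -depth;
    if (depth > STACK_SLICE) {
        pending = k;
        pending_arg = arg;
        longjmp(restart, 1);
    }
    k(arg);                                   /* plenty of room: just call on */
}

static void done(long n) {
    printf("counted to %ld\n", n);
    exit(0);                                  /* nothing ever returns */
}

/* A CPS-converted loop: every call is a tail call and none of them return. */
static void count(long n) {
    if (n >= 5000000L) maybe_reset(done, n);
    else               maybe_reset(count, n + 1);
}

int main(void) {
    char base;
    stack_base = &base;
    pending = count;
    pending_arg = 0;
    setjmp(restart);                          /* we land back here on each reset */
    pending(pending_arg);
    return 0;                                 /* not reached */
}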
Tail calls aren't limited to calling the same function. Once the
function reaches its tail call, it no longer needs its stack space. To
implement tail calls, the compiler just needs to clean up the stack space
used for the current call and overwrite it with the next function call.
Let's say your function has two integer parameters and 4 integer local
variables in a C-like language:
int my_function(int a, int b)
{
    int c, d, e, f;
    /* do stuff here */
    return another_calculation(c, d);
}
Here, another_calculation is in tail position. So, at the time of the
call, the stack would look like this:
PARAM2 (b)
PARAM1 (a)
RETURN ADDRESS
OLD EBP
LOCAL1 (c)
LOCAL2 (d)
LOCAL3 (e)
LOCAL4 (f)
So, the compiler needs to simply overwrite this stack frame with one
that looks like this:
PARAM2 (d)
PARAM1 (c)
RETURN ADDRESS FROM PREVIOUS STACK
Then, it just needs to do a jmp (instead of a call) to
another_calculation. See, we copied the return address, so when
another_calculation returns it doesn't have to bounce back through this
function; it returns its value directly to whoever called my_function.
Jon
----
Learn to program using Linux assembly language
http://www.cafeshops.com/bartlettpublish.8640017