speedups and roadmap (long)


Alex Shinn
Jul 14, 2011, 1:17:15 AM
to chibi-...@googlegroups.com
The latest tip gives about a 10-30% speedup on the
Gabriel benchmarks over the 0.4 release. This
is mostly low-hanging fruit, and there's still room for
improvement, but in keeping with Chibi's niche
I want to restrict optimizations to either simple
changes or dynamically loaded AST-rewriting passes
written in Scheme. The idea is that, after we have
support for pre-compiled modules, we can load as
many slow and complex optimizations as we want
just during batch compilation, and keep the core
small.
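
To give the flavor, here's a toy constant-folding
pass, assuming a hypothetical interface where a pass
is just a procedure from AST to AST, and pretending
the AST is a plain s-expression (the real AST isn't):

  ;; Rewrite (+ <literal> <literal>) to a single literal,
  ;; bottom-up.  A real pass would walk Chibi's actual
  ;; AST records rather than raw s-expressions.
  (define (fold-constants ast)
    (if (and (pair? ast) (list? ast))
        (let ((ast (map fold-constants ast)))  ; children first
          (if (and (eq? (car ast) '+)
                   (= 3 (length ast))
                   (number? (cadr ast))
                   (number? (caddr ast)))
              (+ (cadr ast) (caddr ast))
              ast))
        ast))

  ;; (fold-constants '(* x (+ 1 2)))  =>  (* x 3)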

Chibi will always be a tiny Scheme implementation,
easy to use from C, with features added by building
on a small core.

However, I want a _fast_ Scheme implementation.
There's no reason we shouldn't be able to have an
implementation as fast as, say, OCaml, which is to
say faster than C in some cases. This doesn't have
to be Chibi, or even my own implementation, but
the other Scheme compilers out there are difficult
to work with. I don't think they have to be. Unfortunately
(or fortunately?) I don't have time to sit down and
write a full optimizing compiler from scratch (or rather
flesh out my other, unpublished compiler). I have a
busy job, and spend maybe a half day on the weekends
and an hour or two in the odd evening working on my
hobbies, which lately are mostly devoted to R7RS.
Which means that to make progress I need a plan.

The plan is to apply the ship of Theseus approach
to Chibi, turning it into a smart, optimizing native
compiler. I will
replace individual components of Chibi one at a
time, such that each change is useful and complete
in itself, is a step towards the final goal, and Chibi
remains small and suitable as a C extension language
in the meantime. The major steps are:

1. Simple native (x86) backend. A compiler setting
will enable compiling directly to machine code
instead of the VM (i.e. compiled up front, not a JIT). The
infrastructure for this is complete, and some simple
examples work, so I hope to get this published soon.
This is the basis for all future speed - it's just not
worth spending that much time tuning the VM if we
can go native. This will be a very naive compiler,
with greedy register allocation and no peephole
optimizations.

2. Rewrite the x86 backend in Scheme. This will be
a separate module, bootstrapped by the version above.
This will make it easier to include low-level optimizations
such as peephole optimizations and smart register
allocation (a toy sketch of such a peephole pass
follows this list).

3. Rewrite the frontend compiler in Scheme. This will
also be a separate module, bootstrapped by the C
compiler. This will make mid-level optimizations and
macro extensions easier.

4. (optional) Rewrite most/all of the runtime in Scheme.
Use a restricted language or special optimizations to rewrite
the GC.

5. (optional) Write an ELF module and hooks needed to
generate host OS libraries and executables. At this point
we're self-hosting - we just need the C Chibi for bootstrapping.
The C-generated and Scheme-generated libraries
are both usable directly from C programs. The VM is
still usable (crucial for non-x86 platforms until we add
new native backends).

6. (dream) Rewrite all the libc calls in Scheme, and
use that to build an OS.
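
As a taste of the kind of low-level rewrite step 2
makes easy, here's a toy peephole pass, assuming a
hypothetical (op . operands) list encoding for
instructions - not the actual backend representation:

  ;; Drop push/pop pairs on the same operand, and jumps
  ;; to the immediately following label.
  (define (peephole insts)
    (cond
     ((null? insts) '())
     ;; (push eax) (pop eax) => nothing
     ((and (pair? (cdr insts))
           (eq? (caar insts) 'push)
           (eq? (caadr insts) 'pop)
           (equal? (cdar insts) (cdadr insts)))
      (peephole (cddr insts)))
     ;; (jmp L1) (label L1) => (label L1)
     ((and (pair? (cdr insts))
           (eq? (caar insts) 'jmp)
           (eq? (caadr insts) 'label)
           (equal? (cdar insts) (cdadr insts)))
      (peephole (cdr insts)))
     (else (cons (car insts) (peephole (cdr insts))))))

Iterating this to a fixed point would also catch
rewrites exposed by earlier rewrites.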

In parallel with these steps I'll continue to work on high-level
optimizations, which work directly with the AST and are
thus applicable to both the VM and the native compiler.

There are, very broadly speaking, five tiers of Scheme
compiler speeds:

* fast native compilers: ikarus, larceny, gambit, bigloo,
mit-scheme, stalin, racket
* slower native compilers: chicken
* fast VMs: gauche
* VMs: just about everything else including Chibi
* tree interpreters: TinyScheme, Scheme9

Yes, compared to other native compilers Chicken lags
noticeably (though it probably has the fastest call/cc).
Yes, compared to other VMs Gauche stands out from
the pack.

At 0.4 Chibi was lagging at the slow end of the VMs,
but I hope for it to shift to the high end soon. After
step 1 I hope for it to reach Chicken speeds, after
step 2 it should more closely match the fast native
compilers, and with step 3 it should be better at
"smart" optimizations.

--
Alex

ceving
Jul 14, 2011, 5:16:51 AM
to chibi-scheme
On Jul 14, 7:17 am, Alex Shinn <alexsh...@gmail.com> wrote:

>     1. Simple native (x86) backend.  A compiler setting
>     will enable compiling directly to machine code
>     instead of the VM (i.e. compiled up front, not a JIT).  The
>     infrastructure for this is complete, and some simple
>     examples work, so I hope to get this published soon.
>     This is the basis for all future speed - it's just not
>     worth spending that much time tuning the VM if we
>     can go native.  This will be a very naive compiler,
>     with greedy register allocation and no peephole
>     optimizations.

The VM has a big portability advantage, which would be lost. This
could be a drawback for using Scheme on smart phones or network
appliances.

If compilation is desired, why not use LLVM? It supports
different calling conventions, garbage collection, and just-in-time
code generation. And it supports more than just Intel.

Alex Shinn
Jul 14, 2011, 5:21:48 AM
to chibi-...@googlegroups.com
On Thu, Jul 14, 2011 at 6:16 PM, ceving <cev...@gmail.com> wrote:
> On Jul 14, 7:17 am, Alex Shinn <alexsh...@gmail.com> wrote:
>
>>     1. Simple native (x86) backend.  A compiler setting
>>     will enable compiling directly to machine code
>>     instead of the VM (i.e. compiled up front, not a JIT).  The
>>     infrastructure for this is complete, and some simple
>>     examples work, so I hope to get this published soon.
>>     This is the basis for all future speed - it's just not
>>     worth spending that much time tuning the VM if we
>>     can go native.  This will be a very naive compiler,
>>     with greedy register allocation and no peephole
>>     optimizations.
>
> The VM has a big portability advantage, which would be lost. This
> could be a drawback for using Scheme on smart phones or network
> appliances.

The important part:

"With a compiler setting enable compiling directly to machine code"

That is, if you compile

make CPPFLAGS=-DSEXP_USE_NATIVE_X86

then you get the native compiler, otherwise you get the
VM. This is already how it works - vm.c is only compiled
in conditionally, and with the above setting you get x86.c
(which works for simple examples but isn't checked in yet).

In both cases the compiler (eval.c) is shared.

So the VM will always be supported.

--
Alex

John Cowan
Jul 14, 2011, 11:09:17 AM
to chibi-...@googlegroups.com
Alex Shinn scripsit:

> 1. Simple native (x86) backend. A compiler setting will enable
> compiling directly to machine code instead of the VM (i.e.
> compiled up front, not a JIT). The infrastructure for this is complete,
> and some simple examples work, so I hope to get this published
> soon. This is the basis for all future speed - it's just
> not worth spending that much time tuning the VM if we can go
> native. This will be a very naive compiler, with greedy register
> allocation and no peephole optimizations.

How does this work? Are definitions compiled as they are entered, or do
you compile everything available (that is not already compiled) when an
expression is evaluated?

> 2. Rewrite the x86 backend in Scheme. This will be a separate
> module, bootstrapped by the version above. This will make it
> easier to include low-level optimizations such as peephole
> optimizations and smart register allocation.

I don't understand your claim (snipped here) that this step will make
things much faster. The cost of compilation has to be added to the cost
of execution. Why would this step-2 compiler, whether compiled by the
step-1 compiler or by itself, be expected to run faster than the step-1
compiler, which is written in hand-optimized C? Certainly it'll be
easier to maintain, but I would expect that the slower compilation would
tend to outweigh the faster execution. This is why many JITs have two
compilers, a faster one that generates mediocre code, used normally, and a
slower one that generates good code, used only when it's apparent that the
mediocre code is executing too often.

--
We pledge allegiance to the penguin          John Cowan
and to the intellectual property regime      co...@ccil.org
for which he stands, one world under         http://www.ccil.org/~cowan
Linux, with free music and open source
software for all.      --Julian Dibbell on Brazil, edited

Alex Shinn
Jul 15, 2011, 2:47:35 AM
to chibi-...@googlegroups.com
On Fri, Jul 15, 2011 at 12:09 AM, John Cowan <co...@mercury.ccil.org> wrote:
> Alex Shinn scripsit:
>
>>    1. Simple native (x86) backend.  A compiler setting will enable
>>    compiling directly to machine code instead of the VM (i.e.
>>    compiled up front, not a JIT).  The infrastructure for this is complete,
>>    and some simple examples work, so I hope to get this published
>>    soon.  This is the basis for all future speed - it's just
>>    not worth spending that much time tuning the VM if we can go
>>    native.  This will be a very naive compiler, with greedy register
>>    allocation and no peephole optimizations.
>
> How does this work?  Are definitions compiled as they are entered, or do
> you compile everything available (that is not already compiled) when an
> expression is evaluated?

It works the same way the VM works - definitions are compiled
as they are entered.

I'll worry about whole-module optimizations later.

>>    2. Rewrite the x86 backend in Scheme.  This will be a separate
>>    module, bootstrapped by the version above.  This will make it
>>    easier to include low-level optimizations such as peephole
>>    optimizations and smart register allocation.
>
> I don't understand your claim (snipped here) that this step will make
> things much faster.  The cost of compilation has to be added to the cost
> of execution.

Indeed, it's likely to make compilation slower, but it
will make it easier to generate faster code. In particular,
the register allocation and peephole optimizations will
be written in Scheme.
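
For example, a greedy allocator is only a screenful of
Scheme. A minimal sketch, assuming precomputed
(vreg start end) live intervals sorted by start point
and SRFI-1's filter - not actual backend code:

  (define physical-regs '(eax ecx edx ebx))

  ;; Returns ((vreg . location) ...) where location is a
  ;; physical register or the symbol spill.
  (define (allocate intervals)
    (let loop ((ints intervals)
               (active '())          ; ((vreg start end reg) ...)
               (free physical-regs)
               (result '()))
      (if (null? ints)
          (reverse result)
          (let* ((int (car ints))
                 (start (cadr int))
                 ;; expired intervals free their registers
                 (done (filter (lambda (a) (< (caddr a) start)) active))
                 (live (filter (lambda (a) (>= (caddr a) start)) active))
                 (free (append (map (lambda (a) (list-ref a 3)) done)
                               free)))
            (if (null? free)
                (loop (cdr ints) live free
                      (cons (cons (car int) 'spill) result))
                (loop (cdr ints)
                      (cons (append int (list (car free))) live)
                      (cdr free)
                      (cons (cons (car int) (car free)) result))))))))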

--
Alex
