Try the native compilers: Ikarus, MIT/GNU Scheme, and Larceny.
Or one of the Scheme-to-C compilers: Bigloo, Chicken, and Gambit.
Note also that to get the best speed out of MzScheme, you should
put your code in modules to give the JIT compiler a fair chance.
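For instance, wrapping code in a module (a toy example of my own;
any module will do):

  (module bench mzscheme
    ;; Inside a module the compiler knows what each name refers to,
    ;; so the JIT can generate direct calls instead of going through
    ;; mutable top-level bindings.
    (define (sum-to n)
      (let loop ((i 0) (acc 0))
        (if (= i n)
            acc
            (loop (+ i 1) (+ acc i)))))
    (display (sum-to 1000000))
    (newline))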
--
Jens Axel Søgaard
Guile and scm are two of the slower implementations of Scheme.
You might enjoy reading a defense of scm's speed by its author
[1].
MzScheme is slow when interpreted, but is reasonably fast when
JIT-compiled on PowerPC and x86 machines; you won't see that
speed until you incant (require mzscheme), however. Even then,
it won't be as fast overall as openmcl.
Chez Scheme is definitely in the same league as openmcl, and
may actually be faster, but it's a commercial product you'd have
to purchase. Ikarus (x86) and native Larceny (x86 and SPARC)
offer similar performance and convenience for other processors,
but not for the PowerPC.
If you're limited to freeware and the PowerPC, then I would
recommend the following systems: Gambit, Bigloo, MzScheme,
Petit Larceny, Chicken, Petite Chez, Scheme 48. Marc Feeley
and I have benchmarked those systems on x86 processors,
and the results may give you some idea of the specific things
they do well or badly [2,3].
Stalin earned its reputation for being the fastest Scheme-to-C
compiler, but I cannot recommend it to someone new to Scheme
because it's too hard to install and to use and because it doesn't
conform to any of the current de facto standards for Scheme.
Will
[1] http://www-swiss.ai.mit.edu/~jaffer/CNS/benchmarks
[2] http://www.ccs.neu.edu/home/will/Twobit/benchmarks2007.html
[3] http://www.iro.umontreal.ca/~gambit/bench.html
MzScheme supports compilation to C (see the mzc utility) but some of
the other Scheme->C compilers do a better job. MzScheme is best used
with the JIT compiler where possible.
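For instance, to build a standalone executable from a module (if I
remember the mzc flags right; check mzc's help):

  $ mzc --exe hello hello.scm
  $ ./hello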
George
If you are interested in speed for numerical computations then the
best bet, in my opinion, is Bigloo Scheme. It is the only Scheme
implementation I am aware of that supports optional typing. You can
really boost the speed of your code with Bigloo when you make such
declarations. In my experience this is true of Common Lisp
implementations as well, i.e., to get within the realm of C speed you
need to add type declarations. I quite like Bigloo's type annotation
syntax, as I find it less cumbersome than Common Lisp's. Another nice
feature of Bigloo is that it has a pretty good FFI to C, making
linking to numerical code in C fairly trivial.
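For example, the annotations are just ::type suffixes on identifiers.
A toy of my own (check the manual for the exact vector type names):

  ;; Dot product over SRFI-4 double vectors. The ::double and
  ;; ::f64vector annotations, plus the fixnum (=fx, +fx) and flonum
  ;; (+fl, *fl) operators, let Bigloo compile this down to unboxed
  ;; C arithmetic.
  (define (dot::double v1::f64vector v2::f64vector)
    (let loop ((i 0) (sum 0.0))
      (if (=fx i (f64vector-length v1))
          sum
          (loop (+fx i 1)
                (+fl sum (*fl (f64vector-ref v1 i)
                              (f64vector-ref v2 i)))))))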
If you aren't interested in adding typing and want to go fully
dynamic, then Gambit Scheme is pretty nice (see the benchmarks linked
in a previous post in this thread). I've played around with it a bit
and might do so more, as it has some nice threading capability.
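For instance, Gambit's threads follow the SRFI-18 API; a toy example
of my own:

  ;; Run a computation in a separate thread and wait for its result;
  ;; thread-join! returns the value of the thread's thunk.
  (define worker
    (make-thread
     (lambda ()
       (let loop ((i 0) (acc 0))
         (if (= i 1000000) acc (loop (+ i 1) (+ acc i)))))))
  (thread-start! worker)
  (display (thread-join! worker))
  (newline)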
Note: Bigloo has some warts of its own and there are many excellent
Scheme implementations out there, as has already been mentioned, but
if speed is what you need then you can't beat it.
Jason
Bigloo is a good candidate. I used it for my PhD in physics (pre- and
postprocessing and steering radiative transfer codes). I also made
all my plots on my Macintosh in Bigloo by means of DISLIN; I wrote a
binding to DISLIN. If you are interested, drop me a note. One
problem: I hope I can find some old backups of my binding for Linux
and Mac OS X, because someone broke into my flat and stole my iBook
with all my data. I recently bought a new MacBook but haven't had
time to figure things out.
The important thing for me: Bigloo works equally well on Mac OS X,
Linux, and SunOS. It is a very stable compiler (some of my
calculations took many days of computer time, and Bigloo never left
me in the dark).
Worth noting: you must definitely browse through the manual. Giving
types in Bigloo is really easy and very straightforward. C
integration is sometimes straightforward and sometimes not. I failed
when I started writing my own binding to the GNU Scientific Library,
because my C knowledge is weak and I never figured out size_t (as far
as I can remember).
Stalin is also often mentioned in connection with speed. However, my
Stalin experience dates back a couple of years. I was never
impressed, because at that time a typical Stalin compile cycle took
over 30 seconds. That is not a big problem when your calculation will
take some hours, but it is unbearable when you are developing small
pieces of code.
Regards,
Sprecher des Vatikan
btw: I am now a post-doc, though, and have to work with Fortran 90
and IDL. But when they interviewed me I honestly said: "never had to
do anything big with Fortran and IDL". Their reaction was: 'if you
can handle Scheme it should be no problem to get acquainted with
Fortran as well'. Sure, I learned some basic Fortran in the last
couple of months, but man, how I miss my Scheme. Programming in IDL
and Fortran is programming with tied hands: horrible.
> Stalin earned its reputation for being the fastest Scheme-to-C
> compiler, but I cannot recommend it to someone new to Scheme
> because it's too hard to install and to use and because it doesn't
> conform to any of the current de facto standards for Scheme.
It is certainly the case that stalin is fast. It is the only
Scheme that can seriously compete in the arena of scientific
(i.e., numeric) computing.
But stalin is not (at least for me) hard to install:
$ apt-get install stalin
$ cat > hello.sc
(display "Hello world!")
(newline)
$ time stalin -On hello.sc
real 0m5.002s
user 0m4.816s
sys 0m0.148s
$ ./hello
Hello world!
$ strip hello
$ ls -l hello
-rwxrwxr-x 1 barak barak 27616 2007-12-10 21:57 hello
As to standards, since stalin is the only scheme game in town for
scientific computing, if you want to do scientific computing and
write in scheme then stalin *is* the de facto standard!
--
Barak A. Pearlmutter
Hamilton Institute & Dept Comp Sci, NUI Maynooth, Co. Kildare, Ireland
http://www.bcl.hamilton.ie/~barak/
> William D Clinger wrote:
>
>> Stalin earned its reputation for being the fastest Scheme-to-C
>> compiler, but I cannot recommend it to someone new to Scheme
>> because it's too hard to install and to use and because it doesn't
>> conform to any of the current de facto standards for Scheme.
>
> It is certainly the case that stalin is fast. It is the only
> Scheme that can seriously compete in the arena of scientific
> (i.e., numeric) computing.
Can you be more specific?
Joel
--
Joel J. Adamson
Biostatistician
Pediatric Psychopharmacology Research Unit
Massachusetts General Hospital
Boston, MA 02114
(617) 643-1432
(303) 880-3109
> "Barak A. Pearlmutter" <ba...@cs.nuim.ie> writes:
>
>> It is certainly the case that stalin is fast. It is the only
>> Scheme that can seriously compete in the arena of scientific
>> (i.e., numeric) computing.
>
> Can you be more specific?
In the hands of a skilled user, stalin is capable of numeric
(i.e., floating point) performance that rivals or exceeds C
or Fortran. One example I personally found quite compelling
was a numeric double integral, which was roughly 20x faster in
stalin than in tuned C or Fortran.
Whole-program analysis is the crucial ingredient for these kinds
of speedups, as the integration routine would typically be in one
file (a numeric library) while its invocation would be in another.
Because stalin is standards conformant (with R4RS), it is not
difficult to develop in an interactive environment and leave the
stalin compilation for final production runs, which should be
intensive in order to amortize the considerable expense of the
compilation itself.
This is the domain for which stalin was designed, and where it
excels.
My intent there was just to show that stalin's engine was
turning over, and to feel it rumble.
You don't use a Formula 1 race car to go three blocks and
buy some milk, even though you could. As a tool, stalin
is appropriate for big crunchy programs that will run for
a long time. It is the only compiler for a higher-order
language available today that can give Fortran a run for
its money. It is the only Scheme in which you could write
a physics engine for a game and hope to compete on
performance rather than just code clarity.
> My intent there was just to show that stalin's engine was
> turning over, and to feel it rumble.
Funny :-)
I usually don't get any fuzzy feeling about software that
turns over and rumbles, but, I guess, different people
different customs ;)
I laughed at that, too, but there's a good reason
for it: Stalin compiles to a standalone executable,
and most of the compile time was actually spent in
the linker.
As I said, MzScheme appears to be the only freeware
that provides incremental native-code compilation
(as in openmcl, which the original poster preferred)
for Scheme on the PowerPC.
On my 800 MHz PowerBook G4, using Petit Larceny, it
takes about 0.875s real (0.418s cpu) to compile the
hello program to C and then to native code; most of
that time is spent in the C compiler, of course. The
resulting .fasl and .so files occupy 9957 bytes on disk.
Although the original poster probably isn't concerned
about licensing, I'll mention it anyway since it has
come up with Ikarus. Stalin is GPL'ed. Gambit is
licensed under LGPL or Apache licenses. Bigloo is
partly GPL and partly Library GPL. MzScheme is LGPL,
Chicken BSD. Petit Larceny is not encumbered by GPL
or LGPL or BSD or Apache licenses.
As Barak and I both said in our different ways, the
advantage of Stalin is that it generates fast code
for numerical applications that don't need Scheme's
entire numerical tower or macros or various other
things.
Will
The C compile times in this case are negligible. You must
have made a mistake.
cheers,
felix
Quite possibly, and it would hardly be the
first time either.
I said nothing about the C compile times
in this case, however, except to imply they
were less than the link time.
Will
On the other hand, I did say that most of
(what Aziz took to be) the compile time was
spent in the linker, and that isn't true.
Aziz was right. Most of the time is spent
in Stalin's compiler. Presumably the time
is spent optimizing Scheme's standard I/O
libraries to specialize them for the particular
needs of the hello-world program.
My apologies to Aziz and Felix!
Will
> In the hands of a skilled user, stalin is capable of numeric
> (i.e., floating point) performance that rivals or exceeds C
> or Fortran. One example I personally found quite compelling
> was a numeric double integral, which was roughly 20x faster in
> stalin than in tuned C or Fortran.
I'd like to see that code.
Most cases of such claims of "being faster than hand-tuned C" turn out
to be apocryphal or based on very poorly written or artificially
constrained C/C++ (e.g., disallowing real-world constructs such as
pointer-aliasing optimizations, template meta-programming, 3rd-party
frameworks or libraries, and profile-guided optimizations).
It would be very interesting to see the evidence of this amazing claim
that Stalin is "roughly 20x faster" than C and Fortran. Comparative
source code or it isn't true!
Of course, the Scheme code is most likely far easier to understand and
far less prone to defects than the hand-tuned C/C++ code.
> > In the hands of a skilled user, stalin is capable of numeric
> > (i.e., floating point) performance that rivals or exceeds C
> > or Fortran. One example I personally found quite compelling
> > was a numeric double integral, which was roughly 20x faster in
> > stalin than in tuned C or Fortran.
>
> I'd like to see that code.
>
> Most cases of such claims of "being faster than hand-tuned C" turn out
> to be apocryphal or based on very poorly written or artificially
> constrained C/C++ (e.g., disallowing real-world constructs such as
> pointer-aliasing optimizations, template meta-programming, 3rd-party
> frameworks or libraries, and profile-guided optimizations).
Barak's claim, however, is fairly plausible.
Think about it: a double integral. He's talking
about a general purpose library procedure, highly
tuned, that performs numerical integration. You'd
pass it a procedure in Scheme, or a function pointer
in C, and the two limits of integration. For a
double integral the library routine gets called
twice, something like this:
(integrate (lambda (x) (integrate sin 0 x))
           0 y)
Translation of that call into efficient C is left as
an exercise. Note, however, that you can't change
the code for integrate; that's a fixed library routine.
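To make this concrete, such a library routine might look roughly like
this (a toy midpoint rule of my own; a real library would use an
adaptive method):

  ;; Fixed-step midpoint rule: approximate the integral of f over
  ;; [a, b] by sampling f at the midpoint of each of n subintervals.
  (define (integrate f a b)
    (let* ((n 1000)
           (h (/ (- b a) n)))
      (let loop ((i 0) (sum 0.0))
        (if (= i n)
            (* sum h)
            (loop (+ i 1)
                  (+ sum (f (+ a (* (+ i 0.5) h)))))))))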
Stalin optimizes the whole program. That includes
specialization of the library routines, effectively
unrolling the inner call to integrate into the outer
call to integrate. That eliminates indirect calls
to sin, and it *also* eliminates indirect calls to
integrate from integrate itself. You end up with
two nested loops that contain a lot of numerical
code, with no procedure calls. That's what C
compilers know how to optimize, so Stalin feeds
that code to the C compiler, which turns it into
efficient machine code.
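The residual program has roughly this shape (my sketch of the
specialized result, not actual Stalin output):

  ;; Both calls to integrate and the call to sin have been inlined,
  ;; leaving two nested loops of straight numerical code.
  (define (double-integral y)
    (let* ((n 1000)
           (h-outer (/ y n)))
      (let outer ((j 0) (sum-outer 0.0))
        (if (= j n)
            (* sum-outer h-outer)
            (let* ((x (* (+ j 0.5) h-outer))
                   (h-inner (/ x n)))
              (let inner ((i 0) (sum-inner 0.0))
                (if (= i n)
                    (outer (+ j 1)
                           (+ sum-outer (* sum-inner h-inner)))
                    (inner (+ i 1)
                           (+ sum-inner
                              (sin (* (+ i 0.5) h-inner)))))))))))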
C compilers don't know how to optimize across library
boundaries, however. If you write your double integral
in C, the inner call to integrate will spend a lot more
of its time in procedure call overhead (calling sin)
than in crunching numbers. The outer call to integrate
won't know it is calling itself, either, and won't be able
to take advantage of that fact.
That's why I would expect a double integral written in
Scheme and compiled by Stalin to run faster than the
equivalent code written in C. I'm a little surprised that
it's 20 times as fast, but I'm not at all surprised that it's
faster.
> Of course, the Scheme code is most likely far easier to understand and
> far less prone to defects than the hand-tuned C/C++ code.
That too.
Will
That claim was substantiated over ten years ago in this very
newsgroup. It appears that such scepticism arises repeatedly despite
the fact that these results have been made public for a very long
time. So I have decided to put these up on the web.
ftp://ftp.ecn.purdue.edu/qobi/integ.tgz
integ.sc    This is the straightforward version in Scheme.

integ2.sc   This is a version that has a large amount of code
            duplication. Note that this is not code specialization.
            integrate-1D{a,b[1-9]} are identical to each other and to
            integrate-1D from integ.sc except for parameter
            duplication. Similarly for integrate-2D and zark.
            However, this code duplication facilitates fully
            automatic specialization. It is not expected that a user
            write code like this. But straightforward extensions of
            the polyvariance mechanisms that already exist in Stalin
            could produce such code duplication automatically from
            integ.sc.

integ3.sc   This is a version that has a small amount of manual code
            duplication. Again note that this is not code
            specialization. integrate-1D{a,} are identical to each
            other and to integrate-1D from integ.sc.

integ-c.c   This version is written in GNU C and uses the GNU C
            extension of internal function definitions to implement
            nonescaping closures.

integ2-c.c  This version does not use the GNU C extension of internal
            function definitions and implements closures by way of
            global variables.

integ-f.f   This version implements closures by way of a common
            block. But I don't know how to store function pointers in
            a common block and invoke such function pointers, so this
            version hardwires the call to zarkon in inner. Thus it
            doesn't implement the full generality of either the
            Scheme or the C versions.
Richard O'Keefe wrote integ.sc (along with Pascal and Ada versions)
and sent them to me in July 1996. I wrote integ2.sc in July 1996. I
don't recall when I wrote integ3.sc, but the mtime for the file is
31 March 1998. Richard wrote a C version and told me about it in July
1996. I don't recall at this point whether he or I wrote integ-c.c
and integ2-c.c, and whether these are the versions that he told me
about in July 1996. But the mtimes for these files are 30 March 1998.
I wrote integ-f.f today (13 December 2007).
Here are results using Stalin 0.11, GCC 4.1.2, and G77 3.4.6 (the
current versions on Debian Etch), under Debian Etch on a Shuttle
SN25P with an FX-60 and 2G RAM (I believe with 3-3-3-8 timings), with
the following options

  stalin:    -d -On -db -no-clone-size-limit -split-even-if-no-widening
             -copt -O2 -copt -fomit-frame-pointer
  gcc & g77: -O2 -fomit-frame-pointer

as produced by the run script:
qobi@tlamachilistli>./run
Thu Dec 13 13:55:27 EST 2007
Stalin Version
0.11
GCC Version
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --program-suffix=-4.1 --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --with-tune=i686 --enable-checking=release i486-linux-gnu
Thread model: posix
gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
G77 Version
Reading specs from /usr/lib/gcc/i486-linux-gnu/3.4.6/specs
Configured with: ../src/configure -v --enable-languages=c,c++,f77,pascal --prefix=/usr --libexecdir=/usr/lib --with-gxx-include-dir=/usr/include/c++/3.4 --enable-shared --with-system-zlib --enable-nls --without-included-gettext --program-suffix=-3.4 --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug --with-tune=i686 i486-linux-gnu
Thread model: posix
gcc version 3.4.6 (Debian 3.4.6-5)
Stalin integ
0.0
0.848u 0.000s 0:00.84 100.0% 0+0k 0+0io 0pf+0w
Stalin integ2
0.0
0.060u 0.000s 0:00.06 100.0% 0+0k 0+0io 0pf+0w
Stalin integ3
0.0
0.328u 0.004s 0:00.32 100.0% 0+0k 0+0io 0pf+0w
GCC integ-c
0.000000
2.224u 0.000s 0:02.23 99.5% 0+0k 0+0io 0pf+0w
GCC integ2-c
0.000000
0.412u 0.000s 0:00.41 100.0% 0+0k 0+0io 0pf+0w
G77 integ-f
0.
0.432u 0.000s 0:00.42 102.3% 0+0k 0+0io 0pf+0w
22.761u 0.224s 0:23.01 99.8% 0+0k 0+0io 0pf+0w
qobi@tlamachilistli>
Note that integ (the straightforward Scheme version compiled with
Stalin) is 2.6 times faster than integ-c (the C version using
internal function definitions). Also note that integ3 (the Scheme
version with only a small amount of manual code duplication) is 6.7
times faster than integ-c, 1.24 times faster than integ2-c (the C
version without internal function definitions), and 1.3 times faster
than integ-f (the Fortran version). Finally note that integ2 (the
Scheme version with a large amount of manual code duplication) is
37.1 times faster than integ-c, 6.87 times faster than integ2-c, and
7.2 times faster than integ-f.
-------------------------------------------------------------------------------
This entire benchmark exercise was initiated by Richard O'Keefe in
correspondence with me in July 1996. Here are relevant excerpts from
that correspondence:

I wanted to compare the performance of
- integration using nested functions and lambda expressions in Scheme
- the same thing (or as close as possible) in Pascal
- the same thing, but using generic procedures, in Ada
because I was of the opinion that Ada generics are not as clean as
lambdas and I wanted to see what speed advantage there was, if any.
Imagine my surprise to find Stalin coming out first again!
% gnatmake -gnatp integ -cargs -O4
[This is the subtype Real is Float; version]
% time integ
2.61993E+13
45.0u 0.0s 1:08 66% 0+0k 0+0io 0pf+0w
% gnatmake -gnatp integ -cargs -O4
[This is the subtype Real is Long_Float; version]
0.00000000000000E+00
18.0u 0.0s 0:21 85% 0+0k 0+0io 0pf+0w
[Yep, double is a fair bit faster than single!]
% pc -s -native -O4 integ.pas
% time a.out
0.00000000000000e+00
18.0u 0.0s 0:22 80% 0+0k 0+0io 0pf+0w
[Pascal and Ada versions very close]
% stalin -copt "-O4" -d -On -dc integ
% time integ
0.0
6.0u 0.0s 0:10 57% 0+0k 0+0io 0pf+0w
[That's right; Scheme was fastest]
The interesting thing about this is that

    Scheme->C          C            C
    gcc -O4      <     Sun C   <    gcc -O4
    6 sec              13 sec       20 sec
[...] The evidence is that Stalin is able to generate *better* code
(even going through C as an intermediate language, and C with a lot
of temporary variables that need optimising away too) than the Ada,
Pascal, or C compilers produced by high-powered multi-expert groups.
-------------------------------------------------------------------------------
I summarized that benchmarking exercise in August 1996 with the
following:

[...] Richard O'Keefe participated in a thread on comp.lang.ada
arguing about Ada generics. He argued that lambda expressions are a
much better mechanism for abstraction. To prove his point he wrote a
double integration benchmark using a simple quadrature rule. It used
a higher-order procedure to implement the integrator. He wrote this
benchmark in Scheme, C, Pascal, and Ada. And he ran them on the same
machine (an UltraSPARC). Here are the results (in CPU seconds):
C(gcc) | Ada | Pascal | C(Sun) | Scheme | Scheme(with inlining)
-------------------------------------------------------------------
20.0 | 18.0 | 18.0 | 13.0 | 5.0 | 0.9
All versions use double precision.
The C(gcc) version was compiled with:
% gcc -O4 integ.c
The Ada version was compiled with:
% gnatmake -gnatp integ -cargs -O4
The Pascal version was compiled with:
% pc -s -native -O4 integ.pas
The C(Sun) version was compiled with:
% cc -xinline=Int1,Int2,Zarkon -fsimple -native -fast -xO4 integ.c
The Scheme version was compiled with:
% stalin -cc cc -copt "-native -xO4" -d -On integ
The Scheme(with inlining) version was compiled with:
% stalin -cc cc -copt "-fsimple -native -fast -xO4 -v" -d -On integ2
[...] you should bear in mind that
(a) Stalin generates C code
(b) The C code generated by stalin ran between 2.5 and 14 times
    faster than handwritten C code run through the very same compiler
    with the very same options.
-------------------------------------------------------------------------------
Note that this benchmark was not written by me. It thus constitutes
independent substantiation of the claim. Such independent
substantiation has occurred a number of times. For example, see the
Ray Tracer example written by Jon Harrop:

http://www.ffconsultancy.com/languages/ray_tracer/index.html

For something more recent see:

http://www.bcl.hamilton.ie/~qobi/manuscript2008/

This site benchmarks a new compiler under development, called
Stalingrad, not Stalin. The goal of Stalingrad is to incorporate
automatic differentiation (AD). For these AD benchmarks, Stalingrad
outperforms every other implementation of Scheme (including Stalin),
as well as every implementation of ML and Haskell, by one to three
orders of magnitude. It is 6 to 32 times faster than a standard
C++-based AD system and the same speed as state-of-the-art
Fortran-based AD systems.
> Think about it: a double integral. He's talking
> about a general purpose library procedure, highly
> tuned, that performs numerical integration. You'd
> pass it a procedure in Scheme, or a function pointer
> in C, and the two limits of integration.
Or, in C++, it could be templated on a class having the function as a
member; then the compiler will, pardon my French, defunctorize the
result and thus specialize the integral routine for that function.
> C compilers don't know how to optimize across library
> boundaries, however.
Not entirely true. If the compiler squirrels away a serialized
representation of its IR along with the object code, then when invoked
to do linking it can perform whole-program optimizations.
Such things have been known in commercial C compilers for some time;
more recently the LLVM project has been working in this area, and a
separate effort is known to exist for GCC.
--
(let ((C call-with-current-continuation)) (apply (lambda (x y) (x y)) (map
((lambda (r) ((C C) (lambda (s) (r (lambda l (apply (s s) l)))))) (lambda
(f) (lambda (l) (if (null? l) C (lambda (k) (display (car l)) ((f (cdr l))
(C k))))))) '((#\J #\d #\D #\v #\s) (#\e #\space #\a #\i #\newline)))))
Right. Barak's claim compared Scheme to C and Fortran,
however, not to C++. Ada generics are pretty similar to
C++ templates, of course, so Siskind's data for Ada are
relevant here.
> Not entirely true. If the compiler squirrels away a serialized
> representation of its IR along with the object code, then when invoked
> to do linking it can perform whole-program optimizations.
>
> Such things have been known in commercial C compilers for some time;
> more recently the LLVM project has been working in this area, and a
> separate effort is known to exist for GCC.
I stand corrected. Should I count the time spent waiting for
this to be implemented in gcc as part of the compile time?
Will
Only if the competition was specific to GCC, rather than just "C". I
remember using a MIPS toolchain back in the late 80s/early 90s that had a
mode where it would emit some sort of half-compiled tree representation,
rather than object code, and the "linker" would then do whole-program
optimization. I never used it, because it was (a) slow, and (b) I tended
to compile all of my programs in a "whole program" way anyway; they
didn't depend much on separately compiled libraries.
It would be interesting to see how well a modern Java implementation
managed it, too. One of the things that current versions of HotSpot are
supposed to do is inlining and specialization.
Cheers,
--
Andrew
I'm not so sure that C++ templates can match the performance of
whole-program interprocedural polyvariant flow analysis. Take a look
at the Stalingrad and FADBAD++ examples on:

http://www.bcl.hamilton.ie/~qobi/manuscript2008/

(It's not there yet because the host is down. Hopefully it will be
set up tomorrow.) Now admittedly this is for a different set of
benchmarks (saddle and particle, not integ). And this is for a
different compiler (Stalingrad, not Stalin) for a different language
(VLAD, not Scheme but Scheme-like). Yet I give this example as
counter-evidence to the conjecture that C++ templates could be as
effective as whole-program interprocedural polyvariant flow analysis
for procedure integration across higher-order function calls.
Like integ, both saddle and particle make heavy use of higher-order
functions. The C++ version is written in a style that makes heavy use
of templates, including for all higher-order function parameters. It
is based on FADBAD++, an existing AD system listed on
www.autodiff.org. Stalingrad performs whole-program interprocedural
polyvariant flow analysis (while its name is similar to Stalin, and
Stalin also performs whole-program interprocedural polyvariant flow
analysis, the techniques employed by the two compilers differ
significantly and substantively in the details). Stalingrad, like
Stalin, generates C code. The code generated by Stalingrad for saddle
is 6 times faster than the C++/FADBAD++ code, and the code generated
by Stalingrad for particle is 32 times faster than the C++/FADBAD++
code.
People made exactly that same unsubstantiated claim 10 years ago when
I first posted these results.

Let me give some evidence against that claim. These examples are
written as single files. Thus a compiler for which the compilation
unit is the file is doing whole-program compilation. Gcc 4.1.2 is
such a compiler. I have tried my best to get gcc 4.1.2 to perform
complete procedure integration for these examples. The *-inline.c
versions have been modified so that *every* C function (except for
the internal function definitions) is declared static inline, in both
the function declaration (i.e., prototype) and the function
definition. The *-always-inline.c versions have been further modified
so that *every* inline specifier is paired with
__attribute__ ((always_inline)) (except for zark in
integ-c-always-inline.c, which GCC complains is not inlinable). You
can examine the assembler output in *.s by compiling with -S instead
of -o <file>. You will see that gcc never succeeds in inlining
everything. Furthermore, it never even approaches the performance of
the Stalin-generated code for integ2.sc from my post earlier today.
Stalin integ2
0.0
0.060u 0.000s 0:00.06 100.0% 0+0k 0+0io 0pf+0w
qobi@ingqondo>gcc -o integ-c -O2 -fomit-frame-pointer integ-c.c
qobi@ingqondo>time ./integ-c
0.000000
2.940u 0.000s 0:02.94 100.0% 0+0k 0+0io 0pf+0w
qobi@ingqondo>gcc -o integ2-c -O2 -fomit-frame-pointer integ2-c.c
qobi@ingqondo>time ./integ2-c
0.000000
0.412u 0.000s 0:00.41 100.0% 0+0k 0+0io 0pf+0w
qobi@ingqondo>gcc -o integ-c-inline -O2 -fomit-frame-pointer integ-c-inline.c
qobi@ingqondo>time ./integ-c-inline
0.000000
2.584u 0.000s 0:02.59 99.6% 0+0k 0+0io 0pf+0w
qobi@ingqondo>gcc -o integ2-c-inline -O2 -fomit-frame-pointer integ2-c-inline.c
qobi@ingqondo>time ./integ2-c-inline
0.000000
0.300u 0.000s 0:00.30 100.0% 0+0k 0+0io 0pf+0w
qobi@ingqondo>gcc -o integ-c-inline -O3 -fomit-frame-pointer integ-c-inline.c
qobi@ingqondo>time ./integ-c-inline
0.000000
5.176u 0.004s 0:05.19 99.6% 0+0k 0+0io 0pf+0w
qobi@ingqondo>gcc -o integ2-c-inline -O3 -fomit-frame-pointer integ2-c-inline.c
qobi@ingqondo>time ./integ2-c-inline
0.000000
0.296u 0.000s 0:00.30 96.6% 0+0k 0+0io 0pf+0w
qobi@ingqondo>gcc -o integ-c-inline -O3 -fomit-frame-pointer -finline-limit=100000 --param large-function-insns=1000000 --param inline-unit-growth=1000000 integ-c-inline.c
qobi@ingqondo>time ./integ-c-inline
0.000000
5.228u 0.004s 0:05.24 99.6% 0+0k 0+0io 0pf+0w
qobi@ingqondo>gcc -o integ2-c-inline -O3 -fomit-frame-pointer -finline-limit=100000 --param large-function-insns=1000000 --param inline-unit-growth=1000000 integ2-c-inline.c
qobi@ingqondo>time ./integ2-c-inline
0.000000
0.296u 0.000s 0:00.29 100.0% 0+0k 0+0io 0pf+0w
qobi@ingqondo>gcc -o integ-c-always-inline -O3 -fomit-frame-pointer integ-c-always-inline.c
qobi@ingqondo>time ./integ-c-always-inline
0.000000
4.812u 0.004s 0:04.82 99.7% 0+0k 0+0io 0pf+0w
qobi@ingqondo>gcc -o integ2-c-always-inline -O3 -fomit-frame-pointer integ2-c-always-inline.c
qobi@ingqondo>time ./integ2-c-always-inline
0.000000
0.292u 0.000s 0:00.29 100.0% 0+0k 0+0io 0pf+0w
Now, of course, nothing in principle precludes a C compiler from
performing the same kind of whole-program interprocedural polyvariant
flow analysis that Stalin performs, and thus matching or exceeding
the performance of Stalin. It is just that I am not aware of such a C
compiler. If one exists, it is likely to be a research prototype,
like Stalin, and not a well known and heavily used compiler.
> Jed Davis wrote:
>> Or, in C++, it could be templated on a class having the function as a
>> member; then the compiler will, pardon my French, defunctorize the
>> result and thus specialize the integral routine for that function.
>
> Right. Barak's claim compared Scheme to C and Fortran, however, not
> to C++. Ada generics are pretty similar to C++ templates, of
> course, so Siskind's data for Ada are relevant here.
Someone else in the thread had mentioned C++; and, since C++ not only
allows that kind of specialization, but allows the programmer to more
or less require it of the compiler, that seemed worth mentioning.
>> Not entirely true. If the compiler squirrels away a serialized
>> representation of its IR along with the object code, then when invoked
>> to do linking it can perform whole-program optimizations.
>>
>> Such things have been known in commercial C compilers for some time;
>> more recently the LLVM project has been working in this area, and a
>> separate effort is known to exist for GCC.
>
> I stand corrected. Should I count the time spent waiting for
> this to be implemented in gcc as part of the compile time?
Yes.
> I'm not so sure that C++ templates can match the performance
> of whole-program interprocedural polyvariant flow analysis.
> Take a look at the Stalingrad and FADBAD++ examples on:
>
> http://www.bcl.hamilton.ie/~qobi/manuscript2008/
>
> (It's not there yet because the host is down. Hopefully it
> will be setup tomorrow.) Now admittedly this is for a different
> set of benchmarks (saddle and particle, not integ). And this is
> for a different compiler (Stalingrad, not Stalin) for a different
> language (VLAD, not Scheme but Scheme-like). Yet I give this
> example as counter-evidence to the conjecture that C++ templates
> could be as effective as whole-program interprocedural polyvariant
> flow analysis for procedure integration across higher-order
> function calls.
The site is down, yes. Hopefully, Barak can post the numbers
comparing the performance of the 11 Scheme compilers on the
saddle and particle benchmarks. I could not run the benchmarks
myself last time I looked because I still could not get Stalin
to run anything on my MacBook.
Aziz,,,
Or you can post them if you have them and it's not too much
trouble. :-)
Aziz,,,
> On Dec 13, 4:24 pm, Jed Davis <j...@panix.com> wrote:
>> Or, in C++, it could be templated on a class having the function as a
>> member; then the compiler will, pardon my French, defunctorize the
>> result and thus specialize the integral routine for that function.
>
> I'm not so sure that C++ templates can match the performance of
> whole-program interprocedural polyvariant flow analysis.
I wouldn't be surprised; C++ templates are, essentially, just an
unusually ornate macro system. (My point was just that that kind of
specialization/inlining is not a terribly esoteric feat; I hadn't
meant to imply anything in particular about the Stalin benchmarks.)
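In Scheme terms that trick is available too: a macro can splice the
integrand in at expansion time, so the call is direct rather than
through a closure. A toy sketch of my own, nothing to do with the
actual benchmarks:

  ;; Expands into an integration loop with f substituted in directly;
  ;; pass literal arguments, since operands are copied textually.
  (define-syntax integrate/inline
    (syntax-rules ()
      ((_ f a b n)
       (let* ((lo a) (hi b) (h (/ (- hi lo) n)))
         (let loop ((i 0) (sum 0.0))
           (if (= i n)
               (* sum h)
               (loop (+ i 1)
                     (+ sum (f (+ lo (* (+ i 0.5) h)))))))))))

  ;; (integrate/inline sin 0.0 1.0 1000) then contains a direct call
  ;; to sin -- crude compile-time specialization, much as a C++
  ;; template instantiation would give you.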
--
extern"C"void putchar(int);template<char a,class D>struct C{template<class K>
void z(K k){putchar(a);k.z(D());}};template<char a,class D>C<a,D>c(D){return
C<a,D>();}struct N{template<class K>void z(K){}}n;int main(){c<'J'>(c<'d'>(c<
'D'>(c<'v'>(c<'s'>(n))))).z(c<'e'>(c<'\040'>(c<'a'>(c<'i'>(c<'\n'>(23))))));}
I am curious. What does the C++ template solution look like?
--
Jens Axel Søgaard
Benchmarks from 1996 may no longer be relevant in 2007 after more than
a decade of advances in compilers, multi-core CPUs and coding
practices. Time to revisit those benchmarks. This time, it would be
interesting to compare Stalin against the code rewritten in C++, and
compiled with a couple of popular compilers (gcc and Intel icc to name
two). I might try this over the New Year break (although my C++
skills are somewhat questionable). Stay tuned, but don't hold your
breath. Looks like Stalin still holds the performance crown until it
can be proved otherwise!
I did. See my post yesterday. I reran the Scheme and C code yesterday
with the most recent Debian Etch compilers. And I wrote a Fortran
variant yesterday and ran it with the most recent Debian Etch
compiler.

I don't see how many of the things you mention (e.g., pointer-aliasing
optimizations, 3rd-party frameworks or libraries, profile-guided
optimizations, multi-core CPUs, and advances in coding practices)
have any relevance to the benchmarks and issues at hand.
> This time, it would be
> interesting to compare Stalin against the code rewritten in C++,
Other than templates, I don't see how C++ could be different from C
for these benchmarks. And see my post yesterday regarding a different
pair of benchmarks comparing C++ templates to whole-program
interprocedural polyvariant flow analysis.
> On Dec 13, 7:57 pm, Andrew Reilly <andrew-newsp...@areilly.bpc-
> users.org> wrote:
>> On Thu, 13 Dec 2007 14:47:26 -0800, William D Clinger wrote:
>> >> Not entirely true. If the compiler squirrels away a serialized
>> >> representation of its IR along with the object code, then when
>> >> invoked to do linking it can perform whole-program optimizations.
>>
>> >> Such things have been known in commercial C compilers for some time;
>> >> more recently the LLVM project has been working in this area, and a
>> >> separate effort is known to exist for GCC.
>>
>> Only if the competition was specific to GCC, rather than just "C". I
>> remember using a MIPS toolchain back in the late 80s/early 90s that had
>> a mode where it would emit some sort of half-compiled tree
>> representation, rather than object code, and the "linker" would then do
>> whole-program optimization. I never used it, because it was (a) slow,
>> and (b) I tended to compile all of my programs in a "whole program" way
>> anyway; they didn't depend much on separately compiled libraries.
>
> People made exactly that same unsubstantiated claim 10 years ago when I
> first posted these results.
What unsubstantiated claim? I wasn't saying that the MIPS C compiler
could do the *sort* of whole-program optimization that stalin does, or
even that it was in any way competitive. Just that I know for certain
that they had a mode of operation in which "object" files weren't fully
compiled, and a "linker" that did (or was supposed to) finish the job,
including some measure of inlining and constant propagation.
> Now, of course, nothing in principle precludes a C compiler from
> performing the same kind of whole-program interprocedural
> polyvariant flow analysis that Stalin performs, and thus matching or
> exceeding the performance of Stalin. It is just that I am not aware
> of such a C compiler. If one exists, it is likely to be a research
> prototype, like Stalin, and not a well known and heavily used
> compiler.
Well, I can't help on that front: I don't know what "polyvariant flow
analysis" is.
Cheers,
--
Andrew
You find the BSD licence to be an encumbrance? Wow.