I would like to announce plc, a proof-of-concept Prolog compiler that
translates regular Prolog programs to standalone, native-code
executables. It is able to achieve significantly better performance
than conventional Prolog implementations, typically at least 4-5x
faster than Sicstus and 10-20x faster than SWI Prolog. It was
nonetheless a rather limited effort to develop.
plc currently supports a very reasonable set of Prolog features:
- Core features: predicates, variables and atoms; backtracking and unification
- Compound terms (functors)
- Syntactic sugar to parse/print lists to/from compound terms
- Unify (=) and failure-to-unify (\=)
- Integers, arithmetic (is, +, -) and relations (=:=, =\=, <, >, =<, >=)
- Negation as failure (not)
- Cut (!)
- Built-in predicates: true, fail, repeat, write, nl
There are three important aspects about the approach taken by plc:
1. plc is developed as a front-end for the OCaml compiler; it was
itself written in OCaml using the camlp4 macro system. This point is
pragmatic, but it is perhaps the most important enabler. plc can take
advantage of OCaml's existing high-performance native-code compiler
(which supports most common architectures), and since it translates
Prolog to a high-level functional language, a number of things come
"for free", most notably garbage collection and tail-call optimization.
Thanks to the integration with camlp4, semantics errors encountered by
the OCaml compiler can be reported using their original source code
positions.
2. plc encodes Prolog's goal solving process as a depth-first search
using continuation-passing style. This means that every predicate is
implemented as a higher-order function that calls a given argument
function with every encountered solution (in case there are no
solutions, the argument function is never called). As such, the
argument function represents a continuation for every encountered
solution, and backtracking occurs implicitly, when the continuation is
not called. A disjunction of two predicates is translated to the
sequence of first calling the first predicate's function and then the
second predicate's function, while a conjunction is translated to a
call to the first predicate's function with the second predicate's
function as a part of the continuation argument. To stop at the first
solution or to "cut" the consideration of backtracking alternatives, an
exception is raised.
3. plc precompiles different predicates versions according to the
possible state (open/closed respectively uninstantiated/instantiated
respectively free/ground respectively out/in) of each argument. So
there are in fact a number of higher-order functions that implement
different versions of the same predicate. [Mercury apparently employs a
very similar technique when it considers different "modes" for a
predicate.] These functions have different signatures and each version
is written with knowledge of each variable's state, so it can in turn
call the specific versions of its subgoals, according to the state of
the arguments. As such, variables are excluded as first-class values.
This is perhaps the most debatable design choice: while most programs
do not require first-class variables and benefit from this
optimization, some Prolog idioms rely on this feature (e.g.
open/difference lists).
DOWNLOAD / USAGE
plc requires OCaml 3.10.0 or higher (previous versions of OCaml will
not work for accidental reasons).
The plc code is available at the following URL. Use WebDAV or
Subversion to download all files at once.
http://ssel.vub.ac.be/svn-gen/bdefrain/plc/trunk/
A Makefile is included for Unix platforms. Building plc is done through
"make plc.opt". To get started, look at "demo.pl" and try "make demo"
(this will use the bytecode compiler). You can also look at
"nqueens.cpp" and try "make nqueens.opt" for an example that uses the
native-code compiler. To see the generated Ocaml code, execute "make
<basename>.output".
PERFORMANCE
To evaluate the performance of plc-compiled programs, I conducted a
benchmark with a typical Prolog version of nqueens (what else) on a
Debian/Linux (etch) machine with an Intel Pentium 4 (2.66GHz) processor
and 1G RAM. plc employed OCaml 3.10.0, the other contenders were:
- SWI-Prolog version 5.6.14 for i386 (from Debian unstable)
- SICStus 4.0.1 (x86-linux-glibc2.3)
- Mercury, version 0.13.1, configured for i686-pc-linux-gnu (default
low-level C backend)
I included Mercury after a suggestion by Markus Triska. Although the
Mercury language is neither a subset nor a superset of Prolog, it's
possible to convert most Prolog code to Mercury with minimal effort
(see "nqueens.cpp" and "nqueens_mercury.m" to see exactly how much
effort). Moreover, Mercury is a valid reference point since it seems to
apply similar optimizations (and more), and it is situated in the same
area of integrating ideas from logic programming and (statically-typed)
functional programming. (Note that I had no prior experience with
Mercury so I hope my adaptations are representative.)
Each approach was used to solve the nqueens problem of size 12, six
times in a row. Mercury and plc started with a compiled binary, while
swipl and sicstus started from the original source file. This hardly
has an influence however, since the compilation cost is very small
(about 0.25s for plc and 0.75s for Mercury to produce a binary that can
solve nqueens for any n), and swipl and sicstus compile faster anyway
(both report 0 msec), so it is doubtful they would have benefited from
loading some intermediate code file anyway. Naturally, all approaches
produce the same solutions in the same order, as required by the Prolog
semantics (I do not know if the Mercury semantics guarantees order, but
it was the same as Prolog's). Below are the averages and standard
deviation of the sample of 6 runs.
execution time speedup over swipl
avg stddev avg stddev
swipl 8457.46 177.32 1.00 0.02
sicstus 1368.21 42.95 6.19 0.19
mercury 270.76 25.81 31.49 3.19
plc 281.45 0.78 30.05 0.08
The results show that plc and Mercury clearly outperform SWI Prolog and
Sicstus for this problem. The comparison between plc and Mercury is
interesting: although Mercury is 11s faster on average, its performance
is less consistent, as indicated by the high stddev. (In fact, the
Mercury times are negatively skewed, and plc is a little faster when
the medians are compared instead of averages.) It is unclear where this
variation comes from. The Mercury binary also seems relatively large
for such a small program (1.8M versus 104K for plc, both after
strip-ping), perhaps this is related.
Despite these issues, the Mercury approach seems promising, especially
because of the availability of other backends (which I did not try).
The main difference with (current) plc is the static typing of terms,
whereas plc only employs one general term type to match Prolog
semantics (i.e. the goal "foo is 1+1" is valid, but always fails).
Consequently, plc intends to run Prolog programs without modification,
whereas Mercury is essentially a new language. Nevertheless, applying a
typing discipline such as Mercury's might be beneficial for more than
just performance. In absence of that, plc seems to provide some
interesting middle ground.
Kind regards,
Bruno De Fraine
> execution time speedup over swipl
> avg stddev avg stddev
> swipl 8457.46 177.32 1.00 0.02
> sicstus 1368.21 42.95 6.19 0.19
> mercury 270.76 25.81 31.49 3.19
> plc 281.45 0.78 30.05 0.08
Plc seems a very interesting approach, and I will certainly have a closer look
at it. I had a quick look (and a short experiment) with your nqueens
benchmark and have the following remarks:
1) noattack is a leaf-predicate (and single clause); I would't be surprised
that Mercury inlines it, and plc perhaps does the same; Prolog compilers
like SWI and SICStus typically don't (one reason being that one is allowed
to interactively load a different definition of noattack - keeping this consistent
is possible and not difficult, but beyond what small teams usually want to go into)
2) nodiag benefits from second-argument indexing; again, typically, Prolog systems
only do first argument indexing, while Mercury does "all"-argument indexing; and
I assume plc does too, because it would otherwise be difficult to approach the speed
of Mercury
Now it certainly is a good point for plc if it does inlining and all-argument indexing;
but if one wants to compare implementation technologies, it would be good if you could
factor those optimisations out - I would certainly require that if I were a referee
of a forthcoming paper on plc :-)
>
> The results show that plc and Mercury clearly outperform SWI Prolog and
> Sicstus for this problem. The comparison between plc and Mercury is
> interesting: although Mercury is 11s faster on average, its performance
> is less consistent, as indicated by the high stddev. (In fact, the
> Mercury times are negatively skewed, and plc is a little faster when the
> medians are compared instead of averages.) It is unclear where this
> variation comes from.
You have installed Mercury with the Boehm-collector I presume ? If that's the case,
it might be (part of) the explanation of the variance in timings.
Also, if it is the case, try installing it with the native Mercury collector (which
might not work at the moment, but it is not needed for nqueens anyway).
I am not saying this is the "right" configuration to do the bencmarking in, but it
might also give some insight in underlying technology issues.
> The Mercury binary also seems relatively large for
> such a small program (1.8M versus 104K for plc, both after strip-ping),
> perhaps this is related.
It is a known problem with Mercury.
Two more things: try comparing with the fastest (traditional) Prolog compilers, e.g.
include Yap in your suite and forget about SWI (the only reason to compare to SWI is that
you are guaranteed to be able to show big speedups - but SWI was not build for speed).
And try finding the compiling-to-C implementation of Ciao: it has also reported performance
close to Mercury's.
Finally, you seem located some 20km away from us - can we entice you to come over and give
a seminar about it ?
Cheers
Bart Demoen
> Plc seems a very interesting approach, and I will certainly have a closer look
> at it.
Thanks.
> I had a quick look (and a short experiment) with your nqueens
> benchmark and have the following remarks:
>
> 1) noattack is a leaf-predicate (and single clause); I would't be surprised
> that Mercury inlines it, and plc perhaps does the same; Prolog compilers
> like SWI and SICStus typically don't (one reason being that one is allowed
> to interactively load a different definition of noattack - keeping this
> consistent
> is possible and not difficult, but beyond what small teams usually want
> to go into)
plc itself does only inline the builtin predicates. However, the
underlying OCaml compiler is known to inline small functions defined in
the same module. This is another case where plc could benefit from
OCaml.
Certain dynamic (reflection) features of Prolog are certainly
complicated by a compiled approach such as plc, yes. This does not mean
plc is excluded from interactive use; for example, I use plc-compiled
programs from the OCaml toplevel, and it is probably possible to build
a Prolog toplevel on top of it. But these things require further
investigation.
> 2) nodiag benefits from second-argument indexing; again, typically,
> Prolog systems
> only do first argument indexing, while Mercury does "all"-argument
> indexing; and
> I assume plc does too, because it would otherwise be difficult to
> approach the speed
> of Mercury
Why only first argument? This seems rather arbitrary if this means that
the "likes" relation defined as:
likes(a,X) :- sweet(X).
likes(b,X) :- pretty(X).
is more efficient than defined as:
islikedby(X,a) :- sweet(X).
islikedby(X,b) :- pretty(X).
Anyway, in plc the head "foo(a,b)" becomes the parallel pattern matching:
match (_arg0, _arg1) with
| (A,B) -> ...body...
| _ -> ()
This can be fast, since patterns such as (A,B) are static, and the
OCaml compiler can exploit the knowledge of the concrete thing it must
match against to produce efficient code. [As an aside, the use of
pattern matching is one of the only two conscious optimizations in plc
(the first one is the usage of continuation-passing style instead of
e.g. lazy lists to generate solutions).]
There is still room for improvement though: if a predicate has two
exclusive rules, e.g. one with head "foo(a)" and the other with head
"foo(b)", then this now becomes the sequence:
(match _arg0 with
| A -> ...body1...
| _ -> ();
match _arg0 with
| B -> ...body2...
| _ -> ())
If plc were to detect that the rules are exclusive, it could generate a
single match:
match _arg0 with
| A -> ...body1...
| B -> ...body2...
| _ -> ()
I suspect this will produce even more efficient code: the OCaml can try
to minimize the number of cmp's and jmp's based on the representation
it chose for A, B, etc.
> Now it certainly is a good point for plc if it does inlining and
> all-argument indexing;
> but if one wants to compare implementation technologies, it would be
> good if you could
> factor those optimisations out - I would certainly require that if I
> were a referee
> of a forthcoming paper on plc :-)
Inlining by the OCaml compiler can be turned off with the "-inline 0"
flag. (However, it does not seem to make a significant difference, see
below.) There is no option to make pattern matching slow. :-)
>> The results show that plc and Mercury clearly outperform SWI Prolog and
>> Sicstus for this problem. The comparison between plc and Mercury is
>> interesting: although Mercury is 11s faster on average, its performance
>> is less consistent, as indicated by the high stddev. (In fact, the
>> Mercury times are negatively skewed, and plc is a little faster when
>> the medians are compared instead of averages.) It is unclear where this
>> variation comes from.
>
> You have installed Mercury with the Boehm-collector I presume ? If
> that's the case,
> it might be (part of) the explanation of the variance in timings.
> Also, if it is the case, try installing it with the native Mercury
> collector (which
> might not work at the moment, but it is not needed for nqueens anyway).
> I am not saying this is the "right" configuration to do the bencmarking
> in, but it
> might also give some insight in underlying technology issues.
I installed a default Mercury (i.e. a source distribution and
./configure --prefix=$HOME; make; make install). Do you have some
pointers on how to enable the things you talk about? Are they command
line options or configure flags?
>> The Mercury binary also seems relatively large for such a small program
>> (1.8M versus 104K for plc, both after strip-ping), perhaps this is
>> related.
>
> It is a known problem with Mercury.
>
> Two more things: try comparing with the fastest (traditional) Prolog
> compilers, e.g.
> include Yap in your suite and forget about SWI (the only reason to
> compare to SWI is that
> you are guaranteed to be able to show big speedups - but SWI was not
> build for speed).
> And try finding the compiling-to-C implementation of Ciao: it has also
> reported performance
> close to Mercury's.
This is an updated table that includes YAP version Yap-5.1.1, and
plc/ocamlopt with -inline 0 and -inline 100 (the default is -inline 1):
execution time speedup over swipl
avg stddev avg stddev
swipl 8457.46 177.32 1.00 0.02
sicstus 1368.21 42.95 6.19 0.19
yap 715.81 7.46 11.82 0.12
mercury 270.76 25.81 31.49 3.19
plc 281.45 0.78 30.05 0.08
plc0 281.25 0.99
plc100 281.10 0.70
Bye,
Bruno
>> You have installed Mercury with the Boehm-collector I presume ? If that's
>> the case,
>> it might be (part of) the explanation of the variance in timings.
>> Also, if it is the case, try installing it with the native Mercury
>> collector (which
>> might not work at the moment, but it is not needed for nqueens anyway).
>> I am not saying this is the "right" configuration to do the bencmarking in,
>> but it
>> might also give some insight in underlying technology issues.
>
> I installed a default Mercury (i.e. a source distribution and ./configure
> --prefix=$HOME; make; make install). Do you have some pointers on how to
> enable the things you talk about? Are they command line options or configure
> flags?
Configuration flags.
The native collector has not (yet) been completed for the low-level C
backend. IIRC, in Mercury 0.13.x it's not even in a runnable state.
Rather than attempting to use the native collector, I suggest installing one
of the no-gc grades, e.g. asm_fast, and using that.
The no-gc grades are enabled by the configuration flag `--enable-nogc-grades'.
The agc grades, i.e. the native collector are enabled by the configuration
flag `--enable-agc-grades'. You can choose exactly which grades to install
by using the flag `--enable-libgrades'. See configure --help for details.
(This last one might save you a bit of time.)
Cheers,
Julien.