question about last call optimization

14 views
Skip to first unread message

blindsearch

unread,
Jul 26, 2005, 6:47:16 PM7/26/05
to
I have a question about last call optimization using euclids algorithm
as an example. Okay the prolog literature gives the procedure as:

gcd(A,0,A).
gcd(A,B,GCD) :- B > 0, C is A mod B, gcd(B,C,GCD).

Now will prolog "last call optimize" this. It is easy to see that when
prolog gets to the goal gcd(B,C,GCD) in the last clause that there are
no alternative clauses. I am assuming then that this procedure is "last
call optimized". Is this true?

What about if we switch the order of the clauses as such:

gcd(A,B,GCD) :- B > 0, C is A mod B, gcd(B,C,GCD).
gcd(A,0,A).

Now upon reaching the last goal in the first clause, prolog can not
determine that there are no alternative clauses. Right?

Okay finally, say we write:

gcd(A,0,GCD) :- !, GCD = A.
gcd(A,B,GCD) :- T is A mod B, gcd(B,T,GCD).

Now, this should be "last call optimized" right? So in terms of speed
of execution would this last example execute equivalently to the first?
What about to the second?

One more question. There is no big difference between the three
versions of gcd. The first two do leave behind some choicepoints
whereas the last does not leave any behind. So is there any point in
writing it the last way? We're not supposed to use cuts all the time
right? or is it a good habit to get into in the event that we run into
procedures that have the potential of leaving behind a lot of choice
points?

Okay wait, one more question, Am I analyzing this too much?

Neng-Fa Zhou

unread,
Jul 26, 2005, 8:58:32 PM7/26/05
to

"blindsearch" <dpa...@cs.umbc.edu> wrote in message
news:1122418036.8...@o13g2000cwo.googlegroups.com...

>I have a question about last call optimization using euclids algorithm
> as an example. Okay the prolog literature gives the procedure as:
>
> gcd(A,0,A).
> gcd(A,B,GCD) :- B > 0, C is A mod B, gcd(B,C,GCD).
>
> Now will prolog "last call optimize" this. It is easy to see that when
> prolog gets to the goal gcd(B,C,GCD) in the last clause that there are
> no alternative clauses. I am assuming then that this procedure is "last
> call optimized". Is this true?

It is true in most Prolog systems.

> What about if we switch the order of the clauses as such:
>
> gcd(A,B,GCD) :- B > 0, C is A mod B, gcd(B,C,GCD).
> gcd(A,0,A).
>
> Now upon reaching the last goal in the first clause, prolog can not
> determine that there are no alternative clauses. Right?

Because of the existence of a choice point, the last call cannot reuse the
frame of the caller in this predicate. But if you rewrite it into the
following:

gcd(A,B,GCD) :- B > 0, C is A mod B, gcd(B,C,GCD).

gcd(A,B,GCD):-B=:=0,GCD=A.

then you still have last-call optimization in some Prolog systems (e.g.,
B-Prolog).

> Okay finally, say we write:
>
> gcd(A,0,GCD) :- !, GCD = A.
> gcd(A,B,GCD) :- T is A mod B, gcd(B,T,GCD).
>
> Now, this should be "last call optimized" right? So in terms of speed
> of execution would this last example execute equivalently to the first?
> What about to the second?
>
> One more question. There is no big difference between the three
> versions of gcd. The first two do leave behind some choicepoints
> whereas the last does not leave any behind. So is there any point in
> writing it the last way? We're not supposed to use cuts all the time
> right? or is it a good habit to get into in the event that we run into
> procedures that have the potential of leaving behind a lot of choice
> points?

The difference should be very big. The version that is deterministic and
tail-recursive is the fastest and takes the least amount of memory.

You may want to add a version that uses if-then-else into your collectition.

--nf


Bart Demoen

unread,
Jul 27, 2005, 10:51:15 AM7/27/05
to
Neng-Fa Zhou wrote:

>
> The difference should be very big. The version that is deterministic and
> tail-recursive is the fastest and takes the least amount of memory.
>
> You may want to add a version that uses if-then-else into your collectition.

The following is not a response to what Neng-Fa wrote, but fits in this thread.
Here are some timings for different systems of the following predicates (very
similar to the gcd predicates, but with simpler arithmetic)

test1(0).
test1(N) :- N > 0, M is N - 1, test1(M).

test2(N) :- N > 0, !, M is N - 1, test2(M).
test2(0).

test3(N) :-
(N > 0 ->
M is N - 1,
test3(M)
;
true
).

The queries were:

?- test1(10000000).
?- test2(10000000).
?- test3(10000000).

The results - all on the same machine, and in millesecs:

| SICStus | SWI -O | Yap | GNU | XSB | hProlog| B-Prolog
| 3.12.0 | 5.5.23 |4.4.2|1.2.13| 2.5 | 2.4.11 | 4.0#3
------+---------+--------+-----+------+-----+--------+-----------
test1 | 700 | 3980 | 487 | 1707 | 1262| 396 | 1310
test2 | 750 | 3929 | 728 | 1535 | 1479| 327 | 670
test3 | 820 | 4231 |1094 | 1549 | 784| 133 | 1290


I have mentioned the release id for each system, because
I am sure I don't have always have the most recent one.

Te figures must be taken with some salt, but I am confident they are
accurate enough to conclude that the if-then-else version
is not the most efficient in every system: it is often the slowest.
Kind of unexpected.

Cheers

Bart Demoen

Mauro Di Nuzzo

unread,
Jul 27, 2005, 12:28:18 PM7/27/05
to
Hi! Even more unexpected (oh well, for me)... I tried the so-called soft
cut:

test(N) :- N > 0 *-> M is N-1, test(M); true.

Then (I am using SWI prolog), after some tries:

test(17000). takes 0.02 secs
test(18000). takes 1.80 secs!!!
test(20000). raised an OUT OF GLOBAL STACK exception!!!!!!

Did I never understand the action-mechanism of the SOFT-CUT *-> ???
Does it generate a so much looooong goal to be proved?
Thank you.
M

"Bart Demoen" <b...@cs.kuleuven.ac.be> ha scritto nel messaggio
news:11224758...@seven.kulnet.kuleuven.ac.be...

Joachim Schimpf

unread,
Jul 27, 2005, 12:54:37 PM7/27/05
to
Bart Demoen wrote:
>
> test1(0).
> test1(N) :- N > 0, M is N - 1, test1(M).
>
> test2(N) :- N > 0, !, M is N - 1, test2(M).
> test2(0).
>
> test3(N) :-
> (N > 0 ->
> M is N - 1,
> test3(M)
> ;
> true
> ).


I added a few other variants:

test4(0) :- !.
test4(N) :- N > 0, M is N - 1, test4(M).

test5(0) :- !.
test5(N) :- M is N - 1, test5(M).

test6(0).
test6(N) :- integer(N), N>0, M is N - 1, test6(M).


Results for ECLiPSe 5.8 (probably different machine than Bart's)
------+--------
test1 | 1320
test2 | 1320
test3 | 1510
test4 | 1330
test5 | 690
test6 | 530


--
Joachim Schimpf / phone: +44 20 7594 8187
IC-Parc / mailto:J.Sc...@imperial.ac.uk
Imperial College London / http://www.icparc.ic.ac.uk/eclipse

Neng-Fa Zhou

unread,
Jul 28, 2005, 1:31:24 AM7/28/05
to
Bart,

The version of B-Prolog you used is more than five years old. If I remember
it correctly, if-then-else is not inline expanded in version 4.0. Try the
latest version and you will get much better result.

Cheers,
Neng-Fa

"Bart Demoen" <b...@cs.kuleuven.ac.be> wrote in message
news:11224758...@seven.kulnet.kuleuven.ac.be...

Neng-Fa Zhou

unread,
Jul 28, 2005, 2:04:21 AM7/28/05
to
Here are the times needed to run the different versions on a Windows XP
machine:

BP 4.0 #3 BP 6.7 #3 Eclipse 5.8 #96

test1 687 860 1146
test2 359 359 1156
test3 625 750 1782

I was wrong. It seems that if-then-else was inline expanded in BProlog
version 4.0 for this particular case. But on Linux, BP 6.7 should outperform
4.0 because of threaded switch in the emulator.

Cheers,
Neng-Fa

"Neng-Fa Zhou" <nz...@acm.org> wrote in message
news:42e86da9$1...@news.unimelb.edu.au...

zu61

unread,
Jul 28, 2005, 3:08:39 AM7/28/05
to
Hello = 你們好
test0(N) :- N > 0 ,M = N-1, test(M); true.

test1(0).
test1(N) :- N > 0, M = N - 1, test1(M).

test2(N) :- N > 0, !, M = N - 1, test2(M).
test2(0).

test3(N) :- N > 0 , M = N - 1, test3(M); true.

test4(0) :- !.
test4(N) :- N > 0, M = N - 1, test4(M).

test5(0) :- !.
test5(N) :- M = N - 1, test5(M).

test6(0).
test6(N) :- N>0, M = N - 1, test6(M).

By Turbo prolog 20

N=0, N=1, N=2, N=3
test0(N) 8 , 13 , 18 , 23 step
test1(N) 4 , 10 , 16 , 22 step
test2(N) 6 , 11 , 16 , 21 step
test3(N) 8 , 13 , 18 , 23 step
test4(N) 4 , 10 , 16 , 22 step
test5(N) 4 , 09 , 14 , 19 step
test6(N) 4 , 10 , 16 , 22 step
all the number is step

Bye

zu61

unread,
Jul 28, 2005, 10:42:03 AM7/28/05
to
Hello = 你們好
test5 have cut, but test 7 have not

test0(N) :- N > 0 ,M = N-1, test(M); true.

test1(0).
test1(N) :- N > 0, M = N - 1, test1(M).

test2(N) :- N > 0, !, M = N - 1, test2(M).
test2(0).

test3(N) :- N > 0 , M = N - 1, test3(M); true.

test4(0) :- !.
test4(N) :- N > 0, M = N - 1, test4(M).

test5(0) :- !.
test5(N) :- M = N - 1, test5(M).

test6(0).
test6(N) :- N>0, M = N - 1, test6(M).

test7(0). % That is a best
test7(N) :- M = N - 1, test7(M).


at Turbo prolog 20

N=0,N=1,N=2,N=3


test0(N) 8 , 13 , 18 , 23 step
test1(N) 4 , 10 , 16 , 22 step
test2(N) 6 , 11 , 16 , 21 step
test3(N) 8 , 13 , 18 , 23 step
test4(N) 4 , 10 , 16 , 22 step

test5(N) 4 , 09 , 14 , 19 step that is a best


test6(N) 4 , 10 , 16 , 22 step

test7(N) 4 , 09 , 14 , 19 step that is a best

Bart Demoen

unread,
Jul 29, 2005, 2:58:39 PM7/29/05
to
Joachim Schimpf wrote:

> test4(0) :- !.
> test4(N) :- N > 0, M is N - 1, test4(M).
>
> test5(0) :- !.
> test5(N) :- M is N - 1, test5(M).
>
> test6(0).
> test6(N) :- integer(N), N>0, M is N - 1, test6(M).
>
>
> Results for ECLiPSe 5.8 (probably different machine than Bart's)
> ------+--------

[...]


> test4 | 1330
> test5 | 690
> test6 | 530

Joachim,

Please tell us more about this difference in timings ... the decrease in
particular.

Cheers

Bart Demoen

Joachim Schimpf

unread,
Aug 1, 2005, 11:33:03 AM8/1/05
to

Obviously, test5 is faster than test4 because N>0 is omitted.
test6 is as fast as test5 because the (integer(N), N>0) test
is subsumed by the indexing instruction.

Btw, the reason that test3 (the if-then-else version) is slower
than test4 is that it creates an environment (at least in ECLiPSe).

-- Joachim

Nameless

unread,
Aug 3, 2005, 6:44:14 AM8/3/05
to
"Joachim Schimpf" <j.sc...@imperial.ac.uk> wrote in message
news:dclfbf$m92$1...@jura.cc.ic.ac.uk...

> Bart Demoen wrote:
>> Joachim Schimpf wrote:
>>
>>> test4(0) :- !.
>>> test4(N) :- N > 0, M is N - 1, test4(M).
>>>
>>> test5(0) :- !.
>>> test5(N) :- M is N - 1, test5(M).
>>>
>>> test6(0).
>>> test6(N) :- integer(N), N>0, M is N - 1, test6(M).
>>>
>>>
>>> Results for ECLiPSe 5.8 (probably different machine than Bart's)
>>> ------+--------
>>
>> [...]
>>
>>> test4 | 1330
>>> test5 | 690
>>> test6 | 530
>>
>>
>> Joachim,
>>
>> Please tell us more about this difference in timings ... the decrease in
>> particular.
>
> Obviously, test5 is faster than test4 because N>0 is omitted.
> test6 is as fast as test5 because the (integer(N), N>0) test
> is subsumed by the indexing instruction.
[snipped]

> Btw, the reason that test3 (the if-then-else version) is slower
> than test4 is that it creates an environment (at least in ECLiPSe).

What do mean by 'creates an environment'?

--
Mail sent to this email address is deleted unread
on the server. Please send replies to the newsgroup.


Joachim Schimpf

unread,
Aug 3, 2005, 8:55:07 AM8/3/05
to
Nameless wrote:
> What do mean by 'creates an environment'?
>

Terminology from the WAM (Warren Abstract Machine): a stack
frame that stores a clause's variables when they can't be
stored in the machine's registers.

-- Joachim

Nameless

unread,
Aug 4, 2005, 2:44:55 PM8/4/05
to
"Joachim Schimpf" wrote in message
news:dcqerb$5ij$1...@jura.cc.ic.ac.uk...

Ok, but does this explain why there is up to 50% increase
in execution time vis-a-vis clauses with cuts?

Note: this 50% increase has resulted in that I have more or
less ceased using if-then-else constructs in ECLiPSe. :(

Joachim Schimpf

unread,
Aug 5, 2005, 7:33:54 AM8/5/05
to
Nameless wrote:
>
> Ok, but does this explain why there is up to 50% increase
> in execution time vis-a-vis clauses with cuts?

The variants which are comparable are test2 and test3 (these
differ only in that test2 uses two clauses with cut, while
test3 uses if-then-else). Here you have a 15% increase, not 50.

Actually, i see now that my explanation involving environments
was wrong, sorry. The slowdown is due to test2 using indexing,
where test3 creates a choicepoint.


>
> Note: this 50% increase has resulted in that I have more or
> less ceased using if-then-else constructs in ECLiPSe. :(
>

Apart from the fact that there is no 50% increase (unless you
compare versions with slightly different semantics like test4
and test5), such a conclusion would be wrong in general.
Whether an if-then-else version is faster or slower than a
multi-clause-cut version depends on a number of factors, some
of which are not obvious even when you know something about
WAM implementations, e.g.
- presence of indexable information in the condition
- whether the condition is simple or complex
- number of arguments of the clause
- number of permanent variables in the clause

The good news is that in my plans for a new ECLiPSe compiler,
I'm making sure that both versions have the same performance.
The bad news is, I don't know if we'll ever get round to
implementing it :-/


-- Joachim

Nameless

unread,
Aug 5, 2005, 11:04:30 AM8/5/05
to
"Joachim Schimpf" <j.sc...@imperial.ac.uk> wrote in message
news:dcvir2$lfe$1...@jura.cc.ic.ac.uk...

> Nameless wrote:
>>
>> Ok, but does this explain why there is up to 50% increase
>> in execution time vis-a-vis clauses with cuts?
>
> The variants which are comparable are test2 and test3 (these
> differ only in that test2 uses two clauses with cut, while
> test3 uses if-then-else). Here you have a 15% increase, not 50.
>
> Actually, i see now that my explanation involving environments
> was wrong, sorry. The slowdown is due to test2 using indexing,
> where test3 creates a choicepoint.
>>
>> Note: this 50% increase has resulted in that I have more or
>> less ceased using if-then-else constructs in ECLiPSe. :(
>>
> Apart from the fact that there is no 50% increase (unless you
> compare versions with slightly different semantics like test4
> and test5), such a conclusion would be wrong in general.

Well, Joachim, I did say 'up to'. Besides, test3 is just
about as simple as it can get! You know, I can send you an
example that does exhibit 50% increase, to show you that I
am not exaggerating.

> Whether an if-then-else version is faster or slower than a
> multi-clause-cut version depends on a number of factors, some
> of which are not obvious even when you know something about
> WAM implementations, e.g.
> - presence of indexable information in the condition
> - whether the condition is simple or complex
> - number of arguments of the clause
> - number of permanent variables in the clause

I concur.

> The good news is that in my plans for a new ECLiPSe compiler,
> I'm making sure that both versions have the same performance.
> The bad news is, I don't know if we'll ever get round to
> implementing it :-/

Please do try (harder)! ;)

Joachim Schimpf

unread,
Aug 5, 2005, 1:53:04 PM8/5/05
to
Nameless wrote:
>
> Well, Joachim, I did say 'up to'. Besides, test3 is just
> about as simple as it can get!

Precisely because it is so simple it should exhibit the worst
case in terms of slowdown.

> You know, I can send you an
> example that does exhibit 50% increase, to show you that I
> am not exaggerating.

Please do. But I suspect you can only get that when you have more
than two clauses vs. an if-then-else cascade. In that case you
can in fact get arbitrary slowdown because if-then-else cascades
are not indexed in Eclipse.


-- Joachim

Nameless

unread,
Aug 5, 2005, 3:10:01 PM8/5/05
to
"Joachim Schimpf" <j.sc...@imperial.ac.uk> wrote in message
news:dd0920$rta$1...@jura.cc.ic.ac.uk...

> Nameless wrote:
>>
>> Well, Joachim, I did say 'up to'. Besides, test3 is just
>> about as simple as it can get!
>
> Precisely because it is so simple it should exhibit the worst
> case in terms of slowdown.

Hmm, I'll have to give that some thought next week because
my grey cells are clearly exhibiting signs of the week's
wear and tear.

> > You know, I can send you an
>> example that does exhibit 50% increase, to show you that I
>> am not exaggerating.
>
> Please do. But I suspect you can only get that when you have
> more than two clauses vs. an if-then-else cascade. In that
> case you can in fact get arbitrary slowdown because if-then-
> else cascades are not indexed in Eclipse.

Will do, and thanks for the insight on the workings of
ECLiPSe's if-then-else. Have a nice weekend.

Mostowski Collapse

unread,
Aug 17, 2020, 4:32:32 PM8/17/20
to
The internet is a wonderful wayback machine.
Was just redoing some of the benchmarks below.

Test SWI 8.3.5 YAP 6.3.0
test1 676 359
test2 750 422
test2 750 343

So there was some improvement already in the past.
The average ratio between SWI and YAP went down

from 5.3 and 1.9. It would be quite a gas if this
sound barrier could be broken.

Mostowski Collapse

unread,
Aug 18, 2020, 7:28:39 AM8/18/20
to
Jan W. wrote on SWI-Prolog discourse, which
could indicate that YAP 6.5.0 can’t do last
call optimization anymore:

> The GIT version of YAP (6.5.0) crashes
> on all three tests

Maybe YAP 6.5.0 shows good results in
tail recursion examples because it doesn’t
have last call optimization anymore and you
were lucky to run the test cases since
you had enough stack space?

Based on these doubts, how do these benchmarks
here workout with YAP 6.3.0? There was once
a pre-compiled .exe on the internet
somewhere.

Mostowski Collapse

unread,
Aug 18, 2020, 7:32:11 AM8/18/20
to
The second reference to test cases, refers
to the test campaign that Jan W. recently
did SWI versus YAP on SWI-Prolog discourse.
Reply all
Reply to author
Forward
0 new messages