Tail recursion optimization in Python


Nils Bruin

Sep 10, 2010, 2:43:07 PM
to sage-flame
Surprisingly, one can write a decorator in python that optimizes
functions that use tail-recursion (and no other recursion!) to have a
flat call-stack. There have been many versions, all well-documented
(use google). The one to bookmark is a cython version
http://fiber-space.de/wordpress/?p=349 (by Kay Schluehr as far as I
can tell).

I tried his code and it works quite nicely. I changed his "cdef int
CONTINUE" to "cdef object CONTINUE", because the "self.CONTINUE =
id(object())" would occasionally give overflow errors, and the
subsequent "result == self.CONTINUE" comparison was slower than the
"result is self.CONTINUE" now used.

Results:
- a @tail_recursive decorated function does not have its tail
recursion limited by the 999 stack frames (or whatever bound one
configures in python)
- for both "factorial" and for "silly odd/even test", the decorated
version is considerably faster for calls requiring depth 900
- in all cases, the iterative version is much faster
- for small depths, the undecorated version is quite a bit faster
- indiscriminate use of the decorator invalidates your code. In
particular, a function that mainly calls itself in the tail but also
calls itself occasionally in the body is invalidated, probably
eliminating the one really useful application of tail-call
optimization. (Tail-call is a property of the call-site, not the
function called)

Anyway, when I stumbled into this I thought it worthwhile to remember
its existence and perhaps other people might enjoy it too.

Oh, and it's good fodder for people wanting to point out that LISP
compilers do this stuff automatically and correctly all the time.

For reference, the transcript of my experiment:

sage: %cython
sage: cdef class tail_recursive:
...     cdef int firstcall
...     cdef object CONTINUE
...     cdef object argskwd
...     cdef object func
...
...     def __init__(self, func):
...         self.func = func
...         self.firstcall = True
...         self.CONTINUE = object()
...
...     def __call__(self, *args, **kwd):
...         if self.firstcall:
...             self.firstcall = False
...             try:
...                 while True:
...                     result = self.func(*args, **kwd)
...                     if result is self.CONTINUE: # update arguments
...                         args, kwd = self.argskwd
...                     else: # last call
...                         return result
...             finally:
...                 self.firstcall = True
...         else: # return the arguments of the tail call
...             self.argskwd = args, kwd
...             return self.CONTINUE
sage: def fact_iter(n):
...     r = 1
...     for i in srange(1, n+1):
...         r *= i
...     return r
...
sage: @tail_recursive
... def fact_decor(n, acc=1):
...     if n == 1:
...         return acc
...     else:
...         return fact_decor(n-1, acc=n*acc)
...
sage: def fact_undecor(n, acc=1):
...     if n == 1:
...         return acc
...     else:
...         return fact_undecor(n-1, acc=n*acc)
sage: fact_iter(900) == fact_decor(900) == fact_undecor(900)
True
sage: timeit("fact_iter(900)")
625 loops, best of 3: 643 µs per loop
sage: timeit("fact_decor(900)")
125 loops, best of 3: 2.02 ms per loop
sage: timeit("fact_undecor(900)")
125 loops, best of 3: 3.22 ms per loop
sage: def odd_iter(n):
...     T = False
...     while n > 0:
...         T = not T
...         n -= 1
...     return T
...
sage: def odd_undecor(n):
...     if n == 0:
...         return False
...     else:
...         return even_undecor(n-1)
...
sage: def even_undecor(n):
...     if n == 0:
...         return True
...     else:
...         return odd_undecor(n-1)
...
sage: @tail_recursive
... def odd_decor(n):
...     if n == 0:
...         return False
...     else:
...         return even_decor(n-1)
...
sage: # you get valid results with this one decorated too, but by
sage: # decorating only one, we get less overhead
sage: def even_decor(n):
...     if n == 0:
...         return True
...     else:
...         return odd_decor(n-1)
sage: [odd_iter(n) for n in [1..10]] == [odd_decor(n) for n in [1..10]] == [odd_undecor(n) for n in [1..10]]
True
sage: timeit("odd_iter(900)")
625 loops, best of 3: 386 µs per loop
sage: timeit("odd_decor(900)")
625 loops, best of 3: 749 µs per loop
sage: timeit("odd_undecor(900)")
625 loops, best of 3: 1.01 ms per loop
sage: # break-even point on this hardware:
sage: timeit("odd_decor(212)")
625 loops, best of 3: 171 µs per loop
sage: timeit("odd_undecor(212)")
625 loops, best of 3: 170 µs per loop

rjf

Sep 12, 2010, 1:06:36 PM
to sage-flame
Hardly flammable.
It is not required of Common Lisp compilers that they detect tail
recursion and remove it.
It IS required, last I looked, of Scheme systems. Incidentally, it
can be done with interpreters too.
If some idea makes Python programs faster or more economical of stack,
that seems
entirely plausible for sage-devel.
I assume you stumbled onto this from some python discussion somewhere
else.
RJF


On Sep 10, 11:43 am, Nils Bruin <bruin.n...@gmail.com> wrote:
> Surprisingly, one can write a decorator in python that optimizes
> functions that use tail-recursion (and no other recursion!)  to have a
> flat call-stack. There have been many versions, all well-documented
> (use google). The one to bookmark is a cython versionhttp://fiber-space.de/wordpress/?p=349(by Kay Schluehr as far as I
> can tell).
...

Bill Hart

Sep 12, 2010, 1:12:46 PM
to sage-...@googlegroups.com
That's right. It is required of Scheme. This is to prevent space
leaks, which would otherwise be common in Scheme.

One can simulate it in C with gotos. Instead of recursively calling
the function, one puts a jump back to the start of the function.
Naturally one needs to write the function with this in mind.

Interpreters which have a recursive eval layer in some kind of REPL
should do this. For example the consequent and alternative statements
in an "if" statement may be considered in tail position. Those
statements might simply call eval again, or they might optimise the
tail recursion.

I don't know if Python's interpreter does this or not.
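As a toy illustration (a hypothetical mini-interpreter over tuples, not Python's actual eval loop): an evaluator for nodes of the form ("lit", value) and ("if", cond, then, else) can loop on the branch in tail position instead of recursing on it, so arbitrarily deep chains of ifs use constant stack:

```python
# Toy AST nodes: ("lit", value) or ("if", cond, then, else).
def teval(node):
    # The two branches of "if" are in tail position, so instead of
    # recursing on them we rebind `node` and loop: the stack stays
    # flat no matter how deeply the ifs are nested.
    while True:
        tag = node[0]
        if tag == "lit":
            return node[1]
        elif tag == "if":
            # The condition is NOT in tail position: ordinary recursion.
            node = node[2] if teval(node[1]) else node[3]
        else:
            raise ValueError("unknown node: %r" % (tag,))
```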

It is difficult to have a C compiler automatically detect tail
recursion and have it correctly and efficiently optimise for it,
probably because of pointers I suppose, as with everything else that
is broken about C.

Bill.


Nils Bruin

Sep 12, 2010, 11:09:04 PM
to sage-flame
On Sep 12, 10:12 am, Bill Hart <goodwillh...@googlemail.com> wrote:
> Interpreters which have a recursive eval layer in some kind of REPL
> should do this. For example the consequent and alternative statements
> in an "if" statement may be considered in tail position. Those
> statements might simply call eval again, or they might optimise the
> tail recursion.
>
> I don't know if Python's interpreter does this or not.

No, it doesn't. Guido has expressed that he isn't interested in tail
call optimization. Other people then started thinking about ways
around this, and one of the things they came up with was the decorator
given here. Another solution is "stackless" python. The standard
CPython interpreter stores state on the C-stack. However, the Python
call frames are stored elsewhere, so one shouldn't need the C-stack as
well. In "stackless" python there is no info on the C-stack anymore,
and as a result it is much easier to implement things like tail call
optimization, continuations and coroutines.

A (from the python perspective somewhat contrived) example of a
routine that is naturally expressed recursively and would benefit from
tail call optimization is a membership test in unbalanced binary
trees, such as LISP list structures. If we express the tree as nested
2-element tuples (i.e., car(L) = L[0] and cdr(L) = L[1]):

def is_in(element, L):
    if not L:  # test if L is an empty tuple
        return False
    elif element == L[0]:
        return True
    elif isinstance(L[0], tuple) and is_in(element, L[0]):
        return True
    else:
        return is_in(element, L[1])

A reasonable test-case is:

L = (1, (0, ()))
for i in range(1000):
    L = ((0, 0), L)

The call is_in(1, L) should easily return True, but the level-999
limitation on the Python call-stack means one gets an exception. Yes,
one can change that limit, but then we simply change the 1000 as well.
The (each time shallow) non-tail recursions "is_in(element, L[0])"
mean that the decorator is not applicable.
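To see the invalidation concretely, here is a pure-Python rendering of the trampoline decorator from the first post (my translation of the Cython version; is_in_plain is an undecorated copy added here for comparison). The inner non-tail call returns the truthy CONTINUE sentinel instead of a real boolean, so the decorated version reports membership that isn't there:

```python
class tail_recursive:
    # Pure-Python version of the Cython trampoline decorator above.
    def __init__(self, func):
        self.func = func
        self.firstcall = True
        self.CONTINUE = object()   # unique sentinel

    def __call__(self, *args, **kwd):
        if self.firstcall:
            self.firstcall = False
            try:
                while True:
                    result = self.func(*args, **kwd)
                    if result is self.CONTINUE:  # update arguments
                        args, kwd = self.argskwd
                    else:                        # last call
                        return result
            finally:
                self.firstcall = True
        else:  # return the arguments of the tail call
            self.argskwd = args, kwd
            return self.CONTINUE

@tail_recursive
def is_in(element, L):
    if not L:
        return False
    elif element == L[0]:
        return True
    elif isinstance(L[0], tuple) and is_in(element, L[0]):
        return True
    else:
        return is_in(element, L[1])

def is_in_plain(element, L):
    # Undecorated reference version, for comparison.
    if not L:
        return False
    elif element == L[0]:
        return True
    elif isinstance(L[0], tuple) and is_in_plain(element, L[0]):
        return True
    else:
        return is_in_plain(element, L[1])

# For element 5 and tree ((2,()), ((3,()),())) the inner call
# is_in(5, (2, ())) returns CONTINUE, which is truthy, so the
# decorated version wrongly answers True while the plain one
# correctly answers False.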

The "pythonic" solution would be to flatten the tail recursion into an
iterative loop:

def is_in_iter(element, L):
    while True:
        if not L:
            return False
        elif element == L[0]:
            return True
        elif isinstance(L[0], tuple) and is_in_iter(element, L[0]):
            return True
        L = L[1]

So, the challenge: Adapt the "tail_recursive" decorator so that we
can still use the recursive solution. Obviously, it would be too much
to ask to let the decorator figure out if the call is in a tail
position, so I imagine we'll need something like:

@fancy_new_selective_tail_call_optimizing_decorator
def is_in(element, L):
    if not L:  # test if L is an empty tuple
        return False
    elif element == L[0]:
        return True
    elif isinstance(L[0], tuple) and is_in(element, L[0], tail_call=False):
        return True
    else:
        return is_in(element, L[1], tail_call=True)
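For what it's worth, here is one possible sketch of such a decorator (entirely hypothetical; the class name and the tail_call keyword are made up, and it uses modern keyword-only-argument syntax). Calls marked tail_call=True are trampolined; unmarked or tail_call=False calls simply run a nested trampoline to completion, so they stay correct:

```python
class selective_tail_recursive:
    # Hypothetical sketch: only calls explicitly marked
    # tail_call=True are flattened; every other call runs its own
    # nested trampoline and returns a real value.
    def __init__(self, func):
        self.func = func
        self.CONTINUE = object()   # unique sentinel

    def __call__(self, *args, tail_call=False, **kwd):
        if tail_call:
            # Record the new arguments; the enclosing trampoline
            # picks them up. Valid only in tail position.
            self.argskwd = (args, kwd)
            return self.CONTINUE
        # Non-tail (or outermost) call: run a trampoline here.
        while True:
            result = self.func(*args, **kwd)
            if result is self.CONTINUE:
                args, kwd = self.argskwd
            else:
                return result

@selective_tail_recursive
def is_in(element, L):
    if not L:
        return False
    elif element == L[0]:
        return True
    elif isinstance(L[0], tuple) and is_in(element, L[0], tail_call=False):
        return True
    else:
        return is_in(element, L[1], tail_call=True)
```

Since argskwd is always written immediately before CONTINUE is returned and read immediately after, the nested non-tail trampolines cannot clobber the outer one, and the firstcall flag becomes unnecessary.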

Perhaps comp.lang.python is a better platform for this challenge ...

rjf

Sep 12, 2010, 11:54:57 PM
to sage-flame
A solution would be to parse the python into lisp, do the
transformations necessary, and write out the lisp program into python.

I suspect all the parts already exist.

RJF

Bill Hart

Sep 12, 2010, 11:59:42 PM
to sage-...@googlegroups.com
That wouldn't be slow enough.

Nils Bruin

Sep 13, 2010, 3:42:35 PM
to sage-flame
On Sep 12, 10:12 am, Bill Hart <goodwillh...@googlemail.com> wrote:
> It is difficult to have a C compiler automatically detect tail
> recursion and have it correctly and efficiently optimise for it,
> probably because of pointers I suppose, as with everything else that
> is broken about C.

Yet, GCC has a go at it:

http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-foptimize_002dsibling_002dcalls-686

I haven't checked the "correctly" and "efficiently" parts. However,
given that tail-call optimisation affects the space complexity of an
implementation, even a rather inefficient implementation should have a
rather easy job outperforming the non-optimised version beyond a
certain point.

For python:

One of Guido's comments is that due to the dynamic nature of python,
it is harder to recognize tail recursion than one might initially
think. If f() is initially tail recursive, one could do

g=f
f=<some new function>

and if g inherited f's tail recursion optimizations, the behaviour is
not consistent with python's defined semantics.
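A minimal sketch of the rebinding problem (hypothetical names): the recursive call in a function body looks up the global name at call time, so an optimizer that froze the self call into a loop would silently change the program's meaning.

```python
def f(n, acc=1):
    # "Tail-recursive" factorial; the name f is resolved on every call.
    if n <= 1:
        return acc
    return f(n - 1, acc * n)

before = f(3)     # f is still bound to the function above

g = f                              # alias the function object
f = lambda n, acc=1: "hijacked"    # rebind the global name f

# g's body still reads the *global* name f on each call, so the
# apparent self call now goes to the lambda instead of recursing.
after = g(3)
```

A compiler that had turned g's "self call" into a loop would keep returning 6 here, which is not what Python's semantics dictate.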

Still, one person had a go at bytecode level transformations:

http://bytecodehacks.sourceforge.net/bch-docs/bch/index.html

Another approach would be to do rewriting on the "abstract syntax
tree" (ast) level (a bit of a misnomer, because the syntax tells you
how to get from the source to the tree). In this respect, the only
difference between Python and LISP is that in Python, the interpreter
produces the parse tree from the source whereas in LISP, the source is
basically a direct representation of the tree.

In case of a tail-recursive function f():

def f(arg):
    ....
    return f(*new_args)
    ....

one could rewrite to

def f(arg):
    while True:
        ....
        arg = new_args
        continue "top_level_while"
        ....

except that python doesn't have labelled "break" or "continue"
statements. I think this example shows that python really needs those.
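(For a function with a single self tail call, no label is even needed; a hypothetical before/after using gcd, where the tail call becomes an argument rebinding followed by falling back to the top of the loop:)

```python
def gcd_rec(a, b):
    # Tail-recursive form: the self call is the last thing done.
    if b == 0:
        return a
    return gcd_rec(b, a % b)

def gcd_loop(a, b):
    # The same function after the rewrite sketched above: rebind
    # the arguments and jump back to the top of the loop.
    while True:
        if b == 0:
            return a
        a, b = b, a % b
```

The labelled continue only becomes necessary when the tail call sits inside a nested loop.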

Anyway, this would only get you tail recursion optimization, not full
tail call optimization [so mutual tail call recursion would be
missed].

At this point, I think I am starting to agree with Guido to a limited
degree: Optimizing tail calls in python is not worth *my* time. Oddly
enough his main arguments against implementing TCO in the python
interpreter are:
- tail call optimization ruins stack backtraces
- it is unlikely that all python implementations will be able to
embrace TCO. Having it in some implementations but not others would
lead to python programs being (in)efficient depending on the
implementation they run on, giving a kind of complexity fragmentation
of the platform.

The latter would not affect sage, because it only intends to run on
one python implementation anyway, although the counter-argument more
appropriate for this forum is that the fear of a python program being
efficient at all is ungrounded.

I'd still be interested in seeing a decorator in the style of
http://groups.google.ca/group/sage-flame/msg/252d9f5e568b2d92 (based
on what seems to be called a "trampoline") to flatten call-stacks for
mainly tail-recursive routines, mainly out of curiosity to see how
hard it is.

Bill Hart

Sep 13, 2010, 3:53:47 PM
to sage-...@googlegroups.com
Not sure if you already posted this one:

http://code.activestate.com/recipes/474088/

The first thing that popped into my head was to do something with the
call stack. At least googling that came up with a hit. But I didn't
look at it in detail to see what it does, beyond the plain English
description at the start.

You won't find too much of that kind of fear on this list.

>
> I'd still be interested in seeing a decorator in the style of
> http://groups.google.ca/group/sage-flame/msg/252d9f5e568b2d92 (based
> what seems to be called a "trampoline") to flatten call-stacks for
> mainly tail-recursive routines, mainly out of curiosity to see how
> hard it is.
>

Bill.

Bill Hart

Sep 13, 2010, 3:55:53 PM
to sage-...@googlegroups.com
Oh I see further down that it doesn't handle your criterion of doing
the right thing if the recursion is not a tail call.

Bill.

rjf

Sep 14, 2010, 1:18:36 AM
to sage-flame


On Sep 13, 12:42 pm, Nils Bruin <bruin.n...@gmail.com> wrote:

>
> For python:
>
> One of Guido's comments is that due to the dynamic nature of python,
> it is harder to recognize tail recursion than one might initially
> think. If f() is initially tail recursive, one could do
>
> g=f
> f=<some new function>
>
> and if g inherited f's tail recursion optimizations, the behaviour is
> not consistent with python's defined semantics.

Since Lisp has exactly the same dynamic nature, Lisp interpreters and
compilers and run-time systems must address exactly the same issues.

If the concern is interprocedural (global) flow analysis, then two
questions come to mind. 1. Python does this?? 2. If it does, there
are lots of other issues: say, assume g expands the definition of f
in-line, and now you change f in almost ANY way, e.g. change the
argument count.

So I believe this is just a cop-out. You could do it if you want to.
As I said, translate the sucker into Lisp (all the suckers together),
do the job, and convert the result to assembler.

>
> Still, one person had a go at bytecode level transformations:
>
> http://bytecodehacks.sourceforge.net/bch-docs/bch/index.html
>
> Another approach would be to do rewriting on the "abstract syntax
> tree" (ast) level (a bit of a misnomer, because the syntax tells you
> how to get from the source to the tree). In this respect, the only
> difference between Python and LISP is that in Python, the interpreter
> produces the parse tree from the source whereas in LISP, the source is
> basically a direct representation of the tree.

No, that is not the difference, because in Lisp the source is a text
file, and
in Python the source is a text file.
The AST for Lisp is done by the READ program, which takes a text file
and produces the AST. Which can be printed out by the PRINT program.


In Python you have a much more elaborate program that has to do a
non-trivial parsing task, instead of READ's parsing task, which is
fairly easy. I don't know if Python has an equivalent to PRINT which
takes the AST and prints out a version of the source code equivalent
to the AST, or if the Python AST is in some kind of canonical format
so that random other people can hack on it in a machine-independent
way. Byte codes?? Anyway, it seems like an aspect in which Lisp is
more utilitarian.

.....

>
> At this point, I think I am starting to agree with Guido to a limited
> degree: Optimizing tail calls in python is not worth *my* time.

If you did this work, could you even be sure it would go into Guido's
work, or would it just be
some hack that would be wiped out in version n+1?


> Oddly
> enough his main arguments against implementing TCO in the python
> interpreter are:
>  - tail call optimization ruins stack backtraces

Yes, it does. In Lisp, generally you can disable this to improve
tracing of functions, along with disabling all sorts of optimizations
that make changes to the apparent sequence of operations. This is
basically BS: you could have a compiler replace multiplications by
additions in a loop, and then when you traced multiplications they
would be missing. So are you going to remove this kind of
optimization?

>  - it is unlikely that all python implementations will be able to
> embrace TCO.

I see no reason for TCO not to be done as a transformation of the AST,
in which case
all implementations should be able to use it. Unless each Python
implementation
has its own unique AST, in which case it is pretty sad.

>Having it in some implementations but not others would
> lead to python program (in)efficiency depending on implementation it
> runs on, giving a kind of complexity fragmentation of the platform.

Sounds like a really weak argument. You are driving all
implementations to the lowest capability.
E.g. some python implementation might have limited memory
addressability. So no one can
use more memory than that?
>
> The latter would not affect sage, because it only intends to run on
> one python implementation anyway,

Um, so there is only one AST for Sage's python? That seems like a
plus.

> although the counter-argument more
> appropriate for this forum is that the fear of a python program being
> efficient at all is ungrounded.

Oh, I thought huge amounts of credibility depended on python converted
to cython.

>
> I'd still be interested in seeing a decorator in the style of http://groups.google.ca/group/sage-flame/msg/252d9f5e568b2d92 (based
> what seems to be called a "trampoline") to flatten call-stacks for
> mainly tail-recursive routines, mainly out of curiosity to see how
> hard it is.

This technology (and terminology) dates back at least to 1975 in Lisp.
It may have other origins, but it is used in Maxima.

Nils Bruin

Sep 14, 2010, 4:47:39 PM
to sage-flame
On Sep 13, 10:18 pm, rjf <fate...@gmail.com> wrote:
> So I believe this is just a cop-out.  You could do it if you want to.

Even Guido agrees that it is doable. He gives a sketch of how to go
about it:
http://neopythonic.blogspot.com/2009/04/tail-recursion-elimination.html
and it looks like "stackless python" already does it:
http://www.stackless.com/spcpaper.htm

> As I
> said, translate the sucker into Lisp  (all the suckers together) and
> do the job.
> convert the result to assembler.

That sounds like a great approach to writing any compiler. Conversion
to AST needs to be done anyway, and LISP basically is AST. I buy that
some compilers don't get written that way due to ignorance, but my
impression is that the only compilers that take this approach are Lisp
compilers, where the translation step is trivial. Is my impression
wrong? Surely laziness is more widespread than ignorance. Shouldn't we
be seeing more compilers based on this idea? Or are there complicating
factors lurking?

> In Python you have a much more elaborate program that has to do a non-trivial parsing task, instead of READ's parsing task, which is fairly easy.

It is my impression that it is mainly this non-trivial parsing task
that willing users of Python appreciate. An important selling point of
Python is that the source code is so easy to read.

[yes, the AST is accessible via a standard module that comes with
Python. I was a little disappointed to see that it doesn't come with a
pretty-printer. So AST-level transformations are definitely possible.
However, the kind of non-local transformation required to turn a tail
recursion into a loop is made harder by the absence of labelled breaks
and continues, or gotos. Exception handling and "finally"s
(unwind-protect in CL, if I'm not mistaken) need a little care too.]
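(For reference, the standard ast module does expose the tree, and ast.dump gives an s-expression-like rendering of it; a source-level pretty-printer, ast.unparse, only arrived much later, in Python 3.9. A minimal poke at the tree, with a made-up example function:)

```python
import ast

# Parse a tiny tail-recursive function and inspect the tree.
tree = ast.parse("def f(n):\n    return f(n - 1)")
func = tree.body[0]     # the ast.FunctionDef node
ret = func.body[0]      # the ast.Return node
call = ret.value        # the ast.Call node for f(n - 1)

# ast.dump renders the whole subtree in an s-expression-like
# notation, the closest thing to LISP's PRINT the module offers.
dumped = ast.dump(call)
```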

Bill Hart

Sep 14, 2010, 5:38:05 PM
to sage-...@googlegroups.com
On 14 September 2010 21:47, Nils Bruin <bruin...@gmail.com> wrote:
> On Sep 13, 10:18 pm, rjf <fate...@gmail.com> wrote:
>> So I believe this is just a cop-out.  You could do it if you want to.
>
> Even Guido agrees that it is doable. He gives a sketch how to go about
> it:
> http://neopythonic.blogspot.com/2009/04/tail-recursion-elimination.html
> and it looks like "stackless python" already does it:
> http://www.stackless.com/spcpaper.htm
>
>> As I
>> said, translate the sucker into Lisp  (all the suckers together) and
>> do the job.
>> convert the result to assembler.
>
> That sounds like a great approach to writing any compiler. Conversion
> to AST needs to be done anyway, and LISP basically is AST. I buy that
> some compilers don't get written that way due to ignorance, but my
> impression is that the only compilers that take this approach are Lisp
> compilers, where the translation step is trivial. Is my impression
> wrong? Surely laziness is more widespread than ignorance. Shouldn't we
> be seeing more compilers based on this idea? Or are there complicating
> factors lurking?

No. GCC essentially compiles C to a form of Lisp first, then to
assembly language.

I think it is more common than one might expect, especially given that
many compilers compile to C or their interpreters are written in C
(not that the Lisp stage is still present in the latter case).

>
>> In Python you have a much more elaborate program that has to do a non-trivial parsing task, instead of READ's parsing task, which is fairly easy.
>
> It is my impression that it is mainly this non-trivial parsing task
> that willing users of Python appreciate. An important selling point of
> Python is that the source code is so easy to read.

Actually decent Lisp and Scheme compilers do go via an AST. Many toy
interpreters and compilers don't do this. But the good ones do. Python
would be much harder to parse to an AST though, of course.

>
> [yes, the AST is accessible via a standard module that comes with
> Python. I was a little disappointed to see that it doesn't come with a
> pretty-printer. So AST-level transformations are definitely possible.
> However, the kind of non-local transformation required to transform a
> tail recursion into a loop is made harder due to the absence of
> labelled break and continues, or gotos. The presence of exception
> handling and "finally"s (unwind-protect in CL if I'm not mistaken)
> need a little care too.]
>

Nils Bruin

Sep 15, 2010, 4:03:53 AM
to sage-flame
On Sep 14, 2:38 pm, Bill Hart <goodwillh...@googlemail.com> wrote:

> No. GCC essentially compiles C to a form of Lisp first, then to
> assembly language.

OK, if you intended this in jest then I apologize for missing the
hint. Some quick googling leads me to 3 intermediate languages used in
gcc:

GENERIC, GIMPLE, RTL

none of them seem very much like LISP to me (since LISP indeed allows
for pretty much any kind of programming paradigm, you can claim for
pretty much any language that it is (a subset of) LISP, but I assume
you didn't mean the statement in that sense). I'd say one of the
characterizing features of LISP is its data model and translating C to
that in order to compile it to a conventional CPU machine language
seems like a particularly poor choice.

> Actually decent Lisp and Scheme compilers do go via an AST. Many toy
> interpreters and compilers don't do this. But the good ones do. Python
> would be much harder to parse to an AST though, of course.

Hm, doesn't the internal data structure for LISP code (car/cdr lists)
already fit the definition for AST? Do you mean that good lisp
compilers transform those structures into yet another tree structure
before converting it into bytecode or assembly or C or whatever target
(virtual) machine code? To what gain?

The toy LISP interpreters that I know (the ones that are part of any
course/book on LISP) interpret the AST directly rather than first
converting it into bytecode. (see also "hoc" in Kernighan/Pike, The
Unix Programming Environment (1984), although they don't really bother
constructing the AST as a tree in memory in any of their versions)

Bill Hart

Sep 15, 2010, 6:55:29 AM
to sage-...@googlegroups.com
Essentially gcc's rtl is written in a Lisp-like S-expression form.
Some consider it to be a kind of Lisp, which is the only claim I made.

I was being serious, but maybe to your taste it isn't a Lisp. That's
fine, of course.

Bill.

rjf

Sep 15, 2010, 10:13:38 AM
to sage-flame
The point of claiming an AST is either Lisp or the moral equivalent of
it, is essentially
to point out that a binary or multi-branch tree is trivial to build
and traverse in Lisp.
If you decide to do the same in another language like C, then you have
to design
data structures and program traversal mechanisms, as well as memory
management
in that language (or use some library that someone else has written).
And even then
it tends to be half-baked. e.g. missing the equivalent of Lisp's
PRINT.

So you could either use Lisp, or write (or import) a library that
provides 10 or 20 Lisp functions and maybe 3 data types: atom, symbol
table, tree-node (or list or pair...), as well as a garbage collector.

I prefer using Lisp, but others may prefer to rewrite it in C. After
all, if you write the AST machinery in C, you get to write a garbage
collector and hash table manager yourself. That is why the students in
my compiler class tend to complete their assignments with working
code, unlike the students using C or C++.

RJF


On Sep 15, 3:55 am, Bill Hart <goodwillh...@googlemail.com> wrote:
> Essentially gcc's rtl is written in a Lisp-like S-expression form.
> Some consider it to be a kind of Lisp, which is the only claim I made.
>
> I was being serious, but maybe to your taste it isn't a Lisp. That's
> fine, of course.
>
> Bill.
>