Wherein I prove that interpreters needn't be (horrendously) slow

32 views
Skip to first unread message

Bill Hart

unread,
Jan 22, 2011, 9:11:22 PM1/22/11
to sage-flame, flint-devel
A month or two ago, William Stein posted a blog article about the
times of Python versus Cython and compiled C for the following
benchmark:

def sumit(n):
j = 1
for i in range(0, n):
j += i

for k in range(0, 1000):
sumit(100000)

On my machine, this code takes about 10s to run in Python. I call this
Stein's benchmark.

Well, I decided to see if I could do better.

I sat down and wrote an interpreter which is just sophisticated enough
to run the above benchmark.

For comparison, GCC 4.4.1 on my machine with -O2 optimisation runs an
equivalent C program in about 80ms, i.e. 10.0/0.08 = 125 times faster.

I decided to not use the C compiler at all, but wrote my interpreter
in Lisp. It lexes the input using a Lisp regular expression lbrary and
I construct an AST with a parser I wrote which was inspired by the
Haskell Parsec library, using a DSL for the task using Lisp macros.

One slight problem is that Lisp doesn't have machine integers per se.
It has fixnums, which are like 61 bits or something like that. So I
have to simulate 64 bit C longs (an artificial constraint perhaps, but
one I decided to add). I did this by doing signed arithmetic modulo
2^64 with some little bit twiddling functions. Another problem is Lisp
doesn't make overloading of operators very easy, and I wanted my
interpreter to also be able to handle bignums, like Python. So I set
up a generic (add-gen) function which dispatches on type.

I also decided to add another benchmark for comparison with Python.

k = 0
for i in range(0, 100):
for j in range(0, 100000000):
k = k + j

Python takes about 2600s on my machine for this benchmark. I call this
bench2. GCC takes 8s on this benchmark. So 2600.0/8.0 = 325 times
faster.

Well, my interpreter now works. It's pretty fault tolerant. If you
mistype, it gives you feedback and lets you continue. If your program
crashes it prints an error and lets you continue. The code isn't 100%
nice yet, but it's well under 500 lines of code and it works, so let's
try it.

The following are actual sessions.

First let's try Stein's benchmark:

> fn sumit(n) {
var i, j = 1
for (i in 0..n) j = j + i
j
}
SUMIT
> var k
K
> for (k in 0..1000) sumit(100000)

Takes about 1.5s. So a little faster than Python.

Now let's try that again, but this time we'll add type annotations so
it isn't using bignums:

> fn sumit(n) {
var i:int64, j:int64 = 1 # note the int64's providing type annotations
for (i in 0..n) j = j + i
j
}
SUMIT
> for (k in 0..1000) sumit(100000)

Too fast to time. So let's up the count a little:

> for (k in 0..100000) sumit(100000) # up the count by a factor of 100

That now takes 8s. Hmm, that means that Stein's benchmark is taking
8.0s/100.0 = 0.08s. Hmm, 125 times faster than Python.

Ok, so what about bench2. Let's jump straight in and do it with type
annotation, as we know that'll be faster:

> {
var i:int64, j:int64, k:int64 = 0
for (i in 0..100) {
for (j in 0..100000000) k = k + j
}
k
}
499999995000000000

Hmm, that takes 8s too. That's about 325 times faster than Python,
same as compiled C.

So, in summary, this interpreter runs faster than Python generically
and with type annotations runs hundreds of times faster. And it's an
*interpreter*.

Of course this is how things were done back in the good 'ole days of
Lisp in the early 60's. And even now, in 2011, I know of absolutely
zero languages currently available that would allow me to do all of
the above, other than Lisp!

How is it that interpreter after interpreter gets written, with the
claim of being "fast", when they run about the same speed as Python,
or in some cases worse, and yet people continue to insist they have
"reasonably good performance for an interpreter"?

Just to summarise:

Python / My Interpreter / Compiled C
Stein's: 10s 0.08s 0.08s
Bench2: 2600s 8.0s 8.0s

(Note: I was very kind to Python by not writing a function to simulate
signed arithmetic mod 2^64 and a generic multiplication function on
top of that. I shudder to think how much worse that would be! As you
can see, in (SBCL) Lisp it costs precisely nothing.)

[/CROW]

Bill.

Tom Boothby

unread,
Jan 22, 2011, 10:12:41 PM1/22/11
to sage-...@googlegroups.com
So, this is interesting and all, but I've gotta ask, "so what"? Have
you written an interpreter for a general purpose language, or just an
interpreter to fit the purpose of this benchmark [1]? If so, is it
the perfect language[2]?

Your response to William's post was "so what", so the question seems fair.

--tom

[1] http://www.dangermouse.net/esoteric/hq9plusplus.html
[2] please tell me you don't have special operators to work with
unsigned long const pointers *%!@, +%!@, /%!@, etc.

Message has been deleted
Message has been deleted

Tom Boothby

unread,
Jan 22, 2011, 11:45:11 PM1/22/11
to sage-...@googlegroups.com
On Sat, Jan 22, 2011 at 7:17 PM, Bill Hart <goodwi...@googlemail.com> wrote:
> These are stupid questions.

Um, you're a stupid question? I'm being serious here (well, the
footnotes were tongue-in-cheek).

My question "so what" is because I did read what you wrote, and I
still don't get your point. You wrote an interpreter for a language
that's apparently robust enough to do some arithmetic and a for loop.
What else can it do? If it's a full-featured language, how does it
compare to other languages? If it isn't full-featured, then how do
you justify comparing it to a full-featured interpreted language?

Just as William's benchmark is trivial, so is yours. The difference
is, Cython is extremely useful. And until you prove otherwise, I'll
doubt that your interpreter is.

Bill Hart

unread,
Jan 22, 2011, 11:51:18 PM1/22/11
to sage-...@googlegroups.com
"I am not able rightly to apprehend the kind of confusion of ideas
that could provoke such a question." - Babbage

> --
> You received this message because you are subscribed to the Google Groups "sage-flame" group.
> To post to this group, send email to sage-...@googlegroups.com.
> To unsubscribe from this group, send email to sage-flame+...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/sage-flame?hl=en.
>
>

Nils Bruin

unread,
Jan 23, 2011, 12:27:19 AM1/23/11
to sage-flame
Hi Bill,

Your observed speedup is in line with general Python/Cython
observations: Just putting unannotated Python code through Cython
generally only results in a modest speedup. The interpreter overhead
gets removed, but (as you are aware) the full dynamic nature of python
means that the code produced by Cython is still riddled with calls
into the Python library. Compared with that, the interpreter overhead
of Python is only modest anyway.
(it may be that cython cheats with the for-loop in your examples,
because it can infer that it's safe to make the loop variable an
int64.

In your interpreter it looks like you do declare your variables int64,
so no expensive type and method lookups are necessary at runtime.
Perhaps your runtimes more reflect the efficiency differences between
static/dynamic typing vs. compiled/interpreted.

Bill Hart

unread,
Jan 23, 2011, 9:20:00 AM1/23/11
to sage-...@googlegroups.com
On 23 January 2011 05:27, Nils Bruin <bruin...@gmail.com> wrote:
> Hi Bill,
>
> Your observed speedup is in line with general Python/Cython
> observations: Just putting unannotated Python code through Cython
> generally only results in a modest speedup. The interpreter overhead
> gets removed, but (as you are aware) the full dynamic nature of python
> means that the code produced by Cython is still riddled with calls
> into the Python library. Compared with that, the interpreter overhead
> of Python is only modest anyway.
> (it may be that cython cheats with the for-loop in your examples,
> because it can infer that it's safe to make the loop variable an
> int64.

Sure, though this is a compiler not an interpreter. There's a plethora
of compilers with native C performance for such simple things. It's
the number of interpreters that are said to be "fast" that bothers me.

>
> In your interpreter it looks like you do declare your variables int64,
> so no expensive type and method lookups are necessary at runtime.
> Perhaps your runtimes more reflect the efficiency differences between
> static/dynamic typing vs. compiled/interpreted.
>

Type inference can do away with the type annotations. But you have a
choice. Do you want the types of whole numbers to be inferred to be
machine words or bignums. And should they be inferred to be signed or
unsigned integers.

I will be implementing type inference of some kind. I suspect I am
leaning towards having different constant literal notations for the
various types, e.g.

12345L -- bignum
12345 -- unsigned machine word
12323487365876348756834765863485788 bignum
-12345 -- signed machine word
(int64)12345 -- signed machine word.

Python used to essentially distinguish ints and bignums. But I
understand it is moving away from distinguishing them altogether. It
can lead to surprise if you add 2^63 and 2^63 and get 0.

I suspect it is only a small step from there until someone says, why
not have the machine figure out that there is going to be an overflow
and automatically upgrade to a bignum. As soon as you do that you are
back to runtime type checking.

One can also consider having parameterised integer types:

(word6) 12345 -- twos complement 6 machine word integer (signs are not
useful here)

These can also be made to run significantly faster than general bignums.

A similar issue occurs for floating point numbers (doubles vs
multiprecision floats) of course. Here you don't have the notion of
overflow (except for the exponent). But you do want to have efficient
multiprecision floats of fixed precision.

Bill.

Bill Hart

unread,
Jan 23, 2011, 10:01:05 AM1/23/11
to sage-...@googlegroups.com
On 23 January 2011 14:20, Bill Hart <goodwi...@googlemail.com> wrote:
> On 23 January 2011 05:27, Nils Bruin <bruin...@gmail.com> wrote:
>> Hi Bill,
>>
>> Your observed speedup is in line with general Python/Cython
>> observations: Just putting unannotated Python code through Cython
>> generally only results in a modest speedup. The interpreter overhead
>> gets removed, but (as you are aware) the full dynamic nature of python
>> means that the code produced by Cython is still riddled with calls
>> into the Python library. Compared with that, the interpreter overhead
>> of Python is only modest anyway.
>> (it may be that cython cheats with the for-loop in your examples,
>> because it can infer that it's safe to make the loop variable an
>> int64.
>
> Sure, though this is a compiler not an interpreter. There's a plethora
> of compilers with native C performance for such simple things. It's
> the number of interpreters that are said to be "fast" that bothers me.
>
>>
>> In your interpreter it looks like you do declare your variables int64,
>> so no expensive type and method lookups are necessary at runtime.
>> Perhaps your runtimes more reflect the efficiency differences between
>> static/dynamic typing vs. compiled/interpreted.
>>
>

Sorry, I got a bit distracted by my own answer and didn't really
respond to what you wrote.

Yes, static typing makes a difference, though Lisp is in general
dynamically typed. Had I used fixnums with an automatic conversion to
bignums on overflow (as Python essentially does) it still would have
been substantially faster in my Lisp based interpreter than Python.
That would not be a result of static typing. It would definitely be
determining the type at runtime.

Static typing can certainly help, and is definitely essential to me
getting native C performance here (I don't have a proof of that, but I
think it is a widely accepted fact).

However, later on, I hope to give some examples of other features
which can cause interpreters to run substantially faster than Python.
There static typing is not as important, but other things are.

By "later on" I of course mean weeks or months. I'm a mathematician,
not a computer scientist, so in general I can't work on these
experiments except on occasional weekends.

Bill.

rjf

unread,
Jan 23, 2011, 11:07:04 AM1/23/11
to sage-flame
Historically, people have written Lisp interpreters quite frequently.
(Yes, some people published a paper on how they wrote a Lisp
interpreter in SNOBOL).

Sometimes the interpreters are exceedingly fast. For example, one
was written for a VAX that was faster than Franz Lisp compiled. That
was an early Franz Lisp system, but what might have been more
important
was that the competing interpreter left out a bunch of things.
Like error checking, garbage collection, bignums. Especially error
checking, where stack problems, wrong number of args, etc cause
crashes, is usually not acceptable. And the shortcuts in data
structures for fast computation made garbage collection essentially
impossible, so it wasn't that GC was deferred to a later time.
Putting in a GC would require a rewrite and slowdown...

Fixing most of these issues slows the interpreter down.
You'd like to think that something that happens rarely (like some
error condition) would have an insignificant effect on runtime,
and that sometimes is true (e.g. if it can be caught by an
interrupt). But sometimes it isn't, and has to be checked
regularly. Think about "Have we run out of memory yet?" could
be done either way, depending on hardware and software system
design. Similarly for "Does this number overflow"? (Could,
in principle, be done with IEEE exception. Or "Do we need
a bignum" (IEEE inexact).
or "Did the user hit an interrupt key, and are we in a safe
place to enter a break loop?"


I do not know if any interpreter or compiler for that
matter uses these exception features; the hardware
builders tend to make their use slow; the software builders
tend to make their use inaccessible and non-portable.

RJF

Bill Hart

unread,
Jan 23, 2011, 7:18:04 PM1/23/11
to sage-flame, flint-devel
Here are some examples that do not run as fast as compiled C:

Example 1:

fn mulit(n) {
var i:int64, d:double = 1.2394835793448d0
for (i in 0..n) d = d * 1.000000000000023d0
d
}

mulit(1000000000)

Example 2:

fn muladd(n) {
var i:int64, m:int64 = 12369128374699481, k:int64, l:int64
for (i in 0..n) {
k = j * j
l = j * 48732985794357928349
j = k + l + 1
}
j
}

muladd(100000000)

In example 1, my interpreter is 5 times slower than compiled C. This
seems to be because doubles are actual SBCL objects, so it does a lot
of clunking around. Still, my interpreter is 200 times faster than
Python for this example, and compiled C is 1000 times faster than
Python.

In example 2, my interpreter is 2 times slower than compiled C because
the C compiler recognises the identity j * j + j * C = j * (j + C)
which allows for fewer multiplications and more pipelining
opportunities.

uint64's (unsigned machine words) run in the same time as int64's,
however recent versions of GCC allow one to retrieve the high word of
a word by word multiplication quite efficiently. I implemented mulhi
in my interpreter, but SBCL is something like 30 times slower than C.
I imagine that patching SBCL to fix this would be fairly trivial
though.

One other drawback of SBCL is there is no 64 bit Windows port, and the
32 bit Windows port is a few revisions behind.

Bill.

Bill Hart

unread,
Jan 24, 2011, 2:09:26 PM1/24/11
to sage-flame, flint-devel
After talking to the guys on the SBCL list yesterday, they showed me
why my code in example 1 was running slower than C. Now it doesn't.
Here is what it looks like now:

fn mulit(n:int64):double {


var i:int64, d:double = 1.2394835793448d0
for (i in 0..n) d = d * 1.000000000000023d0

return d
}

mulit(1000000000)

Now it runs in 2s, same as compiled C and 1000 times faster than
equivalent Python (here Python has no excuse -- there's no
fixnum/bignum type checking to do).

You can already do some fun stuff. E.g. here is a session showing the
use of lists and arrays (vectors):

> var a = list()
NIL
> a = cons(2, a)
(2)
> a = cons(3, a)
(3 2)
> var b = list(2, 3, 4, 5)
(2 3 4 5)
> cdr(b)
(3 4 5)
> var arr = vector(1, 2, 3, 4, 5)
#(1 2 3 4 5)
> var c = aref(arr, 4)
5
> var i
I
> for (i in 0..length(arr)) {
print(aref(arr, i))
terpri()
}

1

2

3

4

5
NIL

Bill.

Nils Bruin

unread,
Jan 25, 2011, 2:27:27 AM1/25/11
to sage-flame
On Jan 24, 11:09 am, Bill Hart <goodwillh...@googlemail.com> wrote:
> Now it runs in 2s, same as compiled C and 1000 times faster than
> equivalent Python (here Python has no excuse -- there's no
> fixnum/bignum type checking to do).

I'm afraid you're claiming a win in a running contest your opponent
considered to be a Sunday afternoon stroll.
Reply all
Reply to author
Forward
0 new messages