def sumit(n):
j = 1
for i in range(0, n):
j += i
for k in range(0, 1000):
sumit(100000)
On my machine, this code takes about 10s to run in Python. I call this
Stein's benchmark.
Well, I decided to see if I could do better.
I sat down and wrote an interpreter which is just sophisticated enough
to run the above benchmark.
For comparison, GCC 4.4.1 on my machine with -O2 optimisation runs an
equivalent C program in about 80ms, i.e. 10.0/0.08 = 125 times faster.
I decided to not use the C compiler at all, but wrote my interpreter
in Lisp. It lexes the input using a Lisp regular expression lbrary and
I construct an AST with a parser I wrote which was inspired by the
Haskell Parsec library, using a DSL for the task using Lisp macros.
One slight problem is that Lisp doesn't have machine integers per se.
It has fixnums, which are like 61 bits or something like that. So I
have to simulate 64 bit C longs (an artificial constraint perhaps, but
one I decided to add). I did this by doing signed arithmetic modulo
2^64 with some little bit twiddling functions. Another problem is Lisp
doesn't make overloading of operators very easy, and I wanted my
interpreter to also be able to handle bignums, like Python. So I set
up a generic (add-gen) function which dispatches on type.
I also decided to add another benchmark for comparison with Python.
k = 0
for i in range(0, 100):
for j in range(0, 100000000):
k = k + j
Python takes about 2600s on my machine for this benchmark. I call this
bench2. GCC takes 8s on this benchmark. So 2600.0/8.0 = 325 times
faster.
Well, my interpreter now works. It's pretty fault tolerant. If you
mistype, it gives you feedback and lets you continue. If your program
crashes it prints an error and lets you continue. The code isn't 100%
nice yet, but it's well under 500 lines of code and it works, so let's
try it.
The following are actual sessions.
First let's try Stein's benchmark:
> fn sumit(n) {
var i, j = 1
for (i in 0..n) j = j + i
j
}
SUMIT
> var k
K
> for (k in 0..1000) sumit(100000)
Takes about 1.5s. So a little faster than Python.
Now let's try that again, but this time we'll add type annotations so
it isn't using bignums:
> fn sumit(n) {
var i:int64, j:int64 = 1 # note the int64's providing type annotations
for (i in 0..n) j = j + i
j
}
SUMIT
> for (k in 0..1000) sumit(100000)
Too fast to time. So let's up the count a little:
> for (k in 0..100000) sumit(100000) # up the count by a factor of 100
That now takes 8s. Hmm, that means that Stein's benchmark is taking
8.0s/100.0 = 0.08s. Hmm, 125 times faster than Python.
Ok, so what about bench2. Let's jump straight in and do it with type
annotation, as we know that'll be faster:
> {
var i:int64, j:int64, k:int64 = 0
for (i in 0..100) {
for (j in 0..100000000) k = k + j
}
k
}
499999995000000000
Hmm, that takes 8s too. That's about 325 times faster than Python,
same as compiled C.
So, in summary, this interpreter runs faster than Python generically
and with type annotations runs hundreds of times faster. And it's an
*interpreter*.
Of course this is how things were done back in the good 'ole days of
Lisp in the early 60's. And even now, in 2011, I know of absolutely
zero languages currently available that would allow me to do all of
the above, other than Lisp!
How is it that interpreter after interpreter gets written, with the
claim of being "fast", when they run about the same speed as Python,
or in some cases worse, and yet people continue to insist they have
"reasonably good performance for an interpreter"?
Just to summarise:
Python / My Interpreter / Compiled C
Stein's: 10s 0.08s 0.08s
Bench2: 2600s 8.0s 8.0s
(Note: I was very kind to Python by not writing a function to simulate
signed arithmetic mod 2^64 and a generic multiplication function on
top of that. I shudder to think how much worse that would be! As you
can see, in (SBCL) Lisp it costs precisely nothing.)
[/CROW]
Bill.
Your response to William's post was "so what", so the question seems fair.
--tom
[1] http://www.dangermouse.net/esoteric/hq9plusplus.html
[2] please tell me you don't have special operators to work with
unsigned long const pointers *%!@, +%!@, /%!@, etc.
Um, you're a stupid question? I'm being serious here (well, the
footnotes were tongue-in-cheek).
My question "so what" is because I did read what you wrote, and I
still don't get your point. You wrote an interpreter for a language
that's apparently robust enough to do some arithmetic and a for loop.
What else can it do? If it's a full-featured language, how does it
compare to other languages? If it isn't full-featured, then how do
you justify comparing it to a full-featured interpreted language?
Just as William's benchmark is trivial, so is yours. The difference
is, Cython is extremely useful. And until you prove otherwise, I'll
doubt that your interpreter is.
> --
> You received this message because you are subscribed to the Google Groups "sage-flame" group.
> To post to this group, send email to sage-...@googlegroups.com.
> To unsubscribe from this group, send email to sage-flame+...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/sage-flame?hl=en.
>
>
Sure, though this is a compiler not an interpreter. There's a plethora
of compilers with native C performance for such simple things. It's
the number of interpreters that are said to be "fast" that bothers me.
>
> In your interpreter it looks like you do declare your variables int64,
> so no expensive type and method lookups are necessary at runtime.
> Perhaps your runtimes more reflect the efficiency differences between
> static/dynamic typing vs. compiled/interpreted.
>
Type inference can do away with the type annotations. But you have a
choice. Do you want the types of whole numbers to be inferred to be
machine words or bignums. And should they be inferred to be signed or
unsigned integers.
I will be implementing type inference of some kind. I suspect I am
leaning towards having different constant literal notations for the
various types, e.g.
12345L -- bignum
12345 -- unsigned machine word
12323487365876348756834765863485788 bignum
-12345 -- signed machine word
(int64)12345 -- signed machine word.
Python used to essentially distinguish ints and bignums. But I
understand it is moving away from distinguishing them altogether. It
can lead to surprise if you add 2^63 and 2^63 and get 0.
I suspect it is only a small step from there until someone says, why
not have the machine figure out that there is going to be an overflow
and automatically upgrade to a bignum. As soon as you do that you are
back to runtime type checking.
One can also consider having parameterised integer types:
(word6) 12345 -- twos complement 6 machine word integer (signs are not
useful here)
These can also be made to run significantly faster than general bignums.
A similar issue occurs for floating point numbers (doubles vs
multiprecision floats) of course. Here you don't have the notion of
overflow (except for the exponent). But you do want to have efficient
multiprecision floats of fixed precision.
Bill.
Sorry, I got a bit distracted by my own answer and didn't really
respond to what you wrote.
Yes, static typing makes a difference, though Lisp is in general
dynamically typed. Had I used fixnums with an automatic conversion to
bignums on overflow (as Python essentially does) it still would have
been substantially faster in my Lisp based interpreter than Python.
That would not be a result of static typing. It would definitely be
determining the type at runtime.
Static typing can certainly help, and is definitely essential to me
getting native C performance here (I don't have a proof of that, but I
think it is a widely accepted fact).
However, later on, I hope to give some examples of other features
which can cause interpreters to run substantially faster than Python.
There static typing is not as important, but other things are.
By "later on" I of course mean weeks or months. I'm a mathematician,
not a computer scientist, so in general I can't work on these
experiments except on occasional weekends.
Bill.
Example 1:
fn mulit(n) {
var i:int64, d:double = 1.2394835793448d0
for (i in 0..n) d = d * 1.000000000000023d0
d
}
mulit(1000000000)
Example 2:
fn muladd(n) {
var i:int64, m:int64 = 12369128374699481, k:int64, l:int64
for (i in 0..n) {
k = j * j
l = j * 48732985794357928349
j = k + l + 1
}
j
}
muladd(100000000)
In example 1, my interpreter is 5 times slower than compiled C. This
seems to be because doubles are actual SBCL objects, so it does a lot
of clunking around. Still, my interpreter is 200 times faster than
Python for this example, and compiled C is 1000 times faster than
Python.
In example 2, my interpreter is 2 times slower than compiled C because
the C compiler recognises the identity j * j + j * C = j * (j + C)
which allows for fewer multiplications and more pipelining
opportunities.
uint64's (unsigned machine words) run in the same time as int64's,
however recent versions of GCC allow one to retrieve the high word of
a word by word multiplication quite efficiently. I implemented mulhi
in my interpreter, but SBCL is something like 30 times slower than C.
I imagine that patching SBCL to fix this would be fairly trivial
though.
One other drawback of SBCL is there is no 64 bit Windows port, and the
32 bit Windows port is a few revisions behind.
Bill.
fn mulit(n:int64):double {
var i:int64, d:double = 1.2394835793448d0
for (i in 0..n) d = d * 1.000000000000023d0
return d
}
mulit(1000000000)
Now it runs in 2s, same as compiled C and 1000 times faster than
equivalent Python (here Python has no excuse -- there's no
fixnum/bignum type checking to do).
You can already do some fun stuff. E.g. here is a session showing the
use of lists and arrays (vectors):
> var a = list()
NIL
> a = cons(2, a)
(2)
> a = cons(3, a)
(3 2)
> var b = list(2, 3, 4, 5)
(2 3 4 5)
> cdr(b)
(3 4 5)
> var arr = vector(1, 2, 3, 4, 5)
#(1 2 3 4 5)
> var c = aref(arr, 4)
5
> var i
I
> for (i in 0..length(arr)) {
print(aref(arr, i))
terpri()
}
1
2
3
4
5
NIL
Bill.