What is the difference between these ways of defining a function, and
which one is the most efficient? (I mean for the resulting function
f = binop o f1 f2, which will typically be called thousands of times.)
type operator = Plus | Minus;;

let binop1 o f1 f2 =
  fun x -> match o with
    | Plus -> (f1 x) +. (f2 x)
    | Minus -> (f1 x) -. (f2 x)

let binop2 o f1 f2 =
  match o with
  | Plus -> fun x -> (f1 x) +. (f2 x)
  | Minus -> fun x -> (f1 x) -. (f2 x)

let binop3 o f1 f2 =
  let op = match o with
    | Plus -> (+.)
    | Minus -> (-.) in
  fun x -> op (f1 x) (f2 x)

let binop4 o f1 f2 =
  fun x ->
    let op = match o with
      | Plus -> (+.)
      | Minus -> (-.) in
    op (f1 x) (f2 x)
Thanks for your expertise!
San
That's a very hard question, and the answer is probably platform-specific,
but I'll throw some ideas at you off the top of my head. I'm sure someone
like Brian Hurt will dive into the assembler and prove me wrong. ;-)
> type operator = Plus | Minus;;
>
>
> let binop1 o f1 f2 =
> fun x -> match o with
> | Plus -> (f1 x) +. (f2 x)
> | Minus -> (f1 x) -. (f2 x)
That is equivalent to:
let binop1 o f1 f2 x = ..
It waits for all of its arguments before doing anything, and ocamlopt
optimises currying for that case.
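Spelled out with the body from the original definition, that equivalent
form is just:

let binop1 o f1 f2 x =
  match o with
  | Plus -> (f1 x) +. (f2 x)
  | Minus -> (f1 x) -. (f2 x)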
> let binop2 o f1 f2 =
> match o with
> | Plus -> fun x -> (f1 x) +. (f2 x)
> | Minus -> fun x -> (f1 x) -. (f2 x)
After 3 args, this returns a closure waiting for the fourth arg.
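In other words, the match on o is resolved once, when the closure is built,
and the returned closure no longer inspects o. Roughly:

let f = binop2 Plus sin cos   (* the match on o is done here, once *)
let y = f 1.0                 (* each call only evaluates (sin 1.0) +. (cos 1.0),
                                 with no further test on o *)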
> let binop3 o f1 f2 =
> let op = match o with
> | Plus -> (+.)
> | Minus -> (-.) in
> fun x -> op (f1 x) (f2 x)
After 3 args, this creates a closure to do the op and another closure that
captures the first closure. ocamlopt might inline the closure "op" but I
doubt it.
> let binop4 o f1 f2 =
> fun x ->
> let op = match o with
> | Plus -> (+.)
> | Minus -> (-.) in
> op (f1 x) (f2 x)
This waits for all four args again (same as the first case), and the closure
"op" might be inlined.
Assuming you invoke the function with all four arguments, I would expect the
first solution to be the fastest by a significant margin. If you factor out a
closure of binop with its first three arguments passed and use it multiple
times, then the second solution might be faster.
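Something along these lines, for example (the loop bound is just
illustrative):

let () =
  let f = binop2 Plus sin cos in    (* operator dispatch done once *)
  for i = 1 to 1_000_000 do
    ignore (f (float_of_int i))     (* the hot loop only calls the closure *)
  done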
I've found this with a pattern matcher written in OCaml: it was faster when
the matching work was done at partial application to the pattern, returning a
closure that matched against the pattern it had been partially applied to. I
was surprised to find that. I still don't know why that would be faster...
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists
So, I have tried to do some benchmarking, and the results are surprising.
The method: a (for i=1 to ... do) loop.
I first tried an inlined test with
ignore(binop1 Plus sin cos 1.)
(and then the same with binop2, binop3, binop4).
Then I defined the function
f = binop1 Plus sin cos
and ran the loop with
ignore(f 1.)
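Roughly, each measurement looks like this (a sketch of the binop1 case, using
Sys.time for the clock and assuming the definitions above are in scope; the
other cases are measured the same way):

let () =
  let n = 100000000 in

  (* inlined test: the full application sits inside the loop body *)
  let t0 = Sys.time () in
  for _i = 1 to n do
    ignore (binop1 Plus sin cos 1.)
  done;
  Printf.printf "Inline binop1 with %d iterations:%.0f msec.\n"
    n ((Sys.time () -. t0) *. 1000.);

  (* partially applied test: f is built once, outside the loop *)
  let f = binop1 Plus sin cos in
  let t0 = Sys.time () in
  for _i = 1 to n do
    ignore (f 1.)
  done;
  Printf.printf "f (binop1) with %d iterations:%.0f msec.\n"
    n ((Sys.time () -. t0) *. 1000.)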
Here are the results
(Linux debian 2.6.17-2-686, Intel(R) Pentium(R) M processor 1300MHz)
BYTE CODE:
Inline binop1 with 100000000 iterations:58736 msec.
Inline binop2 with 100000000 iterations:59681 msec.
Inline binop3 with 100000000 iterations:67663 msec.
Inline binop4 with 100000000 iterations:68692 msec.
f (binop1) with 100000000 iterations:53514 msec.
f (binop2) with 100000000 iterations:47794 msec.
f (binop3) with 100000000 iterations:53315 msec.
f (binop4) with 100000000 iterations:59411 msec.
NATIVE CODE:
Inline binop1 with 100000000 iterations:25017 msec.
Inline binop2 with 100000000 iterations:25860 msec.
Inline binop3 with 100000000 iterations:26310 msec.
Inline binop4 with 100000000 iterations:24251 msec.
f (binop1) with 100000000 iterations:25512 msec.
f (binop2) with 100000000 iterations:25620 msec.
f (binop3) with 100000000 iterations:25554 msec.
f (binop4) with 100000000 iterations:24801 msec.
The last method (binop4) is clearly the slowest in bytecode and the
fastest in native code!
In native code, the difference between the slowest and the fastest
reaches 8%.
This seems significant to me, but I don't know whether it depends
on the system architecture or not.
Best
San