Google 網路論壇不再支援新的 Usenet 貼文或訂閱項目,但過往內容仍可供查看。

[Caml-list] function definition

瀏覽次數:2 次
跳到第一則未讀訊息

Vu Ngoc San

未讀,
2007年1月21日 清晨7:48:392007/1/21
收件者:Caml Mailing List
I'm sure this is a basic question:

what is the difference between these ways of defining a function, and
what is the most efficient (I mean for the resulting function f = binop
o f1 f2, which typically will be called x*1000 times)

type operator = Plus | Minus;;


let binop1 o f1 f2 =
fun x -> match o with
| Plus -> (f1 x) +. (f2 x)
| Minus -> (f1 x) -. (f2 x)

let binop2 o f1 f2 =
match o with
| Plus -> fun x -> (f1 x) +. (f2 x)
| Minus -> fun x -> (f1 x) -. (f2 x)

let binop3 o f1 f2 =
let op = match o with
| Plus -> (+.)
| Minus -> (-.) in
fun x -> op (f1 x) (f2 x)

let binop4 o f1 f2 =
fun x ->
let op = match o with
| Plus -> (+.)
| Minus -> (-.) in
op (f1 x) (f2 x)

Thanks for your expertise !

San


_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Jon Harrop

未讀,
2007年1月21日 中午12:52:012007/1/21
收件者:caml...@yquem.inria.fr
On Sunday 21 January 2007 12:45, Vu Ngoc San wrote:
> I'm sure this is a basic question:
>
> what is the difference between these ways of defining a function, and
> what is the most efficient (I mean for the resulting function f = binop
> o f1 f2, which typically will be called x*1000 times)

That's a very hard question, and is probably platform specific but I'll throw
some ideas at you off the top of my head. I'm sure someone like Brian Hurt
will dive into the assembler and prove me wrong. ;-)

> type operator = Plus | Minus;;
>
>
> let binop1 o f1 f2 =
> fun x -> match o with
> | Plus -> (f1 x) +. (f2 x)
> | Minus -> (f1 x) -. (f2 x)

That is equivalent to:

let binop1 o f1 f2 x = ..

it waits for all arguments before doing anything. ocamlopt optimises currying
for that case.

> let binop2 o f1 f2 =
> match o with
> | Plus -> fun x -> (f1 x) +. (f2 x)
> | Minus -> fun x -> (f1 x) -. (f2 x)

After 3 args, this returns a closure waiting for the fourth arg.

> let binop3 o f1 f2 =
> let op = match o with
> | Plus -> (+.)
> | Minus -> (-.) in
> fun x -> op (f1 x) (f2 x)

After 3 args, this creates a closure to do the op and another closure that
captures the first closure. ocamlopt might inline the closure "op" but I
doubt it.

> let binop4 o f1 f2 =
> fun x ->
> let op = match o with
> | Plus -> (+.)
> | Minus -> (-.) in
> op (f1 x) (f2 x)

This waits for all four args again (same as first case) and the closure "op"
might be inlined.

Assuming you invoke the function will all four arguments, I would expect the
first solution to be the fastest by a significant margin. If you factor out a
closure of binop with its first three arguments passed and use it multiple
times then the second solution might be faster.

I've found this with a pattern matcher written in OCaml that was faster when
the pattern matcher evaluated when partially applied to the pattern,
returning a closure that matched against the pattern it had been partially
applied to. I was surprised to find that. I still don't know why that would
be faster...

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists

Vu Ngoc San

未讀,
2007年1月23日 清晨6:47:532007/1/23
收件者:Jon Harrop
Thanks for the reply.

So, I have tried to do some benchmarking, and the results are surprising.

The method: a (for i=1 to ... do) loop.

I have tried first an inlined test with

ignore(binop1 Plus sin cos 1.)

(and then binop2, binop3, binop4)

Then I have defined the function

f = binop1 Plus sin cos

and tried the loop with

ignore(f 1.)

Here are the results
(Linux debian 2.6.17-2-686, Intel(R) Pentium(R) M processor 1300MHz)

BYTE CODE:

Inline binop1 with 100000000 iterations:58736 msec.
Inline binop2 with 100000000 iterations:59681 msec.
Inline binop3 with 100000000 iterations:67663 msec.
Inline binop4 with 100000000 iterations:68692 msec.

f (binop1) with 100000000 iterations:53514 msec.
f (binop2) with 100000000 iterations:47794 msec.
f (binop3) with 100000000 iterations:53315 msec.
f (binop4) with 100000000 iterations:59411 msec.


NATIVE CODE:

Inline binop1 with 100000000 iterations:25017 msec.
Inline binop2 with 100000000 iterations:25860 msec.
Inline binop3 with 100000000 iterations:26310 msec.
Inline binop4 with 100000000 iterations:24251 msec.

f (binop1) with 100000000 iterations:25512 msec.
f (binop2) with 100000000 iterations:25620 msec.
f (binop3) with 100000000 iterations:25554 msec.
f (binop4) with 100000000 iterations:24801 msec.


The last method (binop4) is clearly the slowest in bytcode and the
fastest in native code !

In native code, the difference bewteen the slowest and the fastest
reaches 8%.
This seems to me significant enough, but I don't know whether it depends
on the system architecture or not.

Best
San

0 則新訊息