[I made a slight mistake in the order of the code I sent with this question,
so I will fix the order and send it again -- sorry about that. I have also
added a few extra comments I forgot to post the first time]
I have the following piece of code,
let timebasic (f, xo, a) = Array.fold_left f xo a
let d = 1000000
let a = Array.create d 1
let r = timeit timebasic ((+), 0, a)
let a = Array.create d (Num.Int 1)
let r = Timing.timeit timebasic (Num.(+/), Num.Int 0, a)
let a = Array.create d Big_int.unit_big_int
let r = timeit timebasic (Big_int.add_big_int, Big_int.zero_big_int, a)
Here (timeit f a) times one execution of f on a. Now the first timeit,
where I fold integer addition over an array of 1 million 1's, produces
a time of,
0.340u 0.000s 0.34 100.7%
This is sound performance. Now the second timeit where I fold Num addition
over an array of 1 million 1's (as Num.Ints) also produces some solid
performance, taking time of
1.280u 0.000s 1.29 99.5%
For a point of comparison, Magma takes 2.64 seconds for this
computation. But the third timeit where I fold big integer
addition over an array of 1 million 1's (as big integers), produces
a time of
7.350u 0.000s 7.35 100.0%
This is dreadful! Magma takes 0.611 seconds for this big integer
computation, meaning OCaml is 10 times slower than Magma.
This surprises me since OCaml's performance in general seems excellent.
It also surprises me since I don't understand how Num can be fast and
Big_int slow ...?
[my platform is a big Sparc machine running Solaris, all code compiled with
unsafe option, O4 optimisation passed to gcc, inlining set aggressively,
etc]
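For completeness, a timeit along the following lines is enough to
reproduce the measurements (this is just a sketch using Unix.times;
the Timing module I actually use also reports system time, elapsed
time and CPU usage, as in the figures above):

(* minimal stand-in for timeit: run f on a once, report user CPU time *)
let timeit f a =
  let t0 = Unix.times () in
  let r = f a in
  let t1 = Unix.times () in
  Printf.printf "%.3fu\n" (t1.Unix.tms_utime -. t0.Unix.tms_utime);
  r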
Now to a similar problem. Consider the following code,
(* scale a vector (array) in place *)
let scale_ip s v =
  for i = 0 to (d - 1) do
    v.(i) <- Num.( */ ) s v.(i)
  done

(* add two vectors (arrays) in place *)
let add_ip v1 v2 =
  for i = 0 to (d - 1) do
    v1.(i) <- Num.(+/) v1.(i) v2.(i)
  done

(* functional versions of scale and add *)
let add v1 v2 =
  let r = Array.copy v1 in
  add_ip r v2;
  r

let scale s v =
  let r = Array.copy v in
  scale_ip s r;
  r

(* add a vector v to itself, and scale v -- we will time this function *)
let vectime (v, s) =
  let b = add v v in
  let c = scale s v in
  (b, c)

let v = Array.create d (Num.Int 1)
let s = Num.Int 2
let (u, w) = Timing.timeit vectime (v, s)
So here we create an array of 1 million 1's as Nums, and we create the
scalar 2 as a Num. We then add the array to itself, and then scale the
array by the scalar. The time we get is,
23.770u 0.290s 24.07 100.0%
Again this is dreadful. Magma does the same computation on the same
machine in 1.4 seconds. What I don't understand here is how the
Num operations can be fast (as shown in the first set of timings up
top), and yet building arrays of such additions can be so much
slower. Note that if you do these vector computations over type int,
the same phenomenon occurs -- the int operations are fast, but building
arrays of such ints is slow. That surprises me since in general OCaml
arrays seem fast, and there should be no extraneous stuff like garbage
collection issues involved here.
Comments anyone?
graham
Ok I agree that this is a reasonable explanation of why Num is much
faster than Big_int. I must admit that I had assumed that Big_int
would use a representation that included the optimised case of small
ints. That was my mistake.
But this still doesn't explain why adding two arrays containing a
million Nums each (all 1) was so slow compared to adding a million
Nums (all 1). This difference in speed suggests some performance problem
with arrays.
graham
I tried out your program on my 400 MHz P6. Here is the command line I
used to build the executable.
ocamlopt -inline 100 -ccopt -O4 -unsafe -cc gcc -cclib \
-lnums unix.cmxa nums.cmxa z.ml
Here are the times it printed. They are similar to what you saw on
the Sparc.
0.120u 0.000s 0.12 99.1%
1.210u 0.000s 1.20 100.5%
6.790u 0.000s 6.79 100.0%
As a reminder, 0.12 is the time for the normal integer loop, 1.21 is
the time for the Num loop, and 6.79 is the time for the Big_int loop.
As a comparison, I tried the integer and big integer computations in
SML (which has no standard arbitrary-precision rationals) using MLton
(http://www.sourcelight.com/MLton). I made one tweak so that the
computation runs for 1E7 instead of 1E6 iterations. I also added
print statements and array update statements to make sure the result
is correct and not optimized away. Here is the SML code.
fun timeit (f, a, toString) =
let
fun time() =
let val {utime, stime, ...} = Posix.ProcEnv.times ()
in Time.+ (utime, stime)
end
val t = time ()
val r = f a
val t = Time.- (time (), t)
val _ = print (concat ["result is ", toString r, "\n",
"time is ", Time.toString t, "\n"])
in ()
end
fun timebasic (f, xo, a) = Array.foldl f xo a
val d = 10000000
val a = Array.array (d, 1)
val _ = Array.update (a, 0, 2)
val _ = timeit (timebasic, (op +, 0, a), Int.toString)
val a = Array.array (d, 1)
val _ = Array.update (a, 0, 2)
val _ = timeit (timebasic, (IntInf.+, 0, a), IntInf.toString)
This was compiled with the command line "mlton z.sml". Here are the
results.
result is 10000001
time is 0.410
result is 10000001
time is 0.650
So, the penalty in MLton for using big integers is less than a factor
of 2, while it was more than a factor of 60 in OCAML. As John Prevost
pointed out, the hit in OCAML is due to Big_int always using a less
efficient representation, even for small ints. But the hit in OCAML
for using Num, which does use the efficient representation for small
integers, is still a factor of 10, which is pretty high. Another
strangeness is that the integer loop runs about 3 times faster in
MLton than in OCAML (10 times the computation in 3 times the time) and
the big integer loop runs about 100 times faster (10 times the
computation in one tenth the time). I guess we have an explanation
for the big integer loop, but I don't have one for the normal integer
loop. Also, even the Num code in OCAML, which uses the specialized
representation for small integers, is 20X slower than the big integer
code in MLton (one tenth the computation in twice the time).
FYI, here is how MLton handles big integers. Integers in the range
[-2^31, 2^31) are represented unboxed, with a single tag bit to
distinguish them from pointers. Larger integers are represented as a
pointer to a data structure in the GNU multiprecision library (gmp).
Integer addition first does a test on its inputs, and if both are
small, does the addition without calling gmp. So, for this benchmark,
all of the additions happen inline on unboxed integers.
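In OCaml-ish terms the fast path is roughly the following (an
illustrative sketch only, not MLton's actual code: MLton does the
small/small test on tag bits with no constructor boxing, and its real
overflow handling differs):

type number =
  | Small of int                  (* fits in a machine word *)
  | Big of Big_int.big_int        (* arbitrary precision *)

let big_of = function
  | Small a -> Big_int.big_int_of_int a
  | Big b -> b

let add x y =
  match x, y with
  | Small a, Small b ->
      let s = a + b in
      (* fall back to the slow path if the machine addition overflowed *)
      if (a >= 0) = (b >= 0) && (s >= 0) <> (a >= 0)
      then Big (Big_int.add_big_int (big_of x) (big_of y))
      else Small s
  | _ -> Big (Big_int.add_big_int (big_of x) (big_of y))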
Anyways, I'd be interested to see the OCAML performance with a more
efficient big integer package.
> But this still doesn't explain why adding two arrays
> containing a million Nums each (all 1) was so slow compared
> to adding a million Nums (all 1). This difference in speed
> suggests some performance problem with arrays.
My (complete) guess is that there is some overhead due to the fold,
possibly missed inlining? Perhaps this overhead is the same
explanation for why the OCAML integer loop is 3X slower than in MLton?
What does the "write barrier" mean? Do other languages have a write
barrier too? If C does, then how does this explain OCaml's performance?
graham
graham> John Prevost writes:
>>>>>>> "g" == graham <grah...@telocity.com> writes:
g> But this still doesn't explain why adding two arrays containing
g> a million Nums each (all 1) was so slow compared to adding a
g> million Nums (all 1). This difference in speed suggests some
g> performance problem with arrays.
>> Hmm. I didn't look at that too closely. If you're doing
>> destructive update on an array, I expect it's because you're
>> hitting the write barrier on every update.
graham> What does the "write barrier" mean? Do other languages
graham> have a write barrier too?
Briefly, the write barrier means that every time a pointer field is
overwritten inside a garbage-collected object (i.e. an OCaml value),
the GC must be made aware of it. Usually, the updated object
(containing the overwritten pointer field) is added to some internal
GC structure (called the store list or store vector).
The write barrier is needed in generational copying garbage collectors
(like OCaml's GC, which is generational copying for the young
generation, and incremental mark-and-sweep, optionally compacting, for
the old generation). This is not specific to OCaml: other languages'
implementations (like Self, CMU Common Lisp, and some "high
performance" Java JIT implementations...) which use a generational
copying GC also require a write barrier. Without a write barrier, a
generational copying GC would break its invariant that every pointer
from the old generation into the young one is known to the collector.
For more details, read a good book on garbage collection, such as the
one by Jones and Lins.
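Schematically, every pointer store into an array or record behaves
roughly like the sketch below (the real check is done in C inside the
runtime; is_old, is_young and store_list here are only illustrative
stand-ins, not real API):

(* illustration only: mimics the checks done on every pointer store *)
let is_old (_ : Obj.t) = false                  (* placeholder *)
let is_young (_ : Obj.t) = true                 (* placeholder *)
let store_list : (Obj.t * int) list ref = ref []

let assign_with_barrier v i x =
  if is_old (Obj.repr v) && is_young (Obj.repr x) then
    (* record the old->young pointer so minor collections can find it *)
    store_list := (Obj.repr v, i) :: !store_list;
  v.(i) <- x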
Concerning the OCaml 3.00 implementation, it has one of the best
garbage collectors I have ever encountered. Perhaps tuning some GC
parameters (through the Gc.set standard library function) could help
with your initial problem.
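For instance, something like the following (the particular values are
only a starting point to experiment with):

let () =
  let c = Gc.get () in
  Gc.set { c with Gc.minor_heap_size = 4 * 1024 * 1024;  (* in words *)
                  Gc.space_overhead = 200 }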
Regards.
--
Basile STARYNKEVITCH -- http://perso.wanadoo.fr/starynkevitch/basile/
email: basile dot starynkevitch at wanadoo dot fr (France)
8, rue de la Faïencerie, 92340 Bourg La Reine, phone: 1.46.65.45.53
> This surprises me since OCaml's performance in general seems excellent.
> It also surprises me since I don't understand how Num can be fast and
> Big_int slow ...?
As others have already mentioned, "num" is a sum type (of small
integers, arbitrary-precision integers, and arbitrary-precision
rationals), with mixed arithmetic operations, so it is normal that
"num" is more efficient than "big_int" on tests involving only small
integers.
In other words, "num" is intended for applications where most integers
are small, but arbitrary-precision rationals must also be supported
for correctness, while "big_int" is intended for applications where it
is known from the beginning that only big integers will be
manipulated, and the overhead of mixed arithmetic on "num" doesn't pay
off.
(The representation of values of type "num" is still less efficient in
OCaml than in MLton, because the "num" type is an ordinary Caml sum
type, meaning that small integers represented as "num" are still
heap-allocated.)
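(For reference, the declaration in the Num library is essentially:

type num =
  | Int of int              (* small integers *)
  | Big_int of big_int      (* arbitrary-precision integers *)
  | Ratio of ratio          (* arbitrary-precision rationals *)

so even the Int case is a two-word heap block rather than a bare,
unboxed int.)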
> Now to a similar problem. Consider the following code,
>
> [vector scaling and addition]
>
> Again this is dreadful. Magma does the same computation on the same
> machine in 1.4 seconds. What I don't understand here is how the
> Num operations can be fast (as shown in the first set of timings up
> top), and yet building arrays of such additions can be so much
> slower.
I can confirm what others suggested, namely that it's the generational
write barrier that kills you. OCaml uses a two-generation garbage
collector, and this means that any in-place modification of an object
in the "old" generation must be checked to see if the new value
written in it is in the "young" generation. If so, the address of the
modified field is recorded in a table (to serve as an extra root for
minor collections), and a premature major collection is triggered when
this table becomes too large (in order to empty it).
Your test program spends about 70% of its time in cross-generation
checks and premature major collections. It represents one of the
worst cases for generational garbage collection.
One last word of caution: the performance of OCaml is tuned with real
applications, not micro-benchmarks. For instance, while it is easy to
construct micro-benchmarks where generational collection fares very
badly, very nearly all realistic programs run faster (sometimes by
quite large factors) with generational collection. So, I hope your
evaluation of OCaml will also use larger, more realistic programs.
- Xavier Leroy
--
Valid e-mail address (without the underscores): Xavier.Leroy@i_n_r_i_a.f_r
This is a protection against junk mail. Apologies for the inconvenience.
Home page: http://pauillac.inria.fr/~xleroy/
> graham <grah...@telocity.com> writes:
>> Now to a similar problem. Consider the following code,
>>
>> [vector scaling and addition]
>>
>> Again this is dreadful. Magma does the same computation on the same
>> machine in 1.4 seconds. What I don't understand here is how the
>> Num operations can be fast (as shown in the first set of timings up
>> top), and yet building arrays of such additions can be so much
>> slower.
>
> I can confirm what others suggested, namely that it's the generational
> write barrier that kills you. OCaml uses a two-generation garbage
> collector, and this means that any in-place modification of an object
> in the "old" generation must be checked to see if the new value
> written in it is in the "young" generation. If so, the address of the
> modified field is recorded in a table (to serve as an extra root for
> minor collections), and a premature major collection is triggered when
> this table becomes too large (in order to empty it).
>
> Your test program spends about 70% of its time in cross-generation
> checks and premature major collections. It represents one of the
> worst cases for generational garbage collection.
Ok thanks for the explanation. Subtle. I gather that MLton must use some
different kind of collector then, since it performs ok on this example.
Is there any chance of improvements in this cross generation checking?
Maybe some tweaks. Or are there some GC settings I can use to mitigate
this problem?
> One last word of caution: the performance of OCaml is tuned with real
> applications, not micro-benchmarks. For instance, while it is easy to
> construct micro-benchmarks where generational collection fares very
> badly, very nearly all realistic programs run faster (sometimes by
> quite large factors) with generational collection. So, I hope your
> evaluation of OCaml will also use larger, more realistic programs.
Sure I understand that microbenchmarks don't tell you much about a
language overall. I was really just curious about this particular
code since relative to all the other benchmarks I have done,
which were great, this one stood out like a sore thumb. Also this
particular case came right after I had benchmarked my Caml implementation
of finite fields against some C code (which uses a faster algorithm),
and found my finite field code to be great (faster than the C by some
margin). So don't worry, I will give OCaml a fair assessment. So far I
think Ocaml is great (modulo some syntax and some puzzles about the
module system which have shown up in other posts). I had programmed
a lot in ML, Haskell and Clean before doing this project in OCaml, so
I am perhaps not unbiased toward functional languages, and ML dialects
:-)
graham
In article <B63C208B.16AE6%grah...@telocity.com>,
graham <grah...@telocity.com> wrote:
>[...]
>What does the "write barrier" mean? Do other languages have a write
>barrier too? If C does then how does this explain Ocaml's performance?
No, C doesn't have that. Please read up on generational garbage
collection, that's where the term "write barrier" stems from,
at least in this case.
Kind regards,
Hannah.
>As others have already mentioned, "num" is a sum type (of small
>integers, arbitrary-precision integers, and arbitrary-precision
>rationals), with mixed arithmetic operations, so it is normal that
>"num" is more efficient than "big_int" on tests involving only small
>integers.
>
>In other terms, "num" is intended for applications where most integers
>are small, but arbitrary-precision rationals must also be supported
>for correctness, while "big_int" is intended for applications where it
>is known from the beginning that only big integers will be
>manipulated, and the overhead of mixed arithmetic on "num" doesn't pay
>off.
Does OCaml have any type for applications where many integers are
small, arbitrary-precision _integers_ must also be supported for
correctness, and you want to express in the type system that the
values must be integral, not fractions?
My intuition is that that case would most likely be more common than
the case of applications for which OCaml's current "big_int" is required.
For applications which only use big integers, the relative overhead of
the extra checks is likely to be fairly small, I would expect.
--
Fergus Henderson <f...@cs.mu.oz.au> | "I have always known that the pursuit
| of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh> | -- the last words of T. S. Garp.
Yes. Right now, MLton uses a two-space copying collector, which
doesn't require a write barrier. That happens to work perfectly for
this benchmark. However, its performance will be bad on fast-allocating
programs with lots of long-lived data.
Hmmm, now that I look at the benchmark again, I don't understand the
explanation given by the OCAML people about the write barrier. The
loops that are being timed are folding over the array and *reading*
from it, not *writing* to it. I don't see why the write barrier has
anything to do with this benchmark.
> Hmmm, now that I look at the benchmark again, I don't understand the
> explanation given by the OCAML people about the write barrier. The
> loops that are being timed are folding over the array and *reading*
> from it, not *writing* to it. I don't see why the write barrier has
> anything to do with this benchmark.
For the first benchmark (iteration over an array), the problem is
indeed not the write barrier, but the fact that even small integers
are boxed.
For the second benchmark, the problem is indeed the write barrier, as
there are a lot of assignments:
let scale_ip s v =
  for i = 0 to (d - 1) do
    v.(i) <- Num.( */ ) s v.(i)    (* <- every iteration writes into v *)
  done
-- Jerome
> Is there any chance of improvements in this cross generation checking?
> Maybe some tweaks.
The check always needs to be done at run-time, but there are several
different implementations of the "remembered set" (the table of old ->
young pointers); some would result in better performance on
this particular example, although they might be less efficient on
other examples.
> Or are there some GC settings I can use to mitigate this problem?
I thought so initially. E.g. by setting a very large minor
generation, you could hope to fit all of the data in the minor
generation and never have cross-generation pointers. It turns out not
to work: although the OCaml GC has plenty of user-settable parameters,
the threshold size over which a new block is allocated immediately in
the old generation (so as not to overflow the new generation) is
fixed, and a 1000000-element array is well above that limit.
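(You can see where such an array lands by allocating it and printing
the GC counters immediately afterwards: the words show up directly in
the major-heap figures.)

let () =
  let a = Array.create 1000000 (Num.Int 1) in
  Gc.print_stat stdout;   (* the array is already counted in the major heap *)
  ignore a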
> So don't worry I will give Ocaml a fair assessment.
OK, fine :-)
On top of this I have a naive question: why is the array in the old
generation instead of in the new generation? If it were in the new
generation there wouldn't be a write barrier.
graham
> Does OCaml have any type for applications where many integers are
> small, arbitrary-precision _integers_ must also be supported for
> correctness, and you want to express in the type system that the
> values must be integral, not fractions?
>
> My intuition is that that case would most likely be more common than
> the case of applications for which OCaml's current "big_int" is required.
> For applications which only use big integers, the relative overhead of
> the extra checks is likely to be fairly small, I would expect.
Markus, do you just mean Num minus ratios?
If so, removing ratios but keeping big nums isn't likely to help much
at all, since the values will still be boxed.
Cheers,
Julian.
--
Julian Assange |If you want to build a ship, don't drum up people
|together to collect wood or assign them tasks
pr...@iq.org |and work, but rather teach them to long for the endless
pr...@gnu.ai.mit.edu |immensity of the sea. -- Antoine de Saint Exupery
> Does OCaml have any type for applications where many integers are
> small, arbitrary-precision _integers_ must also be supported for
> correctness, and you want to express in the type system that the
> values must be integral, not fractions?
>
> My intuition is that that case would most likely be more common than
> the case of applications for which OCaml's current "big_int" is required.
> For applications which only use big integers, the relative overhead of
> the extra checks is likely to be fairly small, I would expect.
Markus, do you just mean Num minus ratios?
With 64-bit processors on the market next year, are languages like
OCAML prepared for it?
Reasoning:
- 'big' integers are frequently used when just 'extended' integers
would be enough
- one can efficiently implement 128-bit integers on 64-bit processors
(not much different from implementing 64-bit ints on 32 bit processors)
- the efficiency of such an implementation would be much better
than that of a 'general big integer' (?)
Performance advantages of 128-bit integers for various functions
(graphics, encryption, random number generation, etc.) could be
significant. The reduction of the code size needed to implement various
tricky things will most certainly be significant.
Such a new int type is unlikely to be implemented in Java [requires
changes to JVM - unlikely to happen]. Which would result in opening
some new areas of application for a language implementing it first.
Or am I totally off the mark?
>f...@cs.mu.oz.au (Fergus Henderson) writes:
>
>> Does OCaml have any type for applications where many integers are
>> small, arbitrary-precision _integers_ must also be supported for
>> correctness, and you want to express in the type system that the
>> values must be integral, not fractions?
>>
>> My intuition is that that case would most likely be more common than
>> the case of applications for which OCaml's current "big_int" is required.
>> For applications which only use big integers, the relative overhead of
>> the extra checks is likely to be fairly small, I would expect.
>
>do you just mean Num minus ratios?
Yes.
>If so removing ratios, but keeping big nums isn't likely to help much at
>all, since the values will still be boxed.
It won't help performance much, in comparison to Num, but it will help
static type checking.
In most cases I would rather use big_int than Num, even if the
performance is worse, because big_int lets me more clearly express my
intent that the data be integers, not fractions. Using Num instead of
big_int is a hack that I might use if profiling showed it was a serious
performance problem, but it would be nicer if I didn't have to resort
to such hacks.
If I had to choose one of "big_int" and "Num minus ratios", I'd rather
the latter, but unfortunately Ocaml apparently only provides the former.
OCaml has supported native code generation for 64-bit platforms for a
very long time already (e.g. Alpha processors).
As for the byte-code: it should compile out of the box on any
POSIX-compliant operating system, no matter what processor it uses.
There is already an IA-64 native code backend in the CVS repository of
OCaml. I am not sure whether it is already finished, but there hasn't
been any work on it for a few months so I guess that most has been done
and people are only waiting to get a final release of the hardware to
optimise for it.
Maybe Xavier can tell us more about the current status of this backend?
Here is a list of platforms for which OCaml can produce native code:
http://caml.inria.fr/ocaml/portability.html
Best regards,
Markus Mottl
--
Markus Mottl, mo...@miss.wu-wien.ac.at, http://miss.wu-wien.ac.at/~mottl
> with 64-bit processors on the market next year - are languages like
> OCAML prepared for it?
This statement made me smile: 64-bit processors have been commonplace
in the workstation market for 7-8 years already. Caml Light was
running in 64 bit on a prototype of the Alpha processor as early as
summer 1991 -- even before the chip was commercially available.
The Objective Caml native-code compiler has supported the Alpha in
64-bit mode since day one (1995).
The MIPS, SPARC, PPC and HPPA code generators are still 32 bits even
though 64-bit versions of these processors exist, but that's just
because there hasn't been a clear demand for 64-bit ports of OCaml on
these platforms.
More recently, OCaml was ported to the IA64 (including the native-code
generator), again before this architecture becomes commercially available
-- thanks Intel and SourceForge for giving access to prototype IA64
machines. And I have a port to AMD's x86_64 (a.k.a. Sledgehammer) in
progress, although it is stopped at the moment by lack of
corresponding hardware and C/assembler development tools.
Notice, however, that going to a 64-bit platform doesn't impact the
OCaml *language* much: the "int" type has a much larger range, heap
blocks (most notably arrays) can be considerably bigger, and you can
address more than 4G of memory, but that's about it.
> Reasoning:
> - 'big' integers are frequently used when just 'extended' integers
> would be enough
> - one can efficiently implement 128-bit integers on 64-bit processors
> (not much different from implementing 64-bit ints on 32 bit processors)
> - the efficiency of such implementation would be much better
> than 'general big integer' (?)
128-bit integers cannot replace arbitrary-precision integers in most
applications. For instance, factoring, primality testing, RSA
cryptography require much larger integers. And theorem proving or
symbolic algebra require arbitrary-precision integers so as to produce
mathematically exact results in all cases.
> performance advantages of 128 bit integers for various functions
> (graphics, encryption, random number generation etc) could be
> significant. The reduction of the code size needed to implement various
> tricky things will most certainly be significant.
I agree that 128-bit integers can be very interesting for certain
cryptographic algorithms (IDEA, CAST, elliptic-curve cryptography over
"small enough" fields). If the need arises, we could always add a
type of 128-bit integers in the standard library, just like we
currently do for 32- and 64-bit integers. The implementation of
128-bit integers could then be optimized on 64-bit platforms as you
describe. But I'll wait to see a strong need for this feature.
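(For comparison, the existing boxed 64-bit type is used as below; a
hypothetical Int128 module would presumably follow the same pattern:)

let x = Int64.add (Int64.of_int 1) (Int64.of_string "123456789012345")
let () = print_endline (Int64.to_string x)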
Stupid question :)). Could someone point me to an explanation of what
a boxed integer is?
Best regards
--
____ _ ___
/ | \_/ |/ _ \ Andrzej Marek Ostruszka
/ _ | | (_) | Instytut Fizyki, Uniwersytet Jagiellonski (Cracow)
/_/ L|_|V|_|\___/ (PGP <-- finger ostr...@order.if.uj.edu.pl)
> Julian Assange <pr...@iq.org> writes:
>
> >do you just mean Num minus ratios?
>
> Yes.
>
> >If so removing ratios, but keeping big nums isn't likely to help much at
> >all, since the values will still be boxed.
>
> It won't help performance much, in comparison to Num, but it will help
> static type checking.
>
> In most cases I would rather use big_int than Num, even if the
> performance is worse, because big_int lets me more clearly express my
> intent that the data be integers not fractions. Using Num instead of
> big_int is a hack that I might use if profiling showed it was a serious
> performance problem, but it would be nicer if I didn't have to resort
> to such hacks.
>
> If I had to choose one of "big_int" and "Num minus ratios", I'd rather
> the latter, but unfortunately Ocaml apparently only provides the former.
Doing it yourself seems trivial:
Create a new module Integer, containing an abstract type integer,
internally equal to num.
Re-export there all the functions from the num module, but typed as
integer.
The only operation for which you will need to slightly change the
definition is division, otherwise the result of an operation on
integers is an integer...
This is a general technique to ensure safe typing.
What is the problem?
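A sketch of the shape (only a handful of operations shown; a real
module would re-export everything from Num):

(* integer.mli : the abstract interface, clients never see the num *)
type integer
val zero      : integer
val add       : integer -> integer -> integer
val mul       : integer -> integer -> integer
val div       : integer -> integer -> integer   (* result stays integral *)
val of_int    : int -> integer
val to_string : integer -> string

(* integer.ml : internally just num *)
type integer = Num.num
let zero = Num.Int 0
let add = Num.add_num
let mul = Num.mult_num
let div = Num.quo_num          (* quotient rather than exact division *)
let of_int n = Num.Int n
let to_string = Num.string_of_num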
--
---------------------------------------------------------------------------
Jacques Garrigue Kyoto University garrigue at kurims.kyoto-u.ac.jp
<A HREF=http://wwwfun.kurims.kyoto-u.ac.jp/~garrigue/>JG</A>
>f...@cs.mu.oz.au (Fergus Henderson) writes:
>
>> If I had to choose one of "big_int" and "Num minus ratios", I'd rather
>> the latter, but unfortunately Ocaml apparently only provides the former.
>
>Doing it yourself seems trivial:
>
>Create a new module Integer, containing an abstract type integer,
>internally equal to num.
>Report there all functions from the num module, but export them typed
>as integer.
>The only operation for which you will need to slightly change the
>definition is division, otherwise the result of an operation on
>integers is an integer...
>
>This is a general technique to ensure safe typing.
>What is the problem?
Suppose I use this technique to create such a type, and write lots of
code using it. Next suppose you also use this technique to create
such a type, and write lots of code using it. What will happen if
someone wants to reuse your code and my code together?
> Jacques Garrigue <s...@my.signature> writes:
>
> >f...@cs.mu.oz.au (Fergus Henderson) writes:
> >
> >> If I had to choose one of "big_int" and "Num minus ratios", I'd rather
> >> the latter, but unfortunately Ocaml apparently only provides the former.
> >
> >Doing it yourself seems trivial:
> >
> >Create a new module Integer, containing an abstract type integer,
> >internally equal to num.
> >Report there all functions from the num module, but export them typed
> >as integer.
> >The only operation for which you will need to slightly change the
> >definition is division, otherwise the result of an operation on
> >integers is an integer...
> >
> >This is a general technique to ensure safe typing.
> >What is the problem?
>
> Suppose I use this technique to create such a type, and write lots of
> code using it. Next suppose you also use this technique to create
> such a type, and write lots of code using it. What will happen if
> someone wants to reuse your code and my code together?
He will just need a bit of magic...
That's not very clean, but since we know that both use the same
underlying num type, just using Obj.magic to get from one integer type
to the other one will work ok. Once you've defined your int_of_int
function, this is type safe.
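For instance (A and B below are hypothetical stand-ins for the two
independently written Integer modules):

module A : sig type integer val of_int : int -> integer end = struct
  type integer = Num.num
  let of_int n = Num.Int n
end

module B : sig type integer val to_string : integer -> string end = struct
  type integer = Num.num
  let to_string = Num.string_of_num
end

(* both abstract types are secretly Num.num, so the cast is safe in
   practice, even though the type checker cannot verify it *)
let b_of_a (x : A.integer) : B.integer = Obj.magic x

let () = print_endline (B.to_string (b_of_a (A.of_int 42)))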
I do not say that it wouldn't be better to have such a type in the
standard library (and preferably optimized), but from the point of
view of safe typing alone, one should be able to encode the behavior
he wants.
>f...@cs.mu.oz.au (Fergus Henderson) writes:
>
>> Jacques Garrigue <s...@my.signature> writes:
>>
>> >f...@cs.mu.oz.au (Fergus Henderson) writes:
>> >
>> >> If I had to choose one of "big_int" and "Num minus ratios", I'd rather
>> >> the latter, but unfortunately Ocaml apparently only provides the former.
>> >
>> >Doing it yourself seems trivial:
>> >
>> >Create a new module Integer, containing an abstract type integer,
>> >internally equal to num.
>> >Report there all functions from the num module, but export them typed
>> >as integer.
>> >The only operation for which you will need to slightly change the
>> >definition is division, otherwise the result of an operation on
>> >integers is an integer...
>> >
>> >This is a general technique to ensure safe typing.
>> >What is the problem?
>>
>> Suppose I use this technique to create such a type, and write lots of
>> code using it. Next suppose you also use this technique to create
>> such a type, and write lots of code using it. What will happen if
>> someone wants to reuse your code and my code together?
>
>He will just need a bit of magic...
>That's not very clean, but since we know that both use the same
>underlying num type, just using Obj.magic to get from one integer type
>to the other one will work ok. Once you've defined your int_of_int
>function, this is type safe.
Hmm, sounds dangerous... what will happen when, some years down the
track, I change the implementation of my type, or you change yours?
Will the result be that the program that is reusing your code and mine
will crash in some mysterious way? How will the maintenance
programmer know that this Obj.magic stuff that someone added all those
years ago is the cause of the problem?
Also, my code might not just use integers, it may also use arrays of
integers, functions from integer to integer, and so forth. In fact it
may use dozens of types involving integers. Converting between
arrays of my integer and arrays of your integer is going to be
either expensive or unsafe, and definitely cumbersome either way.
>I do not say that it wouldn't be better to have such a type in the
>standard library (and preferably optimized)
I'm glad you agree, or at least don't disagree, with my point ;-)
> Suppose I use this technique to create such a type, and write lots of
> code using it. Next suppose you also use this technique to create
> such a type, and write lots of code using it. What will happen if
> someone wants to reuse your code and my code together?
That's why such code should IMHO be put in functors.
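That is, write the client code once against a signature and
instantiate it later with whichever integer type is at hand (sketch):

module type INTEGER = sig
  type t
  val zero : t
  val add  : t -> t -> t
end

(* client code, written once against the signature only *)
module Sum (I : INTEGER) = struct
  let sum l = List.fold_left I.add I.zero l
end

(* one possible instantiation *)
module NumInteger = struct
  type t = Num.num
  let zero = Num.Int 0
  let add  = Num.add_num
end

module NumSum = Sum (NumInteger)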
--
__("< Marcin Kowalczyk * qrc...@knm.org.pl http://qrczak.ids.net.pl/
\__/
^^ PLACEHOLDER SIGNATURE
QRCZAK
Oh, but that's quite unlikely.
Sure enough, the core routines will be the same. Addition and
subtraction should always yield the same result.
Things are less clear for the more peripheral features. What are the
precise rules for converting from string to integer? In what cases are
exceptions generated? How are overflows handled?
Two people extending an existing type in a given direction will produce
different types. Often enough, it's difficult or impossible to bridge
the differences with glue code. (YMMV, but look at databases: viewed
with an ADT hat on, they are a fine example of how far this principle
can be carried.)
Regards,
Joachim
--
This is not an official statement from my employer or from NICE.