[Caml-list] Bigarray access speed

Richard Nyberg

Aug 15, 2002, 6:57:07 PM

to caml...@inria.fr

In the following small programs, 3) is faster than 1) and 2), which run
equally fast. However, 4) is significantly slower than the rest. If you
change the line "a.{i} <- a.{i} + i" to "a.{i} <- i", the execution time
is halved, but it's still much slower.

Is access to Bigarrays slower when they are passed to functions? If so, is it
fixable, or is there some workaround?

I stumbled upon this while coding on a school assignment ((not too ;) fast
multiplication of large integers).

1)
let a = Array.make 1000000 0 in
for i = 0 to 999999 do
  a.(i) <- a.(i) + i;
done;;

2)
let a = Array.make 1000000 0 in
let rec loop a i =
  if i <= 999999 then begin
    a.(i) <- a.(i) + i;
    loop a (i + 1)
  end in
loop a 0;;

3)
open Bigarray;;
let a = Array1.create int c_layout 1000000 in
Array1.fill a 0;
for i = 0 to 999999 do
  a.{i} <- a.{i} + i
done;;

4)
open Bigarray;;
let a = Array1.create int c_layout 1000000 in
Array1.fill a 0;
let rec loop a i =
  if i <= 999999 then begin
    a.{i} <- a.{i} + i;
    loop a (i + 1)
  end in
loop a 0;;

-Richard
-------------------
To unsubscribe, mail caml-lis...@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

Joe HELL

Aug 16, 2002, 2:24:00 AM

to rny...@it.su.se, caml...@inria.fr

Bigarray is designed to facilitate operations on large numerical arrays when
used with external C functions.

Pure Caml use of Bigarray is generally slower.


malc

Aug 16, 2002, 6:32:32 AM

to Richard Nyberg, caml...@inria.fr

http://caml.inria.fr/archives/200110/msg00148.html
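
For readers without access to the linked message: a commonly given explanation for this kind of slowdown (offered here as an assumption about program 4, not as a summary of that message) is that the native-code compiler only inlines Bigarray reads and writes when the element kind and layout are known statically. Inside a function such as "loop", the argument's type is generalized, so each access may go through a slower generic path. A usual workaround is to annotate the argument with a concrete Bigarray type. The sketch below compares the two; the names loop_generic, loop_annotated and time are made up for illustration, and the difference is only expected with ocamlopt, not in the toplevel or bytecode.

```ocaml
open Bigarray

let n = 1_000_000

(* As in program 4: nothing fixes the kind or layout of [a], so the
   compiler may have to use a generic access path. *)
let rec loop_generic a i =
  if i < n then begin
    a.{i} <- a.{i} + i;
    loop_generic a (i + 1)
  end

(* Same loop, but the annotation pins the element kind and the layout. *)
let rec loop_annotated (a : (int, int_elt, c_layout) Array1.t) i =
  if i < n then begin
    a.{i} <- a.{i} + i;
    loop_annotated a (i + 1)
  end

(* Hypothetical timing helper based on processor time from Sys.time. *)
let time label f =
  let t0 = Sys.time () in
  f ();
  Printf.printf "%s: %.3fs\n" label (Sys.time () -. t0)

let () =
  let a = Array1.create int c_layout n in
  Array1.fill a 0;
  time "generic" (fun () -> loop_generic a 0);
  Array1.fill a 0;
  time "annotated" (fun () -> loop_annotated a 0)
```

Both loops compute the same result; only the static type information available to the compiler differs, so any timing gap comes from how the accesses are compiled.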

An aside (not all facts are checked):

Bigarrays (of floats, at least) can have a slight edge over normal arrays.
To get maximal speed in inner loops, data needs to be naturally
aligned. OCaml does nothing to enforce this for non-big arrays. Bigarrays, on
the other hand, are mmapped (4k on IA32), so you get perfectly aligned data
for free. I was thinking that maybe Array could be extended with
make[create]_aligned, as a speed/space tradeoff.

--
mailto:ma...@pulsesoft.com

Markus Mottl

Aug 16, 2002, 6:45:28 AM

to malc, Richard Nyberg, caml...@inria.fr

On Fri, 16 Aug 2002, malc wrote:
> Bigarrays(of at least floats) can have a slight edge over normals arrays.
> To get maximal speed of the inner loops data needs to be naturally
> aligned. OCaml does nothing to enforce it for non-big arrays. Bigarrays on
> the other hand are mmaped(4k on IA32) and you get perfectly aligned data
> for free. I was thinking that maybe Array can be extended with
> make[create]_aligned, for speed/space tradeoff.

Additionally, it would be nice to have a specialized "create"
function for (naturally unboxed) float arrays so that they need not
be initialized with a given float value. This may be beneficial for
algorithms that allocate work space whose contents are not necessarily
fully needed but are filled on demand.
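
For later readers: more recent OCaml releases provide Array.create_float, which allocates a float array without initializing its contents, much like the function wished for above. A minimal sketch of the fill-on-demand pattern follows; the function name squares is hypothetical, and the important point is that every cell is written before it is read.

```ocaml
(* Allocate an uninitialized float workspace and fill it before use.
   The contents of [w] are unspecified until each cell is written. *)
let squares n =
  let w = Array.create_float n in
  for i = 0 to n - 1 do
    w.(i) <- float_of_int (i * i)
  done;
  w

let () =
  let w = squares 5 in
  Printf.printf "%d cells, last = %.1f\n" (Array.length w) w.(4)
```

Bigarray's Array1.create behaves similarly in that it leaves the initial element values unspecified, which is why programs 3) and 4) call Array1.fill first.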

Regards,
Markus Mottl

--
Markus Mottl mar...@oefai.at
Austrian Research Institute
for Artificial Intelligence http://www.oefai.at/~markus

William Chesters

Aug 16, 2002, 7:12:28 AM

to caml...@inria.fr

malc writes:
> To get maximal speed of the inner loops data needs to be naturally
> aligned. OCaml does nothing to enforce it for non-big arrays. Bigarrays on
> the other hand are mmaped(4k on IA32) and you get perfectly aligned data
> for free. I was thinking that maybe Array can be extended with
> make[create]_aligned, for speed/space tradeoff.

I did this once to be able to interface with Fortran libs on Sparc32,
and I still have a patch (against ocaml-2.01) lying around. It was
actually quite thoroughly tested, but it's probably not very tidy. The
main gotcha iirc was getting output_value/input_value to preserve
alignment :).

Richard Nyberg

Aug 16, 2002, 2:51:10 PM

to malc, caml...@inria.fr

> http://caml.inria.fr/archives/200110/msg00148.html

Yes. That message explained it very well :)
Thanks!

-Richard

Xavier Leroy

Aug 19, 2002, 9:01:03 AM

to William Chesters, caml...@inria.fr

malc writes:
> To get maximal speed of the inner loops data needs to be naturally
> aligned. OCaml does nothing to enforce it for non-big arrays. Bigarrays on
> the other hand are mmaped(4k on IA32) and you get perfectly aligned data
> for free. I was thinking that maybe Array can be extended with
> make[create]_aligned, for speed/space tradeoff.

As William Chesters said, allocating 8-aligned arrays isn't really
hard, but keeping them 8-aligned across copying collection,
compaction, and structured I/O is quite a pain.

My experiments indicate that the lack of alignment on float arrays
(or more precisely the fact that they are 4-aligned instead of
8-aligned) has negligible impact on performance for the IA32 (Pentium)
and PowerPC processors, but non-negligible for SPARC and MIPS.
And of course on a 64-bit architecture the problem goes away because
everything in the Caml heap is then 8-aligned. Since I expect IA32
and PowerPC to remain dominant until we massively switch to 64-bit
processors, there's no urgent need to do something about float array
alignment.

- Xavier Leroy

malc

Aug 19, 2002, 9:12:44 AM

to Xavier Leroy, William Chesters, caml...@inria.fr

On Mon, 19 Aug 2002, Xavier Leroy wrote:

> malc writes:
> > To get maximal speed of the inner loops data needs to be naturally
> > aligned. OCaml does nothing to enforce it for non-big arrays. Bigarrays on
> > the other hand are mmaped(4k on IA32) and you get perfectly aligned data
> > for free. I was thinking that maybe Array can be extended with
> > make[create]_aligned, for speed/space tradeoff.
>
> As William Chester said, allocating 8-aligned arrays isn't really
> hard, but keeping them 8-aligned across copying collection,
> compaction, and structured I/O is quite a pain.
>
> My experiments indicate that the lack of alignment on float arrays
> (or more precisely the fact that they are 4-aligned instead of
> 8-aligned) has negligible impact on performance for the IA32 (Pentium)
> and PowerPC processors, but non-negligible for SPARC and MIPS.
> And of course on a 64-bit architecture the problem goes away because
> everything in the Caml heap is then 8-aligned. Since I expect IA32
> and PowerPC to remain dominant until we massively switch to 64-bit
> processors, there's no urgent need to do something about float array
> alignment.

IA32 is now a much bigger family, and unlucky owners of AMD 7th-generation
machines, such as myself, do pay a price for unaligned double-precision
float accesses.

--
mailto:ma...@pulsesoft.com