[erlang-questions] Indexed element access: array vs. erlang:element

Dimitry Golubovsky

unread,

Aug 25, 2008, 10:40:16 AM8/25/08

to erlang-q...@erlang.org

Hi,

I need an index-based access to elements of a fixed-size array-like
structure. Number of elements is expected to be about several
thousands.

What would be more efficient: to create an array (using the "array"
module) or to build a huge tuple and use erlang:element?

Thanks.

PS I suspect the latter, but how does erlang:element scale on tuples
of such size?

--
Dimitry Golubovsky

Anywhere on the Web

Joel Reymont

unread,

Aug 25, 2008, 10:53:00 AM8/25/08

to Dimitry Golubovsky, erlang-q...@erlang.org

Why not profile and find out?

Please share results when you do!

On Aug 25, 2008, at 3:40 PM, Dimitry Golubovsky wrote:

> What would be more efficient: to create an array (using the "array"
> module) or to build a huge tuple and use erlang:element?

> [...]

> PS I suspect the latter, but how does erlang:element scale on tuples
> of such size?

--
wagerlabs.com

Dimitry Golubovsky

unread,

Aug 25, 2008, 10:56:43 AM8/25/08

to Joel Reymont, erlang-q...@erlang.org

Joel,

If nobody did it before, then I'll do (not earlier than tonight anyway ;)

I just believe someone already came across such a common task.

Thank you.

On 8/25/08, Joel Reymont <joe...@gmail.com> wrote:
> Why not profile and find out?
>
> Please share results when you do!

--

Joel Reymont

unread,

Aug 25, 2008, 11:47:13 AM8/25/08

to Dimitry Golubovsky, erlang-q...@erlang.org

On Aug 25, 2008, at 3:56 PM, Dimitry Golubovsky wrote:

> Joel,
>
> If nobody did it before, then I'll do (not earlier than tonight
> anyway ;)

Here, check out the performance of fixed-size vs extensible array set
op.

Astounding!

---

Mac OSX Leopard 10.5.4

Mac Pro 2x2.8Ghz Quad-Core Intel Xeon, 14Gb 800Mhz DDR2 FB-DIMM

Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [smp:8] [async-
threads:0] [kernel-poll:false]

%% 10K elements

14> arr:test().
Fixed-size array: get: 1256ns, set: 6667ns
Extensible array: get: 1255ns, set: 22ns
Tuple: get: 659ns, set: 6ns
ok

%% 1 million

15> arr:test(1000000).
Fixed-size array: get: 121881ns, set: 777067ns
Extensible array: get: 120026ns, set: 35ns
Tuple: get: 66288ns, set: 40ns
ok

---

-module(arr).

-compile([export_all]).

data1(N) ->
%% size implies fixed-size array
%% but lets be explicit
array:new([{size, N}, {default, 0}, {fixed, true}]).

data2(N) ->
%% extensible array
array:new([{size, N}, {default, 0}, {fixed, false}]).

data3(N) ->
erlang:make_tuple(N, 0).

array_set(Array, I, Value) ->
%% array indexing starts at 0
array:set(I - 1, Value, Array).

tuple_set(Tuple, I, Value) ->
%% tuple indexing starts at 1
setelement(I, Tuple, Value).

array_get(Array, I) ->
array:get(I - 1, Array).

tuple_get(Tuple, I) ->
element(I, Tuple).

get(_, _, 0) ->
ok;

get(Fun, Data, N) ->
Fun(Data, N),
get(Fun, Data, N - 1).

set(_, _, 0) ->
ok;

set(Fun, Data, N) ->
Data1 = Fun(Data, N, N),
set(Fun, Data1, N - 1).

test() ->
test(10000).

test(N) ->
%% fixed-size array
{G1, _} = timer:tc(arr, get, [{arr, array_get}, data1(N), N]),
{S1, _} = timer:tc(arr, set, [{arr, array_set}, data1(N), N]),
%% extensible array
{G2, _} = timer:tc(arr, get, [{arr, array_get}, data2(N), N]),
{S2, _} = timer:tc(arr, set, [{arr, array_get}, data2(N), N]),
%% tuple
{G3, _} = timer:tc(arr, get, [{arr, tuple_get}, data3(N), N]),
{S3, _} = timer:tc(arr, set, [{arr, tuple_get}, data3(N), N]),
%% results
io:format("Fixed-size array: get: ~8wns, set: ~8wns~n", [G1 , S1]),
io:format("Extensible array: get: ~8wns, set: ~8wns~n", [G2 , S2]),
io:format("Tuple: get: ~8wns, set: ~8wns~n", [G3 , S3]),
ok.

--
wagerlabs.com

Joel Reymont

unread,

Aug 25, 2008, 12:42:55 PM8/25/08

to Dimitry Golubovsky, erlang-q...@erlang.org

There are serious errors in the code I posted but I updated the code
here and added gb_trees for kicks:

http://www.wagerlabs.com/blog/2008/08/optimizing-erlang-a-death-match-of-arrays-vs-tuples.html#more

One of the differences is that I'm using the populated data structure
for get operations instead of an empty one.

Times are in microseconds as Serge Aleynikov kindly pointed out.

Thanks, Joel

Dimitry Golubovsky

unread,

Aug 25, 2008, 12:49:07 PM8/25/08

to Joel Reymont, erlang-q...@erlang.org

Joel,

Just wondering why set operations are so much faster (except for fixed
size arrays)?

The result of applying *set operations is not consumed, is it? What if
you try to do anything with it (after it is returned from timer:tc)?

Could this be some internal laziness, so only function application
thunks are indeed formed, but since results are never consumed, it
does not expand any further? Or is it some optimization of the same
kind, that the compiler sees unconsumed result, and does not really
compile anything? Or even something at JIT level?

Or am I missing anything ? ;)

Thanks.

On 8/25/08, Joel Reymont <joe...@gmail.com> wrote:
>

> 14> arr:test().
> Fixed-size array: get: 1256ns, set: 6667ns
> Extensible array: get: 1255ns, set: 22ns
> Tuple: get: 659ns, set: 6ns
> ok
>
> %% 1 million
>
> 15> arr:test(1000000).
> Fixed-size array: get: 121881ns, set: 777067ns
> Extensible array: get: 120026ns, set: 35ns
> Tuple: get: 66288ns, set: 40ns
> ok

Joel Reymont

unread,

Aug 25, 2008, 12:52:39 PM8/25/08

to Dimitry Golubovsky, erlang-q...@erlang.org

On Aug 25, 2008, at 5:49 PM, Dimitry Golubovsky wrote:

> The result of applying *set operations is not consumed, is it? What if
> you try to do anything with it (after it is returned from timer:tc)?

The result of a set operation _is_ consumed on every iteration, it's
used to prime the subsequent one.

The result of every get operation is thrown away, though!

I updated the blog post to state that the times are in microseconds.

--
wagerlabs.com

Joel Reymont

unread,

Aug 25, 2008, 12:56:04 PM8/25/08

to Dimitry Golubovsky, erlang-q...@erlang.org

Also, this is the updated results table from the blog post [1].

%% 10,000 elements

11> arr:test().
Fixed-size array: get: 3364mc, set: 6941mc
Extensible array: get: 1mc, set: 23mc
Tuple: get: 625mc, set: 209628mc
Tree: get: 47700mc, set: 4305mc
ok

%% 100,000 elements

4> arr:test(100000).
Fixed-size array: get: 39017mc, set: 73636mc
Extensible array: get: 2mc, set: 36mc
Tuple: get: 6290mc, set: 24495846mc
Tree: get: 619615mc, set: 52691mc
ok

[1] http://www.wagerlabs.com/blog/2008/08/optimizing-erlang-a-death-match-of-arrays-vs-tuples.html#more

--
wagerlabs.com

Hynek Vychodil

unread,

Aug 25, 2008, 12:59:40 PM8/25/08

to Joel Reymont, Dimitry Golubovsky, erlang-q...@erlang.org

You have typo in your code. Set on tuple can't be as fast as your measure :)
You do get on Extensible array and tuple instead of set and you call this inside set cycle which remove data structure away.

It should be:

test(N) ->
%% fixed-size array
{G1, _} = timer:tc(arr, get, [{arr, array_get}, data1(N), N]),
{S1, _} = timer:tc(arr, set, [{arr, array_set}, data1(N), N]),
%% extensible array
{G2, _} = timer:tc(arr, get, [{arr, array_get}, data2(N), N]),

{S2, _} = timer:tc(arr, set, [{arr, array_set}, data2(N), N]), % <-- here

%% tuple
{G3, _} = timer:tc(arr, get, [{arr, tuple_get}, data3(N), N]),

{S3, _} = timer:tc(arr, set, [{arr, tuple_set}, data3(N), N]), % <-- here

%% results
io:format("Fixed-size array: get: ~8wns, set: ~8wns~n", [G1 , S1]),
io:format("Extensible array: get: ~8wns, set: ~8wns~n", [G2 , S2]),
io:format("Tuple: get: ~8wns, set: ~8wns~n", [G3 , S3]),
ok.

And you should gain very different result looks like (Debian Linux 2.2GHz Intel C2Duo Erlang/OTP R12B-3):

6> arr:test().
Fixed-size array: get:     8912ns, set:    29114ns
Extensible array: get:     8639ns, set:    15065ns
Tuple:            get:     1244ns, set:   233815ns
ok
7> arr:test(100000).
Fixed-size array: get:    78370ns, set:   110003ns
Extensible array: get:    29618ns, set:   110906ns
Tuple:            get:    12458ns, set: 27852761ns
ok

Tuple set is far slower and get is far faster than on array operation as expected.
There is possible GC issue why Extensible array is slower than fixed. Try run it in opposite order and you will see.

11> arr:test(100000).
Fixed-size array: get:    29036ns, set:   103593ns
Extensible array: get:    89132ns, set:   114022ns
Tuple:            get:    12471ns, set: 28103959ns
ok

Writing worth benchmark is not as simple as it looks ;-)

_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://www.erlang.org/mailman/listinfo/erlang-questions

--
--Hynek (Pichi) Vychodil

Joel Reymont

unread,

Aug 25, 2008, 1:22:01 PM8/25/08

to Hynek Vychodil, Dimitry Golubovsky, erlang-q...@erlang.org

On Aug 25, 2008, at 5:59 PM, Hynek Vychodil wrote:

> You have typo in your code. Set on tuple can't be as fast as your
> measure :)
> You do get on Extensible array and tuple instead of set and you call
> this
> inside set cycle which remove data structure away.

I updated my blog post to remove the typo and re-ran the benchmarks.

--
wagerlabs.com

Colm Dougan

unread,

Aug 25, 2008, 3:21:45 PM8/25/08

to erlang-q...@erlang.org

Interesting benchmark. I tried compiling the array API natively and
got pretty decent improvements, especially for the array set
operations (approx 40% better). .

Colm

Richard Carlsson

unread,

Aug 25, 2008, 4:04:20 PM8/25/08

to Dimitry Golubovsky, erlang-q...@erlang.org

Dimitry Golubovsky wrote:
> I need an index-based access to elements of a fixed-size array-like
> structure. Number of elements is expected to be about several
> thousands.
>
> What would be more efficient: to create an array (using the "array"
> module) or to build a huge tuple and use erlang:element?

If it's the case of write-once, read-many, you could prepare the
data in a list, do list_to_tuple, and then use the tuple for reading.
But I'd only do that myself as a last resort, to maximize speed.

> PS I suspect the latter, but how does erlang:element scale on tuples
> of such size?

Just reading elements using erlang:element(N, Tuple) is as fast as
it can possibly get - simply a pointer addition and a load. And tuples
can be pretty huge. Try e.g. erlang:make_tuple(10000000,0)) - that's
10 million elements. But updating big tuples (copying N words) is much
too inefficient; hence the array module, which tries to provide a good
balance between read and write access times.

/Richard

Reply all

Reply to author

Forward