[erlang-questions] How fast is to retrieve date from ETS table

Martin Dimitrov

unread,

Jan 11, 2012, 6:15:30 AM1/11/12

to Erlang Questions

I thought this is a lame question so I posted it on StackOverflow
(http://stackoverflow.com/questions/8811430/retrieval-of-data-from-ets-table)
so not to bother the list but there aren't many replies.

Here it is my observation:

I know that lookup time is constant for ETS tables. But I also heard
that the table is kept outside of the process and when retrieving data,
the data needs to be moved to the process heap. So, this is expensive.
But then, how to explain this:

1> {ok, B} = file:read_file("IMG_2171.JPG").
{ok,<<255,216,255,225,63,254,69,120,105,102,0,0,73,73,42,
0,8,0,0,0,10,0,14,1,2,0,32,...>>}
2> size(B).
1986392
3> L = binary_to_list(B).
[255,216,255,225,63,254,69,120,105,102,0,0,73,73,42,0,8,0,0,
0,10,0,14,1,2,0,32,0,0|...]
4> length(L).
1986392
5> ets:insert(utilo, {a, L}).
true
6> timer:tc(ets, match, [utilo, {a, '$1'}]).
{106000,
[[[255,216,255,225,63,254,69,120,105,102,0,0,73,73,42,0,8,0,
0,0,10,0,14,1,2|...]]]}

It takes 106000 microseconds to retrieve 1986392 long list which is
pretty fast, isn't it?
I also tried it from a module and the result is the same.

Best regards,

Martin

_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions

Adam Rutkowski

unread,

Jan 11, 2012, 6:22:12 AM1/11/12

to Martin Dimitrov, Erlang Questions

Hi,

On Jan 11, 2012, at 12:15 PM, Martin Dimitrov wrote:

> 3> L = binary_to_list(B).
> [255,216,255,225,63,254,69,120,105,102,0,0,73,73,42,0,8,0,0,
> 0,10,0,14,1,2,0,32,0,0|...]

What's the reason behind this?
You can store binaries in ETS, you should also get significantly better results retrieving them.

--
AR

Adam Rutkowski

unread,

Jan 11, 2012, 6:26:04 AM1/11/12

to Martin Dimitrov, Erlang Questions

Ah, sorry, ignore my previous answer, I didn't read your stackoverflow post.

Gleb Peregud

unread,

Jan 11, 2012, 6:27:09 AM1/11/12

to Adam Rutkowski, Erlang Questions

On Wed, Jan 11, 2012 at 12:22, Adam Rutkowski <adam.ru...@jtendo.com> wrote:
>> 3> L = binary_to_list(B).
>> [255,216,255,225,63,254,69,120,105,102,0,0,73,73,42,0,8,0,0,
>> 0,10,0,14,1,2,0,32,0,0|...]
>
> What's the reason behind this?
> You can store binaries in ETS, you should also get significantly better results retrieving them.

The question is actually about how to retrieve whole structures from
ETS. Binaries would be only referenced (as said in stackoverflow
thread).

100ms does not seem to be that much for copying 2MB of data from one
place in memory to another. Although for building such a list
cell-by-cell it seems pretty fast.

You can do the following:
- measure time to memcpy 2MB memory chunk in pure C or
- analyze how data is copied from ETS to process heap in the source code or
- wait for an answer from someone who knows it (OTP team)

Best,
Gleb

Ulf Wiger

unread,

Jan 11, 2012, 6:45:05 AM1/11/12

to Martin Dimitrov, Erlang Questions

Well, reading from ETS does impose a copy operation, but this is just as efficient (as far as it goes) as message passing, which also copies. Exactly the same copy operation is used, in fact.

And just as with message passing, if you are using binaries, they may be passed by reference instead, as has already been pointed out.

Given that term copying is central in message passing and GC, as well as in ETS, a lot of time has been invested in optimizing it*. The Erlang VM does it well, but of course, copying is always copying - the cost will be relative to the size of the data, and the emulator must traverse the term in order to know what's in it.

The actual code is in $ERL_TOP/erts/emulator/beam/copy.c (copy_struct()), or:

https://github.com/erlang/otp/blob/master/erts/emulator/beam/copy.c#L191

Another potential performance issue with ETS tables is locking, if you have many cores. Again, this is an area that the ERTS team is working hard on, so it gets better and better. However, shared data structures are notoriously hard to manage as the core count grows.

Bottom line: it's good to be aware of cost factors, but you won't know how fast or slow it is in reality, until you measure (which you did!). ETS tables are fast enough for most purposes. :)

BR,

Ulf W

* The garbage collector uses its own copying techniques, since it doesn't have to be limited to copying one term at a time, and also _must_ preserve subterm sharing.

Martin Dimitrov

unread,

Jan 11, 2012, 8:54:17 AM1/11/12

to Ulf Wiger, Erlang Questions

Thanks for the informative answer. I did some tests with memcpy in C and
it turns out that 0.1 second for ~2MB data is not "quite fast" but, I
guess, normal.

Best regards,

Martin

Sverker Eriksson

unread,

Jan 11, 2012, 10:59:43 AM1/11/12

to Martin Dimitrov, Erlang Questions

Ulf Wiger wrote:
> :

> The actual code is in $ERL_TOP/erts/emulator/beam/copy.c (copy_struct()), or:
>
> https://github.com/erlang/otp/blob/master/erts/emulator/beam/copy.c#L191
>

Copy from ETS actually uses copy_shallow() which is simpler and even
more effective as the term is known to be contained within one
continuous block.

https://github.com/erlang/otp/blob/master/erts/emulator/beam/copy.c#L919

/Sverker, Erlang/OTP Ericsson

Ulf Wiger

unread,

Jan 11, 2012, 11:04:52 AM1/11/12

to Sverker Eriksson, Erlang Questions

Thanks for correcting me. I thought as much*, but then did some sloppy grep:ing and was misled. :)

I assume it's copy_struct() in one direction (storing) and copy_shallow() in the other?

BR,
Ulf W

* As in, Björn G told me some 5-6 years ago, but I forgot the particulars.

Sverker Eriksson

unread,

Jan 12, 2012, 5:09:55 AM1/12/12

to Ulf Wiger, Erlang Questions

Ulf Wiger wrote:
> Thanks for correcting me. I thought as much*, but then did some sloppy grep:ing and was misled. :)
>
> I assume it's copy_struct() in one direction (storing) and copy_shallow() in the other?
>
>