I know that lookup time is constant for ETS tables. But I also heard that the table is kept outside of the process and when retrieving data, the data needs to be moved to the process heap. So, this is expensive. But then, how to explain this:
On Wed, Jan 11, 2012 at 12:22, Adam Rutkowski <adam.rutkow...@jtendo.com> wrote:
>> 3> L = binary_to_list(B).
>> [255,216,255,225,63,254,69,120,105,102,0,0,73,73,42,0,8,0,0,
>> 0,10,0,14,1,2,0,32,0,0|...]
> What's the reason behind this?
> You can store binaries in ETS, you should also get significantly better results retrieving them.
The question is actually about how to retrieve whole structures from ETS. Binaries would only be referenced (as noted in the Stack Overflow thread).
100 ms does not seem to be that much for copying 2 MB of data from one place in memory to another, although for building such a list cell by cell it seems pretty fast.
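Before comparing against a raw memory copy, it may be worth checking how much heap the list really occupies, since a list is not stored as compactly as the binary it came from. A minimal sketch, assuming the undocumented debugging helper erts_debug:flat_size/1 (which reports a size in words) and a synthetic binary standing in for the original file; the module name term_size is made up for illustration:

```erlang
-module(term_size).
-export([list_bytes/1]).

%% Returns the heap size in bytes of the list built from an N-byte binary.
%% Each list cell takes two words (head and tail); byte values are small
%% integers and are stored directly in the cell.
list_bytes(N) ->
    B = binary:copy(<<0>>, N),           % synthetic N-byte binary (assumption)
    L = binary_to_list(B),
    Words = erts_debug:flat_size(L),     % size in machine words
    Words * erlang:system_info(wordsize).
```

On a 64-bit emulator this should come out at 16 bytes per element, so a list of 1986392 bytes would occupy roughly 30 MB rather than 2 MB.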
You can do the following:
- measure the time to memcpy a 2MB memory chunk in pure C, or
- analyze how data is copied from ETS to the process heap in the source code, or
- wait for an answer from someone who knows it (OTP team)
Well, reading from ETS does impose a copy operation, but this is just as efficient (as far as it goes) as message passing, which also copies. Exactly the same copy operation is used, in fact.
And just as with message passing, if you are using binaries, they may be passed by reference instead, as has already been pointed out.
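The difference between a copied list and a referenced binary can be checked with a small experiment. This is only a sketch: the module name ets_bin and the table layout are invented for illustration, and it relies on the fact that binaries larger than 64 bytes live outside the process heap and are reference counted:

```erlang
-module(ets_bin).
-export([compare/1]).

%% Stores the same N bytes once as a binary and once as a list, then
%% times a single lookup of each. Returns {BinMicros, ListMicros}.
compare(N) ->
    T = ets:new(demo, [set]),
    B = binary:copy(<<255>>, N),
    true = ets:insert(T, {bin, B}),
    true = ets:insert(T, {list, binary_to_list(B)}),
    {BinUs, [{bin, B}]} = timer:tc(ets, lookup, [T, bin]),
    {ListUs, [{list, L}]} = timer:tc(ets, lookup, [T, list]),
    B = list_to_binary(L),               % both entries hold the same data
    true = ets:delete(T),
    {BinUs, ListUs}.
```

Calling ets_bin:compare(2000000) should show the binary lookup taking a small fraction of the list lookup, but the exact numbers depend on the machine and the emulator version.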
Given that term copying is central in message passing and GC, as well as in ETS, a lot of time has been invested in optimizing it*. The Erlang VM does it well, but of course, copying is always copying - the cost will be relative to the size of the data, and the emulator must traverse the term in order to know what's in it.
The actual code is in $ERL_TOP/erts/emulator/beam/copy.c (copy_struct()), or:
Another potential performance issue with ETS tables is locking, if you have many cores. Again, this is an area that the ERTS team is working hard on, so it gets better and better. However, shared data structures are notoriously hard to manage as the core count grows.
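One knob that exists for read-heavy tables is the read_concurrency option of ets:new/2. Whether it helps depends on the workload, so it is best treated as something to measure. A rough sketch of such a measurement (the module name ets_conc and the workload are made up):

```erlang
-module(ets_conc).
-export([run/2]).

%% Spawns Readers processes that each perform Lookups lookups on one
%% shared table. Returns the elapsed wall-clock time in microseconds.
run(Readers, Lookups) ->
    %% Default access is protected: other processes may read the table.
    T = ets:new(shared, [set, {read_concurrency, true}]),
    true = ets:insert(T, {key, lists:seq(1, 100)}),
    Parent = self(),
    Work = fun() -> reader(T, Lookups), Parent ! {done, self()} end,
    {Us, ok} = timer:tc(fun() ->
        Pids = [spawn_link(Work) || _ <- lists:seq(1, Readers)],
        [receive {done, P} -> ok end || P <- Pids],
        ok
    end),
    true = ets:delete(T),
    Us.

reader(_T, 0) -> ok;
reader(T, N) ->
    [{key, _}] = ets:lookup(T, key),
    reader(T, N - 1).
```

Running it with and without the option, and with different numbers of readers and schedulers, gives a feel for how much contention there is on a particular machine.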
Bottom line: it's good to be aware of cost factors, but you won't know how fast or slow it is in reality, until you measure (which you did!). ETS tables are fast enough for most purposes. :)
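For anyone who wants to repeat this kind of measurement, here is one way it could be done with timer:tc/3. It is not the code from the original post, just a sketch with an invented module name (ets_timing) that shows how lookup time grows with the size of the stored term:

```erlang
-module(ets_timing).
-export([lookup_times/1]).

%% For each length in Sizes, stores a list of that many small integers
%% and times a single ets:lookup/2. Returns [{Length, Microseconds}].
lookup_times(Sizes) ->
    T = ets:new(timing, [set]),
    Result = [begin
                  true = ets:insert(T, {N, lists:duplicate(N, 0)}),
                  {Us, [{N, _}]} = timer:tc(ets, lookup, [T, N]),
                  {N, Us}
              end || N <- Sizes],
    true = ets:delete(T),
    Result.
```

For example, ets_timing:lookup_times([1000, 100000, 2000000]) returns one timing per size. A single run is noisy, so repeating each measurement and looking at the spread is advisable.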
BR, Ulf W
* The garbage collector uses its own copying techniques, since it doesn't have to be limited to copying one term at a time, and also _must_ preserve subterm sharing.
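The flip side of the footnote is that the ordinary term copy does not keep subterm sharing, so a term with shared parts can grow when it goes through ETS. A small sketch to observe this, assuming the undocumented helpers erts_debug:size/1 and erts_debug:flat_size/1 (both report words); the module name ets_sharing is hypothetical:

```erlang
-module(ets_sharing).
-export([demo/1]).

%% Builds a tuple that refers to the same N-element list twice and
%% returns {SharedWords, FlatWords, AfterEtsWords}.
demo(N) ->
    X = lists:seq(1, N),
    Term = {X, X},                       % X is shared, not duplicated
    Shared = erts_debug:size(Term),      % counts shared subterms once
    Flat = erts_debug:flat_size(Term),   % counts them at every occurrence
    T = ets:new(sharing, [set]),
    true = ets:insert(T, {k, Term}),
    [{k, Copy}] = ets:lookup(T, k),
    AfterEts = erts_debug:size(Copy),    % size of the term read back
    true = ets:delete(T),
    {Shared, Flat, AfterEts}.
```

With ets_sharing:demo(100) I would expect the first number to be smaller than the second, and the third to equal the second, which means the term read back from the table no longer shares the list.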
> I know that lookup time is constant for ETS tables. But I also heard
> that the table is kept outside of the process and when retrieving data,
> the data needs to be moved to the process heap. So, this is expensive.
> But then, how to explain this:
> It takes 106000 microseconds to retrieve 1986392 long list which is
> pretty fast, isn't it?
> I also tried it from a module and the result is the same.
Thanks for the informative answer. I did some tests with memcpy in C and it turns out that 0.1 second for ~2MB data is not "quite fast" but, I guess, normal.
> On 11 Jan 2012, at 12:15, Martin Dimitrov wrote:
>> I know that lookup time is constant for ETS tables. But I also heard
>> that the table is kept outside of the process and when retrieving data,
>> the data needs to be moved to the process heap. So, this is expensive.
>> But then, how to explain this:
>> It takes 106000 microseconds to retrieve 1986392 long list which is
>> pretty fast, isn't it?
>> I also tried it from a module and the result is the same.
> Copy from ETS actually uses copy_shallow() which is simpler and even more effective as the term is known to be contained within one continuous block.