[erlang-questions] binary, ets and memory...


Dmitry Kolesnikov

Oct 12, 2012, 3:15:21 PM
to Erlang Questions
Hello,

Recently, my system started to swap. The investigation showed that memory consumption was almost twice what I had expected, and to be honest, I'm confused about why...

I am running Erlang R15B (erts-5.9) [source] [64-bit] [smp:8:8] [async-threads:0] [hipe] [kernel-poll:false]

So, I have two processes:
* the first process handles TCP/IP socket I/O; received data is pushed to the second process
* the second process splits the binaries with binary:split(Buf, [<<$,>>], [global]), parses the data and builds a list of tuples. When the list of tuples is ready, it folds the tuples into an ets table:
ets:new(cache, [named_table, ordered_set, public]),
lists:foldl(
fun({A, B}, _Acc) -> ets:insert(cache, {A, B}) end,
true,
List
).

I have about 6M tuples, where the first element is a SHA1 signature and the second element is an integer. The receiver process pushes 100 tuples at a time. I hope you get the rough idea.

Once the cache is populated, I see the following memory usage, which looks suspicious to me:
{total,1179235080},
{processes,2373638},
{processes_used,2373570},
{system,1176861442},
{atom,264505},
{atom_used,253241},
{binary,434761000}, <-- this looks strange to me. Why are the binaries kept both on the heap and in ets?
{code,6521469},
{ets,732409416}
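
erlang:memory/0 only reports VM-wide totals. To see how much of the ets figure belongs to the cache table and what each object costs, a small helper can combine ets:info/2 with erlang:system_info(wordsize). This is a minimal sketch, and the module name mem_report is hypothetical. ets:info(Tab, memory) is reported in words, so it is multiplied by the word size. As far as I know, the payload of off-heap (refc) binaries referenced from a table is allocated separately and shows up under binary rather than ets, which would be consistent with the numbers above.

```erlang
-module(mem_report).
-export([report/1]).

%% Hypothetical diagnostic helper: per-table ETS usage in bytes plus the
%% VM-wide binary and ets totals from erlang:memory/1.
report(Tab) ->
    WordSize = erlang:system_info(wordsize),
    Objects  = ets:info(Tab, size),
    %% ets:info(Tab, memory) is in words, not bytes.
    TabBytes = ets:info(Tab, memory) * WordSize,
    PerObject = case Objects of
                    0 -> 0;
                    _ -> TabBytes div Objects
                end,
    [{objects, Objects},
     {table_bytes, TabBytes},
     {bytes_per_object, PerObject},
     {vm_binary, erlang:memory(binary)},
     {vm_ets, erlang:memory(ets)}].
```

Once the table is populated it would be called as mem_report:report(cache).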

If I change my implementation to
lists:foldl(
fun({A, B}, _Acc) -> ets:insert(cache, {binary:copy(A), B}) end,
true,
List
).
then memory utilization is on par with my estimates:
{total,701448856},
{processes,2405251},
{processes_used,2405170},
{system,699043605},
{atom,264505},
{atom_used,253241},
{binary,2686280},
{code,6521493},
{ets,686663080}
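
The same fix can be wrapped in a couple of helpers. This is a sketch, and the module cache_store and its functions are hypothetical. It also swaps the foldl for a single ets:insert/2 call, which accepts a list of objects. That is a stylistic choice: the memory-relevant part is still binary:copy/1 on each key.

```erlang
-module(cache_store).
-export([new/1, insert_all/2]).

%% Hypothetical helpers around the pattern from this post.
new(Name) ->
    ets:new(Name, [named_table, ordered_set, public]).

insert_all(Tab, List) ->
    %% binary:copy/1 allocates a fresh binary holding only the key's bytes,
    %% so the table no longer keeps the larger receive buffer alive.
    Copied = [{binary:copy(Key), Value} || {Key, Value} <- List],
    %% ets:insert/2 takes a whole list of objects in one call.
    ets:insert(Tab, Copied).
```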

Best Regards,
Dmitry

_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions

Chris Hicks

Oct 12, 2012, 3:27:03 PM
to dmkole...@gmail.com, erlang-q...@erlang.org
I could be wrong but I'm going to take a guess and say that in the first implementation the whole binary is being kept around and never destroyed. I think what's happening is you are getting a reference to part of a larger binary and passing that around, but the larger binary is sticking around since part of it is still being used. Copying the part you need, and thus creating an entirely new binary, is probably allowing all references to that large binary to disappear so that it can be GC'd.

That's a rather naive guess based on what I know about how binaries work. Can anyone else back that up or tell me I'm wrong?
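
The guess can be checked with binary:referenced_byte_size/1, which returns the size of the binary that a (sub-)binary actually references. This sketch uses a hypothetical module, part_demo. It takes a 200-byte slice of a 1 MB binary and compares it with a copy of that slice. A slice larger than 64 bytes is used on purpose. Very small binaries are stored differently by the VM, and I have not checked that every ERTS release treats tiny slices the same way.

```erlang
-module(part_demo).
-export([sizes/0]).

%% Compares what a slice contains with what it keeps referenced.
sizes() ->
    Big   = binary:copy(<<$a>>, 1000000),   % ~1 MB off-heap (refc) binary
    Slice = binary:part(Big, 0, 200),       % 200-byte sub-binary into Big
    Copy  = binary:copy(Slice),             % independent 200-byte binary
    {byte_size(Slice),
     binary:referenced_byte_size(Slice),    % whole of Big stays referenced
     binary:referenced_byte_size(Copy)}.    % only its own bytes
```

part_demo:sizes() should give {200, 1000000, 200}: the slice keeps the whole megabyte alive, while the copy does not.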

Chris

Dmitry Kolesnikov

Oct 12, 2012, 3:42:25 PM
to Chris Hicks, erlang-q...@erlang.org
This makes sense to me, especially given that binary:split returns references into the subject binary. I always thought that ets copies data, but here it seems that the data is copied while the reference counts on the parent binaries are not decreased...
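
The effect of binary:split/3 returning references into the subject can be measured directly. This sketch uses a hypothetical module, split_demo. It splits a buffer on commas, as in the original post. It then compares the bytes the parts actually contain with the size of the binary they keep referenced. The test uses 100-byte fields for the same reason as above: I have not verified how every ERTS release handles parts of 64 bytes or less.

```erlang
-module(split_demo).
-export([retained/1]).

%% Returns {NumberOfParts, BytesInParts, BytesKeptReferenced}.
retained(Buf) ->
    Parts  = binary:split(Buf, [<<$,>>], [global]),
    Useful = lists:sum([byte_size(P) || P <- Parts]),
    %% Every part points into Buf, so each one reports Buf's size.
    Referenced = lists:max([binary:referenced_byte_size(P) || P <- Parts]),
    {length(Parts), Useful, Referenced}.
```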

- Dmitry

Robert Virding

Oct 16, 2012, 5:39:29 PM
to Chris Hicks, erlang-q...@erlang.org
Yes, that is why binary:copy was added: for exactly the case where you are accessing a small section of a large binary and the whole binary is kept alive.
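
A possible refinement is to copy only when a binary actually references something larger than itself. Copying an already compact binary costs time but saves nothing. This is a sketch, and the module bin_util and the function compact/1 are hypothetical.

```erlang
-module(bin_util).
-export([compact/1]).

%% Hypothetical helper: return a binary that references only its own bytes.
compact(Bin) when is_binary(Bin) ->
    case binary:referenced_byte_size(Bin) > byte_size(Bin) of
        true  -> binary:copy(Bin);  % slice of a larger binary: detach it
        false -> Bin                % already compact: keep as-is
    end.
```

In the fold from the original post, bin_util:compact(A) could stand in for binary:copy(A).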

Another problem is with large binaries which are sent between processes. Yes, they are not copied and only a reference is passed, but it also means that it will take a longer time before the system can detect that the binary is no longer live and can be reclaimed. All the processes through which the binary has passed must be garbage collected first. The binary is a global object.
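
The second point can be observed with erlang:process_info(Pid, binary). This lists the off-heap binaries a process currently references, and the documentation warns that this item may change without notice. The sketch below uses a hypothetical relay_gc module. It passes a 1 MB binary through an intermediate process and then forces a collection there with erlang:garbage_collect/1. After the collection the relay should no longer hold a reference, even though it never used the binary beyond reading its size.

```erlang
-module(relay_gc).
-export([run/0]).

%% Hypothetical relay: reports the size of each binary it receives and keeps
%% nothing, but still holds a reference until it is garbage collected.
relay(Parent) ->
    receive
        {bin, Bin} ->
            Parent ! {got, self(), byte_size(Bin)},
            relay(Parent);
        stop ->
            ok
    end.

run() ->
    Parent = self(),
    Pid = spawn(fun() -> relay(Parent) end),
    Big = binary:copy(<<0>>, 1024 * 1024),     % 1 MB refc binary
    Pid ! {bin, Big},
    receive {got, Pid, Size} -> Size end,      % relay has seen the binary
    {binary, Before} = erlang:process_info(Pid, binary),
    true = erlang:garbage_collect(Pid),        % synchronous GC of the relay
    {binary, After} = erlang:process_info(Pid, binary),
    Pid ! stop,
    {Size, length(Before), length(After)}.
```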

Robert

