[erlang-questions] moving process state - copy on write - heap memory increase


Roland

Sep 11, 2012, 5:45:18 PM
to erlang-q...@erlang.org
Hello!

I want to move the state of my process to another process. Unfortunately
this is not possible without a huge increase in memory consumption as by
copying the state the "copy on write semantics" are lost.

This behaviour is demonstrated in my attached simplified t_process module.

-> {ok, Pid} = t_process:start_link().
-> t_process:bloat(Pid). --> memory consumption of the process is at 6657256 after "bloat" is finished
-> t_process:reload_bag(Pid). --> memory consumption is at 26295176 (4 times as much)

No data has been modified in any way; the data stays the same.

My question is: regardless of what I want to do with this state transfer,
is there any way to copy the state of a process without losing the "copy on
write semantics", so that the target process has the same memory
footprint as the original one?

Can someone point me to any direction/documentation on how I can get
around this issue?

Thank you!

-module(t_process).

-behaviour(gen_server).

-export([start_link/0, init/1, handle_call/3, handle_cast/2,
         handle_info/2, terminate/2, code_change/3, bloat/1, reload_bag/1]).

-record(process, {bag = []}).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

bloat(Pid) ->
    gen_server:cast(Pid, bloat).

reload_bag(Pid) ->
    gen_server:call(Pid, reload_bag).

init([]) ->
    {ok, #process{}}.

handle_call(reload_bag, _From, State) ->
    %% simulate state transfer / loss of semantics
    BinaryBag = erlang:term_to_binary(State#process.bag),
    io:format("Bag size is ~w.", [erlang:size(BinaryBag)]),
    {reply, ok, State#process{bag = erlang:binary_to_term(BinaryBag)}};

handle_call(_Call, _From, State) ->
    {reply, ok, State}.

handle_cast(bloat, State) ->
    NewBag = lists:concat([State#process.bag,
                           lists:foldl(fun(_C, List) ->
                                               List ++ [{"ABCDEFGHIJKLMNOPQRSTUVWXYZ"}]
                                       end, [], lists:seq(1, 50000))]),
    io:format("Bloating finished."),
    {noreply, State#process{bag = NewBag}};

handle_cast(_Cast, State) ->
    {noreply, State}.

handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
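
The loss of sharing can also be observed without looking at process memory, by comparing erts_debug:size/1 (which counts a shared sub-term once) with erts_debug:flat_size/1 (which counts it at every occurrence). The following is only a minimal sketch and not part of the attached module: the module name sharing_size and the function measure/2 are made up for illustration, and the sub-list is built at runtime so that it lives on the process heap.

```erlang
-module(sharing_size).
-export([measure/2]).

%% Hypothetical helper, for illustration only.
%% Builds a list of Copies elements that all point at one sub-list of
%% ItemLen integers, round-trips it through term_to_binary/binary_to_term,
%% and reports sizes in machine words.
measure(ItemLen, Copies) ->
    Item = lists:seq(1, ItemLen),             % one sub-list on the heap
    Shared = lists:duplicate(Copies, Item),   % Copies pointers to the same Item
    Copy = binary_to_term(term_to_binary(Shared)),
    true = (Copy =:= Shared),                 % the value is unchanged
    [{shared_size, erts_debug:size(Shared)},       % sharing counted once
     {shared_flat, erts_debug:flat_size(Shared)},  % sharing ignored
     {copy_size,   erts_debug:size(Copy)}].        % after the round trip
```

Calling sharing_size:measure(100, 1000) should report a shared_size far below shared_flat, and a copy_size equal to shared_flat, because binary_to_term rebuilds every occurrence of the sub-list separately.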

_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions

Björn-Egil Dahlberg

Sep 11, 2012, 9:10:39 PM
to Roland, erlang-q...@erlang.org
Hi,

Sharing within a compound term is preserved neither on send nor by
term_to_binary.

One way to reinstate sharing is to walk through the compound term
and look for sharing (Wings3D uses this trick).

Ex.
    Ls = <list of items>,
    {Sh, _} = lists:mapfoldl(fun(I, T) ->
                      case gb_trees:lookup(I, T) of
                          none -> {I, gb_trees:enter(I, I, T)};
                          {value, V} -> {V, T}
                      end
              end, gb_trees:empty(), Ls),
    Sh.

(disclaimer: I am writing code when sleepy and I always forget the
mapfold syntax. But you get the idea.)

This approach requires some knowledge about the structure of the data
to be reshared. It will also temporarily increase the total memory while
performing the resharing.
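
As a self-contained sketch of the resharing trick for the simplest case, a flat list whose elements may be equal to each other: the module name reshare and the function list/1 are made up for illustration, and a real state would need a walk tailored to its own structure.

```erlang
-module(reshare).
-export([list/1]).

%% Hypothetical helper, for illustration only.
%% Replaces every element that compares equal to an earlier element with
%% that earlier element, so equal elements end up as one shared heap term.
list(Items) ->
    {Shared, _Seen} =
        lists:mapfoldl(
          fun(Item, Seen) ->
                  case gb_trees:lookup(Item, Seen) of
                      %% First occurrence: keep it and remember it.
                      none       -> {Item, gb_trees:enter(Item, Item, Seen)};
                      %% Repeat: reuse the term remembered earlier.
                      {value, V} -> {V, Seen}
                  end
          end, gb_trees:empty(), Items),
    Shared.
```

One caveat with this sketch: gb_trees compares keys arithmetically, so 1 and 1.0 count as the same key and a later 1.0 would be replaced by an earlier 1. If that matters for the data at hand, a lookup structure that matches keys exactly would be needed instead.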

I'm sure there are other, perhaps much better, approaches as well to
this problem.

There exist several proposals for preserving sharing on send, but none
is currently implemented in the VM.

// Björn-Egil

Sent from my iPad

Yiannis Tsiouris

Sep 12, 2012, 3:11:48 AM
to erlang-q...@erlang.org
Hi,

Actually that's funny because there is some work [1] by Nikos Papaspyrou and Kostis Sagonas that will be presented on Friday the 14th at the Erlang Workshop in Copenhagen! It's an (experimental) implementation of a copy that preserves sharing...

Maybe you could take a look at the code in that repo. :-)


Best regards,
Yiannis

[1]: https://github.com/nickie/otp

--
Yiannis Tsiouris
Ph.D. student,
Software Engineering Laboratory,
National Technical University of Athens
WWW: http://www.softlab.ntua.gr/~gtsiour

Richard O'Keefe

Sep 12, 2012, 3:38:14 AM
to Roland, erlang-q...@erlang.org
On 12/09/2012, at 9:45 AM, Roland wrote that
creating a term with a very great deal of shared subterms
and then sending it to another process "unshared" the subterms.

There's a comment in his code, "% simulate state transfer / loss of semantics".

The problem is that there are *two* levels of semantics for a
language like Erlang:
- a "value" semantics
- a "cost" semantics
and subterm sharing is part of the cost semantics
but it is NOT part of the value semantics.

For example,

copy_pairs([{K,V}|Pairs]) -> [{K,V}|copy_pairs(Pairs)];
copy_pairs([]) -> [].

is allowed by the value semantics *either* to
create new {K,V} pairs *or* to share the old
ones, even though they have different cost semantics.
(Yes, I know this is a restriction of the identity function.)

This is one place where the divergence between the value
semantics and the cost semantics is observable. There are
more than just stylistic reasons to write

copy_pairs([P={_,_}|Pairs]) -> [P|copy_pairs(Pairs)];
copy_pairs([]) -> [].

term_to_binary and message sending are additional places
where the value and cost semantics diverge. I thought it
had always been clear from the Erlang documentation that
these things linearise the term they are given.
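
The message-sending case can be checked with a small experiment. This is only a sketch under stated assumptions: the module name send_flatten and the function run/2 are made up for illustration, and it relies on erts_debug:size/1 counting shared sub-terms once.

```erlang
-module(send_flatten).
-export([run/2]).

%% Hypothetical helper, for illustration only.
%% Sends a term with shared sub-terms to a fresh process and returns
%% {SizeInSender, SizeInReceiver}, both in machine words.
run(ItemLen, Copies) ->
    Item = lists:seq(1, ItemLen),
    Term = lists:duplicate(Copies, Item),     % Copies pointers to one Item
    Parent = self(),
    Ref = make_ref(),
    Pid = spawn(fun() ->
                        receive
                            {Ref, Received} ->
                                %% Measure the term as it arrived here.
                                Parent ! {Ref, erts_debug:size(Received)}
                        end
                end),
    Pid ! {Ref, Term},
    receive
        {Ref, ReceiverSize} ->
            {erts_debug:size(Term), ReceiverSize}
    after 5000 ->
            timeout
    end.
```

With a VM that flattens terms on send, send_flatten:run(100, 1000) should return a receiver size many times larger than the sender size, although the two terms compare equal. A VM that preserves sharing when copying would be expected to return two equal sizes.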

There's a close connection between term copying and (some
forms of) garbage collection. Indeed, the first interpreter
I ever wrote for a Lisp-like language reused some of the GC
code in the copier.

I imagine most of us would be very happy with term_to_binary
and message sending preserving sharing as long as it didn't
slow existing code down more than a percent or two. Not
because we _want_ to pass terms with lots of sharing but
because it's so much nicer not to have to worry about that.

Ulf Wiger

Sep 12, 2012, 6:28:04 AM
to Richard O'Keefe, erlang-q...@erlang.org

On 12 Sep 2012, at 09:38, Richard O'Keefe wrote:

> I imagine most of us would be very happy with term_to_binary
> and message sending preserving sharing as long as it didn't
> slow existing code down more than a percent or two. Not
> because we _want_ to pass terms with lots of sharing but
> because it's so much nicer not to have to worry about that.

Indeed. The list archives also contain examples of where the
loss of sharing has proven fatal (only to the program, thankfully).

http://erlang.org/pipermail/erlang-bugs/2007-November/000488.html

http://erlang.org/pipermail/erlang-questions/2005-November/017926.html

Admittedly, the cases where preserved sharing is really needed,
and the lack of it is very hard to work around, are few. In the first
example above, it was difficult to overcome; in the second example,
it was accidental, easy to fix, but not that easy to debug*.

BR,
Ulf W

* The memory explosion occurred in a process that ran embarrassingly
trivial code, so it took a while before it occurred to me that it might be
gobbling up memory, but of course the penalty (in terms of increased
memory footprint) for the flattening is paid by the _receiving_ process,
and not the sending one.

Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
http://feuerlabs.com