[erlang-questions] Erlang Memory Question


Eranga Udesh

Sep 22, 2014, 10:12:07 AM
to erlang-questions
Hi,

I'm trying to optimize my memory consumption in Erlang VM and to garbage collect as soon as possible.

Let's say I have a large object, "X". After some processing, I only need to work on a small part of X, call it "Y".

Can someone advise me whether the process flow below will make the large object X eligible for garbage collection while waiting for the long-running job to finish?

function1() ->
    X = ... fetch a large object ...,
    ... some processing ...,
    Y = ... extract a part of X ...,
    ... long running job ....

If that doesn't make X eligible for garbage collection, does the change below do so?

function1() ->
    X = ... fetch a large object ...,
    ... some processing ...,
    Y = ... extract a part of X ...,
    function2(Y).

function2(Y) ->
    ... long running job ....

Tks,
- Eranga

Jesper Louis Andersen

Sep 22, 2014, 1:56:40 PM
to Eranga Udesh, erlang-questions
In general: no, it won't. GC triggers once the process has allocated enough data. If your processing allocates enough data, this will happen quickly and you don't have to worry. If not, you may have to gently persuade the process to do so. There are several ways:

* Set fullsweep_after on the process with a low value (0) and run erlang:garbage_collect()
* Go through a hibernation with an immediate wakeup (feels ugly to me)
* Handle fetching and extraction in a separate process and send the extracted part 'x' back as a message. This will free up memory almost immediately for other processes to use (simple and elegant)
* Don't fetch the large object in the first place, but make it possible to ask for an extraction only. Or stream data a bit at a time and handle the stream in chunks rather than everything at once.
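The third option can be sketched roughly like this (fetch_large/0, extract_part/1 and long_running_job/1 are hypothetical placeholder names for the poster's fetch, extraction and processing steps, not real functions):

```erlang
%% Sketch of the separate-process approach. fetch_large/0,
%% extract_part/1 and long_running_job/1 are placeholders.
run() ->
    Self = self(),
    spawn(fun() ->
        X = fetch_large(),                    %% large object lives only here
        Self ! {extracted, extract_part(X)}   %% send back just the small part
    end),
    receive
        {extracted, Part} ->
            %% the worker has exited, so its entire heap is reclaimed
            long_running_job(Part)
    end.
```

One caveat worth noting: sending {extracted, Part} copies Part onto the caller's heap (except for large binaries, which are reference-counted off-heap), so the large intermediate X never reaches the long-running process at all.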


_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions




--
J.

Björn-Egil Dahlberg

Sep 22, 2014, 2:40:28 PM
to Jesper Louis Andersen, erlang-questions
It should suffice to run erlang:garbage_collect/0 directly after extracting the parts of the structure you need:

function1() ->
    X1 = ... fetch a large object ...,
    ... some processing ...,
    X2 = ... extract a part of X1 ...,
    erlang:garbage_collect(),
    ... long running job ....

As long as the object bound to X1 is not referenced by the process after the explicit call to the GC, you're fine.
Note also that an explicit call to garbage_collect/0 will always do a 'fullsweep'.

Of course, as Jesper mentioned, it's probably preferable not to fetch the whole object in the first place, if possible.

// Björn-Egil

Eranga Udesh

Sep 22, 2014, 10:24:20 PM
to Björn-Egil Dahlberg, erlang-questions
Thanks for the information received so far.

Wouldn't it be good for Erlang to have a single-object garbage collection function/BIF? For example, when I no longer require a large object, I could force it to be garbage collected without triggering a full sweep.

As mentioned in the documentation, a full sweep may degrade performance.

- Eranga

Michael Truog

Sep 23, 2014, 1:12:49 AM
to Eranga Udesh, erlang-questions
If garbage collection isn't happening quickly enough, you can generate the problematic data within a temporary process, so that the temporary process' death triggers garbage collection naturally. That is a natural way to approach it; the tuning is an optional extra. I have a module I used in the past to be as harsh as possible to the garbage collector (i.e., force it as much as possible) here: https://gist.github.com/okeuday/dee991d580eeb00cd02c but I don't think it is necessary with a decent architecture in place (being that harsh shouldn't be necessary). If you need to check the memory consumption of your processes, http://www.erlang.org/doc/man/instrument.html is very helpful.

Richard A. O'Keefe

Sep 23, 2014, 3:12:28 AM
to Eranga Udesh, erlang-questions

On 23/09/2014, at 2:24 PM, Eranga Udesh wrote:

> Thanks for the information received so far.
>
> Wouldn't it be good for Erlang to have a single object garbage collection function/bif?

NO. In C it's called free() and it's a major cause of errors.

> For example, when I no longer require a large object, I force to garbage collect it, without making a full sweep?

Why should the garbage collector *believe* you that the "object"
is free? You could be deliberately lying. You could be (and
probably are) mistaken. How is it to know which bits you
want to keep without doing its usual thing? In a shared-heap
implementation (which Erlang has had and may have again) the
fact that *you've* finished with the object doesn't mean
everyone *else* has. A meaningful operation should not depend
on implementation details like that.
>
> As mentioned in the document, a full sweep may degrade the performance.

Not half as much as freeing too much would!

This is micro-optimisation. Avoid passing around large
objects in the first place.

Eranga Udesh

Sep 23, 2014, 9:20:56 PM
to Richard A. O'Keefe, erlang-questions
Well, yes I may deliberately lie.

However, my suggestion is that, instead of the garbage collector (GC) doing a full sweep to identify data that has gone out of scope and reclaim it, the program (or rather I, the calling process) could deliberately declare that it is finished with the data in question, so the GC may free that part.

Then the GC may carry out its own logic, as it currently does, to verify whether the same data is referenced by any other process, etc., and decide whether to collect it or not.

- Eranga

Jesper Louis Andersen

Sep 24, 2014, 9:54:07 AM
to Eranga Udesh, erlang-questions

On Wed, Sep 24, 2014 at 3:13 AM, Eranga Udesh <erang...@gmail.com> wrote:
However my suggestion is to, instead of doing a full sweep by the garbage collector (GC) to identify data going out of scope and reclaim, can the program (or rather I) deliberately say I (the calling process) is finished using the said data, so the GC may free that part.

You probably want to read the paper "A Unified Theory of Garbage Collection" by Bacon et al.[0]. What you are proposing is certainly possible: you can identify data which is dead after you leave a scope and quickly reclaim it. Very simple data already has this property and goes onto the Y stack in the VM. It has to be weighed against a number of factors, though. It is not a priori clear it would give the program a much better space profile. The Erlang GC is generational precisely so that short-lived data is cheap to collect, so usually it is able to take care of dead data quite efficiently.

The paper describes the duality between the quick release of data and the advantage of batching garbage collection into larger chunks. Good GCs tend to gravitate toward the dual point: a collector relying on reference counting has to adapt itself so as not to pause the system when freeing a large structure, and systems relying on batch collection gain from what you propose, freeing memory faster in certain known corner cases.

That said, I am interested in your use case, because I have never really hit Erlang's GC as a limiting factor. Maybe that's because I'm conscious of how a GC works and try to treat it nicely :)


Robert Raschke

Sep 24, 2014, 10:28:46 AM
to erlang-questions
I always thought that is one of the reasons to have processes.
If you've got something big you want to throw away quickly, make a process for it.

$ erl
Erlang R16B03-1 (erts-5.10.4) [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V5.10.4  (abort with ^G)
1> erlang:memory().
[{total,14148416},
 {processes,4091608},
 {processes_used,4091488},
 {system,10056808},
 {atom,194289},
 {atom_used,172614},
 {binary,1514552},
 {code,4026600},
 {ets,262688}]
2> Pid = spawn(
2>   fun () ->
2>     X = binary:copy(<<1,2,3,4,5,6,7,8>>, 1024),
2>     Y = binary:copy(X, 1024),
2>     receive stop -> ok end
2>   end
2> ).
<0.35.0>
3> erlang:memory().
[{total,22643832},
 {processes,4203448},
 {processes_used,4203448},
 {system,18440384},
 {atom,194289},
 {atom_used,175110},
 {binary,9685320},
 {code,4221791},
 {ets,267056}]
4> Pid ! stop.
stop
5> erlang:memory().
[{total,13587776},
 {processes,4084496},
 {processes_used,4084384},
 {system,9503280},
 {atom,194289},
 {atom_used,175110},
 {binary,748144},
 {code,4221791},
 {ets,267056}]
6>


Regards,
Robby


Eranga Udesh

Sep 24, 2014, 10:37:19 PM
to Robert Raschke, erlang-questions
Thanks all for your advice. Let me see how I can apply them to my program. It looks like my code/logic is going to get ugly and incur a performance penalty in order to save memory.

Cheers,
- Eranga

Robert Virding

Sep 27, 2014, 11:46:19 PM
to Eranga Udesh, erlang-questions

The obvious question is whether you are sure you actually need to optimise to save memory. Premature optimisation and all that (actually sensible advice). Maybe reviewing your algorithms and data structures will do the trick.

Robert

From my Nexus

Eranga Udesh

Oct 5, 2014, 3:27:54 AM
to Robert Virding, erlang-questions
By doing forced garbage collection, I managed to reduce the overall system memory consumption from 1 GB to 190 MB in normal operation, a reduction of more than 5x. A single user session was taking 12 MB of memory at peak and holding it for some time; that memory now gets released quickly.

I didn't notice much change in CPU usage.

I found that temporary variables, e.g. the binary_to_list/1 result of XML data of, say, 100 KB in size (xmerl needs a string), won't get freed for a long period of time without forced garbage collection. Therefore, when there are about 500 user sessions, each process consuming large memory blocks, the system memory usage becomes extremely high. We plan to support a large number of user sessions, in the tens of thousands, and this memory consumption is a show-stopper for us at the moment.

Using erlang:process_info/2, I can get the total memory usage of a process. But is there a method to get a full breakdown of the memory allocation of a process in the waiting state? E.g. showing the memory usage of each variable in the current scope of the process, what is pending garbage collection, etc., would be ideal.

Are there any fast XML parsers that work with binaries instead of strings?

- Eranga


Motiejus Jakštys

Oct 5, 2014, 3:56:39 PM
to Eranga Udesh, erlang-questions
On Sun, Oct 5, 2014 at 9:27 AM, Eranga Udesh <erang...@gmail.com> wrote:
>
> Are there any fast XML parsers that work with binary, instead of string?

There are quite a few, but I have heard good comments about this one:

https://github.com/maxlapshin/parsexml

Regards,
Motiejus

Jesper Louis Andersen

Oct 5, 2014, 5:18:11 PM
to Eranga Udesh, erlang-questions
On Sun, Oct 5, 2014 at 9:27 AM, Eranga Udesh <erang...@gmail.com> wrote:
I found that temporary variables, e.g. the binary_to_list/1 result of XML data of, say, 100 KB in size (xmerl needs a string), won't get freed for a long period of time without forced garbage collection. Therefore, when there are about 500 user sessions, each process consuming large memory blocks, the system memory usage becomes extremely high. We plan to support a large number of user sessions, in the tens of thousands, and this memory consumption is a show-stopper for us at the moment.

Hi!

This is your problem in a nutshell. Calling binary_to_list/1 on a 100 KB binary blows it up to at least 2.4 megabytes in size. When the process is done, it takes a bit of time for the heap to shrink back down. This becomes a serious problem when your system is processing XML documents for a large set of users at the same time. You have two general options, and both should be applied in a serious system:

* xmerl is only useful for small configuration blocks of data. If you are processing larger amounts of data, you need an XML parser which operates directly on the binary representation. In addition, finding an XML parser which allows you to parse SAX-style, so you don't have to build an intermediate structure, will help a lot. In Haskell, particularly GHC, fusion optimizations would mostly take care of these things, but that doesn't exist in the Erlang ecosystem, so you will have to approach it yourself. Unfortunately I don't have any suggestion handy, since it is too long since I last worked with XML as a format.

* Your Erlang node() needs a way to shed load once it reaches capacity. In other words, you design your system for a certain number of simultaneous users and then make sure there is a limit on how much processing can happen concurrently. This frames the Erlang system so it does not break down under stress if it gets loaded over capacity. Fred Hebert has written a book, "Erlang in Anger"[0], which touches on the subject in chapter 3, "Planning for overload". You may have 20,000 users on the system, but if you make sure only 100 of those can process XML data at the same time, you can have at most 240 megabytes of outstanding memory at any moment. Also, you may want to think about how much time it will take K cores to chew through 240 megabytes of data. Reading data is expensive.
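A minimal sketch of such a concurrency cap (a hand-rolled counting semaphore for illustration only; a production system would use a framework like 'jobs' instead):

```erlang
%% Illustrative only: allow at most N concurrent XML-parsing jobs.
start_limiter(N) ->
    spawn(fun() -> limiter(N) end).

limiter(0) ->
    %% no free slots: wait for a release before serving more acquires;
    %% pending {acquire, _} messages stay queued in the mailbox
    receive release -> limiter(1) end;
limiter(Free) ->
    receive
        {acquire, From} -> From ! go, limiter(Free - 1);
        release         -> limiter(Free + 1)
    end.

with_slot(Limiter, Fun) ->
    Limiter ! {acquire, self()},
    receive go -> ok end,
    try Fun() after Limiter ! release end.
```

A session process would wrap its parse in with_slot(Limiter, fun() -> parse(Xml) end), so at most N sessions hold the blown-up intermediate data at the same time.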

Irina Guberman (from Ubiquity networks, if memory serves) recently gave a very insightful (and funny!) talk[1] on how she employed the "jobs" framework in a situation slightly akin to yours. It is highly recommended, since she covers the subject in far more depth than I do here. For a production system I would recommend employing some kind of queueing framework early on; otherwise, your system will just bow under the load once it gets deployed.

Erik Søe Sørensen

Oct 5, 2014, 5:37:54 PM
to Jesper Louis Andersen, Erlang Questions

Good advice.
For the short term, I think the option of hibernating the processes should be mentioned as well: it ensures that dormant session processes don't take up more memory than necessary.
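For a gen_server-based session process, hibernation can be requested from a callback return value, e.g. (clause shown in isolation; session_idle is a made-up message name):

```erlang
%% Returning 'hibernate' instead of a timeout makes the process
%% garbage-collect and compact its heap until the next message arrives.
handle_cast(session_idle, State) ->
    {noreply, State, hibernate}.
```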

/Erik

Eranga Udesh

Oct 5, 2014, 9:38:47 PM
to Motiejus Jakštys, erlang-questions
Parsexml benchmarks look promising. Thanks Motiejus. Will try that and give my feedback.

- Eranga

Eranga Udesh

Oct 5, 2014, 9:42:01 PM
to Erik Søe Sørensen, Erlang Questions
I thought of hibernation, but then I lose the Timeout feature of gen_server/gen_fsm. Of course I can do my own timers. I haven't gone there yet; I will try it and report back on whether it improves the system.

Tks,
- Eranga

Eranga Udesh

Oct 5, 2014, 9:52:21 PM
to Jesper Louis Andersen, erlang-questions
Thanks Jesper, good stuff/advice.

Let me digest your suggestions and articles and rethink the architecture. Will report back with results soon.

Cheers,
- Eranga