Elixir string binaries and GC


gvim

Oct 13, 2014, 9:28:17 AM
to elixir-l...@googlegroups.com
This got me a little worried regarding Erlang's GC with binaries:

http://andy.wordpress.com/2012/02/13/erlang-is-a-hoarder/

Considering Elixir promotes binaries for string handling, is this a
problem Elixir users are likely to encounter?

gvim

Saša Jurić

Oct 13, 2014, 10:16:44 AM
to elixir-l...@googlegroups.com
I believe this problem is related to refc binaries. I recall first reading about it here: http://dieswaytoofast.blogspot.com/2012/12/erlang-binaries-and-garbage-collection.html

The gist is in this passage: 
> It turns out that refc-binaries keep track of every process that has ever touched them! I know, it's pretty obvious in retrospect, but the point here is that a refc-binary is not clobbered till every process that has ever touched it has been garbage-collected.
> Not just processes that do something to the binary. Any process that touched it.
> And therein lies the rub. Some of our processes barely did anything at all. For example, we had a few processes that acted as extremely simple "routers" - simply passing the binary along to an appropriate destination based on request type - and they didn't actually manipulate the binary in any form or fashion. Think of them as the equivalent of me saying "Hey Bob, if you get a chance, drop this package off at Mary's place" and Bob never even looks inside the package.
> The thing is, this "router" process was also on the list of processes that had "touched" the binary!!!
> Given that this "router" process barely did anything, it also pretty much never got garbage-collected. Which meant that the original binary hung around for a long time without getting garbage-collected.

So yeah, this can happen when you pass around larger binaries (I think it's > 64 bytes), and some processes don't do anything but routing.

I wouldn't worry about this upfront, but at some point you'll need to load/stress test your system and watch out for memory usage. If you suffer from this bug, it should be reasonably easy to spot by looking at memory usage, and especially binary memory. Tools such as entop (https://github.com/mazenharake/entop) can help a lot.
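
A rough way to watch binary memory from inside the VM, without extra tools, is sketched below. It relies only on the built-in `:erlang.memory/1` and `Process.info/2`. The `BinaryMemory` module and its function names are hypothetical, the `:binary` process-info item is documented by OTP as subject to change without notice, and `Enum.sort_by/3` with `:desc` assumes Elixir 1.10 or later.

```elixir
defmodule BinaryMemory do
  # Total bytes currently allocated for binaries on this node.
  def total, do: :erlang.memory(:binary)

  # The n processes referencing the most off-heap binary data, largest first.
  def top(n \\ 5) do
    Process.list()
    |> Enum.map(fn pid -> {pid, referenced_bytes(pid)} end)
    |> Enum.sort_by(fn {_pid, bytes} -> bytes end, :desc)
    |> Enum.take(n)
  end

  # Sums the sizes reported for the binaries a process currently references.
  # A process that exited in the meantime yields nil and counts as 0.
  defp referenced_bytes(pid) do
    case Process.info(pid, :binary) do
      {:binary, bins} ->
        bins
        |> Enum.map(fn {_id, size, _refcount} -> size end)
        |> Enum.sum()

      nil ->
        0
    end
  end
end

IO.inspect(BinaryMemory.total(), label: "binary bytes")
IO.inspect(BinaryMemory.top(3), label: "top holders")
```

Sampling `BinaryMemory.total()` during a load test and checking `BinaryMemory.top()` when it keeps growing gives a first hint about which processes are holding on to binaries.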

Fred Hebert (author of LYSE) also has some related helpers in his recon library (https://github.com/ferd/recon), but I haven't tried it out.

gvim

Oct 13, 2014, 10:25:24 AM
to elixir-l...@googlegroups.com
On 13/10/2014 15:16, Saša Jurić wrote:
> I believe this problem is related to refc binaries. I recall first time
> reading about it
> here: http://dieswaytoofast.blogspot.com/2012/12/erlang-binaries-and-garbage-collection.html
>
> So yeah, this can happen when you pass around larger binaries (I think
> its > 64 bytes), and some processes don't do anything but routing.
>
> I wouldn't worry about this upfront, but at one point you'll need to
> load/stress test your system and watch out for memory usage. If you
> suffer from this bug, it should be reasonably easy to spot by looking at
> memory usage and especially binary memory. Relying on tools such as
> entop (https://github.com/mazenharake/entop) can help a lot.
>
> Fred Hebert (author of LYSE) had also some related helpers in his recon
> library (https://github.com/ferd/recon) but I didn't try it out.
>

These 2 articles are probably better resources than the first one I came
across:

https://blog.heroku.com/archives/2013/11/7/logplex-down-the-rabbit-hole
http://blog.bugsense.com/post/74179424069/erlang-binary-garbage-collection-a-love-hate

gvim

José Valim

Oct 13, 2014, 10:28:53 AM
to elixir-l...@googlegroups.com
It depends. It won't happen with small binaries since they are copied. With large binaries, it *can* happen because that's when we share binaries, but those are exactly the cases where you don't want to copy them. I haven't seen this issue *personally* yet.

I definitely wouldn't worry though. The truth is that all GC runtimes will have issues once you get to a certain load. I was at an event last week and someone was telling a story about how they had to rewrite some algorithms in their data processing layer (in Scala) to make it kinder to the Java GC and reduce the pauses. I have heard multiple times, from different communities, about issues such as "the garbage collector taking too long" or "the garbage collector is not reclaiming some particular memory" and so on.

What matters at the end of the day is how frequent those issues are and how useful the tools and documents are for solving them when they come up. Erlang in Anger is an excellent resource on this matter.


If you are worried about binaries in particular, there are some great links in the blog post above or in this thread.




José Valim
Skype: jv.ptec
Founder and Lead Developer



gvim

--
You received this message because you are subscribed to the Google Groups "elixir-lang-talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-lang-talk+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

gvim

Oct 13, 2014, 11:56:59 AM
to elixir-l...@googlegroups.com
On 13/10/2014 15:16, Saša Jurić wrote:
> So yeah, this can happen when you pass around larger binaries (I think
> its > 64 bytes), and some processes don't do anything but routing.
>

I'm no Erlang engineer but 64 bytes doesn't sound particularly large.
Wouldn't a lot of string operations fall outside this limit?

gvim

gvim

Oct 13, 2014, 12:05:04 PM
to elixir-l...@googlegroups.com
On 13/10/2014 15:28, José Valim wrote:
> It depends. It won't happen with small binaries since they are copied.
> With large binaries, it *can* happen because that's when we share
> binaries but those are exactly the cases you don't want to copy them. I
> haven't seen this issue *personally* yet.

Can you give a rough idea of a string or piece of text that would be
equal in binary size to the small/large threshold of 64 bytes? I don't
have any idea of how 64 bytes in binary converts to ASCII or UTF8 text.

gvim

Saša Jurić

Oct 13, 2014, 12:34:54 PM
to elixir-l...@googlegroups.com
Well, large and small are subjective and relative. 

Heap binaries are small binaries, up to 64 bytes, that are stored directly on the process heap. They will be copied when the process is garbage collected and when they are sent as a message. They don't require any special handling by the garbage collector.

I’m also interested to hear from someone (maybe Robert?) how this particular limit was chosen.

You can determine the byte size of a binary using Kernel.byte_size/1:

iex> String.duplicate("a", 64) |> byte_size()
64

iex> String.duplicate("š", 64) |> byte_size()
128
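
To get a feel for which strings fall on either side of the limit, here is a minimal sketch. The `BinSize` module is hypothetical, and it classifies by byte size alone, using the 64-byte figure quoted in this thread; it does not inspect how the VM actually stores a given binary.

```elixir
defmodule BinSize do
  # 64 bytes is the heap-binary limit quoted in this thread.
  @heap_limit 64

  # Classifies by size only; the VM's actual representation is not inspected.
  def kind(bin) when is_binary(bin) and byte_size(bin) <= @heap_limit, do: :heap_sized
  def kind(bin) when is_binary(bin), do: :refc_sized
end

# 64 ASCII characters are 64 bytes, so still within the limit.
IO.inspect(BinSize.kind(String.duplicate("a", 64)))

# One more byte crosses it.
IO.inspect(BinSize.kind(String.duplicate("a", 65)))

# "š" takes 2 bytes in UTF-8, so 64 of them are 128 bytes.
IO.inspect(BinSize.kind(String.duplicate("š", 64)))
```

In other words, a single line of about 64 ASCII characters is already at the limit, and non-ASCII text reaches it with fewer characters.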



José Valim

Oct 13, 2014, 1:14:45 PM
to elixir-l...@googlegroups.com
It's also worth pointing out that this issue only happens when you have a non-heap binary, you reference part of it, and you keep that reference around, which prevents the full binary from being garbage collected. There is also :binary.copy/1, which lets you copy the referenced part so the bigger binary can be collected.
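
The sub-binary case can be sketched with `:binary.referenced_byte_size/1`, which reports the size of the binary a sub-binary points into. The sizes below are arbitrary choices for illustration; the prefix is kept above 64 bytes on the assumption that smaller parts may be copied rather than referenced.

```elixir
# A 1 MB binary, large enough to be reference-counted rather than heap-allocated.
big = :binary.copy("x", 1_000_000)

# Matching out a 100-byte prefix yields a sub-binary that points into `big`.
<<head::binary-size(100), _rest::binary>> = big

# The prefix is 100 bytes long but keeps the whole 1 MB binary reachable.
IO.inspect({byte_size(head), :binary.referenced_byte_size(head)})

# :binary.copy/1 makes a standalone copy that no longer refers to `big`.
copy = :binary.copy(head)

# Same contents, but now only 100 bytes are referenced.
IO.inspect({byte_size(copy), :binary.referenced_byte_size(copy)})
```

A long-lived process that stores `copy` instead of `head` in its state no longer prevents the large binary from being collected.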




Saša Jurić

Oct 13, 2014, 1:39:03 PM
to elixir-l...@googlegroups.com
You mean “also happens”?

If I understand correctly, the issue can also happen if a refc binary just passes as a message through a long-lived intermediate process that doesn’t GC. That intermediate process is then holding a reference to the non-heap binary, and prevents it from being released, right?
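
The intermediate-process scenario can be reproduced in a few lines. This is a sketch under assumptions: the `Router` module and the message shapes are hypothetical, and whether the binary still shows up before the forced collection depends on when the VM last collected the router.

```elixir
defmodule Router do
  # A process that only forwards binaries and never inspects them.
  def start, do: spawn(&loop/0)

  defp loop do
    receive do
      {:route, bin, dest} ->
        send(dest, {:routed, bin})
        loop()
    end
  end
end

router = Router.start()
big = :binary.copy("x", 1_000_000)
send(router, {:route, big, self()})

# Wait until the router has forwarded the binary back to us.
receive do
  {:routed, _bin} -> :ok
end

# May still list the 1 MB binary: the router is idle and has had no reason to GC.
IO.inspect(Process.info(router, :binary), label: "before GC")

# Forcing a collection makes the router drop its stale reference.
:erlang.garbage_collect(router)
IO.inspect(Process.info(router, :binary), label: "after GC")
```

Calling `:erlang.garbage_collect/1` by hand is only a diagnostic here. For a real router, hibernating the process when idle (`:erlang.hibernate/3`) is another way to trigger a collection.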

Robert Virding

Oct 13, 2014, 5:00:34 PM
to elixir-l...@googlegroups.com
This *only* occurs when you send binaries in messages between processes. For binaries local to a process this is never a problem as they are garbage collected like any other data type.

I don't consider it a problem to worry about until you run into problems with it. Then you can see what the issue is and take steps to fix it.

Robert

Saša Jurić

Oct 13, 2014, 5:14:40 PM
to elixir-l...@googlegroups.com

On 13 Oct 2014, at 23:00, Robert Virding <rvir...@gmail.com> wrote:

> This *only* occurs when you send binaries in messages between processes. For binaries local to a process this is never a problem as they are garbage collected like any other data type.

Are you sure? As José and @dieswaytoofast explain, pattern matching a non-heap binary will hold a reference to it. So if a long-running process is holding on to just a small part of a large binary it has created, the large binary is not eligible for GC and the process may cause a leak. Is this right, or am I misunderstanding something?

Robert Virding

Oct 13, 2014, 8:54:25 PM
to elixir-l...@googlegroups.com
As long as a process is referencing a non-heap binary, the binary can't be collected. However, this is seldom a problem unless the binary has been sent to other processes, as otherwise there is only one process which references it. The case @dieswaytoofast was referring to was when these binaries had been sent through various processes, so there were many processes which needed to do garbage collection before the binaries could be freed.

I think this is a problem to be aware of, but not one to start worrying about until it hits you. Otherwise you may start doing premature optimisation for this problem without getting a feel for the cost.

Robert