[erlang-questions] Optimization for the ++ operator

José Valim

May 31, 2012, 7:55:48 AM

to Erlang

I thought Erlang would optimize the ++ operator when the left side is known at compile time.

For example, if the compiler sees the following outside of a function signature:

"Foo" ++ Bar

It could rewrite it as:

[$F, $o, $o | assert_list(Bar)]

However, I ran some benchmarks and it seems the optimization does not happen (on R15B).

With a local dummy implementation of assert_list(Bar), I found the first form to be 50% slower than the second.

That said, given the possibility something is odd in my setup, does Erlang optimize it or not? If not, could it?

José Valim

www.plataformatec.com.br

Founder and Lead Developer

Björn Gustavsson

Jun 1, 2012, 4:28:12 AM

to José Valim, Erlang

On Thu, May 31, 2012 at 1:55 PM, José Valim <jose....@gmail.com> wrote:
> I thought Erlang would optimize the ++ operator when the left side is known
> at compile time.
>
> For example, if the compiler sees the following outside of a function
> signature:
>
> "Foo" ++ Bar
>
> It could rewrite it as:
>
> [$F, $o, $o | assert_list(Bar)]
>

The '++' operator does not verify that the second argument is a list.
Therefore, the compiler will rewrite the first expression to simply:

[$F,$o,$o|Bar]
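
A minimal sketch of the behaviour described above, assuming a hypothetical module named concat_demo (the module and function names are illustrative only and do not come from the thread). It compares the `++` form with the explicit cons form, including the case where the right-hand side is not a list, and adds a guarded variant for callers who do want the tail checked:

```erlang
-module(concat_demo).
-export([with_op/1, with_cons/1, checked/1, demo/0]).

%% The form from the original question: a literal string on the left of ++.
with_op(Bar) -> "Foo" ++ Bar.

%% The explicit cons form that the reply says the compiler rewrites it to.
with_cons(Bar) -> [$F, $o, $o | Bar].

%% Hypothetical stricter variant: the guard rejects a non-list tail,
%% so checked(bar) fails with a function_clause error.
checked(Bar) when is_list(Bar) -> [$F, $o, $o | Bar].

%% Small self-check: both forms agree, and ++ accepts a non-list
%% right-hand side, producing an improper list.
demo() ->
    true = with_op("Bar") =:= with_cons("Bar"),
    true = with_op(bar) =:= [$F, $o, $o | bar],
    true = checked("Bar") =:= "FooBar",
    ok.
```

Calling concat_demo:demo() should return ok if the two forms behave identically on the running system.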

> However, I ran some benchmarks and it seems the optimization does not happen
> (on R15B).
> With a local dummy implementation of assert_list(Bar), I got that the first
> format is 50% slower than the second one.

Have you made sure that you run each test in a newly spawned process?
Do you run the test for a long enough time?

If you have set up your benchmark environment properly, and your second
example is still faster, I can only assume that the code sits better in cache.
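
One way to follow this advice is sketched below, assuming a hypothetical helper module named bench_demo (not taken from the thread or from the linked paste). It runs a zero-argument fun N times inside a freshly spawned process and reports the elapsed time from timer:tc/1, so each measurement starts from a new heap:

```erlang
-module(bench_demo).
-export([run/2]).

%% Run Fun() N times inside a freshly spawned process and return the
%% elapsed time in microseconds as reported by timer:tc/1.
run(Fun, N) when is_function(Fun, 0), is_integer(N), N > 0 ->
    Parent = self(),
    Ref = make_ref(),
    {Pid, MRef} = spawn_monitor(fun() ->
        {Micros, _} = timer:tc(fun() -> loop(Fun, N) end),
        Parent ! {Ref, Micros}
    end),
    receive
        {Ref, Micros} ->
            %% Drop the monitor and any 'DOWN' message already queued.
            erlang:demonitor(MRef, [flush]),
            Micros;
        {'DOWN', MRef, process, Pid, Reason} ->
            %% The worker died before reporting a result.
            erlang:error({benchmark_crashed, Reason})
    end.

%% Call Fun() N times, discarding the results.
loop(_Fun, 0) -> ok;
loop(Fun, N) -> _ = Fun(), loop(Fun, N - 1).
```

For example, bench_demo:run(fun() -> lists:seq(1, 100) end, 100000) returns a number of microseconds. Running each candidate several times in separate calls, and comparing the spread of results rather than a single figure, is a reasonable precaution.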

--
Björn Gustavsson, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions

José Valim

Jun 1, 2012, 4:46:28 AM

to Björn Gustavsson, Erlang

> The '++' operator does not verify that the second argument is a list.
> Therefore, the compiler will rewrite the first expression to simply:
>
> [$F,$o,$o|Bar]

Yes, thanks. For some weird reason I thought Erlang restricted the right side to be a list in such cases.

> Have you made sure that you run each test in a newly spawned process?
> Do you run the test for a long enough time?

Running in a newly spawned process did the trick; the results are now consistent.

I was already running it 1000 times; here is the code (without using spawn):

https://www.refheap.com/paste/2954

I assume having a "clean" heap (reducing the chance of garbage collection) is the reason why it is a good idea to run benchmarks in a newly spawned process?

Thanks Björn!

Björn-Egil Dahlberg

Jun 1, 2012, 6:07:33 AM

to José Valim, Erlang

If you want to vary one parameter in a test, make sure all other parameters stay the same; otherwise you have no idea what you are measuring!

Code loading, heaps, the position of Mars... everything has to be taken into account.

Gustav Simonsson

Jun 1, 2012, 9:09:09 AM

to erlang-q...@erlang.org

On 2012-06-01 12:07, Björn-Egil Dahlberg wrote:

> If you want to vary one parameter in a test, make sure all other parameters stay the same; otherwise you have no idea what you are measuring!
>
> Code loading, heaps, the position of Mars... everything has to be taken into account.

The effect of Mars's magnetic field on that of Earth, even if Earth's aphelion and the perihelion of Mars were to coincide and the interplanetary distance were to be at its theoretical minimum, is negligible compared to the normal variations in Earth's magnetic field.

Therefore it's very unlikely that the relative position of Mars would affect the probability of a memory bitflip occurring due to cosmic radiation and thus affecting the benchmark.

Regards,
Gustav Simonsson
Erlang/OTP team

Björn-Egil Dahlberg

Jun 1, 2012, 9:22:16 AM

to erlang-q...@erlang.org

On 2012-06-01 15:09, Gustav Simonsson wrote:

> The effect of Mars's magnetic field on that of Earth, even if Earth's aphelion and the perihelion of Mars were to coincide and the interplanetary distance were to be at its theoretical minimum, is negligible compared to the normal variations in Earth's magnetic field.
>
> Therefore it's very unlikely that the relative position of Mars would affect the probability of a memory bitflip occurring due to cosmic radiation and thus affecting the benchmark.

My point, in case you missed it, is that you can't disregard something out of hand. You have to be sure. In physics you might be familiar with similar ideas from dimensional analysis, i.e. how to come up with a model and how to disregard useless parameters.

All I'm saying is that you should be sure of what you are measuring. =)

Heinz N. Gies

Jun 8, 2012, 12:25:32 PM

to Erlang Questions

The "very unlikely" is the important part here ;) Also, imagine Mars's position causes it to catch an asteroid in its gravitational field and slingshot it right into your computer. Now that would screw up your benchmark results!

--
Heinz N. Gies
he...@licenser.net
http://licenser.net
