Can we optimize immediately-called anonymous functions such as the ones used with then/2?

w...@resilia.nl

Jan 3, 2022, 10:28:42 AM
to elixir-lang-core
Since v1.12 we have the macro `Kernel.then(value, function)` which expects an arity-1 function and will call it with the given value.

This makes code which used to be written as follows:

```
def update(params, socket) do
  socket =
    socket
    |> assign(:myvar, params["myvar"])
    |> assign_new(:some_default, fn -> 42 end)

  {:noreply, socket}
end
```

more readable, by allowing it to be written as:

```
def update(params, socket) do
  socket
  |> assign(:myvar, params["myvar"])
  |> assign_new(:some_default, fn -> 42 end)
  |> then(&{:noreply, &1})
end
```

This pattern seems to be common in codebases using Elixir 1.12 and up (at least according to anecdotal evidence).

All is well, except for one little snag: the new code does not have the same runtime characteristics (in both performance and memory usage), because `then` desugars to `(function).(value)`. An anonymous function is created, immediately run, and garbage collected soon after.
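
To make the desugaring concrete, here is a minimal sketch (the module and function names are hypothetical, not taken from any real codebase). `with_then/1` uses `then/2`, `by_hand/1` spells out the immediately-called anonymous function it expands to, and `manual/1` builds the tuple directly. All three return the same value; they differ only in how much the compiler has to optimize away.

```elixir
defmodule ThenSketch do
  # Pipeline style, using Kernel.then/2.
  def with_then(socket), do: socket |> then(&{:noreply, &1})

  # What the then/2 call expands to: an anonymous function that is called immediately.
  def by_hand(socket), do: (&{:noreply, &1}).(socket)

  # The hand-written equivalent with no anonymous function at all.
  def manual(socket), do: {:noreply, socket}
end
```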

The Erlang compiler is clever enough to optimize these immediately-called anonymous functions away, but it will only do so when `@compile :inline` is set in the given module, to not mess with the call stack that might be returned when an exception is thrown.

Now `@compile :inline` is quite the sledgehammer, as it will inline all functions in the current module (as long as they are not 'too big', which can also be configured, and only in the places where they are called statically).
But since we're dealing with anonymous functions here which do not have clear names, there is no way to predict the name one should pass to the `@compile` option.
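
For reference, a sketch of the two forms of the option (module and function names are hypothetical). `@compile {:inline, name: arity}` targets one named function, while `@compile :inline` enables the Erlang inliner for the whole module. Only the second form can reach an anonymous function, since there is no name to list.

```elixir
defmodule InlineNamedSketch do
  # Inline only wrap/1; possible because the function has a known name and arity.
  @compile {:inline, wrap: 1}

  def update(socket), do: wrap(socket)

  defp wrap(socket), do: {:noreply, socket}
end

defmodule InlineAllSketch do
  # Enable the Erlang inliner for every sufficiently small function in this module,
  # which includes the anonymous function created by then/2.
  @compile :inline

  def update(socket), do: socket |> then(&{:noreply, &1})
end
```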


It seems like this situation could be improved, although I am not sure how.

Is there a way to mark these anonymous functions in some kind of way, to allow only them to be inlined?
Or is there maybe a way to have the Elixir compiler already inline common patterns like a capture with a datatype, rather than relying on the Erlang compiler for this?
Your input is greatly appreciated.

~Marten/Qqwy

José Valim

Jan 3, 2022, 10:57:43 AM
to elixir-lang-core
`then/2` is a macro, and the emitted code should be optimized on Erlang/OTP 24+.

--
You received this message because you are subscribed to the Google Groups "elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-lang-co...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/f0da2df2-432e-423c-a02b-27d8b916a0ecn%40googlegroups.com.

Wiebe-Marten Wijnja

Jan 3, 2022, 11:03:35 AM
to elixir-l...@googlegroups.com

I've been running my tests on Elixir v1.13.1 built for OTP24 with OTP 24.1.2.
When decompiling the resulting BEAM bytecode, the anonymous functions are still visible.
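
One way to run such a check, sketched under the assumption that the `:beam_disasm` module from Erlang's `compiler` application is available. It ships with OTP but is not part of the documented API, and the tuple shapes matched below are taken from its record definitions, so they may change between releases. `DisasmSketch` and `DisasmTarget` are hypothetical names. The helper lists every function in a compiled module; a lifted anonymous function typically shows up under a generated name along the lines of `-with_then/1-fun-0-`.

```elixir
defmodule DisasmSketch do
  # Returns [{name, arity}] for every function found in a BEAM binary.
  def function_names(beam) when is_binary(beam) do
    {:beam_file, _module, _exports, _attributes, _compile_info, functions} =
      :beam_disasm.file(beam)

    for {:function, name, arity, _entry, _instructions} <- functions, do: {name, arity}
  end
end

# defmodule returns the compiled bytecode as the third element of its result tuple.
{:module, _name, beam, _result} =
  defmodule DisasmTarget do
    def with_then(socket), do: socket |> then(&{:noreply, &1})
  end

IO.inspect(DisasmSketch.function_names(beam))
```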

I will do some benchmarks to see what the resulting performance looks like. Maybe the JIT will do something which is not visible in the BEAM bytecode.

José Valim

Jan 3, 2022, 11:30:17 AM
to elixir-lang-core
The optimization may happen on the loader. Use erts_debug:df(Mod, Fun, Arity) and see that.

Wiebe-Marten Wijnja

Jan 3, 2022, 2:06:52 PM
to elixir-l...@googlegroups.com

I have run some benchmarks (comparing OTP23 with JIT-enabled OTP24).
Full results here: https://github.com/Qqwy/elixir-test-benchmrking_then/

It compares, in a situation where no tail recursion optimization is possible, `Kernel.then/2` vs. writing the same code manually vs. using `Kernel.then/2` with `@compile :inline`.
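
A hypothetical sketch of what such a situation can look like (this is not the code from the linked repository). The result of `then/2` is consumed by an addition, so neither the anonymous function call nor the recursive call is in tail position.

```elixir
defmodule NonTailSketch do
  # Sums 2 * k for k in 1..n, using then/2 inside a body-recursive function.
  def with_then(0), do: 0
  def with_then(n), do: (n |> then(&(&1 * 2))) + with_then(n - 1)

  # The same computation written without then/2.
  def manual(0), do: 0
  def manual(n), do: n * 2 + manual(n - 1)
end
```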


A brief summary of the results:

- OTP24 is able to get roughly twice as many iterations per second as OTP23. However:
- On OTP24:
  - using `Kernel.then/2` requires (when tail recursion is not possible) 2.5x the memory of the other two variants.
  - using `Kernel.then/2` is roughly 30% slower than the other two variants.
- On OTP23:
  - all three techniques use the same amount of memory.
  - using `Kernel.then/2` is roughly 8% slower than the other two variants.

Strange...


I also took a look at the disassembled code using :erts_debug.df as you suggested.
Details here: https://github.com/Qqwy/elixir-test-benchmrking_then/#looking-at-the-disassembled-code
(Note that under OTP24 the *.dis-files only contained 1-5 empty lines, so the output is from OTP23. Should I file a bug with the OTP team for this?)

It seems that no optimization of immediately-called anonymous functions takes place during loading either.
The benchmarks above seem to support this, although the memory-usage results and the difference in slowdown between OTP23 and OTP24 seem very odd to me.


How to continue?


~Marten/Qqwy

José Valim

Jan 3, 2022, 2:17:14 PM
to elixir-l...@googlegroups.com
Ah, df has no effect on a JIT system, I forgot about that. Are the memory measurements guaranteed to be affected by the GC in the same way across benchmarks?

Wiebe-Marten Wijnja

Jan 3, 2022, 2:38:07 PM
to elixir-l...@googlegroups.com

Yes, across benchmark runs the memory measurements are the same.

José Valim

Jan 3, 2022, 2:47:25 PM
to elixir-lang-core
Sorry for the short replies, I was on my phone. :)

What I mean is: are the measurements across examples guaranteed to have the same number of garbage collector calls (or no calls at all)? I am worried that, for quick snippets, the memory measurements are being influenced by other factors. But according to my understanding, the anonymous function should not be allocated on Erlang/OTP 24 (and I think some further improvements are coming in 25).
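
One rough way to check this, as a sketch only: run each candidate in a fresh process and read that process's `:garbage_collection` info before and after. `GcCountSketch` is a hypothetical helper. As the Erlang documentation describes it, `minor_gcs` counts minor collections since the last major one, so the difference is only meaningful when no major collection happened in between (it can even be negative otherwise).

```elixir
defmodule GcCountSketch do
  # Runs fun in a fresh process and returns {result, change_in_minor_gcs}.
  def minor_gcs_during(fun) when is_function(fun, 0) do
    task =
      Task.async(fn ->
        {:garbage_collection, before} = Process.info(self(), :garbage_collection)
        result = fun.()
        {:garbage_collection, after_info} = Process.info(self(), :garbage_collection)
        {result, after_info[:minor_gcs] - before[:minor_gcs]}
      end)

    Task.await(task)
  end
end
```

Comparing the second tuple element across the variants under test gives a first indication of whether they triggered the same number of collections.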

Plus, comparing OTP 23 against OTP 24 will be tough due to the JIT.

Wiebe-Marten Wijnja

Jan 3, 2022, 4:05:46 PM
to elixir-l...@googlegroups.com

No worries, thanks a lot for your guidance in this matter! ^_^

I will try to come up with some other, more 'real-world'-like examples to double-check whether the benchmark's results apply only to quick snippets or across the board.

Do you happen to know if there is any way to inspect the result of the JIT-pass?


José Valim

Jan 3, 2022, 4:18:58 PM
to elixir-lang-core
Unfortunately, I don't know if there is a way to see the JIT code. But given that regular profiling tools like perf now work with the BEAM, maybe it is also possible to use similar tools to see the JITed code?

In any case, I tracked down the commit: https://github.com/erlang/otp/pull/4545 - none of the work is happening in the loader, unfortunately. Sorry for the red herring. The commit makes it so a function object is no longer allocated, but you still have to perform a local call, and perhaps that's the additional cost? I guess a further pass would be to eliminate the function call altogether if the invoked function does not define any variable, but that should be done by the Erlang compiler.
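
Conceptually, and only as an illustration written in Elixir source form, the stages discussed here look like the sketch below. The real transformation happens inside the Erlang compiler on its own intermediate representation, and all names are hypothetical.

```elixir
defmodule LiftedSketch do
  # What the programmer writes: then/2 with a capture.
  def source_form(socket), do: socket |> then(&{:noreply, &1})

  # Roughly what remains once the function object is no longer allocated:
  # the body of the fun lives in a local function that is called directly.
  def lifted_form(socket), do: lifted_fun(socket)

  defp lifted_fun(socket), do: {:noreply, socket}

  # The further pass suggested above: the local call is eliminated as well.
  def fully_inlined(socket), do: {:noreply, socket}
end
```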

Wiebe-Marten Wijnja

Jan 5, 2022, 3:49:48 PM
to elixir-l...@googlegroups.com

As a follow-up, I wrote another benchmark, and ran it 'properly' this time, on a custom-built Erlang/OTP 24.2.0 which supports both the JIT and EMU `emu_flavor`s.

This updated benchmark compares three different implementations of a GenServer `handle_cast` callback, which seemed like a more realistic scenario to me.
See here for the three implementations: https://github.com/Qqwy/elixir-benchmarking_then_genserver/blob/main/lib/implementation.ex#L2-L34

The results in this example are that `Then`, `ThenInlined` and `Manual` are similarly efficient, and all take the same amount of memory.
So I guess that at least when `Kernel.then/2` is in tail position (which is probably the common case), it will be optimized well by the Erlang compiler. :-)
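
For readers who do not want to open the repository, here is a hypothetical sketch of the tail-position shape being described. It is not the code from the linked benchmark, and `CastSketch` and its message format are made up. The `then/2` call is the last expression of the callback, so the anonymous function is called in tail position.

```elixir
defmodule CastSketch do
  use GenServer

  @impl true
  def init(state), do: {:ok, state}

  # then/2 is the final step of the pipeline, so nothing runs after it returns.
  @impl true
  def handle_cast({:put, key, value}, state) do
    state
    |> Map.put(key, value)
    |> then(&{:noreply, &1})
  end
end
```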


For anyone who wants to dig deeper into this benchmark, see: https://github.com/Qqwy/elixir-benchmarking_then_genserver

~Marten/Qqwy
