Using cerl -debug with an elixir release


Stephen Baldwin

Aug 22, 2022, 11:46:50 AM
to elixir-lang-core
Hello, I've been trying to debug a segfault in my Elixir app for a bit. I've learned that if debugging symbols are enabled in the Erlang VM, you can use gdb on a Linux core file to work out where the segfault is occurring. I've rebuilt Erlang from source with debugging symbols and that all works fine, but using it with an Elixir release is proving difficult.
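For reference, loading a core file into gdb alongside the debug emulator can be sketched as below. This is only a sketch under assumptions: the helper name core_backtrace is made up for illustration, and the emulator path you pass in depends entirely on how and where Erlang was built (in a source build it is typically a file such as beam.debug.smp under the build's bin directory, but check your own tree).

```shell
#!/bin/sh
# Print a backtrace of every thread recorded in a core file.
#   $1: path to the debug emulator binary (location is an assumption,
#       it depends on your Erlang build)
#   $2: path to the core file written when the VM crashed
core_backtrace() {
    emulator="$1"
    core="$2"
    if [ ! -f "$emulator" ] || [ ! -f "$core" ]; then
        echo "usage: core_backtrace <emulator-binary> <core-file>" >&2
        return 1
    fi
    # --batch runs the given commands and exits instead of opening a prompt
    gdb --batch -ex "thread apply all bt" "$emulator" "$core"
}
```

The binary given to gdb has to be the same build that produced the core file, otherwise the symbols will not line up with the recorded addresses.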

I modified the elixir bin along the lines of https://github.com/elixir-lang/elixir/pull/11082, but I am getting on_boot errors when running the release. So simply replacing erl with cerl isn't a path to success.

I need some help, as I don't fully understand the path from an Elixir release to the Erlang VM. Are there any quick ways to get this to work? Otherwise, I think it would be worthwhile to have an option when building an Elixir release to use a cerl VM (debug, valgrind, etc.).

Regards,
Stephen

José Valim

Aug 22, 2022, 12:14:01 PM
to elixir-lang-core
Running Elixir with cerl should just work. Can you expand on the on_boot errors you get in a release?


José Valim

Aug 22, 2022, 12:14:52 PM
to elixir-lang-core
Or perhaps please provide a minimal app that reproduces it. :)

Stephen Baldwin

Aug 22, 2022, 1:17:39 PM
to elixir-lang-core
Hey José,

I was able to get Elixir to work with cerl for something like elixir -e "IO.puts :ok". But I could not get it to work in the release environment running in Docker. I get this error:

root:~# releases/0.1.0/elixir -e "IO.puts :ok"
{"init terminating in do_boot",{undef,[{elixir,start_cli,[],[]},{init,start_em,1,[]},{init,do_boot,3,[]}]}}
init terminating in do_boot ({undef,[{elixir,start_cli,[],[]},{init,start_em,1,[]},{init,do_boot,3,[]}]})

Crash dump is being written to: erl_crash.dump...done

In my Dockerfile I replace the elixir bin with my debug version like so:

....
COPY --from=build /app/_build/prod/rel/app ./
# Copy erlang source, along with erlang debug binary
COPY --from=build /OTP/subdir /OTP/subdir
# Symlink erlang debug binary to erts bin dir
RUN ln -s /OTP/subdir/bin/cerl /app/releases/0.1.0/../../erts-11.2.2.15/bin/cerl
# Replace elixir script with our script that runs the erlang debug binary
COPY elixir-debug releases/0.1.0/elixir
RUN chmod +x releases/0.1.0/elixir
...

The source image for my app's Dockerfile is hexpm/elixir:1.10.4-erlang-23.3.4.16-ubuntu-bionic-20210930, modified to keep the Erlang source code and build the cerl debug VM.

I don't think I can reasonably share a minimal app that reproduces the issue without sharing my app code, which I cannot do. The segfault happens randomly after some number of hours and I do not know what is causing it.

Attached is my modified elixir bin.


elixir-debug

José Valim

Aug 22, 2022, 1:23:11 PM
to elixir-lang-core
Sorry, I meant if you have a small app that reproduces the _boot_ issue.

In any case, I would say it is worth trying the latest Erlang. The segfault may have been addressed.

Stephen Baldwin

Aug 22, 2022, 1:23:23 PM
to elixir-lang-core
I can successfully run the cerl -debug VM through the symlink:

root@:~# erts-11.2.2.15/bin/cerl -debug
Erlang/OTP 23 [erts-11.2.2.15] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [hipe] [type-assertions] [debug-compiled] [lock-checking]

Eshell V11.2.2.15  (abort with ^G)
1>
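The [debug-compiled] tag in the banner above is what indicates a debug build. A small check along these lines could confirm it in a script. This is a sketch only: is_debug_vm is a hypothetical helper name, and it assumes you have already captured the banner line as a string.

```shell
#!/bin/sh
# Succeeds if the given Erlang banner line contains the [debug-compiled] tag.
#   $1: the first line printed by the VM on startup
is_debug_vm() {
    # -F treats the pattern as a fixed string, so the brackets are literal
    printf '%s\n' "$1" | grep -qF '[debug-compiled]'
}
```

For example, is_debug_vm "$banner" && echo "debug VM" prints the message only when the tag is present.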

Stephen Baldwin

Aug 22, 2022, 1:24:39 PM
to elixir-lang-core
OK, I'll see about using the most recent Erlang version. Thanks for the tip.

Stephen Baldwin

Aug 23, 2022, 5:29:24 PM
to elixir-lang-core
Just to circle back here: I upgraded our app from OTP 22.3.4.20 -> 23 and we have not seen any segfaults for ~24 hours, which is solid considering we'd expect at least a couple in that timeframe.

I was also able to get our Elixir release to use the debug build of Erlang by configuring the release to use the Erlang source dir as the erts directory. Below is the configuration. Because mix.release copy_erts expects the erts dir to be named in the format "erts-<version-number>", we created a symlink at /OTP/subdir/erts-11.2.2.15 that points to the Erlang source dir /OTP/subdir. We did this in the Dockerfile right before calling mix release.

# in our app's mix.exs
      releases: [
        app: [
          include_erts: "/OTP/subdir/erts-11.2.2.15",
          strip_beams: false
        ]
      ],

Now the cerl binary ends up in the top-level erts directory assembled by the release. We also needed a slightly modified elixir bin that uses cerl -debug instead of erl.
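The symlink step described above might look roughly like this as a shell helper (in a Dockerfile it would be the body of a RUN instruction placed before mix release). It is a sketch under assumptions: link_erts_dir is a made-up name, and the directory and version are passed in as arguments rather than hard-coded.

```shell
#!/bin/sh
# Create <otp_dir>/erts-<version> as a symlink back to <otp_dir>, so the
# directory can be referenced by a name of the form "erts-<version-number>".
#   $1: Erlang source/build directory, e.g. /OTP/subdir
#   $2: ERTS version, e.g. 11.2.2.15
link_erts_dir() {
    otp_dir="$1"
    erts_version="$2"
    ln -s "$otp_dir" "$otp_dir/erts-$erts_version"
}
```

Calling link_erts_dir /OTP/subdir 11.2.2.15 would produce the /OTP/subdir/erts-11.2.2.15 path used for include_erts in the configuration above.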
