Disable regex precompilation


Daniil Fedotov

May 9, 2019, 4:28:53 PM
to elixir-lang-core
Hi all,

By default, regexes are compiled when the code is compiled.
This makes it harder to produce packages that work with different
Erlang versions and on different platforms.


Also, the runtime prints a warning message on startup if the code is running
on a system with different endianness. This can make users anxious.

This can be solved at the code level by recompiling regexes at runtime, but
that requires code hygiene, which may be hard to maintain, and it does not
protect against regexes in dependencies.
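
A minimal sketch of the code-level workaround mentioned above, assuming the `Regex.recompile!/1` function available in the Elixir releases discussed in this thread. The module and function names (`DigitsMatcher`, `digits?/1`) are hypothetical and only serve as an illustration:

```elixir
defmodule DigitsMatcher do
  # Hypothetical module: shows one way to wrap a sigil regex so it is
  # checked (and recompiled if needed) on the running system.
  def digits?(string) when is_binary(string) do
    # Regex.recompile!/1 returns the regex unchanged when it already matches
    # the running system, and recompiles it from its source otherwise.
    regex = Regex.recompile!(~r/\A\d+\z/)
    Regex.match?(regex, string)
  end
end
```

Every call site has to remember the wrapper, which is the hygiene burden described above, and nothing enforces it inside dependencies.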


Regexes are precompiled in the sigil_r macros in Kernel. It should be possible
to configure them to not precompile regexes and use plain binaries instead.
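
As a rough illustration of what "plain binaries" could mean in hand-written code, the sketch below keeps only the pattern source in the compiled module and builds the regex at runtime with `Regex.compile!/1`. This is an assumption about the general idea, not the change proposed in the PR; `RuntimeRegex` and `digits?/1` are hypothetical names:

```elixir
defmodule RuntimeRegex do
  # Hypothetical module: only the pattern source (a plain binary) is stored
  # in the compiled code, so nothing platform-specific ends up in the .beam.
  @digits_source "\\A\\d+\\z"

  def digits?(string) when is_binary(string) do
    # The pattern is compiled by the regex engine of the running system.
    # This costs one compilation per call unless the result is cached.
    regex = Regex.compile!(@digits_source)
    Regex.match?(regex, string)
  end
end
```

The trade-off is runtime cost: the pattern is compiled on each call instead of once at build time.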

The challenge here is passing a build configuration parameter to this macro.
I tried to use Mix.Project.config in my PR: https://github.com/elixir-lang/elixir/pull/9031

But as José pointed out, Mix cannot be accessed from the Kernel module.

I wonder if there is some other way to pass this build configuration to Kernel?

José Valim

May 9, 2019, 5:11:27 PM
to elixir-l...@googlegroups.com
So one place where this switch could be implemented is Code.compiler_options.

However, none of the options there change how the code is compiled today, and I am a bit worried about setting a precedent, but I thought I would mention it.


José Valim
Skype: jv.ptec
Founder and Director of R&D


--
You received this message because you are subscribed to the Google Groups "elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-lang-co...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/99a6af4c-6811-4f9a-8a82-c7e9645279ce%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Michael Klishin

May 10, 2019, 2:55:09 PM
to elixir-lang-core
A compiler option makes sense to me. This is a compile-time decision after all.

An alternative for Elixir users who distribute binary builds of their packages (such as RabbitMQ) would be to produce platform-specific builds,
which would be a non-trivial undertaking and would double the number of packages that have to be produced (we already produce 8, for example).

José, is there a specific concern that you have with setting a precedent?

José Valim

May 10, 2019, 3:05:34 PM
to elixir-l...@googlegroups.com
There is no compiler option that changes the code behavior. It is a global configuration, which can make debugging and understanding the system hard. We would need a strong precedent to add it.

Given that the current problem already has a solution today (all you need to do is wrap the regex), I still think a linter that guarantees all regexes have been wrapped is the best call. So all we need is a flag to disable the endianness check.

Louis Pilfold

May 10, 2019, 5:05:25 PM
to elixir-lang-core
I'm not sure a linter would help here as it could only be applied to your app code, not that of libraries.

This approach would mean that any library that uses a regex sigil could not be used in the application, as it may not work on other Erlang versions. Is that right?


José Valim

May 10, 2019, 5:07:39 PM
to elixir-l...@googlegroups.com
> This approach would mean that any library that uses a regex sigil could not be used in the application, as it may not work on other Erlang versions. Is that right?

That is correct. Elixir, for example, wraps its own regexes in the same Regex.recompile call. But according to the previous discussion in the issue tracker, they are not concerned with dependencies.


Louis Pilfold

May 10, 2019, 5:24:50 PM
to elixir-lang-core
I see! The first email here said deps was a problem so I was going with that. I'm glad a solution has been found :)

José Valim

May 10, 2019, 5:25:20 PM
to elixir-l...@googlegroups.com
> I see! The first email here said deps was a problem so I was going with that. I'm glad a solution has been found :)

So maybe it is a problem. :P


Daniil Fedotov

May 10, 2019, 5:36:37 PM
to elixir-lang-core

It's mostly a maintenance problem. We currently don't use libraries with precompiled regexes, but if we need to use a new library, we will have to search its code as well.
That maintenance may cause bugs in the future. So endianness support is not broken right now, but it is fragile.
Also, disabling the endianness check does not prevent code with precompiled regexes from being loaded, which may be hard to debug.

Maybe the Regex module and data structure could do some sort of endianness check and recompile if necessary (or use a plain binary). This would impact performance on a system with different endianness, but it would be a more explicit and robust solution. Then there would be no need to check endianness at system start.
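
One way to picture this suggestion is a small wrapper that records which regex engine version and endianness a pattern was compiled for, and rebuilds it on mismatch. The sketch below is only an illustration under that assumption: `CheckedRegex` and its functions are hypothetical and are not part of Elixir, while `Regex.version/0`, `Regex.compile!/1` and `Regex.match?/2` are existing functions.

```elixir
defmodule CheckedRegex do
  # Hypothetical wrapper: keeps the pattern source next to the compiled
  # regex, together with the version it was compiled under.
  defstruct [:source, :version, :compiled]

  def new(source) when is_binary(source) do
    %__MODULE__{
      source: source,
      # Regex.version/0 combines the regex engine version and the
      # endianness of the running system.
      version: Regex.version(),
      compiled: Regex.compile!(source)
    }
  end

  # Returns the struct unchanged when it matches the running system,
  # otherwise rebuilds it from the stored source.
  def ensure_current(%__MODULE__{version: version} = checked) do
    if version == Regex.version(), do: checked, else: new(checked.source)
  end

  def matches?(%__MODULE__{} = checked, string) when is_binary(string) do
    Regex.match?(ensure_current(checked).compiled, string)
  end
end
```

Note that the version comparison runs on every match, which is the per-execution cost such a check has to pay.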


José Valim

May 10, 2019, 6:02:59 PM
to elixir-l...@googlegroups.com
I like the idea of recompiling on the fly. It would also remove the need for the explicit recompile calls.

José Valim

May 11, 2019, 3:37:36 AM
to elixir-l...@googlegroups.com
Taking a further look at the code, the issue with recompiling regexes on the fly is that it makes executing the regexes more expensive, as we need to compute the version on every execution. We could store the version in ETS but that would have performance issues. Storing in a persistent_term would be great, but at the moment we support Erlang/OTP 20+. Thoughts?
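
To make the persistent_term idea concrete, here is a minimal sketch that computes the version once and reads it back cheaply on later calls. It assumes a runtime that ships the `:persistent_term` module (`put/2` and `get/1`); `RegexVersionCache` and its functions are hypothetical names, not something Elixir provides.

```elixir
defmodule RegexVersionCache do
  # Hypothetical helper: caches Regex.version/0 in a persistent term so the
  # hot path is a single read instead of recomputing the version.
  @key {__MODULE__, :regex_version}

  # Meant to be called once, for example during application start.
  def init do
    :persistent_term.put(@key, Regex.version())
  end

  # True when the given version is the one cached for the running system.
  # :persistent_term.get/1 raises if init/0 has not been called yet.
  def current?(version) do
    version == :persistent_term.get(@key)
  end
end
```

Reads from a persistent term do not copy the term, which is why it is attractive here compared to ETS; writes are expensive, so the value should be stored only once.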




Michael Klishin

May 13, 2019, 2:14:22 PM
to elixir-l...@googlegroups.com
In case [1] is insufficient, I'd say bump the OTP requirement to 21+ for Elixir 1.9 and use persistent terms.




--
MK

Staff Software Engineer, Pivotal/RabbitMQ

José Valim

May 13, 2019, 2:27:38 PM
to elixir-l...@googlegroups.com
Persistent terms require Erlang 21.1 or 21.2, and we don't depend on point versions because that makes things more confusing. So if we need to depend on persistent terms for this to become concrete, it will take at least a year, even in the most optimistic scenario. But now that we have the PR, we have enough to benchmark, so we can push the discussion forward.


