[Proposal] Memoize sigil_r on OTP 28


Kip

Mar 29, 2025, 9:29:42 AM
to elixir-lang-core
TL;DR
Memoize (bind to a variable) the result of `Regex.compile!/2` on OTP 28 so that it is only compiled once.

Background

Because on OTP 28 a compiled regex (~r/..../) can no longer be unquoted into code, the implementation of sigil_r has to compile the regex at runtime. In code that iterates over text using regexes (for example, the Unicode break algorithm, Unicode transforms, and so on), this could incur a performance penalty.

Proposal

Bind the result of Regex.compile!/2 to a variable called something like `__regex_#{hash_of_regex_string}` if compilation succeeds. If the variable is already bound, use it directly without recompiling. Do performance testing to confirm that memoizing actually helps.

I am fine to do this work if the proposal has merit.


José Valim

Mar 29, 2025, 9:54:32 AM
to elixir-l...@googlegroups.com
The memoization will only be useful if we either do variable hoisting, which is inherently limited to the current function, or store the regex in a persistent term. The former would require meaningful changes in the compiler, and the latter may have runtime impact.
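
To make the persistent-term option concrete, here is a minimal sketch of what such a cache could look like. The module `RegexCache` and its `fetch/2` function are hypothetical names for illustration only, not something Elixir provides or that was proposed in this thread. The sketch assumes a compiled regex can be stored in `:persistent_term` like any other term.

```elixir
defmodule RegexCache do
  # Hypothetical helper, not part of Elixir: memoizes compiled regexes
  # in :persistent_term, keyed by the regex source and options.
  def fetch(source, opts \\ "") when is_binary(source) do
    key = {__MODULE__, source, opts}

    case :persistent_term.get(key, nil) do
      nil ->
        # First use: compile once and store the result for later calls.
        regex = Regex.compile!(source, opts)
        :persistent_term.put(key, regex)
        regex

      regex ->
        # Later uses: return the stored regex without recompiling.
        regex
    end
  end
end

regex = RegexCache.fetch("a+b")
Regex.match?(regex, "caab")
```

Reads from `:persistent_term` are cheap, but writes are comparatively expensive, and replacing or erasing a stored term triggers a global garbage collection, so a cache like this only pays off for regexes that are reused many times.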

Instead, we are discussing adding the optimization we did before directly to Erlang/OTP. Meanwhile, I suggest refactoring the code to pass the regex around in sensitive areas. :(
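
A minimal sketch of the "pass the regex around" refactoring, assuming a hot loop that would otherwise evaluate the same regex once per item. The module `WordSplitter`, its functions, and the pattern are hypothetical and only illustrate the shape of the change.

```elixir
defmodule WordSplitter do
  # Hypothetical example module; the pattern is illustrative only.
  def split_all(lines) when is_list(lines) do
    # Compile the regex once, outside the hot loop...
    separator = Regex.compile!("\\s+")
    # ...and pass the compiled value to the function that uses it.
    Enum.map(lines, &split_line(&1, separator))
  end

  # The regex arrives as an argument instead of being rebuilt per call.
  defp split_line(line, %Regex{} = separator) do
    String.split(line, separator, trim: true)
  end
end

WordSplitter.split_all(["a  b", "c d e"])
# => [["a", "b"], ["c", "d", "e"]]
```

The compilation cost is then paid once per call to `split_all/1` rather than once per line.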




Kip

Mar 29, 2025, 10:17:42 AM
to elixir-lang-core
> and the latter may have runtime impact.
Definitely a concern for the "simple" case of a use-once regex. I was thinking of something like this (pseudocode):
    var = :erlang.iolist_to_binary(["__regex_", :erlang.integer_to_list(abs(:erlang.monotonic_time(:nanosecond)))])
    quote do
      var = if var, do: var, else: Regex.compile!(binary_or_tuple, options)
    end
Too much risk of performance impact? I think the BEAM optimises out the binding in positive cases like this?

> Instead, we are discussing adding the optimization we did before directly to Erlang/OTP
Yep, I'm anxiously awaiting a good outcome from that conversation :-) Thanks for encouraging the OTP team on this.

Kip

Mar 29, 2025, 10:26:22 AM
to elixir-lang-core
> Meanwhile, I suggest refactoring the code to pass the regex around in sensitive areas. :(

I wish I could (or, more precisely, had the ability to!). The Elixir code is all generated from this delightful set of regexes and rules. It works surprisingly well, but I suspect a performance hit on OTP 28 (I'll benchmark next week, once I coerce Benchee to compile on OTP 28).

Kip

Mar 29, 2025, 10:31:04 AM
to elixir-lang-core
Sorry for the noise. You're totally right of course, memoization can only work within the context of a single function (as written). Persistent term has the issues you mentioned. And even in the cases I'm looking to improve, the code structure won't benefit enough.

I hope the OTP team can optimise the loading of regex in a future release.

Case closed.
