Multi-letter (uppercase) sigils


José Valim

Mar 4, 2023, 3:15:45 AM
to elixir-lang-core
Sigils in Elixir are currently limited to a single letter. We had many discussions in the past about allowing more letters but they were ultimately rejected because of lowercase sigils.

The issue with multi-letter lowercase sigils is that:

1. they are ambiguous to humans
2. they are ambiguous to machines
3. they may have security implications

For instance, I would say that sigils in Elixir have quite distinctive features:

var = ~w"foo"
var = ~w[bar]

Tilde, a letter, and the content surrounded by terminators. However, given that most identifiers in the language are lowercase, I think a multi-letter lowercase sigil starts to become less clear. For example, imagine we supported a sigil named opts:

var = ~opts[bar]

That's awfully close to:

var =~ opts[bar]

Which would in fact be ambiguous at the parser level.
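To illustrate why the spacing matters: `=~` is already Elixir's match operator, so the second form above is valid code today. A minimal sketch, where the values bound to `var`, `opts`, and `bar` are hypothetical and chosen only so the expression evaluates:

```elixir
# Hypothetical values, for illustration only.
bar = :bar
opts = [bar: "foo"]
var = "seafood"

# `=~` is the match operator: with a string on the right-hand side,
# it checks whether the left-hand string contains it.
matched = (var =~ opts[bar])
# matched is true, because "seafood" contains "foo"
```

With a hypothetical lowercase `~opts` sigil, the same characters with the space moved would have to mean something entirely different.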

The other aspect is that security recommendations suggest using different interpolation syntaxes for different purposes. For example, imagine someone wants to implement a SQL query sigil that automatically escapes characters. Today, one could write this:

~q"""
SELECT * FROM posts WHERE id = #{id}
"""

And that would be safe! But the fact we are using interpolation means someone can simply forget the ~q at the front and write an _unsafe_ query. It would be much better if the interpolation is different altogether:

~SQL"""
SELECT * FROM posts WHERE id = {{id}}
"""

On one hand, it may feel inconsistent to have different ways to interpolate, but at the same time it is reasonable to use different mechanisms when different behaviours and security trade-offs are involved. Especially because #{...} typically means string conversion and that's not the case for SQL queries (it is simply parameter placement).
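As a rough sketch of the idea (not an implementation from this thread), a library could define such a sigil as a macro that turns each `{{name}}` into a positional parameter. The module names `SQLSigil` and `Example`, the `$1` placeholder style, and the `{query, params}` return shape are all assumptions made for illustration; multi-letter sigil names require an Elixir version that includes this proposal (1.15 or later).

```elixir
defmodule SQLSigil do
  # Hypothetical module, for illustration only.
  # Uppercase sigils receive the raw string: #{...} is not interpolated.
  defmacro sigil_SQL({:<<>>, _meta, [raw]}, _modifiers) when is_binary(raw) do
    placeholder = ~r/\{\{\s*([a-z_][a-zA-Z0-9_]*)\s*\}\}/

    # Collect the variable names in order of appearance.
    names =
      for [name] <- Regex.scan(placeholder, raw, capture: :all_but_first),
          do: String.to_atom(name)

    # Replace the first remaining {{name}} with $1, then $2, and so on.
    query =
      names
      |> Enum.with_index(1)
      |> Enum.reduce(raw, fn {_name, i}, acc ->
        Regex.replace(placeholder, acc, "$#{i}", global: false)
      end)

    # Refer to the caller's variables; values travel as parameters,
    # never as part of the query text.
    params = Enum.map(names, &Macro.var(&1, nil))

    quote do: {unquote(query), unquote(params)}
  end
end

defmodule Example do
  import SQLSigil

  def post_query(id) do
    ~SQL"SELECT * FROM posts WHERE id = {{id}}"
  end
end

Example.post_query(42)
# returns {"SELECT * FROM posts WHERE id = $1", [42]}
```

Forgetting the `~SQL` prefix here leaves a plain string containing a literal `{{id}}`, which fails loudly instead of silently building an unsafe query.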

With all of this in mind, the suggestion is to allow only multi-letter uppercase sigils. Most sigils are uppercase anyway:

1. Elixir defines 4 lowercase sigils (~r, ~s, ~w, and ~c) but 8 uppercase ones (the uppercase variants of those four plus ~T, ~D, ~N, ~U for dates and times)
2. Nx uses ~V and ~M for vectors and matrices respectively
3. LiveView uses ~H, Surface uses ~F, and LiveView Native will need at least two uppercase sigils for Swift UI and Jetpack Compose

Therefore, I would like to propose that multi-letter, uppercase-only sigils be introduced and become, from now on, the recommendation for new libraries. This means we won't deprecate ~T, ~D, ~N, ~U in Elixir, but there is still time to rewrite ~V and ~M in Nx to ~VEC and ~MAT. LiveView and Surface can decide whether they want to migrate (~SF may be a better choice for the latter), and LiveView Native can choose, for example, between ~JETPACK and ~JC if it prefers an abbreviation.

Looking forward to feedback,



Zach Daniel

Mar 4, 2023, 10:04:31 AM
to elixir-l...@googlegroups.com
Yes please :) That’s all.


--
You received this message because you are subscribed to the Google Groups "elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-lang-co...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4KTx%2BYW02gQLvH-ihyhgv6dAhjrwSEdhP81niuvjrWfTg%40mail.gmail.com.

Amos King - Binary Noggin

Mar 4, 2023, 11:53:58 AM
to elixir-l...@googlegroups.com
I love this idea. Removing the ambiguity will help with adoption as new developers to the language are less confused. It also allows infinite sigils to be available without stepping on each other. I'm not suggesting a jump on making sigils for everything.

You made the subtle point of sigils possibly having a different interpolation syntax. I like the side effect of reducing errors, but it also creates another syntax to learn. I like languages requiring you to be safer by default, but this isn't a security concern in many circumstances.

~D[#{year}-08-24]

~D[{{year}}-08-24]

In the SQL case, it makes sense, but does it for something like Date?

Changing the interpolation syntax saves a small class of errors in specific circumstances but creates multiple ways to interpolate that lead to confusion, IMO.

Cheers,



José Valim

Mar 4, 2023, 12:55:28 PM
to elixir-l...@googlegroups.com
To be clear, I am not advocating for any change to the existing sigils. The main point is that uppercase sigils are more important than lowercase ones and that, even if in _some_ cases you may want interpolation (which would warrant a lowercase sigil), having a different syntax for interpolation can be a plus. Look no further than ~H for an example of a sigil where the interpolation syntax is different for several reasons!

Austin Ziegler

Mar 4, 2023, 1:10:47 PM
to elixir-l...@googlegroups.com
Would such sigils need to be all uppercase or would an uppercase initial letter be sufficient? I would think that `~Vec[…]` or `~Mat[…]` would be more readable (and easier to type, eventually). I’m not sure whether `~Sql[…]` or `~SQL[…]` would be better, but I think that would be good.

What about Unicode sigils like `~δ[…]` (lowercase delta) vs `~Δ[…]`? I haven’t tried, but are those possible and would Unicode capitalization make the difference?

Could I make a "multi-word" sigil by using underscores? `~Jet_Pack[…]`. (I think…no, especially if we could use intercapping `~JetPack[…]`.)

I’m in favour of this, but would prefer initial casing as sufficient indicator that multi-letters are allowed.

-a

José Valim

Mar 4, 2023, 1:24:28 PM
to elixir-l...@googlegroups.com
All ASCII uppercase for now, because I want to avoid introducing confusion that sigils are somehow related to modules, e.g. ~Mat[...] may have people looking for a module named Mat somewhere. Plus, we can always further relax the rules later and allow more characters.

Bruce Tate

Mar 5, 2023, 10:12:41 AM
to elixir-l...@googlegroups.com
This change would be a most welcome one. Sigils are going to be more important as Elixir expands into new domains, and it's helpful to have clues to what each sigil does. 

The restrictions to upper case seem to be reasonable ones. 

-bt



--

Regards,
Bruce Tate
CEO

José Valim

Mar 5, 2023, 2:21:47 PM
to elixir-l...@googlegroups.com
This has been accepted and merged. Thanks everyone.

Ben Wilson

Mar 5, 2023, 6:33:01 PM
to elixir-lang-core
I'm generally +1 on this, but I am a tiny bit confused about the proposed change to interpolation. Does this propose a change to interpolation for all sigils, single or multi letter? If so, what is the deprecation plan for the current interpolation syntax?

José Valim

Mar 6, 2023, 2:46:02 AM
to elixir-l...@googlegroups.com
No change or deprecation. Interpolation will remain allowed only on lowercase sigils (and is therefore limited to single-letter sigils).
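For readers less familiar with the existing rule, the lowercase/uppercase split can be seen with the built-in string sigils. A minimal sketch, where the variable `year` and its value are just an example:

```elixir
year = 2023

# Lowercase sigils process #{...} interpolation and escape sequences.
interpolated = ~s(#{year}-08-24)
# interpolated is "2023-08-24"

# Uppercase sigils keep the content exactly as written.
raw = ~S(#{year}-08-24)
# raw holds the literal characters #{year}-08-24
```

Multi-letter sigils are uppercase only, so they follow the second behaviour and any substitution syntax is up to the sigil's author.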
