Proposal: Multi-letter sigils

Wojtek Mach

Feb 15, 2020, 8:52:38 AM
to elixir-l...@googlegroups.com
Currently, sigils are single letters, which means there can be only 2 x 26 of them, and some of them
are already taken by the standard library. As mentioned in [1], it's not clear whether there should be,
for example, a `~P` sigil and, if so, whether it should be for PID or Port. Similarly, there couldn't be a
`~R` sigil for Reference, given the letter is already taken by Regex.

I'd like to propose extending sigils to support multiple letters. For example, to define a
`~Port` sigil we'd write a `sigil_Port` function/macro, and to use it, it would have to be either
local to the module or imported. After the first letter, only US-ASCII letters (a-z, A-Z) could be
used. If a sigil starts with a lowercase letter it's interpolated; otherwise it is not.
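As a minimal sketch of the mechanics described above (the `PortSigil` module is hypothetical and not taken from the proof-of-concept), a sigil is just a function named `sigil_` plus the sigil name, receiving the string and a charlist of modifiers. It is called directly here so the sketch doesn't depend on multi-letter sigil syntax; the existing `~s`/`~S` pair illustrates the lowercase/uppercase interpolation rule.

```elixir
defmodule PortSigil do
  # Hypothetical module. Under the proposal, ~Port<0.6> would become a call
  # to sigil_Port("0.6", []) wherever this function is local or imported.
  def sigil_Port(string, []) do
    # Erlang's textual representation of a port is "#Port<0.6>".
    :erlang.list_to_port(String.to_charlist("#Port<" <> string <> ">"))
  end
end

# Calling the function directly works on any Elixir version.
port = PortSigil.sigil_Port("0.6", [])
true = is_port(port)

# Existing single-letter sigils show the interpolation rule:
# lowercase ~s interpolates, uppercase ~S keeps the text as written.
"2" = ~s(#{1 + 1})
"\#{1 + 1}" = ~S(#{1 + 1})
```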

As part of this proposal I'd like to introduce the following sigils to the Kernel module:

- `~Port<0.6>`
- `~PID<0.108.0>`
- `~Reference<0.2489367154.3551002625.84263>`
- `~Version<1.0.0>`
- `~URI<https://elixir-lang.org>`
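A rough sketch of how these could be backed by existing Erlang/Elixir functions. The `ProposedSigils` module is hypothetical (the proposal would put these in Kernel), and only plain function definitions are shown, called directly rather than through the `~Name<...>` syntax.

```elixir
defmodule ProposedSigils do
  # Hypothetical home for the proposed sigils; the proposal places them in Kernel.

  # ~PID<0.108.0>: Erlang's textual pid form is "<0.108.0>".
  def sigil_PID(string, []),
    do: :erlang.list_to_pid(String.to_charlist("<" <> string <> ">"))

  # ~Reference<...>: Erlang's textual reference form is "#Ref<...>".
  def sigil_Reference(string, []),
    do: :erlang.list_to_ref(String.to_charlist("#Ref<" <> string <> ">"))

  # ~Version<1.0.0>
  def sigil_Version(string, []), do: Version.parse!(string)

  # ~URI<https://elixir-lang.org>
  def sigil_URI(string, []), do: URI.parse(string)
end

# Round-trip the current process's pid through its textual form.
inner =
  self()
  |> :erlang.pid_to_list()
  |> List.to_string()
  |> String.trim_leading("<")
  |> String.trim_trailing(">")

true = ProposedSigils.sigil_PID(inner, []) == self()
```

Note that Erlang documents `list_to_pid/1`, `list_to_port/1` and `list_to_ref/1` as intended for debugging, so sigils built on them would mostly make sense for IEx sessions and inspected output.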

But it's worth mentioning that the primary goal of this proposal is to allow the community to build
sigils like these:

- `~Decimal<3.14>`
- `~Complex<0+1i>`
- `~Ratio<1/3>`
- `~Money<100 USD>`
- `~Geo<SRID=4326;POINT(30 -90)>`
etc.

Basically, whenever there's a piece of structured data with a compact string representation, it'd
be a good candidate for a sigil.
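To illustrate the community use case, here is a sketch of a `~Ratio` sigil. The `Ratio` struct and its fields are hypothetical (not an existing library's API). Since the sigil starts with an uppercase letter, it receives the raw text between the delimiters, e.g. `"1/3"`.

```elixir
defmodule Ratio do
  # Hypothetical struct for illustration only.
  defstruct [:numerator, :denominator]

  # Uppercase sigil: receives the raw text, e.g. "1/3", with no interpolation.
  def sigil_Ratio(string, []) do
    [num, den] =
      string
      |> String.split("/")
      |> Enum.map(&String.trim/1)

    %__MODULE__{numerator: String.to_integer(num), denominator: String.to_integer(den)}
  end
end
```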

Notice that I have chosen the same delimiter, `<`, for all proposed sigils. Different delimiters could
of course be chosen for different sigils as the "canonical" one (the one returned from the Inspect
implementation).
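For the "canonical" form to round-trip, the struct's Inspect implementation would print the sigil. A sketch with a hypothetical `Money` struct (the field names are assumptions):

```elixir
defmodule Money do
  # Hypothetical struct; field names are assumptions.
  defstruct [:amount, :currency]
end

defimpl Inspect, for: Money do
  # Print the struct in its (hypothetical) canonical sigil form.
  def inspect(%Money{amount: amount, currency: currency}, _opts) do
    "~Money<#{amount} #{currency}>"
  end
end
```

Returning a plain string from `inspect/2` is enough here, since a binary is a valid `Inspect.Algebra` document.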

Below I'd like to discuss some limitations of this proposal.

Given we already have sigils that correspond to structs like `~D`, `~T`, `~N`, `~U`, `~R`, should
we deprecate them in favour of `~Date`, `~Time`, `~NaiveDateTime`, `~DateTime`, `~Regex`? I'd
arbitrarily say we **should not** and instead keep them as is. (Personally I wouldn't mind using
all of these except for maybe `~NaiveDateTime` which is rather long.)

The longest possible sigil name would be 249 letters (which, along with the 6 characters in `sigil_`,
makes 255 characters, the atom length limit). A shorter maximum name length could be chosen.
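The 249 figure is 255 minus the length of the `sigil_` prefix. A small check, assuming the 255-character atom limit of current BEAM versions (`SigilNameLimit` is a hypothetical helper):

```elixir
defmodule SigilNameLimit do
  @prefix "sigil_"
  # The BEAM limits atoms to 255 characters.
  @max_atom_length 255

  # 255 - 6 = 249
  def max_name_length, do: @max_atom_length - String.length(@prefix)

  # Builds the function-name atom, or reports that the atom would be too long.
  def function_name(name) do
    {:ok, String.to_atom(@prefix <> name)}
  rescue
    SystemLimitError -> {:error, :system_limit}
  end
end
```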

As mentioned in [2], we run into technical limitations when implementing a `~MapSet` sigil, given
that sigils work on strings and not on the AST. This could be emulated with some caveats [3]. I'd argue
that since single-letter sigils have exactly the same problem, perhaps it's not a deal-breaker,
just one consequence of the original design.
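A naive sketch of why `~MapSet` is awkward (the `MapSetSigil` module is hypothetical and this is not the approach from [3]): the sigil only ever sees a string, so `~MapSet<1 2>` could only produce strings unless the sigil parses each element itself.

```elixir
defmodule MapSetSigil do
  # Hypothetical, naive ~MapSet: split on whitespace and build a set.
  # The sigil receives a string, not AST, so every element is a string.
  def sigil_MapSet(string, []) do
    string
    |> String.split()
    |> MapSet.new()
  end
end
```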

Given that multi-letter sigils may (but of course don't have to) correspond to module names, what about
modules with dots like `Date.Range` and `Version.Requirement`? This is especially relevant for
user-provided sigils, e.g. `~MyApp.Money`. It turns out it's very easy to support these too: instead
of `def sigil_Date.Range`, which would be a syntax error, we would write `def unquote(:"sigil_Date.Range")`.
But then other parts of the system don't quite work; e.g. instead of `iex> h sigil_Date.Range`,
we would currently have to write `iex> h Kernel."sigil_Date.Range"`. For what it's worth, it's not very
different from the `./2` macro [4], which has similar caveats. In any case, as much as I'd personally
like to see `~Date.Range` in particular, I concede we should probably stick to supporting only letters
for now.
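The `unquote` trick mentioned above can be sketched as follows. `DottedSigil` and the `first..last` input format are assumptions for illustration:

```elixir
defmodule DottedSigil do
  # `def sigil_Date.Range(...)` is a syntax error, so the name is unquoted from an atom.
  def unquote(:"sigil_Date.Range")(string, []) do
    [first, last] = String.split(string, "..")
    Date.range(Date.from_iso8601!(first), Date.from_iso8601!(last))
  end
end

# Calling it requires the quoted-atom call syntax.
range = DottedSigil."sigil_Date.Range"("2020-01-01..2020-01-31", [])
```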

It's worth mentioning that sigils need to be manually imported into the current scope (unless they are
already there by default, like the ones in Kernel). Thus, to use `~Decimal`, users would have to write
`import Decimal, only: [sigil_Decimal: 2]`. A convenience like `import Decimal, only: :sigils`
could be added in the future, but it's not the topic of this proposal.
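A sketch of the import requirement with a hypothetical `VersionSigil` library module. The sigil function is imported like any other function and is called here by name, so the example doesn't depend on multi-letter sigil syntax being available:

```elixir
defmodule VersionSigil do
  # Hypothetical library module exposing a multi-letter sigil.
  def sigil_Version(string, []), do: Version.parse!(string)
end

defmodule MyApp.Release do
  # Sigils are resolved like any other imported function.
  import VersionSigil, only: [sigil_Version: 2]

  def current do
    # Under the proposal this could be written as ~Version<1.0.0>.
    sigil_Version("1.0.0", [])
  end
end
```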

Limitations aside, here's a proof-of-concept!
https://github.com/elixir-lang/elixir/compare/master...wojtekmach:wm-long-sigil

- [1] https://groups.google.com/forum/#!topic/elixir-lang-core/C7-QgKKu1Mw
- [2] https://github.com/elixir-lang/elixir/pull/9640#issuecomment-564022856
- [3] https://gist.github.com/wojtekmach/7d4b5dc2f45a4708ce04d19e7c381360
- [4] https://github.com/elixir-lang/elixir/blob/v1.10.1/lib/elixir/lib/kernel/special_forms.ex#L492

José Valim

Feb 15, 2020, 10:11:22 AM
to elixir-l...@googlegroups.com
I believe this is the best proposal on the topic so far. I agree with the trade-offs too (disclaimer: we have talked about those trade-offs privately before). I would suggest two amendments:

1. Limit multi-letter sigils only to uppercase sigils for now
2. Include "only: :sigils" as part of the initial implementation


--
You received this message because you are subscribed to the Google Groups "elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-lang-co...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/E4F9858D-7019-4E2E-A463-9FEDAFA52B0E%40wojtekmach.pl.

Allen Madsen

Feb 15, 2020, 4:04:36 PM
to elixir-l...@googlegroups.com
+1

I think if sigil names with dots are allowed, they should be based on the fully qualified name of the module the sigil is defined in, or its alias. For example:

defmodule Date
  defmacro sigil_Range(range, _flags) do
    #...
  end
end

require Date
alias Date, as: D
~Date.Range<...>
~D.Range<...>

Bruce Tate

Feb 15, 2020, 4:34:01 PM
to elixir-l...@googlegroups.com
I love this proposal. It takes a construct that was formerly limited and opens it up with very little cost in readability. 

-bt



--

Regards,
Bruce Tate
CEO

Fernando Tapia Rico

Feb 16, 2020, 8:57:44 AM
to elixir-lang-core
(Related to Allen's comment) 

Sigils would be independent of aliases, right?

For example, if Decimal provides sigil_Decimal and an alias is defined as alias Decimal, as: D then the sigil would still be used as ~Decimal<...> and not ~D<...>.

José Valim

Feb 16, 2020, 10:07:30 AM
to elixir-l...@googlegroups.com
Fernando, yes. The reason I like this proposal is precisely that it steers multi-letter sigils away from aliases, which we have tried before and which introduced a bunch of separate issues.

Stefan Chrobot

Feb 16, 2020, 3:00:45 PM
to elixir-l...@googlegroups.com
What is the problem that these new sigils attempt to solve? They're just a few characters away from .parse/1. Is the intention here to change the implementation of the inspect protocol, like so:

iex> Version.parse!("0.0.1")
~Version<0.0.1>

If yes, then it looks like a great change. Otherwise I don't see the benefits.

Best,

Stefan


Bruce Tate

Feb 17, 2020, 5:49:19 PM
to elixir-l...@googlegroups.com
I use sigils as sugar for clearly representing concepts that are awkward with regular Elixir syntax. 

A few examples. String escaping: 

html = ~s(<p id="note-quotation-mark">)
-> "<p id=\"note-quotation-mark\">"


The result is a string that can easily include quotes on one line, making string escaping much easier and less error prone.

A list of words: 

iex(21)> ~w[one two three]

-> ["one", "two", "three"]


The result is a list that's much easier to read, especially a long one. 

Or atoms: 

~w[one two three]a

-> [:one, :two, :three]


The result is much less syntax between words, smoothing out the experience of reading a long list of atoms. 


Or regular expressions: 

~r<\\//>

-> ~r/\\\/\//


To me this proposal fits right in. It's not about saving characters; it's about constructs that aid in readability and reduce errors. 

-bt

László Bácsi

Feb 18, 2020, 3:40:01 AM
to elixir-l...@googlegroups.com
I use a sigil in a project with lots of plain SQL queries to mark them for the editor to syntax highlight:

~q"SELECT * FROM users WHERE id = 42"

This still returns the string unchanged, but the editor can switch the highlighting inside the double quotes to SQL, which makes these files easier to work with. Obviously, the queries are more complicated than that, and are often written as heredocs.
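A minimal sketch of such a marker sigil, assuming hypothetical `SQLSigil` and `Queries` module names: `sigil_q/2` simply returns its input, so all the value lies in the editor recognizing `~q`.

```elixir
defmodule SQLSigil do
  # Returns the query unchanged; it exists only so editors can highlight the contents as SQL.
  def sigil_q(string, []), do: string
end

defmodule Queries do
  import SQLSigil, only: [sigil_q: 2]

  def user_by_id, do: ~q"SELECT * FROM users WHERE id = 42"
end
```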

LB
