This is a formalization of my concept here, as a first-class proposal for explicit discussion/feedback, since I now have a working prototype.
Goal
The aim of this proposal is to support a commonly-requested feature: short-hand construction and pattern matching of key/value pairs of associative data structures, based on variable names in the current scope.
Context
Similar shorthand syntax sugar exists in many programming languages today, known variously as:
This feature has been in discussion for a decade, on this mailing list (1, 2, 3, 4, 5, 6) and the Elixir forum (1, 2, 3, 4, 5, 6), and has motivated many libraries (1, 2, 3, 4). These narrow margins cannot fit the full history of possibilities, proposals, and problems with this feature, and I will not attempt to summarize them all. For context, I suggest reading this mailing list proposal and this community discussion in particular.
However, in summary, this particular proposal tries to solve a couple of past sticking points:
I propose we overload the unary capture operator (&) to accept compile-time atoms and strings as arguments, for example &:foo and &"bar". This would expand at compile time into a tagged tuple with the atom/string and a variable reference. For now, I am calling this a "tagged-variable capture" to differentiate it from a function capture.
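For the purposes of this proposal, assume that foo and bar are already bound in the current scope (the results shown below correspond to foo = 1 and bar = 2).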
Additionally,
- Lines beginning with # == indicate what the compiler expands an expression to.
- Lines beginning with # => represent the result of evaluating that expression.
- Lines beginning with # !> represent an exception.
Bare Captures
I'm not sure if we should support bare tagged-variable capture, but it is illustrative for this proposal, so I left it in my prototype. It would look like:
&:foo
# == {:foo, foo}
# => {:foo, 1}
&"foo"
# == {"foo", foo}
# => {"foo", 1}
If bare usage is supported, this expansion would work as expected in match and guard contexts as well, since it expands before variable references are resolved:
{:foo, baz} = &:foo
# == {:foo, baz} = {:foo, foo}
# => {:foo, 1}
baz
# => 1
List Captures
Since capture expressions are allowed in lists, this can be used to construct Keyword lists from the local variable scope elegantly:
list = [&:foo, &:bar]
# == list = [{:foo, foo}, {:bar, bar}]
# => [foo: 1, bar: 2]
This would work with other list operators like |:
baz = 3
list = [&:baz | list]
# == list = [{:baz, baz} | list]
# => [baz: 3, foo: 1, bar: 2]
And list destructuring:
{foo, bar, baz} = {nil, nil, nil}
[&:baz, &:foo, &:bar] = list
# == [{:baz, baz}, {:foo, foo}, {:bar, bar}] = list
# => [baz: 3, foo: 1, bar: 2]
{foo, bar, baz}
# => {1, 2, 3}
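For anyone who wants to try the list behaviour on a released Elixir, the same rewrite can be approximated with an ordinary macro. This is a minimal sketch and not the prototype: `TaggedCapture`, `tag/1`, and `TaggedCaptureDemo` are hypothetical names chosen for illustration. It covers only bare and list usage, since the current parser rejects a bare element inside `%{}`.

```elixir
defmodule TaggedCapture do
  # Hypothetical stand-in for the proposed &:foo / &"foo" syntax.
  # Expands tag(:foo) into {:foo, foo}, where foo is the caller's variable.
  defmacro tag(key) when is_atom(key) do
    {key, Macro.var(key, nil)}
  end

  # Expands tag("foo") into {"foo", foo}.
  defmacro tag(key) when is_binary(key) do
    {key, Macro.var(String.to_atom(key), nil)}
  end
end

defmodule TaggedCaptureDemo do
  import TaggedCapture

  # Construction: builds tagged tuples from variables in scope.
  def build do
    foo = 1
    bar = 2
    [tag(:foo), tag("bar")]
  end

  # Destructuring: the macro expands in match context and binds foo and bar.
  def destructure(list) do
    [tag(:foo), tag(:bar)] = list
    {foo, bar}
  end
end

IO.inspect(TaggedCaptureDemo.build())
IO.inspect(TaggedCaptureDemo.destructure(foo: 1, bar: 2))
```

The macro is used from a second module because a macro defined at the top level of a file cannot be invoked at the top level of that same file.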
Map Captures
With a small change to the parser, we can allow this expression inside map literals. Because each such expression is expanded into a tagged tuple before the map's association list as a whole is processed, this allows the syntax to work in all existing map/struct constructs, like map construction:
map = %{&:foo, &"bar"}
# == %{:foo => foo, "bar" => bar}
# => %{:foo => 1, "bar" => 2}
Map updates:
foo = 3
map = %{map | &:foo}
# == %{map | :foo => foo}
# => %{:foo => 3, "bar" => 2}
And map destructuring:
{foo, bar} = {nil, nil}
%{&:foo, &"bar"} = map
# == %{:foo => foo, "bar" => bar} = map
# => %{:foo => 3, "bar" => 2}
{foo, bar}
# => {3, 2}
Considerations
Pro: solves existing pain points
As mentioned, this solves flaws previous proposals suffer from:
- Atom vs String key support
  This supports both.
- Visual clarity that atom/string matching is occurring
  This leverages the appropriate literal in question within the syntax sugar.
- Limitations of string-based sigil parsing
  This is compiler-expansion-native.
- Easy confusion with tuples
  %{&:foo, &"bar"} is very different from {foo, bar}, instead of 1-character different.
Additionally, it solves my main complaint with historical proposals: syntax to combine a variable identifier with a literal must either obscure that we are building an identifier, or obscure the key/string typing of the literal.
I'm proposing overloading the capture operator rather than introducing a new operator because the capture operator already has a semantic association with messing with variable scope, via the nested integer-based positional function argument syntax (ex & &1).
By using the capture operator we indicate that we are messing with an identifier in scope, but via a literal atom/string we want to associate with, to get the best of both worlds.
Pro: works with existing code
The capture operator today has well-defined compile-time-error semantics if you try to pass it an atom or a string. All compiling Elixir code today will continue to compile as before.
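The claim can be checked on a current Elixir release with a small script. The sketch below assumes that the rejection surfaces as a `CompileError` from `Code.eval_string/1`; `compiles?` is a helper name introduced here for illustration.

```elixir
# The parser already accepts an atom or string operand after `&`;
# only the later expansion step rejects it.
{:ok, {:&, _, [:foo]}} = Code.string_to_quoted("&:foo")
{:ok, {:&, _, ["bar"]}} = Code.string_to_quoted("&\"bar\"")

# Returns true if the source compiles and evaluates, false on a CompileError.
compiles? = fn source ->
  try do
    Code.eval_string(source)
    true
  rescue
    CompileError -> false
  end
end

# Two tagged-variable captures and one ordinary function capture.
results = Enum.map(["&:foo", "&\"bar\"", "&(&1 + 1)"], compiles?)
IO.inspect(results)
```

Since both forms are rejected today, no existing program can depend on them, which is what leaves the syntax free for a new meaning.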
Pro: works with existing tooling
By overloading an existing operator, this approach works seamlessly for me with the syntax highlighters I have tried it with so far, and reasonably well with the formatter.
In my experimentation I've found that the formatter wants to rewrite &:baz to (&:baz) pretty often. That's good, because there are several edge cases in my prototype where not doing so causes it to behave strangely; I'm sure it's resolving ambiguities that would occur in function captures that impact my proposal in ways I have yet to fully anticipate.
Pro: minimizes surface area of the language
By overloading the capture operator instead of introducing a new operator or sigil, we are able to keep the surface area of this feature slim.
Cons: overloads the capture operator
Of course, many of the virtues of this proposal come from overloading the capture operator. But it is an already semantically fraught syntactic sugar construct that causes confusion to newcomers, and this would place more strain on it.
We would need to augment it with more than the meager error message modification in my prototype, add documentation, and anticipate a new wave of questions from the community upon release.
This inelegance really shows when considering embedding a tagged variable capture inside an anonymous function capture, ex & &1 = &:foo. In my prototype I've chosen to allow this rather than error on "nested captures not allowed" (would probably become: "nested function captures not allowed"), but I'm not sure I found all the edge-cases of mixing them in all possible constructions.
Additionally, since my proposal now allows the capture operator as an associative element inside map literal parsing, the error reported when a function capture is provided as an associative element would be generated during expansion rather than during parsing. I am not fluent enough in leex to have updated the parser to preserve the exact old error. Serendipitously, what my prototype reports today is pretty good regardless, though I prefer the old behaviour:
Old:
%{& &1}
# !> ** (SyntaxError) syntax error before '}'
# !> |
# !> 1 | %{& &1}
# !> | ^
New:
%{& &1}
# !> error: expected key-value pairs in a map, got: & &1
# !> ** (CompileError) cannot compile code (errors have been logged)
Cons: here there be dragons I cannot see
I'm quite sure a full implementation would require a lot more knowledge of the compiler than I am able to provide. For example, &:foo = &:foo raises an exception, whereas (&:foo) = &:foo behaves as expected. I also find the variable/context/binding environment implementation in the Erlang part of the compiler during expansion to be impenetrable, and I'm sure my prototype fails on edge cases there.
Open Question: the pin operator
As this feature constructs a variable ref for you, it is not clear if/how we should support attempts to pin the generated variable to avoid new bindings. In my prototype, I have tried to support the pin operator via the &^:atom syntax, though I'm pretty sure it's super buggy on bare out-of-data-structure cases and I only got it far enough to work in function heads for basic function head map pattern matching.
Open Question: charlists
I did not add support for charlist tagged variable captures in my prototype, as it would be more involved to differentiate a capture of a list meant to become a tagged tuple from a list representing the AST of a function capture. I would not lose a lot of sleep over this.
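The ambiguity is visible in the quoted form. As a rough illustration, assuming single-quoted charlist literals are what would be captured, a charlist operand and a list-of-integers operand produce the same AST, so the expander has nothing to tell them apart by:

```elixir
# A single-quoted charlist is just a list of code points once parsed.
charlist_ast = Code.string_to_quoted!("&'foo'")
list_ast = Code.string_to_quoted!("&[102, 111, 111]")

IO.inspect(charlist_ast)
IO.inspect(charlist_ast == list_ast)
```

Atoms and strings do not have this problem, because neither is otherwise a valid operand for a function capture.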
Open Question: allowed contexts
Would we even want to allow this syntax construct outside of map literals? Or list literals?
I can certainly see people abusing the bare-outside-of-associative-datastructure syntax to make some nigh-impenetrable code where it's really unclear where assignment and pattern matching are occurring, and relatedly this is where I see a lot of odd edge-case behaviour in my prototype. I allowed it to speed up the implementation, but it merits more discussion.
On the other hand, this does seem like an... interesting use-case:
error = "rate limit exceeded"
&:error # return error tuple
Thanks for reading! What do you think?