Formatting of module attributes

73 views
Skip to first unread message

Dario Heinisch

unread,
Dec 28, 2022, 6:59:17 AM12/28/22
to elixir-lang-core
Running `mix format --check-formatted` passes with success on the following code:

defmodule Example do
  @xs [
    1,
    2,
    3,
    4,
    5,
    6,
    7
  ]

  @other [
           1,
           2,
           3,
           4,
           5,
           6,
           7
         ]
         |> Enum.map(&(&1 * 2))
         |> Enum.reject(&(&1 < 5))
         |> length()

  @words ~w(
  ONE
  TWO
  THREE
  FOUR
  FIVE
  )
         |> Enum.filter(&String.contains?(&1, "F"))
end

I am wondering whether that is intended or if I should open an issue on Github and look into fixing it. 

`@words` does not seem to be formatted in the same way as `@other` which I would kinda expect and the formatting of `@words` looks kinda weird.

Secondly is there reason why when I pipe the module attribute that it gets intended differently than when I do not pipe it (compare @other with @xs)? 

Best regards,
Dario

Christopher Keele

unread,
Dec 28, 2022, 5:49:49 PM12/28/22
to elixir-lang-core
This feels like a feature request to me: I understand why sigils are generally not touched by the formatter without plugins, but I feel like the sigil_w included in the standard library should have smarter formatting by default.

NOTE: I'm using Elixir 1.14.2 here to observe the behaviour of the formatter, this may be out-of-date with the mainline branch.


General formatting of sigils

Conceptually, to the compiler, the contents of a sigil is a potentially multi-line string. However, actually using a multiline string does get forced into a format as expected:

@words """

        ONE
        TWO
        THREE
        FOUR
        FIVE
        """
        |> String.split
        |> Enum.filter(&String.contains?(&1, "F"))

Elixir literal multiline strings have special semantics for stripping the whitespace on the left, based on the indentation of the closing """. Try increasing and decreasing the indentation of that lexeme and watch how the formatter reacts.

I'd personally intuitively expect the sigil_w case to do the same, but you can see why we cannot apply multiline string semantics to every sigil—the sigil macro, called at compile-time, receives the verbatim contents of the string, extra whitespace and all. Correctly handling that whitespace, including stripping it, is the job of the sigil itself, which may vary depending on the intentions of the sigil developer. (Consider: a custom sigil for parsing the Whitespace esolang or a python program has different multiline-stripping semantics than literal multiline strings or sigil_w.)

Since the formatter cannot know what whitespace semantics any particular sigil expects, it cannot modify the contents of the string with the knowledge that it will not impact the program, unlike multiline string literals. So it will do absolutely no work on the sigil's contents; leaving your awkward indentation in place. The good news is that if you correct the indentation manually, knowing how this particular sigil handles whitespace, that rewriting will pass formatting and stay unchanged.


Formatting stdlib sigils

That being said, the documentation for extending the formatter is very sigil-special-casing-oriented. It should be easy to implement a plugin that knows how to normalize the currently un-touched

~w(
  ONE
  TWO
  THREE
    FOUR
  FIVE
    )


However, I'd really imagine that the stdlib formatter would understand the special whitespace semantics of the stdlib sigil_w and format it out-of-the-box. This is the feature request I see here.

I also think that other stdlib sigil formatting could be improved; for example I feel like

~D[2022-01-01
]

should automatically be formatted to 

~D[2022-01-01]

without any plugins.


Formatting module attributes

> Is there reason why when I pipe the module attribute that it gets intended differently than when I do not pipe it (compare @other with @xs)? 

I believe this is an emergent behaviour of whatever order the formatter calculates rules for determining:

- the indentation of the module attribute's argument
- the indentation of the pipeline
- the indentation of the (list) argument to the pipeline
- the indentation of list items within the list

These determinations add up in an unexpected way I do not understand. Essentially, Pipelines want to have their indentation flush with the leftmost character of their argument, so that you get:

[1, 2, 3]
|> Enum.map(&(&1 * 2))
|> Enum.reject(&(&1 < 5))
|> length()

[
  1,
  2,
  3

]
|> Enum.map(&(&1 * 2))
|> Enum.reject(&(&1 < 5))
|> length()


Somehow this interacts with how module attributes want to indent things, and we get

@nums [
        1,
        2,
        3

      ]
      |> Enum.map(&(&1 * 2))
      |> Enum.reject(&(&1 < 5))
      |> length()


This does not seem like a bug per se, but I also personally think that this should format as

@nums [
  1,
  2,
  3

]
|> Enum.map(&(&1 * 2))
|> Enum.reject(&(&1 < 5))
|> length()


This seems like it would be a backwards-compatible enhancement.


Summary

The combination of sigil_w not being internally normalized with sigil_w whitespace semantics, alongside the current behaviour of multi-line expressions in module attributes, leads to this particularly unexpected appearance.

I feel like improvements to both would be welcome in PRs. It may be worth first discussing the impact of releasing changes to the formatter, though. Even semantically backwards-compatible changes have the potential to lead to a lot of syntactic line diff noise and churn when upgrading Elixir, so I'm not certain if there is a more cautious release policy for such things—such as only releasing major formatter changes in minor version bumps.

Christopher Keele

unread,
Dec 28, 2022, 8:13:30 PM12/28/22
to elixir-lang-core
After a little tinkering, it seems like the major obstacle in implementing this via a custom formatter module is that sigil formatters are provided the file and line number, but not the column number, of each sigil literal's start; so there's no great way for any multiline sigil formatter to reason about how it should format its contents with respect to indentation.

José Valim

unread,
Dec 28, 2022, 9:01:22 PM12/28/22
to elixir-l...@googlegroups.com
A pipeline is always indented according to the argument it applies to. Therefore, if it was indented like this:

@nums [
  1,
  2,
  3
]
|> Enum.map(&(&1 * 2))
|> Enum.reject(&(&1 < 5))
|> length()

I would expect it to apply to the result of "@nums [...]".

In general, Elixir code would be formatted as:

@nums [
        1,
        2,
        3
      ]

And as:

nums =
  [
    1,
    2,
    3
  ]

But we added a rule that, if the right side of operators or the last argument of a function call is a list (or a map, or a fn, etc), we can skip the newline and the indentation. It is a good rule for most cases, but it leads to two different formats in cases you have a pipeline and so on.

Regareding sigils, Elixir could add rules to format its own sigils, but they would have to be opt-in. First of all, someone can override the built-in sigils and the formatter wouldn't know. And even if it is the same sigil, the Elixir formatter does not change the AST by default.

Christopher Keele

unread,
Dec 29, 2022, 1:54:02 AM12/29/22
to elixir-lang-core
> the Elixir formatter does not change the AST by default

Ah, this makes sense. The whitespace in a sigil is part of the AST (unlike, say, a list literal) so it must stay put.

It feels like that this soft guarantee is not exactly upheld by the formatter's treatment of heredoc strings, which it does rewrite; but on the other hand the compiler does emit warnings about misdented heredoc lines, and other subtleties, so that kind of makes sense.

> Elixir could add rules to format its own sigils, but they would have to be opt-in.

In that case, it feels effectively equivalent to just publish some stdlib sigil formatters outside core as a library, then "opt in" by adding to one's deps and .formatter.exs plugins list. I might take a whack at that, if I can put together a PR to mix that passes column metadata about a sigil into a custom formatter's opts (alongside file and line no) as I think this is required for them to take indentation into consideration.

> A pipeline is always indented according to the argument it applies to.

> But we added a rule that, if the right side of operators or the last argument of a function call is a list (or a map, or a fn, etc), we can skip the newline and the indentation.

I do view the result of applying these two rules to module attributes pipelines as unfortunate, but internally consistent... 

> but it leads to two different formats in cases you have a pipeline and so on.

Are there other common cases where this happens, off the top of your head? I might investigate what it'd take to special-case them, if there is a sensible way to do so for reasonable gain.

Christopher Keele

unread,
Dec 29, 2022, 2:27:11 AM12/29/22
to elixir-lang-core
> I might take a whack at that, if I can put together a PR to mix that passes column metadata about a sigil into a custom formatter's opts (alongside file and line no) as I think this is required for them to take indentation into consideration.

At first glance, this looks like this is tantamount to tracking column metadata in all Elixir AST nodes, which is well beyond me. Unless I stumble into something clever I probably won't get too far. The default formatter has the luxury of passing the entire document to Code.format_string, which can leverage Inspect.Algebra.format to be context-aware about indentation when rendering the AST. The "sigil-specific" custom formatters only get to operate on the sigil string itself, and the options passed to them come from the entire formatter run's configuration + the sigil AST node's metadata, neither of which currently know about columns.

Reply all
Reply to author
Forward
0 new messages