[llvm-dev] [RFC] Adding support for clang-format making further code modifying changes

281 views
Skip to first unread message

MyDeveloper Day via llvm-dev

unread,
Aug 9, 2021, 3:36:36 PM8/9/21
to clang developer list, llvm-dev
Hi all,

As a frequent user and maintainer of clang-format I would like to formalize a change to the "charter" of what clang-format can/could/should do

Motivation
==========

As you all know clang-format parses source code and can manipulate the tokens to produce consistently formatted code (most of the time), its become ubiquitous in the industry and its integration into
popular editors & IDEs such as vim/visual studio/code mean it very simple for users to get setup and running producing good looking code.

clang-format does not use semantic information, and as such it doesn't need includes, defines or compiler flags to interpret code. Clang-format is generally guessing that certain sequences of tokens from the lexer represent certain patterns. It's a good guess but it gets it wrong from time to time, hence trading correctness for performance.
 
Because of this clang-format is fast (not maybe as fast as we'd like) but fast enough to be part of in a "save" operation such that the code is formatted as the ide saves your work.

Mostly clang-format manipulates only whitespace, but over the years there have been a number of extremely useful features that have broken this rule, namely include sorting, namespace commenting to name a few.

The usage scenario of clang-format has also changed slightly to include a non modifying advisory role identifying clang-format violations (as in our own llvm-premerge checks), which can greatly aid the code review process by removing the need to constantly ask for the code to be formatted correctly or follow the LLVM convention.

Recently a number of new features have been proposed that would also alter code, insertion of braces, east/west const conversion that can be performed at "save" speeds.

As no semantic information is present, this raises the question as to whether clang-format could actually break your code.
This has actually always been the case especially since the introduction of include sorting, but also we all know that clang-format can break your code from the visual perspective too and hence the need for // clang-format off/on

In the most part include sorting not only might break your code noisily such that it won't compile, but it can also break it silently,
and because IncludeSorting is on by default this breakage could potentially go unnoticed.

https://stackoverflow.com/questions/37927553/can-clang-format-break-my-code
https://travisdowns.github.io/blog/2019/11/19/toupper.html

I don't think it can be in any doubt that IncludeSorting is a nice feature used by 100,000's of developers without too many issues, but there is some suggestion that its inclusion as "on by default" was potentially a mistake.

Proposals for other new features that modify the code in a similar way are getting some push back for changing the "charter" of clang-format when it could be considered to have already changed.
This is causing friction on the review of some features and so it was suggested to present an RFC to gain wider consensus on the concept of clang-format changing code.

Mostly when a code modifying change is submitted the view is that this isn't clang-formats job but more clang-tidy, however clang-tidy requires full access to all include files and compiler definitions and only works on the preprocessor paths of the code you are analyzing for and its speed and hence its frequency of use is drastically reduced.

Some clang-format based modifications can in theory be made with a relatively high level of confidence without paying the price and the configuration complexity of getting all the
semantic information. https://reviews.llvm.org/D105701.
There is potentially for clang-format to introduce breaking changes and whilst this could in theory cause noisy breakages they could also in theory produce silent ones.

These new features want to be run at "reformat" speeds & frequency and benefit from the rich Ecosystem of inclusion and integration in IDEs and editors that clang-format enjoys.

This RFC is to try to gain some consensus as to what clang-format can do and what the conditions/constraints should be if allowed to do so.

Benefits
========

The benefits are that clang-format can be used to further make code modifications to adhere to a coding convention (insertion/removal of braces),
clang-format could be used to validate and correct coding convention (left/right const),  and could be used to remove unnecessary semicolons or
potentially convert code to trailing return types all of which could be performed at "reformat" speeds.

Whilst some of these capabilities are available in clang-tidy, it requires significant infrastructure to often perform these often relatively simple operations and it's unlikely
that all users of clang-format are set up to perform these actions in clang-tidy.

There are likely a number of clang-tidy modifications that could in theory be made at "reformat" speeds with such an approach. But there really needs some agreement that it's OK for clang-format to modify the code.

Allowing these kinds of modification capabilities could lead to a new set of "Resharper" style capabilities being added to clang-format,
capable of bringing source code quickly into line with coding conventions.

Concerns
========

Correctness is King, the concern is your formatting tool should not perform operations that could break your code. (this is already the case)

It's perhaps not clang-format's job to do this.

I should personally declare myself as being in favor of allowing clang-format to modify code, I think it only fair that I let others reply to the RFC with their own concerns.

Constraints
===========

To minimize the impact to existing users, We suggest that a number of constraints be generally considered good practice when submitting reviews for clang-format with modifying changes

1) Code Modifying Features should always be off by default
The user should have to make a positive affirmative action to use such a feature

2) Code Modifying Features configuration options should be highlighted as such in the ClangFormatStyleOptions.rst such that its clear these are potentially code breaking options

3) Existing "Code Modifying Features" do not have to adhere to 1) but the documentation should be updated to adhere to 2)

4) Code Modifying Features should be conservative to be "correct first" before being "complete".
i.e. If it's possible a change could be ambiguous it should tend towards not making the incorrect change at all rather than forcing an incorrect change. (This could cause some
    cases to be missed)

Request
=======

I would like to get some general agreement that it's OK for future reviews of code modification changes to become part of clang-format (as they are in IncludeSorting) assuming the best practices are
followed to protect users from unwanted changes.

Feedback would be appreciated

MyDeveloperDay












Keane, Erich via llvm-dev

unread,
Aug 9, 2021, 3:47:03 PM8/9/21
to MyDeveloper Day, llvm-dev

My thoughts (as one of the people who pushed for this RFC) are that we have a few options:

 

  1. Clang Format is NEVER allowed to break code.  This is already violated with the reordering of includes.
  2. Clang Format should be able to break code at will, and we should publish this fact.
  3. Clang Format should strive to not break code, but can make exceptions for good features when the examples are quite contrived, and the patch makes an as-good-as-reasonable attempt to avoid those breakages.

 

 

I personally LIKE #1 in theory, but the ship has sailed, and it is at the expense of GOOD formatting options.

 

I personally HATE #2, I think it makes the tool unusable as a formatting tool.

 

I personally quite like #3, and believe it should be the policy going forward (which I believe is the RIGHT outcome of this RFC).  I also believe the patch for left/right const already falls into this category, and thus would be generally acceptable to me (though, I think the include-reordering distinctly does NOT).

 

-Erich

David Blaikie via llvm-dev

unread,
Aug 9, 2021, 3:55:12 PM8/9/21
to MyDeveloper Day, Manuel Klimek, Daniel Jasper, llvm-dev, clang developer list
+some of the folks involved in the early development of clang-format who might have more context on those design principles/choices made back then (& perhaps on the include reordering change/tradeoff/choices made there)

Anyone got a link to the include reordering change/design discussion for it? Might be informative to see if there was precedent/philosophy clearly stated about how that was acceptable and what principles were applied & could be reapplied to more recent proposals.

I tend towards the status quo (in general - that's just my personality) and tend to prefer the fairly firm line clang-format currently has. But I don't have a great rationale for "why include reordering but not const reordering or dropping braces, etc" - marginally: Changing C++ syntax is more complicated than reordering headers. Despite the fact that reordering headers can have a big impact/breakage, it's hopefully fairly clear if/when that happens that headers should be made order-agnostic. The subtleties of what might happen when adding/removing braces if the heuristic quick-parsing of clang-format fail seem likely to be more unfortunate, to me at least. But maybe we're at a point where clang-format is widely adopted/tested enough that that's less likely to have subtle mis-parse breakages for terribly long? Not sure.

- Dave

_______________________________________________
cfe-dev mailing list
cfe...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Chris Tetreault via llvm-dev

unread,
Aug 9, 2021, 4:13:27 PM8/9/21
to Keane, Erich, MyDeveloper Day, llvm...@lists.llvm.org

Regarding the 3 possibilities Erich has listed below, my personal opinion here is that option 3 is incredibly dangerous. If the tool is forbidden to break code, then you can be confident that it doesn’t break code (barring bugs). If the tool  is allowed to change code such that it potentially breaks code, then at least you know what you’re getting yourself in for. If the tool “almost never” breaks code, then it can lull you into a false sense of security. Unless the tool screams from the rooftops that it might break your code, it’s really easy to assume that it is in fact providing the “does not break code” guarantee. Even if it’s documented, people will make this assumption (just like people constantly add UB to c++ codebases). In practice, you really want to provide 1 or 3 regardless, because it would be nice if the tool didn’t go out of it’s way to break your code, but it shouldn’t advertise that it is 3.

 

I think  that 1 should be the default. It should be documented that, by default, clang-format will not break your code, and if it does that it is a bug. Then, options can be enabled that potentially break code. In effect, it would be Erich’s option 3, but with the caveat that if you do not opt into potentially breaking options, you have option 1. Additionally, `BasedOnStyle` and similar options that turn several knobs at once should also be guaranteed to be non-breaking. Furthermore, we should turn off existing potentially-breaking settings such as include sorting by default. While this may result in clang-format doing something different between versions with the same config, that ship has also sailed, so I think that’s fine. In the documentation for options, it should clearly state if an option has the potential to break your code.

 

Thanks,

   Christopher Tetreault

 

From: llvm-dev <llvm-dev...@lists.llvm.org> On Behalf Of Keane, Erich via llvm-dev
Sent: Monday, August 9, 2021 12:46 PM
To: MyDeveloper Day <mydevel...@gmail.com>; llvm-dev <llvm...@lists.llvm.org>
Subject: Re: [llvm-dev] [cfe-dev] [RFC] Adding support for clang-format making further code modifying changes

 

WARNING: This email originated from outside of Qualcomm. Please be wary of any links or attachments, and do not enable macros.

Aaron Ballman via llvm-dev

unread,
Aug 9, 2021, 4:14:19 PM8/9/21
to David Blaikie, llvm-dev, clang developer list
On Mon, Aug 9, 2021 at 3:55 PM David Blaikie via cfe-dev
<cfe...@lists.llvm.org> wrote:
>
> +some of the folks involved in the early development of clang-format who might have more context on those design principles/choices made back then (& perhaps on the include reordering change/tradeoff/choices made there)
>
> Anyone got a link to the include reordering change/design discussion for it? Might be informative to see if there was precedent/philosophy clearly stated about how that was acceptable and what principles were applied & could be reapplied to more recent proposals.

Commit:
https://github.com/llvm/llvm-project/commit/d89ae9d3accb4cb8b7357932c3dfba8f4ae383c4

Patch discussion:
https://reviews.llvm.org/D11240

It doesn't look like there was any explicit discussion about the
potential for breaking code (but maybe I missed some, and maybe some
discussion happened in a different thread I've not yet found).

~Aaron

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

David Blaikie via llvm-dev

unread,
Aug 9, 2021, 4:16:51 PM8/9/21
to Aaron Ballman, llvm-dev, clang developer list
On Mon, Aug 9, 2021 at 1:13 PM Aaron Ballman <aa...@aaronballman.com> wrote:
On Mon, Aug 9, 2021 at 3:55 PM David Blaikie via cfe-dev
<cfe...@lists.llvm.org> wrote:
>
> +some of the folks involved in the early development of clang-format who might have more context on those design principles/choices made back then (& perhaps on the include reordering change/tradeoff/choices made there)
>
> Anyone got a link to the include reordering change/design discussion for it? Might be informative to see if there was precedent/philosophy clearly stated about how that was acceptable and what principles were applied & could be reapplied to more recent proposals.

Commit:
https://github.com/llvm/llvm-project/commit/d89ae9d3accb4cb8b7357932c3dfba8f4ae383c4

Patch discussion:
https://reviews.llvm.org/D11240

It doesn't look like there was any explicit discussion about the
potential for breaking code (but maybe I missed some, and maybe some
discussion happened in a different thread I've not yet found).

Thanks for looking! Yeah, maybe it's so old that the dev was so small that those issues might've been discussed in an offline discussion in person between Manuel and Daniel. Ah well.
 

David Blaikie via llvm-dev

unread,
Aug 9, 2021, 4:53:57 PM8/9/21
to Andrew Tomazos, llvm-dev, clang developer list
On Mon, Aug 9, 2021 at 1:44 PM Andrew Tomazos via cfe-dev <cfe...@lists.llvm.org> wrote:
I'd just offer a few thoughts:

- Whether or not to adopt clang-format for a project is already quite controversial in some places.  I think if we made it so that it also made more extensive non-whitespace changes, I think it would reduce its adoption, even if they were off by default.

Yeah? I wouldn't've figured that. 

- Non-whitespace changes are very dangerous.  A bug in clang-format could more easily create bugs in the code it formats (at least that is the perception).  For some risk-averse projects, this would be an unacceptable risk.

- Perhaps an alternative idea would be to create a new program called clang-reshape or something, that shared a common underlying framework with clang-format, but clang-reshape contains the options for both the non-whitespace changes and whitespace-changes, whereas clang-format only has the whitespace changes.  That way folks could adopt clang-format and ban clang-reshape if they only wanted whitespace changes.

This crossed my mind too - but I expect a lot of the clang-format integration may be more invasive than that. (hardcoded binary names, use of clang-format as a library, etc) - adding new features to the existing adopted clang-format may be especially valuable compared to providing a new tool. (that said, I do also feel like grouping features in an existing tool only because of its existing adoption is super great either)

- Dave
 

HTH. -Andrew


_______________________________________________
cfe-dev mailing list
cfe...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Nicolai Hähnle via llvm-dev

unread,
Aug 9, 2021, 5:06:52 PM8/9/21
to MyDeveloper Day, llvm-dev, clang developer list
Aren't there two separate discussions here:

1. What is the set of features in clang-format?

2. What is the subset of those features used in the LLVM (and other "major") coding styles?

For me personally, the answers to those questions are quite different.

For (2), I feel that static analysis tools which aim for broad adoption must keep false positives to a minimum, otherwise the tool will just end up being ignored. clang-format is arguably a static analysis tool, albeit a very simple-minded one. So I wouldn't want to run the risk of false positives in the default LLVM style.

For (1), if some users really want to have such features for their own, non-LLVM coding style, then I'd say it's perfectly fine for clang-format to include those features as long as the majority of users aren't affected directly or indirectly. Indirect effects could be caused e.g. by losing some performance ("save speed" is a valuable feature) or by making clang-format harder to maintain.

Cheers,
Nicolai
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


--
Lerne, wie die Welt wirklich ist,
aber vergiss niemals, wie sie sein sollte.

Andrew Tomazos via llvm-dev

unread,
Aug 9, 2021, 6:21:09 PM8/9/21
to MyDeveloper Day, llvm-dev, clang developer list
I'd just offer a few thoughts:

- Whether or not to adopt clang-format for a project is already quite controversial in some places.  I think if we made it so that it also made more extensive non-whitespace changes, I think it would reduce its adoption, even if they were off by default.

- Non-whitespace changes are very dangerous.  A bug in clang-format could more easily create bugs in the code it formats (at least that is the perception).  For some risk-averse projects, this would be an unacceptable risk.

- Perhaps an alternative idea would be to create a new program called clang-reshape or something, that shared a common underlying framework with clang-format, but clang-reshape contains the options for both the non-whitespace changes and whitespace-changes, whereas clang-format only has the whitespace changes.  That way folks could adopt clang-format and ban clang-reshape if they only wanted whitespace changes.

HTH. -Andrew


Sam McCall via llvm-dev

unread,
Aug 9, 2021, 6:23:59 PM8/9/21
to MyDeveloper Day, llvm-dev, clang developer list
I'm cautiously +1 on const reordering, having previously opposed it and been convinced.
I think anyone who's worked on a large shared codebase both before and after clang-format can understand the value here, so I'll focus mostly on the risks and why I think they're acceptable.

Risk: clang-format will become a grab-bag of features with no clear line - just anything implemented on top of its pseudo-AST.
Clang-format's brand is low-level formatting details and I think it's important to preserve this. Const order fits here in users' minds. (So does brace addition/removal).

Risk: The feature will break code and clang-format will no longer be (seen as) reliable. This can make it harder socially or technically to deploy, and cause real damage.
I think we need to work hard on mitigating this:
 - the feature needs careful design and extra scrutiny, like security-critical code
 - it should be clearly and temporarily marked as experimental, with opt-in required
 - it should be clearly and permanently marked as "makes assumptions about coding style", with opt-in required.
 - bugs need to be thoughtfully addressed
From what I can see MyDeveloperDay is serious about doing all of this.

Risk: clang-format will be overtaken by the complexity of such features, which will outweigh the benefits (if few use them), hurting maintenance, causing bugs etc.
However this isn't different from other optional features. Editing tokens tends to be done as a separate pass which is relatively easy to isolate (compared to something like supporting a new language). With complexity isolated, this is mostly just about how maintainers prioritize their time/attention, which must be left up to them.

Regarding include-ordering: I think this is a valuable feature if you follow a coding style that allows it to be correct, and it fits well in clang-format's brand. However it wasn't clearly labeled to emphasize its caveats, and in hindsight it shouldn't have been made part of the Google style without further opt-in required.

Cheers, Sam

David Blaikie via llvm-dev

unread,
Aug 9, 2021, 6:34:29 PM8/9/21
to Sam McCall, llvm-dev, clang developer list

Thanks for the write up! & for the record I'm pretty comfortable with what Sam's said here. (not that I've got strong weight in clang-format development, but to make sure my other comments on the thread aren't treated as standing criticisms/concerns)

- Dave

MyDeveloper Day via llvm-dev

unread,
Aug 10, 2021, 4:32:44 AM8/10/21
to Sam McCall, llvm-dev, clang developer list
Thanks for the response Sam, 

Here is how I see we mitigate the risk:

On Mon, Aug 9, 2021 at 11:23 PM Sam McCall <samm...@google.com> wrote:
I'm cautiously +1 on const reordering, having previously opposed it and been convinced.
I think anyone who's worked on a large shared codebase both before and after clang-format can understand the value here, so I'll focus mostly on the risks and why I think they're acceptable.

Risk: clang-format will become a grab-bag of features with no clear line - just anything implemented on top of its pseudo-AST.
Clang-format's brand is low-level formatting details and I think it's important to preserve this. Const order fits here in users' minds. (So does brace addition/removal).

I doubt we wouldn't continue to apply the same level of scrutiny on the code reviews and expect them to follow best practices and guidelines, I am expecting us to still be quite circumspect as to what we'd consider.

To be honest clang-format I think already runs at quite a high review rejection rate, people ask for all sorts of things and we do try to push back pretty hard, landing something can sometimes be pretty torturous to get through review,
I'm not expecting that to change.
 

Risk: The feature will break code and clang-format will no longer be (seen as) reliable. This can make it harder socially or technically to deploy, and cause real damage.
I think we need to work hard on mitigating this:
 - the feature needs careful design and extra scrutiny, like security-critical code
 - it should be clearly and temporarily marked as experimental, with opt-in required
 - it should be clearly and permanently marked as "makes assumptions about coding style", with opt-in required.
 - bugs need to be thoughtfully addressed
From what I can see MyDeveloperDay is serious about doing all of this.

I am, I also think that we shouldn't plough on with individual changes if we see them as potentially ambiguous, I would rather ignore a change if in doubt, I don't feel such features need to be 100% catch all (like how sometimes clang doesn't always tell you about all missing overrides, just as it can rationalize them), This may require more specific options to ensure we know what an tok::identifier actually is in order to avoid ambiguities caused by macros (a little like StatementMacros)
 

Risk: clang-format will be overtaken by the complexity of such features, which will outweigh the benefits (if few use them), hurting maintenance, causing bugs etc.
However this isn't different from other optional features. Editing tokens tends to be done as a separate pass which is relatively easy to isolate (compared to something like supporting a new language). With complexity isolated, this is mostly just about how maintainers prioritize their time/attention, which must be left up to them.

To be honest these are likely some of the less invasive features (in comparison to say adding something like adding Whitesmiths style or C#), as you say the "Passes" give us an easy mechanisms to handle the "OptIn" without adding "if (...) everywhere and the passes also tend to be very self contained especially as the Formatting itself is just a Pass in its own right which is performed later.

I have no concerns over the maintenance other than ensuring we understand how new passes actually work, but the compartmentalization feels on a par to  compartmentalization of individual clang-tidy checks.
 

Regarding include-ordering: I think this is a valuable feature if you follow a coding style that allows it to be correct, and it fits well in clang-format's brand. However it wasn't clearly labeled to emphasize its caveats, and in hindsight it shouldn't have been made part of the Google style without further opt-in required.


To be honest as a developer I like the brutality of include-ordering, breaking my code only tells me it isn't robust enough (likely missing forward declarations or not including what its using) 

The handling of defaults is always difficult as some people want things and others don't, (hence the need for the RFC), but I've always been clear this needs to be "Opt-In" from the start. For the majority of  developers I would expect them to continue to use clang-format as a code formatter and nothing else, but having a ability to make some (not all) obvious changes could potentially be a great help to improving code

For example how many times do you see in LLVM the review comment that  says "elide the braces" for 

if (x) {
    return;
}

I feel this is something that clang-format could be made to easily handle. This RFC is about gaining a general consensus to let us try. We feel we can add even more value.

Anyone who knows me, knows I'm very much pro "clang-format all the things"

MyDeveloperDay




 

Björn Schäpers via llvm-dev

unread,
Aug 10, 2021, 5:52:31 AM8/10/21
to MyDeveloper Day, Sam McCall, llvm-dev, clang developer list
I'm all in favor of allowing such changes and will help to create and review
these changes.

Kind regards,
Björn (HazardyKnusperkeks).

Am 10.08.2021 um 10:32 schrieb MyDeveloper Day via cfe-dev:
> Thanks for the response Sam,
>
> Here is how I see we mitigate the risk:
>
> On Mon, Aug 9, 2021 at 11:23 PM Sam McCall <samm...@google.com
> <mailto:samm...@google.com>> wrote:
>
> I'm cautiously +1 on const reordering, having previously opposed it and been
> convinced.
> I think anyone who's worked on a large shared codebase both before and after
> clang-format can understand the value here, so I'll focus mostly on the
> risks and why I think they're acceptable.
>

> *Risk: *clang-format will become a grab-bag of features with no clear line -


> just anything implemented on top of its pseudo-AST.
> Clang-format's brand is low-level formatting details and I think it's
> important to preserve this. Const order fits here in users' minds. (So does
> brace addition/removal).
>
>
> I doubt we wouldn't continue to apply the same level of scrutiny on the code
> reviews and expect them to follow best practices and guidelines, I am expecting
> us to still be quite circumspect as to what we'd consider.
>
> To be honest clang-format I think already runs at quite a high review rejection
> rate, people ask for all sorts of things and we do try to push back pretty hard,
> landing something can sometimes be pretty torturous to get through review,
> I'm not expecting that to change.
>
>

> *Risk*: The feature will break code and clang-format will no longer be (seen


> as) reliable. This can make it harder socially or technically to deploy, and
> cause real damage.
> I think we need to work hard on mitigating this:
>  - the feature needs careful design and extra scrutiny, like
> security-critical code
>  - it should be clearly and temporarily marked as experimental, with opt-in
> required
>  - it should be clearly and permanently marked as "makes assumptions about
> coding style", with opt-in required.
>  - bugs need to be thoughtfully addressed
> From what I can see MyDeveloperDay is serious about doing all of this.
>
>
> I am, I also think that we shouldn't plough on with individual changes if we see
> them as potentially ambiguous, I would rather ignore a change if in doubt, I
> don't feel such features need to be 100% catch all (like how sometimes clang
> doesn't always tell you about all missing overrides, just as it can rationalize
> them), This may require more specific options to ensure we know what an
> tok::identifier actually is in order to avoid ambiguities caused by macros (a
> little like StatementMacros)
>
>

> *Risk*: clang-format will be overtaken by the complexity of such features,

> _______________________________________________
> cfe-dev mailing list
> cfe...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>

_______________________________________________

Renato Golin via llvm-dev

unread,
Aug 10, 2021, 6:10:33 AM8/10/21
to MyDeveloper Day, llvm-dev, clang developer list
On Tue, 10 Aug 2021 at 09:32, MyDeveloper Day via llvm-dev <llvm...@lists.llvm.org> wrote:
I feel this is something that clang-format could be made to easily handle. This RFC is about gaining a general consensus to let us try. We feel we can add even more value.
Anyone who knows me, knows I'm very much pro "clang-format all the things"

I agree with most of the sentiment in this thread, too. clang-format can be dangerous but it can also be so much more. 

I have been using clang-format on editors, pre-commit checks and ninja targets for a quite a while and I think it's a fundamental development tool. With it, style comments stop being a personal conversation and become tooling.

I would like to see a world where there are no more long styling documents with personal angst in all projects, just a clang-format config file.

I'm in favour of increasing the potency of clang-format. Quick checks can be done on save, more expensive ones as ninja targets and even more expensive ones as CI.

To me, the discussion of defaults and perils must be clear to all users and documentation may not be the best way: I've never read the clang-format documentation.

Another problem of clang-format is that there are so many options but not many people know enough of them.

Yet another problem of clang-format is that its behaviour is different for the same options over different versions. We're stuck on clang-format-9, not because it's "the best" but because re-formatting the whole tree every time a new tool comes out is silly.

So a good solution to all of these problems is to generate a default configuration file with all the options and comments on each one of them, including "this changes non-whitespace code", "this is quite slow", "if this breaks code, here's how to fix", etc, and setting all of them to the default value.

We also nee to be able to update the config file with new options without destroying the old (often modified) ones, and make sure new versions only do new things if the config is different.

I know, configuration management and backwards compatibility aren't trivial, but managing clang-format over time isn't either, and it should. Well, it should, if it wants to be a catch-all tool to do all the things.

cheers,
--renato

Manuel Klimek via llvm-dev

unread,
Aug 10, 2021, 7:13:43 AM8/10/21
to Renato Golin, llvm-dev, clang developer list
clang-format is designed to run on changed lines, not to have all code be compliant. You can use it for the latter, but I think it's a pretty fundamental difference in development goals.


cheers,
--renato

Ben Boeckel via llvm-dev

unread,
Aug 10, 2021, 7:32:39 AM8/10/21
to Manuel Klimek, llvm-dev, clang developer list
On Tue, Aug 10, 2021 at 13:13:09 +0200, Manuel Klimek via cfe-dev wrote:
> clang-format is designed to run on changed lines, not to have all code be
> compliant. You can use it for the latter, but I think it's a pretty
> fundamental difference in development goals.

The problem I've seen is that the opinion of code formatted with 3 lines
of context can differ from 5 lines of context which can be still
different again given the entire file. The only reliable way we've
gotten it to work is to format the whole file every time. But we require
well-formatted code via CI processes, so that's fine (if one can afford
the "rewrite the world" every time the tool gets updated…hence why some
projects are still on version 3.8(!), others 6, some even made it to 9).

--Ben

Manuel Klimek via llvm-dev

unread,
Aug 10, 2021, 7:36:06 AM8/10/21
to Ben Boeckel, llvm-dev, clang developer list
On Tue, Aug 10, 2021 at 1:32 PM Ben Boeckel <ben.b...@kitware.com> wrote:
On Tue, Aug 10, 2021 at 13:13:09 +0200, Manuel Klimek via cfe-dev wrote:
> clang-format is designed to run on changed lines, not to have all code be
> compliant. You can use it for the latter, but I think it's a pretty
> fundamental difference in development goals.

The problem I've seen is that the opinion of code formatted with 3 lines
of context can differ from 5 lines of context which can be still
different again given the entire file. The only reliable way we've
gotten it to work is to format the whole file every time. But we require
well-formatted code via CI processes, so that's fine (if one can afford
the "rewrite the world" every time the tool gets updated…hence why some
projects are still on version 3.8(!), others 6, some even made it to 9).

The underlying design idea for clang-format is to give it the whole file as *context*, but only *format* the changed lines.
clang-format supports that use case very well, and it can even be built into presubmit checks.

Manuel Klimek via llvm-dev

unread,
Aug 10, 2021, 7:36:34 AM8/10/21
to Björn Schäpers, llvm-dev, clang developer list
Fwiw, I think "clang-format can make breaking changes to code when we consider the benefit to be worth it" has been the policy on clang-format for a very long time, so accepting that as the official policy is IMO not a change. If somebody wants to write it down to prevent future revisiting, that seems fine with me.

Ben Boeckel via llvm-dev

unread,
Aug 10, 2021, 7:50:13 AM8/10/21
to Manuel Klimek, llvm-dev, clang developer list
On Tue, Aug 10, 2021 at 13:35:46 +0200, Manuel Klimek wrote:
> The underlying design idea for clang-format is to give it the whole file as
> *context*, but only *format* the changed lines.
> clang-format supports that use case very well, and it can even be built
> into presubmit checks.

IIRC, Phab deals with "patches" that have infinite context, but that's
not the norm out there. Maybe that's where this assumption comes from?

Anyways, the main issue I've seen is that developers have versions X, Y,
and Z laying around locally depending on their distro or whatever. So
developers' machines "fight" with CI and each other over the preferred
setup. This is why I don't use any formatting-on-save features myself
with any tool: it's CI's job to enforce it, so why am I wasting my time
on pre-commit hooks that aren't even necessarily accurate? It also makes
teasing out my change with `git add -p` that much harder. So instead, I
just say "ignore formatting if you don't have the exact right version
laying around; CI will handle it". The tool we use can rewrite the
entire topic to ensure it is well-formatted at each step on demand. In
fact, we update `clang-format` versions using this mechanism: tell it to
use a new version, update the config as necessary, and then tell it to
format the entire tree using the new setup.

Aaron Ballman via llvm-dev

unread,
Aug 10, 2021, 8:34:05 AM8/10/21
to MyDeveloper Day, llvm-dev, clang developer list
Thank you for putting this RFC together! Given how important
clang-format is to various people's workflows and the *massive* user
base for the tool (who largely aren't represented here on the mailing
lists), I think it's good that we are having a broader discussion
about tool expectations for clang-format.

I come down on the side of not wanting my code formatting tool to be
opinionated about the input source code I feed it. For my personal
needs, I need to be able to trust my formatting tool won't modify the
meaning of my code. If it modifies the meaning in a way that gets
compile errors, that's bad (it means I'm having to work around a
helper tool in order to pass my CI pipeline, which is highly
frustrating). I know that clang-format already happily breaks code
with include sorting, and I think it was a mistake to add that
functionality, especially given the lack of discussion about breaking
code during the original review of the feature. Changes that can
*silently* change the meaning of my code are an even worse concern
because of the increased risk of not noticing the semantic change.
(Keep in mind that some workflows will format an entire file when
saving the file, so there may not even be direct user interaction
involved in the formatting.)

Putting potentially breaking changes behind configuration flags is a
solution, but not one that I'm all that excited by. There are already
*a lot* of configuration options for clang-format, and having to
remember which ones may destroy my input source is a burden. We could
use a naming convention to clearly identify all of these options, but
the first time we have a configuration option that doesn't follow the
naming convention, we're back to the "I can't know which options are
high-risk" problem. Also, the fact that configuration options can be
impacted by non-local configuration files elsewhere in the file system
hierarchy can cause surprising configuration option behavior (I know
we've run into the situation before where an inherited config option
caused a bit of head-scratching as to where the option was set).

I think the idea of a separate tool that's built on top of
clang-format (consuming clang-format as a library with additional
formatting features) has the most appeal to me. Then we no longer have
to worry about the config options being the gate -- the tool is the
deciding factor as to whether you want to opt into danger mode or not.
One of the architectural benefits of designing a series of composable
libraries is the ability to compose them into more powerful tools, I
think we should take advantage of that. This also provides a migration
path for more experimental functionality -- it can be implemented in
the more dangerous tool, shake out the bugs from there, and then be
shuffled into the safer tool if/when the issues have been worked out.

Experience has taught me that just about the worst thing a tool can do
is break user trust, and once you break that trust, you almost never
get it back again. clang-format is a fantastic tool, but the more
opinionated it becomes about source input, the less people are able to
use it (by definition).

~Aaron

> _______________________________________________
> cfe-dev mailing list
> cfe...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

MyDeveloper Day via llvm-dev

unread,
Aug 10, 2021, 8:35:38 AM8/10/21
to Ben Boeckel, llvm-dev, clang developer list
1) We don't change the defaults between versions
2) Any formatting changes caused between versions are either from
        a) bug fixes
        b) regressions

I'd like to think 2b) doesn't happen but we have to be realistic. It does, but it's why I don't like letting people change existing unit tests in a review.

New features could potentially change things as we try to pick a new default, 

but mostly changes could be because clang-format miss identified a  correct sequences of tokens (BinaryOperator vs PointerOrReference being a particular pain point)

FWIW, I run a multi million line project, we run the bleeding edge, last released version, each release of Clang-Format I have maybe 10-20 files to change. If I left it 5 or so releases I'm sure the impact would seem greater.

We check on CI and nightly builds (with -n option) and report out as warnings so they get fixed pretty quickly.

If you are worried about git blame see (https://akrabat.com/ignoring-revisions-with-git-blame/)

My 2c worth

MyDeveloperDay



On Tue, Aug 10, 2021 at 12:32 PM Ben Boeckel via cfe-dev <cfe...@lists.llvm.org> wrote:
On Tue, Aug 10, 2021 at 13:13:09 +0200, Manuel Klimek via cfe-dev wrote:
> clang-format is designed to run on changed lines, not to have all code be
> compliant. You can use it for the latter, but I think it's a pretty
> fundamental difference in development goals.

The problem I've seen is that the opinion of code formatted with 3 lines
of context can differ from 5 lines of context which can be still
different again given the entire file. The only reliable way we've
gotten it to work is to format the whole file every time. But we require
well-formatted code via CI processes, so that's fine (if one can afford
the "rewrite the world" every time the tool gets updated…hence why some
projects are still on version 3.8(!), others 6, some even made it to 9).

--Ben
_______________________________________________

Aaron Ballman via llvm-dev

unread,
Aug 10, 2021, 8:35:58 AM8/10/21
to Manuel Klimek, llvm-dev, Björn Schäpers, clang developer list
On Tue, Aug 10, 2021 at 7:37 AM Manuel Klimek via llvm-dev
<llvm...@lists.llvm.org> wrote:
>
> Fwiw, I think "clang-format can make breaking changes to code when we consider the benefit to be worth it" has been the policy on clang-format for a very long time, so accepting that as the official policy is IMO not a change. If somebody wants to write it down to prevent future revisiting, that seems fine with me.

FWIW, I've never had the impression that this was the status quo for
the tool. I've looked through the mailing list archives and some old
reviews and I don't see any supporting evidence one way or the other
(obviously I could have missed something though!).

~Aaron

Renato Golin via llvm-dev

unread,
Aug 10, 2021, 9:41:22 AM8/10/21
to MyDeveloper Day, llvm-dev, clang developer list, Ben Boeckel
On Tue, 10 Aug 2021 at 13:35, MyDeveloper Day <mydevel...@gmail.com> wrote:
FWIW, I run a multi million line project, we run the bleeding edge, last released version, each release of Clang-Format I have maybe 10-20 files to change. If I left it 5 or so releases I'm sure the impact would seem greater.

That's not the only problem.

If people have different versions of clang-format installed, and you have format as part of the workflow, committing will often include rewrites of the same things.

In our case, if our formatting doesn't match the CI formatting the PR is rejected. Even just running on the changed lines, you get different results.

The only solution is to force people to install clang-format-X locally.

You can say "clang-format wasn't designed for that", but that's what it's used for in many places and its user base is now massive.

You wanted to gather opinions of what clang-format could/should do, well, those are common use cases.

Adding more breaking changes without safety pins will make usage reduce, not increase. 

IMHO, if you want more people to use you need to increase safety first, then add breaking changes with visible warnings (naming, flags, comments, whatever), and make sure breaking changes are not on by default.

For example, if clang-format starts breaking code without notice, we'll have to stick to -9 forever, or remove it as a dependency for PRs altogether.

cheers,
--renato

Manuel Klimek via llvm-dev

unread,
Aug 10, 2021, 10:55:19 AM8/10/21
to Aaron Ballman, llvm-dev, Björn Schäpers, clang developer list
On Tue, Aug 10, 2021 at 2:35 PM Aaron Ballman <aa...@aaronballman.com> wrote:
On Tue, Aug 10, 2021 at 7:37 AM Manuel Klimek via llvm-dev
<llvm...@lists.llvm.org> wrote:
>
> Fwiw, I think "clang-format can make breaking changes to code when we consider the benefit to be worth it" has been the policy on clang-format for a very long time, so accepting that as the official policy is IMO not a change. If somebody wants to write it down to prevent future revisiting, that seems fine with me.

FWIW, I've never had the impression that this was the status quo for
the tool. I've looked through the mailing list archives and some old
reviews and I don't see any supporting evidence one way or the other
(obviously I could have missed something though!).

I agree that that was probably not obvious or stated very clearly anywhere - clang-format for years was basically developed by a small team without much interest in input by the rest of the world, for a very specific use case.
At the time we did not do a great job at spreading our thoughts with the world, as a lot of discussions were in person.

Renato Golin via llvm-dev

unread,
Aug 10, 2021, 11:25:06 AM8/10/21
to Manuel Klimek, llvm-dev, clang developer list, Björn Schäpers
On Tue, 10 Aug 2021 at 15:55, Manuel Klimek via llvm-dev <llvm...@lists.llvm.org> wrote:
I agree that that was probably not obvious or stated very clearly anywhere - clang-format for years was basically developed by a small team without much interest in input by the rest of the world, for a very specific use case.
At the time we did not do a great job at spreading our thoughts with the world, as a lot of discussions were in person.

I think there's more to that than what was created by the people who created it.

After being used extensively by people around the world, clang-format has perhaps grown to something that the original team did not envision.

However, as a public clang tool, and as part of the LLVM "family", it's no longer "a tool used by the team that created it", and not even "a tool used by LLVM developers". It is truly a public tool maintained by the LLVM community.

As such, I think we shouldn't restrict the design goals to the original design, or to what a small sub-group uses it for. Clang-format is an official tool like other visible stuff and should have the same community oversight as everything else.

There is a high demand for a formatting tool that can do so much more than the original design and people have been using it, albeit precariously, for that purpose already.

So I believe the original policy that "breaking changes are a benefit" can still be valid in a number of places, but it does not (and should not) need to be the only valid case. Definitely not the default, either.

I think we need to take a step back and ask users what they expect, rather than push forward a policy that people never really wanted but coped with it anyway because there's nothing better.

FWIW, borrowing suggestions from others in this thread, I propose we change the policy to:
 * Breaking changes off by default, override behaviour with configuration files
 * All behaviour must be controlled by a configuration flag with options explicit on doc/config
 * Make explicit some options are breaking (comment/naming)
 * Backwards compatibility is pursued, but can be broken in case of clear bugs

An easy way to manage the configuration in this case is to have a dump function (`clang-format --dump --config foo.cfg`) that reads a config file and dumps all missing parameters and their current values.

Inline comments would be nice but make configuration management hard, so there could be a webpage that describes all options and the URL is dumped on the config file.

cheers,
--renato

David Blaikie via llvm-dev

unread,
Aug 10, 2021, 12:43:36 PM8/10/21
to Renato Golin, llvm-dev, clang developer list
On Tue, Aug 10, 2021 at 8:24 AM Renato Golin via cfe-dev <cfe...@lists.llvm.org> wrote:
On Tue, 10 Aug 2021 at 15:55, Manuel Klimek via llvm-dev <llvm...@lists.llvm.org> wrote:
I agree that that was probably not obvious or stated very clearly anywhere - clang-format for years was basically developed by a small team without much interest in input by the rest of the world, for a very specific use case.
At the time we did not do a great job at spreading our thoughts with the world, as a lot of discussions were in person.

I think there's more to that than what was created by the people who created it.

After being used extensively by people around the world, clang-format has perhaps grown to something that the original team did not envision.

However, as a public clang tool, and as part of the LLVM "family", it's no longer "a tool used by the team that created it", and not even "a tool used by LLVM developers". It is truly a public tool maintained by the LLVM community.

As such, I think we shouldn't restrict the design goals to the original design, or to what a small sub-group uses it for. Clang-format is an official tool like other visible stuff and should have the same community oversight as everything else.

There is a high demand for a formatting tool that can do so much more than the original design and people have been using it, albeit precariously, for that purpose already.

Mixed feelings - we don't necessarily have to let users dictate the feature set, if they aren't contributing to it. Like users using -Weverything, that is ill advised and we are/should be more than happy to break them at any turn. We do get to make choices about what uses we are developing the tools for.
 
So I believe the original policy that "breaking changes are a benefit" can still be valid in a number of places, but it does not (and should not) need to be the only valid case. Definitely not the default, either.

I think we need to take a step back and ask users what they expect, rather than push forward a policy that people never really wanted but coped with it anyway because there's nothing better.

FWIW, borrowing suggestions from others in this thread, I propose we change the policy to:
 * Breaking changes off by default, override behaviour with configuration files
 * All behaviour must be controlled by a configuration flag with options explicit on doc/config
 * Make explicit some options are breaking (comment/naming)
 * Backwards compatibility is pursued, but can be broken in case of clear bugs

I think that's more-or-less what MyDeveloperDay has been saying? I think that sounds reasonable to me (we could potentially rename the existing features that do change more than whitespace, like include sorting - to match the naming convention of "this changes the token sequence/can have an effect on how/whether code compiles" - while keeping it on by default for backwards compatibility).
 

An easy way to manage the configuration in this case is to have a dump function (`clang-format --dump --config foo.cfg`) that reads a config file and dumps all missing parameters and their current values.

Inline comments would be nice but make configuration management hard, so there could be a webpage that describes all options and the URL is dumped on the config file.

cheers,
--renato

Renato Golin via llvm-dev

unread,
Aug 10, 2021, 2:22:52 PM8/10/21
to David Blaikie, llvm-dev, clang developer list
On Tue, 10 Aug 2021 at 17:43, David Blaikie <dbla...@gmail.com> wrote:
Mixed feelings - we don't necessarily have to let users dictate the feature set, if they aren't contributing to it. Like users using -Weverything, that is ill advised and we are/should be more than happy to break them at any turn. We do get to make choices about what uses we are developing the tools for.

Ah, that's not what I meant. I agree with you.

I meant there is a lot of potential for things that we (LLVM developers) are saying we want (in this thread) and we should pursue it without having to be bound by the "original design".

I think that's more-or-less what MyDeveloperDay has been saying? I think that sounds reasonable to me (we could potentially rename the existing features that do change more than whitespace, like include sorting - to match the naming convention of "this changes the token sequence/can have an effect on how/whether code compiles" - while keeping it on by default for backwards compatibility).

It is, intentionally so. I didn't mean to hijack the thread, just validate the original point.

Like the miscommunication you pointed above, this was a reply to a specific point to stick to the original design for the sake of "this is what we wanted back then", which I don't think it's a good policy.

--renato

David Blaikie via llvm-dev

unread,
Aug 10, 2021, 2:39:36 PM8/10/21
to Renato Golin, llvm-dev, clang developer list
On Tue, Aug 10, 2021 at 11:22 AM Renato Golin <reng...@gmail.com> wrote:
On Tue, 10 Aug 2021 at 17:43, David Blaikie <dbla...@gmail.com> wrote:
Mixed feelings - we don't necessarily have to let users dictate the feature set, if they aren't contributing to it. Like users using -Weverything, that is ill advised and we are/should be more than happy to break them at any turn. We do get to make choices about what uses we are developing the tools for.

Ah, that's not what I meant. I agree with you.

I meant there is a lot of potential for things that we (LLVM developers) are saying we want (in this thread) and we should pursue it without having to be bound by the "original design".

Fair - though I think it's insightful to know where things started/what the goals were (& some of those are still relevant, since that original use case is still a major use case), though it sounds like the direction we're discussing is pretty compatible with that original vision - Google style can continue to opt in to things consistent with that original vision of "more than whitespace changes when they're worth it" while keeping them off by default for unconfigured clang-format to make it safer-by-default. (LLVM will probably opt into some too, I imagine - like brace addition/removal, etc) Given both Google and LLVM are specific styles anyway, opting them into some of these more-than-whitespace features while having them otherwise off by default seems quite fine to me.

Andrew Tomazos via llvm-dev

unread,
Aug 10, 2021, 2:45:33 PM8/10/21
to Aaron Ballman, llvm-dev, clang developer list
On Tue, Aug 10, 2021 at 12:34 PM Aaron Ballman via cfe-dev <cfe...@lists.llvm.org> wrote:
I think the idea of a separate tool that's built on top of
clang-format (consuming clang-format as a library with additional
formatting features) has the most appeal to me.

Right.  Another reason that this may be a good idea is the name "clang-format".  The word "format" appeals to the term "code formatting", which is generally accepted to cover only whitespace-related issues (indentation, line width, spacing, vertical alignment, etc).  Issues that involve rearranging tokens and syntax, with no semantic impact, are considered to be, by definition, beyond the scope of "code formatting" and more issues of "code style" or "coding style".

If you don't like clang-reshape for the new program name, here is a quick brainstorm of some alternatives:

    clang-style
    clang-styler
    clang-syntactor
    clang-syntaxer
    clang-restyle

Chris Lattner via llvm-dev

unread,
Aug 10, 2021, 3:02:53 PM8/10/21
to MyDeveloper Day, llvm-dev, clang developer list
First of all, thank you so much for writing up this clear and mostly balanced summary of the situation, the concerns, and the decision point ahead here.

I think there is a something fundamental problem to the nature of this discussion: we can all make semi-informed guesses about how well a particular format will work in practice, but we can’t know until there is usage experience.  Furthermore, clang-format has multiple different communities with different tradeoffs, and imposing a new format on an existing codebase has concerns.

Clang format is already highly parameterized to support this, are you saying that there is pushback on adding new off-by-default capabilities to clang-format, or is the pushback about adding these capabilities to existing language modes (llvm style, google style, etc)?  I don’t see an obvious problem with introducing off-by-default capabilities.

I can see a reasonable concern about introducing “const moving” or other new things into existing formats by default - that could be disruptive, and depends a lot about how well the algorithm and implementation works in practice.  Two thoughts on how to make progress here:

1) You could implement these things in an off-by-default setting, ship it out to lots of people, then gain data somehow (e.g. ask for feedback).
2) We could introduce a new top level “changing my code is allowed” mode to clang-format and put these checks into that.  You could even conceptually move namespace commenting and include sorting to that mode, making clang-format more consistent.

-Chris

David Blaikie via llvm-dev

unread,
Aug 10, 2021, 3:10:38 PM8/10/21
to Chris Lattner, llvm-dev, clang developer list
On Tue, Aug 10, 2021 at 12:02 PM Chris Lattner via cfe-dev <cfe...@lists.llvm.org> wrote:
First of all, thank you so much for writing up this clear and mostly balanced summary of the situation, the concerns, and the decision point ahead here.

I think there is a something fundamental problem to the nature of this discussion: we can all make semi-informed guesses about how well a particular format will work in practice, but we can’t know until there is usage experience.  Furthermore, clang-format has multiple different communities with different tradeoffs, and imposing a new format on an existing codebase has concerns.

Clang format is already highly parameterized to support this, are you saying that there is pushback on adding new off-by-default capabilities to clang-format, or is the pushback about adding these capabilities to existing language modes (llvm style, google style, etc)?  I don’t see an obvious problem with introducing off-by-default capabilities.

I can see a reasonable concern about introducing “const moving” or other new things into existing formats by default - that could be disruptive, and depends a lot about how well the algorithm and implementation works in practice.  Two thoughts on how to make progress here:

1) You could implement these things in an off-by-default setting, ship it out to lots of people, then gain data somehow (e.g. ask for feedback).
2) We could introduce a new top level “changing my code is allowed” mode to clang-format and put these checks into that.  You could even conceptually move namespace commenting and include sorting to that mode, making clang-format more consistent.

I don't think anyone's suggesting these things would be on-by-default. I believe the discussion/debate was about adding the functionality (off by default) at all. It's certainly been a discussion and an angle ("clang-format must not make more-than-whitespace changes, generally" (with some carveouts like include sorting)) - which, apparently was an overly strong statement compared to the original design intent (not sure where I got the impression that "nothing but whitespace" was a core design principle, but a bunch of us did - which made it mostly a reality for quite a while).

but sounds like most folks are happy for "more than whitespace changes are OK so long as they're off by default" (but could be on by default under specific clang-format configurations like Google and LLVM) which I think is OK for those asking for the features & seems mostly OK for folks who don't want the features (some folks like Andrew and Aaron who are advocating for a stronger separation - such as non-whitespace changes only being accessible under a separate tool name).

- Dave
 

MyDeveloper Day via llvm-dev

unread,
Aug 10, 2021, 4:41:41 PM8/10/21
to Andrew Tomazos, llvm-dev, clang developer list
Whilst I understand the desire to introduce a new binary, its basically ends up being clang-format underneath, it feels artificial to go this route, and we did play with this concept but it wasn't universally accepted either

I kind of agree, but those of us who want these features would simply rename the binary to clang-format.exe anyway. it would likely read the same .clang-format file it feels like a decision we'd reverse further down the line.

At the end of the day, it's exactly why I guess we didn't make a new tool for "include sorting" or "namespace commenter"

The benefit of keeping it all in clang-format is that it's highly integrated into external tools so we don't need to reinvent the wheel

MyDeveloperDay

Renato Golin via llvm-dev

unread,
Aug 10, 2021, 4:45:45 PM8/10/21
to David Blaikie, llvm-dev, clang developer list
On Tue, 10 Aug 2021 at 19:39, David Blaikie <dbla...@gmail.com> wrote:
Fair - though I think it's insightful to know where things started/what the goals were (& some of those are still relevant, since that original use case is still a major use case), though it sounds like the direction we're discussing is pretty compatible with that original vision - Google style can continue to opt in to things consistent with that original vision of "more than whitespace changes when they're worth it" while keeping them off by default for unconfigured clang-format to make it safer-by-default. (LLVM will probably opt into some too, I imagine - like brace addition/removal, etc) Given both Google and LLVM are specific styles anyway, opting them into some of these more-than-whitespace features while having them otherwise off by default seems quite fine to me.

Agreed. That was the point I was trying to make. A more general tool with safe defaults and configurable behaviour is compatible with all use cases.

It seemed to me Manuel was arguing to keep the breaking changes because that fitted the original design and that was good enough. Perhaps I misunderstood his point.  

MyDeveloper Day via llvm-dev

unread,
Aug 10, 2021, 4:54:13 PM8/10/21
to Chris Lattner, llvm-dev, clang developer list
Thank you so much for  taking the time to read, let alone comment on the RFC, obviously knowing your opinion is important.

I'm very much encouraging that any such features that change the code would be "off-by-default", I also think that getting these features out there will hopefully help to improve them further.

I'm not an east const person, but I have converted large parts of LLVM to be east const to test this without any issues, but I've also used it to identify places in LLVM that are NOT west const, 

I feel many projects could use these kinds of capabilities. (I know mine can, I really don't want to have to persuade every new developer who joins that their style is not our style), in my view this really matches with clang-format ethos of ending "whitespace" wars!

Thanks again for your input.

MyDeveloperDay 

Mehdi AMINI via llvm-dev

unread,
Aug 11, 2021, 4:37:27 AM8/11/21
to Renato Golin, llvm-dev, clang developer list
On Tue, Aug 10, 2021 at 5:24 PM Renato Golin via cfe-dev <cfe...@lists.llvm.org> wrote:
On Tue, 10 Aug 2021 at 15:55, Manuel Klimek via llvm-dev <llvm...@lists.llvm.org> wrote:
I agree that that was probably not obvious or stated very clearly anywhere - clang-format for years was basically developed by a small team without much interest in input by the rest of the world, for a very specific use case.
At the time we did not do a great job at spreading our thoughts with the world, as a lot of discussions were in person.

I think there's more to that than what was created by the people who created it.

After being used extensively by people around the world, clang-format has perhaps grown to something that the original team did not envision.

However, as a public clang tool, and as part of the LLVM "family", it's no longer "a tool used by the team that created it", and not even "a tool used by LLVM developers". It is truly a public tool maintained by the LLVM community.

As such, I think we shouldn't restrict the design goals to the original design, or to what a small sub-group uses it for. Clang-format is an official tool like other visible stuff and should have the same community oversight as everything else.

There is a high demand for a formatting tool that can do so much more than the original design and people have been using it, albeit precariously, for that purpose already.

So I believe the original policy that "breaking changes are a benefit" can still be valid in a number of places, but it does not (and should not) need to be the only valid case. Definitely not the default, either.

I think we need to take a step back and ask users what they expect, rather than push forward a policy that people never really wanted but coped with it anyway because there's nothing better.

FWIW, borrowing suggestions from others in this thread, I propose we change the policy to:
 * Breaking changes off by default, override behaviour with configuration files
 * All behaviour must be controlled by a configuration flag with options explicit on doc/config
 * Make explicit some options are breaking (comment/naming)
 * Backwards compatibility is pursued, but can be broken in case of clear bugs

FWIW, this seems very reasonable to me.
(and it is also what I understood MyDeveloperDay's original proposal to be).

Cheers,

-- 
Mehdi

 

An easy way to manage the configuration in this case is to have a dump function (`clang-format --dump --config foo.cfg`) that reads a config file and dumps all missing parameters and their current values.

Inline comments would be nice but make configuration management hard, so there could be a webpage that describes all options and the URL is dumped on the config file.

cheers,
--renato

Aaron Ballman via llvm-dev

unread,
Aug 27, 2021, 7:43:46 AM8/27/21
to Mehdi AMINI, llvm-dev, clang developer list
Thanks to everyone who participated in the discussion, and especially
to MyDeveloperDay for getting this started. Now that discussion on
this RFC seems to have died down, I think it's worth circling back to
see if we have consensus to move forward. From reading the threads, I
think Renato summarized the consensus position nicely:

* Breaking changes off by default, override behaviour with configuration files
* All behaviour must be controlled by a configuration flag with
options explicit on doc/config
* Make explicit some options are breaking (comment/naming)
* Backwards compatibility is pursued, but can be broken in case of clear bugs

Unless there is strong opposition to this, then I'd say MyDeveloperDay
has an answer to their RFC. Any disagreement?

~Aaron

On Wed, Aug 11, 2021 at 4:37 AM Mehdi AMINI via cfe-dev

Manuel Klimek via llvm-dev

unread,
Aug 27, 2021, 10:03:38 AM8/27/21
to Aaron Ballman, llvm-dev, clang developer list
On Fri, Aug 27, 2021 at 1:43 PM Aaron Ballman via cfe-dev <cfe...@lists.llvm.org> wrote:
Thanks to everyone who participated in the discussion, and especially
to MyDeveloperDay for getting this started. Now that discussion on
this RFC seems to have died down, I think it's worth circling back to
see if we have consensus to move forward. From reading the threads, I
think Renato summarized the consensus position nicely:

I agree with the following items:
 
 * Breaking changes off by default, override behaviour with configuration files
 * All behaviour must be controlled by a configuration flag with
options explicit on doc/config
 * Make explicit some options are breaking (comment/naming)

I disagree with this one:
 
 * Backwards compatibility is pursued, but can be broken in case of clear bugs

Which I think is also tangential to the current RFC, and IMO should be treated in a separate discussion if necessary (this is a fundamental problem with how clang-format is designed).

Renato Golin via llvm-dev

unread,
Aug 27, 2021, 10:08:46 AM8/27/21
to Manuel Klimek, llvm-dev, clang developer list
On Fri, 27 Aug 2021 at 15:03, Manuel Klimek <kli...@google.com> wrote:
 * Backwards compatibility is pursued, but can be broken in case of clear bugs

Which I think is also tangential to the current RFC, and IMO should be treated in a separate discussion if necessary (this is a fundamental problem with how clang-format is designed).

Is this a fundamental problem with the code structure and maintainability or with the mindset of the main developers?

Manuel Klimek via llvm-dev

unread,
Aug 27, 2021, 2:33:24 PM8/27/21
to Renato Golin, llvm-dev, clang developer list
It's the algorithm - clang-format is a numerical optimizer of "beauty" - any small changes can lead to the numerical outputs changing slightly, which then leads to slight diffs in how code is formatted where two options were previously considered very "similarly beautiful". It's not happening super often, but it does happen.

I agree that we shouldn't introduce changes to formatting without good reasons, but we can't restrict changes only to bug fixes - new features can introduce changes, too.

Aaron Ballman via llvm-dev

unread,
Aug 27, 2021, 2:47:46 PM8/27/21
to Manuel Klimek, llvm-dev, clang developer list
On Fri, Aug 27, 2021 at 10:04 AM Manuel Klimek <kli...@google.com> wrote:
>
> On Fri, Aug 27, 2021 at 1:43 PM Aaron Ballman via cfe-dev <cfe...@lists.llvm.org> wrote:
>>
>> Thanks to everyone who participated in the discussion, and especially
>> to MyDeveloperDay for getting this started. Now that discussion on
>> this RFC seems to have died down, I think it's worth circling back to
>> see if we have consensus to move forward. From reading the threads, I
>> think Renato summarized the consensus position nicely:
>
>
> I agree with the following items:
>
>>
>> * Breaking changes off by default, override behaviour with configuration files
>> * All behaviour must be controlled by a configuration flag with
>> options explicit on doc/config
>> * Make explicit some options are breaking (comment/naming)
>
>
> I disagree with this one:
>
>>
>> * Backwards compatibility is pursued, but can be broken in case of clear bugs
>
>
> Which I think is also tangential to the current RFC, and IMO should be treated in a separate discussion if necessary (this is a fundamental problem with how clang-format is designed).

Thank you for the feedback! I'm personally fine dropping that last
bullet if it's contentious. I think consensus on the first three
bullets are what's necessary to unblock MyDeveloperDay's review.

~Aaron

David Blaikie via llvm-dev

unread,
Aug 27, 2021, 3:42:38 PM8/27/21
to Manuel Klimek, llvm-dev, clang developer list
+1 to what Manuel's said here.

One slight change I'd suggest is changing the term "breaking changes" to "non-whitespace changes", perhaps? (they aren't necessarily breaking anything) At least I assume that's the intent, but I might be wrong in which case I'd love to better understand what's being proposed.

MyDeveloper Day via llvm-dev

unread,
Aug 27, 2021, 6:16:52 PM8/27/21
to Manuel Klimek, llvm-dev, clang developer list
Really thank you to everyone for the comments,on the RFC I thought I'd try to keep quiet on the RFC so as not to appear overly pressing my agenda, I'm super glad that we have got to a nice consensus, thank you to those that summarized it.

We'll definitely take care, and we can introduce something into the documentation at some point to highlight the existing breaking changes (sort includes, namespace commenter etc...) and use this going forward.

On the point of "Version to Version" breaking, there doesn't feel like something that can be solved easily other than us simply not touching the code and not fixing bugs. We get a relatively good stream of people
logging issues in the bug tracker, and frankly we don't have enough people working through them. (I personally can't keep up so I tend cherry pick! depending on my availability.)

For the clang-format team the way to try and minimise the changes between versions is IMHO to be quite strict about NOT changing any existing tests, and I to push back on reviews that do unless there is good
reason. (i.e. the test was previously wrong)

 But I  still causing changes version to version mainly because we are fixing bugs rather than adding new features. Recent improvement to make clang-format handle K&R code better, means we now identify functions better, 
The consequence of which is in the large code base that I manage this identified 100s of cases where "BreakAfterReturn" was incorrectly formated previously. Also recent changes to AfterEnum to make it more correct in enum detection
will likely find lots of places where enum was/n't breaking correctly.

In my view this issue is tangential to what we want to do with being able to modify the token stream, which we'd hope to keep completely seperate and not have any impact if turned off.

Some asked what was being proposed, and the review I was working on is for clang-format ensuring consistent "Left/Right" const/volatile placement. (https://reviews.llvm.org/D69764) Running this on LLVM shows a couple of places where even we are not consistent, please note this is still WIP.

MyDeveloperDay


Renato Golin via llvm-dev

unread,
Aug 27, 2021, 6:44:27 PM8/27/21
to Manuel Klimek, llvm-dev, clang developer list
On Fri, 27 Aug 2021 at 19:33, Manuel Klimek <kli...@google.com> wrote:
It's the algorithm - clang-format is a numerical optimizer of "beauty" - any small changes can lead to the numerical outputs changing slightly, which then leads to slight diffs in how code is formatted where two options were previously considered very "similarly beautiful". It's not happening super often, but it does happen.

I agree that we shouldn't introduce changes to formatting without good reasons, but we can't restrict changes only to bug fixes - new features can introduce changes, too.

Makes sense, thanks! 

Aaron Ballman via llvm-dev

unread,
Aug 28, 2021, 8:52:13 AM8/28/21
to David Blaikie, llvm-dev, clang developer list
On Sat, Aug 28, 2021 at 3:48 AM David Blaikie <dbla...@gmail.com> wrote:
>
> +1 to what Manuel's said here.
>
> One slight change I'd suggest is changing the term "breaking changes" to "non-whitespace changes", perhaps? (they aren't necessarily breaking anything) At least I assume that's the intent, but I might be wrong in which case I'd love to better understand what's being proposed.

To me, the crux of my concern isn't nonwhitespace changes, but changes
that can make code which used to compile no longer do so. It just so
happens that nonwhitespace changes are where that risk is highest, but
whitespace changes can impact syntactic validity (such as reformatting
preprocessor directives which terminate with an end of line) and
nonwhitespace changes can (in theory) be written to not break code
(such as inserting parentheses in expressions such that they match the
current order of operations used by the expression). So I prefer
"breaking changes" because it captures the concern better, but I'd
reword it to "potentially breaking changes" to improve clarity. It's
not that we expect the option to break significant code (that'd be
bad), it's that we anticipate that it *could* break code and that's
why it's treated with greater caution.

~Aaron

Arthur O'Dwyer via llvm-dev

unread,
Aug 28, 2021, 9:52:24 AM8/28/21
to Aaron Ballman, llvm-dev, clang developer list
On Sat, Aug 28, 2021 at 8:52 AM Aaron Ballman via cfe-dev <cfe...@lists.llvm.org> wrote:
On Sat, Aug 28, 2021 at 3:48 AM David Blaikie <dbla...@gmail.com> wrote:
>
> +1 to what Manuel's said here.
>
> One slight change I'd suggest is changing the term "breaking changes" to "non-whitespace changes", perhaps? (they aren't necessarily breaking anything) At least I assume that's the intent, but I might be wrong in which case I'd love to better understand what's being proposed.

To me, the crux of my concern isn't nonwhitespace changes, but changes
that can make code which used to compile no longer do so. It just so
happens that nonwhitespace changes are where that risk is highest

Perhaps it would be correct to say that the problematic formatters are those that change the file's sequence of preprocessing tokens.  This is particularly relevant to clang-format because clang-format doesn't actually parse C++. So for example you might imagine a formatter that cuddles angle brackets:
    std::vector<std::vector<int> > v;  // BEFORE
    std::vector<std::vector<int>> v;  // AFTER
This changes the token sequence, so it's potentially dangerous. Because clang-format doesn't parse, such a formatter can't tell the difference between that (safe, post-C++03) edit and this (unsafe) edit:
    X<&Y::operator> >();  // BEFORE
    X<&Y::operator>>();  // AFTER: syntax error

Obviously such a formatter is still going to be relatively safe in practice. But because it (has the potential to) change the token sequence, it is qualitatively more dangerous than a formatter that merely reformats the existing token sequence.

Shuffling around the tokens (e.g. changing west-const into east-const) is just a special case of changing the token sequence.

In particular, if you change the token sequence when you're inside a preprocessor macro, then (because clang-format doesn't parse C++) you really have no idea what effect your change is going to have.
    #define X(V) int V = 42
    int main() { X(v1); X(const v2); }
Here, editing `const v2` into `v2 const` produces a syntax error.

Now, for any formatter, one can find pathological programs that are broken by it; e.g.
    template<int X> void F() requires (X==2) {}
    int main() { F<__LINE__>(); }
will stop compiling if you add linebreaks to it. I don't think this quite reduces my thesis to absurdity, but I admit it's theoretically awkward.  But if you restrict your edits to those that preserve the token sequence, then I think you'll only ever break programs that use either `#X` (stringifying) or `__LINE__`. Anyone care to produce a counterexample? :)

my $.02,
Arthur

David Blaikie via llvm-dev

unread,
Aug 28, 2021, 3:40:59 PM8/28/21
to Aaron Ballman, Arthur O'Dwyer, llvm-dev, clang developer list
On Sat, Aug 28, 2021 at 5:52 AM Aaron Ballman <aa...@aaronballman.com> wrote:
On Sat, Aug 28, 2021 at 3:48 AM David Blaikie <dbla...@gmail.com> wrote:
>
> +1 to what Manuel's said here.
>
> One slight change I'd suggest is changing the term "breaking changes" to "non-whitespace changes", perhaps? (they aren't necessarily breaking anything) At least I assume that's the intent, but I might be wrong in which case I'd love to better understand what's being proposed.

To me, the crux of my concern isn't nonwhitespace changes, but changes
that can make code which used to compile no longer do so. It just so
happens that nonwhitespace changes are where that risk is highest, but
whitespace changes can impact syntactic validity (such as reformatting
preprocessor directives which terminate with an end of line) and
nonwhitespace changes can (in theory) be written to not break code
(such as inserting parentheses in expressions such that they match the
current order of operations used by the expression). So I prefer
"breaking changes" because it captures the concern better, but I'd
reword it to "potentially breaking changes" to improve clarity. It's
not that we expect the option to break significant code (that'd be
bad), it's that we anticipate that it *could* break code and that's
why it's treated with greater caution.

The language here seems a bit too vague/guarded for my comfort level. Is the expectation then that someone must demonstrate a specific breakage situation (however rare/unlikely?) to justify classifying the formatting feature into this special off-by-default/risky group?

Perhaps we'll end up with easy idioms/obvious proofs that apply to whole classes (perhaps, for instance, there's an easy/obvious/reusable proof that applies to most cases of token reordering similar to what Arthur showed?) of changes & that'll keep the burden of proof fairly low?

In any case, I see the language is intentional, and if the folks working on this are comfortable with it, I'll leave it to you folks - thanks for explaining!

- Dave
 

Andrew Tomazos via llvm-dev

unread,
Aug 28, 2021, 5:39:33 PM8/28/21
to Arthur O'Dwyer, llvm-dev, clang developer list
On Sat, Aug 28, 2021 at 1:52 PM Arthur O'Dwyer via cfe-dev <cfe...@lists.llvm.org> wrote:
But because it (has the potential to) change the token sequence, it is qualitatively more dangerous than a formatter that merely reformats the existing token sequence.

Formally new-line is not a preprocessing-token, but it is effectively, as it is referred to as a terminal in the preprocessor grammar.  If we consider a new-line a preprocessor token, then I think it is almost the case that any source code change that does not change the preprocessor token sequence is unobservable in the final program.  (For the true pedants there is std::source_location::column.)

But regardless of the formal proof:  I think most people perceive the two general buckets "whitespace formatting" and "non-whitespace refactoring", and that the latter is more dangerous.  Hence the suggestion of shipping two different frontend programs to help users isolate those two buckets if they so desire.

Aaron Ballman via llvm-dev

unread,
Aug 29, 2021, 9:46:36 AM8/29/21
to David Blaikie, Arthur O'Dwyer, llvm-dev, clang developer list
On Sat, Aug 28, 2021 at 3:40 PM David Blaikie <dbla...@gmail.com> wrote:
>
> On Sat, Aug 28, 2021 at 5:52 AM Aaron Ballman <aa...@aaronballman.com> wrote:
>>
>> On Sat, Aug 28, 2021 at 3:48 AM David Blaikie <dbla...@gmail.com> wrote:
>> >
>> > +1 to what Manuel's said here.
>> >
>> > One slight change I'd suggest is changing the term "breaking changes" to "non-whitespace changes", perhaps? (they aren't necessarily breaking anything) At least I assume that's the intent, but I might be wrong in which case I'd love to better understand what's being proposed.
>>
>> To me, the crux of my concern isn't nonwhitespace changes, but changes
>> that can make code which used to compile no longer do so. It just so
>> happens that nonwhitespace changes are where that risk is highest, but
>> whitespace changes can impact syntactic validity (such as reformatting
>> preprocessor directives which terminate with an end of line) and
>> nonwhitespace changes can (in theory) be written to not break code
>> (such as inserting parentheses in expressions such that they match the
>> current order of operations used by the expression). So I prefer
>> "breaking changes" because it captures the concern better, but I'd
>> reword it to "potentially breaking changes" to improve clarity. It's
>> not that we expect the option to break significant code (that'd be
>> bad), it's that we anticipate that it *could* break code and that's
>> why it's treated with greater caution.
>
>
> The language here seems a bit too vague/guarded for my comfort level. Is the expectation then that someone must demonstrate a specific breakage situation (however rare/unlikely?) to justify classifying the formatting feature into this special off-by-default/risky group?

Sort of? My thinking is: if someone comes up with a test case that the
clang-format developers do not consider to be a bug (and thus don't
intend to "fix" the behavior), then we absolutely should document it
as described. (Also, I think it should be an explicit test case
showing the behavior with a comment about it being expected behavior
-- that helps with communication if someone files a bug about it.) If
no one can come up with a test case that would break, I'd be fine if
we still classified it as a potentially breaking change based on
whitespace vs not whitespace or some other metric, but my bigger point
is that if someone can demonstrate a test case that break that isn't
sufficiently compelling to change the tool to handle better, that
definitely meets the bar for being opt-in.

> Perhaps we'll end up with easy idioms/obvious proofs that apply to whole classes (perhaps, for instance, there's an easy/obvious/reusable proof that applies to most cases of token reordering similar to what Arthur showed?) of changes & that'll keep the burden of proof fairly low?

That seems plausible.

> In any case, I see the language is intentional, and if the folks working on this are comfortable with it, I'll leave it to you folks - thanks for explaining!

Any time, thanks for the discussion!

~Aaron

David Blaikie via llvm-dev

unread,
Aug 30, 2021, 1:03:29 PM8/30/21
to Aaron Ballman, Arthur O'Dwyer, llvm-dev, clang developer list

Yeah, this is the vague/uncertain aspects I'm a bit concerned with: If someone comes along with a test case demonstrating a formatting rules can break certain code and folks classify it as effectively "will not fix because it's not worth it/diminishing returns" we now have to change the rule to be off-by-default (where it was possibly on-by-default before). But then if someone decides to fix that bug, because they have a use-case that makes it worthwhile for them to fix it - now we might flip it back to on-by-default?

If that's unlikely - if most of the time the answers are clear, that the breakage is /super/ expensive to avoid such that no one's likely to ever revisit the decision/have a different cost/benefit tradeoff - and the breakages are obvious/easy to demonstrate (like I said, if there's common types of breakage and so simple breakage patterns that can be used in most cases to demonstrate breakage, etc) then we might avoid these sort of flip/flop cases most of the time.
 

Aaron Ballman via llvm-dev

unread,
Aug 31, 2021, 7:09:02 AM8/31/21
to David Blaikie, Arthur O'Dwyer, llvm-dev, clang developer list

Ah, I see where your concerns come in now. Thank you! I was
envisioning that once we moved an option from on-by-default to
off-by-default, it would most likely never move again. My assumption
is: if the breaking change could be fixed, it would have been fixed or
confirmed as a bug rather than switching to off-by-default. So we
could have some on-by-default checks that DO change code for a while,
but there's a grace period of sorts where clang-format can work on the
fix to the bug. If the bug turns out to not be fixable or no one shows
an interest in fixing it by the time there's a second reporter, then
we switch it to off-by-default at that point and it stays there,
barring extenuating circumstances. Because of the inexact nature of
how clang-format works, I guess my feeling is that once a
transformation has been demonstrated to break code in a way that's
difficult enough we needed to turn the feature off by default, it's
reasonable to assume it's going to break other code we've not
considered yet and so the bar is raised much higher to try to switch
it back to on-by-default, so these flip flops should be vanishingly
rare.

> If that's unlikely - if most of the time the answers are clear, that the breakage is /super/ expensive to avoid such that no one's likely to ever revisit the decision/have a different cost/benefit tradeoff - and the breakages are obvious/easy to demonstrate (like I said, if there's common types of breakage and so simple breakage patterns that can be used in most cases to demonstrate breakage, etc) then we might avoid these sort of flip/flop cases most of the time.

Yeah, I would hope to avoid flip flopping like that.

~Aaron

Reply all
Reply to author
Forward
0 new messages