[llvm-dev] RFC: changing variable naming rules in LLVM codebase

426 views
Skip to first unread message

Michael Platings via llvm-dev

unread,
Feb 7, 2019, 5:11:08 PM2/7/19
to llvm...@lists.llvm.org, nd

TL;DR: change the rule for variable names from UpperCamelCase to lowerCamelCase.


Just to get wider visibility on this, I'm raising this again as an RFC, as suggested by Roman Lebedev.


My original post from last week is here and gives a rationale: http://lists.llvm.org/pipermail/llvm-dev/2019-February/129854.html. There seemed to be general agreement that the status quo is not ideal.


Chris Lattner pointed out that this came up in 2014 as well: http://lists.llvm.org/pipermail/llvm-dev/2014-October/077685.html


I've created a patch to implement the change. Review and comments welcome: https://reviews.llvm.org/D57896

via llvm-dev

unread,
Feb 8, 2019, 10:41:46 AM2/8/19
to Michael....@arm.com, llvm...@lists.llvm.org, n...@arm.com

An excellent first step toward improving our naming style, I'm all for it.  Very much tired of circumlocutions like "Triple TheTriple;".

--paulr

Alex Denisov via llvm-dev

unread,
Feb 11, 2019, 3:23:35 PM2/11/19
to via llvm-dev, n...@arm.com
Just to add another data point: I’m pretty much in love with the lowerCamelCase.
To be more objective: I’ve such code across LLVM’s code base:

class Dragon {};
/// ...
Dragon Dragon;

Which compiles correctly, but debuggers get confused when you ask them to do any actions on Dragon: they could not differentiate between variable and class*.

* not sure if that’s still the case with the recent versions of gdb/lldb, but it was caused some problems while debugging in the past.

> _______________________________________________
> LLVM Developers mailing list
> llvm...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Philip Reames via llvm-dev

unread,
Feb 11, 2019, 6:20:31 PM2/11/19
to Michael Platings, llvm...@lists.llvm.org, nd

I don't care about the convention, but I'm really not sure it's worth the churn which would result in the code base.  The hurtle which needs cleared here is not "is it a better naming style", but "is the disruption implied by changing to the new convention justified".  To be clear, not opposed, just hesitant. 


Philip

Björn Pettersson A via llvm-dev

unread,
Feb 12, 2019, 7:02:50 AM2/12/19
to Michael Platings, llvm...@lists.llvm.org

(Sorry if this subject already has been discussed, but I could not find any clear rules/recommendations.)

 

What would the recommendation be for acronyms (I’ve seen the rule about avoiding them unless they are “well known”,

but sometimes an acronym is useful, and we at least need to have some recommendation for the “well known” ones).

 

Example:

 

    if (ConstantExpr *CE = dyn_cast<ConstantExpr>(V))

      if (CE->getOpcode() == Instruction::GetElementPtr &&

          CE->getOperand(0)->isNullValue()) {

 

In the above example, is the recommendation to use “ce” instead of “CE” now? Or should it be “cE”?

With lowerCamelCase one might think that “cE” is the correct one (although I personally think that one looks quite ugly).

 

Maybe there should be an exception that variable names that start with an acronym still should start with an upper case letter?

 

/Björn

 

 

From: llvm-dev <llvm-dev...@lists.llvm.org> On Behalf Of Michael Platings via llvm-dev
Sent: den 7 februari 2019 23:11
To: llvm...@lists.llvm.org
Cc: nd <n...@arm.com>
Subject: [llvm-dev] RFC: changing variable naming rules in LLVM codebase

 

TL;DR: change the rule for variable names from UpperCamelCase to lowerCamelCase.

Alex Denisov via llvm-dev

unread,
Feb 12, 2019, 4:17:43 PM2/12/19
to Björn Pettersson A, llvm...@lists.llvm.org
I would assume that the proper name in this case is constantExpr, and not CE.
This is not really an acronym, but rather a shortcut taken for some unclear reason.

Krzysztof Parzyszek via llvm-dev

unread,
Feb 12, 2019, 4:28:47 PM2/12/19
to llvm...@lists.llvm.org
The reason is clear: the variable name in such a context doesn't add
anything, since it's obvious what it is. Long names should be used where
meaning needs to be conveyed, otherwise they just obfuscate the code
needlessly.

-Krzysztof

David Greene via llvm-dev

unread,
Feb 12, 2019, 4:47:48 PM2/12/19
to Krzysztof Parzyszek, llvm...@lists.llvm.org
It very much depends on what is following the code snippit. If the
second "if" is guarding a substantial block of code, "constantExpr"
might very well be a good name. Otherwise something like "cExpr" or
"constExpr" might be fine.

In the past when I have seen things like "CE" in the code, it's not
always immediately clear to me what it is. I have to go find the
declaration.

-David

Zachary Turner via llvm-dev

unread,
Feb 12, 2019, 7:47:37 PM2/12/19
to David Greene, llvm...@lists.llvm.org
One additional consideration is that LLDB currently uses underscore_names. It might be worth considering that style on these grounds alone, as that would bring a very large existing part of LLVM into conformance

Chandler Carruth via llvm-dev

unread,
Feb 12, 2019, 7:56:48 PM2/12/19
to Zachary Turner, David A. Greene, llvm-dev
FWIW, I'm pretty strongly opposed to humbleCamelCase. We already use that style so something else. One of the biggest advantages of changing the variable naming convention would be to differentiate variables from other constructs IMO, and that's the nature of many examples here.

Using underscore_names for variables as Zach suggests would,, onhe other hand, be a significant improvement IMO.

That said, all of these changes are pretty dramatic and expensive. I'm personally in favor but you'd want a *lot* of buy in from the community as well as some really good tooling I think to help people update code they're about to make substantial changes to prior to those changes, much like we often do with clamg-format.


Chris Lattner via llvm-dev

unread,
Feb 13, 2019, 4:00:18 AM2/13/19
to Björn Pettersson A, llvm...@lists.llvm.org

On Feb 12, 2019, at 4:02 AM, Björn Pettersson A via llvm-dev <llvm...@lists.llvm.org> wrote:

(Sorry if this subject already has been discussed, but I could not find any clear rules/recommendations.)
 
What would the recommendation be for acronyms (I’ve seen the rule about avoiding them unless they are “well known”,
but sometimes an acronym is useful, and we at least need to have some recommendation for the “well known” ones).
 
Example:
 
    if (ConstantExpr *CE = dyn_cast<ConstantExpr>(V))
      if (CE->getOpcode() == Instruction::GetElementPtr &&
          CE->getOperand(0)->isNullValue()) {
 
In the above example, is the recommendation to use “ce” instead of “CE” now? Or should it be “cE”?
With lowerCamelCase one might think that “cE” is the correct one (although I personally think that one looks quite ugly).

In most examples, you’d use something other than an initialism.  I was the one who started the whole CE thing as a contraction of the type into something short, but there are almost always things that work otherwise.  For example, I’d now write that example as:


    if (ConstantExpr *expr = dyn_cast<ConstantExpr>(value))
      if (expr->getOpcode() == Instruction::GetElementPtr &&
          expr->getOperand(0)->isNullValue()) {

or perhaps ‘constant' instead of ‘expr’.
 
Maybe there should be an exception that variable names that start with an acronym still should start with an upper case letter?

That would also be fine with me - it could push such a debate down the road, and is definitely important for a transition period anyway.

-Chris

Björn Pettersson A via llvm-dev

unread,
Feb 13, 2019, 4:26:40 AM2/13/19
to llvm...@lists.llvm.org

“CE” was just an example of a less common acronym. I think such an acronym

can be used for various purposes, so it is probably quite local in the code what it

stands for.

We also have other (I think more common) acronyms like TLI for

TargetLoweringInfo, or MF for MachineFunction.

 

As Krzystzof were saying “Long names should be used where

meaning needs to be conveyed, otherwise they just obfuscate the code

needlessly.”

Having to write out targetLoweringInfo instead of the short form TLI

everywhere would probably obfuscate things (lots of long lines, line splits)

rather than making things easier to read/understand.

 

 

FWIW, there is a proposal in the RFC, and the coding standard still say that

acronyms may be used. I still haven’t seen any answers about what the

“correct way” would be when using an acronym if the RFC is accepted.

I’ve only seen answers about _not_ using acronyms, but that has not part

of the RFC afaict.

 

My point was that the RFC need to be clear about what to do with the

acronyms if we go down that path.

 

/Björn

David Greene via llvm-dev

unread,
Feb 13, 2019, 9:58:07 AM2/13/19
to Chandler Carruth, llvm-dev
Chandler Carruth <chan...@gmail.com> writes:

> FWIW, I'm pretty strongly opposed to humbleCamelCase. We already use
> that style so something else. One of the biggest advantages of
> changing the variable naming convention would be to differentiate
> variables from other constructs IMO, and that's the nature of many
> examples here.

I guess I don't see a lot of point to being different for difference's
sake. Functions use humbleCamelCase but it's fairly easy to
differentiate a function from a variable at the point of use, unless one
is referencing a function's address in which case it kinda is like a
variable in that context.

humbleCamelCase also has the advantage of removing the weird special
case for lambdas, where in one sense it's a function and in another
sense it's a variable.

What are the current uses of humbleCamelCase that concern you?

In the end I don't really care what the convention is. I'm not sure a
mechanical updating of the source is worth it, as that can make using
git blame slightly inconvenient.

-David

via llvm-dev

unread,
Feb 13, 2019, 11:52:40 AM2/13/19
to chan...@gmail.com, ztu...@google.com, d...@cray.com, llvm...@lists.llvm.org
Chandler wrote:

> FWIW, I'm pretty strongly opposed to humbleCamelCase. We already use that
> style so something else.

Presumably you are equally opposed to RegularCamelCase, because we already
use *that* style for something else.

But really, objecting on the grounds that a given style is already used for
function names is really a very weak argument. IME function names are
*incredibly* *hard* to confuse with anything else, because they *always* have
surrounding syntactic context. Given `TheStuff->fooBar().getThingy()` is it
even conceivable that you might not instantly get that fooBar and getThingy
are methods? Therefore, using the same convention for some other kind of
name is Not Confusing.

OTOH, `TheStuff` comes out of nowhere with no clues to its origin, and *that*
is a barrier to code-reading IME. Even renaming it to `stuff` would help
approximately zero percent. Parameter? Local? Class member? Global? LLVM has
incredibly few globals for other reasons, but using the same convention for
locals and class members is a real problem for code-reading, especially code
operating in methods for classes you're not super familiar with.

I acknowledge that the current RFC doesn't propose a member naming convention
different from other variables, but IMO it really ought to. *That* is the
distinction that would really help in reading unfamiliar code.
--paulr

Zachary Turner via llvm-dev

unread,
Feb 13, 2019, 2:48:54 PM2/13/19
to paul.r...@sony.com, d...@cray.com, llvm...@lists.llvm.org
I want to reiterate the benefit that underscore_names would bring.  To be clear it's not my favorite style, but it does have a very concrete advantage which is that we have a very large subproject already using it.  it doesn't make sense to do a purely aesthetic move that not everyone is going to agree on anyway, when we could do one with actual tangible value.

James Y Knight via llvm-dev

unread,
Feb 13, 2019, 11:02:58 PM2/13/19
to Zachary Turner, David Greene, llvm-dev
There is of course some amount of llvm and clang code which already uses initialLowerCaseNames for variable names too, contrary to the style guide. I don't know how to easily quantify how much.

E.g. ParseGNUAttributes in clang/include/clang/Parse/Parser.h is one I noticed.


_______________________________________________

Nemanja Ivanovic via llvm-dev

unread,
Feb 14, 2019, 8:02:42 AM2/14/19
to James Y Knight, David Greene, llvm-dev
I have to agree with Paul that I think it is rather useful to have a naming convention that distinguishes class members from locals, etc. I'm not sure what that would look like, whether an m prefix for data members would be something others would entertain, but something that makes it clear would probably be useful. To use Paul's example, I think that mTheStuff vs. TheStuff makes it super easy to visually identify what this is. I imagine this wasn't mentioned in this thread or previously adopted because of some good reason I am not aware of.
A more minor point about underscores vs camel case - what I like about camel case is that it generally keeps my fingers on the 3 rows of the keyboard I use the most. From an ergonomics perspective, I find typing a whole lot of underscores a bit unnatural. So since I find camel case easier to type and equally as readable, I would favour it over underscores.

Zachary Turner via llvm-dev

unread,
Feb 14, 2019, 11:03:40 AM2/14/19
to Nemanja Ivanovic, David Greene, llvm-dev
fwiw, LLDB also uses m_ for member variables, so if we were to adopt an m prefix, then in conjunction with lowercase_underscore the entire codebase would be conforming.

Joel E. Denny via llvm-dev

unread,
Feb 14, 2019, 5:01:23 PM2/14/19
to Björn Pettersson A, llvm...@lists.llvm.org
On Wed, Feb 13, 2019 at 4:26 AM Björn Pettersson A via llvm-dev
<llvm...@lists.llvm.org> wrote:
>
> “CE” was just an example of a less common acronym. I think such an acronym
>
> can be used for various purposes, so it is probably quite local in the code what it
>
> stands for.
>
> We also have other (I think more common) acronyms like TLI for
>
> TargetLoweringInfo, or MF for MachineFunction.
>
>
>
> As Krzystzof were saying “Long names should be used where
>
> meaning needs to be conveyed, otherwise they just obfuscate the code
>
> needlessly.”
>
> Having to write out targetLoweringInfo instead of the short form TLI
>
> everywhere would probably obfuscate things (lots of long lines, line splits)
>
> rather than making things easier to read/understand.
>
>
>
>
>
> FWIW, there is a proposal in the RFC, and the coding standard still say that
>
> acronyms may be used. I still haven’t seen any answers about what the
>
> “correct way” would be when using an acronym if the RFC is accepted.
>
> I’ve only seen answers about _not_ using acronyms, but that has not part
>
> of the RFC afaict.
>
>
>
> My point was that the RFC need to be clear about what to do with the
>
> acronyms if we go down that path.

I vote for treating acronyms like any other individual word written in
camel case or snake case. That way, no one has to waste cycles trying
to dodge initial or consecutive acronyms when they make sense.

For example, given acronyms FOO and BAR, the phrase "FOO BAR" becomes
"fooBar", "FooBar", or "foo_bar".

Another occasional bonus is that a dictionary of acronyms isn't
required to automatically convert arbitrary names among the forms.

Some people think UpperCamelCase is ugly for acronyms, such as "Gdb"
or "Llvm". That's the only objection I'm aware of for this approach,
and it annoys me far less than the above issues. Maybe I'm in the
minority.

Joel

>
>
>
> /Björn
>
>
>
> From: Chris Lattner <clat...@nondot.org>
> Sent: den 13 februari 2019 10:00
> To: Björn Pettersson A <bjorn.a.p...@ericsson.com>
> Cc: Michael Platings <Michael....@arm.com>; llvm...@lists.llvm.org
> Subject: Re: [llvm-dev] changing variable naming rules in LLVM codebase
>
>
>
>
>
>
>
> On Feb 12, 2019, at 4:02 AM, Björn Pettersson A via llvm-dev <llvm...@lists.llvm.org> wrote:
>
>
>
> (Sorry if this subject already has been discussed, but I could not find any clear rules/recommendations.)
>
>
>
> What would the recommendation be for acronyms (I’ve seen the rule about avoiding them unless they are “well known”,
>
> but sometimes an acronym is useful, and we at least need to have some recommendation for the “well known” ones).
>
>
>
> Example:
>
>
>
> if (ConstantExpr *CE = dyn_cast<ConstantExpr>(V))
>
> if (CE->getOpcode() == Instruction::GetElementPtr &&
>
> CE->getOperand(0)->isNullValue()) {
>
>
>
> In the above example, is the recommendation to use “ce” instead of “CE” now? Or should it be “cE”?
>
> With lowerCamelCase one might think that “cE” is the correct one (although I personally think that one looks quite ugly).
>
>
>
> In most examples, you’d use something other than an initialism. I was the one who started the whole CE thing as a contraction of the type into something short, but there are almost always things that work otherwise. For example, I’d now write that example as:
>
>
>
>
>
> if (ConstantExpr *expr = dyn_cast<ConstantExpr>(value))
> if (expr->getOpcode() == Instruction::GetElementPtr &&
> expr->getOperand(0)->isNullValue()) {
>
> or perhaps ‘constant' instead of ‘expr’.
>
>
>
> Maybe there should be an exception that variable names that start with an acronym still should start with an upper case letter?
>
>
>
> That would also be fine with me - it could push such a debate down the road, and is definitely important for a transition period anyway.
>
>
>
> -Chris
>
>
>

Mehdi AMINI via llvm-dev

unread,
Feb 14, 2019, 7:32:36 PM2/14/19
to Michael Platings, llvm...@lists.llvm.org, nd
I'm concerned about the internal inconsistency we would live with for (likely) many years if there is not a migration plan to converge the code base toward the new naming convention.
(other than that, changing the convention is fine with me, I don't have much preference here)

-- 
Mehdi
 

Hubert Tong via llvm-dev

unread,
Feb 14, 2019, 8:00:30 PM2/14/19
to Mehdi AMINI, llvm...@lists.llvm.org, nd
I think consistency within a function (for locals) and within a class (for class members) is enough. I don't think that is any worse than the consistency over
int *p;
versus
int* p;
 

-- 
Mehdi
 

Alex Bradbury via llvm-dev

unread,
Feb 15, 2019, 5:15:27 AM2/15/19
to Philip Reames, llvm...@lists.llvm.org, nd
On Mon, 11 Feb 2019 at 23:20, Philip Reames via llvm-dev
<llvm...@lists.llvm.org> wrote:
>
> I don't care about the convention, but I'm really not sure it's worth the churn which would result in the code base. The hurtle which needs cleared here is not "is it a better naming style", but "is the disruption implied by changing to the new convention justified". To be clear, not opposed, just hesitant.

I have the same concern. The whole advantage of a common coding
convention is consistency. There are exceptions, but the vast majority
of LLVM and Clang code I've read does indeed stick to the current
CamelCase convention. Unless there's a plan for conversion then the
practical impact of the naming convention change is that the codebase
will be a muddle of mixed conventions for years. That seems like a
regression, even if camelBack is a better convention.

On a more practical note, if the intent is to move to camelBack I
think it would be worth adding an example to the coding standard for
handling an acronym. e.g. is it the intent that TTI would become tti.

Best,

Alex

Michael Platings via llvm-dev

unread,
Feb 18, 2019, 5:16:08 AM2/18/19
to Alex Bradbury, Philip Reames, llvm...@lists.llvm.org, nd
Regarding a plan for conversion, I'm keen to avoid perfect being the enemy of better.

Privately, people I've spoken with have told me that they're opposed to a large scale conversion. Reasons given include breaking git blame, and creating needless merge conflicts. I might be wrong, but the evidence I've seen suggests that it's going to be very hard to get consensus on a conversion.

So what's worse: inconsistent capitalization or keeping a convention that discourages good naming?

Taking my previous example [1]:

InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width, IC,
&LVL, &CM);

If we imagine that over time it evolves such that 50% of the variables have been renamed to camelBack versions of the type names, then it will look like this:

InnerLoopVectorizer LB(loop, PSE, loopInfo, DT, targetLibraryInfo, TTI,
assumptionCache, ORE, vectorizationFactor.Width, IC,
&loopVectorizationLegality, &CM);

Yes it's inconsistent, but IMHO it still conveys so much more information than the original that the benefit greatly outweighs the cost.

So is the disruption implied by changing to the new convention justified? I think so.

[1] https://github.com/llvm/llvm-project/blob/ec0263027811f48b907224ede0dd24c33c1c7507/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp#L7433

Kristof Beyls via llvm-dev

unread,
Feb 18, 2019, 10:01:12 AM2/18/19
to Nemanja Ivanovic, David Greene, llvm-dev


Op do 14 feb. 2019 om 14:02 schreef Nemanja Ivanovic via llvm-dev <llvm...@lists.llvm.org>:
I have to agree with Paul that I think it is rather useful to have a naming convention that distinguishes class members from locals, etc. I'm not sure what that would look like, whether an m prefix for data members would be something others would entertain, but something that makes it clear would probably be useful. To use Paul's example, I think that mTheStuff vs. TheStuff makes it super easy to visually identify what this is. I imagine this wasn't mentioned in this thread or previously adopted because of some good reason I am not aware of.

+1
 

Sanjoy Das via llvm-dev

unread,
Feb 18, 2019, 2:22:20 PM2/18/19
to Michael Platings, llvm...@lists.llvm.org, nd, Alex Bradbury
On Mon, Feb 18, 2019 at 2:16 AM Michael Platings via llvm-dev
<llvm...@lists.llvm.org> wrote:
> Regarding a plan for conversion, I'm keen to avoid perfect being the enemy of better.
>
> Privately, people I've spoken with have told me that they're opposed to a large scale conversion. Reasons given include breaking git blame, and creating needless merge conflicts. I might be wrong, but the evidence I've seen suggests that it's going to be very hard to get consensus on a conversion.
>
> So what's worse: inconsistent capitalization or keeping a convention that discourages good naming?
>
> Taking my previous example [1]:
>
> InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width, IC,
> &LVL, &CM);
>
> If we imagine that over time it evolves such that 50% of the variables have been renamed to camelBack versions of the type names, then it will look like this:
>
> InnerLoopVectorizer LB(loop, PSE, loopInfo, DT, targetLibraryInfo, TTI,
> assumptionCache, ORE, vectorizationFactor.Width, IC,
> &loopVectorizationLegality, &CM);

I find myself less productive in a codebase with inconsistent styling
like you show above because it is more difficult to "guess" the name
of a variable. E.g. is the LoopInfo parameter named LI or loopInfo?
I'll have to double check to be sure, which adds an extra step.

So maybe a gradual transition plan could be to allow these upper case
acronyms for specific classes? For instance we could start by
designating a set of "common" classes like Function, BasicBlock
DominatorTree, LoopInfo, ScalarEvolution whose instances would
instances still be called F, BB, DT, LI and SE, but mandate all other
classes should use the new camelCase convention to name their
instances? I think this helps readability since the reader will know
that "BB" is always the basic block, "F" is always the function etc.
And I don't have the "guessing" problem I mentioned above since
VectorationFactor instances are always "vectorizationFactor" and
LoopInfo instances are always "LI". We could then try to shrink this
set down over time (or not).

-- Sanjoy

Krzysztof Parzyszek via llvm-dev

unread,
Feb 18, 2019, 3:10:25 PM2/18/19
to llvm...@lists.llvm.org
On 2/18/2019 4:15 AM, Michael Platings via llvm-dev wrote:
> Taking my previous example [1]:
>
> InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width, IC,
> &LVL, &CM);
>
> If we imagine that over time it evolves such that 50% of the variables have been renamed to camelBack versions of the type names, then it will look like this:
>
> InnerLoopVectorizer LB(loop, PSE, loopInfo, DT, targetLibraryInfo, TTI,
> assumptionCache, ORE, vectorizationFactor.Width, IC,
> &loopVectorizationLegality, &CM);

Hold on...

The change from UpperCamel to lowerCamel should be separate from going
from X to somethingOtherName.

It seems like in this example, TLI is changed to targetLibraryInfo for
the purpose of having a name that lowerCamel can be applied to, which
is, at best, backwards.

When a new person sees "TLI", they won't know what it is. When an LLVM
developer sees "TLI" they know exactly what it is, even without any
context. At the same time, a person is new to LLVM for only a certain
period of time, much shorter than the lifetime of the code.

The key to readability is to make the important things easy to see, and
get the unimportant things out of the way. By using completely expanded
names we run the risk of making everything equally "easy to see"...

-Krzysztof

David Greene via llvm-dev

unread,
Feb 18, 2019, 8:14:42 PM2/18/19
to Nemanja Ivanovic, James Y Knight, llvm-dev
That's interesting because I have always thought it strange to name members differently
from other variables. I guess in my mind if a local variable isn't easily identified as such,
it's either declared much too far away from its use (the function is too large, is lacking
proper scoping, whatever) or it is not well-named such as to denote its use. Note that
I specifically write, "denote its use" and not, "denote its scope." Of course the poor
naming could go the other way; naming a member "i," for example.

I don't think I've ever come across a naming convention that treats function parameters
specially. Why? Arguably they are as different from locals as members are, particularly
when it comes to reference parameters.

Slapping an "m_" in front of poorly-named members isn't really going to help much, any
more than slapping an "l_" in front of local variables would.

That said, I am certainly open to being convinced otherwise.

-David

________________________________________
From: Nemanja Ivanovic <nemanj...@gmail.com>
Sent: Thursday, February 14, 2019 7:02:19 AM
To: James Y Knight
Cc: Zachary Turner; David Greene; llvm-dev
Subject: Re: [llvm-dev] changing variable naming rules in LLVM codebase

I have to agree with Paul that I think it is rather useful to have a naming convention that distinguishes class members from locals, etc. I'm not sure what that would look like, whether an m prefix for data members would be something others would entertain, but something that makes it clear would probably be useful. To use Paul's example, I think that mTheStuff vs. TheStuff makes it super easy to visually identify what this is. I imagine this wasn't mentioned in this thread or previously adopted because of some good reason I am not aware of.
A more minor point about underscores vs camel case - what I like about camel case is that it generally keeps my fingers on the 3 rows of the keyboard I use the most. From an ergonomics perspective, I find typing a whole lot of underscores a bit unnatural. So since I find camel case easier to type and equally as readable, I would favour it over underscores.

On Wed, Feb 13, 2019 at 11:03 PM James Y Knight via llvm-dev <llvm...@lists.llvm.org<mailto:llvm...@lists.llvm.org>> wrote:
There is of course some amount of llvm and clang code which already uses initialLowerCaseNames for variable names too, contrary to the style guide. I don't know how to easily quantify how much.

E.g. ParseGNUAttributes in clang/include/clang/Parse/Parser.h is one I noticed.


On Wed, Feb 13, 2019 at 2:49 PM Zachary Turner via llvm-dev <llvm...@lists.llvm.org<mailto:llvm...@lists.llvm.org>> wrote:
I want to reiterate the benefit that underscore_names would bring. To be clear it's not my favorite style, but it does have a very concrete advantage which is that we have a very large subproject already using it. it doesn't make sense to do a purely aesthetic move that not everyone is going to agree on anyway, when we could do one with actual tangible value.

llvm...@lists.llvm.org<mailto:llvm...@lists.llvm.org>


https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
_______________________________________________
LLVM Developers mailing list

llvm...@lists.llvm.org<mailto:llvm...@lists.llvm.org>

Chandler Carruth via llvm-dev

unread,
Feb 18, 2019, 8:23:30 PM2/18/19
to Robinson, Paul, David A. Greene, llvm-dev
On Wed, Feb 13, 2019 at 6:52 AM <paul.r...@sony.com> wrote:
Chandler wrote:

> FWIW, I'm pretty strongly opposed to humbleCamelCase. We already use that
> style so something else.

Presumably you are equally opposed to RegularCamelCase, because we already
use *that* style for something else.

I would be opposed to going through the very significant cost of *changing* the naming convention, merely to end up there.

The convention we already use has a huge advantage of already being relatively consistent.
 

But really, objecting on the grounds that a given style is already used for
function names is really a very weak argument.  IME function names are
*incredibly* *hard* to confuse with anything else, because they *always* have
surrounding syntactic context. Given `TheStuff->fooBar().getThingy()` is it
even conceivable that you might not instantly get that fooBar and getThingy
are methods?  Therefore, using the same convention for some other kind of
name is Not Confusing.

I disagree FWIW... Lambdas (and callables generally) at the least make this ambiguous.

I think the fact that named things are called does not fully disambiguate what they are.

I'm not trying to say that the collision with functions is *as* confusing as that of colliding with types. Merely that both seem confusing. And I find `foo_bar_baz` and `fooBarBaz` basically equivalent[1]. So between those equivalents, I would choose the one with fewer collisions.

[1]: Ok, not quite, but I find this to be a more personal preference and am trying to weight it lower as a consequence. I find functions much more similar to types -- they are manifest properties of the program. I find `FooBarBaz` and `fooBarBaz` to be very similar looking. There is *a* distinction, but it is a minor one. I would prefer a greater visual difference for variables, which `foo_bar_baz` provides.

OTOH, `TheStuff` comes out of nowhere with no clues to its origin, and *that*
is a barrier to code-reading IME.  Even renaming it to `stuff` would help
approximately zero percent. Parameter? Local? Class member? Global?  LLVM has
incredibly few globals for other reasons, but using the same convention for
locals and class members is a real problem for code-reading, especially code
operating in methods for classes you're not super familiar with.

I acknowledge that the current RFC doesn't propose a member naming convention
different from other variables, but IMO it really ought to.  *That* is the
distinction that would really help in reading unfamiliar code.

I could see a lot of utility of this, but I do find it to be orthogonal.
 
--paulr

Chandler Carruth via llvm-dev

unread,
Feb 18, 2019, 8:25:17 PM2/18/19
to Zachary Turner, David Greene, llvm-dev
Also FWIW, I also like m_ prefix for member variables when combined with lowercase_underscore naming.

*IF* it is worth going through the significant cost of switching, and we have a plan to minimize the cost to developers reading inconsistent code.

Chandler Carruth via llvm-dev

unread,
Feb 18, 2019, 8:30:17 PM2/18/19
to James Y Knight, David Greene, llvm-dev
On Wed, Feb 13, 2019 at 6:02 PM James Y Knight via llvm-dev <llvm...@lists.llvm.org> wrote:
There is of course some amount of llvm and clang code which already uses initialLowerCaseNames for variable names too, contrary to the style guide. I don't know how to easily quantify how much.

There is also a decent amount of code in Clang using foo_bar_baz. ::shrug::

I think the amount of all of these pales in comparison to LLDB, and I think generally all of these are not going to significantly change the total cost of transition.

-Chandler

Chandler Carruth via llvm-dev

unread,
Feb 18, 2019, 8:34:06 PM2/18/19
to Sanjoy Das, llvm...@lists.llvm.org, nd, Alex Bradbury
On Mon, Feb 18, 2019 at 9:22 AM Sanjoy Das via llvm-dev <llvm...@lists.llvm.org> wrote:
On Mon, Feb 18, 2019 at 2:16 AM Michael Platings via llvm-dev
<llvm...@lists.llvm.org> wrote:
> Regarding a plan for conversion, I'm keen to avoid perfect being the enemy of better.
>
> Privately, people I've spoken with have told me that they're opposed to a large scale conversion. Reasons given include breaking git blame, and creating needless merge conflicts. I might be wrong, but the evidence I've seen suggests that it's going to be very hard to get consensus on a conversion.
>
> So what's worse: inconsistent capitalization or keeping a convention that discourages good naming?
>
> Taking my previous example [1]:
>
> InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width, IC,
>   &LVL, &CM);
>
> If we imagine that over time it evolves such that 50% of the variables have been renamed to camelBack versions of the type names, then it will look like this:
>
> InnerLoopVectorizer LB(loop, PSE, loopInfo, DT, targetLibraryInfo, TTI,
>   assumptionCache, ORE, vectorizationFactor.Width, IC,
>   &loopVectorizationLegality, &CM);

I find myself less productive in a codebase with inconsistent styling
like you show above because it is more difficult to "guess" the name
of a variable.  E.g. is the LoopInfo parameter named LI or loopInfo?
I'll have to double check to be sure, which adds an extra step.

So maybe a gradual transition plan could be to allow these upper case
acronyms for specific classes?  For instance we could start by
designating a set of "common" classes like Function, BasicBlock
DominatorTree, LoopInfo, ScalarEvolution whose instances would
instances still be called F, BB, DT, LI and SE, but mandate all other
classes should use the new camelCase convention to name their
instances?

FWIW, I agree with the idea of this (the inconsistency is very expensive for me as well).

However, I would find tying it to the type still terribly difficult. I would have a very hard time remembering which classes were part of which set.

Personally, I'd much rather the granularity of either an "interface" (type + methods, or overload set of namespace functions) being consistent, or a "file" being consistent. Both of those would be much cheaper for me to remember and reliably follow.

It would also largely match our *existing* points of divergence.
 

Chandler Carruth via llvm-dev

unread,
Feb 18, 2019, 9:08:39 PM2/18/19
to Krzysztof Parzyszek, llvm-dev
On Mon, Feb 18, 2019 at 10:10 AM Krzysztof Parzyszek via llvm-dev <llvm...@lists.llvm.org> wrote:
On 2/18/2019 4:15 AM, Michael Platings via llvm-dev wrote:
> Taking my previous example [1]:
>
> InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width, IC,
>    &LVL, &CM);
>
> If we imagine that over time it evolves such that 50% of the variables have been renamed to camelBack versions of the type names, then it will look like this:
>
> InnerLoopVectorizer LB(loop, PSE, loopInfo, DT, targetLibraryInfo, TTI,
>    assumptionCache, ORE, vectorizationFactor.Width, IC,
>    &loopVectorizationLegality, &CM);

Hold on...

The change from UpperCamel to lowerCamel should be separate from going
from X to somethingOtherName.

FWIW, I suspect separating the transition of our acronyms from the transition of identifiers with non-acronym words may be an effective way to chip away at the transition cost... Definitely an area that people who really care about this should look at carefully.
 

It seems like in this example, TLI is changed to targetLibraryInfo for
the purpose of having a name that lowerCamel can be applied to, which
is, at best, backwards.

When a new person sees "TLI", they won't know what it is. When an LLVM
developer sees "TLI" they know exactly what it is, even without any
context. At the same time, a person is new to LLVM for only a certain
period of time, much shorter than the lifetime of the code.

The key to readability is to make the important things easy to see, and
get the unimportant things out of the way. By using completely expanded
names we run the risk of making everything equally "easy to see"...

I think this bias towards acronyms (which I used to share) due to keeping things short but still recognizable once people become deeply familiar with LLVM is the wrong prioritization. It does work well for experienced LLVM developers, but I think we should do much more to facilitate and encourage people who are not in this set. While this does come at some cost to highly experienced LLVM developers (reading `library_info` instead of `TLI`), but it seems easily worth it to make the codebase more accessible to new contributors.

Diana Picus via llvm-dev

unread,
Feb 19, 2019, 5:12:51 AM2/19/19
to Chandler Carruth, llvm-dev
On Tue, 19 Feb 2019 at 03:08, Chandler Carruth via llvm-dev

<llvm...@lists.llvm.org> wrote:
>
> On Mon, Feb 18, 2019 at 10:10 AM Krzysztof Parzyszek via llvm-dev <llvm...@lists.llvm.org> wrote:
>>
>> On 2/18/2019 4:15 AM, Michael Platings via llvm-dev wrote:
>> > Taking my previous example [1]:
>> >
>> > InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width, IC,
>> > &LVL, &CM);
>> >
>> > If we imagine that over time it evolves such that 50% of the variables have been renamed to camelBack versions of the type names, then it will look like this:
>> >
>> > InnerLoopVectorizer LB(loop, PSE, loopInfo, DT, targetLibraryInfo, TTI,
>> > assumptionCache, ORE, vectorizationFactor.Width, IC,
>> > &loopVectorizationLegality, &CM);
>>
>> Hold on...
>>
>> The change from UpperCamel to lowerCamel should be separate from going
>> from X to somethingOtherName.
>
>
> FWIW, I suspect separating the transition of our acronyms from the transition of identifiers with non-acronym words may be an effective way to chip away at the transition cost... Definitely an area that people who really care about this should look at carefully.
>
>>
>>
>> It seems like in this example, TLI is changed to targetLibraryInfo for
>> the purpose of having a name that lowerCamel can be applied to, which
>> is, at best, backwards.
>>
>> When a new person sees "TLI", they won't know what it is. When an LLVM
>> developer sees "TLI" they know exactly what it is, even without any
>> context.

I wouldn't downplay context, my first thought when I see TLI is
TargetLoweringInfo (of course, then I see Vectorizer around and get
back on track, but when it's spelled out that reflex just doesn't kick
in to begin with).

>> At the same time, a person is new to LLVM for only a certain
>> period of time, much shorter than the lifetime of the code.

True, but their impression of LLVM when they are new to it may
influence their decision to stick with it or not.

>> The key to readability is to make the important things easy to see, and
>> get the unimportant things out of the way. By using completely expanded
>> names we run the risk of making everything equally "easy to see"...
>
>
> I think this bias towards acronyms (which I used to share) due to keeping things short but still recognizable once people become deeply familiar with LLVM is the wrong prioritization. It does work well for experienced LLVM developers, but I think we should do much more to facilitate and encourage people who are not in this set. While this does come at some cost to highly experienced LLVM developers (reading `library_info` instead of `TLI`), but it seems easily worth it to make the codebase more accessible to new contributors.

+1 for making things accessible to new contributors. Note that this
also works for people that aren't new to LLVM per se, but may be new
to a given part of the codebase.

I'm also not sure it comes at such a high cost to experienced
developers, since you'd get used to seeing 'library_info' after a
while. It's not like your brain processes every single letter to
recognize a word. Personally, when I look through code I decode the
acronyms to the full name in my head anyway (it's not something I do
on purpose, it's just how it works for me).

Just my 2c.
Diana

James Henderson via llvm-dev

unread,
Feb 19, 2019, 5:34:53 AM2/19/19
to David Greene, llvm-dev
+1 to David's statement that naming members and locals differently seems strange to me. I can't think of a case where it's been important for me to distinguish between a local and class member and it hasn't already been clear at a glance/click etc. Frankly, I just find turning things into non-English words (e.g. due to a prefix/suffix) strange and makes it harder for me to actually read/talk about code with people in person etc (e.g. try talking about the theoretical member variable mThing/m_thing with someone).

Danila Malyutin via llvm-dev

unread,
Feb 19, 2019, 6:08:02 AM2/19/19
to llvm-dev

For what it’s worth, I’ve definitely been in situation where I’ve wondered whether some variable is a member or not (i.e. local or function argument). Especially if I wasn’t using IDE and was looking at an unfamiliar code.
It usually wasn’t hard to resolve but it incurred a “context switch” penalty.

 

Never personally had a problem with short one letter_ prefixes. You don’t need to pronounce them or treat them like part of the name. Similarly to how you don’t need to underscores.

That’s obviously just my opinion (and I’m just starting to get familiar with LLVM code base). Also, pervasiveness of acronyms was bigger obstacle in reading new code, in my experience.

 

--

Danila

Michael Platings via llvm-dev

unread,
Feb 19, 2019, 6:27:51 AM2/19/19
to Chandler Carruth, llvm...@lists.llvm.org, nd
> FWIW, I suspect separating the transition of our acronyms from the transition
> of identifiers with non-acronym words may be an effective way to chip away at
> the transition cost... Definitely an area that people who really care about
> this should look at carefully.

If it makes for an easier transition then I'd be happy to go with upper case acronyms and camelBack for non-acronyms. I've updated https://reviews.llvm.org/D57896 accordingly.

-Michael

Krzysztof Parzyszek via llvm-dev

unread,
Feb 19, 2019, 9:49:16 AM2/19/19
to Diana Picus, Chandler Carruth, llvm-dev
On 2/19/2019 4:12 AM, Diana Picus wrote:
> +1 for making things accessible to new contributors. Note that this
> also works for people that aren't new to LLVM per se, but may be new
> to a given part of the codebase.

The issue here is that neither of us is a new contributor, but we're
trying to guess what it would be like for someone new. It may seem that
long names make it easier, but when I started with LLVM I actually found
the naming convention (and the use of abbreviations) very appealing.

I was recently involved in another project, and the most important thing
for me was to identify the logical structure of it: what components it's
made of, what's the role of each component, etc. Reading the lines of
code in detail was secondary. In my experience, accessibility of a
project is directly related to how easy it is to understand what you
need to do (conceptually) when you want to implement some changes. In
that sense LLVM does really well, but it's more of a function of its design.

I'd be careful not to overestimate the impact of the naming convention
on project's accessibility. Barring extreme cases, I'd even suggest that
it's one of the easier things to get used to.

-Krzysztof

Zachary Turner via llvm-dev

unread,
Feb 19, 2019, 10:24:51 AM2/19/19
to Michael Platings, llvm...@lists.llvm.org, nd, Alex Bradbury
On Mon, Feb 18, 2019 at 2:16 AM Michael Platings via llvm-dev <llvm...@lists.llvm.org> wrote:
Regarding a plan for conversion, I'm keen to avoid perfect being the enemy of better.

It seems a bit early to discuss conversion given there’s not consensus on a style.  For example:

If we imagine that over time it evolves such that 50% of the variables have been renamed to camelBack versions of the type names, then it will look like this:

This comment seems to assume that camelBack is the agreed upon style and all we need to do now is move forward, and I do not believe that to be the case

Alex Bradbury via llvm-dev

unread,
Feb 19, 2019, 10:44:01 AM2/19/19
to Zachary Turner, llvm...@lists.llvm.org, nd
On Tue, 19 Feb 2019 at 15:24, Zachary Turner <ztu...@google.com> wrote:
> On Mon, Feb 18, 2019 at 2:16 AM Michael Platings via llvm-dev <llvm...@lists.llvm.org> wrote:
>>
>> Regarding a plan for conversion, I'm keen to avoid perfect being the enemy of better.
>
>
> It seems a bit early to discuss conversion given there’s not consensus on a style. For example:

I see it a bit differently. The first question is "should we change
the LLVM naming conventions". I view the plan for conversion as
essential to answering this question - IMHO if we're going to live
with mixed styles for years (which would be the case if there were no
concerted conversion effort) then the advantages of changing naming
convention are outweighed by the disadvantages by quite some way. So
while I appreciate the desire to separate concerns, I find it
difficult to do so in this case.

Best,

Alex

Michael Platings via llvm-dev

unread,
Feb 19, 2019, 10:46:02 AM2/19/19
to Zachary Turner, llvm...@lists.llvm.org, nd, Alex Bradbury
> It seems a bit early to discuss conversion given there’s not consensus on a style.

My reason for pushing the conversion conversation is that if it's decided that we want to clang-tidy everything then we have much more leeway in which style we use. If we allow conversion to happen organically, then for that to work we need to take into account the current style coexisting with the new style, whatever that may be.

My fear is that if we insist on a big-bang transition then there will be sufficient opposition that we end up making no change at all. For that reason I favour allowing conversion to happen organically.

> This comment seems to assume that camelBack is the agreed upon style and all we need to do now is move forward

No, I don't believe it's agreed upon, sorry to give that impression. I see strong views on both sides. I just picked one for arguments sake.

Bruno Ricci via llvm-dev

unread,
Feb 19, 2019, 11:09:35 AM2/19/19
to llvm...@lists.llvm.org, cfe...@lists.llvm.org
> The issue here is that neither of us is a new contributor, but we're
> trying to guess what it would be like for someone new. It may seem that
> long names make it easier, but when I started with LLVM I actually found
> the naming convention (and the use of abbreviations) very appealing.

As a somewhat new contributor to clang, I agree with the above. What the
abbreviations means is very easy to pick up and it let me concentrate on
the structure of the code.

But independently of this I would like to express the fact that a change
of naming convention is very disruptive for what seems to me a minor gain.
Yes there are some inconsistencies across all of the sub-projects but
is that really such a big problem ? The two proposed solutions (big-bang
change and gradually over time) both in general would be quite disruptive
and would be annoying with git blame.

Bruno

Mehdi AMINI via llvm-dev

unread,
Feb 19, 2019, 11:26:02 AM2/19/19
to Alex Bradbury, llvm...@lists.llvm.org, nd
On Tue, Feb 19, 2019 at 10:44 AM Alex Bradbury via llvm-dev <llvm...@lists.llvm.org> wrote:
On Tue, 19 Feb 2019 at 15:24, Zachary Turner <ztu...@google.com> wrote:
> On Mon, Feb 18, 2019 at 2:16 AM Michael Platings via llvm-dev <llvm...@lists.llvm.org> wrote:
>>
>> Regarding a plan for conversion, I'm keen to avoid perfect being the enemy of better.
>
>
> It seems a bit early to discuss conversion given there’s not consensus on a style.  For example:

I see it a bit differently. The first question is "should we change
the LLVM naming conventions". I view the plan for conversion as
essential to answering this question - IMHO if we're going to live
with mixed styles for years (which would be the case if there were no
concerted conversion effort) then the advantages of changing naming
convention are outweighed by the disadvantages by quite some way

Not much to add, just a +1 : this is exactly my view as well.

-- 
Mehdi

Michael Platings via llvm-dev

unread,
Feb 19, 2019, 11:42:22 AM2/19/19
to Alex Bradbury, Zachary Turner, llvm...@lists.llvm.org, nd
Alex Bradbury <a...@lowrisc.org> wrote:
> IMHO if we're going to live
> with mixed styles for years (which would be the case if there were no
> concerted conversion effort) then the advantages of changing naming
> convention are outweighed by the disadvantages by quite some way

The specific concern I've seen raised is that when writing code you'll need to check which form of variable name is used in a particular scope.

Clearly this is a cost, but given that code is read many times more than it is written, I submit that this as a cost worth bearing for the sake of making it easy to write good variable names.

If you see other costs then I'd be interested to know what those are.

Michael Platings via llvm-dev

unread,
Feb 20, 2019, 7:13:44 AM2/20/19
to llvm...@lists.llvm.org, nd
There's a tool git-hyper-blame [1] that's designed to work well with mass-refactoring commits.

It could aid a big-bang transition if we added a .git-blame-ignore-revs file. It could also include IDs of previous NFC commits.

Does this make a big bang refactoring more palatable?

[1] https://commondatastorage.googleapis.com/chrome-infra-docs/flat/depot_tools/docs/html/git-hyper-blame.html

Chris Lattner via llvm-dev

unread,
Feb 20, 2019, 11:50:29 AM2/20/19
to Krzysztof Parzyszek, llvm...@lists.llvm.org

> On Feb 18, 2019, at 12:09 PM, Krzysztof Parzyszek via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> On 2/18/2019 4:15 AM, Michael Platings via llvm-dev wrote:
>> Taking my previous example [1]:
>> InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width, IC,
>> &LVL, &CM);
>> If we imagine that over time it evolves such that 50% of the variables have been renamed to camelBack versions of the type names, then it will look like this:
>> InnerLoopVectorizer LB(loop, PSE, loopInfo, DT, targetLibraryInfo, TTI,
>> assumptionCache, ORE, vectorizationFactor.Width, IC,
>> &loopVectorizationLegality, &CM);
>
> Hold on...
>
> The change from UpperCamel to lowerCamel should be separate from going from X to somethingOtherName.
>
> It seems like in this example, TLI is changed to targetLibraryInfo for the purpose of having a name that lowerCamel can be applied to, which is, at best, backwards.

I completely agree. I don’t see why “tli” would be problematic. I don’t think that “targetLibraryInfo” would make the code easier to read, write, or maintain.

-Chris

Chris Lattner via llvm-dev

unread,
Feb 20, 2019, 11:57:48 AM2/20/19
to Michael Platings, llvm...@lists.llvm.org, nd


On Feb 19, 2019, at 3:27 AM, Michael Platings via llvm-dev <llvm...@lists.llvm.org> wrote:

FWIW, I suspect separating the transition of our acronyms from the transition
of identifiers with non-acronym words may be an effective way to chip away at
the transition cost... Definitely an area that people who really care about
this should look at carefully.

If it makes for an easier transition then I'd be happy to go with upper case acronyms and camelBack for non-acronyms. I've updated https://reviews.llvm.org/D57896 accordingly.

It isn’t clear to me that people are strongly in favor of upper case initialisms, it seems that people are opposed to changing TLI to “targetLibraryInfo”, which I can totally understand.

Communities that use these proposed conventions already include guidelines like this:

Follow case conventions. Names of types and protocols are UpperCamelCase. Everything else is lowerCamelCase.

Acronyms and initialisms that commonly appear as all upper case in American English should be uniformly up- or down-cased according to case conventions:

var utf8Bytes: [UTF8.CodeUnit]
var isRepresentableAsASCII = true
var userSMTPServer: SecureSMTPServer

Other acronyms should be treated as ordinary words:

var radarDetector: RadarScanner
var enjoysScubaDiving = true
Among other nice things, this specifically means that things like “tli” are ok as is.



I think the significant point you raise is how to handle the transition.  I think that we can handle this in a soft way by saying something like “all new code should follow the conventions”, “old code shouldn’t be mass migrated”, “significant changes to old code can do local changes to improve adherence to convention”.  (or something like that).  This case is far from the only thing in LLVM that is inconsistent, so that seems like a reasonable global policy independent of the capitalization discussion.

-Chris

Chris Lattner via llvm-dev

unread,
Feb 20, 2019, 12:07:20 PM2/20/19
to Alex Bradbury, llvm...@lists.llvm.org, nd

> On Feb 19, 2019, at 7:43 AM, Alex Bradbury via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> On Tue, 19 Feb 2019 at 15:24, Zachary Turner <ztu...@google.com> wrote:
>> On Mon, Feb 18, 2019 at 2:16 AM Michael Platings via llvm-dev <llvm...@lists.llvm.org> wrote:
>>>
>>> Regarding a plan for conversion, I'm keen to avoid perfect being the enemy of better.
>>
>>
>> It seems a bit early to discuss conversion given there’s not consensus on a style. For example:
>
> I see it a bit differently. The first question is "should we change
> the LLVM naming conventions". I view the plan for conversion as
> essential to answering this question - IMHO if we're going to live
> with mixed styles for years (which would be the case if there were no
> concerted conversion effort) then the advantages of changing naming
> convention are outweighed by the disadvantages by quite some way. So
> while I appreciate the desire to separate concerns, I find it
> difficult to do so in this case.

I absolutely respect that position and concern, but there are other factors to consider. Let’s assume that we all agreed that a change was the right thing to do, and we were only concerned about the transition (just to simplify the discussion):

1) Transition cost can be used to argue against *any* global improvement to an existing codebase. The GCC community used this argument for many years arguing against moving to C++ from C. It is true that there will be a time of transition, but at some point in time, you just decide that the cost of a global mechanical change is worth it and you get to consistency.

2) Tremendous amounts of new code is being written, and lots of existing code is being touched. This provides opportunities to incrementally migrate.

3) The cost of inconsistency is low, because the affected declarations typically have local scope. Even the affected globals in LLVM are typically static within a file. This means that you don’t need a global index to go what is going on. Variables and functions are also often named quite different (e.g. functions often use imperative verbs).

4) The LLVM community and project is growing through new subprojects, new targets, and other new things, and holding back the “correct” thing in new subprojects because of legacy code in other parts of the project doesn’t make sense.


That said, I understand that there is still controversy about whether making a change would improve the project, I just wanted to point out that if we converge on that, then we should consider what LLVM looks like 10 years from now, not just what it looks like 6 months from now. I for one don’t want to see LLVM stagnate, slow, and suffer because of legacy concerns.

-Chris

Alex Bradbury via llvm-dev

unread,
Feb 21, 2019, 9:28:13 AM2/21/19
to Chris Lattner, llvm...@lists.llvm.org, nd

Thanks for elaborating on your thinking Chris. The point about new
subprojects etc is a particularly strong one.

It wasn't clear from my previous email, but my general inclination is
to pursue global improvements such as this, and to combine them with a
somewhat aggressive transition plan in order for the benefits to be
reaped sooner (and to minimise the cost of inconsistency).

Best,

Alex

via llvm-dev

unread,
Feb 22, 2019, 10:59:23 AM2/22/19
to a...@lowrisc.org, clat...@nondot.org, llvm...@lists.llvm.org, n...@arm.com
I had posted something in the code review but Chris suggested doing it
here instead, which makes sense. Also I have to remember that the
discussion is specifically about spelling variables, not changing any
other spelling conventions.

Looking at names of "variables" there's reasonable support for making
them visually more distinct from other kinds of names. Regarding
making data-member names distinct from local variables, some people
don't see why it should matter, others find it helpful; given this
neutral-to-helpful spectrum, going with the kind-of helpful convention
seems preferable.

So:

- Local variables and formal parameters should be lower_case, with
one exception: Variables/parameters that have lambda/function
type should follow the function-name spelling rules.
- Class data members should have an "m_" prefix, so m_lower_case.
- Initialisms and other abbreviations would be considered words for
this purpose, so we have names such as:
tli // Local variable for TargetLoweringInfo
m_cgm // Data member for CodeGenModule
- I don't have a good suggestion for file-static/global variables.
Some people have suggested a "g_" prefix for globals, or possibly
an "s_" prefix for class-static data.

Regarding the transition:

Some people have worried that the churn will cause blame issues.
I respectfully point out that in my own archaeology I have to deal
with lots of clang-format/indentation/other random semantically
meaningless refactoring, this is just one more. Also the point is
not to optimize for git-blame but to optimize for reading what is
there at the moment.

A more focused and shorter transition period will create a lot of
short-term churn but get us to the good endpoint sooner. Doing
conversions per-file or per-class (rather than per-function [too
small] or per-library [too big]) are probably the way to go.
Given we are changing the names used for _data_, and we try to
practice good data-hiding, the impact of the conversion of any
given class *ought* to be reasonably confined.

If someone can make clang-tidy help with this, that's awesome.

I'm almost afraid to make the next suggestion, but here goes:
In more complicated/wide-impact cases, it would be possible to
stage a data-member name conversion into "small-bang" iterations
using a C++ tactic like this:
class Foo {
int m_bar; // The renamed member.
int &Bar = m_bar; // TEMPORARY alias using the old name.
};
This would have to be done sparingly and for good reason, such as
when the names are known across many components/subprojects and
doing them all at once would be really too much. Someone would
have to commit to getting it all done and removing the aliases in
a reasonably short period of time. Needing to do this trick would
be (IMO) strong evidence of poor software design and a place to
focus some refactoring effort.

HTH,
--paulr

Jonas Toth via llvm-dev

unread,
Feb 22, 2019, 2:24:44 PM2/22/19
to paul.r...@sony.com, a...@lowrisc.org, clat...@nondot.org, llvm...@lists.llvm.org, n...@arm.com

> If someone can make clang-tidy help with this, that's awesome.
>
Drive By: There is clang-tidy's 'readability-identifier-naming' for
bulk-renaming and clang-rename (maybe clangd can help too) for manual
changes.

Especially clang-tidy-diff could be used to enforce at least new code
and a few modifications to these
tools might help doing the transition.

David Greene via llvm-dev

unread,
Feb 22, 2019, 2:40:31 PM2/22/19
to paul.r...@sony.com, llvm...@lists.llvm.org, n...@arm.com, a...@lowrisc.org
via llvm-dev <llvm...@lists.llvm.org> writes:

> Looking at names of "variables" there's reasonable support for making
> them visually more distinct from other kinds of names. Regarding
> making data-member names distinct from local variables, some people
> don't see why it should matter, others find it helpful; given this
> neutral-to-helpful spectrum, going with the kind-of helpful convention
> seems preferable.

There are drawbacks to the "m_" prefix notation, though. It makes it
more work to move entities between class and local scope. It's extra
typing, it's hard to pronounce, etc.

Now, these may be considered trivial complaints, but given that some of
them have already been raised, I think we need some discussion about it.
If there's overwhelming support for "m_" (and/or "g_" etc.) then fine,
but if there's about even opinions either way, we ought to go with the
status quo, at least for now.

> So:
>
> - Local variables and formal parameters should be lower_case, with
> one exception: Variables/parameters that have lambda/function
> type should follow the function-name spelling rules.

"lower_case" is a pretty drastic change from CamelCase and camelCase.
So far the only argument for it I've seen is, "lldb uses it all over the
place." Having one subproject drive the standard for everything else
seems backward. It's certainly possible I missed other opinions on this
though.

> - Initialisms and other abbreviations would be considered words for
> this purpose, so we have names such as:
> tli // Local variable for TargetLoweringInfo
> m_cgm // Data member for CodeGenModule

I actually think we need something stronger. Acronyms should be
discouraged unless it's absolutely clear what it is. As has been
pointed out, "tli" means at least two common and very different things:
"TargetLoweringInfo" and "TargetLibraryInfo."

> - I don't have a good suggestion for file-static/global variables.
> Some people have suggested a "g_" prefix for globals, or possibly
> an "s_" prefix for class-static data.

Is this really helpful or does it just make the code more difficult to
work with? I don't know since I've never used such conventions before.
But I'm hesitant to introduce really strong binding of scopes to names
because it make refactoring more tedious/difficult.

> Some people have worried that the churn will cause blame issues.
> I respectfully point out that in my own archaeology I have to deal
> with lots of clang-format/indentation/other random semantically
> meaningless refactoring, this is just one more. Also the point is
> not to optimize for git-blame but to optimize for reading what is
> there at the moment.

I want to call out that we already have a policy on this in the current
coding standards:

Our long term goal is for the entire codebase to follow the
convention, but we explicitly do not want patches that do large-scale
reformatting of existing code.

I could argue things either way, but we should realize that a mechanical
reformatting goes explicitly against current stated (and presumably
previously-agreed-upon) policy.

-David

Krzysztof Parzyszek via llvm-dev

unread,
Feb 22, 2019, 3:11:15 PM2/22/19
to llvm...@lists.llvm.org
On 2/22/2019 1:40 PM, David Greene via llvm-dev wrote:
> I actually think we need something stronger. Acronyms should be
> discouraged unless it's absolutely clear what it is.

I think we should avoid terms that are both strict and subjective at the
same time. (How do you test if something is absolutely clear?) Instead
we should delegate resolving such ambiguities to the common sense of the
author and the reviewers.

-Krzysztof

Michael Kruse via llvm-dev

unread,
Feb 22, 2019, 3:38:57 PM2/22/19
to Robinson, Paul, llvm-dev
Am Fr., 22. Feb. 2019 um 09:59 Uhr schrieb via llvm-dev
<llvm...@lists.llvm.org>:

> Looking at names of "variables" there's reasonable support for making
> them visually more distinct from other kinds of names. Regarding
> making data-member names distinct from local variables, some people
> don't see why it should matter, others find it helpful; given this
> neutral-to-helpful spectrum, going with the kind-of helpful convention
> seems preferable.
>
> So:
>
> - Local variables and formal parameters should be lower_case, with
> one exception: Variables/parameters that have lambda/function
> type should follow the function-name spelling rules.
> - Class data members should have an "m_" prefix, so m_lower_case.
> - Initialisms and other abbreviations would be considered words for
> this purpose, so we have names such as:
> tli // Local variable for TargetLoweringInfo
> m_cgm // Data member for CodeGenModule
> - I don't have a good suggestion for file-static/global variables.
> Some people have suggested a "g_" prefix for globals, or possibly
> an "s_" prefix for class-static data.

If the goal of prefixes is to visually distinguish different scopes:
Some IDEs/editors support semantic code highlighting [1], i.e. assign
different colors to local/global/static variables/members/parameters.
I know at least the following support this natively: KDevelop [1],
Visual Studio [2], Eclipse CDT [3]

Michael

[1] https://zwabel.wordpress.com/2009/01/08/c-ide-evolution-from-syntax-highlighting-to-semantic-highlighting/
[2] https://devblogs.microsoft.com/cppblog/first-look-at-the-new-c-ide-productivity-features-in-visual-studio-11/
[3] https://help.eclipse.org/kepler/index.jsp?topic=%2Forg.eclipse.cdt.doc.user%2Freference%2Fcdt_u_c_editor_color.htm

Chandler Carruth via llvm-dev

unread,
Feb 22, 2019, 3:39:27 PM2/22/19
to Robinson, Paul, llvm-dev, nd, Alex Bradbury
On Fri, Feb 22, 2019 at 5:59 AM via llvm-dev <llvm...@lists.llvm.org> wrote:
- Local variables and formal parameters should be lower_case, with
  one exception: Variables/parameters that have lambda/function
  type should follow the function-name spelling rules.

I really dislike this exception. Callable objects are *objects* and locally scoped, I would much prefer they look like variables.

Also, what about callable objects that aren't lambdas? Or that use operator() for something other than emulating a function call?

I think the simple rule is superior. 

- Initialisms and other abbreviations would be considered words for
  this purpose, so we have names such as:
    tli   // Local variable for TargetLoweringInfo
    m_cgm // Data member for CodeGenModule
 
Agreed.
 
- I don't have a good suggestion for file-static/global variables.
  Some people have suggested a "g_" prefix for globals, or possibly
  an "s_" prefix for class-static data.

These are rare enough that I'm not sure we need special rules for naming them. They should also should typically be wrapped up in an actual API limiting how widely they are referenced.


Regarding the transition:

Some people have worried that the churn will cause blame issues.
I respectfully point out that in my own archaeology I have to deal
with lots of clang-format/indentation/other random semantically
meaningless refactoring, this is just one more.  Also the point is
not to optimize for git-blame but to optimize for reading what is
there at the moment.

A more focused and shorter transition period will create a lot of
short-term churn but get us to the good endpoint sooner.  Doing
conversions per-file or per-class (rather than per-function [too
small] or per-library [too big]) are probably the way to go. 
Given we are changing the names used for _data_, and we try to
practice good data-hiding, the impact of the conversion of any
given class *ought* to be reasonably confined.

I generally agree with this strategy. That said, I would still do it somewhat lazily rather than eagerly, but batched much as you're suggesting.
 

If someone can make clang-tidy help with this, that's awesome.

I'm almost afraid to make the next suggestion, but here goes:
In more complicated/wide-impact cases, it would be possible to
stage a data-member name conversion into "small-bang" iterations
using a C++ tactic like this:
    class Foo {
      int m_bar; // The renamed member.
      int &Bar = m_bar; // TEMPORARY alias using the old name.
    };
This would have to be done sparingly and for good reason, such as
when the names are known across many components/subprojects and
doing them all at once would be really too much. Someone would
have to commit to getting it all done and removing the aliases in
a reasonably short period of time.  Needing to do this trick would
be (IMO) strong evidence of poor software design and a place to
focus some refactoring effort.

Honestly, I don't think we need to do this. We routinely make wide-ranging API updates. If we need to do that, we do that.

What we *should* do is encourage anyone that before they decide to do this to discuss it and see if there is a good way to hide this usage of a variable name behind a better API and make *that* widespread change instead. Then the name change is more local again.

Chandler Carruth via llvm-dev

unread,
Feb 22, 2019, 3:42:28 PM2/22/19
to David Greene, llvm...@lists.llvm.org, n...@arm.com, a...@lowrisc.org
On Fri, Feb 22, 2019 at 9:40 AM David Greene via llvm-dev <llvm...@lists.llvm.org> wrote:
> So:
>
> - Local variables and formal parameters should be lower_case, with
>   one exception: Variables/parameters that have lambda/function
>   type should follow the function-name spelling rules.

"lower_case" is a pretty drastic change from CamelCase and camelCase.
So far the only argument for it I've seen is, "lldb uses it all over the
place."  Having one subproject drive the standard for everything else
seems backward.  It's certainly possible I missed other opinions on this
though.

This is not how I described my preference for it. I prefer it generally and find it substantially easier to read code using it.

That said, I think the fact that LLDB uses it does contribute two motivations:
1) It does show some subset of the community is familiar with it and has found it to be a reasonably good convention for their usage.
2) It removes some cost of transition.

I wouldn't make the decision *because* of this, but I also don't think we should ignore these points.

via llvm-dev

unread,
Feb 22, 2019, 3:46:11 PM2/22/19
to d...@cray.com, llvm...@lists.llvm.org, n...@arm.com, a...@lowrisc.org

> -----Original Message-----
> From: David Greene [mailto:d...@cray.com]
> Sent: Friday, February 22, 2019 2:40 PM
> To: Robinson, Paul
> Cc: a...@lowrisc.org; clat...@nondot.org; llvm...@lists.llvm.org;
> n...@arm.com
> Subject: Re: [llvm-dev] RFC: changing variable naming rules in LLVM
> codebase
>

> via llvm-dev <llvm...@lists.llvm.org> writes:
>
> > Looking at names of "variables" there's reasonable support for making
> > them visually more distinct from other kinds of names. Regarding
> > making data-member names distinct from local variables, some people
> > don't see why it should matter, others find it helpful; given this
> > neutral-to-helpful spectrum, going with the kind-of helpful convention
> > seems preferable.
>
> There are drawbacks to the "m_" prefix notation, though. It makes it
> more work to move entities between class and local scope. It's extra
> typing, it's hard to pronounce, etc.

IMO these are pretty feeble objections. The "m_" is silent, for example.
Moving entities between class and local scope isn't all that common, and
the small bit of extra typing provides clarity to your colleagues who
like to know what data is ephemeral and what is persistent. It's equally
true that using real words instead of shorthand abbreviations is more
typing, but we accept that for the clarity it brings.

>
> Now, these may be considered trivial complaints, but given that some of
> them have already been raised, I think we need some discussion about it.
> If there's overwhelming support for "m_" (and/or "g_" etc.) then fine,
> but if there's about even opinions either way, we ought to go with the
> status quo, at least for now.

I don't think "overwhelming" is an appropriate barrier to adoption. I
hear people say "yes, this would make it easier for me to read the code"
and on the other side "meh, I don't see the benefit"; however I do *not*
see anyone saying "No, this really would interfere with my code-reading."

>
> > So:
> >
> > - Local variables and formal parameters should be lower_case, with
> > one exception: Variables/parameters that have lambda/function
> > type should follow the function-name spelling rules.
>
> "lower_case" is a pretty drastic change from CamelCase and camelCase.
> So far the only argument for it I've seen is, "lldb uses it all over the
> place." Having one subproject drive the standard for everything else
> seems backward. It's certainly possible I missed other opinions on this
> though.

You have. For example, here's a "significant improvement" comment:
http://lists.llvm.org/pipermail/llvm-dev/2019-February/130214.html

The debate suffers from being spread across multiple dev threads plus
a Phabricator review. Not optimal, but that's what it is.

>
> > - Initialisms and other abbreviations would be considered words for
> > this purpose, so we have names such as:
> > tli // Local variable for TargetLoweringInfo
> > m_cgm // Data member for CodeGenModule
>
> I actually think we need something stronger. Acronyms should be
> discouraged unless it's absolutely clear what it is. As has been
> pointed out, "tli" means at least two common and very different things:
> "TargetLoweringInfo" and "TargetLibraryInfo."

That's a separate discussion and would be something to pursue regardless
of the other spelling conventions. Let's not drag that in here. My
inclusion of this is based on "we find these things in names today, and
they aren't 'words' so what do we do with them" not on any assessment
that they are good or bad on their own merits.

>
> > - I don't have a good suggestion for file-static/global variables.
> > Some people have suggested a "g_" prefix for globals, or possibly
> > an "s_" prefix for class-static data.
>
> Is this really helpful or does it just make the code more difficult to
> work with? I don't know since I've never used such conventions before.
> But I'm hesitant to introduce really strong binding of scopes to names
> because it make refactoring more tedious/difficult.

I have, but in a project that had rather more global data than is good
for anyone. I put it out there for discussion, and thanks for doing
exactly that!

>
> > Some people have worried that the churn will cause blame issues.
> > I respectfully point out that in my own archaeology I have to deal
> > with lots of clang-format/indentation/other random semantically
> > meaningless refactoring, this is just one more. Also the point is
> > not to optimize for git-blame but to optimize for reading what is
> > there at the moment.
>
> I want to call out that we already have a policy on this in the current
> coding standards:
>
> Our long term goal is for the entire codebase to follow the
> convention, but we explicitly do not want patches that do large-scale
> reformatting of existing code.
>
> I could argue things either way, but we should realize that a mechanical
> reformatting goes explicitly against current stated (and presumably
> previously-agreed-upon) policy.
>
> -David

Right, but my impression is that people who lived through the last update
to the spelling conventions are less enthused about a slow transition,
this time around, and that bit of the convention would necessarily also
be up for revision.

--paulr

Chandler Carruth via llvm-dev

unread,
Feb 22, 2019, 3:47:06 PM2/22/19
to Krzysztof Parzyszek, llvm-dev
On Fri, Feb 22, 2019 at 10:11 AM Krzysztof Parzyszek via llvm-dev <llvm...@lists.llvm.org> wrote:
On 2/22/2019 1:40 PM, David Greene via llvm-dev wrote:
> I actually think we need something stronger.  Acronyms should be
> discouraged unless it's absolutely clear what it is.

I think we should avoid terms that are both strict and subjective at the
same time. (How do you test if something is absolutely clear?) Instead
we should delegate resolving such ambiguities to the common sense of the
author and the reviewers.

+1 to generally delegating to author and reviewer.

The guidance I would suggest is: "Prefer words over acronyms generally, and especially if the acronyms are not widely used in the industry or literature." I think this still gives some leeway for things like "TCP", but also good pressure against arbitrarily inventing acronyms when there are good words to use for variables.

via llvm-dev

unread,
Feb 22, 2019, 3:51:16 PM2/22/19
to chan...@gmail.com, llvm...@lists.llvm.org, n...@arm.com, a...@lowrisc.org
Chandler wrote:

>> - Local variables and formal parameters should be lower_case, with
>>   one exception: Variables/parameters that have lambda/function
>>   type should follow the function-name spelling rules.
>
> I really dislike this exception. Callable objects are *objects* and
> locally scoped, I would much prefer they look like variables.
>
> Also, what about callable objects that aren't lambdas? Or that use
> operator() for something other than emulating a function call?
>
> I think the simple rule is superior. 

Hm I was probably mis-remembering your opinion from earlier, as this
was supposed to be accommodating it! I'm happy to trash the exception.

Rui Ueyama via llvm-dev

unread,
Feb 22, 2019, 4:48:36 PM2/22/19
to Chandler Carruth, llvm-dev, nd, Alex Bradbury
If we are going to update variable names in a batch, I'd like to nominate lld as a starter project. It is a middle-sized LLVM subproject which currently follows the today's LLVM naming convention, and because of its size it shouldn't be too hard to convert the entire code base in a single patch or a few patches.


If someone can make clang-tidy help with this, that's awesome.

I'm almost afraid to make the next suggestion, but here goes:
In more complicated/wide-impact cases, it would be possible to
stage a data-member name conversion into "small-bang" iterations
using a C++ tactic like this:
    class Foo {
      int m_bar; // The renamed member.
      int &Bar = m_bar; // TEMPORARY alias using the old name.
    };
This would have to be done sparingly and for good reason, such as
when the names are known across many components/subprojects and
doing them all at once would be really too much. Someone would
have to commit to getting it all done and removing the aliases in
a reasonably short period of time.  Needing to do this trick would
be (IMO) strong evidence of poor software design and a place to
focus some refactoring effort.

Honestly, I don't think we need to do this. We routinely make wide-ranging API updates. If we need to do that, we do that.

What we *should* do is encourage anyone that before they decide to do this to discuss it and see if there is a good way to hide this usage of a variable name behind a better API and make *that* widespread change instead. Then the name change is more local again.

James Henderson via llvm-dev

unread,
Feb 25, 2019, 5:36:15 AM2/25/19
to Paul Robinson, David Greene, llvm-dev, n...@arm.com, a...@lowrisc.org
You might treat "m_" as silent, but I don't. It's just not how my mind works when reading code. As for moving entities between class and local scope - I've found myself regularly going from local scope to class scope in the past in other projects at least, although I can't say the same for LLVM. I do know I'd get annoyed by typing m_* before every member variable I have to write, whereas I don't for using real words, but I accept that might just be me.

David Greene via llvm-dev

unread,
Feb 25, 2019, 10:17:36 AM2/25/19
to paul.r...@sony.com, llvm...@lists.llvm.org, n...@arm.com, a...@lowrisc.org
<paul.r...@sony.com> writes:

>> "lower_case" is a pretty drastic change from CamelCase and camelCase.
>> So far the only argument for it I've seen is, "lldb uses it all over the
>> place." Having one subproject drive the standard for everything else
>> seems backward. It's certainly possible I missed other opinions on this
>> though.
>
> You have. For example, here's a "significant improvement" comment:
> http://lists.llvm.org/pipermail/llvm-dev/2019-February/130214.html

But I see nothing there about *why* it would be a "significant
improvement." At best I see an allusion to something like, "this is
really different than anytyhing we've done before so it doesn't conflict
with any existing naming." If I've misinterpreted I hope Chandler will
correct me.

I agree with Chandler that any change will require lots of buy-in from
the community. What's the plan to measure/get that?

James Henderson <jh737...@my.bristol.ac.uk> writes:

> You might treat "m_" as silent, but I don't. It's just not how my mind

> works when reading code. As for moving entities between class and


> local scope - I've found myself regularly going from local scope to
> class scope in the past in other projects at least, although I can't
> say the same for LLVM. I do know I'd get annoyed by typing m_* before
> every member variable I have to write, whereas I don't for using real
> words, but I accept that might just be me.

It's not. My brain works the same way.

-David

Chandler Carruth via llvm-dev

unread,
Feb 25, 2019, 5:47:26 PM2/25/19
to David Greene, llvm...@lists.llvm.org, n...@arm.com, a...@lowrisc.org
On Mon, Feb 25, 2019 at 7:17 AM David Greene via llvm-dev <llvm...@lists.llvm.org> wrote:
<paul.r...@sony.com> writes:

>> "lower_case" is a pretty drastic change from CamelCase and camelCase.
>> So far the only argument for it I've seen is, "lldb uses it all over the
>> place."  Having one subproject drive the standard for everything else
>> seems backward.  It's certainly possible I missed other opinions on this
>> though.
>
> You have. For example, here's a "significant improvement" comment:
> http://lists.llvm.org/pipermail/llvm-dev/2019-February/130214.html

But I see nothing there about *why* it would be a "significant
improvement."  At best I see an allusion to something like, "this is
really different than anytyhing we've done before so it doesn't conflict
with any existing naming."  If I've misinterpreted I hope Chandler will
correct me.

I wrote more details on this thread about why I significantly prefer this. Can you respond there? I don't want to just repeat things that already make sense or miss the things that actually don't make sense.

David Greene via llvm-dev

unread,
Feb 25, 2019, 9:54:01 PM2/25/19
to Chandler Carruth, llvm...@lists.llvm.org, n...@arm.com, a...@lowrisc.org
Here's what I could find on the thread:

http://lists.llvm.org/pipermail/llvm-dev/2019-February/130430.html

"This is not how I described my preference for it. I prefer it generally and
find it substantially easier to read code using it.

That said, I think the fact that LLDB uses it does contribute two
motivations:
1) It does show some subset of the community is familiar with it and has
found it to be a reasonably good convention for their usage.
2) It removes some cost of transition.

I wouldn't make the decision *because* of this, but I also don't think we
should ignore these points."

http://lists.llvm.org/pipermail/llvm-dev/2019-February/130311.html

"There is also a decent amount of code in Clang using foo_bar_baz. ::shrug::

I think the amount of all of these pales in comparison to LLDB, and I think
generally all of these are not going to significantly change the total cost
of transition."

http://lists.llvm.org/pipermail/llvm-dev/2019-February/130309.html

"I think the fact that named things are called does not fully disambiguate
what they are.

I'm not trying to say that the collision with functions is *as* confusing
as that of colliding with types. Merely that both seem confusing. And I
find `foo_bar_baz` and `fooBarBaz` basically equivalent[1]. So between
those equivalents, I would choose the one with fewer collisions.

[1]: Ok, not quite, but I find this to be a more personal preference and am
trying to weight it lower as a consequence. I find functions much more
similar to types -- they are manifest properties of the program. I find
`FooBarBaz` and `fooBarBaz` to be very similar looking. There is *a*
distinction, but it is a minor one. I would prefer a greater visual
difference for variables, which `foo_bar_baz` provides."

Forgive me if I missed a post, it's a long thread. :)

From the above, I conclude that you prefer lower_case because LLDB uses
it and it's different from everything else other projects use. Which I think
is basically what I summarized as the arguments I've read.

I'm still pretty puzzled why distinguishing functions/callables from "variables"
via under_scores or someCamelCase matters. I guess I believe that with good
names and program structure (small functions, etc.) things should be fairly
clear. I'm actually fine with CamelCase for variables (the current convention)
as I haven't found myself confused by it, but I understand why people want to
reserve that for types.

What's the process for making this kind of change? If we want significant buy-in
from the community how do we get that?

-David
________________________________________
From: Chandler Carruth <chan...@gmail.com>
Sent: Monday, February 25, 2019 4:47 PM
To: David Greene
Cc: paul.r...@sony.com; llvm...@lists.llvm.org; n...@arm.com; a...@lowrisc.org


Subject: Re: [llvm-dev] RFC: changing variable naming rules in LLVM codebase

Chris Lattner via llvm-dev

unread,
Feb 27, 2019, 4:58:14 PM2/27/19
to David Greene, llvm...@lists.llvm.org, n...@arm.com, a...@lowrisc.org


On Feb 22, 2019, at 11:40 AM, David Greene <d...@cray.com> wrote:


- Local variables and formal parameters should be lower_case, with
 one exception: Variables/parameters that have lambda/function
 type should follow the function-name spelling rules.

"lower_case" is a pretty drastic change from CamelCase and camelCase.
So far the only argument for it I've seen is, "lldb uses it all over the
place."  Having one subproject drive the standard for everything else
seems backward.  It's certainly possible I missed other opinions on this
though.

FWIW, I agree.  LLDB has always been a vast outlier from the rest of the LLVM project, owing to the project origins (which aren’t relevant to the discussions at hand).  I don’t think that its conventions should be highly prioritized unless they are obviously and objectively better.

In this case, the different between variable_names and TypeNames seems completely indefensible.

-Chris

Chris Lattner via llvm-dev

unread,
Feb 27, 2019, 5:54:21 PM2/27/19
to Chandler Carruth, llvm-dev, nd, Alex Bradbury


On Feb 22, 2019, at 12:39 PM, Chandler Carruth <chan...@gmail.com> wrote:

On Fri, Feb 22, 2019 at 5:59 AM via llvm-dev <llvm...@lists.llvm.org> wrote:
- Local variables and formal parameters should be lower_case, with
  one exception: Variables/parameters that have lambda/function
  type should follow the function-name spelling rules.

I really dislike this exception. Callable objects are *objects* and locally scoped, I would much prefer they look like variables.

Agreed, this is one of the baseline rationales for going with lowerCamel for variables and functions, and UpperCamel for types.  C++ doesn’t have a reasonable metatype system, and so this is a very useful fork between worlds.  Functions are first class values that can be address-taken, passed around and have other transformations applied to them, so they are fundamentally value-y.

-Chris

Chris Lattner via llvm-dev

unread,
Feb 27, 2019, 5:55:56 PM2/27/19
to Rui Ueyama, llvm-dev, nd, Alex Bradbury


On Feb 22, 2019, at 1:48 PM, Rui Ueyama via llvm-dev <llvm...@lists.llvm.org> wrote:

A more focused and shorter transition period will create a lot of
short-term churn but get us to the good endpoint sooner.  Doing
conversions per-file or per-class (rather than per-function [too
small] or per-library [too big]) are probably the way to go.  
Given we are changing the names used for _data_, and we try to 
practice good data-hiding, the impact of the conversion of any 
given class *ought* to be reasonably confined.

I generally agree with this strategy. That said, I would still do it somewhat lazily rather than eagerly, but batched much as you're suggesting.

If we are going to update variable names in a batch, I'd like to nominate lld as a starter project. It is a middle-sized LLVM subproject which currently follows the today's LLVM naming convention, and because of its size it shouldn't be too hard to convert the entire code base in a single patch or a few patches.

This is a really great idea.  This would also give the ability to do an A/B comparison with a specific set of examples.

The reason this matters to me is that we often get caught up on the theory of various ideas, but seeing what it means in practice in full context can really help sometimes.

-Chris

Michael Platings via llvm-dev

unread,
Mar 12, 2019, 9:31:25 AM3/12/19
to llvm...@lists.llvm.org, nd

Thanks very much for all the feedback. I’ve tried to collect the information into a proposal for a transition plan: https://reviews.llvm.org/D59251.

 

-Michael

Renato Golin via llvm-dev

unread,
Apr 30, 2019, 2:20:43 PM4/30/19
to Michael Platings, llvm...@lists.llvm.org, nd
Wow, nice! Got here a bit late, but I'm happy we're finally taking this leap. :)

I agree with lower case variable names, but don't mind much between
camel and underscores. I personally use both depending on the phase of
the moon, so...

I really dislike variable type separation, especially in a
quasi-strongly typed language where we normally declare the casts
intentionally.

Using auto should be done when the variable is transient or the type
is too complex, which would provide complex and confusing mangling for
the prefix.

I like the separation of "useful acronyms" and "unnecessary
one-letters" and I think your document has a lot of common sense.

Some may say we'll shortly incur into a battle of styles, but I think
that won't last long. Our community is stable enough for that kind of
petty behaviour to endure. :)

I agree with Chris to change what's needed, not a huge refactor. This
has a good number of precedents in LLVM and reduces the impact of
non-functional changes to a minimum. This is also more ameanable to
dowsntream repos.

Though, Rui's idea to do a bigger refactory in lld is brilliant, even
if a little stressfull. Again, we have precedent for big localised
changes and it mostly ends up as being good community work.

Thanks for pushing this, I agree the code ends up a bit dense, even in
places I am very familiar in, but haven't looked for a year, new
acronyms can make it hard to read. And if that increases the number of
new contributors, all the better!

cheers,
--renato

Michael Platings via llvm-dev

unread,
May 21, 2019, 6:01:27 AM5/21/19
to llvm...@lists.llvm.org, nd
Hi folks,

Git is on its way to learning how to ignore commits, allowing us to do variable renaming and other small refactorings without breaking git blame.

It's like git-hyper-blame [1] but significantly more powerful as it uses fuzzy matching to match lines, including lines that may have been split or joined.

A preview release of Git with this new feature is at: https://github.com/mplatings/git/releases/tag/ignore-rev

Some of you have told me that you already have to spend time running git blame multiple times to look past uninteresting commits so I'd love for you to give this feature a try and see if it helps you. Your feedback will be very valuable.

Thanks,
-Michael

[1] https://commondatastorage.googleapis.com/chrome-infra-docs/flat/depot_tools/docs/html/git-hyper-blame.html

Chris Lattner via llvm-dev

unread,
May 25, 2019, 2:00:27 AM5/25/19
to Michael Platings, llvm...@lists.llvm.org, nd
Hi Michael,

I’m still very interested in seeing a change here.  If someone is interested in seeing a codebase using the proposed camelBack convention for variables names, the MLIR codebase is public and uses it.

If you’re curious to see what this looks like in practice, there are lots of examples in the codebase, here is an example .cpp filehere is another, here is an example header.


We are still working our way through open sourcing logistics (not all the code is out yet), but we would still like to contribute at least parts of this to LLVM if the project is interested.  [[This is just an FYI, not itself a proposal yet - one will be coming when we’re ready.]]

-Chris

Rui Ueyama via llvm-dev

unread,
Jun 7, 2019, 3:42:45 AM6/7/19
to Chris Lattner, llvm...@lists.llvm.org, nd
This thread is not active for a while, but I'm still interested in seeing a change.

Reading through this thread, looks like we can agree that we want to change the local variable naming scheme so that they start with a lowercase letter. Besides that, there were discussions about lower_case, camelCase, m_ prefix, and each argument seems as convincing as others. I'm not sure what is the decision making process in a situation like this.

I'd personally vote for changing local variables to start with a lowercase letter and keep other naming conventions as-is to make it a minimum change.

As I stated before, I'm happy to make a change to lld to see how a naming convention change will look like, but in order to do that I need to get at least a rough consensus to do that. What is a way to proceed?

Michael Platings via llvm-dev

unread,
Jun 10, 2019, 5:27:53 AM6/10/19
to Rui Ueyama, Chris Lattner, llvm...@lists.llvm.org, nd

Hi Rui,

 

As per the provisional plan [1] we’re still at step 1: improving git blame. The status of this is that there are some fairly mature patches in the Git project’s queue [2], and I’m hopeful that it will be accepted in something close to its current form.

 

But if you can get started on steps 2 & 3 i.e. making forks available with the possible changes applied then that would be great. My hope is that once everyone can see what the options really look like then it will be easier to reach consensus.

 

Thanks,

-Michael

 

[1] https://llvm.org/docs/Proposals/VariableNames.html#provisional-plan

[2] https://public-inbox.org/git/2019051521450...@google.com/T/

 

From: Rui Ueyama <ru...@google.com>
Sent: 07 June 2019 08:42
To: Chris Lattner <clat...@nondot.org>
Cc: Michael Platings <Michael....@arm.com>; llvm...@lists.llvm.org; nd <n...@arm.com>
Subject: Re: [llvm-dev] RFC: changing variable naming rules in LLVM codebase

 

This thread is not active for a while, but I'm still interested in seeing a change.

Rui Ueyama via llvm-dev

unread,
Jun 10, 2019, 5:35:20 AM6/10/19
to Michael Platings, llvm...@lists.llvm.org, nd
On Mon, Jun 10, 2019 at 6:27 PM Michael Platings <Michael....@arm.com> wrote:

Hi Rui,

 

As per the provisional plan [1] we’re still at step 1: improving git blame. The status of this is that there are some fairly mature patches in the Git project’s queue [2], and I’m hopeful that it will be accepted in something close to its current form.

 

But if you can get started on steps 2 & 3 i.e. making forks available with the possible changes applied then that would be great. My hope is that once everyone can see what the options really look like then it will be easier to reach consensus.


Sure, I'll try to do that. I'll probably start with finding identifiers and typenames that will conflict after the naming scheme change and rename them so that they won't conflict. The number of such symbols would hopefully be small, and submitting such renaming changes wouldn't be distracting. After that, I think I can create a mechanical change to rename variables to see how it will look like.

Rui Ueyama via llvm-dev

unread,
Jul 5, 2019, 12:50:38 AM7/5/19
to Michael Platings, llvm...@lists.llvm.org, nd
Hi all,

I wrote a tool to batch-rename variable names so that they are in camelCase, and I applied the tool to lld subdirectory. You can see my change at https://reviews.llvm.org/D64121. If you have any comments, please reply.

If people are happy about this change, I can do the same thing for other directories including LLVM itself and Clang.

Chris Lattner via llvm-dev

unread,
Jul 9, 2019, 1:03:36 AM7/9/19
to Rui Ueyama, llvm...@lists.llvm.org, nd
This looks really great to me Rui, and I’m also pleased to see the positive comments on the review thread.  Thank you for driving this forward!

-Chris

Rui Ueyama via llvm-dev

unread,
Jul 9, 2019, 3:23:34 AM7/9/19
to Chris Lattner, llvm...@lists.llvm.org, nd
Thanks, Chris.

Looks like everybody is at least mildly comfortable with my variable name renaming change, so I'll to submit that change to lld subdirectory soon. If you have any objections, please let me know. Note that this is not the end of my effort but actually the beginning of it. As Chris said, I believe we should do this to the entire LLVM codebase. I'm planning to do that directory-by-directory basis.

James Henderson via llvm-dev

unread,
Jul 9, 2019, 6:17:04 AM7/9/19
to Rui Ueyama, llvm...@lists.llvm.org, nd
Hi Rui,

Is your tool in a good state to be run on people's downstream repositories? It would be nice to be able to update variable names there too at the same time.

James

Rui Ueyama via llvm-dev

unread,
Jul 9, 2019, 7:14:08 AM7/9/19
to jh737...@my.bristol.ac.uk, llvm...@lists.llvm.org, nd
It should work on downstream lld repositories if exist, though I didn't try that personally.

The tool is a batch tool, so you can rename variables in your downstream repository using that tool. Given that, here is how to rebase your repo to a commit after a mass renaming:

1. rebase to a commit just before a mass variable renaming,
2. apply the tool to a downstream repo to mass-rename variables locally, and then
3. rebase again to the head.

Most changes made by the tool should be identical for a downstream repository and for the head, so at step 3, almost all changes should be merged and disappear. I'd expect that there would be some lines that you need to merge by hand, but I'd think that that's not too many.

James Henderson via llvm-dev

unread,
Jul 9, 2019, 7:31:52 AM7/9/19
to Rui Ueyama, llvm...@lists.llvm.org, nd
Thanks Rui! That's essentially what I thought would work.

Joan Lluch via llvm-dev

unread,
Jul 9, 2019, 8:46:39 AM7/9/19
to Rui Ueyama, llvm...@lists.llvm.org, nd
Hi Rui,

I’m totally positive on switching to camelCase convention. In fact I have been always uncomfortable with the current naming approach. 

My only suggestion/concern is that we should provide a transition path not only for the trunk code in the repository, but for eventual out-of-trunk code with implementations of custom architectures. I have currently a custom backend implemented on top of LLVM 9 and therefore this change will surely break my code. I assume that developers affected by this will be able to run the converting tools to fix their own code. 

John

Neil Nelson via llvm-dev

unread,
Jul 9, 2019, 11:48:27 AM7/9/19
to llvm...@lists.llvm.org

CamelCase is a style most commonly seen with those having a Microsoft orientation. It is uncommon for Linux code and is avoided by style requirement in the Boost libraries. I have coded CamelCase and otherwise.

Perhaps changing the style of the original code merely on this point is a minor slam at those original authors who had a different perspective in mind. We may want to consider leaving the style as it is for a while to review how the current code-base is styled and why those authors preferred that style. Merely changing to CamelCase has a kind of 'Let's dress ourselves in Microsoft style", a point somewhat eagerly denied in the Linux community.

Neil Nelson

Chris Lattner via llvm-dev

unread,
Jul 9, 2019, 12:39:13 PM7/9/19
to Neil Nelson, llvm...@lists.llvm.org
Hi Neil,

This convention discussion has nothing to do with Microsoft vs Linux.  It was discussed on the list over the course of years, I’d recommend catching up on the back discussions if you’re curious about the rationale.

-Chris

Rui Ueyama via llvm-dev

unread,
Jul 10, 2019, 12:18:46 AM7/10/19
to Joan Lluch, llvm...@lists.llvm.org, nd
Hi Joan,

On Tue, Jul 9, 2019 at 9:46 PM Joan Lluch <joan....@icloud.com> wrote:
Hi Rui,

I’m totally positive on switching to camelCase convention. In fact I have been always uncomfortable with the current naming approach. 

My only suggestion/concern is that we should provide a transition path not only for the trunk code in the repository, but for eventual out-of-trunk code with implementations of custom architectures. I have currently a custom backend implemented on top of LLVM 9 and therefore this change will surely break my code. I assume that developers affected by this will be able to run the converting tools to fix their own code. 

The tool that I wrote for lld's style conversion should work for out-of-trunk code, so as I described in the previous email, third-party code maintainer should be able to use the tool to convert the style first in their repositories and then rebase in order to avoid large merge conflicts.

The tool needs to be polished to convert other subprojects such as clang, but I'll keep your request in mind. I'll try my best to provide a smooth transition path for out-of-trunk code for any change that I'll submit for the style change.

Alex Brachet-Mialot via llvm-dev

unread,
Jul 10, 2019, 7:13:03 AM7/10/19
to Rui Ueyama, llvm...@lists.llvm.org, nd
Rui,

I have created D64474 to change comments explicitly stating the parameter names for constants ( /*parameterName=*/<constant> ). I did this by hand to match the new variable names. Do you know if there would be a way to update these comments with a tool similar to what you used to change these names? Perhaps it would be much more difficult, I'm guessing clang's AST doesn't have a way to describe comments? It's obviously not a huge deal to have these changed it could be done over time.

Best,
Alex

Alex Brachet-Mialot via llvm-dev

unread,
Jul 10, 2019, 7:20:53 AM7/10/19
to Rui Ueyama, llvm...@lists.llvm.org, nd
Also, now that I think about it, I believe doxygen will fail to build if the @parameter comments aren't changed to match the new names, my guess is it case sensitive. So perhaps we will need to find a way to manually change these names for doxygen comments?

Rui Ueyama via llvm-dev

unread,
Jul 10, 2019, 7:21:22 AM7/10/19
to Alex Brachet-Mialot, llvm...@lists.llvm.org, nd
Alex,

Thank you for pointing that out. I haven't thought that before. Maybe it is not very easy to update a comment like that using Clang AST, but I guess that a simple Perl-style regex (such as `/*[A-Z])(.*)=*/.\1`) would work fine as the pattern is unique, and I don't think that there are too many false positives. Let me try.

Rui Ueyama via llvm-dev

unread,
Jul 10, 2019, 7:24:34 AM7/10/19
to Alex Brachet-Mialot, llvm...@lists.llvm.org, nd
Good point, too. I believe I can find lines starting with `@parameter` and apply the same name conversion rules to identifiers after `@parameter`. Since lld doesn't use doxygen comments, it is fine for now, but before moving forward, I'll address that. Thank you for pointing that out.

Rui Ueyama via llvm-dev

unread,
Jul 12, 2019, 1:06:03 AM7/12/19
to Alex Brachet-Mialot, llvm...@lists.llvm.org, nd
So, I submitted a few patches to rename all variables in lld. If you are interested in how it looks like, pick up any .cpp or .h file from https://github.com/llvm/llvm-project/tree/master/lld.

I learned a few things by doing this which will help me do the same thing to other LLVM (sub-)projects.

 - Overall a batch tool to rename variables worked reasonably well, so the coding style change is doable.

 - There were a few classes that have a member variable Foo and a function foo(), which would conflict after renaming. I rename variables manually before renaming them. That's not a scalable solution, though. For a larger codebase, I'd probably need to automate it by, for example, renaming foo() to getFoo() if there's Foo variable already.

 - There were a few variables that would become a reserved word such as `new` or `private` after renaming. I renamed them manually before mass-renaming. For scalability, it probably have to be automated by appending `_` at the end, for example.

 - My clang-based tool didn't work for #ifdef'ed-out code, which caused unexpected failures on buildbots after submitting. I don't know how to fix the tool so that the tool can handle code containing #ifdefs, but at least we need to keep it in mind so that we can check it manually.

 - LLVM's `/*foo=*/`-style comment to annotate function arguments need to be handled automatically to make the tool scalable. So is the doxygen @parameter.

 - Since a variable rename change virtually touches every line of codebase, that would cause massive merge conflicts to downstream repos if we don't do anything to support them. We need to provide a tool and guidance as to how to apply the tool to a out-of-tree repo so that a renaming change is merged smoothly.

Nicolai Hähnle-Montoro via llvm-dev

unread,
Jul 12, 2019, 3:37:16 AM7/12/19
to Rui Ueyama, llvm...@lists.llvm.org, nd
On Fri, Jul 12, 2019 at 7:06 AM Rui Ueyama via llvm-dev
<llvm...@lists.llvm.org> wrote:
> - LLVM's `/*foo=*/`-style comment to annotate function arguments need to be handled automatically to make the tool scalable. So is the doxygen @parameter.

There is also @p for "inline" references to parameters.

Cheers,
Nicolai

Edd Dawson via llvm-dev

unread,
Jul 12, 2019, 9:04:41 AM7/12/19
to Rui Ueyama, llvm...@lists.llvm.org, nd
Hi Rui, all,

Yesterday I brought the variable-renaming commits in to Sony's
downstream LLD. We have a merge-based flow rather than continually
rebasing our patch set, but it went reasonably smoothly nevertheless.

The one snag I hit is that the tool initially missed variables mentioned
in assert()s. I didn't put much time in to investigating this, but I
presume it's because my compile_commands.json was build with assert()s
disabled and so the names mentioned in the predicates were invisible to
clang-llvm-rename. The result was that I ended up with something that
built cleanly with NDEBUG, but not otherwise. I guess this is
essentially the same as the #ifdef'd-out code issue you mentioned, but
its effect is probably more widespread.

It was easily remedied by building in another configuration and
reapplying to tool, but it's something others might want to watch out
for.

Thanks,
Edd

>> I wrote a tool [1] to batch-rename variable names so that they are

>> [2]
>>
> https://public-inbox.org/git/2019051521450...@google.com/T/
>> [3]
>>
>> FROM: Rui Ueyama <ru...@google.com>
>> SENT: 07 June 2019 08:42
>> TO: Chris Lattner <clat...@nondot.org>
>> CC: Michael Platings <Michael....@arm.com>;
>> llvm...@lists.llvm.org; nd <n...@arm.com>
>> SUBJECT: Re: [llvm-dev] RFC: changing variable naming rules in LLVM


>> codebase
>>
>> This thread is not active for a while, but I'm still interested in
>> seeing a change.
>>
>> Reading through this thread, looks like we can agree that we want to
>> change the local variable naming scheme so that they start with a
>> lowercase letter. Besides that, there were discussions about
>> lower_case, camelCase, m_ prefix, and each argument seems as
>> convincing as others. I'm not sure what is the decision making
>> process in a situation like this.
>>
>> I'd personally vote for changing local variables to start with a
>> lowercase letter and keep other naming conventions as-is to make it
>> a minimum change.
>>
>> As I stated before, I'm happy to make a change to lld to see how a
>> naming convention change will look like, but in order to do that I
>> need to get at least a rough consensus to do that. What is a way to
>> proceed?
>>
>> On Sat, May 25, 2019 at 3:00 PM Chris Lattner via llvm-dev
>> <llvm...@lists.llvm.org> wrote:
>>
>> Hi Michael,
>>
>> I’m still very interested in seeing a change here. If someone is
>> interested in seeing a codebase using the proposed camelBack

>> convention for variables names, the MLIR codebase is public [4] and


>> uses it.
>>
>> If you’re curious to see what this looks like in practice, there
>> are lots of examples in the codebase, here is an example .cpp file

>> [5], here is another [6], here is an example header [7].


>>
>> We are still working our way through open sourcing logistics (not
>> all the code is out yet), but we would still like to contribute at
>> least parts of this to LLVM if the project is interested. [[This is
>> just an FYI, not itself a proposal yet - one will be coming when
>> we’re ready.]]
>>
>> -Chris
>>
>> On May 21, 2019, at 3:01 AM, Michael Platings via llvm-dev
>> <llvm...@lists.llvm.org> wrote:
>>
>> Hi folks,
>>
>> Git is on its way to learning how to ignore commits, allowing us to
>> do variable renaming and other small refactorings without breaking
>> git blame.
>>
>> It's like git-hyper-blame [1] but significantly more powerful as it
>> uses fuzzy matching to match lines, including lines that may have
>> been split or joined.
>>
>> A preview release of Git with this new feature is at:

>> https://github.com/mplatings/git/releases/tag/ignore-rev [8]


>>
>> Some of you have told me that you already have to spend time running
>> git blame multiple times to look past uninteresting commits so I'd
>> love for you to give this feature a try and see if it helps you.
>> Your feedback will be very valuable.
>>
>> Thanks,
>> -Michael
>>
>> [1]
>>
> https://commondatastorage.googleapis.com/chrome-infra-docs/flat/depot_tools/docs/html/git-hyper-blame.html

>> [9]


>> _______________________________________________
>> LLVM Developers mailing list
>> llvm...@lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm...@lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm...@lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> _______________________________________________
> LLVM Developers mailing list
> llvm...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>

> Links:
> ------
> [1] https://reviews.llvm.org/D64123
> [2] https://llvm.org/docs/Proposals/VariableNames.html#provisional-plan
> [3]
> https://public-inbox.org/git/2019051521450...@google.com/T/
> [4] https://github.com/tensorflow/mlir
> [5]
> https://github.com/tensorflow/mlir/blob/master/lib/Transforms/LoopUnrollAndJam.cpp
> [6]
> https://github.com/tensorflow/mlir/blob/master/lib/Parser/Parser.cpp
> [7]
> https://github.com/tensorflow/mlir/blob/master/include/mlir/IR/Location.h
> [8] https://github.com/mplatings/git/releases/tag/ignore-rev
> [9]

--
Edd Dawson
SN Systems - Sony Interactive Entertainment
http://www.snsystems.com

David Greene via llvm-dev

unread,
Jul 12, 2019, 12:04:30 PM7/12/19
to Rui Ueyama via llvm-dev, nd
Rui Ueyama via llvm-dev <llvm...@lists.llvm.org> writes:

> - LLVM's `/*foo=*/`-style comment to annotate function arguments need
> to be handled automatically to make the tool scalable. So is the
> doxygen @parameter.

This is a bit of a side note, but in my own work I've more recently
tried to move from this style:

foo.h

int foo(int a, bool doSomething);

foo.cpp

x = foo(a, /*doSomething=*/true);
y = foo(a, /*doSomething=*/false);

to something like:

foo.h

inline constexpr DoDoSomething = true;
inline constexpr DontDoSomething = false;

int foo(int a, bool doSomething);

foo.cpp

x = foo(a, DoDoSomething);
y = foo(a, DontDoSomething);

One doesn't need the inline variable (and thus not C++17) if one uses
macros or enums or something else less elegant.

This kind of thing is slightly more cumbersome to do if the parameter
takes a wide range of values, but the most common place I see the former
style is for Boolean arguments. Nevertheless, the wide-ranging case can
be handled:

bar.h

inline int Threshold(int x) { return x; }

int bar(int a, int threshold);

bar.cpp

x = bar(a, Threshold(1000));
y = bar(a, Threshold(100));

With either technique the "named values" could be wrapped in their own
namespaces to avoid collisions.

I wonder if there is any appetite for doing something similar in LLVM.

-David

David Greene via llvm-dev

unread,
Jul 12, 2019, 12:06:00 PM7/12/19
to Edd Dawson via llvm-dev, nd
We also have a merge-based workflow. Did you just pull down the most
recent upstream and do a direct merge to your branch, or was there
another step in there to make conflicts less likely, such as renaming
your downstream branch first before merging from upstream?

-David

David Blaikie via llvm-dev

unread,
Jul 12, 2019, 12:17:18 PM7/12/19
to David Greene, Rui Ueyama via llvm-dev, nd
Why would enums be less elegant than named boolean constants as you've shown here? http://jlebar.com/2011/12/16/Boolean_parameters_to_API_functions_considered_harmful..html (at a random googling for "bool parameters considered harmful") reasonably suggests splitting functions, but that's not always practical - often lots of common code, etc (but writing wrappers isn't too bad in that case either - two different functions with different names that both call the common one with the boolean parameter - so it's not hard to trace from the call to the implementation to know what that bool does, and is obvious by these two side-by-side differently named functions) and the comments on that article discuss enums as well.

Justin Lebar via llvm-dev

unread,
Jul 12, 2019, 12:34:27 PM7/12/19
to David Blaikie, David Greene, Rui Ueyama via llvm-dev, nd
FWIW if I were writing that post today I'd probably say that doing
`foo(/*weak=*/true)` is often as good as `fooWeak()` *so long as* one
has tooling that checks and enforces the `/*weak=*/` comment is
present and correct. The comment thing wasn't something I was aware
of back then, if it even existed.

In the specific case of "add weak observer" I still think that a
separate function is probably a better idea, but it's easy to come up
with examples where it's not, e.g. cases where you often have the
boolean as a variable already, and so you don't want to do `if (thing)
{ doX(); } else { do Y(); }` all the time.

Edd Dawson via llvm-dev

unread,
Jul 12, 2019, 1:05:28 PM7/12/19
to David Greene, Edd Dawson via llvm-dev, nd
Hi David,

This is what I did:

1. Merged to just before the commit that changed the naming convention
in lld/ELF (where we have all of our private patches).

2. Ran clang-llvm-rename on our downstream lld/ELF and diff-ed the
result against the upstream change that did the same. The differences
were only in the areas that we have modified - good. I then committed
this.

3. Finally I merged in the upstream change that changed the naming
convention, simply accepting ours in the case of conflict.

It's surely possible to do it without the intermediate commit, but I
don't think that it would have bought us much. One side effect is that
*I* appear in git blame everywhere now. You may want to avoid that, I
guess.

Thanks,
Edd

--

Edd Dawson
SN Systems - Sony Interactive Entertainment
http://www.snsystems.com

David Greene via llvm-dev

unread,
Jul 12, 2019, 1:32:38 PM7/12/19
to David Blaikie, Rui Ueyama via llvm-dev, nd
David Blaikie <dbla...@gmail.com> writes:

> Why would enums be less elegant than named boolean constants as you've shown here?

Casting, mainly. If the parameters were also changed to an enum type
that would be fine too, probably better than inline variables even.

> http://jlebar.com/2011/12/16/Boolean_parameters_to_API_functions_considered_harmful..html
> (at a random googling for "bool parameters considered harmful")
> reasonably suggests splitting functions, but that's not always
> practical - often lots of common code, etc (but writing wrappers isn't
> too bad in that case either - two different functions with different
> names that both call the common one with the boolean parameter - so
> it's not hard to trace from the call to the implementation to know
> what that bool does, and is obvious by these two side-by-side
> differently named functions) and the comments on that article discuss
> enums as well.

I like the two functions idea. Of course scaling becomes a problem with
multiple Boolean parameters, as pointed out in the comments.

-David

David Greene via llvm-dev

unread,
Jul 12, 2019, 1:34:21 PM7/12/19
to Edd Dawson, Edd Dawson via llvm-dev, nd
Thank you, that's very helpful!

-David

David Blaikie via llvm-dev

unread,
Jul 12, 2019, 3:13:33 PM7/12/19
to David Greene, Rui Ueyama via llvm-dev, nd
On Fri, Jul 12, 2019 at 10:31 AM David Greene <d...@cray.com> wrote:
David Blaikie <dbla...@gmail.com> writes:

> Why would enums be less elegant than named boolean constants as you've shown here?

Casting, mainly.  If the parameters were also changed to an enum type
that would be fine too, probably better than inline variables even.

Oh, yeah, I meant changing the parameter type to a custom enum - forcing users to have to use the more legible enum names rather than the inscrutible boolean constants.
 

Rui Ueyama via llvm-dev

unread,
Jul 16, 2019, 1:55:40 AM7/16/19
to Edd Dawson, llvm-dev, nd
Hi Edd,

On Fri, Jul 12, 2019 at 10:04 PM Edd Dawson <edd-...@mr-edd.co.uk> wrote:
Hi Rui, all,

Yesterday I brought the variable-renaming commits in to Sony's
downstream LLD. We have a merge-based flow rather than continually
rebasing our patch set, but it went reasonably smoothly nevertheless.

I'm very glad to hear that!
 
The one snag I hit is that the tool initially missed variables mentioned
in assert()s. I didn't put much time in to investigating this, but I
presume it's because my compile_commands.json was build with assert()s
disabled and so the names mentioned in the predicates were invisible to
clang-llvm-rename. The result was that I ended up with something that
built cleanly with NDEBUG, but not otherwise. I guess this is
essentially the same as the #ifdef'd-out code issue you mentioned, but
its effect is probably more widespread.

I experienced the exact same issue when I was creating the renaming change. The solution was to build lld with `-DCMAKE_BUILD_TYPE=Debug` (or `-DLLVM_ENABLE_ASSERTIONS=On`), which enables assertions so that clang would see code inside ASSERT.

Rui Ueyama via llvm-dev

unread,
Jul 16, 2019, 2:02:22 AM7/16/19
to David Blaikie, David Greene, Rui Ueyama via llvm-dev, nd
On Sat, Jul 13, 2019 at 4:13 AM David Blaikie via llvm-dev <llvm...@lists.llvm.org> wrote:


On Fri, Jul 12, 2019 at 10:31 AM David Greene <d...@cray.com> wrote:
David Blaikie <dbla...@gmail.com> writes:

> Why would enums be less elegant than named boolean constants as you've shown here?

Casting, mainly.  If the parameters were also changed to an enum type
that would be fine too, probably better than inline variables even.

Oh, yeah, I meant changing the parameter type to a custom enum - forcing users to have to use the more legible enum names rather than the inscrutible boolean constants.

FWIW, I found that clang-tidy can fix up /*foo=*/-style comments. I ran the tool and submitted it as https://reviews.llvm.org/rL366177.
It is loading more messages.
0 new messages