code formatting

25 views
Skip to first unread message

Bob Carpenter

unread,
Mar 20, 2015, 2:40:26 AM3/20/15
to stan...@googlegroups.com
The Google style guide:

http://google-styleguide.googlecode.com/svn/trunk/cppguide.html#Return_Values

which is what cpplint.py tries to follow, doesn't go into
a huge amount of detail on all the things it's popping up.

It's definitely worth reading.

My thoughts appended.

- Bob

I'm only going to list things I disagree with or that might
cause us problems.

Unmamed namespaces
----------------------
* no anonymous namespaces in .hpp files; we were going to get rid
of all of them anyway, so this is OK by me

Named namespaces
------------------------
* Don't want any indentation inside namespaces, which is a big change
and not how emacs does it, though it will help keeping < 80 chars; I'm
actually against this as it'll mess up all my work in unless someone
knows how to fix that

* They want the // namespace thing at the end --- it's not in their
notes, but is in their examples; I think it just adds clutter but am
OK with it if everyone else likes it

Nonmember functions
-----------------------
* We'll have trouble with "Nonmember functions should not depend on
external variables" due to our use of globals for memory management

Static and Global
-------------------------
* Not sure about how this impacts our memory management:

"Static or global variables of class type are forbidden: they cause
hard-to-find bugs due to indeterminate order of construction and
destruction. However, such variables are allowed if they are
constexpr: they have no dynamic initialization or destruction."

What they suggest as a workaround is this:

"If you need a static or global variable of a class type, consider
initializing a pointer (which will never be freed), from either your
main() function or from pthread_once()."

The issue with that is that it'll have to be done in the interfaces
(PyStan, RStan, CmdStan).


* Also not sure how we're doing with this (though I can live with it):

"Objects with static storage duration, including global variables,
static variables, static class member variables, and function static
variables, must be Plain Old Data (POD): only ints, chars, floats, or
pointers, or arrays/structs of POD."

It explicitly forbids static std::string as class static (not that
we use many of those).

Exceptions
-----------------------
* These are forbidden, which is nonsense.

Explicit Constructors
-------------------
* This won't work for our use of autodiff, but I agree it's a good
idea where feasible:

"Use the C++ keyword explicit for constructors callable with one
argument."

I don't quite think our uses match their exception:

"Classes that are intended to be transparent wrappers around other
classes are also exceptions."

Structs vs. Classes
---------------------------
* I think this is a good convention:

"Use a struct only for passive objects that carry data; everything
else is a class."

but I think it'll require a lot of updating in our code where we
just use struct to avoid having to say pubilc.



Inheritance
----------------------------
* I'm also OK with

"Limit the use of protected to those member functions that might need
to be accessed from subclasses. Note that data members should be
private."

but we have a lot of protected stuff, especially in vari, that'd need
to be updated.

* I think this would be good:

"For clarity, use exactly one of override, final, or virtual when
declaring an override"

Note that override is C++11, so we need to use virtual until then.


Access Control
-------------------
* This came up earlier, but it's worth saying again, because we're not
doing it now:

"Make data members private, and provide access to them through
accessor functions as needed"

Declaration Order
-----------------------
* This one seems totally backward to me from my Java background, and
unlike other rules, they provide no pro/con justification:

"Use the specified order of declarations within a class: public:
before private:, methods before data members (variables), etc."

It would require every class I ever wrote with privates to be
reordered.

* I'm not sure we're following this:

"Friend declarations should always be in the private section."

but I think it only comes up in autodiff operator methods.

* I'm OK with the rest of the ordering:

- Typedefs and Enums
- Constants (static const data members)
- Constructors
- Destructor
- Methods, including static methods
- Data Members (except static const data members)



cpplint
------------------
* As much as I hate the idea of having to hack to the code for
tools, this could be handy given that we're going down the
all-lint-all-the-time route:

"False positives can be ignored by putting // NOLINT at the end
of the line or // NOLINTNEXTLINE in the previous line."

Reference Arguments
--------------------------
* I strongly disagree with this one, which they claim is a "very
strong convention in Google code"

"All parameters passed by reference must be labeled const."

They want us to use pointers, the reason being:

"References can be confusing, as they have value syntax but pointer
semantics."

I think we all know how the language works w.r.t. references.


Default Arguments
----------------------
* I could live with this:

"We do not allow default function parameters, except in limited
situations"

They want you to overload instead. The default winds up being copied
at every call site, so it does cause object code bloat. Another con
they list is that it messes up function pointers, which we're not
using.


"One specific exception is when the function is a static function (or
in an unnamed namespace) in a .cc file. In this case, the cons don't
apply since the function's use is so localized."

"Another specific exception is when default arguments are used to
simulate variable-length argument lists."

Exceptions
--------------------------------------
* Total malarky:

"We do not use C++ exceptions."

Run-time Type Information
---------------------------------------
* They don't like it:

"Avoid using Run Time Type Information (RTTI)."

In general, I agree, but I really needed it for the exception handling
with line numbers. I don't think we're using dynamic casts elsewhere,
or at least won't be after the command refactor. And the one place
they say you may need it is with existing type hierarchies (and I'm
using it for std::exception, which I can't modify). I couldn't work
around it without a whole lot of code bloat, but it could be done.


Casting
----------------------
* I agree with this, and we should get rid of the (..) C-style casts.
They aren't just static casts.

"Use C++ casts like static_cast<>(). Do not use other cast formats
like int y = (int)x; or int y = int(x);"


Streams
------------------
* More complete bollocks (though it is amended with the fact that
they've had extensive debate on the subject):

"Use streams only for logging."

They want printf instead, which is rife with runtime error
possiblities, as we've bumped up against.

Preincrement and decrement
------------------------
* Agreed, but we need to make many changes

"Use prefix form (++i) of the increment and decrement operators with
iterators and other template objects."

The argument is efficiency.

Const
---------------
* I think we need more of

"Declare methods to be const whenever possible."


Inline Functions
--------------------
* I think this is because they're inline by default:

"Do not put large method definitions inline in the class definition.
Usually, only trivial or performance-critical, and very short, methods
may be defined inline."

I don't think that's a big deal though --- the compiler really
decides what's inline.

Integer Types
---------------------
* I think we're OK with this in most places:

"Of the built-in C++ integer types, the only one used is int."

* I couldn't make this work with var constructors because of Windows
and that __float128 thing, but maybe someone else could give it a go:

"If a program needs a variable of a different size, use a precise-width
integer type from <stdint.h>, such as int16_t"

Unsigned Integers
-----------------------
* Although I agree with the motivation to avoid subtle buts, this
messes up our other error checking that Daniel's been so careful to
stomp (signed vs. unsigned comparisons):

"So, document that a variable is non-negative using assertions. Don't
use an unsigned type."

0 and null
-----------------
* We don't use a lot of pointers, but often use 0 for floating point;
I'd be OK cleaning it up (NULL for us C++03-ers):

"Use 0 for integers, 0.0 for reals, nullptr (or NULL) for pointers,
and '\0' for chars."

Braced Inits
-------------------
* Who knew?

"In C++03, aggregate types (arrays and structs with no constructor)
could be initialized with braced initializer lists.

struct Point { int x; int y; };
Point p = {1, 2};
"

Alas, it's not in general form until C++11, so it'd probably just get
us into trouble. When we hit C++11, we'll be able to do this:

vector<int> v = {1, 2}; // Good -- v starts initialized.
for (int i : {-1, -2, -3})

I really really want to use C++11!

Template Metaprogramming
--------------------------
* Too late for this:

"Avoid complicated template programming."

There is wisdom in

"Think about whether the average member of your team will be able to
understand your code well enough to maintain it."

:-) But then they list Boost Spirit as an exception, which is where
the real pain is. And they especially hate enable_if (or other SFINAE
patterns).


Boost
--------------------
* Of course, we use none of their approved lists, but lots of other
ones:
"Use only approved libraries from the Boost library collection."

General naming rules
----------------------
* I agree with this up to a point:

"Function names, variable names, and filenames should be descriptive;
eschew abbreviation."

They do approve of common abbreviations, though.


File names
---------------
* No, I like Boost and Eigen here, because they follow C++ rather
than C conventions:

"C++ files should end in .cc and header files should end in .h"

* I agree here and think we've been pretty good at following it:

"In general, make your filenames very specific. For example, use
http_server_logs.h rather than logs.h. A very common case is to have a
pair of files called, e.g., foo_bar.h and foo_bar.cc, defining a class
called FooBar."

Type Names
-----------------
* We went with Boost and Stroustroup, not Eigen and Google:

"Type names start with a capital letter and have a capital letter for
each new word, with no underscores: MyExcitingClass, MyExcitingEnum."

though we haven't been entirely consistent (e.g., VectorView).

Variable Names
--------------------
* I think we do this pretty much everywhere and if not, we should fix it:

"The names of variables and data members are all lowercase, with
underscores between words. Data members of classes (but not structs)
additionally have trailing underscores."

They want a strong class vs. struct distinction, by the way, with
structs acting like C structs (nothing but the variable declarations).
We don't have many of those other than in the AST, I think, but I went
with the traling underscores and like the consistency.

Constant Names
-------------------
* We went with the all-caps convention, not this:

"Use a k followed by mixed case, e.g., kDaysInAWeek, for constants
defined globally or within a class."

Function Names
---------------------
* Again, we followed Stroustroup, not Google:

"Regular functions have mixed case; accessors and mutators match the
name of the variable: MyExcitingFunction(), MyExcitingMethod(),
my_exciting_member_variable(), set_my_exciting_member_variable()."

I think it's too late to change all of this as it'd be a massive
change. And I really hate that they capitalize --- they should at
least follow Java's camel case, which would be "myExcitingFunction()"
to distinguish from types.

Namespace Names
-----------------------
* I think we're OK with

"Namespace names are all lower-case"

Enumerator Names
--------------------
* I also think we're OK with

"Enumerators should be named either like constants or like macros:
either kEnumName or ENUM_NAME.

because we followed the usual upper-case convention.

Comment Style
---------------------
* I disagree with

"Use either the // or /* */ syntax, as long as you are consistent."

and believe we should always use // other than for doc comments. The
problem with /* ... */ is that you can't comment it out when you're
debugging.

File Comments
------------------------
* Do we want to do this?

"Start each file with license boilerplate, followed by a description
of its contents."

I just hate all the boilerplate.

They make author lines optional, and I think we should stay away from
them. We have Git blame, after all.

* Just so no to this kind of redundancy:

"Every file should have a comment at the top describing its contents."

I also believe talks and papers shouldn't be organized as "here's what
I'm going to tell you, now I'm telling you, that's what I told you"
structure. I like reinforcing concepts, but not as a knee-jerk rule.
It's just soooo boring. Follow the authorial rule: show, don't tell!

Class Comments
----------------------
* We need to get much better at this; I've been trying to do that
with just about everything for a while now and we should flag anything
that doesn't do it in code review if cpplint doesn't:

"Every class definition should have an accompanying comment that
describes what it is for and how it should be used."

We should also have @tparam comments for all the template parameters.

Function Comments
-------------------
* They must have had a linguist or old-school grammarian on board,
though I agree here:

"Every function declaration should have comments immediately preceding
it that describe what the function does and how to use it. These
comments should be descriptive ("Opens the file") rather than
imperative ("Open the file"); the comment describes the function, it
does not tell the function what to do."

Their general guidelines are also good:

"Types of things to mention in comments at the function declaration:

- What the inputs and outputs are.

- For class member functions: whether the object remembers reference arguments beyond the duration of the method call, and whether it will free them or not.

- If the function allocates memory that the caller must free.

- Whether any of the arguments can be a null pointer.

- If there are any performance implications of how a function is used.

- If the function is re-entrant. What are its synchronization
assumptions?"

* I"m also a strong believer for doc in their injunction:

"However, do not be unnecessarily verbose or state the completely
obvious"

Class data members
------------------------
* I also think we should do more of this:

"Each class data member (also called an instance variable or member
variable) should have a comment describing what it is used for."

Implementation comments
------------------------
* I think we've been pretty good about this, too; I strongly agree:

"In your implementation you should have comments in tricky,
non-obvious, interesting, or important parts of your code."

And their clarification:

"Note that you should never describe the code itself. Assume that the
person reading the code knows C++ better than you do, even though he
or she does not know what you are trying to do"

They give an an anti-example of what not to do:

"
// Now go through the b array and make sure that if i occurs,
// the next element is i+1.
"


Self-documenting constants
---------------------------
* I also think we should do this:

"When you pass in a null pointer, boolean, or literal integer values
to functions, you should consider adding a comment about what they
are, or make your code self-documenting by using constants."

Their example is to do either:

"
bool success = CalculateSomething(interesting_value,
10, // Default base value.
false, // Not the first time we're calling this.
NULL); // No callback.
"

or

"
const int kDefaultBaseValue = 10;
const bool kFirstTimeCalling = false;
Callback *null_callback = NULL;
bool success = CalculateSomething(interesting_value,
kDefaultBaseValue,
kFirstTimeCalling,
null_callback);
"

TODO Comments
------------------
* I'd be OK changing FIXME to TODO:

"Use TODO comments for code that is temporary, a short-term solution,
or good-enough but not perfect."

Formatting
-------------------------

Hey, they have settings file for emacs!

http://google-styleguide.googlecode.com/svn/trunk/google-c-style.el

Line Length
------------------
* I would like to see this in our code:

"Each line of text in your code should be at most 80 characters long."

It makes it much easier to see in code review. Keep in mind you can
chop up string literals, so that ("abc" "de") is the same as "abcde",
even if you put the two literals on different lines.

Header guards are excepted.

Spaces and Tabs
-----------------
* Yes, please:

"Use only spaces, and indent 2 spaces at a time."

Function Declarations
----------------------
* I'm OK ignoring this one if the return type comes first in
order to follow the 80-characters rule more easily:

"Return type on the same line as function name, parameters on the same
line if they fit."

We should definitely follow the rest of their formatting rules here.

Function Arguments
------------------
* We need to fix some of these:

"Arguments may optionally all be placed on subsequent lines with a
four space indent:

DoSomething(
argument1, argument2, // 4 space indent
argument3, argument4);
"

* They don't mention it here, but space after the comma is OK by me,
but we have lots of changes to make

Conditionals (and Loops have the same rules)
---------------------------------------------
* Yes, please:

"
if (condition) { // no spaces inside parentheses
... // 2 space indent.
} else if (...) { // The else goes on the same line as the closing brace.
...
} else {
...
}
"

* And extra goodness:

"
if(condition) { // Bad - space missing after IF.
if (condition){ // Bad - space missing before {.
if(condition){ // Doubly bad.
if (condition) { // Good - proper space after IF and before {.
"

* I also agree with their one-liner rules:

"Short conditional statements may be written on one line if this
enhances readability. You may use this only when the line is brief and
the statement does not use the else clause. ... This is not allowed
when the if statement has an else..."

* They're even with me on this controversial one:

"In general, curly braces are not required for single-line statements,
but they are allowed if you like them;"

and

"However, if one part of an if-else statement uses curly braces, the
other part must too:"

Pointer and Reference Expressions
----------------------------------
* I concur, but prefer space following:

// These are fine, space preceding.
char *c;
const string &str;

// These are fine, space following.
char* c; // but remember to do "char* c, *d, *e, ...;"!
const string& str;

char * c; // Bad - spaces on both sides of *
const string & str; // Bad - spaces on both sides of &

Boolean Expressions
-------------------------
* They allow wraps either way, but I strongly prefer the
typesetting convention of operators initial in lines, so
prefer

if (this_one_thing > this_other_thing
&& a_third_thing == a_fourth_thing
&& yet_another && last_one) {
...
}

to


if (this_one_thing > this_other_thing &&
a_third_thing == a_fourth_thing &&
yet_another && last_one) {
...
}

Return Values
-------------------
* Take that, R:

"Do not needlessly surround the return expression with parentheses."

(Of course, you need the parens in R, so they're not needless, if I
may be permitted to not follow the no double negative rule.)

Class Format
------------------
* This is unconventional for C++, but I'm OK with it. But why 1 space?

Sections in public, protected and private order, each indented one space.

The basic format for a class declaration (lacking the comments, see Class Comments for a discussion of what comments are needed) is:

class MyClass : public OtherClass {
public: // Note the 1 space indent!
MyClass(); // Regular 2 space indent.
explicit MyClass(int var);
~MyClass() {}

void SomeFunction();
void SomeFunctionThatDoesNothing() {
}

void set_some_var(int var) { some_var_ = var; }
int some_var() const { return some_var_; }

private:
bool SomeInternalFunction();

int some_var_;
int some_other_var_;
};

Note the line spacing around public and private (none before public,
none after either one).

I tend to put the privates up front, but everything could be shifted
and I'd be OK following this order.

Initializer Lists
--------------------
* I think we're all over the place on this one, but I'm happy to clean
up and follow their rules:

Constructor initializer lists can be all on one line or with subsequent lines indented four spaces.

"There are two acceptable formats for initializer lists:

// When it all fits on one line:
MyClass::MyClass(int var) : some_var_(var), some_other_var_(var + 1) {}
or

// When it requires multiple lines, indent 4 spaces, putting the colon on
// the first initializer line:
MyClass::MyClass(int var)
: some_var_(var), // 4 space indent
some_other_var_(var + 1) { // lined up
...
DoSomething();
...
}
"


Namespace Formatting
---------------------

* I'm OK with it, but it'd literally change almost every one of our
code files:

"Namespaces do not add an extra level of indentation."

* I do prefer

"When declaring nested namespaces, put each namespace on its own
line."


Hozo Whitespace
---------------
* I agree with Daniel's fix for this and following the rules:

"Never put trailing whitespace at the end of a line."

They say it makes merging a pain, which is why this update's going to
be painful.

Operator Space
----------------
* Yes, again, particularly the no padding within parens and spaces for
everything but factors (i.e. (a*b) or (a/b), but not (a + b)).

"
// Assignment operators always have spaces around them.
x = 0;

// Other binary operators usually have spaces around them, but it's
// OK to remove spaces around factors. Parentheses should have no
// internal padding.
v = w * x + y / z;
v = w*x + y/z;
v = w * (x + z);

// No spaces separating unary operators and their arguments.
x = -5;
++x;
if (x && !y)
...
"

Vertical Whitespace
---------------------------
* I think this is important:

"Minimize use of vertical whitespace.

This is more a principle than a rule: don't use blank lines when you
don't have to. In particular, don't put more than one or two blank
lines between functions, resist starting functions with a blank line,
don't end functions with a blank line, and be discriminating with your
use of blank lines inside functions."



Ben Goodrich

unread,
Mar 20, 2015, 10:19:10 AM3/20/15
to stan...@googlegroups.com
On Friday, March 20, 2015 at 2:40:26 AM UTC-4, Bob Carpenter wrote:

File names
---------------
* No, I like Boost and Eigen here, because they follow C++ rather
than C conventions:

"C++ files should end in .cc and header files should end in .h"

Our convention is actually somewhat important because make test-headers looks for all .hpp files, allowing us to use .h for headers that are not intended to execute anything on their own like src/stan/lang/function_signatures.h.
 

* I agree here and think we've been pretty good at following it:

"In general, make your filenames very specific. For example, use
http_server_logs.h rather than logs.h. A very common case is to have a
pair of files called, e.g., foo_bar.h and foo_bar.cc, defining a class
called FooBar."

But don't go too crazy because apparently it is less than fully portable to have a (full) path name be more than 100 bytes. We actually ran into this problem trying to package example-models for RStan but in the end, they wouldn't let us include example-models anyway so that has to be downloaded from GitHub.

Ben
Reply all
Reply to author
Forward
0 new messages