Here is a proposal.
- rob, for The Go Team
Introduction
When we started working on Go, we were more concerned with semantics than syntax, but before long we needed to define the syntax in order to write programs. One syntactic idea we tried was to reduce the number of semicolons in the grammar, to make the source code cleaner-looking. We managed to get rid of many of them, but the grammar became clumsy and hard to maintain as we worked on the compiler, and we realized we had overreached. We backed up to a compromise that had optional semicolons in a few places, a couple of rules about where they go, and a tool (gofmt) to regularize them.
Although we acclimated to the rules and were comfortable with them, once we launched it became clear that they were not really satisfactory. The compromise did not sit well with most programmers, and the issue needed to be rethought.
The language BCPL, an ancestor of B, which is an ancestor of C, had an interesting approach to semicolons. It can be summarized in two steps. First, the formal grammar requires semicolons in statement lists. Second, a lexical (not syntactic) rule inserts semicolons automatically before a newline when required by the formal grammar.
In retrospect, had we thought of BCPL's approach at the time, we would almost certainly have followed its lead.
We propose to follow it now.
Appended to this mail is a formal specification of the proposed rules for semicolons in Go. They can be summarized much as in BCPL, but Go is a different language so the detailed rules differ; see the specification for the details. In short:
- In the formal grammar, semicolons are terminators, as in C.
- A semicolon is automatically inserted by the lexer if a line's last token is an identifier, a basic literal, or one of the following tokens: break continue fallthrough return ++ -- ) ] }
- A semicolon may be omitted before a closing ) or } .
The upshot is this: given the proposal, most required semicolons are inserted automatically and thus disappear from Go programs, except in a few situations such as for loop clauses, where they will continue to separate the elements of the clause.
No more optional semicolons, no more rewriting; they're just gone. You can put them in if you want, but we believe you'll quickly stop and gofmt will throw them away.
The code will look much cleaner.
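The effect of these rules can be observed with a small program. This is a minimal sketch that assumes the go/scanner and go/token packages of the present-day standard library, which are not part of this proposal and may differ from it in detail; the helper name countSemicolons is hypothetical. The scanner reports an inserted semicolon with the literal "\n" and a written one with the literal ";".

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// countSemicolons tokenizes src and reports how many semicolons were
// written explicitly and how many were inserted at a line end.
// (Hypothetical helper, for illustration only.)
func countSemicolons(src string) (explicit, inserted int) {
	fset := token.NewFileSet()
	file := fset.AddFile("example.go", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0) // nil error handler, default mode

	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		if tok == token.SEMICOLON {
			if lit == ";" {
				explicit++ // present in the source text
			} else {
				inserted++ // added by the scanner at a newline or EOF
			}
		}
	}
	return explicit, inserted
}

func main() {
	// The for clause keeps its two semicolons; the lines ending in ) and }
	// each receive an inserted one.
	src := "for i := 0; i < 3; i++ {\n\tprintln(i)\n}\n"
	explicit, inserted := countSemicolons(src)
	fmt.Println("explicit:", explicit, "inserted:", inserted)
}
```

The line ending in { receives no semicolon because { is not in the list of triggering tokens, which is why the opening brace must stay on the same line as the for clause.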
There are compatibility issues. A few things break with this proposal, but not many. (When processing the .go files under src/pkg with an updated parser, most files are accepted under both the old and the new rules.)
By far the most important breakage is lexical string concatenation across newlines, as in
"hello "
"world"
Since the language permits + to concatenate strings and constant folding is done by the compilers, this feature is simply gone from the language; use a + to concatenate strings. It's not a huge loss because raw strings, written between back quotes, can still span newlines.
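As an illustration of the two replacements, here is a minimal sketch; the function names greeting and rawGreeting are hypothetical. Note that the + must end the first line: a line ending in a string literal would receive a semicolon.

```go
package main

import "fmt"

// greeting joins two literals with +. The operator ends the first line,
// so no semicolon is inserted there and the expression continues.
func greeting() string {
	return "hello " +
		"world"
}

// rawGreeting uses a back-quoted string, which may contain a newline
// directly; the newline becomes part of the string value.
func rawGreeting() string {
	return `hello
world`
}

func main() {
	fmt.Println(greeting())
	fmt.Printf("%q\n", rawGreeting())
}
```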
With the new rules, a semicolon is mistakenly inserted after the last element of a multi-line list if the closing parenthesis or brace is on a separate line:
f(
	a,
	b
)
To avoid this issue, a trailing comma is now permitted in function calls and formal parameter lists, as is the case with composite literals already.
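A short sketch of the trailing comma in both positions, a parameter list and a call; the names sum and total are hypothetical.

```go
package main

import "fmt"

// sum declares its parameters one per line. The comma after the last
// parameter keeps a semicolon from being inserted before the closing ).
func sum(
	a int,
	b int,
) int {
	return a + b
}

// total calls sum with one argument per line, again ending in a comma.
func total() int {
	return sum(
		1,
		2,
	)
}

func main() {
	fmt.Println(total())
}
```

Without the trailing comma, the line ending in the last argument would end in an identifier or literal and so receive a semicolon, which the grammar does not accept inside the parentheses.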
A channel send spanning a newline can accidentally become a receive, as in
a = b
<-c
Inserting a semicolon after the b changes the meaning from a non-blocking send to an assignment followed by a receive. For this transformation to mislead, however, the types of a, b and c must be very specific: interface{}, chan interface{} and chan. So a program might break but it is almost certain never to succeed incorrectly. We are aware of the risk but believe it is unimportant.
A similar thing can happen with certain function calls:
f()
(g())
If f() returns a function whose single argument matches the type of a parenthesized expression such as (g()), this will also erroneously change meaning. Again, this is so rare we believe it is unimportant.
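To make the change of meaning concrete, here is a minimal sketch; f, g, and trace are hypothetical names chosen to match the snippet above. Written on one line, the function returned by f() is called with g(); split across two lines, f() and (g()) become two separate statements and the returned function is never called.

```go
package main

import "fmt"

// trace records which functions ran, in order.
var trace []string

// f returns a function whose single argument is an int.
func f() func(int) {
	trace = append(trace, "f")
	return func(n int) { trace = append(trace, fmt.Sprint("inner ", n)) }
}

func g() int {
	trace = append(trace, "g")
	return 7
}

// oneLine calls f, then g, then the function that f returned.
func oneLine() []string {
	trace = nil
	f()(g())
	return trace
}

// twoLines is the same text split at the newline: a semicolon follows
// f(), so the function it returns is discarded and never called.
func twoLines() []string {
	trace = nil
	f()
	(g())
	return trace
}

func main() {
	fmt.Println(oneLine())
	fmt.Println(twoLines())
}
```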
Finally, a return statement spanning a newline is broken up into two statements:
return
f()
For this to miscompile, the return statement must be in a function with a single named result, and the result expression must be parseable as a statement.
Gofmt's style already avoids all three of these problematic formattings: the split channel send, the split function call, and the split return statement.
This proposal may remind you of JavaScript's optional semicolon rule, which in effect adds semicolons to fix parse errors. The Go proposal is profoundly different. First, it is a lexical model, not a semantic one, and we believe that makes it far safer in practice. The rules are hard and fast, not subject to contextual interpretation. Second, since very few expressions in Go can be promoted to statements, the opportunities where confusion can arise are also very few - they're basically the examples above. Finally, since Go is statically type-safe, the odds are even lower.
Another language the proposal may evoke is Python, which uses white space (indentation) to define program structure. Again, the story here is very different. In Go, program structure is not defined by white space. Instead, a much milder thing is happening: lists of statements, constants, etc. may be separated by placing them one per line instead of by inserting semicolons. That's all.
Please read the proposal and think about its consequences. We're pretty sure it makes the language nicer to use and sacrifices almost nothing in precision or safety.
Rolling out the change
Gofmt will be a big help in pushing out this change. Here is the plan.
1. Change gofmt to insert the + for string concatenation. Give it a flag to omit the semicolons but leave them in by default.
2. Reformat the tree with that gofmt: this inserts + for lexical string concatenation but otherwise is a no-op.
3. Update the compilers to insert semicolons. They should then accept gofmt output (with semicolons) and semicolon-free programs just fine.
4. Try things out, revising the specification and tools as required.
5. Once happy, make the gofmt default "no semicolons". Reformat the entire tree.
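The end state of step 5 can be sketched with the go/format package, the library form of gofmt in the present-day standard library. This assumes that package's current behavior rather than anything specified in this plan, and the helper names stripSemicolons and hasSemicolon are hypothetical.

```go
package main

import (
	"fmt"
	"go/format"
	"strings"
)

// stripSemicolons formats src the way gofmt would and returns the result.
func stripSemicolons(src string) (string, error) {
	out, err := format.Source([]byte(src))
	return string(out), err
}

// hasSemicolon reports whether s still contains a semicolon.
func hasSemicolon(s string) bool {
	return strings.Contains(s, ";")
}

func main() {
	// Every line here carries an explicit, redundant semicolon.
	src := `package main;
import "fmt";
func main() {
	fmt.Println("hi");
};
`
	out, err := stripSemicolons(src)
	if err != nil {
		fmt.Println("format error:", err)
		return
	}
	fmt.Print(out)
	fmt.Println("semicolons left:", hasSemicolon(out))
}
```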
The formal specification
The following changes are applied to the spec.
1) New semicolon rules:
a) When the input is broken into tokens, a semicolon is automatically inserted into the token stream at the end of a non-blank line if the line's final token is:
- an identifier or basic literal
- one of the keywords break, continue, fallthrough or return
- one of the tokens ++ -- ) ] }
b) To allow complex statements to occupy a single line, a semicolon may be omitted before a closing ) or }.
2) The interpretation of comments is clarified:
a) Line comments start with the character sequence // and continue through the next newline. A line comment acts like a newline.
b) General comments start with the character sequence /* and continue through the character sequence */. A general comment that spans multiple lines acts like a newline, otherwise it acts like a space.
3) Replacements:
a) All uses of StringLit are replaced by string_lit.
b) All uses of StatementList are replaced by { Statement ";" }.
4) The following productions are simplified and always use semicolons as terminators. In idiomatic use, the semicolons are inserted automatically and thus won't appear in the source code.
StructType = "struct" "{" { FieldDecl ";" } "}" .
InterfaceType = "interface" "{" { MethodSpec ";" } "}" .
ImportDecl = "import" ( ImportSpec | "(" { ImportSpec ";" } ")" ) .
ConstDecl = "const" ( ConstSpec | "(" { ConstSpec ";" } ")" ) .
TypeDecl = "type" ( TypeSpec | "(" { TypeSpec ";" } ")" ) .
VarDecl = "var" ( VarSpec | "(" { VarSpec ";" } ")" ) .
SourceFile = PackageClause ";" { ImportDecl ";" } { TopLevelDecl ";" } .
5) The following productions permit optional commas. This enables multi-line constructions where the closing parenthesis or brace is on a new line if the last element is followed by a comma. The optional comma is only new for parameter lists and calls; composite literals permit it already.
Parameters = "(" [ ParameterList [ "," ] ] ")" .
CompositeLit = LiteralType "{" [ ElementList [ "," ] ] "}" .
Call = "(" [ ExpressionList [ "," ] ] ")" .
6) The following productions are gone since they are not needed anymore with the simplified productions outlined in 3) and 4).
StringLit
StatementList
Separator
FieldDeclList
MethodSpecList
ImportSpecList
ConstSpecList
TypeSpecList
VarSpecList
7) The two exceptions about when semicolons may be omitted in statement lists are superseded by 1b above.
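Rule 1a above is small enough to write down as a predicate on a line's final token. The following is a sketch only, not the compilers' implementation: it borrows token constants from the present-day go/token package and uses go/scanner merely to find the last token of a line, and the names triggersSemicolon and lineEndTriggers are hypothetical.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// triggersSemicolon reports whether a line whose final token is tok
// receives a semicolon under rule 1a.
func triggersSemicolon(tok token.Token) bool {
	switch tok {
	case token.IDENT, token.INT, token.FLOAT, token.IMAG, token.CHAR, token.STRING:
		return true // an identifier or basic literal
	case token.BREAK, token.CONTINUE, token.FALLTHROUGH, token.RETURN:
		return true // one of the four keywords
	case token.INC, token.DEC, token.RPAREN, token.RBRACK, token.RBRACE:
		return true // one of ++ -- ) ] }
	}
	return false
}

// lineEndTriggers finds the last token actually written on line and
// applies the predicate to it.
func lineEndTriggers(line string) bool {
	fset := token.NewFileSet()
	file := fset.AddFile("line", fset.Base(), len(line))

	var s scanner.Scanner
	s.Init(file, []byte(line), nil, 0)

	last := token.ILLEGAL
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		if tok == token.SEMICOLON && lit != ";" {
			continue // skip semicolons the scanner inserted itself
		}
		last = tok
	}
	return triggersSemicolon(last)
}

func main() {
	for _, line := range []string{"x := f()", "x := a +", "f(", "return", "a[i]"} {
		fmt.Printf("%-10q %v\n", line, lineEndTriggers(line))
	}
}
```

Because the rule looks only at one token, it needs no knowledge of the surrounding grammar, which is what makes it a lexical rule rather than a syntactic one.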
Prime sieve example without trailing semicolons
package main

import "fmt"

// Send the sequence 2, 3, 4, ... to channel 'ch'.
func generate(ch chan<- int) {
	for i := 2; ; i++ {
		ch <- i // Send 'i' to channel 'ch'.
	}
}

// Copy the values from channel 'src' to channel 'dst',
// removing those divisible by 'prime'.
func filter(src <-chan int, dst chan<- int, prime int) {
	for i := range src { // Loop over values received from 'src'.
		if i%prime != 0 {
			dst <- i // Send 'i' to channel 'dst'.
		}
	}
}

// The prime sieve: Daisy-chain filter processes together.
func sieve() {
	ch := make(chan int) // Create a new channel.
	go generate(ch)      // Start generate() as a subprocess.
	for {
		prime := <-ch
		fmt.Print(prime, "\n")
		ch1 := make(chan int)
		go filter(ch, ch1, prime)
		ch = ch1
	}
}

func main() {
	sieve()
}