Go Language Survey


Michael Jones

Jun 12, 2019, 9:08:44 AM
to golang-nuts
I've been working on a cascade of projects, each needing the next as a part, the most recent being rewriting text.Scanner. It was not a goal, but the existing scanner does not do what I need (recognize Go operators, number types, and more) and my shim code was nearly as big as the standard library scanner itself, so I just sat down and rewrote it cleanly.

To test beyond hand-crafted edge cases it seemed good to try it against a large body of Go code. I chose the Go 1.13 code base, and because the results are interesting on their own beyond my purpose of code testing, I thought to share what I've noticed as a GitHub Gist on the subject of the "Go Popularity Contest": what are the most used types, most referenced packages, most and least popular operators, etc. The data are interesting, but I'll let them speak for themselves. Find it here:

https://gist.github.com/MichaelTJones/ca0fd339401ebbe79b9cbb5044afcfe2

Michael

P.S. Generated by go test. I just cut off the "passed" line and posted it. ;-)

--
Michael T. Jones
michae...@gmail.com

Ian Lance Taylor

Jun 12, 2019, 9:49:29 AM
to Michael Jones, golang-nuts
On Wed, Jun 12, 2019 at 6:08 AM Michael Jones <michae...@gmail.com> wrote:
>
> I've been working on a cascade of projects, each needing the next as a part, the most recent being rewriting text.Scanner. It was not a goal, but the existing scanner does not do what I need (recognize Go operators, number types, and more) and my shim code was nearly as big as the standard library scanner itself, so I just sat down and rewrote it cleanly.
>
> To test beyond hand-crafted edge cases it seemed good to try it against a large body of Go code. I chose the Go 1.13 code base, and because the results are interesting on their own beyond my purpose of code testing, I thought to share what I've noticed as a GitHub Gist on the subject of the "Go Popularity Contest": what are the most used types, most referenced packages, most and least popular operators, etc. The data are interesting, but I'll let them speak for themselves. Find it here:
>
> https://gist.github.com/MichaelTJones/ca0fd339401ebbe79b9cbb5044afcfe2

Pretty interesting. Thanks.

I note that "goto" is more common than "select". That has to be an
artifact of the code base.

Ian

Michael Jones

Jun 12, 2019, 10:37:36 AM
to Ian Lance Taylor, golang-nuts
Yes, quite a bit there of note:

More hex numbers than decimal. That was a surprise.

The "0o" prefixed octal number literals just went in in the last two days. I think a gofix rewrite might be in order--it is more explicit and less vulnerable to mistake than the legacy 0377 style--despite a whole career of the other from the PDP-8 onward.
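
A small sketch of the two octal spellings side by side, assuming Go 1.13 or later for the "0o" prefix. The function name octalForms is made up for illustration.

```go
package main

import "fmt"

// octalForms returns the same value written in the legacy and the
// Go 1.13 "0o" octal notations.
func octalForms() (legacy, modern int) {
	return 0377, 0o377
}

func main() {
	legacy, modern := octalForms()
	fmt.Println(legacy, modern, legacy == modern) // 255 255 true

	// The hazard with the legacy style: a leading zero silently changes the base.
	fmt.Println(0100, 100) // 64 100
}
```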

128640 if statements, and just 8034 else, a 16:1 ratio. I'd like to understand this better, but my test is just a lexer and not a parser so I don't have much context to draw conclusions about this specifically. However it seems such a huge ratio that switch must be carrying the load. There are 5024 switch statements and 24903 cases, so that's "4 virtual elses" per switch in some broad sense.

Type is well used. 2x as much as struct, so even if half are "type X struct {...}" the other half are not.

The import frequency is not interpretable beyond "every package imports something" because of the inner list in imports as typically written and that's available to a parser but not at the lexical level. I could hack it to look at lines between the parens after import, but that's beyond the "test the lexer" goal.

The default-to-switch ratio is high, and there is at most one default per switch, so this means 60% of switch statements have a default or, by extension, express if / else-if / else logic with a final conditionless else.
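
To make the "by extension" step concrete, here is a sketch of the same three-way decision written both ways. The function names are made up for illustration; the point is only that default in a tagless switch plays the role of the final else.

```go
package main

import "fmt"

// signSwitch uses a tagless switch; default acts as the final else.
func signSwitch(n int) string {
	switch {
	case n < 0:
		return "negative"
	case n == 0:
		return "zero"
	default:
		return "positive"
	}
}

// signIfElse is the same logic as an if / else-if / else chain.
func signIfElse(n int) string {
	if n < 0 {
		return "negative"
	} else if n == 0 {
		return "zero"
	} else {
		return "positive"
	}
}

func main() {
	for _, n := range []int{-3, 0, 8} {
		fmt.Println(n, signSwitch(n), signIfElse(n))
	}
}
```

In the switch form there is one switch token, two case tokens and one default token, and no else at all, which is one way the if-to-else ratio can get so lopsided.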

Few fallthrough statements, and that's natural. Even in C/C++, where falling through is the default, there were few uses outside Duff's Device. [Having just written a lexer, I can share what I think would be much better than switch's fallthrough: a way to say, "now that I'm in this case, I've changed my mind and want to PROCEED with the case testing starting with the next case." That would be swell and there is no way to do it.]
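
A sketch of what fallthrough actually does today, to contrast with the wished-for "proceed" behavior. The function name trace and the case conditions are made up for illustration.

```go
package main

import "fmt"

// trace records which case bodies run for n.
func trace(n int) []string {
	var ran []string
	switch {
	case n > 10:
		ran = append(ran, "n > 10")
		fallthrough // jumps into the next body; "n < 0" is NOT evaluated
	case n < 0:
		ran = append(ran, "n < 0")
	case n%2 == 0:
		ran = append(ran, "even")
	}
	return ran
}

func main() {
	fmt.Println(trace(20)) // [n > 10 n < 0]
	fmt.Println(trace(4))  // [even]
}
```

For n = 20 the second body runs even though its condition is false, and the "even" case, which 20 would satisfy, is never tested. A "proceed" statement as described above would instead skip the false "n < 0" case and land in "even".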

Byte by that name is quite popular compared to its other identity, uint8. [Opinion: never have been comfortable about this one. There are no living machines with non 8-bit bytes, so the generality of "byte is the natural size for a byte" is a stretch here, as would be 1 for the size of a bit.]

Using true is almost 3x false. I wonder if that is natural or if the default value of zero/false is behind it.

Lots of panics and not many recovers.

Operators are interesting. Nearly 2.5x as many != as ==. Perhaps "err != nil" is the story. A lot more < than >, which is curious...presumably from for loops, but I can't tell at the token level. Way more left shifts than right. Not true in my own code, so interesting. Some disjunctive/conjunctive dissonance: 3x the && as ||. 7x the ++ as --. I guess people like to count up, even when counting down has the advantage of the expensive load being done once and the test being against zero. 5x += than -=, surprising. Pretty low incidence of &^ and &^=, not true in my code at all, so I suppose the BIC "Bit Clear" of the PDP-11 is not a meme. That's surprising to me: a |= b sets the b bits in a, and a &^= b clears the b bits in a. They are a team, yet | has 50x the usage of &^. Maybe because C did not have it, C++ and Java copied C, and people have not understood? These should be peers. /= is not popular, and %= even less so. I use them both but I may be the only one (there is some code like this in Big from way back).
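
A short sketch of the set/clear pairing, with made-up helper names setBits and clearBits and arbitrary bit patterns. The binary literals assume Go 1.13 or later.

```go
package main

import "fmt"

// setBits turns on every bit of mask in a.
func setBits(a, mask uint8) uint8 {
	a |= mask
	return a
}

// clearBits turns off every bit of mask in a (AND NOT, the "bit clear").
func clearBits(a, mask uint8) uint8 {
	a &^= mask
	return a
}

func main() {
	const mask uint8 = 0b0000_0110

	a := uint8(0b0101_0001)
	a = setBits(a, mask)
	fmt.Printf("%08b\n", a) // 01010111

	a = clearBits(a, mask)
	fmt.Printf("%08b\n", a) // 01010001

	// a &^ mask is the same as a & (^mask).
	fmt.Println(a&^mask == a&(^mask)) // true
}
```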

External references show that the Go team writes lots of tests, and that unsafe is wildly popular.

The most popular character constant is '0' 3x '9' so it's not all ('0' <= ch && ch <= '9') ... there are some extra '0's in there.
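
One plausible source of the extra '0's (a guess, since a lexer cannot confirm it) is the digit-to-value idiom ch - '0', which has no matching '9'. A sketch with made-up function names; overflow is ignored to keep it short.

```go
package main

import "fmt"

// isDigit is the range test quoted above: one '0' and one '9'.
func isDigit(ch byte) bool {
	return '0' <= ch && ch <= '9'
}

// parseUint converts a run of ASCII digits to an int. The ch - '0' step
// uses '0' again with no matching '9'.
func parseUint(s string) (n int, ok bool) {
	if s == "" {
		return 0, false
	}
	for i := 0; i < len(s); i++ {
		ch := s[i]
		if !isDigit(ch) {
			return 0, false
		}
		n = n*10 + int(ch-'0')
	}
	return n, true
}

func main() {
	fmt.Println(parseUint("2019")) // 2019 true
	fmt.Println(parseUint("0o17")) // 0 false
}
```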

Bakul Shah

Jun 12, 2019, 10:56:37 AM
to Michael Jones, Ian Lance Taylor, golang-nuts
On Jun 12, 2019, at 7:36 AM, Michael Jones <michae...@gmail.com> wrote:
>
> 128640 If statements, and just 8034 else, a 16:1 ratio. I'd like to understand this better,

There are two patterns that encourage this:

x := v1
if someCond { x = v2 }

And

if someCond { return ... }

The second pattern acts sort of as a filter. Which is useful (fewer choices left).

The first pattern is due to a lack of C’s ?: construct. [In light of that it is amusing to see try as a “function” being proposed that even does a conditional return!]
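
Both patterns can be sketched as compilable Go. The function names port and reciprocal and their values are made up for illustration; neither function contains an else.

```go
package main

import (
	"errors"
	"fmt"
)

// port shows the first pattern: assign a default, then override it.
func port(secure bool) int {
	p := 80
	if secure {
		p = 443
	}
	return p
}

// reciprocal shows the second pattern: filter out the failing case with
// an early return, leaving the main path unindented.
func reciprocal(x float64) (float64, error) {
	if x == 0 {
		return 0, errors.New("reciprocal of zero")
	}
	return 1 / x, nil
}

func main() {
	fmt.Println(port(false), port(true)) // 80 443
	fmt.Println(reciprocal(4))           // 0.25 <nil>
	fmt.Println(reciprocal(0))           // 0 reciprocal of zero
}
```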

Michael Jones

Jun 12, 2019, 11:25:19 AM
to Bakul Shah, Ian Lance Taylor, golang-nuts
Bakul, these are good points.

On the second, I used to always write (C/C++):

If (things are good) {
    happy case
} else {
    sad case
}

so the nesting was that the all-good was the first code block even when multiply nested and the exceptions came later. A blunt kind of literate programming that wants to talk about the 99% case first and the weird things later. In Go I've converted to talk about problems all the way to the bitter end and then when you run out of problems, do the job. Now that you point it out, "else" falls by the wayside in this situation because it is else when you did not return already. Each Go-style if-err-die "clause" is consuming an else. Thank you. I had not thought of it so clearly.

The first is a good point too. The debate about the ternary operator might be better viewed as a completion of short variable declaration. Block scope means it is not possible to write...

if b {
    x := 1
} else {
    x := 2
}
:
use x

...so the benefits of short variable declaration are lost to a choice between redundancy as you show:

x := 1
if b {
    x = 2
}

which hides the intent -- the if b is the intent and that's not evident from x := 1. The intent is "x := either 1 or 2 depending" and that is well expressible only when the := is in the outer scope and the "1 or 2" are expressions from an inner one being transported across the lexical boundary by an operator -- the "depending" operator whatever its name or form might be.

The pro-? forces might better petition under the "make short assignments complete" banner.

x := if b {1} else {2}
x := switch b {case true: 1; case false: 2}
x := b ? 1 : 2

or, my dream scenario, which is just a dream because of too many moving parts to sell all at once:

allow bool to int/uint casting with obvious meaning T=>1, F=>0
allow "implicit" types in literal definitions like this:

x := {1,2}[int(b)]

...or even if one could dare hope for automatic coercion of the harmless bool...

x := {1,2}[b]

as in 

workers := {runtime.NumCPU(),1}[*ParallelMode]

...oh to dream.
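
For comparison, here is a sketch of the closest spellings that compile in Go as it stands. The helper b2i and the pick* function names are made up; none of this is proposed syntax, and all three keep the := in the outer scope.

```go
package main

import "fmt"

// b2i is a hypothetical helper: Go has no bool-to-int conversion.
func b2i(b bool) int {
	if b {
		return 1
	}
	return 0
}

// pickSlice is the legal form of x := {1,2}[int(b)]: the literal needs an
// explicit type and the index needs an explicit helper.
func pickSlice(b bool) int {
	x := []int{1, 2}[b2i(b)]
	return x
}

// pickMap keys a map literal by bool, which avoids the helper.
func pickMap(b bool) int {
	x := map[bool]int{false: 1, true: 2}[b]
	return x
}

// pickFunc uses an immediately invoked function literal.
func pickFunc(b bool) int {
	x := func() int {
		if b {
			return 2
		}
		return 1
	}()
	return x
}

func main() {
	fmt.Println(pickSlice(true), pickMap(true), pickFunc(true))    // 2 2 2
	fmt.Println(pickSlice(false), pickMap(false), pickFunc(false)) // 1 1 1
}
```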


roger peppe

Jun 12, 2019, 11:43:13 AM
to Michael Jones, golang-nuts
I wonder whether the Go 1.13 code base is representative of Go in the wild. It might be interesting to see the results when run on the code in the Go corpus. https://github.com/rsc/corpus

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CALoEmQzK3v4V%2BRYWZorfX0Nst%2BOXwzP9e51mCDm2ri-4jfzXyA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Lucio

Jun 12, 2019, 11:50:47 AM
to golang-nuts


On Wednesday, 12 June 2019 17:25:19 UTC+2, Michael Jones wrote:
Bakul, these are good points.


Nice work, Michael; nice comments, Bakul.

It's nice when the philosophy behind intuition is given some solidity. I keep hoping to see more of Dijkstra's "A Discipline of Programming" leaking into the Go psyche, but there have been a few too many diversions that may never be reversed.

Personally, I see exactly zero benefit in x++ or x--, now that these are more clearly represented as x += 1 and x -= 1. I have discarded them from my idioms. But that's just me. I'm hoping no one will find it necessary to edit my code to correct this :-).

Lucio.

Bakul Shah

Jun 12, 2019, 12:24:54 PM
to Michael Jones, golang-nuts
On Jun 12, 2019, at 8:24 AM, Michael Jones <michae...@gmail.com> wrote:
>
> Bakul, these are good points.
>
> On the second, I used to always write (C/C++):
>
> If (things are good) {
> happy case
> } else {
> sad case
> }
>
> so the nesting was that the all-good was the first code block even when multiply nested and the exceptions came later. A blunt kind of literate programming that wants to talk about the 99% case first and the weird things later. In Go I've converted to talk about problems all the way to the bitter end and then when you run out of problems, do the job. Now that you point it out, "else" falls by the wayside in this situation because it is else when you did not return already. Each Go-style if-err-die "clause" is consuming an else. Thank you. I had not thought of it so clearly.

It's just a different style. Here you are chopping off "sad cases" until
you are left with the big fat happy case! And by returning ASAP for the
sad cases, you reduce indentation quite a bit, which helps readability.

>
> The first is a good point too. The debate about the ternary operator might be better viewed as a completion of short variable declaration. Block scope means it is not possible to write...
>
> if b {
> x := 1
> } else {
> x := 2
> }
> :
> use x
>
> ...so the benefits of short variable declaration are lost to a choice between redundancy as you show:
>
> x := 1
> if b {
> x = 2
> }
>
> which hides the intent -- the if b is the intent and that's not evident from x := 1. The intent is "x := either 1 or 2 depending" and that is well expressible only when the := is in the outer scope and the "1 or 2" are expressions from an inner one being transported across the lexical boundary by an operator -- the "depending" operator whatever its name or form might be.

You can almost speed-read straight line code but as soon as you
encounter if or switch (or other control flow changing part) you
have to stop and regroup. This is why (for me)

x := b? 2 : 1

is far easier to read than either

var x int
if b {
x = 2
} else {
x = 1
}

or worse,

x := 1
if b {
x = 2
}


> x := {1,2}[b]

This requires evaluating both alternatives. This may not
even be possible:

x := p == nil? 10 : p.val

or may have extra side-effects:

x := {f1(), f2()}[b]

This can also be a problem with

x := f1()
if b { x = f2() }
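
The evaluation-order point can be checked in today's Go by writing the indexing trick with an explicit slice type. This is a sketch with made-up names (node, valOr, eager, lazy) and a call counter standing in for side effects.

```go
package main

import "fmt"

type node struct{ val int }

// valOr returns p.val, or def when p is nil. Only one branch is evaluated,
// so the nil case never dereferences p.
func valOr(p *node, def int) int {
	if p == nil {
		return def
	}
	return p.val
}

var calls int // counts how many of f1/f2 have run

func f1() int { calls++; return 1 }
func f2() int { calls++; return 2 }

// eager builds the whole slice before indexing, so f1 and f2 both run.
func eager(b bool) int {
	i := 0
	if b {
		i = 1
	}
	return []int{f1(), f2()}[i]
}

// lazy calls only the selected function.
func lazy(b bool) int {
	if b {
		return f2()
	}
	return f1()
}

func main() {
	calls = 0
	fmt.Println(eager(true), calls) // 2 2

	calls = 0
	fmt.Println(lazy(true), calls) // 2 1

	// Writing []int{10, p.val}[i] with a nil p would panic while building
	// the slice, whichever index is chosen; valOr does not.
	fmt.Println(valOr(nil, 10), valOr(&node{val: 7}, 10)) // 10 7
}
```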



Michael Jones

Jun 12, 2019, 2:42:28 PM
to Bakul Shah, golang-nuts
Roger, here's the same thing, but for Russ's corpus v0.01:


I've been comparing the two side by side and it's fascinating.

Bakul, more good arguments. I have another motivation in the "?" world that I've not argued because it is personal/not general, but a decade ago I had two detached retinas, surgeries, and imperfect recovery. Part of my vision that I lost is just below the center...maybe -15 degrees to -40 degrees. The brain knows when I want to see things there and moves the eyes around to gather that part of the visual field. This "hunting" is tiring for the muscles and causes issues. Left-to-right density is easy for me, vertical is very bad. Your:

x := b? 2 : 1 

is instantaneous, a sight read; while the:

var x int
if b {
  x = 2
} else {
  x = 1
}

and


x := 1
if b {
  x = 2
}


...feel like climbing Everest. It is hard to explain the difficulty. Fortunately it is not a widespread problem. Certainly not Go's problem, but I'd pay double for a "wide" mode where gofmt tolerated "var x int; if b { x = 2 } else { x = 1 }". In fact, now that I've just seen this, I am going to make a local version and hook it to VS Code. Why did I not think of this before! Wow.

Peter Weinberger (温博格)

Jun 12, 2019, 4:21:56 PM
to Michael Jones, Bakul Shah, golang-nuts
I agree that ? for simple choices is nice. But my C experience with nested ?s and with long expressions for one or both branches has not been nice. The mandatory {}s make Go's nested ifs more readable (but vertical).


Volker Dobler

Jun 12, 2019, 4:51:00 PM
to golang-nuts
Cool work!

What I found most astonishing on a first look: Not all
parentheses ( are closed: 4 ) seem to be missing??
For { 5 are unclosed while there is one more ] than [ ?

Are you parsing testfiles with deliberate errors?

V. 

Michael Jones

Jun 12, 2019, 5:14:25 PM
to Volker Dobler, golang-nuts
They matched up until yesterday. When I updated at 2am California time it changed. It also had no "0o" octal literals up until the latest.

I'd planned to joke in my mail about how the race was on to be the first to check in a new octal literal, but a few of those snuck in too.

Yesterday:
Count | Frequency | Detail
---:|---:|---
  929548 | 19.7889% | ,  
  574886 | 12.2386% | .  
  544819 | 11.5985% | (  
  544819 | 11.5985% | )  

  352547 | 7.5053% | {  
  352547 | 7.5053% | }  

  288042 | 6.1321% | =  
  253563 | 5.3980% | :  
  155297 | 3.3061% | :=  
  138465 | 2.9478% | [  
  138465 | 2.9478% | ]  

  78567 | 1.6726% | !=  
  72007 | 1.5329% | *  


Michael Jones

Jun 12, 2019, 5:48:36 PM
to Volker Dobler, golang-nuts
Volker, did you see a few posts back that I did the run Roger asked about, on RSC’s huge corpus? It is about 10x the size and its parens, braces, and brackets match just fine, all 7476284 of them....

Dan Kortschak

Jun 12, 2019, 9:53:07 PM
to Michael Jones, Bakul Shah, golang-nuts
This is interesting. I have exactly the opposite situation; up-down is
much easier than significant left-right because of faulty saccades.

Volker Dobler

Jun 13, 2019, 1:27:16 AM
to golang-nuts

On Wednesday, 12 June 2019 23:48:36 UTC+2, Michael Jones wrote:
Volker, did you see a few posts back that I did the run Roger asked about, on RSC’s huge corpus? It is about 10x the size and its parens, braces, and brackets match just fine, all 7476284 of them....

If I remember correctly, the corpus was curated to be buildable, but on
the other hand the Go 1.13 codebase in master should be
buildable always too, anytime. Weird.

V.
 


Michael Jones

Jun 13, 2019, 3:01:59 AM
to Volker Dobler, golang-nuts
The "src" subdirectory of go does balance, but building every ".go" file in ./go does lose the balance. I sensed your discomfort so I've changed my plans a little to take a day to make a flexible command line tool out of my go-test so that with its verbose mode you'll know which files have issues. I am here working on it at this very moment...

// Survey gathers and reports simple statistics about Go code by lexical analysis
// Author: Michael Jones
//
// Survey files named in the file list of the "-f" argument and then those listed in command line
// arguments. Files may be ".go" files or directories. If a named file is a directory then all ".go"
// files in that directory are surveyed without considering subdirectories. With the "-r" flag,
// named directories are processed recursively, eventually finding and surveying each ".go" file in
// that hierarchy. The verbose argument requests details of individual file processing and
// file system traversal, mentioning files with unbalanced "()[]{}." The markdown argument prepares
// output for pretty display.


Michael Jones

Jun 16, 2019, 9:07:14 PM
to Volker Dobler, golang-nuts
Volker, the answer to the balance question. My formatting is that "«balance (2:1) [0:0] {1:0}»" gives the number of left and right parentheses, left and right brackets, and left and right braces using "<left-char><left-count>:<right-count><right-char>":

2019/06/16 17:57:59.844755 files that failed the Go lexical scan:
2019/06/16 17:57:59.844763   bad /Users/mtj/go/test/fixedbugs/bug435.go «balance (2:1) [0:0] {1:0}»
2019/06/16 17:57:59.844770   bad /Users/mtj/go/test/fixedbugs/issue13248.go «balance (2:1) [0:0] {1:1}»
2019/06/16 17:57:59.844777   bad /Users/mtj/go/test/fixedbugs/issue13274.go «balance (1:1) [0:0] {1:0}»
2019/06/16 17:57:59.844784   bad /Users/mtj/go/test/fixedbugs/issue13319.go «balance (6:4) [0:0] {2:2}»
2019/06/16 17:57:59.844791   bad /Users/mtj/go/test/fixedbugs/issue15611.go «balance (1:0) [0:0] {0:0}»
2019/06/16 17:57:59.844798   bad /Users/mtj/go/test/fixedbugs/issue17328.go «balance (1:2) [0:0] {2:2}»
2019/06/16 17:57:59.844805   bad /Users/mtj/go/test/fixedbugs/issue18092.go «balance (1:1) [0:0] {3:2}»
2019/06/16 17:57:59.844813   bad /Users/mtj/go/test/fixedbugs/issue19667.go «balance (2:1) [0:0] {1:1}»
2019/06/16 17:57:59.844820   bad /Users/mtj/go/test/fixedbugs/issue20789.go «balance (1:1) [2:0] {1:0}»
2019/06/16 17:57:59.844827   bad /Users/mtj/go/test/fixedbugs/issue22164.go «balance (8:6) [1:1] {5:4}»
2019/06/16 17:57:59.844834   bad /Users/mtj/go/test/fixedbugs/issue22581.go «balance (7:10) [0:3] {7:7}»
2019/06/16 17:57:59.844842   bad /Users/mtj/go/test/syntax/semi1.go «balance (1:1) [0:0] {2:0}»
2019/06/16 17:57:59.844850   bad /Users/mtj/go/test/syntax/semi2.go «balance (1:1) [0:0] {2:0}»
2019/06/16 17:57:59.844857   bad /Users/mtj/go/test/syntax/semi3.go «balance (1:1) [0:0] {2:0}»
2019/06/16 17:57:59.844864   bad /Users/mtj/go/test/syntax/semi4.go «balance (1:1) [0:0] {2:0}»
2019/06/16 17:57:59.844871   bad /Users/mtj/go/test/syntax/semi5.go «balance (1:1) [0:0] {1:0}»
2019/06/16 17:57:59.844879   bad /Users/mtj/go/test/syntax/vareq.go «balance (1:1) [1:1] {2:1}»
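
A sketch of how such a balance report could be produced, assuming the standard go/scanner package rather than the scanner used for the log above. The function name balance is made up; only the «balance ...» layout is taken from the log. Counting tokens rather than raw characters keeps brackets inside strings and comments out of the tally.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// balance counts bracket tokens in src and formats them in the
// «balance (l:r) [l:r] {l:r}» layout. ok is true when every pair matches.
func balance(src []byte) (report string, ok bool) {
	fset := token.NewFileSet()
	file := fset.AddFile("", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, src, nil, 0) // lexical errors are ignored in this sketch

	n := make(map[token.Token]int)
	for {
		_, tok, _ := s.Scan()
		if tok == token.EOF {
			break
		}
		n[tok]++
	}

	ok = n[token.LPAREN] == n[token.RPAREN] &&
		n[token.LBRACK] == n[token.RBRACK] &&
		n[token.LBRACE] == n[token.RBRACE]
	report = fmt.Sprintf("«balance (%d:%d) [%d:%d] {%d:%d}»",
		n[token.LPAREN], n[token.RPAREN],
		n[token.LBRACK], n[token.RBRACK],
		n[token.LBRACE], n[token.RBRACE])
	return report, ok
}

func main() {
	fmt.Println(balance([]byte("package p\nfunc f( {\n")))
	fmt.Println(balance([]byte("package p\nvar s = \"((\" // [\n")))
}
```

The files flagged in the log all sit under test/fixedbugs and test/syntax, so a check like this would be expected to fail on them and pass on ordinary buildable code.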