I was asked, in the previous discussion about code formatting, to
provide some research on the subject. I'm sorry I didn't get back to
that fast enough, but here we are.
We have a lot of data on this: Michael Hansen at Indiana University has
done a series of eye-tracking experiments regarding code formatting
standards and programmer efficiency and comprehension.
He's written a blog post about it here:
http://synesthesiam.com/posts/what-makes-code-hard-to-understand.html
...and the longer publication is here:
http://arxiv.org/pdf/1304.5257v2.pdf - linked from
http://arxiv.org/abs/1304.5257 if you want citation information or other
formats.
You should read the whole thing, obviously, but a few choice
observations include the fact that small inconsistencies in formatting
have been unequivocally proven to cause dramatic drops in comprehension
and productivity. Likewise, dead code or even just code that doesn't
follow logically from the code above it.
Another observation is that experienced programmers just don't read code
the same way novices do. Novices go word-by-word, line-by-line, and
veterans tend to jump around scanning for keywords and code blocks; the
kicker is that when experienced software engineers see a file or code
block that's in an inconsistent _or just unfamiliar_ format, their
brains kick them out of their block-analysis mode and they have to go
back to reading line by line.
You can actually _see this happen_ with eye tracking software, is the
amazing thing.
The implication, relevant to individual employees and community
contributors as well as Mozilla as a fast-moving organization, is that:
- by not having a single coding standard we are making it a lot more
expensive and error-prone for contributors and employees to move from
one part of the project to another, and
- those costs are very real and largely hidden.
There's a long history of research - Solloway and Ehrlich as early as
1984 is one example - that says pretty much the same thing. Experienced
programmers have very strong, learned expectations about what a codebase
_should_ look like, and when it does not look like that, or when they
have to get used to a new system, productivity and comprehension drop
sharply.
Their paper's available here:
http://www.ics.uci.edu/~redmiles/inf233-FQ07/oldpapers/SollowayEhrlich.pdf
- but it's pretty typically hard-on-the-eyes early IEEE stuff, a scan of
a printout.
Coding standards are nice, and we should have them for sure, but they
should inform the design of a formatting tool, not be a style coach. We
should have a standard, and it should be enforced by a machine before
any of this ever hits a human - particularly a reviewer - in the eyes.
And from a community-engagement perspective, this matters. Every time we
type "nit" and send a patch back instead of cleaned up and ship-ready by
a tool is one more chance to miss an bunch of opportunities, to have
somebody get fed up and leave and let that patch rot on the vine. It's
one more small, annoying barrier to participation, and it'd be great if
we could have automation in place to make it go away.
- mhoye
On 1/6/2014, 12:55 PM, Martin Thomson wrote:
> I think that this is a good start, but it doesn�t go quite far enough.
>
> Part of the problem with a policy that requires people to avoid reformatting of stuff they don�t touch is that it propagates formatting divergence. Sometimes because it�s better to conform to the style immediately adjacent the changed, but more so because it prevents the use of tools that reformat entire files (here, I�m not talking about things like names, but more whitespace conventions).
>
> For whitespace at least, I think that we need to do the following:
>
> 1. pick a style
>
> I really, really don�t care what this is. I�m thinking that we pick whatever people think that the current style is and give folks a fixed period to debate changes.
>
> 2. create some tools
>
> These tools should help people conform to the style.
>
> Primarily, what is needed is a tool with appropriate configuration that runs on the command line � e.g., mach reformat � clang-format is looking like a good candidate for C/C++, it just needs a configuration. For JavaScript, I�ve used the python js-beautify, but it�s pretty light on configuration options, it might need some hacking to make it better.
>
> Ideally though, there should be set of configuration files for common editors. I�m certain there are plenty out there already. Let�s bring them all together (attaching the files
https://developer.mozilla.org/en-US/docs/Developer_Guide/Coding_Style might be enough).
>
> 3. reformat everything
>
> Take the command line tool and run it over all the code. I realise that this is the contentious part, but you don�t get the real benefits until you do this.
>
> Once you do this, it�s safe to reformat files at any time without messing with parts of files that you haven�t touched. This is important because many tools only reformat entire files.
>
> As for `hg blame` and related tools, I have a workaround for that. I�ve knocked together a tool that takes a file and a reformatter command line, and churns out a series of patches that retain blame:
https://github.com/martinthomson/blame-bridge
>
> The patch bitrot problem is not easy to work around, but that will depend on how closely the affected files already conformed to the style guide.
>
> 4. enforce compliance
>
> This is probably a step for the future, but if there was - for example - a commit bot that waited for a clean build+test run, adding a format check to that run would allow the bot to block patches that screwed up the formatting of files.
>
> �Martin
>
>