On 12/01/2009 03:28 PM, Robert Griesemer wrote:
> Unfortunately the tabwriter package (which is used to align columns)
> cannot handle em spaces correctly: At the moment it doesn't accept
> non-byte "padding" chars, but even if it would, it assumes all chars
> have the same width. To fix this at the "file level" one would have to
> tell it the font, but that would make things even more complicated, and
> it's not clear it would even work (what's in the file may look different
> in an editor, and one would need pixel-accurate white space).
Should a bug / feature request be opened against tabwriter?
Gofmt cares about character width only in the special case of indentation, which it already does with regard to tabs. Em-space would be the second width-aware character, not the first. And it would be fixed, not variable.
There should be no need to determine the font, since I believe it would be a configuration error for any editor or printer to fail to correctly display an em-space. If you get a funny square box, then the user needs a better font, something a Unicode source file should care nothing about.
However, gofmt does need to know the desired output indentation increment in em-spaces. If no count is provided, it may either be inferred from the file content (usually 2 or 4), or simply assumed to be a default value, such as 4.
As a temporary hack, we can use awk or sed to replace leading spaces and tabs with a suitable number of em-spaces, and see how the result is handled by the various editors in use. The display of a converted file should not be affected for monospace fonts (unless an indentation change has been requested), and should look better for proportional fonts.
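The same experiment can be sketched in Go instead of awk or sed. This is only an illustration: the names `leadingToEmSpaces` and `tabWidth` are made up here, and mapping one column of leading whitespace to one em-space is an assumption, not something gofmt does.

```go
package main

import (
	"fmt"
	"strings"
)

const emSpace = "\u2003" // U+2003 EM SPACE

// leadingToEmSpaces replaces the leading run of spaces and tabs on a single
// line with em-spaces. tabWidth is the assumed number of columns per tab
// stop; one em-space is emitted per column (an assumption of this sketch).
func leadingToEmSpaces(line string, tabWidth int) string {
	cols, i := 0, 0
scan:
	for ; i < len(line); i++ {
		switch line[i] {
		case ' ':
			cols++
		case '\t':
			cols += tabWidth - cols%tabWidth // advance to the next tab stop
		default:
			break scan // first non-blank byte ends the indentation
		}
	}
	return strings.Repeat(emSpace, cols) + line[i:]
}

func main() {
	src := "func f() {\n\tif x {\n\t\treturn\n\t}\n}"
	for _, line := range strings.Split(src, "\n") {
		fmt.Println(leadingToEmSpaces(line, 4))
	}
}
```

Only the leading whitespace is touched; spaces and tabs after the first non-blank character are left alone, so any column alignment inside a line is not addressed by this sketch.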
The following should be an em-space: " "
Here's what Gnome Character Map has to say about it:
U+2003 EM SPACE
General Character Properties
In Unicode since: 1.1
Unicode category: Separator, Space
Various Useful Representations
UTF-8: 0xE2 0x80 0x83
UTF-16: 0x2003
C octal escaped UTF-8: \342\200\203
XML decimal entity: &#8195;
Annotations and Cross References
Alias names:
• mutton
Notes:
• nominally, a space equal to the type size in points
• may scale by the condensation factor of a font
Approximate equivalents:
• U+0020 SPACE space
Ummm, I just took a look at the other Unicode spaces, and em-space may not be best (though it is suitable). Take a look at figure-space (0x2007):
U+2007 FIGURE SPACE
General Character Properties
In Unicode since: 1.1
Unicode category: Separator, Space
Various Useful Representations
UTF-8: 0xE2 0x80 0x87
UTF-16: 0x2007
C octal escaped UTF-8: \342\200\207
XML decimal entity: &#8199;
Annotations and Cross References
Notes:
• space equal to tabular width of a font
• this is equivalent to the digit width of fonts with fixed-width digits
Approximate equivalents:
• U+0020 SPACE
Notice that the figure-space width is rigidly specified, whereas the em-space width is only "nominal". By definition, this means a figure-space is precisely equivalent to a regular space (0x20) in fixed-width fonts.
Both figure-space and em-space should be well understood in an indentation context.
For completeness, let's look at tab (HT, 0x09):
U+0009 <control>
General Character Properties
In Unicode since: 1.1
Unicode category: Other, Control
Various Useful Representations
UTF-8: 0x09
UTF-16: 0x0009
C octal escaped UTF-8: \011
XML decimal entity: &#9;
Annotations and Cross References
Alias names:
• CHARACTER TABULATION
• horizontal tabulation (HT), tab
No width is mentioned, so since it has no glyph I believe that means it is up to the display system (presumably relative to "tab stops"). Is there any standard that states the displayed width of a tab or the positioning of tab stops? I don't think so!
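The category and UTF-8 values listed for these three characters can be cross-checked from Go's standard `unicode` and `unicode/utf8` packages. The helper name `describe` is just for illustration; this is a small sanity check, not part of any proposal.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// describe returns the UTF-8 encoding of r and reports whether r is in the
// Unicode "Separator, Space" (Zs) or "Other, Control" (Cc) category.
func describe(r rune) (encoded []byte, isSpaceSep, isControl bool) {
	buf := make([]byte, utf8.UTFMax)
	n := utf8.EncodeRune(buf, r)
	return buf[:n], unicode.Is(unicode.Zs, r), unicode.Is(unicode.Cc, r)
}

func main() {
	// EM SPACE, FIGURE SPACE, and horizontal tab.
	for _, r := range []rune{'\u2003', '\u2007', '\t'} {
		b, zs, cc := describe(r)
		fmt.Printf("U+%04X  utf8=% X  Zs=%v  Cc=%v\n", r, b, zs, cc)
	}
}
```

Note that none of this says anything about displayed width: the standard library exposes categories and encodings, not glyph metrics.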
> The "right" solution is to use (have!) editors that handle tabs like
> (flexible) "tab stops" and thus that show code nicely even if a
> proportional font is used (which I personally find desirable, too).
No, that seems impractical: Tab interpretation has been hacked to death for decades. I'd prefer to completely avoid using the tab character for indentation, since its interpretation by programmers and editors and *printers* is ambiguous. Globally replace all tab instances with "FUBAR".
For now, for minimal uniformity, gofmt could enforce a default tab stop at multiples of 8 em-spaces/figure-spaces. I don't know if this would help when printing code files as text.
I would not be against gofmt embedding a special comment that reflects the tab stop settings and indentation character used. This comment, if present (as, say, the second line in a file), would be read by subsequent gofmt runs, overridden by command line parameters, then rewritten in the output. If we could get editors to understand it too, then our work would be done... I suspect something like this must already exist!
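A rough sketch of how such a comment might be read and rewritten follows. Everything about the format is hypothetical: the `// gofmt:` marker, the `indent` and `tabstop` keys, and the defaults are invented for illustration and are not an existing gofmt feature.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// hintPrefix is a hypothetical marker; gofmt defines no such comment.
const hintPrefix = "// gofmt:"

// formatHint holds the settings carried by the hypothetical comment.
type formatHint struct {
	Indent  string // hypothetical values: "tab", "emspace", "figspace"
	TabStop int    // tab stop interval in columns
}

// parseHint reads a line such as "// gofmt: indent=emspace tabstop=4".
// It returns the assumed defaults and false if the line is not a hint.
func parseHint(line string) (formatHint, bool) {
	h := formatHint{Indent: "tab", TabStop: 8} // defaults assumed for this sketch
	if !strings.HasPrefix(line, hintPrefix) {
		return h, false
	}
	for _, field := range strings.Fields(strings.TrimPrefix(line, hintPrefix)) {
		key, val, ok := strings.Cut(field, "=")
		if !ok {
			continue // ignore anything that is not key=value
		}
		switch key {
		case "indent":
			h.Indent = val
		case "tabstop":
			if n, err := strconv.Atoi(val); err == nil && n > 0 {
				h.TabStop = n
			}
		}
	}
	return h, true
}

// String renders the hint back into comment form for the rewritten file.
func (h formatHint) String() string {
	return fmt.Sprintf("%s indent=%s tabstop=%d", hintPrefix, h.Indent, h.TabStop)
}

func main() {
	h, found := parseHint("// gofmt: indent=emspace tabstop=4")
	h.TabStop = 2 // stands in for a command-line override
	fmt.Println(found, h)
}
```

The order of operations mirrors the description: read the comment if present, let command-line settings override it, then write the resulting settings back out.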
Still, I'd very much prefer to avoid tabs altogether. Everywhere I've worked, the coding style guide outlaws tabs for indentation. Expect this to be the norm when/if Go is used in commercial contexts and/or for projects with many contributors.
> I am experimenting with a version that uses tabs for indentation only,
> and blanks (or what-have-you) elsewhere. For various reasons, aligned
> columns are broken across sections of different indentation; thus all
> code at a given tab width is uniformly indented proportional to the tab
> width and (at least using a fixed-width font), everything else in the
> respective section remains aligned independent of tab width.
>
> Still not ideal for proportional fonts but it allows users to set the
> tab width to their liking and the code looks reasonable (without the
> need to re-apply gofmt). It also prevents the rather large gaps between
> columns.
Leading em-spaces/figure-spaces can do all this with greater simplicity and reliability for both printers and editors.
> We are aware that the gofmt output may not be to everybody's liking but
> we much prefer having a uniform and fairly reasonable style over a
> variety of different styles.
Uh, well, since you already have a "variety of different styles"...
Let's try adding one that is unambiguous and looks good with all Unicode fonts, with all Unicode-capable editors and printers, and see if it catches on.
After all, isn't one of Go's selling points that it has better Unicode-awareness? Why continue to play "Hack-the-Tab"?
The underlying issue is really about consistently obtaining the desired presentation of formatted source code while retaining syntactic code indentation (which does not matter for Go, but does for Python). Tabs can have different meanings to different programs (editors, compilers, formatters, etc.) and printers. Em-space and figure-space are far more consistent.
-BobC