RFC: Character width-aware gofmt

Rui Ueyama

Jun 22, 2014, 7:59:11 PM
to golang-dev
I want to fix the issue that gofmt does not lay out multi-column characters correctly. The fix needs a change to text/tabwriter, so I would like your input. Any comments are appreciated.

Issue:
gofmt, or rather text/tabwriter, assumes that every Unicode code point occupies exactly one column in editors and on terminals. That assumption is wrong: most (but not all) Chinese/Japanese/Korean characters, emoji, "fullwidth" Latin characters, etc., occupy two columns. As a result gofmt formats Go code like this:

var Countries = map[string]string{
        "アメリカ合衆国": "United States of America",
        "日本":      "Japan",
        "ドイツ":     "Germany",
        "フランス":    "France",
        "ポーランド":   "Poland",
}

As you can see, the column of map values is misaligned. You cannot fix this by hand, because gofmt will reformat it the wrong way again. That's annoying.
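The behaviour is easy to reproduce outside gofmt. Below is a minimal sketch that feeds two of the entries through text/tabwriter with space padding. The helper name alignPairs and the tabwriter parameters are illustrative choices, not gofmt's actual configuration.

```go
package main

import (
	"fmt"
	"strings"
	"text/tabwriter"
	"unicode/utf8"
)

// alignPairs lays out key/value pairs with text/tabwriter, padding the key
// column with spaces so that the values line up.
func alignPairs(pairs [][2]string) []string {
	var sb strings.Builder
	w := tabwriter.NewWriter(&sb, 0, 8, 1, ' ', 0)
	for _, p := range pairs {
		// The tab ends the key cell; tabwriter replaces it with padding.
		fmt.Fprintf(w, "%q:\t%q,\n", p[0], p[1])
	}
	w.Flush()
	return strings.Split(strings.TrimRight(sb.String(), "\n"), "\n")
}

func main() {
	lines := alignPairs([][2]string{
		{"アメリカ合衆国", "United States of America"},
		{"日本", "Japan"},
	})
	for _, l := range lines {
		// Everything up to and including the last padding space.
		prefix := l[:strings.LastIndex(l, ` "`)+1]
		fmt.Printf("%s  (value starts after %d runes)\n", l, utf8.RuneCountInString(prefix))
	}
}
```

Both values start after the same number of runes, so the columns line up only where every rune is drawn one cell wide.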

Unicode also has a zero-column character (ZERO WIDTH SPACE; U+200B). SOFT HYPHEN (U+00AD) may be displayed as a hyphen at the end of a line but may be zero-width elsewhere, depending on the display environment. These characters also affect the column layout.

Proposal:
Unicode Standard Annex #11 gives the definition of column width for characters in the legacy East Asian character sets. I propose to add the East Asian Width property to the unicode package, so that we can get the column width for a CJK character. East Asian Fullwidth and East Asian Wide characters should be treated as two columns by tabwriter.

(Note: East Asian Ambiguous characters need to be treated as one column. They are treated as two columns only in an East Asian display environment. That class contains Cyrillic characters and others which we would never want to handle as two columns.)

Because Annex #11 says nothing about characters outside the legacy East Asian character sets, we need additional rules for characters that are in Unicode but not in those sets. I propose this simple rule:

 - ZERO WIDTH SPACE is 0 columns
 - Emoji are 2 columns
 - Other code points, including U+0000, SOFT HYPHEN, and all control characters, are 1 column

This additional rule will be implemented as an unexported function in text/tabwriter.
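A rough sketch of what such an unexported helper could look like. The range table below is a hand-picked approximation made up for this example; the authoritative data is the EastAsianWidth.txt file that accompanies Annex #11, and the names wide, runeWidth and stringWidth are hypothetical.

```go
package main

import (
	"fmt"
	"unicode"
)

// wide approximates the East Asian Wide and Fullwidth classes plus one block
// of emoji. Illustrative only: it is NOT the complete Annex #11 table.
var wide = &unicode.RangeTable{
	R16: []unicode.Range16{
		{Lo: 0x1100, Hi: 0x115F, Stride: 1}, // Hangul Jamo initial consonants
		{Lo: 0x2E80, Hi: 0x303E, Stride: 1}, // CJK radicals, symbols, punctuation
		{Lo: 0x3041, Hi: 0x33FF, Stride: 1}, // kana and compatibility forms
		{Lo: 0x3400, Hi: 0x4DBF, Stride: 1}, // CJK extension A
		{Lo: 0x4E00, Hi: 0x9FFF, Stride: 1}, // CJK unified ideographs
		{Lo: 0xAC00, Hi: 0xD7A3, Stride: 1}, // Hangul syllables
		{Lo: 0xF900, Hi: 0xFAFF, Stride: 1}, // CJK compatibility ideographs
		{Lo: 0xFF01, Hi: 0xFF60, Stride: 1}, // fullwidth forms
		{Lo: 0xFFE0, Hi: 0xFFE6, Stride: 1}, // fullwidth signs
	},
	R32: []unicode.Range32{
		{Lo: 0x1F300, Hi: 0x1F64F, Stride: 1}, // pictographs and emoticons
		{Lo: 0x20000, Hi: 0x2FFFD, Stride: 1}, // supplementary ideographs
	},
}

// runeWidth applies the proposed rule: ZERO WIDTH SPACE is 0 columns, wide
// and fullwidth characters and emoji are 2, and everything else (including
// SOFT HYPHEN and control characters) is 1.
func runeWidth(r rune) int {
	switch {
	case r == '\u200B':
		return 0
	case unicode.Is(wide, r):
		return 2
	default:
		return 1
	}
}

// stringWidth sums runeWidth over the runes of s.
func stringWidth(s string) int {
	n := 0
	for _, r := range s {
		n += runeWidth(r)
	}
	return n
}

func main() {
	for _, s := range []string{"アメリカ合衆国", "日本", "Japan", "a\u200Bb"} {
		fmt.Printf("%q -> %d columns\n", s, stringWidth(s))
	}
}
```

Under these assumptions "アメリカ合衆国" measures 14 columns and "日本" measures 4, while Cyrillic and other Ambiguous characters fall through to the default of 1.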

Caveats:
I deliberately avoid defining the generic "wcswidth" function to determine the column width for a string in the standard library. That function can never be defined in the right way because there's no standard for it. Also it'd be hard to get a reasonable definition for characters with odd semantics, such as SOFT HYPHEN.

Russ Cox

Jun 23, 2014, 1:30:00 PM
to Rui Ueyama, golang-dev
This comes up once in a while. I think this is an enormous can of worms and not something that gofmt should try to do. Turns out gofmt output doesn't align correctly for Lucida Sans either. You get used to it.

Russ

Robert Griesemer

Jun 23, 2014, 1:53:08 PM
to Rui Ueyama, golang-dev, Marcel van Lohuizen
[Looping in mpvl explicitly, who is working on unicode libraries]

Can you please file an issue for this so it can be tracked accordingly? Thanks.

I'm the author of the tabwriter package. It should be fairly easy to make it work for other character widths, given a function that provides that information. That said, this will of course only work with fixed-width fonts where characters are always a multiple (1 or more) of a single character's width. gofmt won't be able to support variable-width fonts in general.

mpvl may have some input on the Annex #11 tables.

- gri



--
You received this message because you are subscribed to the Google Groups "golang-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-dev+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Rui Ueyama

Jun 23, 2014, 1:57:04 PM
to Russ Cox, golang-dev
I'd have to say you mixed up two different things. People don't use Lucida Sans when writing code in Go, but people do use their native languages in string constants (with a fixed-pitch font). We can't (and shouldn't) stop them from using their languages in code, and at the same time I can't tell them that they need to get used to the formatting error.

If you don't use such a language you may not be able to see the issue, but for those who do, this is a reasonable minimum requirement for gofmt.

This is a real issue that annoys every CJK language speaker (and I remember a rumor that Go is more popular in China than the USA). It lowers the value of gofmt, which is considered to be the tool for Go that always formats Go code in "the right way."

Andrew Gerrand

Jun 23, 2014, 1:59:23 PM
to Rui Ueyama, Russ Cox, golang-dev

On 24 June 2014 03:56, 'Rui Ueyama' via golang-dev <golan...@googlegroups.com> wrote:
> People don't use Lucida Sans when writing code in Go

Rob and Russ do.

princessa...@gmail.com

Jun 23, 2014, 2:00:18 PM
to golan...@googlegroups.com, r...@golang.org
I've run into this using tabwriter before and would appreciate some sort of fix - FWIW.

Rui Ueyama

Jun 23, 2014, 2:03:32 PM
to Andrew Gerrand, Russ Cox, golang-dev
Oh really? Sorry. I rephrase "most people".

David du Colombier

Jun 23, 2014, 2:03:41 PM
to Rui Ueyama, Russ Cox, golang-dev
> People don't use Lucida Sans when writing code in Go

I do, and not only Go. And I'm probably not the only one.

--
David du Colombier

Andrew Gerrand

Jun 23, 2014, 2:05:29 PM
to Rui Ueyama, Russ Cox, golang-dev

On 24 June 2014 04:03, Rui Ueyama <ru...@google.com> wrote:
> Oh really? Sorry. I rephrase "most people".

Most people don't use wide characters in their code, either.

Aram Hăvărneanu

Jun 23, 2014, 2:11:33 PM
to Rui Ueyama, Russ Cox, golang-dev
On Mon, Jun 23, 2014 at 7:56 PM, 'Rui Ueyama' via golang-dev
<golan...@googlegroups.com> wrote:
> People don't use Lucida Sans when writing code in Go

I do, although I'm using Lucida Grande, but Lucida Sans and Lucida
Grande are very similar.

--
Aram Hăvărneanu

Rui Ueyama

Jun 23, 2014, 2:12:08 PM
to Andrew Gerrand, Russ Cox, golang-dev
I disagree. You usually use your language in string constants that are supposed to be read by humans. In particular, if you don't speak another language, that's the only thing you can do.

Rob Pike

Jun 23, 2014, 2:15:20 PM
to Rui Ueyama, Andrew Gerrand, Russ Cox, golang-dev
It should be easy (note I'm saying 'should', not 'is') to make gofmt
handle variable character width using font metrics in a general way,
for nice layout in typesetting systems as well as the trivial case of
double-width characters. I would resist quick fixes that handle just
the double-width case.

-rob

Andrew Gerrand

Jun 23, 2014, 2:15:52 PM
to Rui Ueyama, Russ Cox, golang-dev

On 24 June 2014 04:11, Rui Ueyama <ru...@google.com> wrote:
> I disagree. You usually use your language in string constants that are supposed to be read by humans. In particular, if you don't speak another language, that's the only thing you can do.

Oh, I'm sorry. I lost sight of the original issue. I was thinking of struct fields, not map literals. I withdraw my pithy response.

Andrew

Rui Ueyama

Jun 23, 2014, 2:35:39 PM
to Rob Pike, Andrew Gerrand, Russ Cox, golang-dev
I see your plan. But even if you use gofmt to adjust indentation for your font metrics, when you check the code in to a central repository like GitHub, you'd (automatically) run gofmt again for the standard style, wouldn't you? Otherwise all lines would be modified on every checkin if multiple people edit it. I imagine the standard style would still assume fixed-pitch fonts.

Can't it be considered a stepwise improvement to gofmt? There is real demand for it, enough that we even have a Unicode standard for this case...

Bakul Shah

Jun 23, 2014, 2:40:13 PM
to Rui Ueyama, golang-dev
Indic languages require even more complex text layout rules!
IMHO support for all this belongs in an editor or IDE -- if
they can do horrible/wonderful things like colorize text, why
can't they do "proper" layout as well? Let gofmt remain a
(relatively) simple tool.

Rui Ueyama

Jun 23, 2014, 2:43:14 PM
to Bakul Shah, golang-dev

Marcel van Lohuizen

Jun 24, 2014, 8:07:33 AM
to Rui Ueyama, golang-dev
Does the proposal really fix anything?

Different fixed-width fonts handle character widths differently. Some editors might enforce a fixed 2:1 relation. Vim seems to render a CJK full-width character as having the width of 2 Latin characters, as you suggest. However, many editors adhere to the width as defined by the font (Sublime, TextEdit, Xcode). Most fixed-width fonts (on the Mac at least) adhere to the standard as defined in the Annex you referred to: a full-width character should render at 5/3 the width of a Latin character, not 2 times. Some fixed-width fonts indeed implement the times-2 relation and some do something different altogether.

Even if we say we stick to the standard, though, to align characters properly tabwriter would need to insert Unicode spaces that reflect the missing width (see U+2000 - U+2006). However, most editors/fonts (if any) do not handle such spaces as one would expect. Tabs could be used, but they have their own issues.

So I agree with Russ's can-of-worms statement. It seems arbitrary to me to pick the non-standard 2 times Latin width for full-width CJK characters. OTOH, complying with the 5/3 Latin width results in its own issues and will likely not give the desired results. For some other scripts things get even more tricky. It seems like a rather straightforward thing for IDEs/editors to solve, though, so ideally this is where the solution should be found.

Maybe one solution is to allow a user to write the map value on a new line without gofmt rewriting it:

--
Marcel van Lohuizen -- Google Switzerland GmbH -- Identifikationsnummer: CH-020.4.028.116-1

Rui Ueyama

Jun 24, 2014, 2:16:48 PM
to Marcel van Lohuizen, golang-dev
It's not a goal of this proposal to make gofmt work with proportional fonts or characters whose width is not a multiple of the fixed-width character.

A coding environment in which Courier, whose characters are 3/5 the width of a full-width character, is used together with a fixed-width CJK font is hypothetical -- it's a bad combination of fixed-width and proportional. Most programmers are (still) using editors in which a full-width character is rendered at twice the width of a fixed-width Latin character. I strongly believe this proposal fixes the actual problem.

(And I don't think the standard says that a full-width character should render at 5/3 the width of a Latin character. It only mentions such an environment as a counterexample to the traditional, widely known environment -- in which a full-width character is rendered twice as wide as a Latin character.)

As to U+2000 - U+2006 and such, I could withdraw the "additional rule" part of my proposal, and stick only to the Unicode Standard Annex #11 (which is a part of the standard), so that we don't need to repeat the discussion that the Unicode Consortium had on the concept of the character width.

Russ Cox

Jun 24, 2014, 2:42:17 PM
to Rui Ueyama, Marcel van Lohuizen, golang-dev
On Tue, Jun 24, 2014 at 2:16 PM, 'Rui Ueyama' via golang-dev <golan...@googlegroups.com> wrote:
> It's not a goal of this proposal to make gofmt work with proportional fonts or characters whose width is not a multiple of the fixed-width character.
>
> A coding environment in which Courier, whose characters are 3/5 the width of a full-width character, is used together with a fixed-width CJK font is hypothetical -- it's a bad combination of fixed-width and proportional. Most programmers are (still) using editors in which a full-width character is rendered at twice the width of a fixed-width Latin character. I strongly believe this proposal fixes the actual problem.

I understand writing off acme programmers using variable-width fonts, but I don't see why you are discounting Marcel's counterexample of programmers using widely-used editors like Sublime or Xcode with fixed-width fonts. The fact that the standard is 5/3x, not 2x, is another strike against this proposal.

It seems like you are adding an ad-hoc mechanism to support your specific editor and your specific fonts, along with an unsubstantiated assertion that "most programmers" are like you, despite the fact that there are other widely used editors that behave differently and despite the fact that your fonts do not conform to the Unicode standard.

Marcel has spent more time looking at Unicode details than the rest of us combined. Since he agrees, I suggest (again) that we drop this. 

Russ

Rui Ueyama

Jun 24, 2014, 3:05:48 PM
to Russ Cox, Marcel van Lohuizen, golang-dev
The Unicode standard does not say that the full-width character is 5/3x. I don't understand why you think that is a requirement of the standard. It says a Latin font like Courier is 3/5x. It doesn't mean Latin characters in other fonts have to be 5/3x, either.

Please look at the Recommendation section of the Annex #11 [1].

Russ Cox

Jun 24, 2014, 3:22:43 PM
to Rui Ueyama, Marcel van Lohuizen, golang-dev
On Tue, Jun 24, 2014 at 3:05 PM, Rui Ueyama <ru...@google.com> wrote:
> The Unicode standard does not say that the full-width character is 5/3x. I don't understand why you think that is a requirement of the standard. It says a Latin font like Courier is 3/5x. It doesn't mean Latin characters in other fonts have to be 5/3x, either.

Okay, but what matters relative to this proposal is whether it says full-width = 2 x fixed-width, and I don't see that there either.

Russ

Brad Fitzpatrick

Jun 24, 2014, 3:28:34 PM
to Russ Cox, Rui Ueyama, Marcel van Lohuizen, golang-dev
In that section, "halfwidth" doesn't mean half of "fullwidth"?

Rui Ueyama

Jun 24, 2014, 3:34:28 PM
to Russ Cox, Marcel van Lohuizen, golang-dev
When displaying data, (my interpretation of) the Recommendations are

1. Wide characters (such as Chinese characters) take up 1 em in fixed-pitch fonts
2. Half-width characters (such as half-width katakana) take up 1/2 em in fixed-pitch fonts
3. Narrow characters (which include the Latin/ASCII charset as defined in the #11 table) take up 1/2 em in East Asian fixed-pitch fonts
4. Ambiguous characters follow 1 or 2.

It seems to me that it recommends full-width is 1 em and half-width/Latin are half of it.
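The two ratios under discussion can be put in numbers. A small sketch, assuming a wide character is exactly 1 em and taking the Latin advance width as a parameter (1/2 em for an East Asian fixed-pitch font, 3/5 em for a Courier-like font); latinCells is a hypothetical helper.

```go
package main

import "fmt"

// latinCells converts a run of n wide characters, each 1 em across, into the
// equivalent number of Latin cells when one Latin character is latinEm ems
// wide.
func latinCells(n int, latinEm float64) float64 {
	return float64(n) / latinEm
}

func main() {
	const key = 7 // runes in "アメリカ合衆国"
	fmt.Printf("Latin at 1/2 em: %.2f cells\n", latinCells(key, 0.5))
	fmt.Printf("Latin at 3/5 em: %.2f cells\n", latinCells(key, 0.6))
}
```

With Latin at 1/2 em the seven-character key spans a whole number of cells (14), so padding with spaces can line things up. With Latin at 3/5 em it spans about 11.67 cells, so no whole number of spaces makes up the difference.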


Russ Cox

Jun 24, 2014, 4:15:35 PM
to Rui Ueyama, Marcel van Lohuizen, golang-dev
On Tue, Jun 24, 2014 at 3:33 PM, Rui Ueyama <ru...@google.com> wrote:
> When displaying data, (my interpretation of) the Recommendations are
>
> 1. Wide characters (such as Chinese characters) take up 1 em in fixed-pitch fonts
> 2. Half-width characters (such as half-width katakana) take up 1/2 em in fixed-pitch fonts
> 3. Narrow characters (which include the Latin/ASCII charset as defined in the #11 table) take up 1/2 em in East Asian fixed-pitch fonts

#3 limits itself to East Asian fixed-pitch fonts. But "most programmers" are using Latin fixed-pitch fonts capable of displaying East Asian characters. And many of those fonts (like the ones Marcel described) do not follow the 2x rule (a fact mentioned on the same page and already discussed here).

My point is that I don't see TR11 justifying this change, which makes it more of an ad-hoc "fixes my editor settings" change.

Russ

Brad Fitzpatrick

Jun 24, 2014, 4:23:39 PM
to Russ Cox, Robert Griesemer, Rui Ueyama, Marcel van Lohuizen, golang-dev
On Tue, Jun 24, 2014 at 1:15 PM, Russ Cox <r...@golang.org> wrote:
> On Tue, Jun 24, 2014 at 3:33 PM, Rui Ueyama <ru...@google.com> wrote:
>> When displaying data, (my interpretation of) the Recommendations are
>>
>> 1. Wide characters (such as Chinese characters) take up 1 em in fixed-pitch fonts
>> 2. Half-width characters (such as half-width katakana) take up 1/2 em in fixed-pitch fonts
>> 3. Narrow characters (which include the Latin/ASCII charset as defined in the #11 table) take up 1/2 em in East Asian fixed-pitch fonts
>
> #3 limits itself to East Asian fixed-pitch fonts. But "most programmers" are using Latin fixed-pitch fonts capable of displaying East Asian characters. And many of those fonts (like the ones Marcel described) do not follow the 2x rule (a fact mentioned on the same page and already discussed here).


If there were any change to godoc, I could imagine something less controversial might be to say if any line in a literal like this contains wide characters, just give up on lining things up and use a single space after the colon instead.
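A minimal sketch of that fallback. Two assumptions are made to keep it short: the test for "wide" is replaced by the cruder test "not ASCII", and keys and values arrive already quoted. The names isASCII and layout are hypothetical, and this is not how gofmt is structured internally.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// isASCII reports whether s contains only ASCII bytes.
func isASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] >= utf8.RuneSelf {
			return false
		}
	}
	return true
}

// layout aligns the values when every key is ASCII. If any key is not, it
// gives up on alignment and puts a single space after each colon.
func layout(pairs [][2]string) []string {
	align, widest := true, 0
	for _, p := range pairs {
		if !isASCII(p[0]) {
			align = false
		}
		if len(p[0]) > widest {
			widest = len(p[0])
		}
	}
	out := make([]string, 0, len(pairs))
	for _, p := range pairs {
		gap := 1
		if align {
			// For ASCII keys the byte length equals the column count.
			gap += widest - len(p[0])
		}
		out = append(out, p[0]+":"+strings.Repeat(" ", gap)+p[1]+",")
	}
	return out
}

func main() {
	for _, l := range layout([][2]string{{`"日本"`, `"Japan"`}, {`"a"`, `"x"`}}) {
		fmt.Println(l)
	}
}
```

Note that one non-ASCII key switches off alignment for the whole literal, including its ASCII-only lines.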


Robert Griesemer

Jun 24, 2014, 4:26:52 PM
to Brad Fitzpatrick, Russ Cox, Rui Ueyama, Marcel van Lohuizen, golang-dev
On Tue, Jun 24, 2014 at 1:23 PM, Brad Fitzpatrick <brad...@golang.org> wrote:
> If there were any change to godoc, I could imagine something less controversial might be to say if any line in a literal like this contains wide characters, just give up on lining things up and use a single space after the colon instead.
+1
- gri

Rui Ueyama

Jun 24, 2014, 5:45:31 PM
to Russ Cox, Marcel van Lohuizen, golang-dev
On Tue, Jun 24, 2014 at 1:15 PM, Russ Cox <r...@golang.org> wrote:
> On Tue, Jun 24, 2014 at 3:33 PM, Rui Ueyama <ru...@google.com> wrote:
>> When displaying data, (my interpretation of) the Recommendations are
>>
>> 1. Wide characters (such as Chinese characters) take up 1 em in fixed-pitch fonts
>> 2. Half-width characters (such as half-width katakana) take up 1/2 em in fixed-pitch fonts
>> 3. Narrow characters (which include the Latin/ASCII charset as defined in the #11 table) take up 1/2 em in East Asian fixed-pitch fonts
>
> #3 limits itself to East Asian fixed-pitch fonts. But "most programmers" are using Latin fixed-pitch fonts capable of displaying East Asian characters. And many of those fonts (like the ones Marcel described) do not follow the 2x rule (a fact mentioned on the same page and already discussed here).

But I think it's a reasonable assumption that "most programmers" who edit code containing East Asian characters on a daily basis often use an East Asian fixed-pitch font. For those who don't, I honestly think they wouldn't care which way the width is counted. It helps many and wouldn't hurt anyone. I did not expect to get this much push-back on this proposal. Is there anyone here who uses CJK on a daily basis?

Marcel van Lohuizen

Jun 24, 2014, 7:01:38 PM
to Russ Cox, Rui Ueyama, golang-dev
On Tue, Jun 24, 2014 at 10:15 PM, Russ Cox <r...@golang.org> wrote:
> On Tue, Jun 24, 2014 at 3:33 PM, Rui Ueyama <ru...@google.com> wrote:
>> When displaying data, (my interpretation of) the Recommendations are
>>
>> 1. Wide characters (such as Chinese characters) take up 1 em in fixed-pitch fonts
>> 2. Half-width characters (such as half-width katakana) take up 1/2 em in fixed-pitch fonts
>> 3. Narrow characters (which include the Latin/ASCII charset as defined in the #11 table) take up 1/2 em in East Asian fixed-pitch fonts
>
> #3 limits itself to East Asian fixed-pitch fonts. But "most programmers" are using Latin fixed-pitch fonts capable of displaying East Asian characters. And many of those fonts (like the ones Marcel described) do not follow the 2x rule (a fact mentioned on the same page and already discussed here).

Correct. To quote that same annex: ".. the character width for a fixed-pitch Latin font like Courier is generally 3/5 of an Em.".

> My point is that I don't see TR11 justifying this change, which makes it more of an ad-hoc "fixes my editor settings" change.

Again, agree. With some other scripts things get even messier.

> Russ

Marcel van Lohuizen

Jun 24, 2014, 7:04:10 PM
to Brad Fitzpatrick, Russ Cox, Robert Griesemer, Rui Ueyama, golang-dev
On Tue, Jun 24, 2014 at 10:23 PM, Brad Fitzpatrick <brad...@golang.org> wrote:



> On Tue, Jun 24, 2014 at 1:15 PM, Russ Cox <r...@golang.org> wrote:
>> On Tue, Jun 24, 2014 at 3:33 PM, Rui Ueyama <ru...@google.com> wrote:
>>> When displaying data, (my interpretation of) the Recommendations are
>>>
>>> 1. Wide characters (such as Chinese characters) take up 1 em in fixed-pitch fonts
>>> 2. Half-width characters (such as half-width katakana) take up 1/2 em in fixed-pitch fonts
>>> 3. Narrow characters (which include the Latin/ASCII charset as defined in the #11 table) take up 1/2 em in East Asian fixed-pitch fonts
>>
>> #3 limits itself to East Asian fixed-pitch fonts. But "most programmers" are using Latin fixed-pitch fonts capable of displaying East Asian characters. And many of those fonts (like the ones Marcel described) do not follow the 2x rule (a fact mentioned on the same page and already discussed here).

Even for Latin it currently does funny things, btw: http://play.golang.org/p/ks7DI_BO1O

> If there were any change to godoc, I could imagine something less controversial might be to say if any line in a literal like this contains wide characters, just give up on lining things up and use a single space after the colon instead.

+1, but probably for any non-ascii for now.

Rui Ueyama

Jun 24, 2014, 7:55:53 PM
to Marcel van Lohuizen, Brad Fitzpatrick, Russ Cox, Robert Griesemer, golang-dev
OK, so this topic seems well discussed, and it's unlikely that I can convince you guys.

I hope you understand this was not an unreasonable proposal. If you did field research on the CJK programmers there I think you'd find it makes sense. That being said, if there's no other strong supporter, I have to withdraw this proposal because of the lack of support.

Ian Lance Taylor

Jun 24, 2014, 8:17:29 PM
to Rui Ueyama, Marcel van Lohuizen, Brad Fitzpatrick, Russ Cox, Robert Griesemer, golang-dev
On Tue, Jun 24, 2014 at 4:55 PM, 'Rui Ueyama' via golang-dev
<golan...@googlegroups.com> wrote:
>
> OK, so this topic seems well discussed, and it's unlikely that I can
> convince you guys.
>
> I hope you understand this was not an unreasonable proposal. If you did
> field research on the CJK programmers there I think you'd find it makes
> sense. That being said, if there's no other strong supporter, I have to
> withdraw this proposal because of the lack of support.

What do you think of Brad's suggestion?

>>> If there were any change to godoc, I could imagine something less
>>> controversial might be to say if any line in a literal like this contains
>>> wide characters, just give up on lining things up and use a single space
>>> after the colon instead.


Ian

Hiroshi Sakurai

Jun 24, 2014, 8:29:56 PM
to Marcel van Lohuizen, Brad Fitzpatrick, Russ Cox, Robert Griesemer, Rui Ueyama, golang-dev
Hello,

How about the idea of passing gofmt a config file that says how much
width each character takes up? Something like the following.

$ gofmt -w file.go -font-width-convention=japanese.txt

I think it would be a cleaner solution than adding the 2x rule to the gofmt code.

Brad Fitzpatrick

Jun 24, 2014, 8:34:46 PM
to Hiroshi Sakurai, Rui Ueyama, Russ Cox, golang-dev, Marcel van Lohuizen, Robert Griesemer

That's even worse. We are *removing* options from gofmt, not adding them.

The whole point of gofmt is to have one standard and not local rules.

Rui Ueyama

Jun 24, 2014, 8:41:19 PM
to Brad Fitzpatrick, Hiroshi Sakurai, Russ Cox, golang-dev, Marcel van Lohuizen, Robert Griesemer
Hiroshi,

Thank you for the suggestion, but I'm with Brad. The aim of this proposal is to make gofmt's behavior friendly to CJK programmers (without hurting others) so that we can encourage them to use the standard style. Creating an alternative one is not what I want.

Robert Griesemer

Jun 24, 2014, 8:45:39 PM
to Ian Lance Taylor, Rui Ueyama, Marcel van Lohuizen, Brad Fitzpatrick, Russ Cox, golang-dev
After discussing this a bit more w/ Rob, I'm not convinced anymore that Brad's suggestion would be "good enough" in the sense that it would make a worthwhile difference:

The command-line gofmt should not depend on the character widths of the specific font used, only the Unicode characters in the source. Thus, Brad's proposal would only work if we could very broadly assume that _some_ Unicode characters always have a different width than "most" others for a given fixed-width font. And not just any such fixed-width font, but all (most) of them. If that were true, we could abandon alignment in those cases because not aligning might look better than aligning the wrong way. But I am very skeptical that we can identify such characters, and in any case we are always at the mercy of the specific font used. So I'm not sure it's worth the effort.

What we really want is editing and display tools (godoc) that can align and take the true width of characters into account, thus making Go code using variable-width fonts look great. That formatting must naturally happen at display time.

Underneath, gofmt uses text/tabwriter to achieve alignment. It's the tabwriter that puts in the necessary padding using blanks. Before gofmt-ed source is piped through the tabwriter, the source is "tabulated" into columns using tabs (not just for indentation). Ideally, we would store source in that format. The tabwriter algorithm is rather simple (compared to the whole of gofmt) and has no language knowledge and could easily take character widths into account. But it would have to run in your respective editor.
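The cell model described above can be tried directly with text/tabwriter. In this sketch the parameters (8-column tabs, one blank of padding, the TabIndent flag) are assumptions chosen to match the description, not a copy of gofmt's configuration, and expand is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
	"text/tabwriter"
)

// expand runs tab-separated source through text/tabwriter. Leading empty
// cells are kept as tab indentation (TabIndent); every other tab closes a
// cell, which is then padded with blanks.
func expand(src string) string {
	var sb strings.Builder
	w := tabwriter.NewWriter(&sb, 0, 8, 1, ' ', tabwriter.TabIndent)
	fmt.Fprint(w, src)
	w.Flush()
	return sb.String()
}

func main() {
	// The "tabulated" form: one tab for indentation, one tab ending the
	// name cell on each line.
	src := "var (\n" +
		"\ta\t= 1\n" +
		"\tlonger\t= 2\n" +
		")\n"
	fmt.Print(expand(src))
}
```

The tabulated input carries no padding at all; the blanks that align the two "=" signs appear only in the output, which is why the same step could in principle run at display time with real character widths.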

The current gofmt is a compromise: It does format under the assumption that people use a fixed-width (non-proportional) font, and that tabs are 8 chars wide. Indentation is done with tabs, all other alignment is done with blanks. And this is how we store the code.

- gri

Mikio Hara

Jun 24, 2014, 9:02:37 PM
to Rui Ueyama, Marcel van Lohuizen, Brad Fitzpatrick, Russ Cox, Robert Griesemer, golang-dev
Hi Rui,

I can feel your frustration, but I'm not an expert of i18n, l10n
and/or unicode, so have not much to say, but a bit.

On Wed, Jun 25, 2014 at 8:55 AM, 'Rui Ueyama' via golang-dev
<golan...@googlegroups.com> wrote:

> OK, so this topic seems well discussed, and it's unlikely that I can
> convince you guys.

I think Mr. Pike already pointed out the direction. He said "I would
resist quick fixes that handle just
the double-width case." Seems we need an experiment, kinda forking
text/tabwriter and cmd/gofmt then adding more i18n-friendly gofmt (but
it handles unicode only, unlike gettext-based stuff) to go.text
subrepo.

Brad Fitzpatrick

Jun 24, 2014, 9:11:26 PM
to Robert Griesemer, Ian Lance Taylor, Rui Ueyama, Marcel van Lohuizen, Russ Cox, golang-dev
On Tue, Jun 24, 2014 at 5:45 PM, Robert Griesemer <g...@golang.org> wrote:
> After discussing this a bit more w/ Rob, I'm not convinced anymore that Brad's suggestion would be "good enough" in the sense that it would make a worthwhile difference:
>
> The command-line gofmt should not depend on the character widths of the specific font used, only the Unicode characters in the source. Thus, Brad's proposal would only work if we could very broadly assume that _some_ Unicode characters always have a different width than "most" others for a given fixed-width font. And not just any such fixed-width font, but all (most) of them. If that were true, we could abandon alignment in those cases because not aligning might look better than aligning the wrong way. But I am very skeptical that we can identify such characters, and in any case we are always at the mercy of the specific font used. So I'm not sure it's worth the effort.

Aren't there entire Unicode classes for this sort of thing? But failing that, the heuristic of non-ASCII is probably enough. As demonstrated earlier, we already mess up even Latin with diacritics, so saying non-ASCII wouldn't be worse.


Robert Griesemer

Jun 24, 2014, 11:35:35 PM
to Brad Fitzpatrick, Ian Lance Taylor, Rui Ueyama, Marcel van Lohuizen, Russ Cox, golang-dev
On Tue, Jun 24, 2014 at 6:11 PM, Brad Fitzpatrick <brad...@golang.org> wrote:
> Aren't there entire Unicode classes for this sort of thing? But failing that, the heuristic of non-ASCII is probably enough. As demonstrated earlier, we already mess up even Latin with diacritics, so saying non-ASCII wouldn't be worse.

Maybe restriction to ASCII for alignment is good enough. Worth an experiment. I'll look into it.
- gri

Dan Kortschak

Jun 25, 2014, 12:43:52 AM
to Robert Griesemer, Brad Fitzpatrick, Ian Lance Taylor, Rui Ueyama, Marcel van Lohuizen, Russ Cox, golang-dev
Will that impact on text/tabwriter?

minux

Jun 25, 2014, 2:29:07 AM
to Rui Ueyama, Marcel van Lohuizen, Brad Fitzpatrick, Russ Cox, Robert Griesemer, golang-dev
On Tue, Jun 24, 2014 at 7:55 PM, 'Rui Ueyama' via golang-dev <golan...@googlegroups.com> wrote:
> OK, so this topic seems well discussed, and it's unlikely that I can convince you guys.
>
> I hope you understand this was not an unreasonable proposal. If you did field research on the CJK programmers there I think you'd find it makes sense. That being said, if there's no other strong supporter, I have to withdraw this proposal because of the lack of support.

The biggest complaint I got from interacting with Chinese Go programmers is not the full-width alignment problem,
but the upper-case export rule.

IMHO, if you use CJK characters in a string, then you may well need to support other non-ASCII languages too, so it makes
sense to use something like gettext to move the non-English texts into separate files. And most of the time, those files
will be automatically generated from a translation file, so the format of that file is not a big issue.

Marcel van Lohuizen

Jun 25, 2014, 10:06:50 AM
to Robert Griesemer, Ian Lance Taylor, Rui Ueyama, Brad Fitzpatrick, Russ Cox, golang-dev
What about my earlier proposal of allowing the value string to be on a separate line?

var Countries = map[string]string{
        "アメリカ合衆国":
         "United States",
        "日本":
         "Japan",
        "ドイツ":
         "Germany",
        "フランス":
         "France",
        "ポーランド":
         "Poland",
}

Basically gofmt's behavior could remain the same if both key and value are on the same line, but simply indent key and value if they are on separate lines, instead of joining the lines. The code looks more verbose, but it solves the problem in a trivial way and it would look good in all editors, even with variable-width fonts, if the user chooses to format it this way. gofmt already allows the user in some cases to choose whether something is split across lines or not (e.g. one-line functions), so there is precedent for this.

Alexander Rødseth

Jun 25, 2014, 11:21:39 AM
to Marcel van Lohuizen, Robert Griesemer, Ian Lance Taylor, Rui Ueyama, Brad Fitzpatrick, Russ Cox, golang-dev
How about using tabs instead of spaces if the line contains non-ASCII
characters? This leaves the problem of aligning columns to the editor
and requires no changes to the tabwriter package.

---
Best regards,
Alexander Rødseth

Robert Griesemer

Jun 25, 2014, 1:14:53 PM
to Alexander Rødseth, Marcel van Lohuizen, Ian Lance Taylor, Rui Ueyama, Brad Fitzpatrick, Russ Cox, golang-dev
Marcel's proposal is most in spirit with gofmt - in case of key,value pairs, respect line breaks that are present, and format accordingly. This would solve at least this particular instance w/o impacting existing code.

Aram Hăvărneanu

unread,
Jun 25, 2014, 1:54:53 PM6/25/14
to Robert Griesemer, Alexander Rødseth, Marcel van Lohuizen, Ian Lance Taylor, Rui Ueyama, Brad Fitzpatrick, Russ Cox, golang-dev
On Wed, Jun 25, 2014 at 7:14 PM, Robert Griesemer <g...@golang.org> wrote:
> This would solve at least this particular instance w/o impacting existing
> code.

On the other hand, it might encourage authors who don't need it to add
newlines to their code...

--
Aram Hăvărneanu

Russ Cox

Jun 26, 2014, 11:24:54 AM
to Aram Hăvărneanu, Robert Griesemer, Alexander Rødseth, Marcel van Lohuizen, Ian Lance Taylor, Rui Ueyama, Brad Fitzpatrick, golang-dev
You still have to decide which code points trigger alternate behavior. Limiting to ASCII is too restrictive. Fixed-width fonts really are fixed width for a very large variety of characters. If I change from ns to µs all of a sudden I don't get alignment?

Honestly I think the :\n map form is worse than unaligned text.

I still think we should leave well enough alone. We could tweak this forever, and we could talk about it forever. Or we could do other things.

Russ