Unicode variable name error


Nikhilesh Susarla

Nov 6, 2022, 1:02:20 AM
to golang-nuts
Hi, 

I was trying to declare a variable name in Unicode, since Go supports it.
The language I used is Telugu; its Unicode chart is https://unicode.org/charts/PDF/U0C00.pdf.

But I get a compilation error. Am I missing anything?
If I write a Japanese variable name, it works. What is the difference?

Thank you

Jason Phillips

Nov 6, 2022, 1:55:06 AM
to golang-nuts
Per the Go spec[1], an identifier consists of a Unicode letter followed by zero or more Unicode letters or digits. The character పే is in the Unicode category nonspacing mark rather than the category letter.

If you choose a Telugu letter then your code compiles as expected[2].

Nikhilesh Susarla

Nov 6, 2022, 3:16:25 AM
to Jason Phillips, golang-nuts
So, if the Unicode letters are in the nonspacing mark category, as you
mentioned, they can't be used, right?

Konstantin Khomoutov

Nov 6, 2022, 5:52:24 AM
to golang-nuts
On Sun, Nov 06, 2022 at 01:45:53PM +0530, Nikhilesh Susarla wrote:

>> Per the Go spec[1], an identifier consists of a Unicode letter followed by
>> zero or more Unicode letters or digits. The character పే is in the Unicode
>> category nonspacing mark rather than the category letter.
[...]
> So, if the Unicode letters are in the nonspacing mark category, as you
> mentioned, they can't be used, right?

I sense your misunderstanding might be rooted in a lack of certain Unicode
basics. You seem to call "a letter" anything that may appear in a text
document (a Go source file is a text document), but this is not true. Maybe
that's just a terminological problem, but the fact remains that the Unicode
standard reserves the term "letter" for a very particular group among the
characters it describes. To give a very simplified example, the text string
"foo bar" contains six letters (five distinct) and one space character, which
is not a letter. The character being discussed is not a letter in Unicode,
either.
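To make the "foo bar" example concrete, here is a small sketch that classifies each code point with `unicode.IsLetter` from Go's standard library (which accepts the Unicode letter categories Lu, Ll, Lt, Lm and Lo). The helper `countLetters` is hypothetical; the second call applies the same count to the Telugu string from the question.

```go
package main

import (
	"fmt"
	"unicode"
)

// countLetters reports how many code points in s are Unicode letters,
// how many distinct letters there are, and how many code points are
// not letters (spaces, marks, punctuation, and so on).
func countLetters(s string) (letters, distinct, other int) {
	seen := make(map[rune]bool)
	for _, r := range s {
		if unicode.IsLetter(r) {
			letters++
			seen[r] = true
		} else {
			other++
		}
	}
	return letters, len(seen), other
}

func main() {
	l, d, o := countLetters("foo bar")
	fmt.Println(l, d, o) // 6 5 1

	// U+0C2A TELUGU LETTER PA is a letter; U+0C47 TELUGU VOWEL SIGN EE is not.
	fmt.Println(countLetters("\u0C2A\u0C47")) // 1 1 1
}
```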

Rob Pike

Nov 6, 2022, 7:02:48 AM
to golang-nuts

% unicode -d పే
U+0C2A 'ప' telugu letter pa
U+0C47 'ే' telugu vowel sign ee
% unicode -U C2A C47
U+0C2A 'ప' TELUGU LETTER PA
category: Lo
canonical combining classes: 0
bidirectional category: L
mirrored: N
U+0C47 'ే' TELUGU VOWEL SIGN EE
category: Mn
canonical combining classes: 0
bidirectional category: NSM
mirrored: N
%


The problem is the second code point, U+0C47, Telugu vowel sign EE. It is not in the letter class. If I change your program to use just the first code point, it works: https://play.golang.com/p/eNvuZH33s65


The rules for identifiers in Go were chosen because they are easy to implement, but they have the drawback of not treating all languages equally. They may expand one day, but at the moment this is the situation.


There are a number of open issues around this. Start with https://github.com/golang/go/issues/20706 if you want to read more.


-rob





Bakul Shah

Nov 6, 2022, 9:08:59 AM
to golang-nuts
In Indic scripts, in certain contexts you have to use a vowel sign for the typography to make sense; you can’t use a vowel letter in its place. For example, the middle “ku” in my name has to be written as ક+ુ (which is rendered as કુ), even though it is equivalent to ક+્+ઉ. Also, the “halant” (્) is not a letter!

I would strongly urge Nikhilesh and other people wanting to use any Indic script to *avoid* it (even if Go implements TR31, as Swift does) and instead use the lossless IAST transliteration scheme if the program calls for an Indian word as a Go object name: https://en.wikipedia.org/wiki/International_Alphabet_of_Sanskrit_Transliteration





TheDiveO

Nov 8, 2022, 12:17:03 PM
to golang-nuts
I've always wondered: how do you deal with exported versus unexported identifiers in scripts like Chinese?

Dan Kortschak

Nov 8, 2022, 1:53:27 PM
to golan...@googlegroups.com
On Tue, 2022-11-08 at 09:17 -0800, TheDiveO wrote:
> I've always wondered: how do you deal with exported versus unexported
> identifiers in scripts like Chinese?

There is an issue for this, https://go.dev/issue/22188, which discusses
the approaches currently used, with a view to making this easier. It
also links to previous issues on the topic.
