Why use Unicode characters in function names in the Go source code?


tomwilde

Nov 20, 2012, 2:11:12 PM
to golan...@googlegroups.com
Check out this question I asked on StackOverflow:


Why do so many of the C functions that make up Go's runtime have these weird Unicode characters in their names?

Isn't that overcomplicating things? What's the rationale behind this?

minux

Nov 20, 2012, 2:17:17 PM
to tomwilde, golan...@googlegroups.com
The middle dot was introduced to separate the package name from the local name
within the package (to match Go's namespace rules).
The slash-like character provides multi-level package names, e.g.
sync/atomic.

Those files are compiled by the Plan 9 C compiler, not by a normal C compiler.

bryanturley

Nov 20, 2012, 2:19:37 PM
to golan...@googlegroups.com

I think it is a way to mark them as internal to the runtime: don't call me unless you are a linker/compiler/runtime library type of thing.
Similar to how some libcs use a leading _ to pseudo-hide functions.
Does it also have the side effect of testing UTF-8 support?
The spec uses a single rune ... for some things as well

bryanturley

Nov 20, 2012, 2:22:09 PM
to golan...@googlegroups.com


"This source code is compiled by the Part 9 compiler suite "
You guys starting another os??? j/k ;)

minux

Nov 20, 2012, 2:25:32 PM
to bryanturley, golan...@googlegroups.com


On Wednesday, November 21, 2012, bryanturley wrote:

"This source code is compiled by the Part 9 compiler suite "
This is true, although it should be called a slightly modified Plan 9 C compiler.

bryanturley

Nov 20, 2012, 2:26:02 PM
to golan...@googlegroups.com, tomwilde
the middle dot is introduced to separate the package name and local name
within the package (to match Go's namespace rules).
the slash like character is to provide multiple level package name, e.g.
sync/atomic.


Does either of them have special significance? Or are they just visual breaks?
For instance, in Go: data_x (visual break) vs data.x (significant).

minux

Nov 20, 2012, 2:29:39 PM
to bryanturley, golan...@googlegroups.com, tomwilde
Of course they do. The middle dot and the slash-like character
are handled specially by the C compiler.

The dot puts a name into its package namespace, and the
slash separates the elements of the package name (path).
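
As a rough illustration of the two roles described above, here is a minimal Go sketch that splits a name written with the lookalike characters into its package path elements and its local name. The function `splitSymbol` and the sample name are hypothetical and chosen only for illustration; this is not how the C compiler itself is implemented.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	middleDot     = "\u00b7" // · separates package name from local name
	divisionSlash = "\u2215" // ∕ separates elements of the package path
)

// splitSymbol is a hypothetical helper: it returns the package path
// elements and the local name of a symbol written with · and ∕.
func splitSymbol(sym string) ([]string, string) {
	i := strings.LastIndex(sym, middleDot)
	if i < 0 {
		// No middle dot: treat the whole string as a plain local name.
		return nil, sym
	}
	pkg := sym[:i]
	local := sym[i+len(middleDot):]
	if pkg == "" {
		// Leading middle dot: the package is implicit.
		return nil, local
	}
	return strings.Split(pkg, divisionSlash), local
}

func main() {
	elems, local := splitSymbol("sync∕atomic·AddInt32")
	fmt.Println(elems, local)
}
```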

bryanturley

Nov 20, 2012, 2:46:44 PM
to golan...@googlegroups.com, bryanturley

Don't think you read my entire message   j/k == just kidding
 

bryanturley

Nov 20, 2012, 2:48:13 PM
to golan...@googlegroups.com, bryanturley, tomwilde

So it is a C extension? I really should read more of the Go code.
 

Russ Cox

Nov 25, 2012, 4:22:45 PM
to bryanturley, golang-nuts, tomwilde
Any language with non-flat naming structure has to decide how it will
encode those names when interacting with a traditional Unix linker and
its flat name space.

You've surely seen what C++ does. The design there follows directly
from the decision that any C++ name must mangle to a valid C
identifier, primarily because C++ was originally compiled via C.
That's an important property, but insisting on it creates names that
are essentially illegible when encountered in a C program, in the
output of nm, or other contexts.

For Go, the gc compilers encode names into unique identifying strings,
but the encoded strings use punctuation like . and /, so that the
encoded symbol name is the entirely readable "encoding/json.Marshal"
instead of something like "P2E8encodingE4jsonN7Marshal".
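
To make the contrast concrete, here is a small Go sketch comparing the readable form with a C-identifier-safe mangling. The scheme in `cSafeSymbol` (a `_G` prefix plus length-prefixed components) is invented for this example; it is neither what C++ nor what any Go toolchain actually does, and it assumes path elements contain only characters valid in C identifiers.

```go
package main

import (
	"fmt"
	"strings"
)

// readableSymbol joins an import path and a name with punctuation,
// giving a symbol such as "encoding/json.Marshal".
func readableSymbol(importPath, name string) string {
	return importPath + "." + name
}

// cSafeSymbol is a hypothetical mangling that stays inside the set of
// valid C identifiers by length-prefixing each component.
func cSafeSymbol(importPath, name string) string {
	var b strings.Builder
	b.WriteString("_G")
	for _, elem := range strings.Split(importPath, "/") {
		fmt.Fprintf(&b, "%d%s", len(elem), elem)
	}
	fmt.Fprintf(&b, "_%d%s", len(name), name)
	return b.String()
}

func main() {
	fmt.Println(readableSymbol("encoding/json", "Marshal"))
	fmt.Println(cSafeSymbol("encoding/json", "Marshal"))
}
```

Both strings identify the symbol uniquely, but only the first can be read at a glance in the output of nm.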

Although Go does not compile via C, occasionally C code does need to
refer to Go identifiers. Since we chose not to restrict the mangled Go
names to the space of valid C names, we must add some mechanism to
refer to Go names from C. That mechanism is:

1. In the assemblers and C compilers, which already accepted all
non-ASCII Unicode code points in identifiers, the Unicode characters ·
(U+00B7, middle dot) and ∕ (U+2215, division slash) rewrite to ordinary
. and / in the object files.

2. A symbol with a leading . has its import path inserted before the .
when being linked: inside encoding/json.a, a reference to ".Marshal"
is equivalent to "encoding/json.Marshal".
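
The two rules can be sketched in Go as follows. This is only an illustration of the rewriting described above, not the toolchain's code: `objectSymbol`, `linkSymbol`, and the sample symbol names are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Rule 1: the lookalike characters become ordinary punctuation
// when the symbol is written to the object file.
var lookalikes = strings.NewReplacer(
	"\u00b7", ".", // · middle dot
	"\u2215", "/", // ∕ division slash
)

// objectSymbol returns the name as it would appear in the object file.
func objectSymbol(src string) string {
	return lookalikes.Replace(src)
}

// Rule 2: a symbol with a leading "." gets the import path of the
// package being linked inserted in front of it.
func linkSymbol(sym, importPath string) string {
	if strings.HasPrefix(sym, ".") {
		return importPath + sym
	}
	return sym
}

func main() {
	// Inside encoding/json, a C or assembly reference written as ·Marshal.
	fmt.Println(linkSymbol(objectSymbol("·Marshal"), "encoding/json"))

	// A fully qualified reference from another package (hypothetical name).
	fmt.Println(linkSymbol(objectSymbol("runtime∕race·Initialize"), "runtime"))
}
```

The second call shows why the slash lookalike is needed: a fully qualified name has no leading dot, so rule 2 leaves it alone and the path must be spelled out in the source.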

Because of 2, we went a long time without needing a special character
for slash. Recently the introduction of race detection has made it
convenient for package runtime to be able to refer to a few symbols in
runtime/race, hence the new slash lookalike.

Russ