I had an idea for a language improvement that I thought I'd bounce off
folks. This was inspired by the go/ast package, which declares
several Node interfaces such as:
// All expression nodes implement the Expr interface.
type Expr interface {
Node
exprNode()
}
// All statement nodes implement the Stmt interface.
type Stmt interface {
Node
stmtNode()
}
// All declaration nodes implement the Decl interface.
type Decl interface {
Node
declNode()
}
These interfaces define particular "kinds" of Nodes, which have no
"real" methods apart from those implemented by Node. Instead they
define a dummy method, to keep you from accidentally treating a Stmt
Node as a Decl node or vice versa. This works all right in many ways,
but has the effect of making it impossible to discern from
automatically-generated documentation (and in this case, also from the
human-generated docs, although that could be added) which types
satisfy the Decl, Stmt and Expr interface. In addition, although the
number of such types is bounded (since declNode() etc aren't
exported), the compiler can't do any completeness checking to verify
that all possible Decls are considered in a type switch.
I wonder whether an "alternative" type might help in situations like
this, and serve many of the functions of a union type. The idea is to
be able to declare a type such as
type Decl alternative {
*BadDecl
*GenDecl
*FuncDecl
}
which would declare that a Decl is either a *BadDecl, or a *GenDecl or
a *FuncDecl. A Decl will satisfy interfaces consistent with the
*intersection* of the interfaces of those three types (which in this
case is Node). You would access the alternatives of an alternative
just as you do with interfaces, using type assertions or type selects,
so only the type declaration would be new to the language.
The implementation of alternatives could be very similar to that of
interfaces---in fact, after type checking, every alternative could be
converted into its corresponding interface without affecting the
semantics. But but alternatives could also potentially be "less
boxed", and stored in a smaller or more compact data structure than
interfaces (which I think always occupy more than a pointer's memory).
It won't ever be as efficient as a C union type (unless there's only
one alternative, which seems pointless), but there is no way that a C
union type can be typesafe, so I suspect that's going to happen for
any go feature that is like unions. Certainly a type such as
type Decl alternative {
uint16
int16
byte
}
could be stored in 32 bits or less (but I imagine 32 bits would be
faster than 24 bits), which is more compact than an interface{} would
be.
Finally, if alternatives are implemented, I'd very much appreciate the
compiler erroring out with a warning when a select statement on the
type of an alternative doesn't cover every type. Of course, you can
always add a default: case to avoid the error, but this would allow us
to let the compiler guarantee that we didn't miss any of the options,
which would be lovely, especially in code like my "transform"
package[1], which manually walks a go AST, but if new node types are
added will continue to compile just fine, but crash when run on some
source files.
[1] http://github.com/droundy/go-crazy/blob/master/transform/transform.go
For the curious or Haskell-minded, this "alternative" type allows one
in go to achieve what in Haskell (or ML) would be achieved using an
ADT [2].
[2] http://en.wikipedia.org/wiki/Algebraic_data_type
Any thoughts? Go has a very simple set of types, and I'd hate for it
to gain every new feature under the sun, but I suspect that
alternatives could make a lot of code considerably clearer, and might
even make some code more efficient (with an optimized implementation).
I also think that alternatives are quite complementary to interfaces,
which should allow more expressive programs. In an interface, you
specify the methods and the compiler figures out which types are
members. In an alternative, you specify which types, and the compiler
figures out which methods are members, which could be quite handy at
times... in particular, when you know that you don't want to have
unexpected types thrown your way (as in go/ast).
--
David Roundy
> Hi all,
>
> I had an idea for a language improvement that I thought I'd bounce off
> folks.
>
>
> ...
>
>
> type Decl alternative {
> *BadDecl
> *GenDecl
> *FuncDecl
> }
To me it looks like:
class BadDecl: Decl { ... }
class GenDecl: Decl { ... }
class FuncDecl: Decl { ... }
but using other syntax.
Although I understand the problem you are trying to solve, because I
have an app which uses go/ast package and I do have lots of type switch
statements too. My opinion that this problem is simply moved to run-time
instead of being a compile-time problem. You can catch all not handled
cases using 'default' statement easily and panic after that. When
you're dealing with that kind of code (which has tons of switch cases),
you will have to write tests anyway, and good test suite will show you
very fast that one of the newly introduced cases is missing.
But imho it's a documentation problem also. Current documentation for
go/ast package doesn't have lists of Expr, Decl and Stmt "classes". And
it should have them definitely. Because searching through the page
trying to figure out these lists by type names isn't a nice way to
spend your time really. :D
And as C++ and D clearly shows us, there is no such language that will
save you from writing a good documentation and there is no such
language that will make even a good library useful without good
documentation.
My opinion that specifying the type hierarchy as you're proposing will
not save you from much of the problems. Because after all, it's just
the same documentation (because apparently we don't want to have that
thing as a requirement) and as in documentation case, author can forget
to update it. Etc, etc. There are a lot of problems with that imho.
Go doesn't have type hierarchies. But you can model one as you do know
already. Every model that aren't in the language itself requires
documentation.
(Just few thoughts).
Even if we did add union/alternative, I'm skeptical that
it would include an error for a missing case in a type switch.
That doesn't feel very much in the spirit of a language
that still has goto. (There is no error for if statements
without else clauses either.)
Russ
That's a succinct summary of why they haven't been put in the language.
The original thing they were suggested for is error discrimination: is
this error from the network or the file system? But type switches and
a little planning already make that easy to do.
So the proposal exists but hasn't been looked at in months. Go just
doesn't seem to get significantly more expressive if you add union
types.
-rob
No, there are some major differences. One is that Decl is only
declared once (rather than declared once, and then also mentioned in
each member type. Also, Decl declares a finite set of types, known to
both compiler and humans. And finally, Decl can include primitive
types or types defined in other packages.
> Although I understand the problem you are trying to solve, because I
> have an app which uses go/ast package and I do have lots of type switch
> statements too. My opinion that this problem is simply moved to run-time
> instead of being a compile-time problem. You can catch all not handled
> cases using 'default' statement easily and panic after that. When
> you're dealing with that kind of code (which has tons of switch cases),
> you will have to write tests anyway, and good test suite will show you
> very fast that one of the newly introduced cases is missing.
There's a huge difference between compile-time guarantees and runtime
crashes, in my opinion. I guess if you like python, you may consider
unit tests for every possibility to be a reasonable alternative for
type signatures, but I don't like python. Note: the same problem of
"forgetting some of the infinite set of possible types that might be
passed" applies to unit tests as well as ordinary code.
> And as C++ and D clearly shows us, there is no such language that will
> save you from writing a good documentation and there is no such
> language that will make even a good library useful without good
> documentation.
But as Haskell clearly shows us, a language with a good type system
greatly reduces the amount of documentation needed, and makes it
easier to avoid bugs. Documentation that the compiler can (and does)
check is much nicer (where possible) than documentation that humans
have to read and verify against their code.
> My opinion that specifying the type hierarchy as you're proposing will
> not save you from much of the problems. Because after all, it's just
> the same documentation (because apparently we don't want to have that
> thing as a requirement) and as in documentation case, author can forget
> to update it. Etc, etc. There are a lot of problems with that imho.
I'm not proposing a type hierarchy, so I suspect you misunderstood my proposal.
> Go doesn't have type hierarchies. But you can model one as you do know
> already. Every model that aren't in the language itself requires
> documentation.
Regarding type signatures versus documentation, a function such as
func max(x float64) float64 {
...
}
requires no documentation. But a function such as
func max(x interface{}) interface{} {
...
}
does require documentation. Or if we picked a name that was less
obvious, in the former case we wouldn't need to document "where x is a
float64, the function returns a float64", while in the latter case we
do need to document "where x is a float, float64 or float32 and the
returned value is the same type", or whatever is appropriate.
--
David Roundy
Just as an exercise, I'll give a shot at seeing if go could get much
more expressive with union types. First, let me enumerate the
possible advantages of union types:
1. Compiler-checkable documentation for functions or data structures
that can accept any one of a finite set of types.
2. A more expressive alternative to interface{} for functions that
could accept primitive types, but could accept more than one. e.g.
one could write a:
type Num64 union { float64; complex64 }
func Abs(x Num64) float64 {
switch xx := x.(type) {
case float64: return math.Abs(xx)
case complex64: return math.Sqrt(real(xx)*real(xx)+imag(xx)*(imag(xx))
}
}
3. Improved runtime performance, because tricks like pointer tagging
become possible eliminating a pointer and a dereference relative to
interfaces{}, for small unions.
Performance isn't sufficient by itself... go just doesn't seem to
sacrifice code clarity for performance. My second benefit seems a bit
stretched. A quick grep of interface{} in the go package sources
shows that the only cases where it isn't intended to accept any type
(as in gob, fmt, etc) it isn't intended to accept a finite set of
types either (as in runtime.SetFinalizer, where the types of the two
arguments need to be correlated). So it basically comes down to
benefit 1, which ought to add clarity to at least some code if we're
to add unions. So I'll attempt to write a nice sample program
implemented with unions that might be less nice without them...
How about a binary tree representing a set of ints? This is a classic
Haskell ADT example.
type branch struct {
value int
lower, higher Tree
}
// I assume here that nil is not a possible value for a union (except
as in a nil of one of
// its component types), so the empty set is a nil branch pointer.
type Tree union { *branch; int }
func (t *Tree) HasMember(x int) bool {
for { // Avoid recursion to save stack space... (if go had tail
recursion this could be prettier)
switch tt := (*t).(type) {
case int: return tt == x
case *branch:
if tt == nil { return false } // this represents an empty set.
if x == tt.value {
return true
} else if x > tt.value {
t = &tt.higher
} else {
t = &tt.lower
}
}
}
return false // It's silly that the compiler makes me add dead code...
}
func (t *Tree) Add(x int) *Tree {
// I'm lazy, and writing this directly in gmail, so it's not a
balanced or clever tree
for { // Avoid recursion to save stack space...
switch tt := t.(type) {
case int:
if x > tt {
*t = branch{ x, struct{}, tt }
} else {
*t = branch{ x, tt, struct{} }
}
return t
}
case *branch:
if tt == nil {
*t = x
return t
}
if x == tt.value {
return t // It's already a member
} else if x > tt.value {
t = &tt.higher
} else {
t = &tt.lower
}
}
}
return t
}
I'm not sure how I would implement this sort of code using interfaces.
The most obvious option would be to avoid interfaces entirely, and
just use:
type Tree struct {
value int
lower, higher *Tree
}
and figure an extra two pointers stored at each leaf doesn't hurt us
much. And actually, on 32 bit systems, this might be more efficient
than the union approach, since the union Tree will probably take 64
bits on either 32-bit or 64-bit systems, while the struct Tree will
take only 96 bits on 32-bit systems.
Expressing the same binary tree using interfaces could perhaps look
something like:
func AddToTree(t *interface{}, x int) {
switch tt := (*t).(type) {
default: panic("Expected int or *branch type, but got something else.")
// ... same as before
}
}
or could alternatively use recursion and a "real" interface such as
type Tree interface {
Add(int) Tree // warning: this can't do an in-place modification!
IsMember(int) bool
}
func AddToTree(t *Tree, x int) {
*t = t.Add(x)
}
...
To conclude, since any program written with unions can be translated
into a program written with an interface, I agree that unions don't
make the language more expressive in the sense of allowing more
programs to be written. (The exception to this is that I assumed
above that a union could be the receiver of a method, which isn't true
of interfaces... but I'm doubtful that this is an important
difference, and you could achieve something similar with a
struct{interface{}} type that could be a receiver.
So the advantages of a union type would only be (1) additional
compile-time type checking, and (2) the performance (and
resource-consumption) advantage of giving the compiler a finite set of
types, so pointer tagging or other trickery can be used. In certain
corner cases, the performance advantage could be significant, but it
doesn't feel like a compelling reason. The static type checking
advantage is the reason I'd like to see this in go. In the above
binary tree example, I think the union approach is clearer, more
flexible, and easier to extend than the struct approach, but the
interface equivalent of the union approach is confusing to the point
that I would avoid it. In Haskell, ADTs are sufficiently easy to use
and powerful that just about *everything* is implemented as an ADT
(well, everything that doesn't fit well into a single fixed-size
struct), and I think that union types could serve the same function
for go.
--
David Roundy
Yeah, I think I agree. I was thinking of the Haskell equivalent, which
is an expression, not a statement, so missing cases mean panics. I
do, however, hope that eventually the go compiler gets smarter about
following code flow, so
func min(x, y int) int {
if (x<y) { return x } else { return y }
}
will compile. And when it gets that smart, I'd love to see
func foo(x MyUnion) int {
switch xx := x.(type) {
case Type1: return ...
case Type2: return ...
}
}
compile if all the cases are covered, and fail to compile if they
aren't. If unions are implemented, that is...
--
David Roundy
There are certainly times where it might be more convenient to have
them but those are not the majority of cases. I still think they have
potential and shouldn't be written off completely but they needn't be
the top priority by a long shot, either.
For what it's worth, though, I'd imagined something more like this:
type A union {
B //another union, A inherits B's types and methods
c C // A can be of type C *
d D // as above
}
* - "c1, c2 C" would be an error since a union can only be one value at a time
The empty union, union{}, would be pretty much the same as struct{}
var a A
y := a // type of y is A not the type currently held in a
a = v //fails if v is not assignable to at least one of C, D or a type in B
a.c = v //standard rules apply
Likewise f(A) accepts values of C, D or those of B.
v := a.c //panics if a is not type C
v, ok := a.c //v is the zero value of C if !ok, otherwise it is the value of a.c
Similarly a method on C can be invoked via a iff a "is a" C at that
time, so a union's method set is the union of its methods and that of
its current type's.
In the case that v is an untyped constant and C and D are, say int and
float, then either the first field that v fits into is taken or it's a
compile time error unless a specific field is specified. I'd imagine,
or at least prefer, the latter.
If C and D are interfaces and x is an object that satisfies both C and D then:
a = x
causes
v1 := a.d
to return x as a D, while
v2 := a.c
returns x as a C. Similarly if C is a type that satisfies D itself.
If, say, C is itself a union that has a field f of type F, then a.c.f
is accessible iff a is one of the types of C and C "is a" F
A slice or array of A's doesn't need to be homogeneous as long as each
element is assignable to an A. Likewise, a function of type func(...A)
rejects any values not assignable to A at compile time but each can be
any .