Non-null Pointers: The Design Space


Jonathan Amsterdam

Nov 19, 2009, 11:12:54 AM
to golang-nuts
This thread has generated a lot of interesting discussion, and a lot
of ideas for non-null pointers in Go. Much of the discussion has been
about whether non-null pointers are a good idea. I'm going to assume
that they are, and try to summarize the designs that have been
proposed or hinted at. With the design space laid out, we can see if
there is any portion of the space I neglected, and if any of the
possible designs can work for Go.

The topic of nullable types in general has also appeared in this
thread, but it's not relevant here: I'm only discussing the issues
with non-null pointers.

In this message, I'm going to consider only the most ambitious goal,
which I'll call static null safety: making it impossible to
dereference a null pointer at runtime. As far as I know, everyone
agrees that null pointers have their uses. All proposals involve
making non-null pointers a distinct type from nullable pointers.
Dereferencing can happen only on non-null pointers.

Aside from the notation, which I won't discuss, two issues arise:
conversions and initialization.

Conversions
------------------

There must be a way to convert a nullable pointer to a non-null one,
so that it can be dereferenced.

This conversion could be implicit. In other words, there is no mark in
the source--you just do the assignment. This would be accompanied by a
runtime check. This solution is equivalent to having the compiler
insert implicit null checks on each dereference, which is
essentially what we have now, except that the hardware does the check
for free.

Conversion could be explicit, looking like other type conversions in
Go, and would involve a runtime check. The advantages of this idea
over the current language are that we have clearly marked places in
the code that we can examine for errors, and that we may reduce the
total number of required checks by judicious placement of the
conversions. The disadvantages are code clutter, and the need to
repeat checks that we already do as a matter of course. To see the
latter point, consider this method, from list.go:

func (l *List) iterate(c chan<- interface{}) {
    for e := l.front; e != nil; e = e.next {
        c <- e.Value
    }
    close(c);
}

Here e refers to a *Element, which must be nullable to mark the end of
the list. The expressions e.Value and e.next involve dereferencing a
nullable pointer, and so would require conversion and a runtime check.
But we already know that e cannot be null because of the test in the
for statement. So besides the clutter, we would be doing two
additional runtime checks on each iteration.

This leads us to the third possibility for conversion: compiler
deduction. Here, the compiler can deduce from the test that e is not
null at the time e.Value and e.next are evaluated, so no explicit
conversions and no additional checks are necessary.

There are some problems with the deduction approach:

1. To keep the compiler simple and fast, the Go team has chosen not to
perform the kinds of analysis necessary to make these deductions.

2. The analysis is straightforward for local variables not captured by
closures, but closures complicate it. I don't know exactly how
complicated it gets, but given #1 above, it might not take much added
complexity to make the Go folks balk.

3. For expressions other than local variables, the analysis is not
doable, because of concurrency. E.g. consider

if e.next != nil { return e.next.next }

It would be wrong to allow this code to compile, because another
goroutine might modify e.next between the check and the dereference.

4. The analysis would of necessity be conservative, and one might need
or greatly prefer to write code that was correct but could not be
proved so by the compiler.

I think that these problems can be partially gotten around as follows:
say that expr is an expression of nullable pointer type. Then one can
write:

if p := expr; p != nil { /* dereference p safely here */ }

The compiler would still have to do some analysis, but only enough to
realize that p is fresh and local, and that the test suffices to prove
p non-null. The disadvantages of this are that it clutters the code,
and it still doesn't help in cases like the above iterate method where
the test is in a for loop.


Initialization
-----------------

Now I'd like to talk about the problem of initializing non-null
pointers. Go supports uninitialized variables, makes pervasive use of
the notion of a "zero value" for a type, and has no formal notion of
constructor. These interact quite severely with the idea of non-null
pointers. The fundamental problem, of course, is what to do with a
variable of non-null pointer type between the time it is created and
the time it is first assigned to. There are basically two options,
which can be mixed and matched:

1. Disallow creation without initialization

2. Provide a "zero value" that is something other than nil.

Let's consider #2 first. As an example, one idea I've seen is to have
the zero value for *T be new(T). There are three disadvantages to this
approach:

1. It complicates and slows down memory allocation. Consider a struct
which contains nested structs to several levels. In current Go, this
can be initialized with a very fast bulk setting of the memory to zero
(i.e., calloc()). Initializing non-null pointers to values other than
zero will be slow. (I believe someone suggested that you could
represent the zero value with a zero bit pattern, but short of
checking on every access, I don't see how this could be implemented).

2. It could require a lot of memory allocation. This problem can be
handled by using a single zero value for each type; all pointers would
be initialized to the same object.

3. Instead of crashing due to dereferencing a null pointer, a program
continues running with an unexpected value. Since it is almost always
better to crash than to run with bad data, this is a severe
disadvantage of the "zero value other than nil" approach. Since this
problem will come up a few times in what follows, I'm going to name it
ZVINAE (Zero Value is Not An Error).

Now to the first idea, disallowing creation without initialization.
This makes the most sense for scalar local and global variables; it is
not too much of a burden to write var x = new(T). But how do we
handle fields in structs? Some languages (e.g. Spec#) require that non-
null pointers be initialized before the constructor ends (earlier,
actually); but Go has no constructors.

One possibility is to leave the fields uninitialized, but require
assignment to them near (e.g. in the same block as) the new()
expression or variable declaration. The compiler would enforce that
each field is assigned before being read. I don't believe this is
statically analyzable in practice. Whether conservative approximations
will put an unacceptable burden on programmers and execution time I
don't know, but I would guess not. Consider this code, also from
list.go:

// Init initializes or clears a List.
func (l *List) Init() *List {
    l.front = nil;
    l.back = nil;
    l.len = 0;
    l.id = new(byte);
    return l;
}

// New returns an initialized list.
func New() *List { return new(List).Init() }

If List contained non-null pointers, I don't think New() could be
written as it is. (Perhaps a compiler could analyze a local, simple
function like Init(), but what if the function were in another package
and/or had complex control flow?)

Another solution to the problem of initializing structs containing non-
null pointer fields is to disallow the use of new, and require the
literal syntax. For example, a struct S with field f of type non-null
pointer to T could be initialized with S{f: new(T)}. We still have the
problem of circular references; for those, you'd have to use a dummy
object temporarily. However, what if the circular reference is to a
value of the same type? Consider this code, from ring.go:

type Ring struct {
    next, prev *Ring;
    Value interface{};
}

The next and prev fields are candidates for non-nullity, since they
are, once initialized, never null. But how do we initialize them?
Since we are disallowing new(Ring), we must write Ring{next: ...,
prev: ...}. Since next is of type *Ring, we have to do something like
Ring{next: &Ring{...}, prev: &Ring{...}}. You can see where this is
going. It seems that the only way to break the regress is to have a
special value magically given to us by the compiler, with each
internal pointer of the same type initialized to point to the special
value itself.

Of course, use of these dummy objects to break circularity, whatever
their provenance, would be a case of ZVINAE.

There is another, more prosaic problem with the idea of requiring
initialization for structs with non-null pointer fields. Consider the
case where the designer of a large struct S wants to allow immediate
values of S, but requires that values of S be initialized. In the
current language, you'd provide a method

func (s *S) Init() { ...}

and ask users to call it. But if we have to initialize an S at
declaration time, we'll have to write something in S{...} that will
likely get thrown away. We'll waste time writing it, and the machine
will waste time creating it.

Finally, there are two problems with arrays. First, how do we handle
initialization of an array of non-null pointers?

1. Disallow them, as in Spec#. It's not clear what effect this would
have on the use of non-null pointers. Say that List.New() returns a
non-null List pointer (as of course it should). Now I want an array of
*Lists (for my own hashtable, perhaps). What can I do? I guess I'll
make an array of nullable pointers to *Lists. Now I have null pointers
AND double indirection.

2. Allow them to be created uninitialized, but require each element to
be initialized before use. I don't believe this is statically
checkable in a reasonable way.

3. Require an initial value to be provided. E.g. something like

var a [100]*T(new(T))

where each element of a points to the same value. This is the ZVINAE
problem, only a little better because the programmer can choose the
initial value for each array. Still, it's unlikely that the programmer
wants 100 pointers to the same object, so the initial contents of the
array are probably wrong.

4. Provide an initialization function. E.g.

var a [100]*T(func(i int) *T { return &T{x: 1, y: 2} })

The advantage here is that each element is initialized to something
that has a chance of being what the programmer desired. The
disadvantage is the cumbersomeness of the code. And how do we handle a
case where a value depends on a previous array element (but not on the
index i)? There's probably a clever way to do it, but it would not be
as simple as just writing a loop.

The second array initialization problem concerns an array of structs,
where the struct type contains non-null pointers:

type S struct { f *T } // assume *T is non-null

var a [100]S

The function approach has the problem that we'd have to return an S
from the function, and S may be large. If instead the function takes a
pointer to an S to populate, then it will be able to examine the
uninitialized non-null pointer fields of S, which we cannot allow. It
seems the only reasonable solution here is to disallow such arrays,
just as C++ disallows arrays of a class without a no-arg constructor.

Summary
--------------

I've looked at possible designs for static null safety in Go -- the
idea that null pointer dereferences are impossible in a successfully
compiled Go program. The idea's two problem areas are conversions from
nullable to non-null types, and initialization of non-null values. It
appears as though conversions would require an amount of compiler
analysis and/or code clutter that is antithetical to the spirit of the
language. More severely, any general solution to the problem of
initializing non-null pointers when they appear as struct fields would
be cumbersome, would require a magic language-provided dummy value,
and would introduce ZVINAE.

A few final words. I am not passing judgment on the quality of the Go
language, the wisdom of the Go design team, or the value of static
null safety in programming languages. I am simply trying to
dispassionately examine the possibilities.

We can further this discussion by:

1. Pointing out technical flaws in my analysis. For example, perhaps
static analysis of assignment to struct fields is easier than I think.

2. Finding a point in the design space that I missed, and that is
congruent with Go. As a counterexample, consider a proposal that
requires adding constructors to the language. The Go team knew about
constructors, and decided not to add them. It is not hard to guess
why: they realized that two features they already had, package
encapsulation and functions, already provided them with everything
they felt was valuable about constructors, and without the added
complexity. You will not convince them to add static null safety to Go
by requiring that they first add constructors. Any proposal would have
to make minimal changes to the language.

3. Giving up the strong position of static null safety for a weaker
view, one that allows for the possibility of runtime dereferencing of
null in some circumstances.

4. Proposing and implementing a point in the design space, to
demonstrate that issues like compilation slowdown and code clutter are
not as bad as I make them out to be.

Jonathan Amsterdam

Nov 19, 2009, 11:26:01 AM
to golang-nuts
My reference to "this thread" at the beginning appears because, being
a newbie Google groups user, I thought I was posting to an existing
discussion instead of starting a new one. (The discussion about the
"billion-dollar mistake".)

Also, a technical point I missed: if the dummy object created to avoid
null pointers has circularities (e.g. pointers back to itself), then
ZVINAE will likely manifest itself as an infinite loop, as we traverse
the cycle endlessly.

Rick R

Nov 19, 2009, 1:06:30 PM
to Jonathan Amsterdam, golang-nuts
Wow, thanks for the concise write up!

I will interject my opinions, which are all formed from the principle of "least impact". Basically, I think we can come up with something useful by making a lot of this stuff optional and seeing how things work out.

I'll mention conversion first because my ideas take the burden off of conversion/use. IMO.

For Conversion:
Implicit conversion is not allowed.
Explicit conversion should be adjustable by compiler settings, from a warning to requiring a runtime check.
Dereferencing nullables can be disallowed, or require a runtime check.

For Initialization:
nonnullables must be initialized with a value:
  • structs containing nonnullables must be initialized with an initializer block.
  • arrays containing nonnullables must be initialized with a function. (being a haskell coder, I love this idea)  
  • structures with circular references will likely have to be initialized with nullables
nullable ptrs will work like they currently do.


This basically makes the use of nonnullable pointers optional. Which is fine by me. I will use them in my code, because I would prefer the added level of clarity they provide. 

In addition, I would imagine that it should be possible to enable/disable runtime nullptr checks for those who want to cowboy it.

And to restate one piece, I would *really* like the ability to construct and initialize arrays with a function. Possibly even beginning with an indeterminate length.


Jonathan Amsterdam

Nov 19, 2009, 6:09:50 PM
to golang-nuts
Thanks. A couple of comments:

Conversion:
I like the compiler switches to choose the level of safety.

You would also probably want simple inference of the "if p != nil"
variety. Seems pretty uncontroversial.

Initialization:
You disallow structs with circular non-null pointers, like Ring. I
hadn't thought of that possibility. Here's a problem with it: you
start by coding a struct with no circular references. You use non-null
pointers in the interface. Then later, you decide to add a circularity
(maybe you want each element of your struct to point to some
distinguished head element). Now your code doesn't compile, and you
have to change your interface and break all your users' code.

What about arrays of structs containing nonnullables? Disallow, as I
suggest? Then again, there is a maintenance problem: just adding a
nonnullable field to a struct (even a private one) can break client
code.

In a way non-null pointers would be optional, but if you ever wrote a
library package, then your users would have to deal with them.

Chris Gilbreth

Nov 20, 2009, 1:59:22 PM
to golang-nuts
On Nov 19, 1:06 pm, Rick R <rick.richard...@gmail.com> wrote:
> *For Initialization:*
> nonnullables must be initialized with a value:
>
>    - structs containing nonnullables must be initialized with an initializer
>    block.
>    - arrays containing nonnullables must be initialized with a function.
>    (being a haskell coder, I love this idea)
>    - structures with circular references will likely have to be initialized
>    with nullables
>

Perhaps one could avoid the circular referencing problem for
structures containing non-nullable pointers by allowing self-
referencing in initializer blocks. E.g.

r := Ring{next: &r, prev: &r} // next and prev point to r itself

This way uninitialized non-nullable pointers would never be needed to
create a Ring. Because the structure must be allocated in memory
before the initialization is done, this should be possible (even
easy?) to implement. (In fact, this could be convenient even with
regular pointers.)

There is a similar problem when one has two types, A and B, which can
point to each other. To resolve this case you could implement a new
parallel initialization construct:

a, b := A{bptr: &b}, B{aptr: &a} // a points to b, b points to a

Which is again nice regardless, since it can save typing even when
you're not dealing with non-nullable pointers. In a parallel
initialization expression, the compiler would first do the allocations
so that addresses are available, then do the initializations. This
would also be a natural counterpart to the parallel assignment that Go
has, e.g.

i, j = j, i

to swap i and j.

Any thoughts?

Sebastian Sylvan

Nov 22, 2009, 2:58:16 PM
to Jonathan Amsterdam, golang-nuts
On Thu, Nov 19, 2009 at 4:12 PM, Jonathan Amsterdam <jbams...@gmail.com> wrote:

> I've looked at possible designs for static null safety in Go -- the
> idea that null pointer dereferences are impossible in a successfully
> compiled Go program. The idea's two problem areas are conversions from
> nullable to non-null types, and initialization of non-null values. It
> appears as though conversions would require an amount of compiler
> analysis and/or code clutter that is antithetical to the spirit of the
> language. More severely, any general solution to the problem of
> initializing non-null pointers when they appear as struct fields would
> be cumbersome, would require a magic language-provided dummy value,
> and would introduce ZVINAE.

I disagree that it would *introduce* ZVINAE. The designers of Go evidently decided that ZVINAE was not a problem. After all, a random integer value in a struct left as 0 is also only correct by random chance, and yet Go doesn't care - programs keep going without crashing.

I personally disagree with the decision to allow uninitialized values everywhere, but that's the philosophy they have chosen to go with, so I don't see why they should take a stance against ZVINAE *only* when it comes to pointers, and not all the other types.

I think initializing to the zero value is consistent with how Go works. Not necessarily ideal, but given that that's the route they've chosen to go down I think sticking to it and being consistent is preferable.

Also, I don't think the runtime cost of allocation/initialization is that big of a deal. You can know the "transitive size" of an object (the size of the object, plus the transitive size of any referenced zero values) statically, so you would just allocate it and all its zero objects in a single allocation, then do a few pointer fixups for the pointers themselves (the offset of the pointers, and the objects they point to, are statically known). It does allocate more memory and so put more pressure on the GC, but I would think that you wouldn't use a non-nullable pointer in the first place unless you envisioned a "real" value being available shortly after construction so the compiler could probably skip the vast majority of these initializations.

--
Sebastian Sylvan

Brian Slesinsky

Nov 22, 2009, 4:36:12 PM
to golang-nuts
I think approach 3 would be simplest. It seems like it would be better
to think of not-null declarations as a concise way of making
assertions (at runtime) that can also be used for static analysis.

The reason we make assertions is to catch errors earlier so that
diagnosis is easier. The most common place to do a runtime check is
immediately after a function is called, because this is often a
boundary between libraries written by different people. Adding some
kind of annotation to function parameters (a single character please!)
that automatically generates a runtime assertion is the easiest change
to make, and allows for static analysis in the caller and callee to be
done separately to find further bugs.

The compiler can remove null checks if it can prove they're not
needed, but we don't need to fully solve the static analysis problem.

It's a similar situation to array index checking. Yes this happens at
runtime because it can't always be optimized away, but complicating
the type system to try to do static analysis in every case isn't a
reasonable solution.

- Brian

Jonathan Amsterdam

Nov 22, 2009, 10:51:10 PM
to golang-nuts
On Nov 22, 4:36 pm, Brian Slesinsky <bslesin...@gmail.com> wrote:
> I think approach 3 would be simplest. It seems like it would be better
> to think of not-null declarations as a concise way of making
> assertions (at runtime) that can also be used for static analysis.

I elaborate this idea in http://groups.google.com/group/golang-nuts/browse_thread/thread/b76356401db5230d.

Jonathan Amsterdam

Nov 23, 2009, 12:53:05 PM
to golang-nuts
> I disagree that it would *introduce* ZVINAE. The designers of Go evidently
> decided that ZVINAE was not a problem. After all, a random integer value in
> a struct left as 0 is also only correct by random chance, and yet Go doesn't
> care - they keep going without crashing.
>
> I personally disagree with the decision to allow uninitialized values
> everywhere, but that's the philosophy they have chosen to go with, so I
> don't see why they should take a stance against ZVINAE *only* when it comes
> to pointers, and not all the other types.

By "introduce" I mean: look at the language before, and after, adding
this feature. Then there exist some errors that are caught before, and
now are not. I didn't mean "introduce as a principle in the language
design." And I think you're mischaracterizing the designers'
intentions when you say that they "decided" ZVINAE was not a problem,
or "took a stance against ZVINAE." They weighed a number of
considerations, and they felt that the speed, convenience and clarity
of zero-initialization outweighed the disadvantages. In the case of
pointers, they just happened to get the failfast check for free.

> Also, I don't think the runtime cost of allocation/initialization is that
> big of a deal. You can know the "transitive size" of an object (the size of
> the object, plus the transitive size of any referenced zero values)
> statically, so you would just allocate it and all its zero objects in a
> single allocation, then do a few pointer fixups for the pointers themselves
> (the offset of the pointers, and the objects they point to, are statically
> known). It does allocate more memory and so put more pressure on the GC, but
> I would think that you wouldn't use a non-nullable pointer in the first
> place unless you envisioned a "real" value being available shortly after
> construction so the compiler could probably skip the vast majority of these
> initializations.
>

In the case of an array of these, there might be more than "a few"
fixups. And I think you overestimate the ability of a compiler to
deduce initialization. At least in some of the Go library code I've
seen, there is often a separate method named Init that is called after
new(T). But in your favor, the memory is a non-issue if the compiler
creates a single dummy value per type.

Jonathan Amsterdam

Nov 23, 2009, 12:55:12 PM
to golang-nuts
> Perhaps one could avoid the circular referencing problem for
> structures containing non-nullable pointers by allowing self-
> referencing in initializer blocks. E.g.
>
> r := Ring{next: &r, prev: &r}  // next and prev point to r itself

I like that. But the &r has to be treated specially, otherwise you
could do something like (&r).next on the RHS and see the uninitialized
value.

Ian Lance Taylor

Nov 23, 2009, 1:16:56 PM
to Jonathan Amsterdam, golang-nuts
Jonathan Amsterdam <jbams...@gmail.com> writes:

> In the case of an array of these, there might be more than "a few"
> fixups. And I think you overestimate the ability of a compiler to
> deduce initialization. At least in some of the Go library code I've
> seen, there is often a separate method named Init that is called after
> new(T). But in your favor, the memory is a non-issue if the compiler
> creates a single dummy value per type.

I don't think we could create a single dummy value per type. The
language has to clearly define what happens to an uninitialized value.
I don't think it would be reasonable to say that an uninitialized
pointer points to a shared allocated value. That would make it nearly
impossible to use an uninitialized pointer--the only reasonable thing
you could do would be to assign a new value to it. I think we have to
say either nil or new(T). Currently, of course, we say nil.

Ian