Feature request/admission of ignorance

479 views
Skip to first unread message

Michael Jones

unread,
Apr 20, 2012, 11:25:23 AM4/20/12
to golang-nuts
I don't know how to code crit-bit trees gracefully in Go.
My first thought was to propose an extension for Go 2.
My second thought was to appeal for advice in Go 1.

Goal:
I want to organize a collection of objects of type O
I need ia tree structure of nodes (Type T) each with a little data and two pointers.
Each pointer points to either a subordinate tree node (type T) or object (type O), or be nil.

In the present C version, to save space, pointer type is encoded in each pointer's LSB.
Given my background, this is the natural, graceful expression. ;-)

For Go I can:
(a) Store four pointers and two flag bits and waste space.
(b) Store two pointers and flag bits and use Unsafe to recast
(c) ? what else... this is my question. is there small/fast solution that keeps both pointers valid?

For the hypothetical future Go feature, my immediate thought was that something like the Algol, Pascal, Modula, Ada, ... variant record would be a simple and natural feature in Go. It already feels idiomatic to me even though it is not in the language. I've not thought about precise syntax but something equivalent to Pascal's version would be ok:

type childKind = (node, string);
     shape = record
                payload : integer;
                case leftKind : childKind of
                   node : ...
                   string : ...
                case rightKind : childKind of
                   node : ...
                   string : ...
              end;

...which can easily be handled by case statements when reading and where the discriminator can be proven to be set on writes. Corresponding Wikipedia article, "Tagged Union", is here:


Oh Go Gods, was something like this considered for Go?

--
Michael T. Jones | Chief Technology Advocate  | m...@google.com |  +1 650-335-5765

Ian Lance Taylor

unread,
Apr 20, 2012, 11:52:58 AM4/20/12
to Michael Jones, golang-nuts
Michael Jones <m...@google.com> writes:

> http://en.wikipedia.org/wiki/Tagged_union
>
> Oh Go Gods, was something like this considered for Go?

http://golang.org/doc/go_faq.html#variant_types

Ian

Ian Lance Taylor

unread,
Apr 20, 2012, 11:55:25 AM4/20/12
to Michael Jones, golang-nuts
Michael Jones <m...@google.com> writes:

> For Go I can:
> (a) Store four pointers and two flag bits and waste space.
> (b) Store two pointers and flag bits and use Unsafe to recast
> (c) ? what else... this is my question. is there small/fast solution that
> keeps both pointers valid?

Another possibility is to make each node an interface value, and store
pointers in each interface value. But this does do more memory
allocation than the C solution.

Ian

Michael Jones

unread,
Apr 20, 2012, 11:55:15 AM4/20/12
to Ian Lance Taylor, golang-nuts
Thanks! Maybe I'll just use unsafe and the LSB pointer hack then and consider that idiomatic.

Michael Jones

unread,
Apr 20, 2012, 11:57:42 AM4/20/12
to Ian Lance Taylor, golang-nuts
I will code up all answers suggested here and see how it plays out. I suspect that variant records will remain the pretty answer.

roger peppe

unread,
Apr 20, 2012, 12:14:34 PM4/20/12
to Michael Jones, Ian Lance Taylor, golang-nuts
On 20 April 2012 16:55, Michael Jones <m...@google.com> wrote:
> Thanks! Maybe I'll just use unsafe and the LSB pointer hack then and
> consider that idiomatic.

I think that solution might be liable to horrible failure when the GC gets
better.

You could use exactly four pointers, as Ian says. If the size of O
fits within a pointer, the overhead won't be too bad.

http://play.golang.org/p/_Ous4YzGFu

Kyle Lemons

unread,
Apr 20, 2012, 12:36:53 PM4/20/12
to Ian Lance Taylor, Michael Jones, golang-nuts
Pardon my ignorance, but why would this have more allocations?  The interface value (which would be inlined in the struct, right?) would be admittedly wider than a pointer, but it essentially contains the same information (type and pointer) and the type switch might even be more optimal than masking out the LSB to check the type.  It seems like assigning a pointer to an interface field of a structure is just a multi-word assignment, not an allocation (since the pointer will always fit in the value).

Ian Lance Taylor

unread,
Apr 20, 2012, 1:59:23 PM4/20/12
to Kyle Lemons, Michael Jones, golang-nuts
Kyle Lemons <kev...@google.com> writes:

I was thinking of the case where you store a struct with two pointers
into the interface. That doesn't fit in the interface, so there is an
extra allocation there.

But, you're right, you could structure it differently.

Ian

David Symonds

unread,
Apr 20, 2012, 6:48:22 PM4/20/12
to Michael Jones, golang-nuts, Ian Lance Taylor

On Apr 21, 2012 1:55 AM, "Michael Jones" <m...@google.com> wrote:

> Thanks! Maybe I'll just use unsafe and the LSB pointer hack then and consider that idiomatic.

Using unsafe, while sometimes necessary, is never idiomatic.

Dave.

Saul Hazledine

unread,
Apr 21, 2012, 2:23:42 AM4/21/12
to golan...@googlegroups.com
Hello,


On Friday, 20 April 2012 17:25:23 UTC+2, Michael Jones wrote:
I don't know how to code crit-bit trees gracefully in Go.

In the present C version, to save space, pointer type is encoded in each pointer's LSB.
Given my background, this is the natural, graceful expression. ;-)

 I played with a C version of a critbit tree based on the explanation linked here:


It ended up being far too clever for me.  I abandoned the critbit tree as it became a pain to maintain when I wanted to change the behaviour to handle entries that were a prefixes of other entries. I ended up reading up on alternative trie layouts and one nice one that caught my eye is the LC-trie:


If you haven't seen it, its explanation (and the corresponding paper) may be of interest. It uses arrays instead of pointers. It might not fit your use case but I found it a refreshing change to manipulating bits on pointers.

Saul Hazledine

unread,
Apr 21, 2012, 4:38:25 AM4/21/12
to golan...@googlegroups.com
On Friday, April 20, 2012 5:25:23 PM UTC+2, Michael Jones wrote:
I don't know how to code crit-bit trees gracefully in Go.
My first thought was to propose an extension for Go 2.
My second thought was to appeal for advice in Go 1.

Goal:
I want to organize a collection of objects of type O
I need ia tree structure of nodes (Type T) each with a little data and two pointers.
Each pointer points to either a subordinate tree node (type T) or object (type O), or be nil.

In the present C version, to save space, pointer type is encoded in each pointer's LSB.
Given my background, this is the natural, graceful expression. ;-)

For Go I can:
(a) Store four pointers and two flag bits and waste space.
(b) Store two pointers and flag bits and use Unsafe to recast
(c) ? what else... this is my question. is there small/fast solution that keeps both pointers valid?

(c) 2*unsafe.Pointer + 2*byte + data

Jan Mercl

unread,
Apr 21, 2012, 4:51:11 AM4/21/12
to golan...@googlegroups.com
On Friday, April 20, 2012 5:25:23 PM UTC+2, Michael Jones wrote:
type childKind = (node, string);
     shape = record
                payload : integer;
                case leftKind : childKind of
                   node : ...
                   string : ...
                case rightKind : childKind of
                   node : ...
                   string : ...
              end;

I would probably do something like:

type (
        shape struct {
                payload int
                union   interface{} // one of the bellow
        }

        nodeNode struct {
                left, right *shape
        }

        nodeString struct {
                left  *shape
                right string
        }

        stringNode struct {
                left  string
                right *shape
        }

        stringString struct {
                left, right string
        }
)

func foo(s *shape) {
        switch x := s.union.(type) {
        case nodeNode:
                // ...
        case nodeString:
                // ...
        case stringNode:
                // ...
        case stringString:
                // ...
        default:
                panic(fmt.Errorf("%T", x))
        }
}
 
-jan

Michael Jones

unread,
Apr 21, 2012, 7:35:09 AM4/21/12
to Jan Mercl, golan...@googlegroups.com
The key feature, whatever the syntax, is that:

(a) there is a named tag identifying which "state" the union is in.

(b) the language allows querying the tag

(c) reads are dependent on the reading-tag matching the present state

(d) writing is smart: if the tag matches then the write is a write. if they differ, then either (1) it panics or (2) it sets all values to defaults before doing the write and the updates the tag after. Pascal used (1) but (2) is friendlier.

You are using Interface semantics/abilities for discrimination. I'm thinking that this is really just a "defensive programming/mistake proof" version of a C-language union. I'd almost like declaring it using a "backwards" switch statement:
const (
    NODE = 0
    LEAF = 1
}

struct {
    payload int;
    left switch {
    case NODE: node *T
    case LEAF: leaf *O
    }
    right switch {
    case NODE: node *T
    case LEAF: leaf *O
    }

:

if s.left == NODE {
    ...
}

or

switch s.left {
case NODE:
    ... = s.left.node
case LEAF:
    ... = s.left.leaf

or

s.right.leaf = p
Reply all
Reply to author
Forward
0 new messages