What is the best way to construct a union of data structures in GO?

2,652 views
Skip to first unread message

Travis Keep

unread,
Apr 25, 2016, 8:44:10 PM4/25/16
to golang-nuts
We are building a server that uses large amounts of memory.

To help with memory usage, we pre-allocate a set number of pages of fixed length when the server starts. As the server runs, pages can be recycled for different data structures as needed. Right now, our server has two main data structures, lists of struct A and lists of struct B.  We want to control the size of the pages with a command line flag.

Right now, each page looks something like this, but the code below does not work.

struct A {
  ...
}

struct B {
  ...
}

type alist []A
type blist []B

type page struct {
  pageMetaDataType
  raw []byte // allocated with make([]byte, pageSize)
  alist  // covers same region of memory as raw
  blist  // covers same region of memory as raw
}

I then do:
page.raw = make([]byte, 1024)

After allocating raw, I initialize alist and blist like so.

sizeInBytes := len(page.raw)
rawPtr := (*reflect.SliceHeader)(unsafe.Pointer(&page.raw)).Data
aHeader := (*reflect.SliceHeader)(unsafe.Pointer(&page.alist))
bHeader := (*reflect.SliceHeader)(unsafe.Pointer(&page.blist))
aheader.Data = rawPtr
aHeader.Len = 0
aHeader.Cap = sizeInBytes / int(reflect.TypeOf(alist).Elem.Size())  // Divide page size by size of one A
bheader.Data = rawPtr
bHeader.Len = 0
bHeader.Cap = sizeInBytes / int(reflect.TypeOf(blist).Elem.Size())  // Divide page size by size of one B


Unfortunately, this scheme does not work for me. I get strange behavior such as values getting randomly overwritten in my data structures. I have traced this bug down to the commit where I do nothing but introduce using pages for multiple purposes by introducing unsafe. So I am pretty sure I am doing something wrong in the code above.


Questions:

1. reflect.SliceHeader has a uintptr for the data pointer. If I initialize a slice with SliceHeader, will the GC know about it?
2. What is the best way to accomplish what I am trying to do? Is it best to allocate my pages in C and use cgo? I remember reading that the Go garbage collector will leave C pointers alone.
3. What am I doing wrong? How do I do it correctly?

Thanks in advance,
Travis




Tim K

unread,
Apr 25, 2016, 10:16:23 PM4/25/16
to golang-nuts
I'm no Go expert but initializing a slice from a fixed size array is simpler this way and the slice will be backed by the array:

var raw []byte = make(...)
var a *[10]string = (*[10]string)(unsafe.Pointer(&raw[0])) // fixed size array
var s []string = (*a)[:] // slice


Now if you start using append() on your slice and Go needs to re-allocate your slice, all bets are off, but as long as the capacity of the slice doesn't need to change, you should be OK, the slice will be backed by your array.

I suspect code like this will be hard to maintain and the source of potentially frequent bugs.

ulu...@gmail.com

unread,
Apr 26, 2016, 12:33:43 AM4/26/16
to golang-nuts
Why are you writing your own memory allocator? Is there a reason why using sync.Pool is insufficient?

Besides sync.Pool, there are a number of examples in the Go source of doing preallocation to ease pressure on the GC: (allocation, use) (another one).

Egon

unread,
Apr 26, 2016, 1:02:20 AM4/26/16
to golang-nuts
AFAIK, it should.

2. What is the best way to accomplish what I am trying to do? Is it best to allocate my pages in C and use cgo? I remember reading that the Go garbage collector will leave C pointers alone.

It depends what exactly are you trying to do. i.e. what business problem is your program solving. It's hard to reverse engineer any better solutions from the description you have given.

Yes you can use C pointers, but the approach you showed should work as well.
 
3. What am I doing wrong?

I suspect the bug is in the code that you didn't show. Do you run your code with race detector?

How do I do it correctly?

It depends how the whole workflow looks.

However, if all possible use sync.Pool or techniques pointed out in Go source.

+ Egon

Travis Keep

unread,
Apr 26, 2016, 10:27:01 AM4/26/16
to golang-nuts, ulu...@gmail.com
We are using 30GB of memory and we want to allocate it all up front.

Tamás Gulácsi

unread,
Apr 26, 2016, 12:09:57 PM4/26/16
to golang-nuts
The simplest is to allocate two slices for the two types, with some length.

Travis Keep

unread,
Apr 26, 2016, 1:03:18 PM4/26/16
to golang-nuts
I guess the lesson learned here is that "unsafe" is really unsafe.

I finally decided to make my two types alist and blist to both be []A instead of blist being a []B and expanding the A type to include everything for both A and B data structures. By doing that I could legally write go code like this

func (p *page) AsAList() *alist {
  return (*alist)(&p.thelist)
}

func (p *page) AsBList() *blist {
  return (*blist)(&p.thelist)
}

Now I don't even need the unsafe package and now my code works!

The downside is that I am no longer using pages as efficiently. My A data structure is a struct { float64, interface{} } and B is float64. Now my blist slice can only be 1/3 as big as it used to be because I have to use A to store nothing but a float64, but at least my code works.


On Monday, April 25, 2016 at 5:44:10 PM UTC-7, Travis Keep wrote:

Travis Keep

unread,
Apr 26, 2016, 2:47:37 PM4/26/16
to golang-nuts
Been reading about cgo and all the rules about passing go pointers in C memory and C pointers in go memory.

onfeof my data types is a []struct {ts float64, value interface{}} and I am trying to overlay this on a simple []byte. Since interfaces{} can contain up to two go pointers (one for the itable and one for the value), I am storing go pointers in memory that is supposed to contain only non-pointers. I wonder if this act is messing up the GC and causing the data corruption that I am seeing. Indeed the data corruption always happens in the interface{} value, not the float64.



Ian Lance Taylor

unread,
Apr 26, 2016, 3:17:28 PM4/26/16
to Travis Keep, golang-nuts
Your suspicions are correct: you can not do that.

The current docs for the unsafe package explain when it is safe to
convert a pointer value to uintptr (https://golang.org/pkg/unsafe/).

Ian

Egon

unread,
Apr 27, 2016, 2:32:50 AM4/27/16
to golang-nuts
On Tuesday, 26 April 2016 21:47:37 UTC+3, Travis Keep wrote:
Been reading about cgo and all the rules about passing go pointers in C memory and C pointers in go memory.

onfeof my data types is a []struct {ts float64, value interface{}} and I am trying to overlay this on a simple []byte. Since interfaces{} can contain up to two go pointers (one for the itable and one for the value), I am storing go pointers in memory that is supposed to contain only non-pointers. I wonder if this act is messing up the GC and causing the data corruption that I am seeing. Indeed the data corruption always happens in the interface{} value, not the float64.

Your custom allocation shouldn't contain any pointers for the least amount of GC pressure.

If you need more help: 
* What are the exact structs definitions that you need to allocate/store in huge quantities? (i.e. including things stored in interface{})
* What is the pattern of allocation/deallocation? I.e. allocate 10, deallocate 5, allocate 5, deallocate 10... etc.
* How are those things processed?

+ Egon

Peter Mogensen

unread,
May 27, 2016, 5:00:52 AM5/27/16
to golan...@googlegroups.com
I think I have a use case which (for the foolish, but brave) could end
up in similar considerations about using unsafe and/or CGO.

I have an (almost) lock-free FIFO queue of many writers and one reader
in a hot path.

It's based on a datastructure like this:

---------------------------------------
type event struct {
seq uint64 // running sequence number of the value.
val int64
// val interface{}
}

type FifoQueue [queueSize]event
---------------------------------------

Simplified... the idea is that it's a ring-buffer and there's only used
locks when the writer catches up with the reader or the "seq" number is
about to wrap.

Anyway... The "val" field is supposed to be any *numerical* type. But if
I use an interface{}, I run into escape-analysis causing an extra
unnecessary allocation for boxing the numerical in an interface (and
some additional runtime cost).
The allocation unnecessary because I really don't need the full
interface{} generality. I only need numerical types. Something like:

type Value union {
int64
uint64
float64
}

type event struct {
seq uint64 // running sequence number of the value.
val Value
}

Now ... I could of course just duplicate all the code for the lock-free
FIFO queue in a copy for each of the needed types. :(

But not doing that would AFAICS involve fiddling with "unsafe".

/Peter

PS: I thing this discussion is related too:
http://webmail.dev411.com/p/gg/golang-nuts/156dvh40hs/go-nuts-re-reconsidering-union-types

Egon

unread,
May 27, 2016, 6:30:42 AM5/27/16
to golang-nuts, a...@one.com

Convert the cell struct into

type Tag uint32
const (
    TagInt64 = Tag(iota)
    TagUint64
    TagFloat64
)
type Cell struct {
sequence int64
tag       Tag
data     uint64
}

use: https://golang.org/pkg/math/#Float64bits to convert from64 float to uint64, use tag to distinguish between the types.

If necessary add some methods such as:

func (cell *Cell) Set(v interface{}) { ... }
func (cell *Cell) Get() interface{} { ... }

That hides the tag and usage of Float64bits.

If the calling func with interface adds too much overhead, you can use:
func (cell *Cell) Tag() Tag { ... }
func (cell *Cell) SetInt64(v int64) { ... }
func (cell *Cell) GetInt64() int64 { ... }

Of course, if that is a problem, then you probably should start batching anyways e.g.
type Tag int16
type Cell struct {
sequence int64
tag      Tag
        count   int16
data     [32]uint64
}

+ Egon

Peter Mogensen

unread,
May 27, 2016, 6:49:17 AM5/27/16
to Egon, golang-nuts


On 2016-05-27 12:30, Egon wrote:
> See: https://github.com/egonelbre/exp/blob/master/ring/buffer.go

Yeah ... amazingly that seems like almost exactly the same design as
I've ended up with. There's probably some minor details, ...
I actually switch to locking the buffer, flushing it and blocking
writers on a condition variable when it runs full instead of busy
waiting... but yes... the principle is exactly the same.

> Convert the cell struct into
>
> type Tag uint32
> const (
> TagInt64 = Tag(iota)
> TagUint64
> TagFloat64
> )
> type Cell struct {
> sequence int64
> tag Tag
> data uint64
> }

I have all "Cells" being of the same type. So I could probably have
avoided code duplication if there had been a template mechanism i Go.
But again... yes.

> use: https://golang.org/pkg/math/#Float64bits to convert from64 float to
> uint64, use tag to distinguish between the types.

AHH! ... Yes! ... that handles the "unsafe" stuff without importing the
unsafe package myself.

Thanks,
/Peter

Peter Mogensen

unread,
May 27, 2016, 6:57:27 AM5/27/16
to golan...@googlegroups.com


On 2016-05-27 12:48, Peter Mogensen wrote:
>
>
> On 2016-05-27 12:30, Egon wrote:
>> See: https://github.com/egonelbre/exp/blob/master/ring/buffer.go
>
> Yeah ... amazingly that seems like almost exactly the same design as
> I've ended up with. There's probably some minor details, ...
> I actually switch to locking the buffer, flushing it and blocking
> writers on a condition variable when it runs full instead of busy
> waiting... but yes... the principle is exactly the same.

Well... that, and that I just use atomic.AddUint64() to allocate a
slot/cell in the buffer.
Writers happening to allocate a slot beyond the max sequence will then
retry when the go-routine allocating the last sequence number has
flushed and reset the buffer.

/Peter

Reply all
Reply to author
Forward
0 new messages