large arrays on 64 bit architectures


kortschak

unread,
May 1, 2011, 7:17:35 PM5/1/11
to golang-nuts
I am following up a discussion that talked about the use of native 64
bit ints as array indexes on 64 bit architectures, here
http://groups.google.com/group/golang-nuts/browse_thread/thread/49da12516f29c3f6/d1df3b29c7a8205d?#d1df3b29c7a8205d

I'm interested in what has happened with this since the discussion
stopped mid last year.

My problem arises in a program I'm writing that [may] use(s) an
[]int32 of just over 2^32 in length. This is determined dynamically
depending on user input.

In the object creation I essentially make([]int32, 1<<(2*k)) where
8<k<=16 and is determined algorithmically. When k=16 the runtime
panics with an index out of range (not in the object creation
interestingly, but rather in the first reference to the slice - I'm
guessing this is due to some kind of laziness - I'd love to hear if
this is the case).

I'm running the program on a Core i5 under Ubuntu amd64, so I would
expect (possibly unreasonably) that Go would implement ints as
64 bits wide, allowing me to use more than int32-equivalent indexes
for arrays, but this is obviously not the case.

So my questions (other than the implicit one above about laziness -
and how to catch errors like that so it is clear what the cause is -
try to assign/refer immediately after allocation?) are:

What was the outcome in terms of int implementation dependency on
underlying architecture outside the discussion referred to above?

Is there some way to force 6g to make ints 64 bits wide?

thanks

Jessta

unread,
May 1, 2011, 7:34:59 PM5/1/11
to kortschak, golang-nuts
On Mon, May 2, 2011 at 9:17 AM, kortschak <dan.ko...@adelaide.edu.au> wrote:
> I am following up a discussion that talked about the use of native 64
> bit ints as array indexes on 64 bit architectures, here
> http://groups.google.com/group/golang-nuts/browse_thread/thread/49da12516f29c3f6/d1df3b29c7a8205d?#d1df3b29c7a8205d
>
> I'm interested in what has happened with this since the discussion
> stopped mid last year.
>
> My problem arises in a program I'm writing that [may] use(s) an
> []int32 of just over 2^32 in length. This is determined dynamically
> depending on user input.
>
> In the object creation I essentially make([]int32, 1<<(2*k)) where
> 8<k<=16 and is determined algorithmically. When k=16 the runtime
> panics with an index out of range (not in the object creation
> interestingly, but rather in the first reference to the slice - I'm
> guessing this is due to some kind of laziness - I'd love to hear if
> this is the case).

No laziness: 1<<(2*16) overflows the 32-bit int and becomes zero.
You just made a slice of zero length.

>
> I'm running the program on a Corei5 under Ubuntu amd64, so I would
> expect (possibly unreasonably) that go would be implementing ints as
> 64 bits wide allowing me to use more than int32 equivalent indexes for
> array, but this is obviously not the case.
>
> So my questions (other than the implicit one above about laziness -
> and how to catch errors like that so it is clear what the cause is -
> try to assign/refer immediately after allocation?) are:
>
> What was the outcome in terms of int implementation dependency on
> underlying architecture outside the discussion referred to above?
>
> Is there some way to force 6g to make ints 64 bits wide?
>
> thanks

--
=====================
http://jessta.id.au

kortschak

unread,
May 1, 2011, 7:38:56 PM5/1/11
to golang-nuts
Thanks for that. One less item of confusion.

I've also wondered why int was chosen as the type for indexes rather
than uint given that negative indexes really shouldn't be used (IMO).

Andrew Gerrand

unread,
May 1, 2011, 8:34:58 PM5/1/11
to kortschak, golang-nuts
On 1 May 2011 16:38, kortschak <dan.ko...@adelaide.edu.au> wrote:
> I've also wondered why int was chosen as the type for indexes rather
> than uint given that negative indexes really shouldn't be used (IMO).

They can't be used; that would cause a panic. Just like you'd get a
panic for indexing beyond the length of the slice (which is what would
happen if you underflowed an unsigned integer, anyway).

Algorithms can benefit from the ability to express negative offsets
and such. If indexes were unsigned you'd always need a conversion in
these cases.

Andrew

kortschak

unread,
May 1, 2011, 10:03:26 PM5/1/11
to golang-nuts
OK, that makes sense. The situation seems to be a balance: ease of
coding (not having to explicitly convert int types for the cases you
are describing) at the cost of losing the ability to address more
than 2G of elements.

I guess my question is how this is efficiently circumvented. My
interests are in bioinformatics and so involve dealing with very large
data structures (as can be seen in this example) in memory,
efficiently. I can see that addressing memory at large scales (at
least >2G elements) is going to be increasingly important in general
computing as well. Making the int type 64 bits wide on 64 bit
architectures would avoid sacrificing elements beyond 2G to a choice
made to keep coding cleaner.

I've had a look at compartmentalising arrays in a BigArray object
(naively an array of arrays), but all the solutions I see are going to
carry a significant time cost and either a loss of slice functionality
or really dreadful efficiency hits when pulling slices across
sub-arrays.

If anyone can suggest a solution to this I would be very grateful.
I'm seeing Go as a very good tool for bioinformatics because of many
of the language's constructs (as a solution that sits between
python/perl and C/C++ for tasks that are computationally intensive but
don't warrant heavy investment in working with biologically unfriendly
languages), but without some way of efficiently addressing large
memory the utility drops significantly.

thanks

Andrew Gerrand

unread,
May 2, 2011, 1:14:39 AM5/2/11
to kortschak, golang-nuts
On 1 May 2011 19:03, kortschak <dan.ko...@adelaide.edu.au> wrote:
> OK, that makes sense. The situation seems to be a balance of ease of
> coding (not having to explicitly convert int types for the cases you
> are describing) at the cost of the loss of the ability to address more
> than 2G of elements.
>
> I guess my question is how is this efficiently circumvented. My
> interests are in bioinformatics and so involve dealing with very large
> data structures (as can be seen in this example) in memory
> efficiently. I can see that addressing memory in large scales (at
> least >2G elements) is going to increasingly be important in general
> computing as well. Changing the int type to 64 bits wide on 64 bit
> architectures would obviate the loss of 2G elements because of choice
> that has been made to make coding cleaner.

At some point we will make int 64-bits wide on amd64. That's why the
language spec permits it.

> I've had a look at compartmentalising arrays in a BigArray object
> (naively an array of arrays), but all the solutions I see are going to
> give a significant time cost and either loss of slice functionality or
> really dreadful hits to efficiency when pulling slices across sub-
> arrays.

Really? I'm surprised to hear that. I'd imagine you could index with
an int64 and use the high bits to select sub-arrays with a very small
cost.

What kinds of algorithms need random access to such a large piece of memory?

Andrew

kortschak

unread,
May 2, 2011, 1:31:07 AM5/2/11
to golang-nuts
Thanks Andrew.

I'm sure you guys get bored of hearing things like this, but can you
give a time frame for implementation of native 64 bit ints on amd64?


On May 2, 2:14 pm, Andrew Gerrand <a...@golang.org> wrote:
> At some point we will make int 64-bits wide on amd64. That's why the
> language spec permits it.


> Really? I'm surprised to hear that. I'd imagine you could index with
> an int64 and use the high bits to select sub-arrays with a very small
> cost.

The approach you mention is essentially what I was thinking, and does
work for the specific case I describe below (where only a single
element is addressed); however it fails when we want to grab a slice
that spans a sub-array boundary, and while it may not be very much
more expensive than simply addressing directly, a small percentage
increase (even less than 1%) makes a fairly significant difference
when jobs run for 3-6 months.

> What kinds of algorithms need random access to such a large piece of memory?

Fast kmer table generation uses memory like that (very space
inefficiently, but we have plenty of that around; grant life is less
favourable). Rather than hashing to store kmer counts we just map
kmers to their mapped addresses (hence the 1<<(2*k) space -
nucleotides are quaternary, so a base is storable as two bits).

Making assembly networks takes similar resources, and this is an
area that I'm looking at using Go for in the hopefully not too distant
future, for metadata generation from high throughput sequencing data.

kortschak

unread,
May 2, 2011, 10:54:04 PM5/2/11
to golang-nuts
I've just put together a simple BigArray (minimal functionality), but
it gives me some strange issues that I was hoping people could help
out with.

package bigarray

const (
	loBits            = 29 // magic number - 30 should work but doesn't, and is underlying type dependent
	residualLoBitMask = int64(^uint(0) >> loBits)
	loBitMask         = int64(^uint(residualLoBitMask << loBits))
)

type bodySlice [1 << loBits]int32

type BigI32Array struct {
	body []bodySlice
	tail []int32
}

func NewBigI32Array(size int64) (b *BigI32Array) {
	lo := int(size & loBitMask)
	hi := int(size >> loBits)
	body := make([]bodySlice, hi)
	tail := make([]int32, lo+1)
	b = &BigI32Array{
		body: body,
		tail: tail,
	}

	return
}

func (self *BigI32Array) At(i int64) (v int32) {
	lo := int(i & loBitMask)
	hi := int(i >> loBits)
	if hi == len(self.body) { // the tail immediately follows the last full chunk
		v = self.tail[lo]
	} else {
		v = self.body[hi][lo]
	}

	return
}

func (self *BigI32Array) Set(i int64, x int32) {
	lo := int(i & loBitMask)
	hi := int(i >> loBits)
	if hi == len(self.body) {
		self.tail[lo] = x
	} else {
		self.body[hi][lo] = x
	}
}


When loBits > 29 for int32 or 27 for interface{} as the underlying
type I get the following from 6g:

$ gb
(in pkg/bio/bigarray) building pkg "bio/bigarray"
bigarray.go:35: internal compiler error: index is zero width
1 broken target

It will build at loBits = 30 for underlying types byte and int16.

What am I not understanding? Shouldn't the underlying type make no
difference?

thanks

peterGo

unread,
May 3, 2011, 12:56:42 AM5/3/11
to golang-nuts
kortschak,

type bodySlice [(1 << loBits)-1]int32

Peter

Ian Lance Taylor

unread,
May 3, 2011, 1:10:25 AM5/3/11
to kortschak, golang-nuts
kortschak <dan.ko...@adelaide.edu.au> writes:

> type bodySlice [1 << loBits]int32
>
> type BigI32Array struct {
> body []bodySlice
> tail []int32
> }
>

> v = self.body[hi][lo]

> bigarray.go:35: internal compiler error: index is zero width

(when loBits == 30).

I think this is a compiler bug in 6g. It's happening because the total
size of the object is (1 << 30) * 4 == 0 (when computed using a 32-bit
type). But I think 6g ought to permit this, since the index will always
fit in an int. At the very least it should give a different error
message. Please open an issue for this. Thanks.

I reduced the test case down to this:

package p

type bodySlice [1 << 30]int32

type BigI32Array struct {
	body []bodySlice
}

func (self *BigI32Array) At(hi, lo int) int32 {
	return self.body[hi][lo]
}

By the way, this fails with 8g:

foo.go:2: type [1073741824]int32 larger than address space

That won't be fixed.

Ian

kortschak

unread,
May 3, 2011, 2:15:13 AM5/3/11
to golang-nuts
Thanks Ian, I will.

On May 3, 2:10 pm, Ian Lance Taylor <i...@google.com> wrote:
> At the very least it should give a different error
> message.  Please open an issue for this.  Thanks.
>
> I reduced the test case down to this:
>
> package p
> type bodySlice [1 << 30]int32
> type BigI32Array struct {
>         body []bodySlice
> }
>
> func (self *BigI32Array) At(hi, lo int) int32 {
>         return self.body[hi][lo]
> }
>
> By the way, this fails with 8g:
>
> foo.go:2: type [1073741824]int32 larger than address space
>
> That won't be fixed.

Damn. I was going to ask for the gravity vector to be reversed too. ;)

kortschak

unread,
May 3, 2011, 2:20:39 AM5/3/11
to golang-nuts
Not so much. The magic number is 27 when the underlying type is
interface{}. The issue is more the lack of agnosticism about the
underlying type.

andrey mirtchovski

unread,
May 3, 2011, 10:33:56 AM5/3/11
to Ian Lance Taylor, kortschak, golang-nuts
>> bigarray.go:35: internal compiler error: index is zero width
>
> (when loBits == 30).
>
> I think this is a compiler bug in 6g.  It's happening because the total

a variation of issue 713, perhaps?

Ian Lance Taylor

unread,
May 3, 2011, 10:37:16 AM5/3/11
to andrey mirtchovski, kortschak, golang-nuts
andrey mirtchovski <mirtc...@gmail.com> writes:

I believe that is a different issue, happening at link time rather than
at compile time. Fixing this problem may run into that one, though; I
don't know.

Ian

kortschak

unread,
May 3, 2011, 6:29:39 PM5/3/11
to golang-nuts
Possibly the same pattern, but certainly a different bug. I can't
reproduce that with my current 6g.

kortschak

unread,
May 4, 2011, 3:33:41 AM5/4/11
to golang-nuts
I've struck another problem. In my implementation of an Append
function I am unable to append an [n]int32 to an [][n]int32 slice.

The relevant section of code is:

const (
	loBits    = 28
	sliceSize = 1 << loBits
)

type BigI32Array struct {
	body [][sliceSize]int32
	tail []int32
}

func NewBigI32Array(size int64) (b *BigI32Array) {
	lo := int(size & loBitMask)
	hi := int(size >> loBits)
	body := make([][sliceSize]int32, hi)
	tail := make([]int32, lo+1)
	b = &BigI32Array{
		body: body,
		tail: tail,
	}

	return
}

func (self *BigI32Array) Len() int64 {
	return int64(len(self.body))*sliceSize + int64(len(self.tail))
}

func (self *BigI32Array) Append(a *BigI32Array) {
	var t [sliceSize]int32 // this breaks if sliceSize > 1 << 28 ‽
	offset := int64(sliceSize - len(self.tail))
	if a.Len()+offset > sliceSize {
		padding := a.NativeSlice(0, offset-1)
		self.tail = append(self.tail, padding...)
		if len(self.tail) != sliceSize {
			panic("Logic error in padding tail slice for BigArray append")
		}
		/// broken - not exactly repeatable in test case
		self.body = append(self.body, t)
		/// broken
		copy(self.body[len(self.body)-1][:int(offset-1)], self.tail)
		self.tail = nil
		for i := int64(0); (i+1)*sliceSize+offset < a.Len(); i++ {
			copy(t[:], a.NativeSlice(i*sliceSize+offset, (i+1)*sliceSize+offset-1))
			self.body = append(self.body, t)
		}
		// put remainder in a new tail
	} else {
		s := len(self.tail)
		padding := a.NativeSlice(0, a.Len()-1)
		self.tail = append(self.tail, padding...)
		if len(self.tail) != s+int(a.Len()) {
			panic("Logic error in padding tail slice for BigArray append")
		}
	}
}

func (self *BigI32Array) NativeSlice(start, end int64) (slice []int32) {
	slo := int(start & loBitMask)
	shi := int(start >> loBits)
	elo := int(end & loBitMask)
	ehi := int(end >> loBits)
	if end-start > int64(maxInt32) {
		panic("Attempting to take a native slice bigger than Go array capacity")
	}
	slice = make([]int32, int(end-start))
	if shi == ehi {
		if shi == len(self.body)+1 {
			copy(slice, self.tail[slo:elo])
		} else {
			copy(slice, self.body[shi][slo:elo])
		}
	} else {
		if ehi == len(self.body)+1 {
			copy(slice, self.body[shi][slo:])
			slice = append(slice, self.tail[:elo]...)
		} else {
			copy(slice, self.body[shi][slo:])
			slice = append(slice, self.body[ehi][:elo]...)
		}
	}

	return
}

Compilation fails with:

$ gb
(in pkg/bio/bigarray) building pkg "bio/bigarray"
bigarray.go:82: internal compiler error: bad argwid func(*uint8, int,
[][268435456]int32, [268435456]int32) [][268435456]int32
1 broken target

(line 82, bracketed by the '/// broken' comments, is: 'self.body =
append(self.body, t)')

A simpler version of this:

package main

func main() {
	var t [1 << 28]byte
	a := make([][1 << 28]byte, 5)
	a = append(a, t)
}

fails with the similar but not identical error:

$ 6g t.go
t.go:6: internal compiler error: bad width

Using the value 1<<20 compiles without complaint, so the syntax is
clearly fine.

Suggestions?

thanks

Ian Lance Taylor

unread,
May 4, 2011, 10:07:59 AM5/4/11
to kortschak, golang-nuts
kortschak <dan.ko...@adelaide.edu.au> writes:

> Compilation fails with:
>
> $ gb
> (in pkg/bio/bigarray) building pkg "bio/bigarray"
> bigarray.go:82: internal compiler error: bad argwid func(*uint8, int,
> [][268435456]int32, [268435456]int32) [][268435456]int32
> 1 broken target
>
> (line 82 is bracketed with the comments, but: 'self.body =
> append(self.body, t)')
>
> A simpler version of this:
>
> package main
>
> func main() {
> var t [1<<28]byte
> a:= make([][1<<28]byte,5)
> a = append(a, t)
> }
>
> fails with the similar but not identical error:
>
> $ 6g t.go
> t.go:6: internal compiler error: bad width
>
> Using the value 1<<20 compiles without complaint, so the syntax is
> clearly fine.
>
> Suggestions?

Any time you see the words "internal compiler error" you've found a
compiler bug. Often the code is invalid, though this code looks OK to
me.

However, it's worth pointing out that your test cases are going to copy
huge amounts of data. While the compiler really ought to accept them,
or at least give a sensible error message, in practice this code is not
going to be feasible to run. Calling append for a gargantuan array is
going to copy the entire array. I think you need to be working with
slices of slices here, and just arrange to allocate your slices at the
desired size. Using arrays tends to lead to copying, which you can't
afford.

Ian

kortschak

unread,
May 4, 2011, 6:36:44 PM5/4/11
to golang-nuts
Yeah, I was trying to figure out an obvious way to do a slice of
slices, but I couldn't think of one. Would that be declared as, e.g.,
body []*[]int32? This means that now each body slice needs to be
explicitly made.

The Append is really just for the sake of completeness and I was
thinking that if it was called at all it would be infrequently. But I
find the whole situation less than satisfactory and wistfully look
forward to the day when I can replace the New function with a
deprecation warning.

thanks

As an aside, can anyone tell me why runtime.Free() is noted as only
for debugging?

kortschak

unread,
May 4, 2011, 6:49:08 PM5/4/11
to golang-nuts
Figured out how to do a slice of slices - it is so obvious I missed
it. Still interested in the free question though.

thanks

Ian Lance Taylor

unread,
May 4, 2011, 6:58:07 PM5/4/11
to kortschak, golang-nuts
kortschak <dan.ko...@adelaide.edu.au> writes:

> As an aside, can anyone tell me why runtime.Free() is noted as only
> for debugging?

Calling runtime.Free() is unsafe, in that it can cause your program to
fail if you free something and retain a pointer to it.

Ian

kortschak

unread,
May 4, 2011, 7:08:53 PM5/4/11
to golang-nuts
OK, that makes a lot of sense. Thanks

On May 5, 7:58 am, Ian Lance Taylor <i...@google.com> wrote:

jp

unread,
Aug 28, 2011, 12:31:36 AM8/28/11
to golan...@googlegroups.com

One more array proposal:

I'd keep ints 32 bit.
int size does not hinder bigger arrays.

Currently, make() and [] accept ANY integral type without explicit
conversion to int:
a := make([]int32, byte(2))
println(a[byte(1)])

Why don't they deal appropriately with integral types longer than int
(uint32, int64, uint64)?
a := make([]int8, uint64(1<<32+2))
println(a[uint64(1<<32)])

Well, one problem still remains: the return type of len() and cap()

Simplest (but ugly) solution: to add len64() and cap64()

A bit nicer:
to give them a "fuzzy" integral type, similar to the size parameter of make().
In "len(a) + 1" it is int.
In "len(a) < byte(1)" it is byte.
In "println(len(a))" or "x := len(a)" it is uintptr.

kortschak

unread,
Aug 28, 2011, 8:06:05 PM8/28/11
to golang-nuts
As far as I know, make and [] *do* accept any integral type, so the
only remaining issue is the central one: cap and len returning
int-typed integers. Adding another, int64-returning function does not
solve the problem because arrays, slices, chans, strings and maps must
all support the len function, and slices, arrays and chans support the
cap function. The other approach, fuzziness, is an implicit cast by
stealth. The solution, which is worth waiting for, is for int on 64
bit architectures to be 64 bits wide (although I quite like Rog Peppe's
suggestion of a variety of index types, much like the variety of key
types that are acceptable in maps).

j...@webmaster.ms

unread,
Aug 28, 2011, 11:47:26 PM8/28/11
to golang-nuts
BTW, Go already has this feature for numeric literals.
They default to int unless implicitly converted to the type they are
assigned to.
len() and cap() could default to uintptr the same way.

kortschak

unread,
Aug 29, 2011, 12:05:17 AM8/29/11
to golang-nuts
Constants are untyped unless a type is specified or inferred (really,
specified as a result of the expression defining the constant), so they
are not cast/type converted; they are defined (and the default
interpretation of an integer value is int) - so that's not really the
same thing (http://golang.org/doc/go_spec.html#Constant_expressions).

len and cap *could* return uintptr, but then every use of those
returned values except as uintptr would have to be type converted.
Again, the only way to avoid this is to implicitly cast, something
which currently does not happen anywhere and is not described anywhere
in the specification.