Table proposal next steps


Brendan Tracey

Jul 15, 2014, 2:24:47 AM
to golan...@googlegroups.com
A little over a year and a half ago, I chose to switch to Go as my primary language. Go is not marketed to scientific programmers, and contains only the barest of support for numeric computing, but it seemed to have the right mix of features. I have not looked back. 

Still, Go was mostly devoid of libraries to support scientific computation, and a group of us started the Gonum project [0] to change that. One thing we've found as a result of our efforts is that matrices are a pain. Several months ago I created a detailed proposal [1] to add a new data structure to the language that would eliminate the problems we face.

The proposal was released to the Go community at large in March, during the 1.3 feature freeze. Since then, I have addressed all comments from the community. As far as I am aware, the outstanding objections are either that it adds to the language at all or that it does not include one or more of the non-goals described in the proposal.

I consider the Go development team to be a model for software development, and I am hopeful that we can work together to make this an acceptable proposal. The Gonum team is eager to rework our libraries into safe, performant libraries that actually feel like Go. From this base, more and more powerful libraries can be constructed.

There is an issue in the issue tracker [2] marked as "Thinking", so presumably it's open for discussion. There has been a golang-nuts thread [3] and a gonum-dev thread [4]. I posted a (possibly misplaced) reminder about the proposal in the 1.4 release thread. After continued silence, the proposal was slightly expanded, prompting a new message on the issue tracker and the golang-nuts thread, although this has received no response. I know you are all busy working on the next release and attending to other duties, but some sort of comment would be appreciated. I am unsure of how to proceed from here without feedback.

Thanks for your attention.

[0] https://github.com/gonum
[1] https://docs.google.com/document/d/1eHm7KqfKP9_s4vR1zToxq-FBazdUQ9ZYi-YhcEtdfR0/edit
[2] https://code.google.com/p/go/issues/detail?id=6282
[3] https://groups.google.com/forum/#!searchin/Golang-nuts/tables/golang-nuts/osTLUEmB5Gk/3-f9_UKfE9MJ
[4] https://groups.google.com/forum/#!topic/golang-dev/ec0gPTfz7Ek

minux

Jul 15, 2014, 2:35:19 AM
to Brendan Tracey, golang-dev
It seems Go 1.4 doesn't plan to include any language changes, according to golang.org/s/go14todo.
This is presumably because the Go rewrite is touching so many fundamental pieces that it's unwise to
introduce language changes at this stage, and the compiler and runtime still seem frozen (I didn't
see notices that those parts are open).

Egon Elbre

Jul 15, 2014, 3:18:15 AM
to golan...@googlegroups.com
Overall, I'm not convinced.

Two approaches not mentioned are source-generation and runtime code-generation:

Basically, one option would be to add scientific additions to Go that either generate Go code or generate code at runtime. Of course, this can cause the language to fragment, as we have seen with JavaScript, so the additions should be chosen carefully.

With just generating Go code, it's easy to communicate with regular code, and given good enough libraries the end user may not need to touch the scientific code generator at all.

With generating code at runtime you can make interesting trade-offs. For example, you can generate specialized code for multiplying with a scalar before running it on a large matrix, or you can do very aggressive optimizations at the cost of code size. You can also automatically choose the best method for doing a computation: instead of running on the CPU, you generate code for the GPU. The significant drawback is that the approach is more complicated and you are doing the compilation at runtime.
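As a rough sketch of the source-generation route (every name here is invented for illustration, not an existing tool), a build step could stamp out specialized kernels from a template:

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// kernelTmpl is a hypothetical template that specializes an
// element-wise scale kernel for one element type.
var kernelTmpl = template.Must(template.New("scale").Parse(
	"func Scale{{.Name}}(dst []{{.Type}}, k {{.Type}}) {\n" +
		"\tfor i := range dst {\n\t\tdst[i] *= k\n\t}\n}\n"))

// genScale renders the kernel source for the given name and type.
func genScale(name, typ string) string {
	var buf bytes.Buffer
	kernelTmpl.Execute(&buf, struct{ Name, Type string }{name, typ})
	return buf.String()
}

func main() {
	// A real generator would write this to a _gen.go file and gofmt it.
	fmt.Print(genScale("Float64", "float64"))
}
```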

PS. The struct version of AddMul can be simplified by introducing a method that returns the underlying row slice; this would also encourage algorithms to process by row:

func (m *Dense) Row(r int) []float64 {
	x := r * m.mat.Stride
	return m.mat.Data[x : x+m.mat.Stride]
}

func (c *Dense) AddMul(a, b *Dense) {
	for i := 0; i < c.mat.Rows; i++ {
		ci, ai, bi := c.Row(i), a.Row(i), b.Row(i)
		for j := range ci {
			ci[j] += ai[j] * bi[j]
		}
	}
}
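For reference, here is a self-contained, runnable version of the sketch above; the Dense/rawMatrix layout and field names are guessed from the snippet, not Gonum's actual API:

```go
package main

import "fmt"

// rawMatrix mirrors the flat, row-major layout the snippet assumes;
// the field names are guesses based on the fragment above.
type rawMatrix struct {
	Rows, Cols, Stride int
	Data               []float64
}

type Dense struct{ mat rawMatrix }

func NewDense(rows, cols int) *Dense {
	return &Dense{rawMatrix{Rows: rows, Cols: cols, Stride: cols,
		Data: make([]float64, rows*cols)}}
}

// Row returns the r-th row as a plain slice into the backing array.
func (m *Dense) Row(r int) []float64 {
	x := r * m.mat.Stride
	return m.mat.Data[x : x+m.mat.Cols]
}

// AddMul sets c[i,j] += a[i,j]*b[i,j], processing row by row.
func (c *Dense) AddMul(a, b *Dense) {
	for i := 0; i < c.mat.Rows; i++ {
		ci, ai, bi := c.Row(i), a.Row(i), b.Row(i)
		for j := range ci {
			ci[j] += ai[j] * bi[j]
		}
	}
}

func main() {
	a, b, c := NewDense(2, 2), NewDense(2, 2), NewDense(2, 2)
	for i := range a.mat.Data {
		a.mat.Data[i] = float64(i + 1) // 1 2 3 4
		b.mat.Data[i] = 2
	}
	c.AddMul(a, b)
	fmt.Println(c.mat.Data) // [2 4 6 8]
}
```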

+ egon

Dan Kortschak

Jul 15, 2014, 6:19:02 AM
to Egon Elbre, golan...@googlegroups.com
I'm afraid I don't really understand how the majority of this supports the first sentence.

Ian Lance Taylor

Jul 15, 2014, 9:58:13 AM
to Brendan Tracey, golang-dev
On Mon, Jul 14, 2014 at 11:24 PM, Brendan Tracey
<tracey....@gmail.com> wrote:
>
> The proposal was released to the Go community at large in March during the
> 1.3 feature freeze. Since then, I have addressed all comments from the
> community. As far as I am aware, the outstanding objections are either the
> mere fact that it adds to the language at all or that it does not include
> one or more of the non-goals described in the proposal.

Here are the things that trouble me personally about the proposal.

In the language it's barely possible to justify having only
single-dimensional arrays and slices. I think that if we had
two-dimensional matrixes, we need to have a clear understanding of how
we would introduce multi-dimensional matrixes. I understand that your
proposal explicitly decides not to do that, but it seems like such an
obvious request.

Your implementation of the slice operator says that the stride field
does not change. This does let you pick out a contiguous submatrix
from a matrix. But once the concept of the stride exists, it seems
likely that people will want to be able to manipulate it. Many matrix
operations, particularly in image processing, involve operations with
different strides. Should there be direct support for this?
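To make the stride question concrete, here is a plain-Go sketch (not the proposal's syntax; the helper is invented) of the kinds of views that fiddling with the stride buys over a flat buffer:

```go
package main

import "fmt"

// strided reads n elements from data starting at off, stepping by stride.
func strided(data []float64, off, stride, n int) []float64 {
	out := make([]float64, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, data[off+i*stride])
	}
	return out
}

func main() {
	// A 3x4 matrix stored row-major in one flat slice (stride 4).
	data := []float64{
		1, 2, 3, 4,
		5, 6, 7, 8,
		9, 10, 11, 12,
	}
	fmt.Println(strided(data, 1, 4, 3)) // column 1: [2 6 10]
	fmt.Println(strided(data, 0, 8, 2)) // every other row's first element: [1 9]
	fmt.Println(strided(data, 0, 5, 3)) // the diagonal: [1 6 11]
}
```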

The range clause seems frankly awkward. Did you consider
for i, j, v := range m
That is, the range would iterate over the entire matrix, returning the
row, column, and value. You can still iterate over a single row or
column by taking a slice; you will still get both a row and column
index, but one of the values would not change.

In more or less the same vein, did you consider having len, cap, and
copy return multiple values, rather than [2]int? I guess that does make
it harder to use them in expressions, though.

The reflect.SliceHeader type was a mistake, let's not introduce
reflect.TableHeader.

If you look at the C roots of Go, then, conceptually slices are
bounded pointers, and arrays are the memory to which they refer. This
is confused somewhat in the language because the make function, and
composite literals, quietly create the array in the background and
then give you the bounded pointer (the slice). You are conceptually
introducing a strided slice, in which each access requires two
indexes. But as I mentioned above the stride is implicit, and there
is no way to set it explicitly. I wonder if there is some way to
separate out the notion of striding, since that seems to be the key
idea.

Ian

Egon Elbre

Jul 15, 2014, 11:09:46 AM
to golan...@googlegroups.com, egon...@gmail.com
On Tuesday, 15 July 2014 13:19:02 UTC+3, Dan Kortschak wrote:
I'm afraid I don't really understand how the majority of this supports the first sentence.

Sorry about the non sequitur. (Note to self: people can't read minds.) :)

Anyway, the text in this post starts with supporting scientific computing. So, to start with that: is this sufficient to provide all-around good scientific libraries and a good language?

- With messy data you are also dealing with NaN/NAs; how well would that integrate with the rest of the language?
- Although it's not in the scope of this proposal, how would you do symbolic computation that integrates with scientific libraries? This means we either need to somehow extend the language more or find an alternative way to do it.
- How much help would this proposal be for big data? You will be communicating, splicing, and dicing the data in multiple ways; does this improvement matter?
- And what about speed/optimizations for different targets (CPU/GPU)?
- Of course, is 2D really sufficient?
- Operators for convenience? Reading scientific code full of c.Plus(a.Mul(b)) is quite annoying.
- Also, scientific libs are often filled with a lot of generics, so you would still need to use code generation.

Basically, to me it looks like Go itself shouldn't address these scientific-computation concerns; they should somehow be kept separate. The only reasonable way is either to add some way to extend the Go language itself or to use code generation. This led me to the two approaches I suggested.

So, would this proposal/implementation be an improvement for writing scientific libraries? Yes. Would it be an improvement to the language? I'm not sure. Is it good enough/sufficient for any kind of scientific computing? I'm not so sure.

+ egon

Brendan Tracey

Jul 15, 2014, 11:26:17 AM
to Ian Lance Taylor, golang-dev
Thanks for your reply, Ian.


On Jul 15, 2014, at 6:58 AM, Ian Lance Taylor <ia...@golang.org> wrote:

> On Mon, Jul 14, 2014 at 11:24 PM, Brendan Tracey
> <tracey....@gmail.com> wrote:
>>
>> The proposal was released to the Go community at large in March during the
>> 1.3 feature freeze. Since then, I have addressed all comments from the
>> community. As far as I am aware, the outstanding objections are either the
>> mere fact that it adds to the language at all or that it does not include
>> one or more of the non-goals described in the proposal.
>
> Here are the things that trouble me personally about the proposal.
>
> In the language it's barely possible to justify having only
> single-dimensional arrays and slices. I think that if we had
> two-dimensional matrixes, we need to have a clear understanding of how
> we would introduce multi-dimensional matrixes. I understand that your
> proposal explicitly decides not to do that, but it seems like such an
> obvious request.

We thought about including them in the proposal, and, in fact, a lot of the proposed behaviors extend nicely.

make([,,]bool, i, j, k, capi, capj, capk)
len returns an [n]int
range still has only one : (range t[i,:,k])

I think the only non-obvious extension to the syntax is the question of whether one may use copy between a [,]int and a [,,,,]int (for example). If I am wrong in that assertion, please let me know.

Implementation, on the other hand, is another issue. Is it
type Table struct {
	Data   uintptr
	Stride int
	Len    [n]int
	Cap    [n]int
}

where n is generated at compile time? Is it

type Table struct {
	Data   uintptr
	Stride int
	Len    []int
	Cap    []int
}

where there is only one type for all of the tables? This simplifies the implementation but is probably slower, and to return an [n]int you'd have to convert the []int into an [n]int. In addition, the code for accessing the extra data fields is more complicated.

During the original public discussion, several people made the comment you did. However, no one provided examples of code that would be improved significantly by multi-dimensional tables. I don’t think any of the proposers are opposed to higher-dimensional tables, but it seemed better to keep things simple, and wait until real need is demonstrated before adding to the proposal. If I am wrong in that, the proposal can be modified, though I would need help in deciding implementation issues, as I don’t understand the complexities of having a bunch of different types.

>
> Your implementation of the slice operator says that the stride field
> does not change. This does let you pick out a contiguous submatrix
> from a matrix. But once the concept of the stride exists, it seems
> likely that people will want to be able to manipulate it. Many matrix
> operations, particularly in image processing, involve operations with
> different strides. Should there be direct support for this?

Could you provide some examples? The only one I can think of for matrices would be computing the trace of a matrix. I don’t see how modifying the stride could work as expected, unless you also allow people to change the length (and capacity). This would seem difficult to do correctly, especially given that the current table may have been a sliced version of a previous table.

>
> The range clause seems frankly awkward. Did you consider
> for i, j, v := range m
> That is, the range would iterate over the entire matrix, returning the
> row, column, and value. You can still iterate over a single row or
> column by taking a slice; you will still get both a row and column
> index, but one of the values would not change.

I understand that the syntax seems awkward at first glance, but I personally like it after using it a couple of times.

We did consider the syntax you suggest. In my experience, ranging over a row or column is much more common than ranging over all of the elements (for example, the BLAS libraries don't have any functions that range over all elements directly). As a result, this case should be easy. If we look at the 3-element syntax, for matrix multiply it would look something like

for i := range a[:, 0:1] {
	for _, k, va := range a[i:i+1, :] {
		for _, j, vb := range b[k:k+1, :] {
			c[i, j] += va * vb
		}
	}
}

This seems to have a couple of problems. First, I would argue that it at least doesn’t look any better than the proposed syntax. Secondly, it doesn’t solve the case where ‘a’ has zero length in the second dimension (the above code will panic, and there is no way to write the code to accept zero-length tables without additional checks). Third, now the idea of choosing a specific row is no longer compile-time verifiable. The user must ensure they have i:i+1 (and not something else), which on its own isn’t that bad, but I can imagine users putting two variables there. Lastly, I feel that it’s possible to write much more mind-warping code with the full table range. Seeing something like

for i, j, v := range a[2:7, :] {
	for k, l, w := range b[2:4, 7:9] {
		// Some operation
	}
}

makes it hard to figure out what’s going on.
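For comparison, the multiply kernel under discussion can be written against today's flat, row-major layout with explicit index arithmetic (a sketch, not Gonum code):

```go
package main

import "fmt"

// matMul computes c += a*b over flat row-major buffers, where a is
// m x k, b is k x n, and c is m x n; this is what the ranged kernel
// above would boil down to in current Go.
func matMul(c, a, b []float64, m, n, k int) {
	for i := 0; i < m; i++ {
		for l := 0; l < k; l++ {
			va := a[i*k+l]
			for j := 0; j < n; j++ {
				c[i*n+j] += va * b[l*n+j]
			}
		}
	}
}

func main() {
	// [1 2; 3 4] * [5 6; 7 8]
	a := []float64{1, 2, 3, 4}
	b := []float64{5, 6, 7, 8}
	c := make([]float64, 4)
	matMul(c, a, b, 2, 2, 2)
	fmt.Println(c) // [19 22 43 50]
}
```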

>
> In more or less the same vein, did you consider having len, cap, and
> copy return multiple values, rathern [2]int? I guess that does make
> it harder to use them in expressions, though.

This is talked about in the discussion of length. It’s harder to use them in expressions, it’s harder to compare all of the lengths of tables, and it doesn’t extend well to higher-dimensional tables.

>
> The reflect.SliceHeader type was a mistake, let's not introduce
> reflect.TableHeader.

Okay. I thought such an addition would be necessary.

> If you look at the C roots of Go, then, conceptually slices are
> bounded pointers, and arrays are the memory to which they refer. This
> is confused somewhat in the language because the make function, and
> composite literals, quietly create the array in the background and
> then give you the bounded pointer (the slice). You are conceptually
> introducing a strided slice, in which each access requires two
> indexes. But as I mentioned above the stride is implicit, and there
> is no way to set it explicitly. I wonder if there is some way to
> separate out the notion of striding, since that seems to be the key
> idea.

Possibly. I think there are really two ideas. The first is the strided slice, and the second is the idea that as far as the programmer is concerned, the data is laid out in a rectangle. In the discussion of down-slicing, I partially discuss the idea of having strided slices. I don’t really see how strided slices would help solve the problems we discuss here. Maybe I’m not understanding what you are suggesting, but when I think of a strided slice, I think of s[6] actually giving me 6*stride of the real underlying data array. For tables, we need to access both with a stride of 1 and with a stride of n. Furthermore, it doesn’t solve the issues of row-major vs. column-major, and it doesn’t help with bounds checking.

Thanks again for your detailed reply. Further thoughts are appreciated.



Michael Jones

Jul 15, 2014, 11:28:50 AM
to Egon Elbre, golan...@googlegroups.com
Rather than discuss scientific computation as a concept, let's remember that multidimensional arrays are a fundamental data type for all kinds of computation. 

The C language was weak here but got by because it made it easy to "cheat" using pointer arithmetic and macros in CPP. Go is weak in the same way but discourages the cheating, which is the real issue. C++ allows cheating and nice packaging via inlines so it is a solved problem there. None of this is as good, actually, as the multidimensional array features of FORTRAN in 1959. That's what seems so strange to me. ;-)

I like the proposal, and would be happy to see something (anything) that makes multidimensional arrays a robust part of Go by supporting them at compile time (as now), dynamically, some reasonable range mechanisms, etc. I hope Brendan's proposal makes headway.





--
Michael T. Jones | Chief Technology Advocate  | m...@google.com |  +1 650-335-5765

Brendan Tracey

Jul 15, 2014, 11:47:54 AM
to Egon Elbre, golan...@googlegroups.com
On Jul 15, 2014, at 8:09 AM, Egon Elbre <egon...@gmail.com> wrote:

On Tuesday, 15 July 2014 13:19:02 UTC+3, Dan Kortschak wrote:
I'm afraid I don't really understand how the majority of this supports the first sentence.

Sorry about the non sequitur. (Note to self: people can't read minds.) :)

Anyways, the text in this post starts with supporting scientific computing. So to start with that, is this sufficient to provide all-around good scientific libraries and language?

- With messy data you are also dealing with NaN/NA-s, how well that would integrate with the rest of the language.

NaNs are already in the language, as part of IEEE 754.

- Although, it's not in the scope of this proposal, how would you do symbolic computation that would integrate with scientific libraries? This either means that we need to somehow extend the language more or find an alternative way to do it.

If one really wants to do that in Go, it seems like a library issue or, more likely, a matter of building a program in Go to do such symbolic manipulation. Go is not about to implement Mathematica, nor should it.

- How much help would this proposal be if you are talking about big-data? You will be communicating, splicing and dicing the data in multiple ways and does this improvement matter?
- And what about speed/optimizations for different targets cpu/gpu?

The proposal discusses the speed improvements and the slicing capabilities.

- Of course is 2D really sufficient?

Please see my reply to Ian.

- Operators for convenience? Reading scientific code with c.Plus(a.Mul(b)), is quite annoying.

Please see the discussion in the proposal.

- Also, scientific libs are often filled with a lot of generics; so you would still need to use code generation.

They are often filled with generics, but they don’t have to be. float64 and complex128 go a long way.


Basically, to me it looks like Go shouldn't address these scientific computation concerns and should be somehow separated. The only reasonable way is either to add some sort of way to extend the Go language itself or use code generation. So this led me to the two approaches I suggested.

They are separated. The proposal doesn’t include any of the extra behavior you suggest. This proposal just provides support for a rectangular data container.

So, would this proposal/implementation be an improvement for writing scientific libraries, yes. Would this be an improvement to the language, I'm not sure. Is it good enough/sufficient for any kind of scientific computing, I'm not so sure? 

I think so. My appeal to authority is that I have spent a lot of time in the Matlab world, and a significant amount of time in the C++ and NumPy worlds.


+ egon


On 15/07/2014, at 4:48 PM, "Egon Elbre" <egon...@gmail.com> wrote:

> Overall, I'm not convinced.
>
> Two approaches not mentioned are source-generation and runtime code-generation:
>
> Basically one option would be to add scientific additions to Go, that either generate Go code or generate code at runtime. Of course this can cause the language to fragment, as we have seen with JavaScript, so the additions should be chosen carefully.
>
> With just generating Go code, it's easy to communicate with regular code and given good enough libraries the end-user may not need to touch the scientific code generator at all.
>
> With generating code at runtime you can make interesting trade-offs. For example, you can generate specialized code for multiplying with a scalar before running it on a large matrix or you can do very aggressive optimizations at the cost of code size. Also you can choose automatically the best methods for doing computations, instead of running on the CPU you generate code for the GPU. It has a significant drawback that the approach is more complicated and you are doing the compilation at runtime.
>
> PS. Struct version of AddMul can be simplified by introducing a method that returns the underlying row slice, this would also encourage algorithms to process by row:
>
> func (m *Dense) Row(r int) []float64 {
>         x := r * m.mat.Stride
>         return m.mat.Data[x : x+m.mat.Stride]
> }
>
> func (c *Dense) AddMul(a, b *Dense) {
>         for i := 0; i < c.mat.Rows; i++ {
>                 ci, ai, bi := c.Row(i), a.Row(i), b.Row(i)
>                 for j := range ci {
>                         ci[j] += ai[j] * bi[j]
>                 }
>         }
> }
>
> + egon

Egon Elbre

Jul 15, 2014, 12:03:58 PM
to golan...@googlegroups.com, egon...@gmail.com

On Tuesday, 15 July 2014 18:47:54 UTC+3, Brendan Tracey wrote:

On Jul 15, 2014, at 8:09 AM, Egon Elbre <egon...@gmail.com> wrote:

On Tuesday, 15 July 2014 13:19:02 UTC+3, Dan Kortschak wrote:
I'm afraid I don't really understand how the majority of this supports the first sentence.

Sorry about the non sequitur. (Note to self: people can't read minds.) :)

Anyways, the text in this post starts with supporting scientific computing. So to start with that, is this sufficient to provide all-around good scientific libraries and language?

- With messy data you are also dealing with NaN/NA-s, how well that would integrate with the rest of the language.

NaN’s are already in the language as part of IEEE-754.
- Although, it's not in the scope of this proposal, how would you do symbolic computation that would integrate with scientific libraries? This either means that we need to somehow extend the language more or find an alternative way to do it.

If one really wants to do that in Go, it seems like something that is a library issue, or, more likely, building a program in go to do such symbolic manipulation. Go is not about to implement Mathematica, nor should it. 
- How much help would this proposal be if you are talking about big-data? You will be communicating, splicing and dicing the data in multiple ways and does this improvement matter?
- And what about speed/optimizations for different targets cpu/gpu?

The proposal discusses the speed improvements and the slicing capabilities.

- Of course is 2D really sufficient?

Please see my reply to Ian.

- Operators for convenience? Reading scientific code with c.Plus(a.Mul(b)), is quite annoying.

Please see the discussion in the proposal.

- Also, scientific libs are often filled with a lot of generics; so you would still need to use code generation.

They are often filled with generics, but they don’t have to be. float64 and complex128 go a long way.


Basically, to me it looks like Go shouldn't address these scientific computation concerns and should be somehow separated. The only reasonable way is either to add some sort of way to extend the Go language itself or use code generation. So this led me to the two approaches I suggested.

They are separated. The proposal doesn’t include any of the extra behavior you suggest. This proposal just provides support for a rectangular data container.

So, would this proposal/implementation be an improvement for writing scientific libraries, yes. Would this be an improvement to the language, I'm not sure. Is it good enough/sufficient for any kind of scientific computing, I'm not so sure? 

I think so. My appeal to authority is that I have spent a lot of time in the Matlab world, and a significant amount of time in the C++ and NumPy worlds.


Just to clarify, I'm not against this proposal. It just feels like I'm not getting the big picture of how scientific computing will look in 20 years with Go. Of course, you and others have been doing more scientific computing than me, so if this really is sufficient, then I have no good argument against it. If it's not, then eventually Go will need some new thing, or libraries will be using code generation anyway, or there will be some language workaround.

+ egon

Ian Lance Taylor

Jul 15, 2014, 1:27:56 PM
to Brendan Tracey, golang-dev
On Tue, Jul 15, 2014 at 8:26 AM, Brendan Tracey
<tracey....@gmail.com> wrote:
>
> Implementation, on the other hand, is another issue. Is it
> type Table struct{
> Data uintptr
> Stride int
> Len [n]int
> Cap [n]int
> }
>
> where n is generated at compile time? Is it
>
> type Table struct{
> Data uintptr
> Stride int
> Len []int
> Cap []int
> }
>
> where there is only one type for all of the tables?

I think it is clearly the former. The type of an array with N
dimensions is not the same as the type of an array with M dimensions.
There is no advantage to using the latter representation.


> During the original public discussion, several people made the
> comment you did. However, no one provided examples of code that
> would be improved significantly by multi-dimensional tables. I don’t
> think any of the proposers are opposed to higher-dimensional tables,
> but it seemed better to keep things simple, and wait until real need
> is demonstrated before adding to the proposal. If I am wrong in
> that, the proposal can be modified, though I would need help in
> deciding implementation issues, as I don’t understand the
> complexities of having a bunch of different types.

Go generally aims to have a complete set of orthogonal concepts. If
we have 2-dimensional arrays, it seems to me that we should have
N-dimensional arrays.


>> Your implementation of the slice operator says that the stride field
>> does not change. This does let you pick out a contiguous submatrix
>> from a matrix. But once the concept of the stride exists, it seems
>> likely that people will want to be able to manipulate it. Many matrix
>> operations, particularly in image processing, involve operations with
>> different strides. Should there be direct support for this?
>
> Could you provide some examples? The only one I can think of for
> matrices would be computing the trace of a matrix. I don’t see how
> modifying the stride could work as expected, unless you also allow
> people to change the length (and capacity). This would seem
> difficult to do correctly, especially given that the current table
> may have been a sliced version of a previous table.

There is a simple example at
http://en.wikipedia.org/wiki/Stride_of_an_array . Or you can do a
simple zoom by fiddling with the stride. I know only slightly more
about image processing than about scientific computation, but my
understanding is that the stride is a common element of image access.
You can certainly see it in, e.g., http://golang.org/pkg/image/#RGBA .


>> If you look at the C roots of Go, then, conceptually slices are
>> bounded pointers, and arrays are the memory to which they refer. This
>> is confused somewhat in the language because the make function, and
>> composite literals, quietly create the array in the background and
>> then give you the bounded pointer (the slice). You are conceptually
>> introducing a strided slice, in which each access requires two
>> indexes. But as I mentioned above the stride is implicit, and there
>> is no way to set it explicitly. I wonder if there is some way to
>> separate out the notion of striding, since that seems to be the key
>> idea.
>
> Possibly. I think there are really two ideas. The first is the
> strided slice, and the second is the idea that as far as the
> programmer is concerned, the data is laid out in a rectangle. In the
> discussion of down-slicing, I partially discuss the idea of having
> strided slices. I don’t really see how strided slices would help
> solve the problems we discuss here. Maybe I’m not understanding what
> you are suggesting, but when I think of a strided slice, I think of
> s[6] actually giving me 6*stride of the real underlying data
> array. For tables, we need to access both with a stride of 1 and
> with a stride of n. Furthermore, it doesn’t solve the issues of
> row-major vs. column-major, and it doesn’t help with bounds
> checking.

Those all seem to me to be aspects of the same idea. You can say that
the data is laid out in a rectangle, but that is just a point of view.
In reality it's laid out linearly in memory, and you are accessing it
in a way that makes it look like a rectangle. Since Go is a
relatively low-level language, matrixes of any dimension are always
going to boil down to some sort of bounded pointer access. What I'm
wondering is whether we can drop the idea of multi-dimensional arrays
and just try to focus on strided slices.

Clearly the use cases that matter for you require a form of strided
slice in which each element is itself a slice (it's sort of an array,
but we don't want that because we don't want to copy it). I have no
idea about syntax, but the effect would be a type where the element
type is known at compile time, but the bound and stride are not. So
the implementation would be something like

type StrideInfo struct {
	len    int
	cap    int
	stride int
}

type StridedSlice [N]StrideInfo

The first index is governed by the first len/cap/stride, and looking
up the index gives you (depending on the type) either an element or
another len/cap/stride.

Ian

Brendan Tracey

Jul 15, 2014, 1:49:16 PM
to Ian Lance Taylor, golang-dev

On Jul 15, 2014, at 10:27 AM, Ian Lance Taylor <ia...@golang.org> wrote:

> On Tue, Jul 15, 2014 at 8:26 AM, Brendan Tracey
> <tracey....@gmail.com> wrote:
>>
>> Implementation, on the other hand, is another issue. Is it
>> type Table struct{
>> Data uintptr
>> Stride int
>> Len [n]int
>> Cap [n]int
>> }
>>
>> where n is generated at compile time? Is it
>>
>> type Table struct{
>> Data uintptr
>> Stride int
>> Len []int
>> Cap []int
>> }
>>
>> where there is only one type for all of the tables?
>
> I think it is clearly the former. The type of an array with N
> dimensions is not the same as the type of an array with M dimensions.
> There is no advantage to using the latter representation.

Okay. This is something that’s better decided by the compiler writers, so I’d trust your judgement.


>> During the original public discussion, several people made the
>> comment you did. However, no one provided examples of code that
>> would be improved significantly by multi-dimensional tables. I don’t
>> think any of the proposers are opposed to higher-dimensional tables,
>> but it seemed better to keep things simple, and wait until real need
>> is demonstrated before adding to the proposal. If I am wrong in
>> that, the proposal can be modified, though I would need help in
>> deciding implementation issues, as I don’t understand the
>> complexities of having a bunch of different types.
>
> Go generally aims to have a complete set of orthogonal concepts. If
> we have 2-dimensional arrays, it seems to me that we should have
> N-dimensional arrays.

I don’t think anyone is opposed to the idea. We thought a more minimal specification would be more desirable. If that is not true the proposal can be modified.


>>> Your implementation of the slice operator says that the stride field
>>> does not change. This does let you pick out a contiguous submatrix
>>> from a matrix. But once the concept of the stride exists, it seems
>>> likely that people will want to be able to manipulate it. Many matrix
>>> operations, particularly in image processing, involve operations with
>>> different strides. Should there be direct support for this?
>>
>> Could you provide some examples? The only one I can think of for
>> matrices would be computing the trace of a matrix. I don’t see how
>> modifying the stride could work as expected, unless you also allow
>> people to change the length (and capacity). This would seem
>> difficult to do correctly, especially given that the current table
>> may have been a sliced version of a previous table.
>
> There is a simple example at
> http://en.wikipedia.org/wiki/Stride_of_an_array . Or you can do a
> simple zoom by fiddling with the stride. I know only slightly more
> about image processing than about scientific computation, but my
> understanding is that the stride is a common element of image access.
> You can certainly see it in, e.g., http://golang.org/pkg/image/#RGBA .

If you look at the “crop” code, you’ll notice that the “stride” of the Image stays the same (crop passes widthStride to the image constructor). The crop function is almost exactly the same as subslicing the table (although the Java code has an offset rather than an updated pointer). With tables, one may have
var image [,]color.RGBA
and one could zoom with
zoomedImage := image[a:b, c:d]
It seems like this would be similar to [][]T. When I access the first element, I get a []T, and when I access the second element I get T. The difference would be some form of memory layout?

In order to be an interesting alternative (for a Matrix implementation anyway), the strided slices would need to support efficient indexing and bounds checking. I don’t quite see how that would be accomplished.
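
The subslice-keeps-stride point can be sketched with today's Go, using a hand-rolled view type (Dense, Slice, and At are illustrative names for this sketch, not Gonum's actual API):

```go
package main

import "fmt"

// Dense is a hand-rolled 2-D view over a flat slice.
type Dense struct {
	data       []float64
	stride     int
	rows, cols int
}

// Slice returns the subtable view m[r0:r1, c0:c1]: the data offset and
// lengths change, but the stride stays the same, exactly as in the
// image crop code discussed above.
func (m Dense) Slice(r0, r1, c0, c1 int) Dense {
	if r0 < 0 || r1 > m.rows || c0 < 0 || c1 > m.cols || r0 > r1 || c0 > c1 {
		panic("slice bounds out of range")
	}
	return Dense{
		data:   m.data[r0*m.stride+c0:],
		stride: m.stride, // unchanged
		rows:   r1 - r0,
		cols:   c1 - c0,
	}
}

func (m Dense) At(i, j int) float64 { return m.data[i*m.stride+j] }

func main() {
	// A 3x3 matrix {{0,1,2},{3,4,5},{6,7,8}}.
	m := Dense{data: []float64{0, 1, 2, 3, 4, 5, 6, 7, 8}, stride: 3, rows: 3, cols: 3}
	z := m.Slice(1, 3, 1, 3) // the "zoom" m[1:3, 1:3]
	fmt.Println(z.At(0, 0), z.At(1, 1)) // -> 4 8
}
```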

Ian Lance Taylor

Jul 15, 2014, 5:13:24 PM
to Brendan Tracey, golang-dev
The memory layout is the same. The difference is the type. I don't
have a good idea for syntax, but let's say that [+]T is a strided
slice of T. Then you would have [+][+]T. Indexing that will give you
[+]T, and indexing that will give you T.


> In order to be an interesting alternative (for a Matrix
> implementation anyway), the strided slices would need to support
> efficient indexing and bounds checking. I don’t quite see how that
> would be accomplished.

Whether it's v[i,j] or v[i][j], we are going to have to compare both i
and j against the relevant bounds. In your representation we compare
i with v.Len[0] and j with v.Len[1]. In mine we compare i with
v.Info[0].len and compare j with v.Info[1].len (here I'm using v.Info
to indicate the fact that a strided slice is an array of StrideInfo
structs). I don't think there would be a significant efficiency
difference.


By the way, while I'm interested in the topic, I want to be clear that
nothing in this area is going to change for 1.4 or any time soon.

Ian

Dan Kortschak

Jul 15, 2014, 5:59:57 PM
to Ian Lance Taylor, Brendan Tracey, golang-dev
All of us who have a significant interest in this understand that this is a long-term goal, but it is good to hear that members of the core team are interested.

Dan

Dan Kortschak

Jul 15, 2014, 11:11:47 PM
to Ian Lance Taylor, Brendan Tracey, golang-dev
On Tue, 2014-07-15 at 10:27 -0700, Ian Lance Taylor wrote:
> > Could you provide some examples? The only one I can think of for
> > matrices would be computing the trace of a matrix. I don’t see how
> > modifying the stride could work as expected, unless you also allow
> > people to change the length (and capacity). This would seem
> > difficult to do correctly, especially given that the current table
> > may have been a sliced version of a previous table.
>
> There is a simple example at
> http://en.wikipedia.org/wiki/Stride_of_an_array . Or you can do a
> simple zoom by fiddling with the stride. I know only slightly more
> about image processing than about scientific computation, but my
> understanding is that the stride is a common element of image access.
> You can certainly see it in, e.g., http://golang.org/pkg/image/#RGBA .


I don't think any of these operations involve mutating the stride. Doing
so in an image would result in a skewed image. In the matrix world,
though, mutating a stride can be seen as the operation used to
downslice or flatten a matrix.
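
As a sketch of that downslicing, here is how a column of a row-major matrix can be viewed as a vector just by choosing a stride equal to the row stride (StridedVec and Col are hypothetical names for illustration only):

```go
package main

import "fmt"

// StridedVec is a vector view over a flat slice: element i lives at
// data[i*stride]. Choosing the stride is what selects a row (stride 1)
// versus a column (stride = row stride) of a row-major matrix.
type StridedVec struct {
	data   []float64
	n      int
	stride int
}

func (v StridedVec) At(i int) float64 {
	if i < 0 || i >= v.n {
		panic("index out of range")
	}
	return v.data[i*v.stride]
}

// Col returns a view of column j of a rows x ? row-major matrix with
// the given row stride. No data is copied.
func Col(data []float64, rows, rowStride, j int) StridedVec {
	return StridedVec{data: data[j:], n: rows, stride: rowStride}
}

func main() {
	// 2x3 row-major matrix {{0,1,2},{10,11,12}}.
	m := []float64{0, 1, 2, 10, 11, 12}
	c := Col(m, 2, 3, 1) // column 1
	fmt.Println(c.At(0), c.At(1)) // -> 1 11
}
```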

However, having said that....


> Those all seem to me to be aspects of the same idea. You can say that
> the data is laid out in a rectangle, but that is just a point of view.
> In reality it's laid out linearly in memory, and you are accessing it
> in a way that makes it look like a rectangle. Since Go is a
> relatively low-level language, matrixes of any dimension are always
> going to boil down to some sort of bounded pointer access. What I'm
> wondering is whether we can drop the idea of multi-dimensional arrays
> and just try to focus on strided slices.
>
> Clearly the use cases that matter for you require a form of strided
> slice in which each element is itself a slice (it's sort of an array,
> but we don't want that because we don't want to copy it). I have no
> idea about syntax, but the effect would be a type where the element
> type is known at compile time, but the bound and stride are not. So
> the implementation would be something like
>
> type StrideInfo struct {
> len int
> cap int
> stride int
> }
>
> type StridedSlice [N]StrideInfo
>
> The first index is governed by the first len/cap/stride, and looking
> up the index gives you (depending on the type) either an element or
> another len/cap/stride.
>
... I really like this idea.

Dan

yiyus

Jul 16, 2014, 1:14:59 PM
to golan...@googlegroups.com
On Tuesday, 15 July 2014 19:27:56 UTC+2, Ian Lance Taylor wrote:

type StrideInfo struct {
	len    int
	cap    int
	stride int
}

type StridedSlice [N]StrideInfo

The first index is governed by the first len/cap/stride, and looking
up the index gives you (depending on the type) either an element or
another len/cap/stride.

Wouldn't this mean that, for a value s of type [N]StridedSlice, we would need as many [N-1]StridedSlice values as there are elements in s? This is not optimal, especially for large matrices. In my opinion, a more useful type would be:

type StridedSlice struct {
	len    int
	cap    int
	stride [N]int
}
From the user's point of view, this StridedSlice would have the same behaviour as the one suggested by Ian: when indexing an [N]StridedSlice, we would get an [N-1]StridedSlice for N > 1, and an element otherwise.

I am not sure being able to directly manipulate strides is necessary. In my experience with scientific code in Python and Fortran, it is enough to manipulate them indirectly, with slice operations. But I have not done any serious work with image processing.
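
This single-header variant can be sketched as follows (NDView and At are illustrative names; the point is that indexing becomes one dot product of indices and strides rather than a walk through per-dimension headers):

```go
package main

import "fmt"

// NDView is one header per view: a shape and a stride per dimension,
// NumPy-style, over a single flat backing slice.
type NDView struct {
	data   []float64
	shape  []int
	stride []int
}

// At computes the flat offset as the dot product of indices and
// strides, bounds-checking each index against its dimension.
func (v NDView) At(idx ...int) float64 {
	if len(idx) != len(v.shape) {
		panic("wrong number of indices")
	}
	off := 0
	for d, i := range idx {
		if i < 0 || i >= v.shape[d] {
			panic("index out of range")
		}
		off += i * v.stride[d]
	}
	return v.data[off]
}

func main() {
	// A 2x2x2 array in row-major order: strides 4, 2, 1.
	v := NDView{
		data:   []float64{0, 1, 2, 3, 4, 5, 6, 7},
		shape:  []int{2, 2, 2},
		stride: []int{4, 2, 1},
	}
	fmt.Println(v.At(1, 0, 1)) // offset 1*4 + 0*2 + 1*1 = 5 -> 5
}
```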


--
- yy.

Nigel Tao

Jul 31, 2014, 7:28:49 PM
to Brendan Tracey, golang-dev
On Tue, Jul 15, 2014 at 4:24 PM, Brendan Tracey
<tracey....@gmail.com> wrote:
> Still, Go was mostly devoid of libraries to support scientific computation,
> and a group of us started the Gonum project [0] to change that. One thing
> we've found as a result of our efforts is that matrices are a pain. Several
> months ago I created a detailed proposal [1] to add a new data structure to
> the language which will eliminate the problems we face.

We appreciate the amount of work and care that went into this
proposal, but we're sorry that it's not likely to be adopted, at least
for Go 1.

The problem isn't with the proposal itself. The reason is that the
language is stable: the bar for new language features is high, and is
only getting higher. The "Changes to the language" sections of
http://golang.org/doc/go1.1, http://golang.org/doc/go1.2 and
http://golang.org/doc/go1.3 are getting smaller, not bigger. The
changes that did happen were also minor compared to introducing
tables, which touch make, literals, indexing, slices, len/cap, copy,
range, reflect and the type system in general.

We don't doubt that adding tables to the language has benefits. Any
reasonable language feature has benefits, and the gonum-dev mailing
list's existence clearly shows that it would make some people very
happy. Yet even if tables were part of the language, it's hard to see
any packages in the standard library, or even in the
code.google.com/p/go.foo sub-repositories, that would use them. There
is a chicken-and-egg factor here, but it's still not encouraging for
tables.

The only candidate seems to be the image-related packages, but even
then, we can't change, for instance, the image.RGBA type in the Go 1.x
time frame, and even for a hypothetical Go 2.0, it's not clear that
changing the design of the image package is a win. One of the
motivations for the current design is that, when decoding from or
encoding to formats like GIF or PNG, it's useful to linearize the
rectangle of pixels as a []byte, as spoken by general-purpose
compressors like LZW and ZLIB. Another deliberate design decision,
based on Plan 9 GUI experience, was that the top-left of an image
isn't necessarily at (0, 0).

In any case, debating the proposal's benefits is secondary. To repeat
the main point, we value API and language stability very highly. Yes,
the proposal is backwards-compatible, but it's a feature request, not
a bug fix, and we err on the side of making no changes.

As an alternative, one could define a computational language a la
halide-lang, and write a program that worked with "go generate". This
program would parse the specialized code and generate Go 1.x code
(which possibly uses package unsafe for pointer arithmetic), or
generate C code, or generate 6a-compatible assembly code, or generate
GPU-specific code. Of course, this still requires finding someone to
do the work, but that person or group of people don't have to be
familiar with the runtime and compilers, blocked on Go's release
cycles, or bound by the Go 1 compatibility promise.

Brendan Tracey

Jul 31, 2014, 10:37:19 PM
to Nigel Tao, golang-dev

On Jul 31, 2014, at 4:28 PM, Nigel Tao <nige...@golang.org> wrote:

> On Tue, Jul 15, 2014 at 4:24 PM, Brendan Tracey
> <tracey....@gmail.com> wrote:
>> Still, Go was mostly devoid of libraries to support scientific computation,
>> and a group of us started the Gonum project [0] to change that. One thing
>> we've found as a result of our efforts is that matrices are a pain. Several
>> months ago I created a detailed proposal [1] to add a new data structure to
>> the language which will eliminate the problems we face.
>
> We appreciate the amount of work and care that went into this
> proposal, but we're sorry that it's not likely to be adopted, at least
> for Go 1.

Thank you for taking the time to construct a lengthy reply. It means a lot after the time invested in constructing the proposal.

Obviously I’m saddened by the decision. I came to Go to dodge the tradeoff between speed, safety, and expressiveness. That is still largely true, but I’m sad that the one caveat will likely remain.

There is a chicken-and-egg factor with the standard library and the lack of tables, but it’s more that the Go team is not interested in having basic numeric building blocks in the standard library. This isn’t a complaint -- it’s something I knew coming to the language. Were the Go team interested in such an endeavor, they would find a use for tables in matrices, and in basic things that require matrices, such as sampling multivariate Gaussians and function optimization software (BFGS, many convex methods, etc.). They would find use for non-matrix tables when writing sampling algorithms for multi-dimensional spaces (such as Latin hypercube sampling), and when writing machine learning algorithms that take in a list of identically-sized feature vectors. Tables are a great way to represent a structured computational mesh when solving partial differential equations such as the Poisson equation or the Navier-Stokes equations. Again, this is not a criticism, just a statement that tables are widely useful for the kinds of programs the Gonum team is interested in supporting, even though they may not be useful for those the core team wishes to support.

I understand that among Go’s biggest strengths is its simplicity, and it was only after spending some time trying to use Go 1 that I came to the conclusion that tables carry their weight. However, I understand the reluctance of the core team to add to the language in order to support a (hopefully only currently) small section of the language’s user base.

Even with the decision, I still feel Go is a better language for scientific computing than C++ or Python (especially so when we can find more time to fully build Gonum). I would like to know if the Go team sees any priority in supporting numeric programming. For example, is it in the vision that code like s[i*stride + j] will become as efficient as t[i,j] could be, or would such optimizations slow down compilation times too much (the same goes for SIMD support)? I have heard that when Go moves to a bump-the-pointer allocator, it may no longer be possible to pass pointers to C functions. Have I understood that correctly? I ask because currently we at Gonum support calling C BLAS/LAPACK libraries if users wish to supply them. Is this a feature that we can continue to provide?

Speaking for the Gonum team, we are interested in working with the Go team to make the language better. I can’t write a compiler or assembly, but I can help (and have helped) by providing benchmarks, bug reports, etc. If there are ways we can help, please let us know. We have already discovered some of Go’s unique potential for scientific computing (the Matrix interface is very cool), and I have only scratched the surface when it comes to Go’s ability to provide easy distributed memory computation.

Thank you for the time you put into the consideration of the proposal.

Dan Kortschak

Aug 1, 2014, 1:31:13 AM
to Nigel Tao, Brendan Tracey, golang-dev
On Fri, 2014-08-01 at 09:28 +1000, Nigel Tao wrote:
> On Tue, Jul 15, 2014 at 4:24 PM, Brendan Tracey
> <tracey....@gmail.com> wrote:
> > Still, Go was mostly devoid of libraries to support scientific computation,
> > and a group of us started the Gonum project [0] to change that. One thing
> > we've found as a result of our efforts is that matrices are a pain. Several
> > months ago I created a detailed proposal [1] to add a new data structure to
> > the language which will eliminate the problems we face.
>
> We appreciate the amount of work and care that went into this
> proposal, but we're sorry that it's not likely to be adopted, at least
> for Go 1.

This makes me sad, but I have a lot of patience.


> In any case, debating the proposal's benefits is secondary. To repeat
> the main point, we value API and language stability very highly. Yes,
> the proposal is backwards-compatible, but it's a feature request, not
> a bug fix, and we err on the side of making no changes.
>
> As an alternative, one could define a computational language a la
> halide-lang, and write a program that worked with "go generate". This
> program would parse the specialized code and generate Go 1.x code
> (which possibly uses package unsafe for pointer arithmetic), or
> generate C code, or generate 6a-compatible assembly code, or generate
> GPU-specific code. Of course, this still requires finding someone to
> do the work, but that person or group of people don't have to be
> familiar with the runtime and compilers, blocked on Go's release
> cycles, or bound by the Go 1 compatibility promise.

The unfortunate consequence of this is that researchers using
computational tools would, in this context, be required to review
more and more complex languages. In the domain of scientific
computation, ease of review is a sorely missing attribute, and a niche
that Go could fit well. As the environment stands, people don't review
code, and we have garbage as a result, leading to garbage results that
are published and accepted on the basis of trust in the code. Trust
that is often wildly misplaced.

Dan

Nigel Tao

Aug 5, 2014, 3:25:12 AM
to Brendan Tracey, golang-dev
On Fri, Aug 1, 2014 at 12:37 PM, Brendan Tracey
<tracey....@gmail.com> wrote:
> I would like to know if the Go team sees any priority in supporting numeric programming. For example, is it in the vision that code like s[i*stride + j] will become as efficient as t[i,j] could be, or would such optimizations slow down compilation times too much (same goes for SIMD support)?

I imagine that some optimizations will be acceptable, but compiler
folk like Russ have already stated a preference for simpler, more
maintainable code as opposed to performance trumping everything else.
For example, https://codereview.appspot.com/107180044/ was deemed too
complicated for the performance gain.

Ignoring the fact that s[i*stride + j] won't always panic where t[i,j]
would, I personally think it plausible that 6g could get better at
hoisting some things like bounds checking outside loops, or optimizing
successive a[x+i]; i++ accesses as *p++, but we have to draw the line
somewhere. I skimmed the 2nd edition Dragon book earlier this year,
and one of my (possibly mistaken) impressions is that you can write
Ph.D. dissertations on squeezing the most out of matrix-like
computations and SIMD, but that level of complexity seems out of scope
for 6g. As I said, writing a separate program to produce 6a code is a
feasible solution. I'm sorry that there doesn't appear to be anyone
currently with all three of the skills, time and motivation to do so.


> I have heard that when go moves to a bump-the-pointer allocator, it may no longer be possible to pass pointers to C functions. Have I understood that correctly? I ask, because currently we at Gonum support calling C blas/lapack libraries if users wish to supply them. Is this a feature that we can continue to provide?

I don't know enough to answer this one.


> Speak for the Gonum team, we are interested in working with the Go team to make the language better. I can't write a compiler or in assembly, but I can help (and have helped) by providing benchmarks, bug reports, etc. If there are ways we can help, please let us know.

If something comes up, we'll let you know.

David Crawshaw

Aug 5, 2014, 8:44:48 AM
to Brendan Tracey, Nigel Tao, golang-dev
On Thu, Jul 31, 2014 at 10:37 PM, Brendan Tracey
<tracey....@gmail.com> wrote:
> I have heard that when go moves to a bump-the-pointer allocator, it may no longer be possible to pass pointers to C functions. Have I understood that correctly? I ask, because currently we at Gonum support calling C blas/lapack libraries if users wish to supply them. Is this a feature that we can continue to provide?

This is being discussed on https://golang.org/issue/8310