Block-based data structures and the Type Parameters - Draft Design

166 views
Skip to first unread message

Andrew Werner

unread,
Jun 20, 2020, 1:23:16 AM6/20/20
to golan...@googlegroups.com
[The body of this email is a duplication of the README in https://github.com/ajwerner/go2dequeue/ which also contains the sample implementation]

Exercise building a dequeue with the go2 Type Parameter Draft

This project is an exploration with the go2go / Type Parameters - Design Draft with an eye towards block-based data structures which promote access locality.

Such data structures traditionally utilize blocks of objects laid out contiguously in memory. In go, today, one generally specializes such data structures which are performance critical. This is especially important in the case of reducing allocations which occur on critical paths. Using a sync.Pool can help with allocations in some cases but can be clunky to use and, well, do create more pointers for GC to worry about and don't have the access locality.

Whether it's really important to have data structures which include arrays of other data structures laid out contiguously in RAM is really important is sort of orthogonal. Let's just assume that it is for now and look at how we'd do it. We only need to consider rejecting the importance of this case if we can't find a good solution or idiom to support it. The google/btree library README links this Google Open Source Blog Post on block-based container performance indicating it does matter. The interesting thing about that library is it hardly gets the access locality of the C++ library in the blog post it references.

The challenge this library explores is to layout blocks in structures contiguously in RAM while allowing the user to have some control over that block size.

This example

The approach this experiment takes is to allow the users to specify the block size by providing a function to map an array type to a slice. The allows the blocks to contain a slice which references the array. The overhead to maintain the slice header is 3 words but the cost is probably less in memory and more in the optimizations the compiler will be less able to perform. In particular, it probably will need to do bounds checking on that slice and there are probably some opportunities to avoid the pointer interactions. The interface ends up being pretty reasonable though a little bit ... opaque?

What this looked like in this demo is*

// Say we want to create a dequeue of things.
type Thing struct { Foo int64, Bar int64, ID [16]byte }
// The below incantation will create one with blocks of 37 elements.
d := NewWithArray(func(a *[37]Thing) []Thing { return (*a)[:] })
// The nodes in the linked list of this structure are 1240 bytes, not that
// that's a great number or anything like that, just saying it's pretty big.
// Maybe there'd be something good to get out of aligning things are cache line
// sizes. See TestNodeSize.

Other things you could do

Use a slice for the buffer

A different approach would be to just have the array in the node, take a buffer size and then keep a sync.Pool or freelist of slices. Today's github.com/google/btree does this but way worse in that it just keeps a slice of btree.Item. So fine, it's be two objects, the nodes and their item buffers. Maybe that is the answer but it's pretty lame. My feeling is once somebody is optimizing a data structure for access locality they are trying to optimize as far as they can go.

Ignore or augment the language generics with manual or automatic specialization

Today, in performance critical cases, people use a variety of automatic or manual specialization. There are tools like https://github.com/cheekybits/genny or https://github.com/mmatczuk/go_generics. Note that the latter has been used to good effect in CockroachDB for building specialized copy-on-write, interval B-trees.

One answer to this problem might just be that go's generics solution doesn't solve for the problem of specializing block-based data structures. That's not a particularly satisfying answer.

What I might have expected / guess I'm proposing

Type lists are a thing for utilizing language features which only apply to certain kinds of types. There should be a way to create a type list for arrays of types.

As a first pass I would have anticipated that there would be a way to inform this package of a buffer size maybe drawn from a fixed number of sizes. For example:

type BlockSize int

const (
    _ BlockSize = 1 << (4*iota)
    Block16
    Block256
)

// Array here allows
type Array(type T) interface {
    type [Block16]T, [Block256]T
}

func New(type T)(blockSize BlockSize) Dequeue(T) {
    switch blockSize {
    case Block16: 
        return &(dequeue(T, [Block16]T){ ... })
    case Block256:
        return &(dequeue(T, [Block16]T){})
    }
}

type dequeue(type T interface{}, ArrayT Array(T)) struct {
    len
}

type node(type T interface{}, ArrayT Array(T)) struct {
    left, right *node(T, ArrayT)
    rb ringBuf(T, ArrayT)
}

type ringBuf(type T interface{}, ArrayT Array(T)) struct {
    head, len int
    // buf is known to be an array type with a size of either 16 or 256 so
    // I'd expect to be able to interact with this like an array. I don't think
    // that that works today.
    buf ArrayT 
}

The above also seems needlessly clunky. I'd prefer if I could just represent all array types like this:

type Array(type T) interface {
    [...]T
}

Then I could do the following and everything else should just fall out.

func New(type T interface{}, ArrayT Array(T))() Dequeue(T) {
    return &(dequeue(T, ArrayT){})
}

The above feels to me to be compatible with the goals and idioms of the proposal but I could be missing out on some major complexity. Hope y'all find this helpful.

Aside

It would be cool to be able to embed a generic type in another generics type. I can see why it's annoying because the name of the field will get totally mangled in the specialization but maybe that's not so bad? Maybe you'd actually call the embedded type by its unspecialized type name even in the specialized version.

I would have liked to embed the ringBuf in the node. Given it's not embedded it's almost not worth abstracting.

David Finkel

unread,
Jun 21, 2020, 12:35:53 PM6/21/20
to Andrew Werner, golang-nuts
Hmm, this function signature would require being able to return different types based on a non-type-parameter, which would require Dequeue to be an interface (I don't see a definition, so I'm not sure)

From what I can tell, though, if you make the array-type a type-parameter directly, though, this becomes workable in the current proposal (although rather awkward for users).
e.g.

type ArrayT(type T) interface {
        type [3]T, [4]T
}

func NewArray(
type T ArrayT(V), V interface{})() T {
         var v T
         return v
}

    switch blockSize {
    case Block16: 
        return &(dequeue(T, [Block16]T){ ... })
    case Block256:
        return &(dequeue(T, [Block16]T){})
nit: I assume the second type-parameter for this case statement should be Block256.
    }
}

type dequeue(type T interface{}, ArrayT Array(T)) struct {
    len
}

type node(type T interface{}, ArrayT Array(T)) struct {
    left, right *node(T, ArrayT)
    rb ringBuf(T, ArrayT)
}

type ringBuf(type T interface{}, ArrayT Array(T)) struct {
    head, len int
    // buf is known to be an array type with a size of either 16 or 256 so
    // I'd expect to be able to interact with this like an array. I don't think
    // that that works today.
    buf ArrayT 
}

The above also seems needlessly clunky. I'd prefer if I could just represent all array types like this:

type Array(type T) interface {
    [...]T
}
I like that approach to distinguishing between slices and arrays in type-lists.
Nit: I think it would still need the type keyword, so it would look like:

type Array(type T) interface {
        type [...]T
}

I feel like there must be an interaction with something else that I'm missing, though. Since there is some discussion about limitations with respect to the lack of non-type parameters in the omissions section. (although not about arrays in type-list constraints, specifically)

Then I could do the following and everything else should just fall out.

func New(type T interface{}, ArrayT Array(T))() Dequeue(T) {
    return &(dequeue(T, ArrayT){})
}

The above feels to me to be compatible with the goals and idioms of the proposal but I could be missing out on some major complexity. Hope y'all find this helpful.

Aside

It would be cool to be able to embed a generic type in another generics type. I can see why it's annoying because the name of the field will get totally mangled in the specialization but maybe that's not so bad? Maybe you'd actually call the embedded type by its unspecialized type name even in the specialized version.

I would have liked to embed the ringBuf in the node. Given it's not embedded it's almost not worth abstracting.

I think this is already addressed in the current design. You are allowed to directly embed type-parameters, and they have the name of the template argument (rather than its instantiated value). 

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CA%2BvRuzOYXz1t6CsWLd6gxS4AErRNCwA_ctdSYiu3Mq-YH_TEDA%40mail.gmail.com.

Andrew Werner

unread,
Jun 21, 2020, 12:43:51 PM6/21/20
to golang-nuts
Thanks for the reply! If I read it correctly, it is already possible to specify a type constraint as an Array of a given size and so the first part of the "What I might have expected" actually does work. I noticed that it works in that it works as a type constraint but I found that types constrained that way do not support indexing or slicing. Perhaps that's just a limitation of the prototype and not the proposal.

The other thing that I think I'm proposing (thanks for the correction) is a way to represent all arrays of a given type:


type Array(type T) interface {
    type
[...]T
}

Thanks for dealing with my very long-winded way of saying that. 

Hmm, maybe I was just holding it wrong and the proposal in "What I might have expected" already is the proposal.
 

    switch blockSize {
    case Block16: 
        return &(dequeue(T, [Block16]T){ ... })
    case Block256:
        return &(dequeue(T, [Block16]T){})
nit: I assume the second type-parameter for this case statement should be Block256.
    }
}

type dequeue(type T interface{}, ArrayT Array(T)) struct {
    len
}

type node(type T interface{}, ArrayT Array(T)) struct {
    left, right *node(T, ArrayT)
    rb ringBuf(T, ArrayT)
}

type ringBuf(type T interface{}, ArrayT Array(T)) struct {
    head, len int
    // buf is known to be an array type with a size of either 16 or 256 so
    // I'd expect to be able to interact with this like an array. I don't think
    // that that works today.
    buf ArrayT 
}

The above also seems needlessly clunky. I'd prefer if I could just represent all array types like this:

type Array(type T) interface {
    [...]T
}
I like that approach to distinguishing between slices and arrays in type-lists.
Nit: I think it would still need the type keyword, so it would look like:

type Array(type T) interface {
        type [...]T
}
Yes, definitely, my bad.  

I feel like there must be an interaction with something else that I'm missing, though. Since there is some discussion about limitations with respect to the lack of non-type parameters in the omissions section. (although not about arrays in type-list constraints, specifically)

Then I could do the following and everything else should just fall out.

func New(type T interface{}, ArrayT Array(T))() Dequeue(T) {
    return &(dequeue(T, ArrayT){})
}

The above feels to me to be compatible with the goals and idioms of the proposal but I could be missing out on some major complexity. Hope y'all find this helpful.

Aside

It would be cool to be able to embed a generic type in another generics type. I can see why it's annoying because the name of the field will get totally mangled in the specialization but maybe that's not so bad? Maybe you'd actually call the embedded type by its unspecialized type name even in the specialized version.

I would have liked to embed the ringBuf in the node. Given it's not embedded it's almost not worth abstracting.

I think this is already addressed in the current design. You are allowed to directly embed type-parameters, and they have the name of the template argument (rather than its instantiated value). 
Cool, I'll try it out. I found it not compiling with `go tool go2go` in my first pass. Maybe I was holding it wrong. 

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golan...@googlegroups.com.

roger peppe

unread,
Jun 22, 2020, 7:58:43 AM6/22/20
to Andrew Werner, golang-nuts
Thanks for the interesting use case, Andrew!

I've experimented with a slightly different approach:


It expects the caller to implement a method on the array to get a reference to the
underlying storage, but that's fairly trivial to implement.

I've built it on top of the "list" package from the stdlib (available in the dev.go2go branch under src/cmd/go2go/testdata/go2path).

I think it has the potential to be a bit more performant as the type-hiding abstraction (Block in my example) is inside the static type, so most calls can be direct rather than indirect.

It's almost possible to make this implementation use a slice-based implementation of Block too (thus allowing dynamically specified block sizes at the cost of an extra indirection through the linked list node). That's not possible because the pointer method is invoked on the zero-value of the type, so without using a global variable, there's no way for the method to find out the required block size. A way to work around that is to use another level of indirection instead of using pointer methods directly:


This makes me wonder whether the pointer methods restriction in the generics draft proposal might not be unnecessary and perhaps not necessarily a good idea. ISTM that the only reason for restricting methods to pointer-only is so that they can be called on zero values, and that necessarily restricts the available parameterisation to static-only (or using global variables). I suspect that might turn out to be an anti-pattern. Instead, using function or interface value that operates on the pointer is strictly more general, I think, and doesn't suffer from that problem.

  cheers,
    rog.


To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/2c712c9d-2a5b-44a9-b783-15585a6654b1o%40googlegroups.com.

Andrew Werner

unread,
Jun 22, 2020, 9:26:07 AM6/22/20
to golang-nuts
Hey Rog,

I think I sent you a private reply earlier. My bad. I now see what you're proposing in the first proposal think it makes a lot of sense.

The thing that I think I had missed was this piece of magic:
// New returns a new Dequeue instance using B as the block storage backing.
func
New(type T interface{}, *B Block(T))() *Dequeue(T) {

One concern I have here is that the `Slice` method is on the array itself so it's possible that invoking it will copy the entire array to the stack. Does that concern you?

The second proposal feels almost identical to what I proposed. Can you help me understand how it differs?

roger peppe

unread,
Jun 22, 2020, 9:46:22 AM6/22/20
to Andrew Werner, golang-nuts
On Mon, 22 Jun 2020 at 14:26, Andrew Werner <awer...@gmail.com> wrote:
Hey Rog,

I think I sent you a private reply earlier. My bad. I now see what you're proposing in the first proposal think it makes a lot of sense.

The thing that I think I had missed was this piece of magic:
// New returns a new Dequeue instance using B as the block storage backing.
func
New(type T interface{}, *B Block(T))() *Dequeue(T) {

One concern I have here is that the `Slice` method is on the array itself so it's possible that invoking it will copy the entire array to the stack. Does that concern you?

It won't copy the array because it's a pointer method (that's what the "*B" means).
In fact it must not copy the array because then it couldn't return a reference to it.

The second proposal feels almost identical to what I proposed. Can you help me understand how it differs?

It is indeed a similar approach. The main difference (beyond the somewhat simpler dequeue implementation itself) is that my New method returns a concrete type, not an interface. This means that it's easier to jump to the definition of its method implementations, potentially more performant because it can use direct rather than indirect calls (*), and arguably more idiomatic Go. It also means it's possible to add more methods without breaking backward compatibility, which is a significant benefit in my view.

  cheers,
    rog.

(*) you'll still get some indirect calls, amortised over the buffer size.
 
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/24115910-1de9-4ae4-bfc9-c9c8ecdbf539o%40googlegroups.com.

Andrew Werner

unread,
Jun 22, 2020, 9:59:49 AM6/22/20
to roger peppe, golang-nuts


On Mon, Jun 22, 2020 at 9:45 AM roger peppe <rogp...@gmail.com> wrote:


On Mon, 22 Jun 2020 at 14:26, Andrew Werner <awer...@gmail.com> wrote:
Hey Rog,

I think I sent you a private reply earlier. My bad. I now see what you're proposing in the first proposal think it makes a lot of sense.

The thing that I think I had missed was this piece of magic:
// New returns a new Dequeue instance using B as the block storage backing.
func
New(type T interface{}, *B Block(T))() *Dequeue(T) {

One concern I have here is that the `Slice` method is on the array itself so it's possible that invoking it will copy the entire array to the stack. Does that concern you?

It won't copy the array because it's a pointer method (that's what the "*B" means).
In fact it must not copy the array because then it couldn't return a reference to it.

Oh! It’s saying that *B implements the constraint, nifty. Is this idea in the current proposal? I agree that it’ll do the trick in the same way as your second proposal and my proposal and is even a bit cleaner. That being said, it still seems worse than having a mechanism to indicate that a type really is an array and can be syntactically used as such, no? 

I suppose if the Slice method gets reasonably inclined the compiler could do good things. 

I think I’m in favor of this proposal in addition to my proposal that allows a type to be declared as an array and to thus be used syntactically as an array. 


The second proposal feels almost identical to what I proposed. Can you help me understand how it differs?

It is indeed a similar approach. The main difference (beyond the somewhat simpler dequeue implementation itself) is that my New method returns a concrete type, not an interface. This means that it's easier to jump to the definition of its method implementations, potentially more performant because it can use direct rather than indirect calls (*), and arguably more idiomatic Go. It also means it's possible to add more methods without breaking backward compatibility, which is a significant benefit in my view.

  cheers,
    rog.

(*) you'll still get some indirect calls, amortised over the buffer size.

Ack, sure, thanks for clarifying. No quibbles from me on this front. 

FWIW the Alloc* methods came from some experience with these data structures where I want to decode a stream of structs off the network or something so I want pointers to zero values to then pass to Decode or Unmarshal. I’ll admit it feels weird without the context. 

roger peppe

unread,
Jun 22, 2020, 10:17:01 AM6/22/20
to Andrew Werner, golang-nuts
On Mon, 22 Jun 2020 at 14:59, Andrew Werner <awer...@gmail.com> wrote:


On Mon, Jun 22, 2020 at 9:45 AM roger peppe <rogp...@gmail.com> wrote:


On Mon, 22 Jun 2020 at 14:26, Andrew Werner <awer...@gmail.com> wrote:
Hey Rog,

I think I sent you a private reply earlier. My bad. I now see what you're proposing in the first proposal think it makes a lot of sense.

The thing that I think I had missed was this piece of magic:
// New returns a new Dequeue instance using B as the block storage backing.
func
New(type T interface{}, *B Block(T))() *Dequeue(T) {

One concern I have here is that the `Slice` method is on the array itself so it's possible that invoking it will copy the entire array to the stack. Does that concern you?

It won't copy the array because it's a pointer method (that's what the "*B" means).
In fact it must not copy the array because then it couldn't return a reference to it.

Oh! It’s saying that *B implements the constraint, nifty. Is this idea in the current proposal?

 
I agree that it’ll do the trick in the same way as your second proposal and my proposal and is even a bit cleaner. That being said, it still seems worse than having a mechanism to indicate that a type really is an array and can be syntactically used as such, no? 

I'm not sure. ISTM that there is a reasonable workaround and that the alternative might significantly complicate the generics proposal. The authors aren't trying to address every possible generics use case, especially when it's possible to work around the problem, and I like that approach :)

I suppose if the Slice method gets reasonably inclined the compiler could do good things. 

I think I’m in favor of this proposal in addition to my proposal that allows a type to be declared as an array and to thus be used syntactically as an array. 


The second proposal feels almost identical to what I proposed. Can you help me understand how it differs?

It is indeed a similar approach. The main difference (beyond the somewhat simpler dequeue implementation itself) is that my New method returns a concrete type, not an interface. This means that it's easier to jump to the definition of its method implementations, potentially more performant because it can use direct rather than indirect calls (*), and arguably more idiomatic Go. It also means it's possible to add more methods without breaking backward compatibility, which is a significant benefit in my view.

  cheers,
    rog.

(*) you'll still get some indirect calls, amortised over the buffer size.

Ack, sure, thanks for clarifying. No quibbles from me on this front. 

FWIW the Alloc* methods came from some experience with these data structures where I want to decode a stream of structs off the network or something so I want pointers to zero values to then pass to Decode or Unmarshal. I’ll admit it feels weird without the context.  

Note that with the concrete type return, it would be possible to add Alloc* methods at a later stage without breaking backward compatibility.

As noted in a comment in my code, I think that it would be nice to add similar methods to the list package too - they wouldn't have been useful before with the interface{} value approach, but definitely are now (for example you could have an in-element mutex).

  cheers,
    rog.

Andrew Werner

unread,
Jun 22, 2020, 10:57:01 AM6/22/20
to golang-nuts


On Monday, June 22, 2020 at 10:17:01 AM UTC-4, rog wrote:


On Mon, 22 Jun 2020 at 14:59, Andrew Werner <awer...@gmail.com> wrote:


On Mon, Jun 22, 2020 at 9:45 AM roger peppe <rogp...@gmail.com> wrote:


On Mon, 22 Jun 2020 at 14:26, Andrew Werner <awer...@gmail.com> wrote:
Hey Rog,

I think I sent you a private reply earlier. My bad. I now see what you're proposing in the first proposal think it makes a lot of sense.

The thing that I think I had missed was this piece of magic:
// New returns a new Dequeue instance using B as the block storage backing.
func
New(type T interface{}, *B Block(T))() *Dequeue(T) {

One concern I have here is that the `Slice` method is on the array itself so it's possible that invoking it will copy the entire array to the stack. Does that concern you?

It won't copy the array because it's a pointer method (that's what the "*B" means).
In fact it must not copy the array because then it couldn't return a reference to it.

Oh! It’s saying that *B implements the constraint, nifty. Is this idea in the current proposal?

 
I agree that it’ll do the trick in the same way as your second proposal and my proposal and is even a bit cleaner. That being said, it still seems worse than having a mechanism to indicate that a type really is an array and can be syntactically used as such, no? 

I'm not sure. ISTM that there is a reasonable workaround and that the alternative might significantly complicate the generics proposal. The authors aren't trying to address every possible generics use case, especially when it's possible to work around the problem, and I like that approach :)

Cool, I'm on board with this view. I'll adopt this approach and see how it feels. Hopefully with aggressive inlining the trip through the compiler will be able to optimize the access as though the array were being indexes directly.
I'm just re-visiting this now and I wonder if it's what I want. While indeed you can embed a type parameter, it's not clear that you can embed a type that is parameterized on a type-parameter. For example the following example is definitely kosher but the one after it is not as obviously kosher. I don't think I was clear enough in my problem statement but I'm requesting that the latter be allowed and that the embedded field in that case be `List`. Am I still misunderstanding something?

type Foo(type T) struct {
   T
// Ok
   
// ...
}

type Foo(type T) struct {
   list
.List(T) // Seemingly not ok
   
// ...
}



 

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golan...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CA%2BvRuzOYXz1t6CsWLd6gxS4AErRNCwA_ctdSYiu3Mq-YH_TEDA%40mail.gmail.com.

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golan...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/2c712c9d-2a5b-44a9-b783-15585a6654b1o%40googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golan...@googlegroups.com.

ajwe...@cockroachlabs.com

unread,
Jun 22, 2020, 12:47:22 PM6/22/20
to golang-nuts


On Monday, June 22, 2020 at 10:17:01 AM UTC-4, rog wrote:


On Mon, 22 Jun 2020 at 14:59, Andrew Werner <awer...@gmail.com> wrote:


On Mon, Jun 22, 2020 at 9:45 AM roger peppe <rogp...@gmail.com> wrote:


On Mon, 22 Jun 2020 at 14:26, Andrew Werner <awer...@gmail.com> wrote:
Hey Rog,

I think I sent you a private reply earlier. My bad. I now see what you're proposing in the first proposal think it makes a lot of sense.

The thing that I think I had missed was this piece of magic:
// New returns a new Dequeue instance using B as the block storage backing.
func
New(type T interface{}, *B Block(T))() *Dequeue(T) {

One concern I have here is that the `Slice` method is on the array itself so it's possible that invoking it will copy the entire array to the stack. Does that concern you?

It won't copy the array because it's a pointer method (that's what the "*B" means).
In fact it must not copy the array because then it couldn't return a reference to it.

Oh! It’s saying that *B implements the constraint, nifty. Is this idea in the current proposal?

 
I agree that it’ll do the trick in the same way as your second proposal and my proposal and is even a bit cleaner. That being said, it still seems worse than having a mechanism to indicate that a type really is an array and can be syntactically used as such, no? 

I'm not sure. ISTM that there is a reasonable workaround and that the alternative might significantly complicate the generics proposal. The authors aren't trying to address every possible generics use case, especially when it's possible to work around the problem, and I like that approach :)

Thanks for pointing this out. I'm cool with this approach. I'll update my library to utilize it (and consider also adopting the list.List, though I do like my freedom to pool list nodes). Hopefully with aggressive inlining of the specialized type the compiler will be able to make the optimizations that it would be able to make if the slicing were performed directly on an array.



I suppose if the Slice method gets reasonably inclined the compiler could do good things. 

I think I’m in favor of this proposal in addition to my proposal that allows a type to be declared as an array and to thus be used syntactically as an array. 


The second proposal feels almost identical to what I proposed. Can you help me understand how it differs?

It is indeed a similar approach. The main difference (beyond the somewhat simpler dequeue implementation itself) is that my New method returns a concrete type, not an interface. This means that it's easier to jump to the definition of its method implementations, potentially more performant because it can use direct rather than indirect calls (*), and arguably more idiomatic Go. It also means it's possible to add more methods without breaking backward compatibility, which is a significant benefit in my view.

  cheers,
    rog.

(*) you'll still get some indirect calls, amortised over the buffer size.

Ack, sure, thanks for clarifying. No quibbles from me on this front. 

FWIW the Alloc* methods came from some experience with these data structures where I want to decode a stream of structs off the network or something so I want pointers to zero values to then pass to Decode or Unmarshal. I’ll admit it feels weird without the context.  

Note that with the concrete type return, it would be possible to add Alloc* methods at a later stage without breaking backward compatibility.

As noted in a comment in my code, I think that it would be nice to add similar methods to the list package too - they wouldn't have been useful before with the interface{} value approach, but definitely are now (for example you could have an in-element mutex).

:+1:


  cheers,
    rog.

  
 

roger peppe

unread,
Jun 22, 2020, 1:55:51 PM6/22/20
to Andrew Werner, golang-nuts
On Mon, 22 Jun 2020 at 15:57, Andrew Werner <awer...@gmail.com> wrote:
On Monday, June 22, 2020 at 10:17:01 AM UTC-4, rog wrote
On Mon, 22 Jun 2020 at 14:59, Andrew Werner <awer...@gmail.com> wrote:
Oh! It’s saying that *B implements the constraint, nifty. Is this idea in the current proposal?

 
I agree that it’ll do the trick in the same way as your second proposal and my proposal and is even a bit cleaner. That being said, it still seems worse than having a mechanism to indicate that a type really is an array and can be syntactically used as such, no? 

I'm not sure. ISTM that there is a reasonable workaround and that the alternative might significantly complicate the generics proposal. The authors aren't trying to address every possible generics use case, especially when it's possible to work around the problem, and I like that approach :)

Cool, I'm on board with this view. I'll adopt this approach and see how it feels. Hopefully with aggressive inlining the trip through the compiler will be able to optimize the access as though the array were being indexes directly.

I wouldn't necessarily expect aggressive inlining of generic calls. This isn't C++, and the direct corollary of aggressive inlining is code bloat from excessive specialisation. I'm expecting performance somewhat better than interface values because the calls can be direct, but not necessarily as fast as if everything was fully specialised. All this is still to be decided though, of course.
 
I think this is already addressed in the current design. You are allowed to directly embed type-parameters, and they have the name of the template argument (rather than its instantiated value). 
Cool, I'll try it out. I found it not compiling with `go tool go2go` in my first pass. Maybe I was holding it wrong. 

I'm just re-visiting this now and I wonder if it's what I want. While indeed you can embed a type parameter, it's not clear that you can embed a type that is parameterized on a type-parameter. For example the following example is definitely kosher but the one after it is not as obviously kosher. I don't think I was clear enough in my problem statement but I'm requesting that the latter be allowed and that the embedded field in that case be `List`. Am I still misunderstanding something?

type Foo(type T) struct {
   T
// Ok
   
// ...
}

type Foo(type T) struct {
   list
.List(T) // Seemingly not ok
   
// ...
}

This issue is addressed here. Because of a grammatical ambiguity, you have to do:

     type Foo(type T) struct{
         (list.List(T))
         ...
     }

  cheers,
    rog.

roger peppe

unread,
Jun 22, 2020, 5:42:12 PM6/22/20
to ajwe...@cockroachlabs.com, golang-nuts

Thanks for pointing this out. I'm cool with this approach. I'll update my library to utilize it (and consider also adopting the list.List, though I do like my freedom to pool list nodes).

Personally, I'd start by re-using list.List, and only use pools if there really is a significant performance benefit to be gained there.
If you just need zero-valued elements, I could imagine that sync.Pool might end up slower than just using the GC.

Hopefully with aggressive inlining of the specialized type the compiler will be able to make the optimizations that it would be able to make if the slicing were performed directly on an array.

What optimisations are you thinking of here? Are you concerned that slice operations are too slow?

  cheers,
    rog.

Andrew Werner

unread,
Jun 22, 2020, 5:49:27 PM6/22/20
to golang-nuts


On Monday, June 22, 2020 at 5:42:12 PM UTC-4, rog wrote:

Thanks for pointing this out. I'm cool with this approach. I'll update my library to utilize it (and consider also adopting the list.List, though I do like my freedom to pool list nodes).

Personally, I'd start by re-using list.List, and only use pools if there really is a significant performance benefit to be gained there.
If you just need zero-valued elements, I could imagine that sync.Pool might end up slower than just using the GC.

Fair.


Hopefully with aggressive inlining of the specialized type the compiler will be able to make the optimizations that it would be able to make if the slicing were performed directly on an array.

What optimisations are you thinking of here? Are you concerned that slice operations are too slow?


* Bound check elimination
* Avoiding an indirection through the pointer in the slice header would be nice
* Compile-time evaluation of len and cap

IIUC all of these would happen if we were operating on a raw array type rather than a slice.


  cheers,
    rog.

Robert Engels

unread,
Jun 22, 2020, 6:00:00 PM6/22/20
to Andrew Werner, golang-nuts
The compiler could conceivably generate code that did that - and with aggressive inlining you would end up the same as native type code. 

I would argue that using raw slices in data structures leads to maintainability issues anyway. The key is higher level constructs that give you similar if not exact performance. 

After all a slice is a simple data structure whose operators can easily be methods on a backing array - and they are :)

On Jun 22, 2020, at 4:49 PM, Andrew Werner <awer...@gmail.com> wrote:


--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/6a6bd9d6-deec-42f2-a9d8-7a1413221f4do%40googlegroups.com.

roger peppe

unread,
Jun 22, 2020, 6:01:46 PM6/22/20
to Andrew Werner, golang-nuts
On Mon, 22 Jun 2020 at 22:49, Andrew Werner <awer...@gmail.com> wrote:


On Monday, June 22, 2020 at 5:42:12 PM UTC-4, rog wrote:

Thanks for pointing this out. I'm cool with this approach. I'll update my library to utilize it (and consider also adopting the list.List, though I do like my freedom to pool list nodes).

Personally, I'd start by re-using list.List, and only use pools if there really is a significant performance benefit to be gained there.
If you just need zero-valued elements, I could imagine that sync.Pool might end up slower than just using the GC.

Fair.


Hopefully with aggressive inlining of the specialized type the compiler will be able to make the optimizations that it would be able to make if the slicing were performed directly on an array.

What optimisations are you thinking of here? Are you concerned that slice operations are too slow?


* Bound check elimination

I'm not sure how it could do that, given that it's inevitable that an index is stored in a field with no range limits on it (i guess if you used a 256-element array and a uint8 index, it could, but then it wouldn't work for other array sizes).

* Avoiding an indirection through the pointer in the slice header would be nice

Isn't it going to have to indirect through a pointer regardless of whether it's a slice or an array? The array isn't stored inside the Dequeue struct itself.

* Compile-time evaluation of len and cap

Isn't that just a one-word load of memory that's very likely in the same cache line as the array-pointer member?
I can't see that this is likely to make a significant performance difference, but I'd love to see some numbers.

Andrew Werner

unread,
Jun 22, 2020, 6:22:41 PM6/22/20
to golang-nuts

Hey Roger,

Thanks for taking the time to engage with me on this. Really appreciate it!



On Monday, June 22, 2020 at 6:01:46 PM UTC-4, rog wrote:


On Mon, 22 Jun 2020 at 22:49, Andrew Werner <awer...@gmail.com> wrote:


On Monday, June 22, 2020 at 5:42:12 PM UTC-4, rog wrote:

Thanks for pointing this out. I'm cool with this approach. I'll update my library to utilize it (and consider also adopting the list.List, though I do like my freedom to pool list nodes).

Personally, I'd start by re-using list.List, and only use pools if there really is a significant performance benefit to be gained there.
If you just need zero-valued elements, I could imagine that sync.Pool might end up slower than just using the GC.

Fair.


Hopefully with aggressive inlining of the specialized type the compiler will be able to make the optimizations that it would be able to make if the slicing were performed directly on an array.

What optimisations are you thinking of here? Are you concerned that slice operations are too slow?


* Bound check elimination

I'm not sure how it could do that, given that it's inevitable that an index is stored in a field with no range limits on it (i guess if you used a 256-element array and a uint8 index, it could, but then it wouldn't work for other array sizes).
Fair point. I imagine that the fact that it's an array doesn't really change much w.r.t. when a bounds check needs to occur, it just might change the shape of such a check. It does seem like in my ring buffer example use of unsigned integers in some places can eliminate some bounds checking given I'm going to mod the whole thing by `len(buf)`

I do recognize that the above case has nothing to do with the slice/array discussion.



type ringBuf(type T, *ArrayT Slicer(T)) struct {
    head
, len uint32
   
Array
}

func
(rb *ringBuf(T, ArrayT)) at(idx int32) *T {
    buf
:= rb.Slice()
   
// no bounds check needed.
   
return &buf[int(rb.head+idx) % len(buf)]
}





* Avoiding an indirection through the pointer in the slice header would be nice

Isn't it going to have to indirect through a pointer regardless of whether it's a slice or an array? The array isn't stored inside the Dequeue struct itself.

In your example where you put the slice into the dequeue itself, yes, sure. I used the Dequeue as a simplified example where the motivating case is really a B-Tree. In the B-tree we'd instead be operating at a node basis where the node does actually contain the array. In that case we either realize a slice on the stack or realize a slice in the node to refer to itself. In either case, there will be an indirection through the slice header. I'll buy the argument that it should be cheap and hit cache lines.
 

* Compile-time evaluation of len and cap

Isn't that just a one-word load of memory that's very likely in the same cache line as the array-pointer member?
I can't see that this is likely to make a significant performance difference, but I'd love to see some numbers.
Fair enough. Anything we do here is going to be better than a world today. I'm just trying to push on the idea that it will be a bummer if after go generics land, people in the performance critical space still find themselves using code-generating generics solutions like https://github.com/mmatczuk/go_generics to specialize data structures.
Reply all
Reply to author
Forward
0 new messages