COW slices ? Was : Re: [go-nuts] why append (maybe) modify the referenced array?

214 views
Skip to first unread message

Vincent Vanackere

unread,
Dec 2, 2011, 9:08:04 AM12/2/11
to roger peppe, 韩拓, golan...@googlegroups.com
I wonder if a way to get a "copy-on-write" copy of a given slice wouldn't be the best solution to this general class of problem (i.e. sharing data safely without the requirement to do a full copy each time).

This would require no change to the type system (i.e. still no "const" keyword), and only perhaps only a new builtin function or some other way to return a copy of a slice with the property that any modification to the underlying array should first generate a new (non-cow) slice operating on a copy of the array.

To give a better idea of the expected behavior : a naive implementation would simply add some "cow" flag as another field to the current internal (ptr,len,cap) representation of slices and any write operation to an element of a slice would first require checking this cow flag before doing any modification to the internal array.

The benefit would be a nice way to safely share read-only view of buffers without the performance impact of doing a full copy.
There are of course some obvious disadvantages :
- writing to a slice element could now sometimes become as expensive as making a copy of the slice.
- the added checks have also the potential to be quite expensive unless the compiler is taught to inline / remove as many of those checks as possible
- probably a lot of other details, some of them perhaps making this idea completely unrealistic...

Just a thought... Does this make any sense ?

Vincent

On Fri, Dec 2, 2011 at 11:25, roger peppe <rogp...@gmail.com> wrote:
if this might be an issue, you can copy the slice.

here is an outstanding issue which is perhaps relevant:
http://code.google.com/p/go/issues/detail?id=1642

2011/12/2 韩拓 <hantu...@gmail.com>:
> hi all.
> slice type is the safe reference to an array,almost all the time.but why the
> built-in function append break the rule?
> see some code:
> http://play.golang.org/p/yU8gEXo7lA
>
> package main
>
> import "fmt"
>
> func main() {
>   d := []byte{1,2,3,4}
>   a := d[:2]
>   b := d[2:]
>   foo(a) // pass a,but b is unsafe.
>   fmt.Println(a, b)
> }
>
> func foo(d []byte) {
>   // do something...
>   _ = append(d, 0)
>   // do something...
> }
>
> when i use var a for something,i do not sure whether var b is "safe",i think
> it`s bad. all things to a slice shoud be limited in its range
>
> --
>     此致,
> 敬礼!
>
>                      韩拓

Jan Mercl

unread,
Dec 2, 2011, 9:31:46 AM12/2/11
to golan...@googlegroups.com
On Friday, December 2, 2011 3:08:04 PM UTC+1, Vincent wrote:
I wonder if a way to get a "copy-on-write" copy of a given slice wouldn't be the best solution to this general class of problem (i.e. sharing data safely without the requirement to do a full copy each time).

No. The famous note from http://golang.org/doc/effective_go.html#sharing says it better than I ever could:

"Do not communicate by sharing memory; instead, share memory by communicating."

André Moraes

unread,
Dec 2, 2011, 10:11:20 AM12/2/11
to golan...@googlegroups.com
> No. The famous note
> from http://golang.org/doc/effective_go.html#sharing says it better than I
> ever could:
>
> "Do not communicate by sharing memory; instead, share memory by
> communicating."

I think his point isn't by sharing memory for concurrent access.

But instead on a single routine avoid overwriting something
accidentaly if he uses two slices pointing to the same array.

--
André Moraes
http://andredevchannel.blogspot.com/

André Moraes

unread,
Dec 2, 2011, 10:12:10 AM12/2/11
to golang-nuts
You should comment that alternativo on the issue pointed by Roger

Jan Mercl

unread,
Dec 2, 2011, 10:36:40 AM12/2/11
to golan...@googlegroups.com
On Friday, December 2, 2011 4:11:20 PM UTC+1, André Moraes wrote:

I think his point isn't by sharing memory for concurrent access.

But instead on a single routine avoid overwriting something
accidentaly if he uses two slices pointing to the same array.

So then the problem could be perhaps reduced to: "Why a slice I'm mutating is being mutated?" and that's a "strange" question. In other words, if I pass a slice to a function and append to that very slice within that function then I'm doing it either, if correct, intentionally - or I fail to understand how slices work. The later case can be solved by a bit of reading(*) - not by unnecessarily inventing COW slices, non append-able slices or some other strange beasts.

*: Among the Specs and Tutorial and Effective Go I would highly recommend to the OP's attention also this rsc's excellent article http://research.swtch.com/2009/11/go-data-structures.html
 

André Moraes

unread,
Dec 2, 2011, 10:51:03 AM12/2/11
to golan...@googlegroups.com
I agree that reading the docs one can understand the issue, just like
a posted o the other thread,
where the code demonstrates how slices works.

But as pointed by roger, there is a issue that deserves at least some
thinking before been classified as: Just read the docs issue.

unread,
Dec 2, 2011, 4:04:32 PM12/2/11
to golang-nuts
On Dec 2, 3:08 pm, Vincent Vanackere <vincent.vanack...@gmail.com>
wrote:

> To give a better idea of the expected behavior : a naive implementation
> would simply add some "cow" flag as another field to the current internal
> (ptr,len,cap) representation of slices and any write operation to an
> element of a slice would first require checking this cow flag before doing
> any modification to the internal array.

You would need to add [a pointer to a boolean flag] to all slices, so
that there aren't any distinctions among writers.

Many Go programs are using goroutines. To ensure safety, you would
probably need to protect the flag with a mutex or use a compare-and-
swap CPU instruction. This would negatively affect the performance of
writers, as well as of readers and of assignments to slice variables.

>
> The benefit would be a nice way to safely share read-only view of buffers
> without the performance impact of doing a full copy.

Are you thinking that a COW slice should be the return value of
functions like bufio's ReadSlice?

Vincent Vanackere

unread,
Dec 5, 2011, 1:25:34 PM12/5/11
to ⚛, golang-nuts
On Fri, Dec 2, 2011 at 22:04, ⚛ <0xe2.0x...@gmail.com> wrote:
On Dec 2, 3:08 pm, Vincent Vanackere <vincent.vanack...@gmail.com>
wrote:
> To give a better idea of the expected behavior : a naive implementation
> would simply add some "cow" flag as another field to the current internal
> (ptr,len,cap) representation of slices and any write operation to an
> element of a slice would first require checking this cow flag before doing
> any modification to the internal array.

You would need to add [a pointer to a boolean flag] to all slices, so
that there aren't any distinctions among writers.


This wasn't really the idea, I just wanted a way to produce a new slice that cannot possibly modify the original array behind the slice (the original slice would remain writable without reallocation).
 
Many Go programs are using goroutines. To ensure safety, you would
probably need to protect the flag with a mutex or use a compare-and-
swap CPU instruction. This would negatively affect the performance of
writers, as well as of readers and of assignments to slice variables.


I must admit I don't see why a cow field would introduce any new problem (ie one that doesn't currently exist with the existing len/cap fields)... I am missing something ?
 
>
> The benefit would be a nice way to safely share read-only view of buffers
> without the performance impact of doing a full copy.

Are you thinking that a COW slice should be the return value of
functions like bufio's ReadSlice?

AFAICT there is nothing inherently wrong in writing to a slice returned by bufio.ReadSlice (as long as you don't try to grow your slice and write past the original bounds of course)... so I don't think this particular example would be a good usage of a COW slice.

Anyway, I understand from Russ's remark at http://code.google.com/p/go/issues/detail?id=1642#c3 that COW slices would probably be impractical, the cost of doing the check at each write access being prohibitive... So it may be the case that there is no good solution to this problem -- sharing a slice without duplicating the underlying array and without taking the risk to have that array modified -- , at least not without some help from the type system.

On that subject, perhaps I'm biased but I find somewhat unfair that channels did receive a special treatment in the language that other types haven't :  today you can make a channel unidirectional and the compiler will check for you that your code only reads from / writes to this channel... Wouldn't it be nice if your could make any type "unidirectional" (ie read-only and perhaps also write-only) and have the compiler ensure that you cannot write code that breaks your specification ?
Of course I'm over-simplifying and there are probably a myriad of reasons why there is currently no such read-only / const / whatever mechanism in the language (I mean, reasons better than : "if you think you need this, you're doing it wrong"). Whatever the reason, my personal experience it that for performance-sensitive code it is usually the case that you want to avoid memory allocations & data-copying as much as possible... and to achieve this in Go I don't see any other way than giving away some safety and rely only on convention and documentation, which not a bad way but -- from my point of view -- far from optimal because the compiler would do a much better job of ensuring these rules and could probably even make use of this additional typing information to further optimize the generated code.

Vincent

Jan Mercl

unread,
Dec 5, 2011, 1:49:26 PM12/5/11
to golan...@googlegroups.com, ⚛
On Monday, December 5, 2011 7:25:34 PM UTC+1, Vincent wrote:
This wasn't really the idea, I just wanted a way to produce a new slice that cannot possibly modify the original array behind the slice (the original slice would remain writable without reallocation).

newSlice := append(T[]{}, originalSlice...)
 
I must admit I don't see why a cow field would introduce any new problem (ie one that doesn't currently exist with the existing len/cap fields)... I am missing something ?

A problem IMO is e.g. the unpredictable performance. Sometimes updating a slice would be fast, sometimes the same happens only after a full but hidden copy. No, thanks.
 
Anyway, I understand from Russ's remark at http://code.google.com/p/go/issues/detail?id=1642#c3 that COW slices would probably be impractical, the cost of doing the check at each write access being prohibitive... So it may be the case that there is no good solution to this problem -- sharing a slice without duplicating the underlying array and without taking the risk to have that array modified -- , at least not without some help from the type system.

On that subject, perhaps I'm biased but I find somewhat unfair that channels did receive a special treatment in the language that other types haven't :

Int, float, bool, string,  []T, struct{} ... are all simple PODS. Channel is a complex beast which guarantees safe concurrent access, does task switching, ... 
 
  today you can make a channel unidirectional and the compiler will check for you that your code only reads from / writes to this channel... Wouldn't it be nice if your could make any type "unidirectional" (ie read-only and perhaps also write-only) and have the compiler ensure that you cannot write code that breaks your specification ?

Wrap R/O data into a R/O method set of an interface and hope the future compiler can optimize the calls away. That way no language change would be required.
 
Of course I'm over-simplifying and there are probably a myriad of reasons why there is currently no such read-only / const / whatever mechanism in the language (I mean, reasons better than : "if you think you need this, you're doing it wrong"). Whatever the reason, my personal experience it that for performance-sensitive code it is usually the case that you want to avoid memory allocations & data-copying as much as possible...

So you don't want COW, right?
 
and to achieve this in Go I don't see any other way than giving away some safety and rely only on convention and documentation, which not a bad way but -- from my point of view -- far from optimal because the compiler would do a much better job of ensuring these rules and could probably even make use of this additional typing information to further optimize the generated code.

Optimization possibilities are next to endless. But let's not forget slowing down the compiler and the law of diminishing results. 

Ian Lance Taylor

unread,
Dec 5, 2011, 2:16:55 PM12/5/11
to Vincent Vanackere, ⚛, golang-nuts
Vincent Vanackere <vincent....@gmail.com> writes:

> Wouldn't it be nice if your could make any type "unidirectional"
> (ie read-only and perhaps also write-only) and have the compiler ensure
> that you cannot write code that breaks your specification ?

That leads us to const poisoning.

I think there might be valid uses for immutable data in Go, but the type
system is not the place for it.

Ian

Tim Harig

unread,
Dec 5, 2011, 2:29:00 PM12/5/11
to golang-nuts
On Mon, Dec 05, 2011 at 07:25:34PM +0100, Vincent Vanackere wrote:

> On Fri, Dec 2, 2011 at 22:04, ??? <0xe2.0x...@gmail.com> wrote:
> This wasn't really the idea, I just wanted a way to produce a new slice
> that cannot possibly modify the original array behind the slice (the
> original slice would remain writable without reallocation).

Use a linked list. Linked lists can be modified without ever having to
move the existing data. It is one of the reasons that I argue for keeping
some basic alternate data structures available in the standard packages.
Sometimes there are better data structures to use then arrays.

Reply all
Reply to author
Forward
0 new messages