First-Class Cancellation


invino4

Apr 23, 2015, 8:38:23 PM
to golan...@googlegroups.com

Moving this conversation from GitHub to the mailing list...

-----

Go should have a first-class cancellation pattern (in much the same way that it has a first-class error pattern). The cancellation pattern should meet the following requirements:

  1. Very Lightweight: Such that it can be used to control cancellation at a very fine-grained level, e.g. it should be perfectly reasonable for every goroutine started in an application to have a cancellation token. It should not be unreasonable for many individual methods to have cancellation.
  2. Least Privilege: The ability to observe cancellation and the ability to cause cancellation MUST be independent.
  3. Broadcast: It MUST be possible for multiple consumers to observe the same cancellation token without impacting each other.
  4. Idempotent: A cancellation token, once cancelled, should never become uncancelled.
  5. Concurrency Safe: It MUST be possible for multiple consumers to observe cancellation in parallel without additional synchronization.
  6. Asynchronous: Cancellation of the token only guarantees that all observers will eventually see the cancellation, NOT that a tree of computation will be cancelled synchronously.
  7. Synchronous Polling: It MUST be possible to synchronously poll whether cancellation has occurred. This allows long-running computations to periodically check whether they should stop running even if they never block on IO.
  8. Blocking Composition: It MUST be possible to compose a cancellation token with blocking I/O in a select statement, e.g. block on a data channel value or cancellation, whichever happens first.
  9. Boolean Composition: It MUST be possible to combine cancellation tokens in an AND or OR manner such that you get a new cancellation token that becomes cancelled when its inputs become cancelled.
  10. Hierarchical Composition: It MUST be possible to create child cancellation tokens that cancel when their parent cancels, but can also be cancelled independently without affecting their parent.
  11. IO Composition: It MUST be possible to create RPC contexts, HTTP contexts, etc. that are linked to a standard cancellation token as their parent.
  12. Debuggable: Cancellation SHOULD provide a diagnostic message that can be used in logging or other debugging to indicate the reason a computation was asked to cancel.

What Is It For

Cancellation allows a tree or pipeline of computation whose output is no longer needed to be terminated. Go makes describing pipelines of computation easy. Goroutines representing each stage of such a pipeline link together via channels to form a computational pipeline whose final channel delivers the final result of the computation. Once a pipeline has produced sufficient output for the final consumer, the pipeline needs to be torn down. Sometimes this corresponds to natural boundaries of the data being produced, but sometimes it doesn't (e.g. when errors are encountered or the consumer disconnects abruptly).
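For concreteness, a minimal sketch of the kind of pipeline stage this describes: it stops producing as soon as a done channel is closed, so the goroutine doesn't leak when the consumer walks away. (The function and channel names here are illustrative only.)

func squares(done <-chan struct{}, in <-chan int) <-chan int {
    out := make(chan int)
    go func() {
        defer close(out)
        for v := range in {
            select {
            case out <- v * v:
            case <-done:
                // The consumer no longer wants output; tear down this stage.
                return
            }
        }
    }()
    return out
}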

What

I propose that cancellation be implemented like errors through an interface of the following form:

type Cancel interface {
    // Returns a channel that is closed when signalled by the cancellation function.  
    // Guaranteed to return the same channel on each call 
    // (i.e. can be safely cached by the caller.)
    Done() <-chan struct{}
    // Err returns the error provided to the cancellation function.  
    // If not yet signalled, Err always returns nil.  
    // (i.e. can be called at any time to poll the state of cancellation.)
    Err() error
}

// CancelFunc is called to signal that cancellation should begin.  
// Signalling the cancellation token is synchronous and atomic.
// CancelFunc is idempotent but only the first reason is preserved.  
// If reason is nil, CancelFunc panics.
type CancelFunc func(reason error)

// NewCancel creates a new cancellation signal and a cancellation 
// function to cancel it.
func NewCancel() (Cancel, CancelFunc) { ... }

// WithParent creates a new cancellation signal and a cancellation 
// function to cancel it.  The resulting token is cancelled when either
// the cancellation function is called or the parent becomes cancelled.
func WithParent(parent Cancel) (Cancel, CancelFunc) { ... }

// And creates a new cancellation signal that becomes cancelled when
// both of its inputs become cancelled.
func And(left Cancel, right Cancel) Cancel { ... }

// Or creates a new cancellation signal that becomes cancelled when
// either of its inputs become cancelled.
func Or(left Cancel, right Cancel) Cancel { ... }

This interface allows for implementations that meet all of the above specifications. Because the Done() channel's write endpoint is closely held in this design, it is possible for a language runtime implementation to use a cheaper channel implementation that allows for Boolean and Hierarchical composition without actually allocating a goroutine to wait on the inputs.
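As a usage sketch (illustrative only, not part of the proposal), a worker consuming a token might look like this, covering both the synchronous-polling and the blocking-composition requirements:

func worker(c Cancel, in <-chan int, out chan<- int) error {
    for {
        // Synchronous polling (req. 7): cheap check between chunks of CPU-bound work.
        if err := c.Err(); err != nil {
            return err
        }
        // Blocking composition (req. 8): wait for input or cancellation, whichever comes first.
        select {
        case v, ok := <-in:
            if !ok {
                return nil
            }
            out <- v * v
        case <-c.Done():
            return c.Err()
        }
    }
}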

Why It Should Be Part Of The Language

For most of the same reasons that error is part of the language. First-class treatment of cancellation will guarantee the proper level of composability that an external implementation cannot achieve.

First-class treatment also allows the core libraries to leverage the pattern.

As mentioned above, a language/runtime-level implementation of composition can be significantly less expensive than would be possible with a pure library implementation. A low cost enables ubiquity, because it allows cancellation to be used in domains that simply would not permit a more expensive implementation.



- Jason.


David Crawshaw

Apr 23, 2015, 9:15:48 PM
to invino4, golang-nuts
Have you seen Context from the x/net/context package? There are some
details in the blog post, which includes a cancellation example:
https://blog.golang.org/context
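For reference, the core of that pattern looks roughly like this (a minimal sketch using golang.org/x/net/context, not code taken from the post itself):

func watch(ctx context.Context, work <-chan int) error {
    for {
        select {
        case v := <-work:
            _ = v // do something with v
        case <-ctx.Done():
            return ctx.Err() // context.Canceled or context.DeadlineExceeded
        }
    }
}

// caller:
//   ctx, cancel := context.WithCancel(context.Background())
//   go watch(ctx, work)
//   ...
//   cancel()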

Egon

Apr 24, 2015, 2:02:27 AM
to golan...@googlegroups.com, jthu...@google.com
Can you supplement this with concrete examples of how this would improve things?

i.e.
1. This is how it looks now: ...
1.1 this isn't ideal because X, Y
1.2. this is better because S
2. This is how it looks afterwards: ...
2.1 this is better because Z, W
2.2 it is worse because Q

Also, why would the x/net/context solution cause problems?

+ Egon

Jason Hunter

Apr 27, 2015, 5:37:31 PM
to Egon, golan...@googlegroups.com
I think the article:


walks through the derivation from first principles of why this design pattern is needed in Go and what it might be used for.  Certainly the article doesn't cover all scenarios, but it covers a very interesting one from which others can easily be imagined.  As much as this article describes several interesting design patterns for cancellation, there are **many many many** ways to skin this cat.  Composability and ubiquity are hurt by having each use case invent its own.  You end up with a bunch of different mechanisms to accomplish the same goal that are not efficiently composable by a program that combines more than one simultaneously.

Lastly, I think there are many reasons that x/net/context would not be appropriate as the general solution to this problem.  Here are a few (I'm sure there are more):

1.)  It is too big.  Context includes things like deadlines and values that don't make sense for pure cancellation.  This increases the size and cost of cancellation, which reduces the granularity at which it can be expressed.  Maximizing ubiquity is, in part, best achieved by minimizing the cost of the mechanism.

2.)  Even the name Context derives mostly from the "values" part of the definition (which pure cancellation wouldn't use), and so the name provides poor discoverability for this as a cancellation feature.  This decreases the probability that new uses will correctly use the feature, which hurts long-term composability.

3.)  It is not part of the core language or even core libraries.  If you wanted to add a cancellable parallel sort function to the "sort" package would you feel comfortable having "sort" take a dependency on x/net/context?  I wouldn't.

4.)  Its implementation is not closely held by the runtime.  This prohibits the runtime from providing more efficient implementations than are achievable with an open-coded solution (i.e. exactly what is available in x/net/context now).  A closely held solution would allow, for instance, a.) a more efficient done-channel implementation that uses less state and is cheaper in select,  b.) more efficient composition (AND/OR/hierarchical) that wouldn't require a full goroutine whose only job is to apply the composition arithmetic and pass the signals on (e.g. time.AfterFunc is cheaper to use than time.After or time.Ticker),  c.) etc.  I'm sure there are many other things that could be done.  Even future language syntax might take advantage of the interface in some way that might not be practical with an open-coded solution.

5.)  Context is meant to represent contextual information associated with a request pathway.  It is meaningful when one end of that pipeline is a request.  However, cancellation applies more broadly.  For instance, cancellation applies to asynchronous state like a cache or background computation that many requests "rendezvous" with but no single request "owns".  Such multi-tenant state/computation still needs a notion of asynchronous cancellation (for the reasons shown in the article above), but its lifetime is not easily described as a function of the requests that interact with it.  As such, asynchronous cancellation should be its own first-class concept that manages the lifetime of these computations.  It is certainly mandatory that such cancellation compose with Context (what I called I/O Composition above) but that doesn't mean they should be the same thing.  

6.)  Furthermore, transitioning from one lifetime domain to the other (e.g. when a background computation itself performs a dependent I/O) should require the programmer to think about that transition.  It may be the case that bounding dependent I/O ONLY by the lifetime of the background computation is the "right thing to do", but more often it is not.  The dependent I/O should have a smaller lifetime, should have additional context, should have a deadline, etc.  Having these expressions of lifetime be different types forces the programmer to think about that transition.
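To make point 6 concrete, here is a hedged sketch (names are illustrative, not from any real code) of such a lifetime transition using x/net/context alone: the background loop holds a long-lived parent context, but each dependent I/O gets its own derived, deadline-bounded context instead of silently inheriting the background lifetime.

func backgroundLoop(parent context.Context) {
    for {
        // Explicit lifetime transition: this dependent call may take at most
        // 5 seconds, and is also cancelled if the whole background computation is.
        ctx, cancel := context.WithTimeout(parent, 5*time.Second)
        err := fetchDependency(ctx) // hypothetical dependent I/O
        cancel()
        if err != nil {
            // log, back off, etc.
        }

        select {
        case <-parent.Done():
            return
        case <-time.After(time.Minute):
        }
    }
}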


- Jason.

Egon

Apr 28, 2015, 4:06:35 AM
to golan...@googlegroups.com, jthu...@google.com, egon...@gmail.com
Thanks for the additional information, but it's still missing concrete real-world examples - for me it's hard to reason about a feature without those. A feature needs to solve a real-world problem before it's even beneficial to think about its actual pros/cons and how it affects everything else. And I really do mean a concrete real-world example, not a contrived or abstract example.

* Maybe the problem you are having can be solved much better with X, Y or Z. But judging only by the information you have given, it's hard to tell.
* Maybe the problem is that the initial domain modeling was problematic and fixing that removes the need for this feature.
* Maybe the actual problem was over-simplified, and this feature doesn't actually solve it when the problem is scaled up.
* Maybe the feature doesn't actually satisfy all the problem requirements, e.g. it needs to be fault tolerant, but using language features for it isn't sufficient.

etc.

I can understand that you have done a lot of thinking on this topic and examined pros/cons of it, but currently it's hard to judge whether the whole chain is sound. i.e.

Domain Problem -> 
Implementation -> 
Problems -> 
Reasoning -> 
Cons/Pros -> 
Solution -> 
Solution Pros/Cons ->
Feature Proposal

I'm missing (mostly) everything before the "Solution" part.

+ Egon

Jason Hunter

Apr 28, 2015, 10:45:48 PM
to Chris Kastorff, Egon, golan...@googlegroups.com
We were also using the plain "<-chan struct{}" approach for a while before moving to the above interface design.  We found the plain channel had a few weaknesses:

1.)  The type didn't constrain the semantics enough.  Even with <-chan struct{} there are still different ways to use it.  For instance, some early attempts on our project wrote a single value into the channel to signal cancellation instead of closing it.  One attempt even inserted the value back again after reading it to support fanout.  Crazy, right?  We also had some other uses of chan struct{} in the code that were not cancellation.  This meant the type itself wasn't enough to explain what the expected semantics were.  The interface approach addressed both of these problems: the mere presence of the interface type in a signature associates the parameter with a very specific set of semantics that a reader can look up, conveyed by the type alone without any additional documentation.

2.)  It is frequently valuable to convey diagnostic information along with cancellation.  Components that are cancelled can log the reason they were cancelled (especially when the diagnostic contains a string that uniquely identifies a sub-computation), which makes correlation of asynchronous activities MUCH easier after the fact when there is a bug.  This led to several different variations on the "<-chan struct{}" scheme which attempted to address this need, including "<-chan error", "<-chan string", and finally "{error, <-chan struct{}}" where both an error and a chan were passed in a struct.  All of these approaches worked for their specific purpose, but different components used different approaches, which proved not very composable.  Further, the semantics become less obvious, making the code more difficult to read and more difficult to maintain.

3.)  The function NewCancel() closely controls the send end of the channel.  It only provides a cancellation function (e.g. func(error)) as a return value.  This still allows you to pass around the 'right to cancel' as a first-class object while simultaneously preventing the caller from abusing the send end (chan<- struct{}) of the channel.  Creating a channel yourself doesn't restrict this use appropriately.  Imagine what would happen if a component erroneously wrote a single value into the channel instead of closing it while multiple other components were waiting on the channel for cancellation: one of the consumers (at random) would observe cancellation while all of the others would block forever.  I admit that is certainly a bug in the code, but when the send end is closely held this kind of bug is impossible.  (A minimal sketch of what such a NewCancel might look like appears at the end of this message.)

4.)  As mentioned above, closely controlling the send-end of the channel allows for runtime optimization in compositions that would not otherwise be possible.  Consider how x/net/context leverages its control over the channel to optimize away the need for a goroutine in parent-child composition when it controls the implementation.  If the channel alone were used for cancellation this optimization would not be possible.  Anywhere that you create a nested cancellation scope (aka child context) you'd need to burn a whole goroutine just to propagate the cancellation from the parent scope to the child scope.  That's pretty expensive.


All that being said, there is no real reason that Cancel needs to be an interface.  It could be a pointer to a struct.  However, the interface has a few benefits.  It avoids accidental copy-by-value bugs if someone forgets that cancellation MUST have reference semantics.  It looks cleaner since you don't have to express it as a pointer.  It is backward compatible with x/net/context, allowing Context to be a cancellation token implicitly.  It allows others to provide their own implementation if someone comes up with a better one.  And lastly, it looks and feels like the error interface, which is already part of the language.
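To make point 3 above concrete, here is a minimal library-level sketch of what NewCancel could look like (illustrative only; it assumes the Cancel and CancelFunc declarations from the top of the thread plus import "sync", and a runtime-level implementation could of course be cheaper):

type cancelToken struct {
    done chan struct{}
    mu   sync.Mutex
    err  error
}

func (c *cancelToken) Done() <-chan struct{} { return c.done }

func (c *cancelToken) Err() error {
    c.mu.Lock()
    defer c.mu.Unlock()
    return c.err
}

func NewCancel() (Cancel, CancelFunc) {
    c := &cancelToken{done: make(chan struct{})}
    var once sync.Once
    cancel := func(reason error) {
        if reason == nil {
            panic("cancel: nil reason")
        }
        once.Do(func() {
            // The send end of done is never exposed; only this closure can
            // close it, and only once, preserving the first reason.
            c.mu.Lock()
            c.err = reason
            c.mu.Unlock()
            close(c.done)
        })
    }
    return c, cancel
}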

- Jason.


On Tue, Apr 28, 2015 at 6:43 PM, Chris Kastorff <encr...@gmail.com> wrote:
At my last job, I implemented a system that included local file
caching. I used "chan struct{}" as my cancellation type directly, only
reading (incl. selecting) and close()ing it.

The resource library I wrote (to abstract over various transfer
protocols) implemented an interface similar to:
    func Download(fromURL string, toFileName string, cancel <-chan struct{}) error
    func Upload(fromURL string, toFileName string, cancel <-chan struct{}) error

That library included retries, which used cancellation like:
    for i := 0; i < retries; i++ {
        if i > 0 {
            select {
                case <-time.After(retryInterval):
                case <-cancel:
                    return ErrCanceled
            }
        }

        // attempt the transfer, passing along the cancel channel
    }

But more interesting was the local file cache. It had a fairly simple interface:
    func (*Cache) Get(url string, cancel <-chan struct{}) (*CachedFile, error)

    // and a helper to avoid having to split and merge goroutines when
    // you want many files in parallel.
    func (*Cache) GetMany(urls []string, cancel <-chan struct{}) ([]*CachedFile, error)

But the Cache had many jobs:
- Cache files in a directory to avoid redownloading
- Watch the cache size and remove the old, currently unused files
- Merge concurrent Get requests on the same URL to avoid bandwidth waste
- Handle cancellation properly (cancel the Download iff all concurrent
Gets on that URL were canceled)

The implementation of Get creates its own cancel channel for the
Download call, which would be shared between all concurrent Gets of
the same URL.
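(As a rough sketch of that merging logic, with hypothetical names rather than
my actual code: each Get registers as a waiter on the shared download, and the
shared cancel channel is closed only when the last interested waiter has
cancelled.)

    type sharedDownload struct {
        mu      sync.Mutex
        waiters int
        cancel  chan struct{} // passed to Download
    }

    func (d *sharedDownload) addWaiter() {
        d.mu.Lock()
        d.waiters++
        d.mu.Unlock()
    }

    // dropWaiter is called when one Get's caller cancels; the Download is
    // only cancelled once every concurrent Get has given up.
    func (d *sharedDownload) dropWaiter() {
        d.mu.Lock()
        defer d.mu.Unlock()
        d.waiters--
        if d.waiters == 0 {
            close(d.cancel)
        }
    }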

The implementation of GetMany is similarly complex, where it creates a
cancel channel *just* for the GetMany call, passes that on to parallel
calls to Get, and if an error occurs on one of those Gets, it will
cancel all the other Gets it made (without cancelling the downstream
user's cancel channel.) Simultaneously, if the caller cancels, we
cancel the Get calls we made as well. (The implementation is about 80
lines long due to the complex error and cancellation handling, so I'm
not including source code for it.)

Each job running on the system would make its own cancel channel for
the job overall, and pass that into each of its calls into the *Cache,
so that jobs can exit cleanly and quickly.

The implementation of the *Cache was one of the most difficult
concurrency issues I've had to work with, but using channels for all
the communication tended to make it relatively easy to do. I needed
non-blocking reads on the cancel channel (to see if I should start the
next operation), blocking selects with other channels (e.g. retries),
and sometimes needed to split off another goroutine to cancel
operations in libraries that don't use a channel directly (like
net/http, *os.File, and various object storage libraries.)

For all the main worker code, dealing with a cancel channel directly
was very, very easy; any time you're waiting on something, select with
the cancel channel and abort if it becomes readable.

Overall, I found the usage of chan struct{} directly to be pretty clean. All
the readers of the cancel channels (which is everything but the place where
it's closed) declare their types as <-chan struct{}, and thus cannot close
the channel (which would be a coding error). The syntax is a bit unintuitive
for the first few minutes, but since it's the same as all other channel
operations, it was easy to get used to. It's also easy to reason about the
semantics of, which is very important when implementing complex things like
the local file cache.

I tried to use the x/net/context library as well as labix's
gopkg.in/tomb.v1 for this, but found both of them to be trying to
solve more problems than cancellation and being more difficult to
think about than using channels directly.

I'm skeptical that any object-like interface will ever be able to match the
clarity of using a close-only channel of struct{} for me, aside from one
literally only exposing the read side of the channel and a close method (and
at that point, why wrap it in an object?). I don't think standardizing the
interface you mentioned would be worth the trouble, but I *DO* think this is
an important part of programming really good Go servers that is often
ignored in libraries I've seen, due to a lack of any standard at all.

Egon Elbre

Apr 29, 2015, 4:37:41 AM
to Chris Kastorff, golan...@googlegroups.com, jthu...@google.com
On Wed, Apr 29, 2015 at 4:43 AM, Chris Kastorff <encr...@gmail.com> wrote:
At my last job, I implemented a system that included local file
caching. I used "chan struct{}" as my cancellation type directly, only
reading (incl. selecting) and close()ing it.

The resource library I wrote (to abstract over various transfer
protocols) implemented an interface similar to:
    func Download(fromURL string, toFileName string, cancel <-chan
struct{}) error
    func Upload(fromURL string, toFileName string, cancel <-chan struct{}) error

That library included retries, which used cancellation like:
    for i := 0; i < retries; i++ {
        if i > 0 {
            select {
                case <-time.After(retryInterval):
                case <-cancel:
                    return ErrCanceled
            }
        }

        // attempt the transfer, passing along the cancel channel
    }

But more interesting was the local file cache. It had a fairly simple interface:
    func (*Cache) Get(url string, cancel <-chan struct{}) (*CachedFile, error)

    // and a helper to avoid having to split and merge goroutines when
you want many files in parallel.
    func (*Cache) GetMany(urls []string, cancel <-chan struct{})
([]*CachedFile, error)

This looks very fragile. I'm not sure what *CachedFile contains, but what if the CachedFile is purged immediately after a Get request? I assume you thought of it.

Anyway, from a gut feeling, I would've used a separate service to handle the caching and avoided letting Cache handle the downloading - for example, used groupcache instead and created some Downloader that could handle cancellation.


Chris Kastorff

Apr 29, 2015, 5:15:35 PM
to Egon Elbre, golan...@googlegroups.com
There's a Close() method on the *CachedFile which must be called if
you get any *CachedFile from Get or GetMany (and you're guaranteed to
only get a non-nil *CachedFile/[]*CachedFile or a non-nil error from
these.) *CachedFile forces the file to never be garbage collected
while it is still open, so there's no safety race. Since that detail
was completely irrelevant to the discussion of cancellation, I skipped
it.

Also, this is used for handling *large* files (tens of gigabytes
each), and a large number of them (about a terabyte per node.)
Groupcache is not meant to handle this kind of load. It's not a
problem solvable by a memcached-like system.

All that said, I'd like to keep this thread focused on cancellation,
not caching.

Jason Hunter

Apr 29, 2015, 6:13:08 PM
to Chris Kastorff, Egon Elbre, golan...@googlegroups.com
I had another thought based on a conversation I was having with Bryan Mills on another thread.  If the cancel interface were part of the language then consider the following additional language extension:

The go-statement is extended (in a backward compatible way) to optionally return a cancel instance if used in an assignment:

cancel := go func() {...}()

With the idea that the cancel token gets closed when the goroutine has completed and been torn down by the runtime.

Further suppose that if the function being called in the go-statement returns one or more values and the last return value is of type error, then the cancel token will return that error value from its Err() method after it becomes resolved.  If the func doesn't return an error value then Err() returns nil.

cancel := go func() error {... return err}()

This has wonderful composability!  Consider this code that someone on that other thread had asked about:

var wg sync.WaitGroup
errs := make(chan error, 3)
a := []int{1, 2, 3}
for i := range a {
    wg.Add(1)
    go func(i int) {
        defer wg.Done()

        // Do the work - what ever that is...  1/2 will fail here.
        var err error
        if i > 0 {
            err = errors.New("some error")
        }

        // Write the outcome, nil for success, non-nil for error
        errs <- err
    }(i)
}
wg.Wait()

for range a {
    if err := <-errs; err != nil {
        fmt.Printf("%v\n", err)
    }
}

This is a typical fork-join where you are interested in whether any of the child computations failed.  Consider this code rewritten using the cancel interface and the above go-statement extension:

var c cancel
for i := range []int{1, 2, 3} {
    next := go func(i int) error {
        // Do the work - what ever that is...  1/2 will fail here.
        if i > 0 {
            return errors.New("some error")
        }
        return nil
    }(i)
    c = cancel.Add(c, next)
}

if <-c.Done(); c.Err() != nil {
    fmt.Printf("%v\n", c.Err())
}

That is pretty.  This does fork-join concurrency using only core language features (no sync.WaitGroup).  The control flow is simple and obvious.

Lastly, Bryan and I talked about this piece of code:

for {
    wg := sync.WaitGroup{}
    for i := 0; i < 100; i++ {
        wg.Add(1)
        go func() {
            defer time.Sleep(time.Nanosecond)
            wg.Done()
        }()
    }
    wg.Wait()
}

and recognized that this code will actually OOM because the programming model cannot guarantee that resources held by the goroutine (namely, the goroutine itself) have actually been released at the synchronization point wg.Done().  If this code were rewritten using the above extension, it looks like:

for {
    for i := 0; i < 100; i++ {
        var c cancel
        c = cancel.Add(c, go func() {
            defer time.Sleep(time.Nanosecond)
        }())
    }
    <-c.Done()
}

Not only is this code simpler but it is guaranteed by the programming model to not OOM.  The aggregate cancellation token will not resolve until *after* the goroutine has already been torn down because the cancellation token is resolved by the runtime and not any code running within the goroutine.  This can only be guaranteed by a language/runtime level feature.

This model has incredible composability.  For instance, instead of fork-join concurrency, it could be used to implement efficient either-or concurrency:

var c2 cancel
c1 := go func() error {
    // do some work and use c2 to check for cancellation.
}()
c2 = go func() error {
    // do some work and use c1 to check for cancellation.
}()
c := cancel.And(c1, c2)

if <-c.Done(); c.Err() != nil {
    fmt.Printf("%v\n", c.Err())
    return
}

This completes when exactly one of the sub-computations completes (either successfully or with an error) but it waits for both to tear down (and so doesn't leak resources or orphan computation).  It accomplishes this by cancelling the other computation when the first one is done, and then waiting for both to tear down.

So sweet.

- Jason.


Jason Hunter

Apr 29, 2015, 6:28:05 PM
to Chris Kastorff, Egon Elbre, golan...@googlegroups.com
Oops, I noticed a typo in my second to last example.  Should have been:

for {
    var c cancel
    for i := 0; i < 100; i++ {
        c = cancel.Add(c, go func() {
            defer time.Sleep(time.Nanosecond)
        }())
    }
    <-c.Done()
}

Sorry for the confusion.

- Jason.

Chris Kastorff

Apr 30, 2015, 12:42:59 AM
to Jason Hunter, golan...@googlegroups.com
That example makes me think of this problem differently, and I start to
see more value in using more than just a bare channel; your examples
have nothing to do with cancellation, but are instead focused on
completion. This object we're discussing really isn't about
cancellation or completion; it's a "set once" variable intended for
many readers and one writer, possibly with a rich reader API (including
combinators like your cancel.Add, compatibility with select, and a
blocking "await", <-c.Done()).

I think my mind was led astray by the name "cancellation". It really
is this generic useful pattern about boolean state that changes
precisely once; cancellation can be a common use case and a good
example, but in terms of what it does, "cancel" implies far too narrow
of a scope, and the other uses seem like they're stretching the
intended use (which shouldn't be the case.) I'm having trouble
thinking of an appropriate word for it that doesn't have narrow
implications...

Egon

Apr 30, 2015, 2:51:54 AM
to golan...@googlegroups.com, jthu...@google.com, encr...@gmail.com, egon...@gmail.com
Not really, I could write a library that could do:

errs := async.Spawn(3, func(i int) error {
    // Do the work - what ever that is...  1/2 will fail here.
    if i > 0 {
        return errors.New("some error")
    }
    return nil
})
fmt.Printf("Errors: %v\n", errs)
result := async.All(
    func(terminated func() bool) error {
        for !terminated() {
            time.Sleep(100 * time.Millisecond)
        }
        return nil
    },
    func(terminated func() bool) error {
        time.Sleep(200 * time.Millisecond)
        return errors.New("blah")
    },
)

// alternatively
if <-result.Done; result.Err() != nil {}

Egon

Apr 30, 2015, 2:53:01 AM
to golan...@googlegroups.com, jthu...@google.com


On Thursday, 30 April 2015 07:42:59 UTC+3, Chris Kastorff wrote:
That example makes me think of this problem differently and start to
see more value in using more than just a bare channel; your examples
have nothing to do with cancellation, but are instead focused on
completion. This object we're discussing really isn't about
cancellation or completion, it's a "set once" variable intended for
many readers, one writer, possibly with a rich reader API (including
combinators like your cancel.Add, compatibility with select, and a
blocking "await" (<-c.Done()).)

I think my mind was led astray by the name "cancellation". It really
is this generic useful pattern about boolean state that changes
precisely once; cancellation can be a common use case and a good
example, but in terms of what it does, "cancel" implies far too narrow
of a scope, and the other uses seem like they're stretching the
intended use (which shouldn't be the case.) I'm having trouble
thinking of an appropriate word for it that doesn't have narrow
implications...

Promise/Future come to mind. :)

Egon

Apr 30, 2015, 2:57:20 AM
to golan...@googlegroups.com, egon...@gmail.com
On Thursday, 30 April 2015 00:15:35 UTC+3, Chris Kastorff wrote:
There's a Close() method on the *CachedFile which must be called if
you get any *CachedFile from Get or GetMany (and you're guaranteed to
only get a non-nil *CachedFile/[]*CachedFile or a non-nil error from
these.) *CachedFile forces the file to never be garbage collected
while it is still open, so there's no safety race. Since that detail
was completely irrelevant to the discussion of cancellation, I skipped
it.  

Also, this is used for handling *large* files (tens of gigabytes
each), and a large number of them (about a terabyte per node.)
Groupcache is not meant to handle this kind of load. It's not a
problem solvable by a memcached-like system.

Yeah, I was somewhat guessing that there had to be some concerns like that.
 

All that said, I'd like to keep this thread focused on cancellation,
not caching.

The goal is not to focus on "cancellation", but on problem solving. "Cancellation" is only as useful as the real problems it can solve.

The download/upload case is a real problem that you had, and hence it can show how cancellation can improve things.

roger peppe

Apr 30, 2015, 3:34:35 AM
to Egon, golang-nuts, jthu...@google.com, encr...@gmail.com
Indeed, there are already several (probably many) packages around
that do just this. Two arbitrary examples:

http://godoc.org/gopkg.in/tomb.v2#Tomb.Go
http://godoc.org/github.com/juju/utils/parallel#Run.Do

There are lots of ways to phrase this kind of thing and
the language seems to be doing pretty well at providing the
required primitives AFAICS.

cheers,
rog.

atd...@gmail.com

Apr 30, 2015, 10:45:54 AM
to golan...@googlegroups.com, jthu...@google.com
I have a package lying around that does just that, although it's not particularly well tested or reviewed.

But typically, you might not want the ability to interrupt execution at such granularity because it will probably not scale.
Plus, it would not be safe to do it naively, so you would need to provide hints about safe spots anyway.
What we have currently is the best solution, I think.

I will come back with what I have.

Jason Hunter

Apr 30, 2015, 2:45:42 PM
to atd...@gmail.com, golan...@googlegroups.com
@rog, @atdiar thanks for the references.  This is EXACTLY my point.  EVERYONE has a library that does this.  I'm sure there are MANY libraries available.  Which is great!  Because doing this is so fundamental to writing concurrent programs that it is invariably necessary.  Unfortunately there are MANY libraries available.  And they are all a little different even though they all solve the same fundamental issues.  The differences are what hurts composability, readability, and maintainability.  I rewrote the above fork-join example using each of the packages that @rog mentioned.  Here are the results:

func jujuVersion() {
    a := []int{1, 2, 3}
    r := juju.NewRun(len(a))
    for i := range a {
        j := i
        r.Do(func() error {
            // Do the work - what ever that is...  1/2 will fail here.
            if j > 0 {
                return errors.New("some error")
            }

            return nil
        })
    }
    err := r.Wait()

    if err != nil {
        switch errs := err.(type) {
        case juju.Errors:
            for _, e := range errs {
                fmt.Printf("%v\n", e)
            }
        default:
            fmt.Printf("%v\n", err)
        }
    }
}

func tombVersion() {
    var t tomb.Tomb
    for i := range []int{1, 2, 3} {
        j := i
        t.Go(func() error {
            // Do the work - what ever that is...  1/2 will fail here.
            if j > 0 {
                return errors.New("some error")
            }

            return nil
        })
    }
    err := t.Wait()

    if err != nil {
        fmt.Printf("%v\n", err)
    }
}

func original() {
    var wg sync.WaitGroup
    errs := make(chan error, 3)
    a := []int{1, 2, 3}
    for i := range a {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()

            // Do the work - what ever that is...  1/2 will fail here.
            var err error
            if i > 0 {
                err = errors.New("some error")
            }

            // Write the outcome, nil for success, non-nil for error
            errs <- err
        }(i)
    }
    wg.Wait()

    for range a {
        if err := <-errs; err != nil {
            fmt.Printf("%v\n", err)
        }
    }
}

These are all very similar but different.  Imagine having to compose these different libraries in the same program!  Even when writing this small sample as one program I ran into package versioning issues (juju actually imports tomb.v1 internally while I was using tomb.v2 directly in the example).  Each version has its own interface (t.Go, r.Do, etc.), its own limitations (tomb supports cancellation, juju doesn't), and I'm sure its own bugs.  Building a large project that might combine different libraries each of which internally uses a different implementation of these fundamentals is a composability nightmare.  

And this simple example doesn't even attempt to do cancellation.  Juju doesn't support cancellation at all in its fork-join framework, so each use-case will roll its own on top.  Tomb supports cancellation, but through a specific method on the tomb object.  So there is no direct way to compose cancellation from a higher scope (say from an x/net/context or an explicit <-chan struct{}) to the tomb object without creating a goroutine that listens on the parent cancellation signal and then calls t.Kill():

cancel := make(chan struct{})  // passed in from higher scope
go func() {
    select {
    case <-cancel:
        t.Kill(errors.New("Cancelled by parent"))
    case <-t.Dying():
    }
}()

Do you want to have to write this every place that two libraries compose cancellation/completion frameworks?  Will you get it right each time or will there be bugs?

These issues with composition, readability, ubiquity, and correctness I believe are very similar to the justifications for including error in the language and defining a single canonical pattern for error handling.  I'm not trying to be innovative here with cancellation or completion tracking.  There is nothing novel here.  I'm suggesting we standardize what everyone is already doing.  

The requirements at the top of this thread try to capture a set of observations that are distilled from many individual attempts at implementing these fundamentals over and over again from primitives in the language.  The go-statement makes concurrency a fundamental element of the language.  Tracking completion and providing cancellation go hand-in-hand with concurrency at all but the most modest scales.  It should also get first-class treatment in the language.

- Jason.

Egon

Apr 30, 2015, 4:04:51 PM
to golan...@googlegroups.com, jthu...@google.com, atd...@gmail.com
I still haven't seen a real-world program containing the problems you are explaining.
I really do mean real-world... something that is running in production. The current examples you are showing are contrived.
 
As for composing multiple of them: as far as I've noticed, such packages are used as internal implementation details - not as the public abstraction. So there shouldn't be any problem composing multiple of them.

atd...@gmail.com

Apr 30, 2015, 7:00:36 PM
to golan...@googlegroups.com, atd...@gmail.com, jthu...@google.com
The thing is, everyone has different requirements (as proven by the numerous packages), so the only sensible thing would be to provide the lowest common denominator. But I am not sure it needs to be absolutely first-class. Again, it depends on what you want to be able to do and whether that makes sense.

I had started a small package to deal with cancellation: https://godoc.org/github.com/atdiar/Components/Task#example-package

Maybe it will inspire something. But look at this with two or three grains of salt if you do. It needs some more work (names, tests). The example seems to do what I want it to, but I don't trust the code yet.

Tried to capture the parent/child goroutine semantics, cancellation by timeout or triggered manually, cancellation propagation, and retrieval of specifics about the nature of the cancellation via an error channel.

Is anything else needed?

(I welcome any review, I hope I haven't written any atrocities :)

da...@gratafy.com

Aug 21, 2015, 9:05:39 PM
to golang-nuts, jthu...@google.com
Regarding real-world use-cases, I'm using exactly this proposed interface to support graceful shutdown of a tree of goroutines.  I'm sure everyone has their own flavor, but it would be helpful if there was at least one version of this pattern that was part of the standard library.
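For what it's worth, the tree-shutdown case looks roughly like this with the proposed interface (a hedged sketch assuming the NewCancel/WithParent signatures from the top of the thread; the worker details are made up):

func runService() {
    root, shutdown := NewCancel()

    for i := 0; i < 3; i++ {
        child, _ := WithParent(root)
        go func(id int, c Cancel) {
            <-c.Done()
            log.Printf("worker %d stopping: %v", id, c.Err())
            // flush buffers, close connections, etc.
        }(i, child)
    }

    // On SIGTERM (or similar), cancel the whole tree:
    shutdown(errors.New("service shutting down"))
}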

xiio...@gmail.com

Aug 22, 2015, 7:33:50 AM
to golang-nuts, encr...@gmail.com, egon...@gmail.com, jthu...@google.com


On Wednesday, 29 April 2015 23:13:08 UTC+1, Jason Hunter wrote:
I had another thought based on a conversation I was having with Bryan Mills on another thread.  If the cancel interface were part of the language then consider the following additional language extension:

The go-statement is extended (in a backward compatible way) to optionally return a cancel instance if used in an assignment:

cancel := go func() {...}()

With the idea that the cancel token gets closed when the goroutine has completed and been torn down by the runtime.


I haven't fully absorbed this yet - but I like it. What about scope, though...

Doesn't returning to a channel achieve what is wanted, with less new stuff?
c <- go func(){..}()
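Something along these lines already gets you a completion signal today without any language change (a rough sketch, names hypothetical):

func spawn(f func() error) (done <-chan struct{}, err func() error) {
    d := make(chan struct{})
    var res error
    go func() {
        defer close(d)
        res = f()
    }()
    // err() is only meaningful after done has been closed.
    return d, func() error { return res }
}

// usage:
//   done, err := spawn(task)
//   <-done
//   if err() != nil { ... }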