Go channels don't say they're closed until they've emitted junk past
the end of their content. Is this intentional?
Issac
package main

import (
	"fmt";
	"container/vector";
)

// This appends an extra zero to the output array.
func getTooMuch(in chan int, out chan []int) {
	v := vector.NewIntVector(0);
	for !closed(in) {
		v.Push(<-in)
	}
	out <- v.Data();
}

// This outputs an array containing only the input numbers.
func getJustRight(in chan int, out chan []int) {
	v := vector.NewIntVector(0);
	for {
		x := <-in;
		if closed(in) {
			break
		}
		v.Push(x);
	}
	out <- v.Data();
}

func run(get func(in chan int, out chan []int)) []int {
	inChan := make(chan int);
	outChan := make(chan []int);
	go get(inChan, outChan);
	inChan <- 1;
	close(inChan);
	result := <-outChan;
	return result;
}

func main() {
	fmt.Printf("%v\n", run(getTooMuch));
	fmt.Printf("%v\n", run(getJustRight));
}
> Go channels don't say they're closed until they've emitted junk past
> the end of their content. Is this intentional?
Yes. This avoids a race condition between detecting that a channel
has been closed by another goroutine and retrieving the last value.
http://golang.org/doc/go_spec.html#Close_and_closed
Ian
On Dec 17, 11:59 am, Ian Lance Taylor <i...@google.com> wrote:
Ah, good to know.
In case anyone else is searching for this, here is a more concise way
to iterate over the channel without getting junk at the end:
func getJustRight(in chan int, out chan []int) {
	v := vector.NewIntVector(0);
	for x := range in {
		v.Push(x);
	}
	out <- v.Data();
}
Thanks,
Issac
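
For readers on a current Go toolchain: `container/vector` and the `closed` builtin used in this thread were both removed before Go 1. Here is a minimal sketch of the same range-based collector using a plain slice. The names `collect` and `run` are made up for illustration and do not come from the thread.

```go
package main

import "fmt"

// collect drains in until it is closed, then sends everything it saw on out.
// The range loop ends when the channel is closed and empty, so no trailing
// zero value is appended.
func collect(in <-chan int, out chan<- []int) {
	var v []int
	for x := range in {
		v = append(v, x)
	}
	out <- v
}

// run feeds the given values to collect and returns what came back.
func run(values ...int) []int {
	in := make(chan int)
	out := make(chan []int)
	go collect(in, out)
	for _, x := range values {
		in <- x
	}
	close(in)
	return <-out
}

func main() {
	fmt.Println(run(1)) // prints [1]
}
```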
Rephrasing code from last answer:
| exists := false;
| for x := range in {
| v.Push(x);
| exists = true;
| break;
| }
|
| Would take one element if it exists.
I think that having b = <-a return true or false instead of value of
"a" would break some rules known from C and be odd to many, but has
more actual value than returning the value of "a".
Moreover, having:
a, b, c := <-x, <-y, <-z
should return true if none of those channels are closed (to keep away
the series of if's, which would be bad style).
Thus, one could do:
if a, b, c = <-x, <-y, <-z {
// do something with those values
} else {
// break the loop
}
As such, select statement should have an "else" clause when all
channels are closed.
This would not change a behavior of such kind of if clauses:
if <-boolchannel {
if you do:
for x := range in {
put <- x
}
then you won't send any junk down put. (the for range statement
discards the zero value)
> Rephrasing code from last answer:
> | exists := false;
> | for x := range in {
> | v.Push(x);
> | exists = true;
> | break;
> | }
> |
> | Would take one element if it exists.
>
> I think that having b = <-a return true or false instead of value of
> "a" would break some rules known from C and be odd to many, but has
> more actual value than returning the value of "a".
Assignment is not an expression in Go, so "b = <-a" does not evaluate
to anything.
We've discussed having the closed function return two values (perhaps
under a different name).
> As such, select statement should have an "else" clause when all
> channels are closed.
The select statement already has a default case. But when a channel
is closed the case for that channel is taken, and that is the way it
should work.
Channels are quite useful without ever worrying about closing them.
The point of closing a channel is to give a unique indicator for the
range clause, one that works for all channel types. If you aren't
using a range clause there are many data-specific ways to indicate
end-of-data on a channel, closing it just being one of them.
Ian
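
One data-specific way to mark end-of-data without closing the channel is a sentinel value. The sketch below assumes real data is never negative; `endOfData` and `sumUntilSentinel` are hypothetical names chosen for the example.

```go
package main

import "fmt"

// endOfData is a hypothetical sentinel. It only works because this example
// assumes real data never takes the value -1.
const endOfData = -1

// sumUntilSentinel adds values from in until it sees endOfData.
// The channel is never closed.
func sumUntilSentinel(in <-chan int) int {
	total := 0
	for {
		x := <-in
		if x == endOfData {
			return total
		}
		total += x
	}
}

func main() {
	in := make(chan int)
	done := make(chan int)
	go func() { done <- sumUntilSentinel(in) }()
	for _, x := range []int{1, 2, 3} {
		in <- x
	}
	in <- endOfData
	fmt.Println(<-done) // prints 6
}
```

The trade-off is the one described above: a sentinel depends on the element type having a spare value, whereas close works for every channel type.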
close is also useful for indicating EOF when there are many
readers reading from a single channel (e.g. a set of workers).
it means you don't have to have a separate channel to tell them
to exit (which means they don't have to take the performance hit of
an alt)
unfortunately you get a panic if you try to use this technique
and you have more than 2048 workers.
i'm not convinced this is ideal behaviour.
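
A minimal sketch of the many-readers pattern in current Go, where close is the only shutdown signal. The function name `sumWithWorkers` is hypothetical. As far as I am aware, current releases no longer count reads on a closed channel, so the sketch does not try to reproduce the 2048 limit.

```go
package main

import (
	"fmt"
	"sync"
)

// sumWithWorkers starts n workers that all read from one jobs channel.
// Closing jobs is the only shutdown signal: each worker's range loop
// ends once the channel is closed and drained.
func sumWithWorkers(n int, values []int) int {
	jobs := make(chan int)
	var (
		mu    sync.Mutex
		total int
		wg    sync.WaitGroup
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for x := range jobs {
				mu.Lock()
				total += x
				mu.Unlock()
			}
		}()
	}
	for _, x := range values {
		jobs <- x
	}
	close(jobs) // tells every worker there is no more work
	wg.Wait()
	return total
}

func main() {
	fmt.Println(sumWithWorkers(4, []int{1, 2, 3, 4, 5})) // prints 15
}
```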
Maybe the way of:
value, ok := <-chan, chan.closed()
would make the most sense in all cases. It should be supported by the select clause, too.
As you probably know, the 2048 limit is to catch a single channel
spinning on a read of a closed channel. But we probably don't want to
record which channel tried the read, since that would increase the
size of the channel structure for everybody to handle a rather unusual
case.
You could handle your case with minor efficiency loss by using a tree
of channels. But you would have to know about the issue first.
What is the case where you have 2048 readers and you want to be able
to shut them down but not actually exit the program?
Ian
inspired by john asmuth's example, i implemented a concurrent
reduce operator where many workers read values from a channel
and reduce them to a single value.
it's an interesting problem actually, and there are quite
a few ways of doing it. one distinction between appropriate
algorithms seems to be how fast the workload
can be computed and how much it takes in the way
of local resources.
for instance, if i've got a 4 processor local machine and
the tasks are compute bound, then it doesn't make
any sense to run more than four workers at once.
but if i'm sending jobs out over the network with minimal
data that take some time, then it makes sense to have
as many going as i have remote workers (maybe > 2048).
to fan the channel out to a tree seems unnecessary,
especially as it slows things down considerably when the
tasks are small. for example, in this example: http://gopaste.org/view/3p52Z
the constant overhead is really quite low.
to be honest, i think i'd prefer it if read and write operations
on a closed channel were distinguished. it's in the nature of read
operators that the code usually depends on the value read,
so out-of-control spinning is going to be much rarer.
you get that same asymmetricality with unix pipes - writes kill
the process where reads just return 0. occasionally you do
get a spinning process, but it's pretty rare.
> to be honest, i think i'd prefer it if read and write operations
> on a closed channel were distinguished. it's in the nature of read
> operators that the code usually depends on the value read,
> so out-of-control spinning is going to be much rarer.
>
> you get that same asymmetricality with unix pipes - writes kill
> the process where reads just return 0. occasionally you do
> get a spinning process, but it's pretty rare.
I suppose the difference is that read from a pipe returning 0 means
that you got no data, whereas read from a channel returning 0 means
that you got a value which happened to be 0.
I agree that it would be nice if this could be handled better, but I'm
not sure how. Suggestions certainly welcome.
Ian
"val, ok := <-ch" should be a threadsafe operation that blocks, rather
than a non blocking poll. Equivalent to "val, ok := <-ch, closed(ch)"
except it would be threadsafe.
Then to do a non-blocking poll of ch, you'd have to use select
select {
case val = <-ch:
	ok = true
default:
	ok = false
}
I think that most of the time people want to block on the channel,
since it's often used for synchronization.
I can't think of an elegant way to do this without changing the
meaning of "_,_ = <-ch".
- John
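
For comparison, later Go releases ended up with semantics close to this proposal: `val, ok := <-ch` blocks, and `ok` is false only once the channel is closed and drained. A non-blocking poll is written with select and default. The sketch below illustrates both; the helper names `recv` and `poll` are hypothetical.

```go
package main

import "fmt"

// recv blocks until a value arrives or ch is closed and drained.
// ok is false only in the second case.
func recv(ch <-chan int) (val int, ok bool) {
	val, ok = <-ch
	return val, ok
}

// poll never blocks. ready is false when nothing can be received right now.
// On a closed channel the receive case is always ready, so poll then
// reports (0, true); use a comma-ok receive in the case to tell them apart.
func poll(ch <-chan int) (val int, ready bool) {
	select {
	case val = <-ch:
		return val, true
	default:
		return 0, false
	}
}

func main() {
	ch := make(chan int, 1)
	fmt.Println(poll(ch)) // prints 0 false
	ch <- 7
	close(ch)
	fmt.Println(recv(ch)) // prints 7 true
	fmt.Println(recv(ch)) // prints 0 false
}
```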
On Dec 18, 1:25 pm, Ian Lance Taylor <i...@google.com> wrote:
> This would break plenty of code, so it might not be feasible.
>
> "val, ok := <-ch" should be a threadsafe operation that blocks, rather
> than a non blocking poll. Equivalent to "val, ok := <-ch, closed(ch)"
> except it would be threadsafe.
There has been some discussion of changing the function closed to
return two values like this (and possibly renaming it at the same
time).
At first I didn't see how that would address this issue, but now I do;
when using the closed (or whatever) function on a closed channel, we
wouldn't increment the error count. Sounds like another argument in
favor of this change.
Ian
i think i like this approach.
non-blocking receive is rarely used, and when it is,
it's often for the wrong reason.
by making it more syntactically heavyweight, it perhaps
becomes a less obvious solution, and it'll stand out more
in the code.
and with gofmt, making the change shouldn't be too hard.
if you think you need to busy wait, you're doing things
the wrong way. in quite a few years of writing code
with these kinds of primitives, i have *never* needed to busy wait.
it won't work properly either - you've turned a synchronous channel
into a channel with a buffer size of 1 - which can easily
introduce deadlock.
the basic idea, though, is a possibility - you could call it "dup".
the underlying data structure would look like:
struct Channel {
	int closed;
	HChan *c;
}
so all the actual channel operations go to the underlying
channel, the close is unique for each dupped channel.
but would dup(c) be equal to c?
> Is there a way to ask a channel if someone is trying to read it? That could
> fix the buffer-of-1 issue. I can also think of a way to do this without busy
> wait, if that question can be asked.
There is no way to ask whether some goroutine is waiting to read a
channel without sending it a value. You can use a nonblocking send,
of course. I think that asking whether a goroutine is waiting to read
without sending a value would be vulnerable to race conditions.
Ian
no, you're right - what you're trying to do is what a channel does,
and that would be fine if you had a function-based interface,
but not when you want a channel.
but i still stand by my statement. if you *ever* find yourself
busy waiting, it's a sure sign you're doing something wrong.
For the blocking check-if-closed/get-value:
var v int
ok := false
for v = range in { ok = true; break }
I showed you how, with a pair of cheap wrap/unwrap operations, you can
get the same effect without a busy wait by passing boxed values that
can be distinguished from the 0 passed when the channel is closed.
Ian then pointed out that you can box more simply by having a channel
that passes pointers to things, then testing whether the pointer is
nil. If you're always passing a pointer to actual things, then nil
only shows up when the channel is closed.
This provides race free ways to pass arbitrary data through a channel
in a threadsafe way. However you can't make the wrapping/unwrapping
transparent in the way that your code did.
Personally I don't think that the busy wait overhead is justified by
eliminating that visible wrapping code.
Cheers,
Ben
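
A minimal sketch of the pointer variant described above, assuming senders never send a nil pointer. The function name `drain` is hypothetical.

```go
package main

import "fmt"

// drain receives *int values until it sees nil. Assuming senders never
// send a nil pointer, nil can only be the zero value produced by a closed
// channel, so a legitimate 0 payload is never mistaken for end-of-data.
func drain(in <-chan *int) []int {
	var got []int
	for {
		p := <-in
		if p == nil {
			return got
		}
		got = append(got, *p)
	}
}

func main() {
	in := make(chan *int, 3)
	for _, x := range []int{0, 1, 0} {
		x := x // fresh copy so each pointer is distinct
		in <- &x
	}
	close(in)
	fmt.Println(drain(in)) // prints [0 1 0]
}
```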
On 12/18/09, John Asmuth <jas...@gmail.com> wrote:
Define "what your code does".
> On Fri, Dec 18, 2009 at 2:21 PM, roger peppe <rogp...@gmail.com> wrote:
> > if you think you need to busy wait, you're doing things
> > the wrong way. in quite a few years of writing code
> > with these kinds of primitives, i have *never* needed to busy wait.
> >
> I'd love to see a way to do what my code does without a busy wait. I don't
> think the language primitives are there.
>
I think you are simply wrong (and I worded badly what I thought).
There is no single reason to have "val, ok := <-ch, closed(ch)" to be
non-threadsafe. I think that too few people want to do it in non-
threadsafe way to put an exception there; better one can write her own
non-threadsafe function and use that if needed. I think that one-line
channel reads should be specifically ordered for thread-safety in such
way that this "ok" exactly says if channel was open when "val" was
sent.
Having val, ok := <-ch removes way to do "val, val2 := <-ch, <-ch2;"
and makes odd "ch, ch2 <- val, val2".
Which would panic() if all channels were not closed at once? This
would guarantee that all receives were in sync with sends. This would
also have this use:
Sender:
ch1, ch2, ch3, error, errorplace <- val1, val2, val3, errval,
reflection.callStack();
close(ch1, ch2, ch3, error, errorplace);
Receiver 1:
v1, v2, v3, ok := <-ch1, <-ch2, <-ch3, closed(ch1, ch2, ch3); // Will
panic if closed(ch1) != closed(ch2) || closed(ch2) != closed(ch3)
Receiver 2:
error, errorplace, ok := <-errch, <-errpch, closed(errch, errpch) //
Will panic if closed(errch) != closed(errpch)
This is a kind of assert for safety - did anything read a value at
wrong time, it will fail.
On Dec 18, 4:06 pm, Qtvali <qtv...@gmail.com> wrote:
> On Dec 18, 8:39 pm, John Asmuth <jasm...@gmail.com> wrote:
>
> > This would break plenty of code, so it might not be feasible.
>
> > "val, ok := <-ch" should be a threadsafe operation that blocks, rather
> > than a non blocking poll. Equivalent to "val, ok := <-ch, closed(ch)"
> > except it would be threadsafe.
>
> I think you are simply wrong (and I worded badly what I thought).
> There is no single reason to have "val, ok := <-ch, closed(ch)" to be
> non-threadsafe.
I was not saying that "<-ch,closed(ch)" shouldn't be threadsafe. I am
saying it simply *is not* threadsafe. "val, ok := <-ch, closed(ch)" is
syntactic sugar for the following code:
val := <- ch
ok := closed(ch)
What is to say that some other thread won't try to read from ch in
between those two statements? Unless you propose that any instructions
that are separated by a comma be atomic (and that isn't a good idea
for a number of reasons), the "<-ch,closed(ch)" idiom will not be
threadsafe.
- John
I was being sarcastic for some reason, sry. Possibly as my proposal
was left unread :)
What I meant - what stops just to write this case into the manual?
I mean:
"a, b = b, a" // By definition: handles the case of swap, not ends up
with a == b
looking at:
val, ok := <-ch, closed(ch);
val, ok := <-ch
If the latter can be parsed, why not the first?
The general rule would be:
* Having "closed(x)" keyword on the same comma-separated list of
expressions with <-x will make it so that <-x will be executed first.
If there is a case:
func a() int { return <-ch; }
And:
val, ok := a(), closed(ch);
They are not on the same line and thus this is not thread-safe.
Having:
func a(int) { }
And:
_, ok := a(<-ch), closed(ch);
Now they are inside the same expression list, thus "ok" will be
"false" if and only if the value from <-ch was nilled.
These are pretty simple rules.
This was actually an implication joke.
If a is equivalent to b except in a having parameter c, then it
implies that b has no parameter c. So I joked about the sentence built
in such way as a suggestion to have no thread-safety in the latter.
I think that the "val, ok" case would look rather nice and clever -
and it certainly is undocumented case right now so that one of
possible things it could do occasionally is exactly the case, which is
needed; other possibilities can just be dropped and then it's very
nice way to do this thing. I would love to read that from the manual
about as much as I would love to read that "a, b = b, a" does
something useful instead of giving undocumented result.
> What I meant - what stops just to write this case into the manual?
>
> I mean:
> "a, b = b, a" // By definition: handles the case of swap, not ends up
> with a == b
The "b" and "a" on the right hand side are evaluated independently,
but they are not evaluated atomically.
> looking at:
> val, ok := <-ch, closed(ch);
> val, ok := <-ch
>
> If the latter can be parsed, why not the first?
>
> The general rule would be:
> * Having "closed(x)" keyword on the same comma-separated list of
> expressions with <-x will make it so that <-x will be executed first.
That rule already exists, but I think you want something else: you
want the two expressions ("<-ch" and "closed(ch)") to be computed
atomically. That is very unlikely to be added to Go.
Ian
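
A small sketch of the two ordering rules mentioned here, in current Go: the right-hand operands of a tuple assignment are evaluated before anything is assigned, and a receive written to the left of a function call runs first. Neither rule makes the pair atomic with respect to other goroutines. The helpers `swap`, `pending` and `recvAndCount` are hypothetical.

```go
package main

import "fmt"

// swap relies on the rule that all right-hand operands are evaluated
// before any assignment happens.
func swap(a, b int) (int, int) {
	a, b = b, a
	return a, b
}

// pending reports how many values are buffered in ch.
func pending(ch chan int) int { return len(ch) }

// recvAndCount receives first and then calls pending, in that order.
// Another goroutine could still touch ch between the two steps.
func recvAndCount(ch chan int) (int, int) {
	v, n := <-ch, pending(ch)
	return v, n
}

func main() {
	fmt.Println(swap(1, 2)) // prints 2 1
	ch := make(chan int, 2)
	ch <- 10
	ch <- 20
	fmt.Println(recvAndCount(ch)) // prints 10 1
}
```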
"val, ok <- ch"
I really like this. I thought that the language worked like this originally, and was surprised when it didn't.
I think the current way lacks orthogonality with select and isn't consistent with other uses of val,ok.
Ryanne
- from my phone -
You can prevent race condition if you do, for example:
<?-chan; // A method to read when there is a writer
chan <?- value; // A method to write when there is a reader
First of those two should set "hasReader" true for some 1
milliseconds, 2 milliseconds on second nearby try etc. Second one
should set "hasWriter" true for some 1 milliseconds. I don't know if
it's reasonable, but it's a way to avoid a race condition [to keep it
general - not this specific timer, but the conception that the check
if a channel has writer will make it more probable that it has a
reader].
Such kind of channel is especially slow in case there are few threads
sending messages to each other most of time. But it's also probable
that when doing this kind of check, this routine actually has
something to do.
Another, better way to avoid a race condition is still making channels
bound to specific routines - if a bind occurs, then it will have a
reader/writer; if the routine dies or an unbind happens, it changes its
state to not having a writer or to closed. Such bindings, of course, must be
managed as well as memory.
I give an example of avoiding this race condition. It might be, and
probably is, somewhat inefficient solution, but that was still an
interesting play with this conception :)
What follows is pseudocode. It is fairly inefficient (it takes a lot of
memory), but it is probably a good starting point, and this kind of
statistical bookkeeping can usually be made compact. It also makes about 20
useless receive attempts. It uses the float type, which could be uint64,
to keep the calculations clearer. I think it would need another channel
type that allows waiting anyway, otherwise it is a bad way to waste
memory; perhaps even a function that returns sender and receiver
channels of a given type with a given “normwait” and other constants,
once generics are done. In any case, being able to overload the channel
operators would be nice here.
Expected behavior: if receiver and sender appear at the same time,
there is short cycle with timeouts getting longer and longer until
they catch each other, after that the configuration is considered to
be OK, small delays can be there for a while, making configuration
more exact. If sender disappears, receiver stays, then receiver is
having some delays for a while until it stops waiting there. If new
sender comes now, it will possibly wait for receiver enough time. If
receiver disappears, some time goes and then sender disappears, new
receiver would have delays in beginning until values left from old
sender are cleaned. Some tuning and it could be reasonably fast; like
some engine switched on or off.
// It supposes that there is some normal waiting time – if waiting
// takes longer, it supposes that it's OK to not wait for it.
const normwait = 1.0;

type waitingChannel struct {
	// Time, when last receive was done
	// in some native time units.
	time_last_receive_try float;
	time_last_send_try float;
	// Time amounts between two receive tries
	time_receive [20]float;
	time_send [20]float;
	time_receive_sum float;
	time_send_sum float;
	// Current pointer to receive or send array
	current_receive byte;
	current_send byte;
	// Count of failed tries
	failed_receive float;
	failed_send float;
}

// no division-by-zero checks
func (w waitingChannel) try_receive() (data int) {
	if there_is_sender {
		// return received data (parameters OK).
	} else_at_the_same_time_break_when_sender_is_there {
		// timeshot before wait [(?)]
		current_time := ctime() - w.time_last_receive_try;
		fs := w.failed_send / (w.time_send_sum / w.time_receive_sum);
		sleep(w.time_send_sum / 20 * (w.failed_send if w.failed_send>1.0 else 1.0));
		// timeout, so atomically:
		w.current_receive++;
		w.time_receive_sum -= w.time_receive[w.current_receive];
		w.time_receive[w.current_receive] = current_time;
		w.time_receive_sum += w.time_receive[w.current_receive];
		// Balance that receive failed and sender might be not there
		// also their lower boundary is zero, should be checked
		w.failed_receive += 2 * w.time_receive_sum / w.time_send_sum;
		w.failed_send -= w.time_send_sum / w.time_receive_sum;
	}
	if w.time_last_receive_try != 0 {
		w.time_last_receive_try = w.current_receive;
	}
}
// There is exact analog for sending
The race is a fundamental "time of check to time of use" error:
1. you check whether the goroutine is receiving; it is.
2. someone else sends to the goroutine; now it's not.
3. because it was receiving before, you try to send to the goroutine.
Adding timeouts and retries and such may obscure the
race but cannot avoid it. The way to avoid it is to do
the check and the send at the same time.
Russ
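
In current Go, "do the check and the send at the same time" is usually written as a select with a default case. Here is a minimal sketch, with `trySend` as a hypothetical helper name:

```go
package main

import "fmt"

// trySend delivers v only if the send can proceed right now, meaning a
// receiver is already waiting or there is free buffer space. The readiness
// check and the send are a single operation, so there is no window between
// "checked" and "sent" for another goroutine to slip into.
func trySend(ch chan<- int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	ch := make(chan int, 1)
	fmt.Println(trySend(ch, 1)) // prints true: the buffer has room
	fmt.Println(trySend(ch, 2)) // prints false: buffer full, nobody receiving
}
```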
> On Dec 18, 9:41 pm, Ian Lance Taylor <i...@google.com> wrote:
>> John Asmuth <jasm...@gmail.com> writes:
>> There is no way to ask whether some goroutine is waiting to read a
>> channel without sending it a value. You can use a nonblocking send,
>> of course. I think that asking whether a goroutine is waiting to read
>> without sending a value would be vulnerable to race conditions.
>>
>> Ian
>
> You can prevent race condition if you do, for example:
> <?-chan; // A method to read when there is a writer
> chan <?- value; // A method to write when there is a reader
Those are already in the language.
What I meant was that there is no way to test whether a goroutine is
waiting to read without sending it a value.
Ian
I read this in the Language Specification
"Upon creation, a channel can be used both to send and to receive
values. By conversion or assignment, a channel may be constrained only
to send or to receive. This constraint is called a channel's
direction; either send, receive, or bi-directional (unconstrained)."
but have not checked the implementation. Which is true?
Duncan.
This was context-specific notion of being unidirectional - you can,
yes, both send and receive with a channel, but you can't do it so that
one side is sending and receiving, but does not get messages sent by
itself. If you send something to channel and right after that receive
something from that, you might get something you just sent by
yourself. Thus, you can't create an int channel to send queries and get
replies - queries and replies will be messed up, there is no way to
receive only queries by one side and only replies by another. Anyway,
you don't need different channels for sending queries and receiving
them.
Thanks Qtvali. That is a lot clearer now. Yes I had already tripped
over the problem of receiving what I had just sent.
that's only true if the channel is buffered.
for synchronous (unbuffered) channels, the default case, it is not possible
to read something that you've just written.
> Thus, you cant create an int channel to send queries and get
> replies - queries and replies will be messed up, there is no way to
> receive only queries by one side and only replies by another.
you absolutely *can* create an int channel to send queries and get
replies, although if there's more than one process sending
queries, and more than one can be processed concurrently,
then you'll need a separate reply channel to ensure that
the reply goes to the correct process.
i'm afraid i don't know what russ meant when he said:
> Channels are unidirectional.
because the only unidirectional channels that i'm aware of
are those with the <- type annotation.
however it is often conventional to treat particular channels as
unidirectional, even if they're not so annotated.
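
A short sketch of the `<-` type annotations in current Go syntax. A channel is bidirectional when created and is constrained by assignment or by passing it to a function with a directional parameter. The names `produce` and `total` are hypothetical.

```go
package main

import "fmt"

// produce may only send on out; a receive from out would not compile.
func produce(out chan<- int, n int) {
	for i := 0; i < n; i++ {
		out <- i
	}
	close(out)
}

// total may only receive from in; a send on in would not compile.
func total(in <-chan int) int {
	sum := 0
	for x := range in {
		sum += x
	}
	return sum
}

func main() {
	// ch is bidirectional when created.
	ch := make(chan int)
	// Passing ch converts it implicitly to chan<- int and <-chan int.
	go produce(ch, 5)
	fmt.Println(total(ch)) // prints 10
}
```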
I meant that the information flow on channels is
unidirectional (except for the implied knowledge
that someone got the value you sent).
I was replying to the suggestion that
c <- f()
first ensure that the send is ready before calling f.
Doing that would make the information flow bidirectional,
because you'd be able to tell that a value was wanted
before computing the value. If you need that, it can
be simulated with two channels, one to request a value
and one to read the value back.
Russ
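
A minimal sketch of that two-channel simulation in current Go: the value is computed only after a request arrives. The function name `serve` and the `compute` callback are assumptions made for the example.

```go
package main

import "fmt"

// serve waits for a request, and only then computes a value and sends it
// back. When requests is closed, it closes values and returns.
func serve(requests <-chan struct{}, values chan<- int, compute func() int) {
	for range requests {
		values <- compute()
	}
	close(values)
}

func main() {
	requests := make(chan struct{})
	values := make(chan int)
	calls := 0
	go serve(requests, values, func() int { calls++; return calls * 10 })

	requests <- struct{}{} // ask for a value
	fmt.Println(<-values)  // prints 10
	close(requests)
}
```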