Hi,
Haven't had a lot of time to fix this recently but I wanted to address
some of the comments.
2013/3/8 <dvy...@google.com>:
That will get inlined, yes? With some of the other comments you made,
the number of times this is copied will probably decrease (as a result
of implementing the work stealing).
That's fair -- didn't want to make this a bigger change than I needed
to. I'll look at doing this.
Good point. I'm going to remove the other places where I changed this
that aren't specifically related to the run queue. I didn't look at
independent load / store dependencies, and that's clearly showing.
This check won't work. We can't enqueue a new value into the ring if
that would overflow the ring. So, if delta == start, we don't insert.
If we have a ring of size 4, the tail is at 0x3 and the head is at
0x0, this will end up being if(0&mask), which is of course always
false.
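
To make the size-4 example concrete, here is a minimal sketch of a full
check. It assumes head and tail are stored as masked indices and that the
ring keeps one slot empty, so tail 0x3 with head 0x0 counts as full.
RING_SIZE, RING_MASK and ring_full are hypothetical names for illustration,
not the ones in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { RING_SIZE = 4, RING_MASK = RING_SIZE - 1 };

/* Masking only works when the size is a power of two. */
static_assert((RING_SIZE & RING_MASK) == 0, "RING_SIZE must be a power of two");

/*
 * Assumption: head and tail are already masked indices and one slot is
 * always left empty, so the ring is full when advancing the tail by one
 * would land on the head.
 */
static bool ring_full(uint32_t head, uint32_t tail)
{
    return ((tail + 1) & RING_MASK) == head;
}
```

With head at 0x0 and tail at 0x3 this reports full, whereas a test that
only masks the difference collapses to zero in that same case.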
The store fence is required to serialize the update of the tail with
respect to putting the entry in the ring, so it's not really a
question of whether this interface provides the fence. It's to make
sure that all other consumers do not read that the tail is updated
without being guaranteed to be able to read the entry at the tail.
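
As a rough illustration of that ordering, here is a sketch using C11
atomics rather than the runtime's own primitives. It assumes a single
producer, any number of consumers, and free-running 32-bit head/tail
counters (which may differ from the patch). ring_t, ring_put and ring_get
are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 4u

/* Counters wrap at 2^32, so indexing by modulo needs a power-of-two size. */
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE must be a power of two");

typedef struct {
    _Atomic uint32_t head;            /* next entry to consume */
    _Atomic uint32_t tail;            /* next slot to fill */
    _Atomic(void *) slots[RING_SIZE];
} ring_t;

/* Single producer (assumption). Returns false if the ring is full. */
static bool ring_put(ring_t *r, void *item)
{
    uint32_t t = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t h = atomic_load_explicit(&r->head, memory_order_acquire);
    if (t - h == RING_SIZE)
        return false;
    atomic_store_explicit(&r->slots[t % RING_SIZE], item, memory_order_relaxed);
    /* Release store: the slot write above becomes visible before the new tail. */
    atomic_store_explicit(&r->tail, t + 1, memory_order_release);
    return true;
}

/* Any consumer. Returns NULL if the ring is empty. */
static void *ring_get(ring_t *r)
{
    for (;;) {
        uint32_t h = atomic_load_explicit(&r->head, memory_order_acquire);
        /* Acquire load pairs with the release store in ring_put. */
        uint32_t t = atomic_load_explicit(&r->tail, memory_order_acquire);
        if (t == h)
            return NULL;
        void *item = atomic_load_explicit(&r->slots[h % RING_SIZE],
                                          memory_order_relaxed);
        /* Claim the entry only if head is still the value we read. */
        if (atomic_compare_exchange_weak_explicit(&r->head, &h, h + 1,
                                                  memory_order_release,
                                                  memory_order_relaxed))
            return item;
    }
}
```

The release store on the tail plays the role of the store fence here: a
consumer that observes the new tail is guaranteed to also observe the
entry written just before it.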
I will change these for consistency / to reduce diff size as suggested later.
This is to guarantee consistency of our view of the snapshot of the head.
Same reason, guarantee we're looking at the proper state of the head
when we do the assignment from the ring.
It would be possible to do this with a memcpy indeed, but you have to
do the copy every iteration of the loop because head may have changed.
Another way I was thinking of doing it was just to "steal" the blocks
and do the copies once that was successful -- but the problem is that
another producer might invalidate the consistency of the ring by the
time the copy started. I'll do the memcpy approach for now.
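
Roughly what I have in mind for the copy-inside-the-loop version, as a
sketch only. It uses C11 atomics and free-running head/tail counters, both
of which are assumptions, and it copies element by element instead of
calling memcpy because the slots are atomic in this sketch. ring_t and
ring_steal_half are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u

static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE must be a power of two");

typedef struct {
    _Atomic uint32_t head;            /* next entry to consume */
    _Atomic uint32_t tail;            /* next slot to fill */
    _Atomic(void *) slots[RING_SIZE];
} ring_t;

/*
 * Try to take half of the queued entries (rounded up) into out[].
 * Returns the number of entries taken, 0 if the ring is empty.
 */
static uint32_t ring_steal_half(ring_t *r, void *out[RING_SIZE / 2])
{
    for (;;) {
        uint32_t h = atomic_load_explicit(&r->head, memory_order_acquire);
        uint32_t t = atomic_load_explicit(&r->tail, memory_order_acquire);
        uint32_t n = t - h;
        n -= n / 2;
        if (n == 0)
            return 0;
        /* head and tail were read at different times; retry if inconsistent. */
        if (n > RING_SIZE / 2)
            continue;
        /* Copy on every attempt, since head may have moved since the last try. */
        for (uint32_t i = 0; i < n; i++)
            out[i] = atomic_load_explicit(&r->slots[(h + i) % RING_SIZE],
                                          memory_order_relaxed);
        /* The copies only count if head is still where we read it. */
        if (atomic_compare_exchange_weak_explicit(&r->head, &h, h + n,
                                                  memory_order_release,
                                                  memory_order_relaxed))
            return n;
    }
}
```

If the compare-and-swap fails, the copied entries are simply discarded and
the loop starts over from a fresh snapshot of the head.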
To serialize the load with respect to updating the head pointer.
Good call. Will do.
Also good call.
Then we pay the cost for assigning to the old value everywhere, even
when we don't need it. I think this just adds overhead.