template<typename T>
class spsc_lifo_stack
{
public:
    struct node
    {
        node* volatile next_;
        T data_;
        node(node* next = 0) : next_(next) {}
    };

    spsc_lifo_stack()
    {
        head_ = new node();
    }

    ~spsc_lifo_stack()
    {
        node* head = head_;
        node* next;
        while (head)
        {
            next = head->next_;
            delete head;
            head = next;
        }
    }

    void push(T const& v)
    {
        node* n = new node(head_);
        n->next_ = head_;
        head_->data_ = v;
        head_ = n; // store-release
    }

    bool pop(T& v)
    {
        node* volatile head = head_; // load-depends
        node* volatile next = head; // load-depends
        if (0 == next)
            return false;
        v = next->data_;
        head->next_ = next->next_;
        return true;
    }

private:
    node* volatile head_;
};
Dmitriy V'jukov
> void push(T const& v)
> {
> node* n = new node(head_);
> n->next_ = head_;
  ^^^^^^^^^^^^^^^^^
I don't think you have to set n->next_ right here, because the constructor of
node already did that for you.
> head_->data_ = v;
> head_ = n; // store-release
> }
>
> bool pop(T& v)
> {
> node* volatile head = head_; // load-depends
> node* volatile next = head; // load-depends
> if (0 == next)
> return false;
> v = next->data_;
> head->next_ = next->next_;
> return true;
> }
>
> private:
> node* volatile head_;
> };
I need to look over this code some more. Okay, we got fast spsc-queue/stack.
You can always apply the multiplexing thing I mentioned.
Humm... Well, it's a little more difficult to create an spsc_lifo because the
threads are always meeting; there is no separate head and tail pointer.
It seems like you are trying to bind the push function to the head_ pointer,
and the pop function to the head_->next_ pointer.
>
> Dmitriy V'jukov
I think if you combine the spsc_stack and spsc_queue algorithms you
can get spsc_deque.
I.e. one thread can execute push_front() or push_back(), and another
pop_front() or pop_back().
I think the implementation should be straightforward.
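What such a combined interface might look like (a sketch of the split of
operations between the two threads only; no implementation is given in the
thread, and the names are purely illustrative):
_________________________________________________________________
template<typename T>
class spsc_deque
{
public:
    // called only by the producer thread
    void push_front(T const& v);
    void push_back(T const& v);

    // called only by the consumer thread
    bool pop_front(T& v);
    bool pop_back(T& v);
};
_________________________________________________________________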
Dmitriy V'jukov
> I don't think you have to set n->next_ right here because the constructor of
> node already did that for you.
Arghhh! Yes, of course.
> I need to look over this code some more. Okay, we got fast spsc-queue/stack.
> You can always apply the multiplexing thing I mentioned.
What "multiplexing thing"?
Dmitriy V'jukov
> Okay, we got fast spsc-queue/stack.
Has anyone seen it before?
Dmitriy V'jukov
> It seems like you are trying to bind the push function to the head_ pointer,
> and the pop function to the head_->next_ pointer.
Yes.
Only the producer modifies head_.
head_ is always != 0.
head_->data_ always contains a fake value.
head_->next_ != 0 indicates that the stack contains at least one element.
Only the consumer modifies head_->next_.
Dmitriy V'jukov
This:
http://groups.google.com/group/comp.programming.threads/msg/4230a9d3cbc20ea9
http://groups.google.com/group/comp.arch/msg/a7501814f19731cd
http://groups.google.com/group/comp.arch/msg/ac41b1c179f0e7ca
> void push(T const& v)
> {
> node* n = new node(head_);
> n->next_ = head_;
> head_->data_ = v;
> head_ = n; // store-release
> }
>
> bool pop(T& v)
> {
> node* volatile head = head_; // load-depends
> node* volatile next = head; // load-depends
> if (0 == next)
> return false;
> v = next->data_;
> head->next_ = next->next_;
> return true;
> }
Little corrections:
void push(T const& v)
{
    node* n = new node(head_);
    head_->data_ = v;
    head_ = n; // store-release
}

bool pop(T& v)
{
    node* volatile head = head_;        // load-depends
    node* volatile next = head->next_;  // load-depends
    if (0 == next)
        return false;
    v = next->data_;
    head->next_ = next->next_;
    delete next;
    return true;
}
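A minimal single-threaded smoke test for the corrected functions might look
like this (a sketch only; it assumes the spsc_lifo_stack class from the first
post with the corrected push()/pop() bodies above, and it checks only LIFO
order, not the concurrent behaviour):
_________________________________________________________________
#include <cassert>

int main()
{
    spsc_lifo_stack<int> s;
    int v = 0;

    assert(!s.pop(v));            // empty stack: pop() reports failure

    s.push(1);
    s.push(2);
    s.push(3);

    assert(s.pop(v) && 3 == v);   // LIFO order
    assert(s.pop(v) && 2 == v);
    assert(s.pop(v) && 1 == v);
    assert(!s.pop(v));            // empty again
    return 0;
}
_________________________________________________________________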
Dmitriy V'jukov
Haha. Was going to point that out, but you caught it before me. Very
cool. This version is a winner as far as speed is concerned.
My implementation (same as yours, really) does ~9.1 Mops/s;
in contrast, the second-fastest algo (which is Microsoft's LFStack
implementation) does ~8 Mops/s.
[...]
I have not seen a spsc-stack before. Has anyone else?
It's an interesting and useful implementation, but I can't understand the
memory ordering guarantee. Could you comment on it? What does
"store-release" mean? Why can't a consumer thread read a fake value; is
there a data dependency?
Yes, there is a data race.
First of all, let me give you a suggestion if it is welcome: It would
be nice if you try to run your code before you publish it here. Your
API has a defect as well.
Now about the data race. Consider the following scenario:
Producer has pushed item1 and item2. Now consumer is about to pop a
data element. Consumer copies `head_' into local scope. At that point
the producer interferes and pushes item3 into the data structure
updating `head_'. Now your data structure contains [item1, item2,
item3] and `head_' points to item3. However, consumer continues and
returns with item2 since its local `head' still points to it---data
race!
The data structure is left behind with [item1, item3]. Therefore, it
is not an spsc-lifo but a sequential lifo. So, "atomic-free" does not
count, whatever you are trying to say with it.
Furthermore, I highly recommend you to try to write unit tests for
your code.
Best Regards,
Szabolcs
> Yes, there is a data race.
> First of all, let me give you a suggestion if it is welcome: It would
> be nice if you try to run your code before you publish it here. Your
> API has a defect as well.
> Now about the data race. Consider the following scenario:
> Producer has pushed item1 and item2. Now consumer is about to pop a
> data element. Consumer copies `head_' into local scope. At that point
> the producer interferes and pushes item3 into the data structure
> updating `head_'. Now your data structure contains [item1, item2,
> item3] and `head_' points to item3. However, consumer continues and
> returns with item2 since its local `head' still points to it---data
> race!
Humm, yeah; I think your right!
> The data structure is left behind with [item1, item3]. Therefore, it
> is not an spsc-lifo but a sequential lifo. So, "atomic-free" does not
> count, whatever you are trying to say with it.
> Furthermore, I highly recommend you to try to write unit tests for
> your code.
[...]
I think you just misunderstand what non-blocking synchronization is
and how it works. Read about the 'linearizability' property of non-blocking
data structures and about 'serialization points' in non-blocking
algorithms.
My algorithm is 'linearizable', and that's a 'good' property.
In particular, the 'serialization point' of push() is the last store 'head_ =
n;', and the 'serialization point' of pop() is the first load 'node* volatile head
= head_;'.
So the scenario you describe has the following linearized representation:
1. push 1
2. push 2
3. pop 2
4. push 3
And the algorithm gives perfectly correct output for this sequence.
Informally, you can just reason that 'pop 2' happens 'before' 'push 3'.
> Your API has a defect as well.
What defect?
> First of all, let me give you a suggestion if it is welcome: It would
> be nice if you try to run your code before you publish it here.
>
> Furthermore, I highly recommend you to try to write unit tests for
> your code.
You must consider this code a 'detailed sketch of the algorithm' and
not a 'production-ready solution'.
And in the general case I will *not* write any unit tests, because instead
of writing unit tests and fixing trivial bugs I would rather invent a new,
superior algorithm.
If you want to use this algorithm in production you can (and I highly
recommend it) write unit tests yourself.
Dmitriy V'jukov
> > Can someone notice some data races? Comments and suggestions are
> > welcome too.
> > Yes, there is a data race.
> > First of all, let me give you a suggestion if it is welcome: It would
> > be nice if you try to run your code before you publish it here. Your
> > API has a defect as well.
> > Now about the data race. Consider the following scenario:
> > Producer has pushed item1 and item2. Now consumer is about to pop a
> > data element. Consumer copies `head_' into local scope. At that point
> > the producer interferes and pushes item3 into the data structure
> > updating `head_'. Now your data structure contains [item1, item2,
> > item3] and `head_' points to item3. However, consumer continues and
> > returns with item2 since its local `head' still points to it---data
> > race!
>
> Humm, yeah; I think your right!
Ok. And what output are you expecting from the algorithm in such a
scenario?
Dmitriy V'jukov
>> > Can someone notice some data races? Comments and suggestions are
>> > welcome too.
>> > Yes, there is a data race.
[...]
>>
>> Humm, yeah; I think your right!
>Ok. And what output you are expecting from algorithm in such
>scenario?
Well, I like the algorithm, and I would expect the outcome that Szabolcs
outlined.
He didn't describe expected output.
Dmitriy V'jukov
I absolutely agree with you. It is unimportant which of actions 3
and 4 takes place first when non-blocking synchronization is used.
It may be important only when additional external lock-based
synchronization is used.
He described a real scenario. I am not trying to say your algorithm is
broken.
I would expect that item2 gets removed; not item3. Szabolcs pointed that
out. Does that mean your algorithm is broken? I say no.
I think I will implement the algorithm as-is (i.e., with your corrections)
and post the code here. I will do it in pure asm. That way, everybody can
run this thing and _learn_ from it. Okay?
;^)
No problem ;)
Dmitriy V'jukov
> >>> >> Humm, yeah; I think your right!
> >>> >Ok. And what output you are expecting from algorithm in such
> >>> >scenario?
>
> >>> Well, I like the algorithm, and I would expect the outcome that Szabolcs
> >>> outlined.
>
> >> He didn't describe expected output.
>
> I would expect that item2 gets removed; not item3.
It's actual behaviour. What is expected?
> Szabolcs pointed that
> out. Does that mean your algorithm is broken? I say no.
At this point I don't see any defects.
Dmitriy V'jukov
Ok. It's a real scenario. I agree.
Szabolcs Ferenczi and you are saying that my algorithm gives out...
'not optimal results'. Please describe what the 'optimal results' would be
for such a scenario. Describe the requirements for the pop() function: what
must it return?
Dmitriy V'jukov
> It's an interesting and useful implementation, but I can't understand the
> memory ordering guarantee. Could you comment on it? What does
> "store-release" mean? Why can't a consumer thread read a fake value; is
> there a data dependency?
"store-release" memory barrier means that no previous store and no
previous load can move below this store.
x86 architecture has implied store-release memory barrier associated
with every normal (i.e. non-temporal) store.
"load-depends" memory barrier means that data loaded through this
pointer will be not older than the pointer itself.
x86 architecture has implied load-depends memory barrier associated
with every load of a pointer.
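For readers who want this in terms of a formal memory model: a rough C++11
rendering of the corrected algorithm, using std::atomic for head_ (a sketch
only; memory_order_acquire is used as a conservative stand-in for the weaker
"load-depends"/consume ordering, and the class name is just illustrative):
_________________________________________________________________
#include <atomic>

template<typename T>
class spsc_lifo_stack_cpp11
{
public:
    struct node
    {
        node* next_;
        T     data_;
        explicit node(node* next = 0) : next_(next) {}
    };

    spsc_lifo_stack_cpp11() : head_(new node()) {}

    ~spsc_lifo_stack_cpp11()
    {
        node* n = head_.load(std::memory_order_relaxed);
        while (n)
        {
            node* next = n->next_;
            delete n;
            n = next;
        }
    }

    // producer thread only
    void push(T const& v)
    {
        // only the producer ever writes head_, so a relaxed load is enough here
        node* head = head_.load(std::memory_order_relaxed);
        node* n = new node(head);   // new dummy node, pointing at the current dummy
        head->data_ = v;            // the current dummy now carries the value
        head_.store(n, std::memory_order_release);  // publish: the "store-release"
    }

    // consumer thread only
    bool pop(T& v)
    {
        node* head = head_.load(std::memory_order_acquire); // stands in for "load-depends"
        node* next = head->next_;
        if (0 == next)
            return false;           // the dummy is the only node: stack is empty
        v = next->data_;
        head->next_ = next->next_;  // only the consumer writes next_ of published nodes
        delete next;
        return true;
    }

private:
    std::atomic<node*> head_;
};
_________________________________________________________________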
Dmitriy V'jukov
>> >>> >> Humm, yeah; I think your right!
>> >>> >Ok. And what output you are expecting from algorithm in such
>> >>> >scenario?
>>
>> >>> Well, I like the algorithm, and I would expect the outcome that
>> >>> Szabolcs
>> >>> outlined.
>>
>> >> He didn't describe expected output.
>>
>> I would expect that item2 gets removed; not item3.
>It's actual behaviour. What is expected?
Wrt the scenario outlined by Szabolcs, I would expect that item2 gets
removed.
>> Szabolcs pointed that
>> out. Does that mean your algorithm is broken? I say no.
>At this point I don't see any defects.
Agreed.
Here is a crude sample implementation of your algorithm:
http://appcore.home.comcast.net/misc/spsc_stack_c.html
The x86 assembly part is simple and it compiles fine with VC++ 6.0 or
higher:
_________________________________________________________________
__declspec(naked)
void spsc_stack_push(
    spsc_stack* const _this,
    node* const n,
    void* const state
) {
    /* The register comments below assume the node layout used by the
       algorithm above: next_ at offset 0 and the state/data field at
       offset 4. */
    _asm {
        MOV EAX, [ESP + 4]    // EAX = _this
        MOV ECX, [ESP + 8]    // ECX = n (the new dummy node)
        MOV EDX, [EAX]        // EDX = _this->head_ (the current dummy)
        MOV [ECX], EDX        // n->next_ = head_
        MOV ECX, [ESP + 12]   // ECX = state (the value being pushed)
        MOV [EDX + 4], ECX    // head_->state = state (old dummy now carries the value)
        MOV ECX, [ESP + 8]    // ECX = n
        MOV [EAX], ECX        // head_ = n (publish; a plain store is a release on x86)
        RET
    }
}

__declspec(naked)
node* spsc_stack_trypop(
    spsc_stack* const _this
) {
    _asm {
        MOV ECX, [ESP + 4]    // ECX = _this
        MOV EDX, [ECX]        // EDX = head_ (the current dummy)
        MOV EAX, [EDX]        // EAX = head_->next_ (candidate node, also the return value)
        TEST EAX, EAX         // empty? (next_ == 0)
        JE spsc_stack_trypop_failed
        MOV EDX, [EAX]        // EDX = next->next_
        MOV ECX, [ECX]        // ECX = head_ (reload)
        MOV [ECX], EDX        // head_->next_ = next->next_ (unlink the node)
    spsc_stack_trypop_failed:
        RET                   // return the popped node in EAX, or 0 if empty
    }
}
_________________________________________________________________
As you can see, the "meat" of Dmitriy's algorithm is currently implemented
with VC++ inline assembly. I am going to move it off into a separate
externally assembled file, and do it in MASM and AT&T syntax. I am also
going to create some simple test applications using pthreads.
Enjoy!
:^)
Ok, everything is clear now. "Store-release" means store with release
semantics, which is equivalent to #LoadStore | #StoreStore membar.
Thank you.
> Ok, everything is clear now. "Store-release" means store with release
> semantics, which is equivalent to #LoadStore | #StoreStore membar.
> Thank you.
Bingo!
So where do you see a data race? In my opinion the algorithm is completely
linearizable.
Dmitriy V'jukov
I did not say I saw one. I only confirmed that the scenario Szabolcs outlined
was real; item2 gets removed. Is that a data-race? I would have to say no,
it is not.
> In my opinion algorithm is completely linearizable.
I think I agree with that assertion. I have implemented this thing, and it
seems to work fine. Have you tried doing a proof on it yet? I think you can
prove that its linearizable.
> Yes, there is a data race.
> First of all, let me give you a suggestion if it is welcome: It would
> be nice if you try to run your code before you publish it here. Your
> API has a defect as well.
I created an implementation for the x86:
http://appcore.home.comcast.net/misc/spsc_stack_c.html
You can run it all you want...
;^)
> Now about the data race. Consider the following scenario:
> Producer has pushed item1 and item2. Now consumer is about to pop a
> data element. Consumer copies `head_' into local scope. At that point
> the producer interferes and pushes item3 into the data structure
> updating `head_'. Now your data structure contains [item1, item2,
> item3] and `head_' points to item3. However, consumer continues and
> returns with item2 since its local `head' still points to it---data
> race!
> The data structure is left behind with [item1, item3]. Therefore, it
> is not an spsc-lifo but a sequential lifo. So, "atomic-free" does not
> count, whatever you are trying to say with it.
item2 was inserted before item3 at a specific point in time.
____________________________________________________
void push(T const& v)
{
A1:     node* n = new node(head_);
A2:     head_->data_ = v;
A3:     head_ = n; // store-release
}

bool pop(T& v)
{
B1:     node* volatile head = head_;        // load-depends
B2:     node* volatile next = head->next_;  // load-depends
B3:     if (0 == next)
B4:         return false;
B5:     v = next->data_;
B6:     head->next_ = next->next_;
B7:     delete next;
B8:     return true;
}
____________________________________________________
The start of the "main" linearization point for the algorithm is A3, the end
of that point is B1. This means that A3 will start at point in time in which
an item is made visible to B1. In your example we can say that item2 was
introduced (e.g., executed A3) _before_ item3. Therefore, B1 MUST remove
item2 in order to complete the linearization process. If item3 was removed,
then the algorithm would not be linearizable, and it would not be correct.
BTW, this same property holds for a lock-based stack. Think about it...
A lock-based stack with two items:
item2->item1->NULL;
The consumer takes the lock; it gets preempted by the producer, which tries to
take the lock in order to push item3 and blocks; the consumer resumes and
removes item2. The state is now:
item1->NULL;
The consumer unlocks; the producer locks, inserts item3, and unlocks. The
state is now:
item3->item1->NULL;
Unless I am missing something, this is exactly how Dmitriy's algorithm works;
their behavior is the same...
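To make that comparison concrete, here is a minimal mutex-based LIFO of the
kind described above (a sketch, not code from the thread); replaying
Szabolcs's interleaving against it, pop() returns item2 and the later push()
leaves item3->item1->NULL, the same history the lock-free version produces:
_________________________________________________________________
#include <mutex>

template<typename T>
class locked_lifo_stack
{
    struct node { node* next_; T data_; };
    node*      head_;
    std::mutex lock_;
public:
    locked_lifo_stack() : head_(0) {}

    ~locked_lifo_stack()
    {
        while (head_) { node* n = head_; head_ = n->next_; delete n; }
    }

    void push(T const& v)
    {
        node* n = new node;                         // allocate outside the lock
        n->data_ = v;
        std::lock_guard<std::mutex> guard(lock_);
        n->next_ = head_;
        head_ = n;
    }

    bool pop(T& v)
    {
        node* n;
        {
            std::lock_guard<std::mutex> guard(lock_);
            n = head_;
            if (0 == n)
                return false;                       // empty stack
            head_ = n->next_;
        }
        v = n->data_;                               // copy and free outside the lock
        delete n;
        return true;
    }
};
_________________________________________________________________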
> Furthermore, I highly recommend you to try to write unit tests for
> your code.
[...]
> Ok. It's a real scenario. I agree.
> Szabolcs Ferenczi and you are saying that my algorithm gives out...
> 'not optimal results'.
Your algorithm gives correct result wrt the situation Szabolcs laid out. I
think he was confused a bit when he called the correct behavior a data-race.
> Please describe what is 'optimal results' for
> such scenario. Describe requirements for pop() function, what it must
> return.
It gives the same result as a lock-based stack would:
http://groups.google.com/group/comp.programming.threads/msg/7408cda12a141c69
____________________________________________________________
A lock-based stack with two items:
item2->item1->NULL;
The consumer takes the lock; it gets preempted by the producer, which tries to
take the lock in order to push item3 and blocks; the consumer resumes and
removes item2. The state is now:
item1->NULL;
The consumer unlocks; the producer locks, inserts item3, and unlocks. The
state is now:
item3->item1->NULL;
____________________________________________________________
Unless I am missing something, this is exactly how your algorithm works;
their linearization characteristics are exactly the same...
The only difference is that your solution is single-producer/consumer, and
oh so much faster!!!
;^)
No. A release store means that subsequent stores and loads (but not
to/from the same location and issuer) can overtake that store, and that
that store cannot overtake preceding stores and loads.
regards,
alexander.
1: Store(&local, 1);
2: membar #LoadStore | #StoreStore
3: Store(&locB, 2);
4: Store(&locC, 3);
I read that as: line 1 will always be rendered visible before line 3, and
line 4 can become visible before lines 1, 2 or 3. When you say "issuer",
that would refer to line 3, right?
I re-wrote your algorithm and found no issues myself. My own little
sandbox/test app that measures the speed also tests for correctness, and
this algo passed it :). I also don't believe it has any defects.
The behavior that is explained above is completely acceptable (to me
anyhow).
Very cool :). FYI (in case you didn't know), you can use the
__fastcall convention for MSVC, which passes the first argument in ecx
and the second in edx. Since this is all inline, it has the potential
to be more optimal without any potential to be slower (in theory).
"Store-release" == #LoadStore | #StoreStore executed *before* store,
and affecting all previous stores and loads; and only *one* subsequent
store.
See the full brain-damaging hierarchy of memory barriers here:
http://groups.google.ru/group/comp.programming.threads/msg/a08095e4e4b61155
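In C++11 terms (a rough mapping, not something from the original posts), that
distinction comes out as the difference between a release store and a
standalone release fence:
_________________________________________________________________
#include <atomic>

struct node { node* next_; int data_; };

std::atomic<node*> head_(nullptr);

void publish_with_release_store(node* n)
{
    // All of this thread's earlier loads and stores are ordered before this
    // one store, and only this store.
    head_.store(n, std::memory_order_release);
}

void publish_with_release_fence(node* n)
{
    // At least as strong for this particular store, but the fence also orders
    // the earlier accesses against every later atomic store in this thread,
    // not just this one.
    std::atomic_thread_fence(std::memory_order_release);
    head_.store(n, std::memory_order_relaxed);
}
_________________________________________________________________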
Dmitriy V'jukov
Ok. I just misunderstood you. The last words of Szabolcs Ferenczi's post
were '- data race!'. And you answered 'I think your right!'.
> > In my opinion algorithm is completely linearizable.
>
> I think I agree with that assertion. I have implemented this thing, and it
> seems to work fine. Have you tried doing a proof on it yet? I think you can
> prove that its linearizable.
I just *know* that it's linearizable. And it's enough for me :)
Dmitriy V'jukov
> Ok. I just misunderstood you. The last words of Szabolcs Ferenczi's post
> were '- data race!'. And you answered 'I think your right!'.
Sorry for any confusion. I did not clarify my position when I agreed with
Szabolcs. However, if you think about it... Szabolcs basically created the
linearization proof of your algorithm for you...
:^)
>> > In my opinion algorithm is completely linearizable.
>>
>> I think I agree with that assertion. I have implemented this thing, and
>> it
>> seems to work fine. Have you tried doing a proof on it yet? I think you
>> can
>> prove that its linearizable.
> I just *know* that it's linearizable. And it's enough for me :)
Agreed.
Thank you for introducing clarity. I think I understand the
difference. A store with release semantics gives the same guarantee as
a store-release membar, but related to stores and loads to the same
location. Am I right?
It has been a habit of mine to omit __fastcall semantics:
http://msdn2.microsoft.com/en-us/library/k1a8ss06.aspx
(read all...)
?
Basically rendering this more or less useless. NICE. GCC has a
better way of doing this (clearly).
Aside from typical MS doc retardation... I really do remember reading
something which said the semantics of __fastcall can be subject to change
between various versions of VC++...
Yup!
Dmitriy, do you mind if I stick the code above in a future AppCore update? I
will provide links to this thread in source-code.
No problem. It will be great.
But ... this algorithm is silly and already deprecated by me.
See ultimate version:
http://groups.google.ru/group/comp.programming.threads/browse_frm/thread/7f5f07401d3bbd91
Notice benchmark results and scaling factor.
Dmitriy V'jukov
> No problem. It will be great.
Thank you.
> But ... this algorithm is silly and already deprecated by me.
Well, I think that "silly" is a bit too harsh? :^)
> See ultimate version:
> http://groups.google.ru/group/comp.programming.threads/browse_frm/thread/7f5f07401d3bbd91
> Notice benchmark results and scaling factor.
Ahhh... Well, I like the API change in that users can now have some power
over the allocation/deallocation of nodes. I noticed that there is a loop in the pop
function... Humm, I need to read the code... I will post follow-ups in that
thread.
> > But ... this algorithm is silly and already deprecated by me.
>
> Well, I think that "silly" is a bit too harsh? :^)
432 cycles vs. 39 cycles (11x improvement)
>
> > See ultimate version:
> >http://groups.google.ru/group/comp.programming.threads/browse_frm/thr...
> > Notice benchmark results and scaling factor.
>
> Ahhh... Well, I like the API change in that users can now have some power
> over the allocation/deallocation of nodes. I noticed that there is a loop in the pop
> function... Humm, I need to read the code... I will post follow-ups in that
> thread.
Yeah, it's a bit trickier than this one. I will try to provide some
comments on algorithm.
Dmitriy V'jukov
> Can someone notice some data races? Comments and suggestions are
> welcome too.
I can only comment that the atomic-free SPSC LIFO is such a smart data
structure, that you do not need any data race for it in the
implementation to freely get any result whatsoever. It is especially
free to provide any trace result. I guess we are very lucky that there
is such an efficient and free data structure available, which scales
so well. Well, it scales up to one producer and one consumer but that
is quite enough. We should not be perfectionists, should we.
It is a very safe spy resistant data structure as well. A perfectly
implemented SPSC LIFO is resistant to any spying because if any
external observer would ever like to figure it out what kind of data
structure it is, I am afraid, any attempt would fail. That makes it
especially safe in an atomic-free multi-threaded-free program.
Let us suppose that an observer is checking the input and the output
of the atomic-free SPSC LIFO in action. The observer will be totally
confused. Just think about how the observer would speculate when he or
she records this sequence:
IN: 1 2
OUT: 1 2
He or she might erroneously think it is a FIFO. No. As a matter of
fact, it just happened that producer pushed 1 and consumer took 1 then
producer pushed 2 and consumer took 2. They did it concurrently and it
just happened that small timing differences resulted in this sequence.
No. No data race whatsoever.
And then he would check it once more and he could observe:
IN: 1 2
OUT: 2 1
This time small timing differences resulted in another interleaving:
p1, p2, c2, c1. No data race at all. But our observer does not give
up, keeps listening, and observes this:
IN: 1 2 3
OUT: 3 2 1
SPSC:p1 p2 p3 c3 c2 c1
IN: 1 2 3
OUT: 2 3 1
SPSC:p1 p2 c2 p3 c3 c1
IN: 1 2 3
OUT: 2 1 3
SPSC:p1 p2 c2 c1 p3 c3
IN: 1 2 3
OUT: 1 3 2
SPSC:p1 c1 p2 p3 c3 c2
IN: 1 2 3
OUT: 1 2 3
SPSC:p1 c1 p2 c2 p3 c3
I am not sure the observer could ever figure out that he or she
can never see the sequence 3 1 2 coming out of the data structure. No
problem. The only reasonable conclusion he or she could make is that
the number of items going in and coming out is the same. So, we are on
the safe side indeed. The fact that it is a very fast ATOMIC-FREE SPSC
LIFO is kept in secret from any external observer. As we can see from
the observation pattern, it is a stack property-free atomic-free SPSC
LIFO. Perhaps that is why this property cannot be observed on it.
Thank you very much, it is a great work. I like it. Please keep going
and find some more smart data structure for us but keep them xxx-free
and very, very fast. Behaviour is not so much important. Just be fast
and behaviour-free. I am very curious how many such a valuable and
smart invention can be out there.
Best Regards,
Szabolcs
You must have forgotten that you already provided some comments on this:
http://groups.google.ru/group/comp.programming.threads/msg/17da9df4a0988d85
> that the atomic-free SPSC LIFO is such a smart data
> structure, that you do not need any data race for it in the
> implementation to freely get any result whatsoever.
Sh#t! How could I miss it?! I was thinking that the values returned by pop()
somehow correlate with the values provided to push(). Thank you for
pointing it out.
> It is especially
> free to provide any trace result. I guess we are very lucky that there
> is such an efficient and free data structure available, which scales
> so well. Well, it scales up to one producer and one consumer but that
> is quite enough. We should not be perfectionists, should we.
I think we should. It's my fault. Sorry. I think a programmer must
always provide as bloated a solution as he possibly can. I will try to
improve. I think I must add some functions for XML handling and CD
recording too.
> It is a very safe spy resistant data structure as well. A perfectly
> implemented SPSC LIFO is resistant to any spying because if any
> external observer would ever like to figure it out what kind of data
> structure it is, I am afraid, any attempt would fail. That makes it
> especially safe in an atomic-free multi-threaded-free program.
OK. In the next version I will add support for IObserver/ISubject COM
interfaces. Will that satisfy your needs?
> I am not sure the observer could ever figure out that he or she
> can never see the sequence 3 1 2 coming out of the data structure. No
> problem. The only reasonable conclusion he or she could make is that
> the number of items going in and coming out is the same. So, we are on
> the safe side indeed. The fact that it is a very fast ATOMIC-FREE SPSC
> LIFO is kept in secret from any external observer. As we can see from
> the observation pattern, it is a stack property-free atomic-free SPSC
> LIFO. Perhaps that is why this property cannot be observed on it.
Hummm... What can you suggest? I think I can embed a .wav file with a
description of its functioning into the stack. I think this will eliminate any
misunderstanding, because then anyone will be able to hear the description
and reason correctly about the data structure.
> Thank you very much, it is a great work. I like it. Please keep going
> and find some more smart data structure for us but keep them xxx-free
> and very, very fast. Behaviour is not so much important. Just be fast
> and behaviour-free. I am very curious how many such a valuable and
> smart invention can be out there.
I am already improving. In the future I will try to post maximally bloated
solutions with all possible overheads included. In the end it's the
right approach to software design. Right?
Btw, what university did you graduate from? I envy your algorithm
analysis skills.
Dmitriy V'jukov
> > Can someone notice some data races? Comments and suggestions are
> > welcome too.
> I can only comment that the atomic-free SPSC LIFO is such a smart data
> structure, that you do not need any data race for it in the
> implementation to freely get any result whatsoever. It is especially
> free to provide any trace result. I guess we are very lucky that there
> is such an efficient and free data structure available, which scales
> so well. Well, it scales up to one producer and one consumer but that
> is quite enough. We should not be perfectionists, should we.
[...]
You seriously think that a SPSC data-structure cannot be worked into a
distributed MPMC design? Please take your trolling elsewhere. Wow.
:^/
> You must be forgot that you already provide some comments on this:
>
> http://groups.google.ru/group/comp.programming.threads/msg/17da9df4a0988d85
Szabolcs did not seem to realize that the real condition he saw could be
generated by a 100% lock-based stack; I pointed that out here:
http://groups.google.com/group/comp.programming.threads/msg/7408cda12a141c69
wrt the scenario at hand, lock-free or not, item2 would be removed.
> > that the atomic-free SPSC LIFO is such a smart data
> > structure, that you do not need any data race for it in the
> > implementation to freely get any result whatsoever.
> Sh#t! How can I miss it?! I was thinking that values returned by pop()
> somehow correlate with values provided to push(). Thank you that you
> point it out.
[...]
I think he is looking for a level of deterministic behavior that requires a
crystal ball. I suspect full-blown globally ordered sequential consistency
does not meet his requirements.
> > Can someone notice some data races? Comments and suggestions are
> > welcome too.
[...]
> This time small timing differences resulted in another inter-leavings:
> p1, p2, c2, c1. No data race at all. But our observer does not give it
> up, keeps listening and observes this:
> IN: 1 2 3
> OUT: 3 2 1
> SPSC:p1 p2 p3 c3 c2 c1
> IN: 1 2 3
> OUT: 2 3 1
> SPSC:p1 p2 c2 p3 c3 c1
> IN: 1 2 3
> OUT: 2 1 3
> SPSC:p1 p2 c2 c1 p3 c3
> IN: 1 2 3
> OUT: 1 3 2
> SPSC:p1 c1 p2 p3 c3 c2
> IN: 1 2 3
> OUT: 1 2 3
> SPSC:p1 c1 p2 c2 p3 c3
[...]
I wonder if you understand that all of the above could be realized on a 100%
lock-based stack? Think about it... I will just do two of the above and let
you finish the rest:
IN: 1 2 3
OUT: 3 2 1
SPSC: \
lock(); p1; unlock() \
lock(); p2; unlock() \
lock(); p3; unlock() \
lock(); c3; unlock() \
lock(); c2; unlock() \
lock(); c1; unlock()
IN: 1 2 3
OUT: 2 3 1
SPSC: \
lock(); p1; unlock() \
lock(); p2; unlock() \
lock(); c2; unlock() \
lock(); p3; unlock() \
lock(); c3; unlock() \
lock(); c1; unlock()
Don't tell me you don't like lock-based algorithms either!
:^0
No, the data race I have demonstrated there can only happen with a
buggy implementation. Besides, the author of the implementation of this
nonsense data structure for a multi-threaded environment is very proud
that he does not even try to run his code.
Best Regards,
Szabolcs
> I wonder if you understand that all of the above could be realized on a 100%
> lock-based stack?
Oh yes, it is clear to everyone who has a basic understanding of
concurrent programming. I was trying to indicate in a gentle way that
even with a perfectly implemented stack all this can happen in a multi-
threaded environment. Well, until you add features to introduce
something like stack sessions but I am not going to detail that here
any further.
The author of the smart atomic-free spsc stack is proud that "It is
relatively straightforward. But I don't see such algorithm
before." It is not an accident that no one is trying to use a stack in
a concurrent environment the same way it is used in a sequential/
deterministic environment. He can even put it into the Guinness Book of
Records.
Besides, all the scenarios I have illustrated can even happen with a stack
in a sequential system.
The bottom line is that, although the shared FIFO is used in
concurrent programming, from the fact that a FIFO is a useful shared
data structure it does not follow that you should go and implement a
shared LIFO just because you have the tools at hand to implement it.
Well, I have told you in the other discussion thread that multicore
programming is really hard, and you guys keep demonstrating it. I have
said that those who are used to sequential/deterministic programming often
miss the point in concurrent programming.
Best Regards,
Szabolcs
> > I wonder if you understand that all of the above could be realized on a
> > 100%
> > lock-based stack?
> Oh yes, it is clear to everyone who has a basic understanding of
> concurrent programming. I was trying to indicate in a gentle way that
> even with a perfectly implemented stack all this can happen in a multi-
> threaded environment. Well, until you add features to introduce
> something like stack sessions but I am not going to detail that here
> any further.
> The author of the smart atomic-free spsc stack is proud that "It is
> relatively straightforward. But I don't see such algorithm
> before." It is not an accident that no one is trying to use a stack in
> a concurrent environment the same way it is used in a sequential/
> deterministic environment. He can even put it into the Guinness Book of
> Records.
You seriously think that nobody uses a lock-free stack? Tell that to
Microsoft and IBM. Don't forget to tell SUN and all the others...
> Besides, all the scenarios I have illustrated can even happen with a stack
> in a sequential system.
> The bottom line is that, although the shared FIFO is used in
> concurrent programming, from the fact that a FIFO is a useful shared
> data structure it does not follow that you should go and implement a
> shared LIFO just because you have the tools at hand to implement it.
That does not make any sense at all. What if you need prompt responses to
freshly pushed items? A FIFO will not give you that; however, a LIFO will.
BTW, did you know that Microsoft has a standard API for a lock-free stack?
Did you know that IBM has a lock-free stack algorithm in its Principles of
Operation? I don't think you knew any of that. If you did know about it, why
would you make the false assertion that nobody uses a shared LIFO in
concurrent programming... Humm...
> Well, I have told you in the other discussion thread that multicore
> programming is really hard, and you guys keep demonstrating it.
I am not sure if this is trolling or what...
> I have
> said that those who are used to sequential/deterministic programming often
> miss the point in concurrent programming.
Indeed.
> No, the data race I have demonstrated there can only happen with a
> buggy implementation.
You did not find a data-race. You highlighted a real condition that can
happen to a lock-based or lock-free stack.
> Besides, the author of the implementation of this
> nonsense data structure for a multi-threaded environment is very proud
> that he does not even try to run his code.
Yeah, LIFO is a nonsense data structure. LOL! Have you ever implemented a
message-passing system? LIFO is great for receiving time-critical items. How
could you ensure that an item you push onto a FIFO will be processed ASAP? You
can't. However, if you push an item onto a LIFO, well, it's either at the front
of the list or it's very close to it...
Also, LIFO is ideal for a free-list. You generally want the most recently
used item to be reused because there is a higher probability that it will be
"warm" wrt the cache. Why do you think LIFO is nonsense in a multi-threaded
environment? AFAICT, you are wrong on multiple fronts.
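A trivial LIFO free-list along those lines might look like this (a sketch,
not code from the thread; single-threaded for clarity; between two threads,
the spsc stack above could play the same role with the push and pop sides
swapped):
_________________________________________________________________
#include <cstddef>
#include <vector>

// LIFO free-list: the most recently released buffer is handed out again
// first, since it is the one most likely to still be warm in the cache.
class buffer_freelist
{
    std::vector<void*> free_;   // back of the vector == most recently freed
    std::size_t        size_;
public:
    explicit buffer_freelist(std::size_t buffer_size) : size_(buffer_size) {}

    void* acquire()
    {
        if (free_.empty())
            return ::operator new(size_);   // cold path: allocate a fresh buffer
        void* p = free_.back();             // hot path: reuse the last one released
        free_.pop_back();
        return p;
    }

    void release(void* p)
    {
        free_.push_back(p);
    }

    ~buffer_freelist()
    {
        for (std::size_t i = 0; i != free_.size(); ++i)
            ::operator delete(free_[i]);
    }
};
_________________________________________________________________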