Re: A question using your MPMC sample of 1024cores


Dmitriy V'jukov

unread,
Oct 18, 2013, 1:12:26 PM10/18/13
to hedgezzz, lock...@googlegroups.com
On Mon, Sep 30, 2013 at 5:54 AM, hedgezzz <hedg...@yahoo.com.tw> wrote:
> Hi Dmitry Vyukov!
>
> First of all, I'd like to thank you for your excellent 1024cores website; it
> is really amazing, with a lot of source code worth studying!
>
> I have a question about using your MPMC source. I wasn't sure whether I was
> right or not, so I posted it to Stack Overflow:
> http://stackoverflow.com/questions/19041683/bounded-mpmc-queue-with-semaphore
>
>
> What I'd like to know is: if two threads enqueue at the same time, is it
> possible that buffer[1] finishes being filled before buffer[0]?
>
> The reason I ask is that I'd like to add a semaphore to my application: each
> time a thread enqueues data, the semaphore value is incremented, so that a
> thread waiting on the semaphore is woken to dequeue. If buffer[1] is filled
> and the semaphore incremented while buffer[0] is not ready yet, the woken
> thread will try to dequeue buffer[0], not buffer[1]. So I should call dequeue
> in a loop until buffer[0] is completely filled, at which point dequeue breaks
> the loop, like this:
>
>
> ==================================
> Right !!
>
> while(1){
>     sem_wait(sem1);
>     while(1){
>         if(mq.dequeue(strx))
>             break;
>     }
>     //doing something about strx
> }
>
> =====================================
>
> Wrong !!
>
> while(1){
>     sem_wait(sem1);
>     mq.dequeue(strx);
>     //doing something about strx
> }
> =====================================
>
>
> My question is: since buffer[1] can finish enqueueing before buffer[0], each
> time the semaphore wait is woken I know there must have been an enqueue, but
> I should still dequeue in a loop, because otherwise the data might not be
> ready at that moment, and that would cause problems for me!
>
> I'd like to know whether I am right or wrong about this. I am sorry for my
> English; I live in Taiwan and am trying to write correct English, but I know
> it's still not good enough. I hope it doesn't cause you too much trouble!

+lock-free

Hi Mars,

As far as I understand you mean this queue:
http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue


Yes, it's possible that enqueue into buffer[1] finishes before enqueue
into buffer[0].
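For readers without the code in front of them, here is why: a producer first claims a cell by advancing enqueue_pos_, and only afterwards writes the data and publishes the cell by bumping its per-cell sequence number. Between the claim and the publish, another producer can claim and publish the next cell. A simplified, self-contained rendition of the queue (abridged from the 1024cores page linked above; member names follow that code, but this sketch is not the exact original):

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Abridged rendition of the bounded MPMC queue. Each cell carries a
// sequence number. A producer CLAIMS a cell by advancing enqueue_pos_,
// then writes the data, then PUBLISHES by bumping the cell's sequence.
// Claim and publish are separate steps, so a producer that claimed
// buffer[1] can publish before the producer that claimed buffer[0].
template <typename T>
class mpmc_queue {
public:
    explicit mpmc_queue(size_t size)  // size must be a power of two
        : cells_(size), mask_(size - 1) {
        for (size_t i = 0; i != size; ++i)
            cells_[i].seq.store(i, std::memory_order_relaxed);
        enqueue_pos_.store(0, std::memory_order_relaxed);
        dequeue_pos_.store(0, std::memory_order_relaxed);
    }

    bool enqueue(T const& v) {
        size_t pos = enqueue_pos_.load(std::memory_order_relaxed);
        for (;;) {
            cell_t& c = cells_[pos & mask_];
            size_t seq = c.seq.load(std::memory_order_acquire);
            intptr_t dif = (intptr_t)seq - (intptr_t)pos;
            if (dif == 0) {  // cell is free: try to claim it
                if (enqueue_pos_.compare_exchange_weak(
                        pos, pos + 1, std::memory_order_relaxed))
                    break;
            } else if (dif < 0) {
                return false;  // queue is full
            } else {
                pos = enqueue_pos_.load(std::memory_order_relaxed);
            }
        }
        cell_t& c = cells_[pos & mask_];
        c.data = v;
        c.seq.store(pos + 1, std::memory_order_release);  // publish
        return true;
    }

    bool dequeue(T& v) {
        size_t pos = dequeue_pos_.load(std::memory_order_relaxed);
        for (;;) {
            cell_t& c = cells_[pos & mask_];
            size_t seq = c.seq.load(std::memory_order_acquire);
            intptr_t dif = (intptr_t)seq - (intptr_t)(pos + 1);
            if (dif == 0) {  // cell is published: try to claim it
                if (dequeue_pos_.compare_exchange_weak(
                        pos, pos + 1, std::memory_order_relaxed))
                    break;
            } else if (dif < 0) {
                // nothing published at this position yet (queue empty,
                // or a producer claimed the cell but has not published
                // it) -- exactly the "buffer[0] not filled" case
                return false;
            } else {
                pos = dequeue_pos_.load(std::memory_order_relaxed);
            }
        }
        cell_t& c = cells_[pos & mask_];
        v = c.data;
        c.seq.store(pos + mask_ + 1, std::memory_order_release);
        return true;
    }

private:
    struct cell_t {
        std::atomic<size_t> seq;
        T data;
    };
    std::vector<cell_t> cells_;
    size_t const mask_;
    std::atomic<size_t> enqueue_pos_;
    std::atomic<size_t> dequeue_pos_;
};
```

Note that dequeue() returns false both for a genuinely empty queue and for a claimed-but-unpublished cell; the caller cannot tell the two apart, which is what breaks the semaphore counting below.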

However, the following solution is also incorrect.

while(1){
    sem_wait(sem1);
    while(1){
        if(mq.dequeue(strx))
            break;
    }
    //doing something about strx
}

Consider this: thread 1 starts an enqueue into buffer[0], but does not
finish it. Thread 2 enqueues an element into buffer[1] and increments
the sem. Then thread 3 starts a dequeue operation and decrements the
sem, but queue.dequeue() fails, because buffer[0] is not filled yet. So
thread 3 blocks on the sem again.
Now thread 1 finishes the enqueue into buffer[0] and increments the sem.
Thread 3 decrements the sem and dequeues buffer[0].
Now thread 4 starts a dequeue. Deadlock: the sem value is 0, but the
queue contains 1 element (buffer[1]).

It's possible to solve this with pthread_cond_broadcast, but that (like
the semaphore solution) destroys the performance properties of the
non-blocking queue.
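For illustration, a pthread_cond_broadcast-style shell around the queue could look like the sketch below (the class and the `simple_q` stand-in are hypothetical names of mine, not part of the 1024cores code). Every pop re-checks dequeue() under the mutex before waiting, so an unpublished buffer[0] merely sends the consumer back to sleep instead of corrupting a counter; but the lock on every pop and the broadcast on every push serialize the fast path, which is the performance cost:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>

// Stand-in for the real non-blocking MPMC queue, just so the sketch is
// self-contained. Same interface: dequeue() returns false when nothing
// is ready, as the real queue does for an unpublished cell. (Unlike the
// real queue it is not thread-safe on its own.)
struct simple_q {
    std::queue<int> q;
    bool enqueue(int v) { q.push(v); return true; }
    bool dequeue(int& v) {
        if (q.empty()) return false;
        v = q.front(); q.pop();
        return true;
    }
};

// Condvar-based blocking shell. Correct even when enqueues complete out
// of order, because a consumer that finds nothing ready simply goes
// back to waiting -- but note the mutex and the broadcast on every
// operation.
template <typename Q, typename T>
class blocking_shell {
public:
    explicit blocking_shell(Q& q) : q_(q) {}

    void push(T const& v) {
        while (!q_.enqueue(v)) { /* queue full: retry */ }
        std::lock_guard<std::mutex> lk(m_);
        cv_.notify_all();  // the pthread_cond_broadcast of the text
    }

    void pop(T& v) {
        std::unique_lock<std::mutex> lk(m_);
        while (!q_.dequeue(v))  // re-check before every wait
            cv_.wait(lk);
    }

private:
    Q& q_;
    std::mutex m_;
    std::condition_variable cv_;
};
```

Because the consumer holds the mutex across the failed dequeue() and the wait, and the producer takes the same mutex before notifying, the lost-wakeup deadlock from the semaphore scheme cannot occur here.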

What you really want in this situation is an eventcount primitive.
Check out this thread:
http://software.intel.com/en-us/forums/topic/299245
and
https://www.google.ru/?q=comp.programming.thread+eventcount
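As a rough sketch of what an eventcount looks like (this is my own minimal mutex-backed illustration of the interface, not one of the optimized implementations discussed in those links): the consumer registers itself before re-checking the queue, and the producer's signal() is a single load when nobody is waiting, which is what preserves the non-blocking queue's fast path:

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>

// Minimal eventcount sketch (names are mine). Consumer protocol:
//   1. key = ec.prepare_wait();
//   2. re-check the condition (e.g. try dequeue again);
//   3. if still unsatisfied, ec.wait(key); else ec.cancel_wait().
// Producer calls ec.signal() after making the condition true. Because
// the consumer re-checks between prepare_wait() and wait(), the lost
// wakeup of the semaphore scheme cannot happen.
class eventcount {
public:
    unsigned prepare_wait() {
        waiters_.fetch_add(1, std::memory_order_seq_cst);
        return epoch_.load(std::memory_order_seq_cst);
    }
    void cancel_wait() {
        waiters_.fetch_sub(1, std::memory_order_seq_cst);
    }
    void wait(unsigned key) {
        std::unique_lock<std::mutex> lk(m_);
        // a signal since prepare_wait() has advanced the epoch
        while (epoch_.load(std::memory_order_seq_cst) == key)
            cv_.wait(lk);
        waiters_.fetch_sub(1, std::memory_order_seq_cst);
    }
    void signal() {
        if (waiters_.load(std::memory_order_seq_cst) == 0)
            return;  // fast path: no one is waiting, just one load
        {
            std::lock_guard<std::mutex> lk(m_);
            epoch_.fetch_add(1, std::memory_order_seq_cst);
        }
        cv_.notify_all();
    }
private:
    std::atomic<unsigned> waiters_{0};
    std::atomic<unsigned> epoch_{0};
    std::mutex m_;
    std::condition_variable cv_;
};
```

A typical consumer loop would be: `for(;;){ if(q.dequeue(v)) break; unsigned k = ec.prepare_wait(); if(q.dequeue(v)){ ec.cancel_wait(); break; } ec.wait(k); }`, with the producer doing `q.enqueue(v); ec.signal();`.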


--
Dmitry Vyukov

All about lockfree/waitfree algorithms, multicore, scalability,
parallel computing and related topics:
http://www.1024cores.net

Oleg Zabluda

unread,
Feb 21, 2014, 6:42:28 PM2/21/14
to lock...@googlegroups.com, hedgezzz
Dmitry, previously you said that an MPMC queue is an anti-pattern:

And yet it keeps coming up in this newsgroup (with comments by you) and is present on 1024cores. Why is that?

Oleg.

Dmitriy V'jukov

unread,
Feb 22, 2014, 5:19:19 AM2/22/14
to lock...@googlegroups.com, hedgezzz
Hi,

Well, it's not completely black-and-white. There are legitimate use
cases for MPMC. One such case was discussed here recently: basically,
you have a moderate-contention scenario, and an MPMC queue allows you
to significantly simplify the system structure. Another case is e.g. Go
channels, where the semantics require you to provide an MPMC queue, but
you can do your best to make it fast and scalable.

However, all scalable fine-grained schedulers use SPMC deques/MPSC
queues, sacrificing system simplicity.

What I meant is that you should not start by dropping an MPMC queue
into a design and then try to solve the scalability problem by
optimizing the queue implementation. Yet that is what people
frequently do, expecting that "lock-free magic" will magically solve
all their problems.

Mars Chen

unread,
Feb 24, 2014, 1:16:02 AM2/24/14
to lock...@googlegroups.com, hedgezzz
Hi Dmitry,

Your explanations were really helpful. I have figured out that what I
posted on Stack Overflow is wrong. Using your MPMC queue with a
semaphore, I should do the following:

while(1)
{
    sem_wait(sem1);
    if(!mq.dequeue(strx))   // dequeue failed: give the count back
    {
        sem_post(sem1);
        continue ;
    }
    //do something about strx
} //while

So if buffer[1] is filled before buffer[0] and the semaphore is
incremented, then when thread 3 fails to dequeue, it increments the
semaphore back and continues the loop, so the semaphore keeps the
correct count, and the next time thread 3 or thread 4 tries to dequeue,
it still starts from buffer[0], the correct one!

I will try to study the eventcount you mentioned. Thanks for
everything, it is really kind of you!


On Saturday, October 19, 2013 at 1:12:26 AM UTC+8, Dmitry Vyukov wrote:

Chris M. Thomasson

unread,
Feb 28, 2014, 4:57:24 PM2/28/14
to lock...@googlegroups.com, hedgezzz
[...] 
I accidentally replied to the author. Sorry about that! ;^(


Anyway, here are the relevant links: 





FWIW, I created the eventcount implementation contained in the links above, and I know it works when you apply the proper memory barriers. WRT the material in those links, Joe Seigh does a great job of listing the memory barriers for the algorithm...

:^)

Chris M. Thomasson

unread,
Feb 28, 2014, 9:56:31 PM2/28/14
to lock...@googlegroups.com, hedgezzz


On Friday, February 28, 2014 1:57:24 PM UTC-8, Chris M. Thomasson wrote:
[...] 
I accidentally replied to the author. Sorry about that! ;^(
[...]

FWIW, here is a nicely distributed eventcount from Dmitry Vyukov:

(read all!)


(a non-distributed eventcount by me)


 

Chris M. Thomasson

unread,
Feb 28, 2014, 9:59:49 PM2/28/14
to lock...@googlegroups.com, hedgezzz
On Friday, February 28, 2014 6:56:31 PM UTC-8, Chris M. Thomasson wrote:


On Friday, February 28, 2014 1:57:24 PM UTC-8, Chris M. Thomasson wrote:
> [...] 
> I accidentally replied to the author. Sorry about that! ;^(
> [...]

> [...]
> (a non-distributed eventcount by me)

I need to point out that the C++ code is correct; however, the C# link has a race condition in the following function:

    public void signal() {
        long cmp = System.Threading.Thread.VolatileRead(ref m_count);
        System.Threading.Thread.MemoryBarrier();
        prv_signal(cmp, false);
    }

The memory barrier needs to be _before_ the load of m_count!!!!

:^o
 


 