Re: Yet Another MPMC Queue

54 views
Skip to first unread message

Dmitriy Vyukov

unread,
Jan 8, 2010, 6:00:31 AM1/8/10
to Scalable Synchronization Algorithms
On Aug 18 2009, 7:25 pm, Dmitriy Vyukov <dvyu...@gmail.com> wrote:
> The main design goal was to achieve maximum scalability under high
> load. The queue is intended to transfer pointers. The underlying data
> structure is an linked-list of fixed-size arrays. The algorithm is
> lock-free, but mostly wait-free. Requires double-word CAS.
>
> Following benchmark was used for testing:
>
> thread()
> {
> for (int i = 0; i != iter_count; i += 1)
> {
> for (int j = 0; j != batch_size; j += 1)
> queue.enqueue((void*)1);
> for (int j = 0; j != batch_size; j += 1)
> queue.dequeue();
> }
>
> }
>
> number_of_threads = 16
> batch_size = 64
>
> Hardware is Intel Core2Quad Q6600.
> The test was conducted with SetProcessAffinityMask(GetCurrentProcess
> (), affinity); where affinity = 1, 3, 5, 15.
> Mean number of cycles per operation (enqueue/dequeue) was measured.
>
> Here is the results:
> affinity=1, cycles/op = 32
> affinity=3, cycles/op = 65
> affinity=5, cycles/op = 120
> affinity=15, cycles/op = 265
>
> For a comparison I benchmarked tbb::concurrent_bounded_queue<int*>
> from Intel Threading Building Blocks 2.2, and here is the results:
> affinity=1, cycles/op = 235 (7.3 slowdown)
> affinity=3, cycles/op = 1071 (16.7 slowdown)
> affinity=5, cycles/op = 2364 (19.7 slowdown)
> affinity=15, cycles/op = 6085 (22.9 slowdown)


Hi Mr. Dmitriy,

This method used in the queue can work on a 64bit system?
Seems that your implementation assume a pointer is 4 byte.

Thanks.
-Yuan

Dmitriy Vyukov

unread,
Jan 8, 2010, 6:08:27 AM1/8/10
to Scalable Synchronization Algorithms
> Hi Mr. Dmitriy,
>
>  This method used in the queue can work on a 64bit system?
>  Seems that your implementation assume a pointer is 4 byte.
>
>  Thanks.
> -Yuan


Hi Yuan,

As far as I remember, it must work on 64-bit systems as well. It
requires double-word atomic operations, so it will require 128-bit
atomic operations on a 64-bit system. That's the only possible show-
stopper.

Thank you.
--
Dmitriy V'jukov

huapeng

unread,
Jan 10, 2010, 5:54:36 PM1/10/10
to Scalable Synchronization Algorithms
Thanks for your answer.

Would you please tell me how to make 128-bit atomic operations?

Dmitriy Vyukov

unread,
Jan 11, 2010, 2:02:50 AM1/11/10
to Scalable Synchronization Algorithms
On Jan 11, 1:54 am, huapeng <huapengy...@gmail.com> wrote:
> Thanks for your answer.
>
> Would you please tell me how to make 128-bit atomic operations?

It depends on hardware/compiler.
If you use MSVC then you can use _InterlockedCompareExchange128().
If you use x86 platform and gcc then you can use embed assembly and
CMPXCHG16B instruction.

--
Dmitriy V'jukov

Reply all
Reply to author
Forward
0 new messages