On Aug 18 2009, 7:25 pm, Dmitriy Vyukov <
dvyu...@gmail.com> wrote:
> The main design goal was to achieve maximum scalability under high
> load. The queue is intended to transfer pointers. The underlying data
> structure is an linked-list of fixed-size arrays. The algorithm is
> lock-free, but mostly wait-free. Requires double-word CAS.
>
> Following benchmark was used for testing:
>
> thread()
> {
> for (int i = 0; i != iter_count; i += 1)
> {
> for (int j = 0; j != batch_size; j += 1)
> queue.enqueue((void*)1);
> for (int j = 0; j != batch_size; j += 1)
> queue.dequeue();
> }
>
> }
>
> number_of_threads = 16
> batch_size = 64
>
> Hardware is Intel Core2Quad Q6600.
> The test was conducted with SetProcessAffinityMask(GetCurrentProcess
> (), affinity); where affinity = 1, 3, 5, 15.
> Mean number of cycles per operation (enqueue/dequeue) was measured.
>
> Here is the results:
> affinity=1, cycles/op = 32
> affinity=3, cycles/op = 65
> affinity=5, cycles/op = 120
> affinity=15, cycles/op = 265
>
> For a comparison I benchmarked tbb::concurrent_bounded_queue<int*>
> from Intel Threading Building Blocks 2.2, and here is the results:
> affinity=1, cycles/op = 235 (7.3 slowdown)
> affinity=3, cycles/op = 1071 (16.7 slowdown)
> affinity=5, cycles/op = 2364 (19.7 slowdown)
> affinity=15, cycles/op = 6085 (22.9 slowdown)
Hi Mr. Dmitriy,
This method used in the queue can work on a 64bit system?
Seems that your implementation assume a pointer is 4 byte.
Thanks.
-Yuan