appcore::ac_queue_spsc

61 views
Skip to first unread message

Dmitriy Vyukov

unread,
Nov 23, 2007, 1:23:52 PM11/23/07
to
Can anyone (especially Chris Thomasson) explain how
appcore::ac_queue_spsc work?

High-level pseudo-code for push() and pop() operations looks like
this:

ac_queue_spsc_push():
1. store relase (store pointer to node in queue,
ac_i686_queue_spsc_push() function)
2. naked load (load of eventcount, ac_eventcount_algo1_signal()
function)
3. check (check of eventcount - decide whether we need to signal or
not, ac_eventcount_algo1_signal() function)

ac_queue_spsc_timedpop():
1. load acquire (try to load pointer to node, ac_i686_queue_spsc_pop()
function)
2. if fail, cas acquire (set eventcount, ac_eventcount_algo1_get()
function)
3. load acquire (try to load pointer to node one more time,
ac_i686_queue_spsc_pop() function)
4. if fail, block on eventcount (wait for signal,
ac_eventcount_algo1_timedwait() function)

IMVHO here can be race conditions between push and pop operations.
I.e. push operation can falsely decide to *not* signal eventcount,
when it have to signal. The reason for this - there is operations in
push and pop which form global order.

ac_stack_mpmc doesn't have such problem because push operation include
operation with global ordering (namely 'lock cmpxchg').


Dmitriy V'jukov

Chris Thomasson

unread,
Nov 23, 2007, 10:12:39 PM11/23/07
to
"Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
news:0c7615de-797a-4700...@e4g2000hsg.googlegroups.com...

> Can anyone (especially Chris Thomasson) explain how
> appcore::ac_queue_spsc work?
>
> High-level pseudo-code for push() and pop() operations looks like
> this:
>
> ac_queue_spsc_push():
> 1. store relase (store pointer to node in queue,
> ac_i686_queue_spsc_push() function)
> 2. naked load (load of eventcount, ac_eventcount_algo1_signal()
> function)
> 3. check (check of eventcount - decide whether we need to signal or
> not, ac_eventcount_algo1_signal() function)

I need to change the memory barriers in the eventcount code to the ones that
Joe outlined here:

http://groups.google.com/group/comp.programming.threads/msg/8e7a0379b55557c0


I have to add another function to the eventcount api that puts a store/load
barrier before it does the naked load of the eventcount: something like
'ac_eventcount_algo1_signal_fence'. Otherwise, the load can rise above the
store that added the new node to the queue... I have this function
implemented in the vZOOM library, but not it AppCore. I just never got
around to adding it in there. I will add it in when I finish moving AppCore
over to SourceForge:

http://sourceforge.net/projects/appcore

[...]


Is that what you are concerned about?

Dmitriy Vyukov

unread,
Nov 24, 2007, 7:10:25 AM11/24/07
to
On 24 нояб, 06:12, "Chris Thomasson" <cris...@comcast.net> wrote:

> Is that what you are concerned about?


So. In container that have full membar in push operation I can use
ac_eventcount_algo1_signal(). And in container that have only store-
store membar in push operation I have to use
ac_eventcount_algo1_signal_fence().
Right?


Dmitriy V'jukov

Dmitriy Vyukov

unread,
Nov 24, 2007, 8:00:06 AM11/24/07
to
On 24 нояб, 15:10, Dmitriy Vyukov <dvyu...@gmail.com> wrote:
> So. In container that have full membar in push operation I can use
> ac_eventcount_algo1_signal(). And in container that have only store-
> store membar in push operation I have to use
> ac_eventcount_algo1_signal_fence().
> Right?


Interesting. Appcore here for several years, and cited on several
sites (including Intel), and one of few lock-free libraries out there.
And it seems that nobody not even trying to think and validate... On
nobody interested in such things, or nobody don't try to use such
things...
Now I understand IBM and Paul E. McKenney, and why they present RCU to
Linux for free... :)


Dmitriy V'jukov

Joe Seigh

unread,
Nov 24, 2007, 9:07:55 AM11/24/07
to

More than free. Linux kernel developers who work for IBM did the implementation.
That's a case of the exception to the rule. The current business environment
doesn't encourage long term experimental efforts. Any investment is going into
platform exclusive bloated solutions where it won't be clear there is an
actual advantage. Which is good for some markets where it's important that
there isn't any distinct indicators of success since that also implies distinct
indicators of failure. Not the optimal environment for the mediocre.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Chris Thomasson

unread,
Nov 24, 2007, 1:19:10 PM11/24/07
to
"Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
news:8aab88b5-8c95-4921...@r31g2000hsg.googlegroups.com...

Right. I don't know why I never got around to doing it especially since its
a clear bug that simply needs to be fixed. Well, yeah, I guess I do: I was
just way to busy working on the VZ stuff and got lazy wrt AppCore. However,
I am going to see if I can resurrect it in SourceForge under GPL.

Chris Thomasson

unread,
Nov 24, 2007, 1:26:32 PM11/24/07
to
"Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
news:095d62dc-38af-4a85...@p69g2000hsa.googlegroups.com...

On 24 нояб, 15:10, Dmitriy Vyukov <dvyu...@gmail.com> wrote:
> So. In container that have full membar in push operation I can use
> ac_eventcount_algo1_signal(). And in container that have only store-
> store membar in push operation I have to use
> ac_eventcount_algo1_signal_fence().
> Right?


> Interesting. Appcore here for several years, and cited on several
> sites (including Intel), and one of few lock-free libraries out there.
> And it seems that nobody not even trying to think and validate...

Even worse. I knew that the problem was there wrt the ac_queue_spsc API, and
was too lazy to fix it. I am very sorry about that non-sense.


> On nobody interested in such things, or nobody don't try to use such
> things...

I think the answer is both; I have only managed to license vZOOM to a mere
handful of entities. Well, I am not sure why anybody would want to actually
use a library that's in a stale pre-alpha status; AppCore is meant for
experimental and academic scenarios only. But, still, that's no excuse for
not adding that extra eventcount function!

;^(...


Also, I think that most programmers simply do not feel all that comfortable
looking at the implementation of synchronization primitives in assembly
language. Humm...

> Now I understand IBM and Paul E. McKenney, and why they present RCU to
> Linux for free... :)

No shi%!!

:^0

Chris Thomasson

unread,
Nov 24, 2007, 1:34:07 PM11/24/07
to
"Joe Seigh" <jsei...@xemaps.com> wrote in message
news:%eW1j.21225$rg1.3709@trndny04...

Humm... Well, since I have only managed to license my stuff to a very little
few, I was wondering if I should just bite the bullet and dual-license the
fu%cking thing with GPL and the commercial license I have been using;
similar to what Hoard does... I worry about GPL, because it doesn't stop
somebody from looking at the source-code; reverse engineer it; silently
create their own slightly modified version, and conveniently forget to tell
anybody about it...

Humm, well, what would you advise I do Joe?

Shi%t!!!!

Joe Seigh

unread,
Nov 24, 2007, 6:10:59 PM11/24/07
to
Chris Thomasson wrote:
> "Joe Seigh" <jsei...@xemaps.com> wrote in message
>>>
>>
>> More than free. Linux kernel developers who work for IBM did the
>> implementation.
>> That's a case of the exception to the rule. The current business
>> environment
>> doesn't encourage long term experimental efforts. Any investment is
>> going into
>> platform exclusive bloated solutions where it won't be clear there is an
>> actual advantage. Which is good for some markets where it's important
>> that
>> there isn't any distinct indicators of success since that also implies
>> distinct
>> indicators of failure. Not the optimal environment for the mediocre.
>
> Humm... Well, since I have only managed to license my stuff to a very
> little few, I was wondering if I should just bite the bullet and
> dual-license the fu%cking thing with GPL and the commercial license I
> have been using; similar to what Hoard does... I worry about GPL,
> because it doesn't stop somebody from looking at the source-code;
> reverse engineer it; silently create their own slightly modified
> version, and conveniently forget to tell anybody about it...
>
> Humm, well, what would you advise I do Joe?
>


If you distribute source, no license can prevent that. Patents can
protect against that as long as the algorithm has some amount of
detectability, i.e. you can distinguish it from other lock-free
implementations.

You might have to have a little patience. We're going to have to get
a lot more cores out there before there's any serious interest.

Chris Thomasson

unread,
Nov 25, 2007, 12:24:31 PM11/25/07
to
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:0cadncBsNosB9NXa...@comcast.com...

> "Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
> news:8aab88b5-8c95-4921...@r31g2000hsg.googlegroups.com...
> On 24 нояб, 06:12, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> Is that what you are concerned about?
>
>
>> So. In container that have full membar in push operation I can use
>> ac_eventcount_algo1_signal(). And in container that have only store-
>> store membar in push operation I have to use
>> ac_eventcount_algo1_signal_fence().
>> Right?
>
> Right.
[...]

You can try to amortize the expense of the #StoreLoad by pushing more than
one item on the queue before you signal... E.g.:
________________________________________________________________
void producer() {
for(;;) {
// gather up a "batch" of work items

for_each_item_in_batch() {
// push item onto spsc queue
}

// signal eventcount
}
}
________________________________________________________________

You can do this in AppCore (after I add the eventcount signal w/ fence
function) by using the ac_cpu_queue_spsc_XXX and ac_eventcount_algo1_XXX
API's as seperate entiries, as apposed to using the ac_queue_spsc_XXX API
which wraps up the queue with an eventcount. Something like:
________________________________________________________________
static ac_cpu_queue_spsc_t queue;
static ac_eventcount_algo1_t waitset;


void producer() {
ac_thread_t* const _tls = ac_thread_self();

for(;;) {
build_batch_of_nodes() {
void* const state = Get_Work_State(...);

// obtain a node and initalize it with the state
ac_cpu_node_t* const node =
ac_thread_cpu_node_cache_pop(_tls, state);

ac_cpu_queue_spsc_push(&queue, node);
}


// signal the consumer
ac_eventcount_algo1_signal_fence(&waitset);
}
}

void consumer() {
ac_thread_t* const _tls = ac_thread_self();

for(;;) {
ac_cpu_node_t *node;


// try to obtain some work; wait if none avaliable
while (! (node = ac_cpu_queue_spsc_pop(&queue)) {

ac_eventcount_cmp_t const cmp = ac_eventcount_algo1_get(&waitset);

if ((node = ac_cpu_queue_spsc_pop(&queue)) { break; }

ac_eventcount_algo1_timedwait(&waitset, _tls, cmp, AC_INFINITE);
}


// do the work...
Process_Work(ac_cpu_node_get_state(node));


// return node to cache
ac_thread_cpu_node_cache_push(_tls, node);
}
}
________________________________________________________________

Dmitriy Vyukov

unread,
Nov 26, 2007, 4:34:47 AM11/26/07
to
On Nov 24, 9:19 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
> "Dmitriy Vyukov" <dvyu...@gmail.com> wrote in message
>
> news:8aab88b5-8c95-4921...@r31g2000hsg.googlegroups.com...

> On 24 ÎÏÑÂ, 06:12, "Chris Thomasson" <cris...@comcast.net> wrote:
>
> > Is that what you are concerned about?
> > So. In container that have full membar in push operation I can use
> > ac_eventcount_algo1_signal(). And in container that have only store-
> > store membar in push operation I have to use
> > ac_eventcount_algo1_signal_fence().
> > Right?
>
> Right. I don't know why I never got around to doing it especially since its
> a clear bug that simply needs to be fixed. Well, yeah, I guess I do: I was
> just way to busy working on the VZ stuff and got lazy wrt AppCore. However,
> I am going to see if I can resurrect it in SourceForge under GPL.


Personally for me it will be sufficient if you just correct the
source... under whatever license :)
... and since I am already understand this moment, you can even not
correct the source. And it seems that next embarrassed person will be
appeared only in a few years... :)


Dmitriy V'jukov

Dmitriy Vyukov

unread,
Nov 26, 2007, 5:08:54 AM11/26/07
to
On Nov 25, 8:24 pm, "Chris Thomasson" <cris...@comcast.net> wrote:

> >> Right?
>
> > Right.


>
> You can try to amortize the expense of the #StoreLoad by pushing more than
> one item on the queue before you signal... E.g.:


Yes. This is the only rational solution. I can bet that you have such
logic in vZOOM ;)
And if you have fully- (partially-) connected by spsc-queues event-
driven system, you can amortize even further:

void sync()
{
store_load_membar();
foreach_spsc_queue()
{
q.signal_*without*_fence();
}
}

By event-driven I mean that you have means to execute sync()
periodically.

>
> void producer() {
> ac_thread_t* const _tls = ac_thread_self();
>
> for(;;) {
> build_batch_of_nodes() {
> void* const state = Get_Work_State(...);
>
> // obtain a node and initalize it with the state
> ac_cpu_node_t* const node =
> ac_thread_cpu_node_cache_pop(_tls, state);
>
> ac_cpu_queue_spsc_push(&queue, node);
> }
>
> // signal the consumer
> ac_eventcount_algo1_signal_fence(&waitset);
> }
> }


And every single ac_cpu_queue_spsc_push() operation can be followed by
ac_eventcount_algo1_signal_*without*_fence() (if batch forming takes
some time).


Dmitriy V'jukov

Dmitriy Vyukov

unread,
Nov 26, 2007, 5:36:28 AM11/26/07
to
On Nov 24, 9:26 pm, "Chris Thomasson" <cris...@comcast.net> wrote:

> I think the answer is both; I have only managed to license vZOOM to a mere
> handful of entities.


You can appeal to particular communities in the latest fashion by
creating
ZOOM++
ZOOM.NET
JZOOM
WebZOOM
ZOOM-XML
PyZOOM
VisualZOOM
and ZOOM On Rails
:)))


> Well, I am not sure why anybody would want to actually
> use a library that's in a stale pre-alpha status

It is still better than 0.0.0.0 stable version of non-existent
library :)))

> Also, I think that most programmers simply do not feel all that comfortable
> looking at the implementation of synchronization primitives in assembly
> language. Humm...

Maybe port them to perl or basic...


Dmitriy V'jukov

Chris Thomasson

unread,
Nov 26, 2007, 12:38:53 PM11/26/07
to
"Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
news:872ac8fb-db04-478c...@d27g2000prf.googlegroups.com...

> On Nov 24, 9:26 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> I think the answer is both; I have only managed to license vZOOM to a
>> mere
>> handful of entities.
>
>
> You can appeal to particular communities in the latest fashion by
> creating
> ZOOM++
> ZOOM.NET
> JZOOM
> WebZOOM
> ZOOM-XML
> PyZOOM
> VisualZOOM
> and ZOOM On Rails
> :)))
>
>
>> Well, I am not sure why anybody would want to actually
>> use a library that's in a stale pre-alpha status
>
> It is still better than 0.0.0.0 stable version of non-existent
> library :)))

;^)

>
>> Also, I think that most programmers simply do not feel all that
>> comfortable
>> looking at the implementation of synchronization primitives in assembly
>> language. Humm...
>
> Maybe port them to perl or basic...

Humm. I was thinking about porting them to Delphi or something... Although,
I have not programmed Pascal for a long while now. I have the compiler, and
it seems as if I can bolt AppCore onto a Delphi class library or something.
Humm... Well, yeah. I wonder if I should create a .NET/Java interface as
well. Humm, well, I think that probably will expose the primitives to a
wider range of programmers indeed...

Reply all
Reply to author
Forward
0 new messages