
Communication between Threads, without blocking


Volkan Gueler

Mar 12, 2004, 5:01:59 AM
I want to use 2 threads, one of which should run with realtime
priority. I also want to share data between these threads so they can
communicate.

Can I do this without blocking my realtime thread with mutexes or
condition variables?

I'm using Linux with NPTL as the thread library.

I've read something about barriers; are they a good starting point?

cheers
volkan

SenderX

Mar 12, 2004, 6:16:28 AM
> Can I do this without blocking my realtime thread with mutexes or
> condition variables?

If you want really fast lock-free communication on Linux, which is
perfect for real-time, you are going to have to code your own or use
another implementation's sync primitives.


Ross Bencina

Mar 16, 2004, 1:06:21 PM
"Volkan Gueler" <vol...@gmx.de> wrote

> I want to use 2 threads, one of which should run with realtime
> priority. I also want to share data between these threads so they can
> communicate.
>
> Can I do this without blocking my realtime thread with mutexes or

You can do it if you use lock-free data structures. Depending on the nature
of the communication you may be able to use a lock-free single-reader
single-writer FIFO which is relatively easy to implement, but as you say
needs memory barriers for operation on multiprocessor machines.
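
The single-reader single-writer FIFO Ross describes can be sketched as
a power-of-two ring buffer. Here is a minimal illustration using C11
atomics (which postdate this thread; in 2004 you would have used
compiler- or platform-specific barriers instead). All names and sizes
here are illustrative, not from any library mentioned in the thread:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 1024  /* must be a power of two */

typedef struct {
    int buf[RING_SIZE];
    _Atomic size_t head;  /* written only by the consumer */
    _Atomic size_t tail;  /* written only by the producer */
} spsc_ring;

/* Producer side: returns false (drop or retry) when full. */
bool ring_push(spsc_ring *r, int v) {
    size_t t = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t h = atomic_load_explicit(&r->head, memory_order_acquire);
    if (t - h == RING_SIZE)
        return false;                         /* buffer full */
    r->buf[t % RING_SIZE] = v;
    /* release: the stored element becomes visible before the new tail */
    atomic_store_explicit(&r->tail, t + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false when empty. */
bool ring_pop(spsc_ring *r, int *out) {
    size_t h = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t t = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (h == t)
        return false;                         /* buffer empty */
    *out = r->buf[h % RING_SIZE];
    atomic_store_explicit(&r->head, h + 1, memory_order_release);
    return true;
}
```

Because each index has exactly one writer, no CAS is needed; the
acquire/release pairs are the "memory barriers" the text refers to.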

There are more advanced lock-free structures, and a few libraries
available that implement them. I'm not an expert in this field, but I
share your interest in this kind of problem. I've started a survey page
on related issues if you're interested:
http://www.audiomulch.com/~rossb/code/lockfree/

Best wishes

Ross.


Volkan Gueler

Mar 17, 2004, 4:23:17 AM
> You can do it if you use lock-free data structures. Depending on the nature
> of the communication you may be able to use a lock-free single-reader
> single-writer FIFO which is relatively easy to implement, but as you say
> needs memory barriers for operation on multiprocessor machines.

With FIFOs, how is it guaranteed that my reader is not reading the
data that is currently being written?

In my real-time thread I want to read data from my I/O hardware and
pass it to a thread with lower priority.

> There are more advanced lock-free structures, and a few libraries
> available that implement them. I'm not an expert in this field, but I
> share your interest in this kind of problem. I've started a survey
> page on related issues if you're interested:
> http://www.audiomulch.com/~rossb/code/lockfree/

These libs and implementations are aimed at multi-processor machines,
aren't they?

Eric Sosman

Mar 17, 2004, 11:11:13 AM
Ross Bencina wrote:
>
> "Volkan Gueler" <vol...@gmx.de> wrote
> > I want to use 2 threads, one of which should run with realtime
> > priority. I also want to share data between these threads so they
> > can communicate.
> >
> > Can I do this without blocking my realtime thread with mutexes or
>
> You can do it if you use lock-free data structures. Depending on the nature
> of the communication you may be able to use a lock-free single-reader
> single-writer FIFO which is relatively easy to implement, but as you say
> needs memory barriers for operation on multiprocessor machines.

The question asks for a non-*b*locking solution, not a
non-*l*ocking solution.

Every synchronization method, whether lockless or not,
has the potential to stall the progress of a thread. The
nature of the stall can differ, of course, and different
kinds of stall have different efficiency trade-offs. Still,
a thread that's retrying a lockless operation that encountered
a synchronization conflict is making no more forward progress
than a thread that's asleep waiting for a lock to be released.

Lockless != Blockless. The very notion of synchronization
*requires* that a thread can be prevented from proceeding
past a synchronization point until some condition is met.

--
Eric....@sun.com

SenderX

Mar 17, 2004, 1:12:40 PM

> Lockless != Blockless.

How can a lock-free algo block a thread in the kernel?


> Still,
> a thread that's retrying a lockless operation that encountered
> a synchronization conflict is making no more forward progress
> than a thread that's asleep waiting for a lock to be released.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Wrong.

The collection as a whole is making constant forward progress.

I have said this may times:

If a thread fails its CAS, that means that another thread has "completely,
100%" modified and committed its update to the shared collection. This is
100% IMPOSSIBLE with lock-based anything, think about it:


Lock-Free
------------------

static SharedCollection;

// update in one atomic op
while ( ! CAS( &SharedCollection, ... ) )
{
    // We know that the thread that caused us to fail is 100% complete
    // with its update

    // We know that the algo has made progress as a whole, even though
    // we failed! :)

    // We know that there is no reason to wait here, because there is
    // nobody to wait for!
}


Lock-Based
------------------

static SharedCollection;
static Mutex;

// lock
while ( ! CAS( &Mutex, ... ) )
{
    // We know that a thread caused us to fail, but we don't know if
    // it's finished!!!

    // We have to wait, or spin... Yuck!

    EnterKernelWaitset(...);
}

// Modify shared data

// unlock
CAS( &Mutex, ... );

SignalKernelWaitset(...);


You understand?

In a lock-based, BLOCKING operation, contention with a waitset goes
into the kernel. This is very expensive. You could hit a page-fault or
a context-switch inside the critical section protected by the lock!
That would be horrible.

Also, lock-free algos are safe in signal handlers, and in other areas
where you can't hold a lock or block.


Eric Sosman

Mar 17, 2004, 2:03:01 PM
SenderX wrote:
>
> > Lockless != Blockless.
>
> How can a lock-free algo block a thread in the kernel?
>
> > Still,
> > a thread that's retrying a lockless operation that encountered
> > a synchronization conflict is making no more forward progress
> > than a thread that's asleep waiting for a lock to be released.
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Wrong.
>
> The collection as a whole is making constant forward progress.

... which is of no comfort to the original poster. Allow
me to re-quote the original question, which you snipped:

>>I want to use 2 threads, one of which should run with realtime
>>priority. I also want to share data between these threads so they can
>>communicate.
>>
>>Can I do this without blocking my realtime thread with mutexes or
>>condition variables?

He doesn't care whether "the collection as a whole" is
making progress; he's looking for something that will allow
one designated thread ("my realtime thread") to make unimpeded
progress no matter what the other thread is doing. I contend
that no synchronization scheme, lock-based or lock-free, can
satisfy this requirement.

> I have said this may times:

I think you mean "many," and I think that even "many"
understates the facts.

> [umpty-thousandth demonstration that lock-free algorithms
> are the universal panacea snipped]

--
Eric....@sun.com

Ross Bencina

Mar 24, 2004, 6:03:35 AM
Eric Sosman wrote:
> The question asks for a non-*b*locking solution, not a
> non-*l*ocking solution.
>
> Every synchronization method, whether lockless or not,
> has the potential to stall the progress of a thread.

This is simply not true. Perhaps you have not heard of wait-free algorithms?
Wait-free algorithms guarantee that every thread makes progress. Admittedly
lock-free != wait-free so your correction is at least partially relevant.


> The nature of the stall can differ, of course, and different
> kinds of stall have different efficiency trade-offs. Still,
> a thread that's retrying a lockless operation that encountered
> a synchronization conflict is making no more forward progress
> than a thread that's asleep waiting for a lock to be released.

Wait-free algorithms often use helping techniques to get around this.


> Lockless != Blockless.

Sometimes.


> The very notion of synchronization
> *requires* that a thread can be prevented from proceeding
> past a synchronization point until some condition is met.

I'm not sure this is relevant. The original poster asked for a method of
_communicating_ between threads. FIFOs based on atomic operations fulfill
this goal without requiring the kind of synchronisation point you appear to
be referring to (or, more correctly, the synchronisation occurs at a single
atomic position).

Best wishes

Ross.


Ross Bencina

Mar 24, 2004, 6:07:28 AM

Volkan Gueler wrote:
> With FIFOs, how is it guaranteed that my reader is not reading the
> data that is currently being written?

The current writing data is not visible to the reader until the writer
updates the write pointer.


> In my real-time thread I want to read data from my I/O hardware and
> pass it to a thread with lower priority.

I use these single-writer atomic FIFOs for a similar purpose -- passing data
from a real-time thread to a low-priority GUI thread.


> > There are more advanced lock-free structures, and a few libraries
> > available that implement them. I'm not an expert in this field, but
> > I share your interest in this kind of problem. I've started a survey
> > page on related issues if you're interested:
> > http://www.audiomulch.com/~rossb/code/lockfree/
>
> These libs and implementations are aimed at multi-processor machines,
> aren't they?

Yes, they are most commonly used for multi-processor systems, but they are
still useful on uniprocessors if you want to avoid locks.

Ross.


SenderX

Mar 24, 2004, 6:31:11 AM
> Volkan Gueler wrote:
> > With FIFOs, how is it guaranteed that my reader is not reading the
> > data that is currently being written?
>
> The current writing data is not visible to the reader until the writer
> updates the write pointer.

Absolutely correct!


VERY simple acquire/release memory visibility semantics come into play here.
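
The acquire/release semantics mentioned above are the classic
message-passing idiom: the writer stores the data, then does a release
store to the flag (or write pointer); the reader's acquire load of the
flag guarantees it then sees the data. A minimal sketch in C11 atomics
(the names `publish` and `try_consume` are illustrative, not from any
library in this thread):

```c
#include <stdatomic.h>

int payload;              /* plain, non-atomic data */
_Atomic int ready = 0;    /* publication flag, like a write pointer */

/* Writer: store the data first, then publish it with a release store. */
void publish(int v) {
    payload = v;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Reader: an acquire load of the flag; if it observes 1, the payload
   write is guaranteed to be visible. */
int try_consume(int *out) {
    if (atomic_load_explicit(&ready, memory_order_acquire)) {
        *out = payload;
        return 1;
    }
    return 0;
}
```

On a single-writer FIFO, "ready" is the write pointer and "payload" is
the slot it has just passed over.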


Ross Bencina

Mar 24, 2004, 6:18:25 AM
Eric Sosman wrote:
> He doesn't care whether "the collection as a whole" is
> making progress; he's looking for something that will allow
> one designated thread ("my realtime thread") to make unimpeded
> progress no matter what the other thread is doing. I contend
> that no synchronization scheme, lock-based or lock-free, can
> satisfy this requirement.

If the original poster is only concerned with the single-reader,
single-writer case then I contend that the following ring buffer
fulfills the requirements, so long as it is accepted that data must be
thrown away if the buffer overflows:

http://www.geocities.com/shuki_duv/asynch_queue.h.txt

There is no way that the reader can force the writer to wait (or vice
versa). As was discussed here recently, that implementation may need
memory barriers to be correct on an SMP system, or when accessing I/O
memory.


btw. If you care to google for "wait free queue" you will find that
there is such a thing as a generalised non-blocking multiple-reader,
multiple-writer solution to this problem when hard real-time
constraints must be met. I don't believe that these are necessarily the
same as the lock-free constructions that SenderX is using, although
they are related. I have also read one study which showed that in
practice lock-free algorithms perform better than wait-free algorithms
for some real-time loads.

Ross.


SenderX

Mar 24, 2004, 6:47:56 AM
"Ross Bencina" <ro...@audiomulch.com> wrote in message
news:40616afb$1...@news.iprimus.com.au...

> Eric Sosman wrote:
> > The question asks for a non-*b*locking solution, not a
> > non-*l*ocking solution.
> >
> > Every synchronization method, whether lockless or not,
> > has the potential to stall the progress of a thread.
>

There is a link on your site that points to this:

http://www.musicdsp.org/files/ATOMIC.H

The code comments "actually" DARE to mention an atomic
pointer!!!

WOW!!!


There is ONLY 1 """""Simple and Easy to Read""""" "normal"-CAS
compatible atomic pointer out there, in "current existence".........


Check this out Ross:

(C++):::

http://groups.google.com/groups?selm=3FDE3890.79C24B88%40xemaps.com

(Win32 C /w Special ABA counter added by me ):::

http://appcore.home.comcast.net/src/AtomicPtr.c


Enjoy "real" lock-free pointers!


Eric Sosman

Mar 24, 2004, 11:26:13 AM
Ross Bencina wrote:
>
> Eric Sosman wrote:
> > The question asks for a non-*b*locking solution, not a
> > non-*l*ocking solution.
> >
> > Every synchronization method, whether lockless or not,
> > has the potential to stall the progress of a thread.
>
> This is simply not true. Perhaps you have not heard of wait-free algorithms?
> Wait-free algorithms guarantee that every thread makes progress. Admittedly
> lock-free != wait-free so your correction is at least partially relevant.

I am no expert on lock-free algorithms or on wait-free
algorithms (or on a lot of other things, if it comes to that).
But I'll continue to maintain that "synchronization" *requires*
either that multiple activities proceed in absolute lock-step
at all times (think SIMD designs) or that some mechanism exists
to bring them back into step at chosen points. In the latter
case, it must be possible to stall at least one activity.

If threads T1 and T2 attempt an "atomic" operation at the
same time, no more than one of them can actually succeed. The
other (or both, if there's a collide-and-back-off protocol)
fails, and must retry or go to sleep or something. But it must
*not* just bumble along as if it had succeeded; it cannot "make
progress" beyond the synchronization point until success is
achieved.
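
Eric's point can be made concrete with the standard CAS retry loop: if
two threads attempt the operation at once, one CAS succeeds and the
other fails and must loop. A sketch in C11 (the name `lockfree_add` is
illustrative):

```c
#include <stdatomic.h>

/* Lock-free increment. If the CAS fails, some other thread's update
   has already committed -- the system as a whole made progress -- but
   *this* thread cannot proceed past the synchronization point until
   its own CAS succeeds. */
void lockfree_add(_Atomic long *p, long n) {
    long old = atomic_load_explicit(p, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
            p, &old, old + n,
            memory_order_acq_rel, memory_order_relaxed))
        ;  /* 'old' was reloaded by the failed CAS; compute and retry */
}
```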

Something about this argument reminds me of the (possibly
apocryphal) tale of DEC optimizing the OpenVMS idle loop. This
effort (if it actually happened) presumably "sped up" the system,
but it didn't help the system "make progress."

--
Eric....@sun.com

SenderX

Mar 24, 2004, 12:33:31 PM
> Something about this argument reminds me of the (possibly
> apocryphal) tale of DEC optimizing the OpenVMS idle loop. This
> effort (if it actually happened) presumably "sped up" the system,
******************************

> but it didn't help the system "make progress."

Do a simple test for yourself... Really.

Use this "Simple" lock-free lifo API:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/interlocked_singly_linked_lists.asp

Compare it to the best lock-based design you've got; make sure to align
the lock-free LIFO header on a cache boundary and pad it accordingly.

I bet your best lock-based design will die to this under load.

You will soon learn that lock-based stacks can stand up to load, like
lock-free stacks can.


SenderX

Mar 24, 2004, 1:22:41 PM
> You will soon learn that lock-based stacks can stand up to load, like
^^^
doh! I meant:

can't/have big trouble


:O


Ross Bencina

Mar 25, 2004, 12:43:39 AM

SenderX wrote

> There is a link on your site that points to this:
>
> http://www.musicdsp.org/files/ATOMIC.H
>
> The code comments "actually" DARE to mention an atomic
> pointer!!!
>
> WOW!!!

Sorry to take your sarcasm so seriously, but...

Part of the reason that FIFO is so simple is that only the (single)
reader updates the read pointer, and only the (single) writer updates
the write pointer, so I can't imagine that a true "atomic pointer" is
necessary for a correct implementation. Atomic pointer-sized load/store
should be sufficient.

Ross.


Ross Bencina

Mar 25, 2004, 1:00:24 AM
Eric Sosman wrote:
> I am no expert on lock-free algorithms or on wait-free
> algorithms (or on a lot of other things, if it comes to that).

Me neither. But I have been researching the topic for a while.

> If threads T1 and T2 attempt an "atomic" operation at the
> same time, no more than one of them can actually succeed. The
> other (or both, if there's a collide-and-back-off protocol)
> fails, and must retry or go to sleep or something. But it must
> *not* just bumble along as if it had succeeded; it cannot "make
> progress" beyond the synchronization point until success is
> achieved.

That's true. Except that wait-free algorithms generally employ
techniques where, if T2 interrupts T1, then T2 detects that it has
interrupted T1 and "helps" it by executing the other thread's operation
before its own. That way both operations succeed and neither has to
retry. This is probably a simplification of wait-free algorithm
methods, and reflects the fact that I am not an expert, but it should
give you the general idea: wait-free algorithms by definition
_guarantee_ that _all_ threads make progress.
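
One concrete flavor of wait-freedom, simpler than helping: primitives
like fetch-and-add complete in a bounded number of steps on hardware
that supports them directly (e.g. LOCK XADD on x86), with no retry loop
at all. A trivial C11 sketch for contrast with a CAS loop (illustrative
name; whether it compiles to a true wait-free instruction depends on
the target):

```c
#include <stdatomic.h>

/* Every caller completes in a bounded number of steps: there is no
   failure path and nothing to retry, unlike a CAS-based update. */
long waitfree_add(_Atomic long *p, long n) {
    /* returns the previous value, like atomic fetch-and-add */
    return atomic_fetch_add_explicit(p, n, memory_order_acq_rel);
}
```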

> Something about this argument reminds me of the (possibly
> apocryphal) tale of DEC optimizing the OpenVMS idle loop. This
> effort (if it actually happened) presumably "sped up" the system,
> but it didn't help the system "make progress."

Here's a classic paper with some performance graphs showing clear
performance and scalability benefits from lock-free algorithms on SMP
systems:
http://www.research.ibm.com/people/m/michael/podc-1996.pdf

I have about 5 other studies showing similar results. Lock free lists are
used in the Windows NT kernel, and lock-free dictionaries are used in the
Sun JVM. There is surely still debate to be had about _when_ to use them,
but it is clear that they are sometimes useful.

A couple of reasons I have found to use lock-free algorithms:

- You can't acquire a lock in an interrupt handler
- If you don't acquire a lock, priority inversion doesn't happen

Best wishes

Ross.


SenderX

Mar 25, 2004, 1:59:29 AM
> Sorry to take your sarcasm so seriously, but...

;)


>
> Part of the reason that FIFO is so simple is that only the (single)
> reader updates the read pointer, and only the (single) writer updates
> the write pointer, so I can't imagine that a true "atomic pointer" is
> necessary for a correct implementation.

Yes, you're correct. It would be overkill for that algo.

However, if you need to support a variable number of threads then
atomic_ptr, or other lock-free garbage collectors, are basically
required for true "dynamic" multi-read and/or multi-write lock-free
collections. Memory reclamation is a key component of any lock-free
system. Basically, multiple threads can get pointers to nodes even if
they did not own a previous reference... This will totally crash if you
decide to delete any popped nodes, or any objects that the nodes are
pointing to. And a normal reference-counted pointer will crash you in
weird ways, because the count can drop to zero and concurrently allow
another thread to increment it and take ownership while the pointer is
deleted/reused.


David Butenhof

Mar 25, 2004, 9:16:49 AM
Ross Bencina wrote:

> Eric Sosman wrote:
>> I am no expert on lock-free algorithms or on wait-free
>> algorithms (or on a lot of other things, if it comes to that).
>
> Me neither. But I have been researching the topic for a while.
>
>> If threads T1 and T2 attempt an "atomic" operation at the
>> same time, no more than one of them can actually succeed. The
>> other (or both, if there's a collide-and-back-off protocol)
>> fails, and must retry or go to sleep or something. But it must
>> *not* just bumble along as if it had succeeded; it cannot "make
>> progress" beyond the synchronization point until success is
>> achieved.
>
> That's true. Except that wait-free algorithms generally employ
> techniques where, if T2 interrupts T1, then T2 detects that it has
> interrupted T1 and "helps" it by executing the other thread's
> operation before its own. That way both operations succeed and
> neither has to retry. This is probably a simplification of wait-free
> algorithm methods, and reflects the fact that I am not an expert, but
> it should give you the general idea: wait-free algorithms by
> definition _guarantee_ that _all_ threads make progress.

The "interrupting" thread can complete the "interrupted" thread's task only
if that task was made known to it. There are a variety of ways this can be
done, but in general it only complicates the code and moves the problem one
step out. That is, you can interrupt the code that records the operation
it's about to perform just as easily as you can interrupt the operation
itself; and the interrupting thread then can't know about it. Essentially,
you're really just doubling the region where a thread is vulnerable to
interference. I won't say there are no cases where this might possibly be
of value anyway, but in general you'd be far better off just trying the
operation and avoiding the extra obfuscation!

The basic fallacy of SenderX's arguments is that, while he's correct that
execution of a lock-free algorithm will not "indefinitely postpone"
execution of an individual thread, that's NOT necessarily always in the
best interests of the application (much less of the SYSTEM) as a whole.
Sometimes it really IS better to block a particular thread and allow others
to make progress. That's why you're often (and perhaps even "almost
always") better off using a carefully tuned COMBINATION of lock-based and
lock-free mechanisms. (And, in other contexts, though he keeps slipping
back into old habits, SenderX has even admitted that in this newsgroup.)

Furthermore, to say that lock-free algorithms don't "stall" a thread is
physically inaccurate. Memory contention WILL stall the processor's
interlocked operations, or cause them to fail and retry. The thread in
question is NOT making "uninterrupted forward progress". It's just in a
hardware stall or spinloop instead of being blocked on a wait queue.
Sometimes, especially in low-level operations that rarely contend, this is
good.

In other contexts, though, it's bad; and it can be seriously bad, if the
contention levels are high enough and evenly distributed. Where a blocking
algorithm allows SOME thread to complete its operation without
interference, the stalls and spins in a poorly tuned lock-free application
can instead cause ALL threads to struggle along in slow motion. (Of course
one shouldn't ever design such prolonged high contention into a program.
That sort of rule unfortunately is far easier to toss off with an
authoritative air in a newsgroup than to actually implement and guarantee
in a large application.)

> Here's a classic paper with some performance graphs showing clear
> performance and scalability benefits from lock-free algorithms on SMP
> systems:
> http://www.research.ibm.com/people/m/michael/podc-1996.pdf
>
> I have about 5 other studies showing similar results. Lock free lists are
> used in the Windows NT kernel, and lock-free dictionaries are used in the
> Sun JVM. There is surely still debate to be had about _when_ to use them,
> but it is clear that they are sometimes useful.

Note that neither of those examples use EXCLUSIVELY lock-free constructs;
nor should they. Lock-free algorithms are often more complicated, and aren't
as portable. Where they make sense, they can contribute dramatically to the
performance of a facility. Using them where they don't make sense just, uh,
doesn't make any sense.

So, yes, they are SOMETIMES useful. I'd be more generous and say they're
USUALLY useful if you're writing any large concurrent application. But
there's still a big difference between USUALLY and ALWAYS. There's no
universal answer here. Analyze, test, evaluate, and choose wisely. (Brings
up a vision of the knight at the end of "Indiana Jones and the Last
Crusade"... "He chose... unwisely.")

> A couple of reasons I have found to use lock-free algorithms:
>
> - You can't acquire a lock in an interrupt handler
> - If you don't acquire a lock, priority inversion doesn't happen

Yes, both of these are excellent reasons to chose lock-free, since there is
often no better solution. The vast majority of lock-free usage in OS
kernels fall into those categories. Even if the performance was worse and
it was wildly inconvenient, it'd still be worthwhile. But if all your
design choices are that clear, you're probably not building anything very
interesting. ;-)

--
/--------------------[ David.B...@hp.com ]--------------------\
| Hewlett-Packard Company Tru64 UNIX & VMS Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\----[ http://homepage.mac.com/dbutenhof/Threads/Threads.html ]---/

SenderX

Mar 25, 2004, 11:14:56 AM
> That's why you're often (and perhaps even "almost
> always") better off using a carefully tuned COMBINATION of lock-based and
> lock-free mechanisms. (And, in other contexts, though he keeps slipping
> back into old habits, SenderX has even admitted that in this newsgroup.)

Yes... Case in point: A special lock has greatly improved the
throughput of my garbage collector!

How can this be? The collector that I have created is prone to overflow
when the application decides to blast it with tons of concurrent reads
and writes. Constant overlapping read access will not allow a
collection cycle to occur... I needed some way to defer threads that
would clearly overflow the collector. A special, semaphore-like lock
was the only solution! I could not come up with any good plan that did
not involve a lock...

Now when a thread tries to access the garbage collector and the trash
bin is full, it goes into a kernel waitset and blocks. When the trash
is emptied, blocked threads are allowed into the collector.

Locks and Lock-Free algos can go REAL good together!


Ross Bencina

Mar 27, 2004, 8:28:14 PM
> Yes... Case in point: A special lock has greatly improved the
> throughput of my garbage collector!
>
> How can this be? The collector that I have created is prone to
> overflow when the application decides to blast it with tons of
> concurrent reads and writes. Constant overlapping read access will not
> allow a collection cycle to occur... I needed some way to defer
> threads that would clearly overflow the collector. A special,
> semaphore-like lock was the only solution! I could not come up with
> any good plan that did not involve a lock...

I understand how this could increase throughput, but it would also
introduce non-deterministic latency, wouldn't it? In other words I
couldn't use your GC for a real-time application.

I still don't fully understand the memory reclamation issues with
lock-free queues. My understanding is that blocks that have been used
as lock-free nodes can't easily be freed for general use until you can
be certain that no thread holds a reference to a node (Michael's SMR
addresses this by keeping track of which nodes each thread holds
pointers to). However it's less clear whether SMR (or equivalent) is
necessary if all nodes are retained and only used in lock-free queues.
For example, if there is a fixed-size free list (a lock-free LIFO), and
a set of queues that use nodes allocated from the LIFO (and freed back
to the LIFO), is SMR necessary or can you just keep recycling nodes
safely?

Ross.
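
[The fixed-size free list with recycled nodes that Ross asks about is
usually paired with a generation tag, so that a single CAS covers both
the head and the tag; this is the classic defense against the ABA
problem when nodes are recycled. A minimal sketch using C11 atomics,
with assumed names and layout: indices into a static pool instead of
pointers, tag in the upper bits of one 64-bit word. It does not by
itself settle the SMR question for concurrent traversal.]

```c
#include <stdatomic.h>
#include <stdint.h>

#define NPOOL 256
#define NIL   0xFFFFu   /* sentinel: no valid index is 0xFFFF */

typedef struct { uint16_t next; /* payload fields would go here */ } node;

static node pool[NPOOL];
static _Atomic uint64_t freelist;  /* bits 0-15: head index; bits 16+: ABA tag */

/* Link every pool node into the free list: 0 -> 1 -> ... -> NIL. */
static void fl_init(void) {
    for (uint16_t i = 0; i + 1 < NPOOL; ++i)
        pool[i].next = (uint16_t)(i + 1);
    pool[NPOOL - 1].next = NIL;
    atomic_store(&freelist, 0);    /* head = node 0, tag = 0 */
}

/* Treiber-style push: one CAS swings head and bumps the tag together. */
static void fl_push(uint16_t i) {
    uint64_t old = atomic_load(&freelist), nw;
    do {
        pool[i].next = (uint16_t)(old & 0xFFFFu);   /* link to current head */
        nw = (((old >> 16) + 1) << 16) | i;          /* bump tag, new head  */
    } while (!atomic_compare_exchange_weak(&freelist, &old, nw));
}

/* Pop: the tag makes a stale head+next pair fail the CAS even if the
   same index has been recycled in the meantime (the ABA case). */
static uint16_t fl_pop(void) {
    uint64_t old = atomic_load(&freelist), nw;
    uint16_t i;
    do {
        i = (uint16_t)(old & 0xFFFFu);
        if (i == NIL)
            return NIL;                              /* pool exhausted */
        nw = (((old >> 16) + 1) << 16) | pool[i].next;
    } while (!atomic_compare_exchange_weak(&freelist, &old, nw));
    return i;
}
```

Note the tag must live as long as the pool itself, echoing SenderX's
later warning about preserving ABA counters for the node's lifetime.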


SenderX

Mar 28, 2004, 7:36:38 AM
> I understand how this could increase throughput, but it would also
> introduce non-deterministic latency, wouldn't it?

Not really; you can set it so it will not block at all.

At worst, the gc would block 2-3 threads every 4096 parallel pops. This
is extremely fast. But... if you want AppCore to "guarantee" that a
collection will NOT block ANY of your threads, you need to set the
correct AppCore flags per queue object.

There is also more than one implementation of the gc. The gc in question
would spin, or maybe block certain threads after a certain number of pop
operations happen in parallel.


> However it's less clear whether SMR (or equivalent) is necessary if
> all nodes are retained and only used in lock-free queues. For
> example, if there is a fixed-size free list (a lock-free LIFO), and a
> set of queues that use nodes allocated from the LIFO (and freed back
> to the LIFO), is SMR necessary or can you just keep recycling nodes
> safely?

Yes. AppCore queues use static node pools, then switch to a full-blown
dynamic gc when the application needs more.

VERY IMPORTANT!!! If you do use pools of static nodes, make SURE that
ANY ABA counters are preserved for the entire node's lifetime. You will
crash on ABA rollover if you do not do this.


P.S.

One of AppCore's garbage collectors uses an experimental "pointer
delay" algo of mine, to avoid crashing under dynamic load.


Giancarlo Niccolai

Mar 28, 2004, 5:42:28 PM
Ross Bencina wrote:


> I understand how this could increase throughput, but it would also
> introduce non-deterministic latency wouldn't it? In other words I couldn't
> use your GC for a real-time application.
>

You can't use a GC for realtime applications in general, as every GC,
even those designed for low latency such as tri-color or parallelized
GCs, will eventually be involved in periodic long loops that require
other threads to refrain from changing the memory geometry.

The alternative is a GC that is ALWAYS running, and that can free ONE
block element at a time; but this kind of GC would consume considerably
more CPU power than any other GC, and the overall application times
would be worse. They would be more constant, though, so you might use
this approach in an RT application where the CPU has enough spare
calculation power to accommodate both your application and the GC;
with another kind of GC the application would finish sooner, but
sometimes with long pauses.

Also, as the "speed" at which the GC operates could not be tuned in
such a scenario, you would have to fine-tune the application so that it
does not have peaks of memory allocation over time.

Finally, it's at least questionable whether such a GC would be better
than reference counting; in some cases yes, but under realtime
requirements refcounting, although having worse overall performance,
has finer and smoother memory processing.

Giancarlo.

Ross Bencina

Mar 29, 2004, 2:00:56 AM
Giancarlo Niccolai wrote:
> Ross Bencina wrote:
>
>
> > I understand how this could increase throughput, but it would also
> > introduce non-deterministic latency, wouldn't it? In other words I
> > couldn't use your GC for a real-time application.
> >
>
> You can't use a GC for realtime applications in general, as every GC,
> even those designed for low latency such as tri-color or parallelized
> GCs, will eventually be involved in periodic long loops that require
> other threads to refrain from changing the memory geometry.

Yes, I guess that is a whole other can of worms, one I'm only
relatively aware of since I use neither a reference-counting GC nor any
other GC in my real-time code.

However, I think what SenderX is calling GC is not a generalised GC in
the sense that you might think, but rather a specific type of memory
reclamation for nodes used in lock-free data structures.

Perhaps I misunderstand both of you though.

Ross.


SenderX

Mar 29, 2004, 2:16:29 PM
> However, I think what SenderX is calling GC is not a generalised GC
> in the sense that you might think, but rather a specific type of
> memory reclamation for nodes used in lock-free data structures.

Correct.


> Perhaps I misunderstand both of you though.

How do you create a linked list that allows a variable number of
threads to iterate through and read the list, while a bunch of other
threads are pushing and popping items in parallel...


What if this happens:

1A. Threads A-D iterate up to node c

1B. Thread Z concurrently pops node c...

2A. Threads A-D read and "use" node c

2B. Thread Z deletes c!!! DOH!

3. Threads A-D crash! oops...


How do you manage the memory in that "volatile" situation? You simply
have to use some sort of garbage collection.

You need to ensure that no thread has a pointer to a node when it's
time to delete it. My gc'tor solves this problem.


Giancarlo Niccolai

Mar 30, 2004, 3:11:27 PM
SenderX wrote:


>
> How do you manage the memory in that "volatile" situation? You simply have
> to use some sort of garbage collection.
>

Or weak reference.

Giancarlo.

SenderX

Mar 30, 2004, 4:30:16 PM
> > How do you manage the memory in that "volatile" situation? You
> > simply have to use some sort of garbage collection.
> >
> Or weak reference.

A weak reference can't be used for this. Boost pointers, or any other
non-atomically counted pointer, will crash.

In order to use reference counting for a lock-free collection, you
would have to include the reference counts in every CAS that affects a
node. This would require Joe's atomic_ptr "with" the ABA-safe CAS that
I added to it:

http://appcore.home.comcast.net/src/AtomicPtr.c


P.S.

AppCore x86-32 is almost complete. Before I release the version that is
ready for test driving, I have to write up a detailed paper on it. Then I
need to benchmark against other nice thread libraries... Like Wefts++

:)


Giancarlo Niccolai

Apr 1, 2004, 12:01:04 AM
SenderX wrote:
>
> AppCore x86-32 is almost complete. Before I release the version that is
> ready for test driving, I have to write up a detailed paper on it. Then I
> need to benchmark against other nice thread libraries... Like Wefts++
>
> :)

I am very pleased that you think Wefts++ is a nice thread lib, but
Wefts is meant for high-level abstraction of threads, with few threads
running in a process and mainly blocked when they aren't needed to do
anything (a situation in which lock-free algos are more or less
unusable). Your library is meant for very fast computational result
handling in massively parallel environments, like e.g. hyperthreading,
where a lock/wait concept is out of any possible performance
requirement. Long story short, you'll be benchmarking apples against
cars. Both can be good, but you don't eat cars and you don't drive
apples.

BTW, since I am here, I think I found a nice way to use lock-free algos
to implement pthread condition variables under Windows without using 3
semaphores. More in my next mails; I would greatly appreciate your help
to put lock-free based posix condition variables in Wefts.

Giancarlo.

SenderX

Apr 5, 2004, 4:24:11 PM
> BTW, since I am here, I think I found a nice way to use lock-free
> algos to implement pthread condition variables under Windows without
> using 3 semaphores. More in my next mails; I would greatly appreciate
> your help to put lock-free based posix condition variables in Wefts.

What did you have in mind?

