64 bit atomic_ptr possible

Joseph Seigh

unread,

Apr 10, 2003, 8:55:43 PM4/10/03

to

atomic_ptr can be made to work in 64 bit systems when those become more
commonplace. All you need to do to allocate the atomic_ptr_ref objects
from a private pool such that a 32 bit value will suffice to map into
that pool. You override the new and delete methods to effect this.
An easy way to do it would to allocate sufficient virual storage, up to
4 gigabytes, and use the offset into it as the 32 bit "pointer" value,
adding the virtual storage base address to get the actual 64 bit pointer.
Additionally, you can use compare and swap w/ a version count to avoid the
ABA problem to implement a lock-free lifo free queue for the storage pool
since 32 bit offsets will suffice there also.

For windows, that would be VirtualAlloc to reserve vitual storage and commit
it as needed. For unix, probably mmap.

Joe Seigh

SenderX

unread,

Apr 11, 2003, 10:34:01 PM4/11/03

to

"Joseph Seigh" <jsei...@xemaps.com> wrote in message
news:3E961336...@xemaps.com...

> atomic_ptr can be made to work in 64 bit systems when those become more
> commonplace. All you need to do to allocate the atomic_ptr_ref objects
> from a private pool such that a 32 bit value will suffice to map into
> that pool.

This would be a solution for a 64-bit system that does not have a compare
and swap that works on double the pointer size right?

> Additionally, you can use compare and swap w/ a version count to avoid the
> ABA problem to implement a lock-free lifo free queue for the storage pool
> since 32 bit offsets will suffice there also.

Could the free stack grow and shrink?

Joseph Seigh

unread,

Apr 12, 2003, 2:12:18 PM4/12/03

to

SenderX wrote:
>
> "Joseph Seigh" <jsei...@xemaps.com> wrote in message
> news:3E961336...@xemaps.com...
> > atomic_ptr can be made to work in 64 bit systems when those become more
> > commonplace. All you need to do to allocate the atomic_ptr_ref objects
> > from a private pool such that a 32 bit value will suffice to map into
> > that pool.
>
> This would be a solution for a 64-bit system that does not have a compare
> and swap that works on double the pointer size right?

Right.

>
> > Additionally, you can use compare and swap w/ a version count to avoid the
> > ABA problem to implement a lock-free lifo free queue for the storage pool
> > since 32 bit offsets will suffice there also.
>
> Could the free stack grow and shrink?

That would depend on the implementation. You could probably use a windows
private heap providing you could query its base address to calculate the
offsets from. Or anything you want.

Joe Seigh

SenderX

unread,

Apr 14, 2003, 6:46:25 PM4/14/03

to

> That would depend on the implementation. You could probably use a windows
> private heap providing you could query its base address to calculate the
> offsets from. Or anything you want.

So, if atomic_ptr could work and scale on any 64-bit system...

That would give a 64-bit system the ability to compare and swap atomic_ptrs
with version counts right?

If so, then basically 64-bit systems lacking a compare and swap double the
pointer size the ability to still use lock-free algo's!

If atomic_ptr could make it so you do not need a cmpxchg16b to do 64-bit
lock-free, I think that would be huge!

Joseph Seigh

unread,

Apr 14, 2003, 7:56:28 PM4/14/03

to

There would be no problem for the basic atomic_ptr ptr. You could also
implement the compare and swap atomic_ptrs. There might be scalability
problems with compare and swap of atomic_ptr for 32 bit and 64 bits. The
reason this might be a problem is since compare and swap is meant to be
used for things like implementing locks, it has to execute fairly in
multiprocessor systems. So a single processors shouldn't be able to
repeatedly successfully execute the compare and swap and starve the other
processors. There are actually engineering notes to this effect in the
super top secret internal architecture document for a mainframe architecture.

The base atomic_ptr ops don't have a problem since the compare and swap loops
don't have other compare and swaps also. But the compare and swap method would
have them since atomic_ptrs are accessed as part of the compare and swap logic,
and those accesses do compare and swap internally. Any thread-safe reference
ptr would have this problem if it used compare and swap.

Exponential back off may work *, but somebody is going to have to do some
metrics on a multi-processor system to determine whether the extent of the
problem. Uniprocessor systems won't show the problem.

* this would need the version number implementation I was thinking of so all
concurrent compare and swap experience contention. The version you have which
is modeled on the IBM Z architecture examples, the sucessful updates don't
experience contention and thus don't back off. All have to back off or
it won't work. I can go into a little more detail if anyone wants.

Joe Seigh

SenderX

unread,

Apr 15, 2003, 11:51:38 PM4/15/03

to

> * this would need the version number implementation I was thinking of so
all
> concurrent compare and swap experience contention. The version you have
which
> is modeled on the IBM Z architecture examples, the sucessful updates don't
> experience contention and thus don't back off. All have to back off or
> it won't work.

Are you talking about putting a backoff spinner in the atomic_ptr code
itself?

On most lock-free algo's there are specific places to put backoff's. Like in
the lock-free fifo queue after a thread fails to swing the back's next
pointer to the new node. The backoff seems to be decided on the total number
of queues, and the total number of threads concurrently in the compare and
swap " critical section ".

> I can go into a little more detail if anyone wants.

Please do.

Joseph Seigh

unread,

Apr 16, 2003, 8:59:00 AM4/16/03

to

SenderX wrote:
>
> > * this would need the version number implementation I was thinking of so
> all
> > concurrent compare and swap experience contention. The version you have
> which
> > is modeled on the IBM Z architecture examples, the sucessful updates don't
> > experience contention and thus don't back off. All have to back off or
> > it won't work.
>
> Are you talking about putting a backoff spinner in the atomic_ptr code
> itself?

It would have to be in the loop somewhere. I'm not sure how I'd do it at this
point or whether I'd do it at all.

>
> On most lock-free algo's there are specific places to put backoff's. Like in
> the lock-free fifo queue after a thread fails to swing the back's next
> pointer to the new node. The backoff seems to be decided on the total number
> of queues, and the total number of threads concurrently in the compare and
> swap " critical section ".
>

Some of the "self sacrificial" back offs where only some of the threads back off
would only seem to work where the contention was intermittent. Otherwise some of
the backed off threads could block inordinately.

> > I can go into a little more detail if anyone wants.
>
> Please do.

In the version counted implementation, the version count would change on every access,
so any overlapped access would cause the compare and swap to fail. This makes it harder
for threads to sneak through when contention is occurring, so in that sense it would be
fairer and probably handle constant contention better than some other back off schemes.

This is probably going to be hypothetical. I'm thinking now that the current scheme, apart
from only a 16 bit version number, will scale with no problems. So the alternate version
scheme is plan B at this point. It's always good to have a plan B, though.

Joe Seigh

SenderX

unread,

Apr 19, 2003, 8:11:18 PM4/19/03

to

> This is probably going to be hypothetical. I'm thinking now that the
current scheme, apart
> from only a 16 bit version number, will scale with no problems. So the
alternate version
> scheme is plan B at this point. It's always good to have a plan B,
though.

Yes, concurrent access to the atomic_ptr::cas function will fail for sure.
Even if the 16-bit ref< T >::seq sequence integer overflows to zero.

The sequence counted version of atomic_ptr should not starve other
processors at all on atomic_ptr::cas. Especially if you use exponential
backoffs specific to the lock-free algo your using atomic_ptr to
implement...

Also, remember the extra 16-bits you get in the refcount struct?

struct refcount {
//long ecount; // ephemeral count
__int16 ecount;
__int16 extra16;
unsigned long rcount; // reference count
};

The refcount::extra16 member can maybe be used for an internal version
count.. Our, maybe it could help port atomic_ptr to the weird cmp8bxchg16b
that you said the I64 will have...