Question on ABA problem... (very important)

R.Amine

unread,

Sep 17, 2006, 1:51:52 PM9/17/06

to

Hello all,

Before asking this question , i will try to be very clear:

Please read bellow:

//#####################################################

function TFreeStack.Pop(var myObject:TObject):boolean;
var node: TSingleLnkNode;
begin
repeat
node := TSingleLnkNode(self.fhead.next);
if node = nil
then
Begin
myObject:=nil;
result:=false;
exit;
End;
{$IFDEF Delphi}
Until CASPointer(pointer(self.fhead.Next), pointer(node),
pointer(node.next));
{$ENDIF}
{$IFDEF FreePascal}
Until self.CritSyncObj.CAS(dword(self.fhead.Next), dword(node),
dword(node.next));
{$ENDIF}
myObject:=node.myObject;
self.freelist.Release(node);
result:=true;
decrementCount;
end;
//###########################################################

The ABA problem may happen since between the line

"node := TSingleLnkNode(self.fhead.next);"

and

"CASPointer(pointer(self.fhead.Next), pointer(node), pointer(node.next));"

Multiple pushes and pops may happen in parallel (on other threads), so, the
'self.fhead.Next' and 'node' pointers may point to the same element
even if node.next are not the same.

So, how to avoid this problem ?

For example, we have to create a 64 bit data filed inside the TFreestack ,
that must be aligned: and that's not a problem: i can use getmem(same as
malloc) and reserve
some memory and try to find an address that is 8 byte aligned.

In this 64bit filled inside the TFreeStack object the following must be
copied
self.fhead.Next and focount(the pop count: the output count) and they will
be
changed inside the DWCAS(with a lock compxchg8b: the 64bit destination
will reside in [ESI]).

But since i have to compare a variable ocount(old ocount: the old pop count)
with the self.focount (the new popcount), so , i have to pass the 32bit
address
of the VMT to be able to access self.focount and self.fhead.next inside
the DWCAS , but that's not a problem.

Now, follow me please:

As i said before, those conditions above are necessary but not sufficient,
and i have tried to explain it to Chriss Thomasson like this:

#------------------------------------------------

(1) push esi
push ebx
mov esi, [esp + 16]
mov eax, [esi]
mov edx, [esi + 4]
mov esi, [esp + 20]
mov ebx, [esi]
mov ecx, [esi + 4]
(2)mov esi, [esp + 12]
lock cmpxchg8b qword ptr [esi]

Suppose between (1) and (2) the thread is preempted and another thread
have executed both the Push() and the DWCAS inside , so, self.fhead.Next
will be changed , but since the CAS is not completly atomic , as soon as
the thread will run again it will find itself with the old copy of
"self.fhead.Next",
and it will execute the lock cmpxchg8b qword ptr [esi] on the old copy of
"self.fhead.Next".

#-----------------------------------------------------------------------

So, if the thread is prempted inside the DWCAS and a Push() is executed ,
we have a problem , and self.fhead.Next is not valid, so we have to ATOMICLY
compare the following:

SELF.FHEAD.NEXT

AND

SELF.FOCOUNT (the pop count)

AND

SELF.FICOUNT (the push count)

and that's more than 64 bits , mush more than the COMPXCHG8B can handle.

What's your opinion on this ?

R.Amine

unread,

Sep 17, 2006, 2:00:42 PM9/17/06

to

Some typo like:

"In this 64bit filled.." that must be read "In this 64bit field"

R.Amine

unread,

Sep 17, 2006, 2:20:13 PM9/17/06

to

I wrote:

> SELF.FHEAD.NEXT
>
> AND
>
> SELF.FOCOUNT (the pop count)
>
> AND
>
> SELF.FICOUNT (the push count)
>
>
> and that's more than 64 bits , mush more than the COMPXCHG8B can handle.

I assumed in the above that:

SELF.FHEAD.NEXT is 32 bits

SELF.FOCOUNT (the pop count) is 32 bits

SELF.FICOUNT (the pop count) is 32 bits

But if we change it to

SELF.FHEAD.NEXT is 32 bits

SELF.FOCOUNT (the pop count) is 16 bits

SELF.FICOUNT (the pop count) is 16 bits

I think that's will work.

What do you think ?

Is there any critics and/or advise ?

"R.Amine" <ami...@colba.net> wrote in message
news:12gqv9b...@corp.supernews.com...

R.Amine

unread,

Sep 17, 2006, 2:45:34 PM9/17/06

to

"R.Amine" <ami...@colba.net> wrote in message

news:12gr0uh...@corp.supernews.com...

> I wrote:
>
> > SELF.FHEAD.NEXT
> >
> > AND
> >
> > SELF.FOCOUNT (the pop count)
> >
> > AND
> >
> > SELF.FICOUNT (the push count)
> >
> >
> > and that's more than 64 bits , mush more than the COMPXCHG8B can handle.
>
>
> I assumed in the above that:
>
> SELF.FHEAD.NEXT is 32 bits
>
> SELF.FOCOUNT (the pop count) is 32 bits
>
> SELF.FICOUNT (the pop count) is 32 bits
>
>
> But if we change it to
>
> SELF.FHEAD.NEXT is 32 bits
>
>
> SELF.FOCOUNT (the pop count) is 16 bits
>
> SELF.FICOUNT (the pop count) is 16 bits
>

EVEN THAT it may still prempt inside the DWCAS between the time
that:

(we are trying to get both the new SELF.FOCOUNT and new SELF.FICOUNT)

AND

the COMPXCHG8B.

So,

How to solve this ?

Must i use a the recursive EnterCriticalSection and LeaveCriticalSection
that i wrote (inside the isync.pas), and do something like this:

EnterCriticalSection;

result:=np_ac_i686_atomic_dwcas_fence(destination,comperand,exchange)=1;

LeaveCriticalSection;

This will insure that the thread that executes the DWCAS
will not be prempted by the set of threads that uses the lock-free
structure, EVEN if the thread is prempted by high priority kernel threads
...

Or it it too expensive ?

I think the following is a nessary condition:

As soon as we call the DWCAS we have to avoid premption to
the set of other threads that are using the lock-free datastructure.

R.Amine

unread,

Sep 17, 2006, 2:48:00 PM9/17/06

to

"R.Amine" <ami...@colba.net> wrote in message
news:12gr0uh...@corp.supernews.com...
> I wrote:
>

> > SELF.FHEAD.NEXT
> >
> > AND
> >
> > SELF.FOCOUNT (the pop count)
> >
> > AND
> >
> > SELF.FICOUNT (the push count)
> >
> >
> > and that's more than 64 bits , mush more than the COMPXCHG8B can handle.
>
>

AND

the COMPXCHG8B.

So,

EnterCriticalSection;

result:=np_ac_i686_atomic_dwcas_fence(destination,comperand,exchange)=1;

LeaveCriticalSection;

I think the following is a necessary condition:

As soon as we call the DWCAS we have to avoid premption to
the set of other threads that are using the lock-free datastructure.

What do you think ?

Is there any critics and/or advise ?

>
> I think that's will work.
>
>
> What do you think ?
>
> Is there any critics and/or advise ?
>

>
>
>
> "R.Amine" <ami...@colba.net> wrote in message
> news:12gqv9b...@corp.supernews.com...
> >

R.Amine

unread,

Sep 17, 2006, 3:18:34 PM9/17/06

to

> SELF.FOCOUNT (the pop count) is 16 bits
> >
> > SELF.FICOUNT (the pop count) is 16 bits

read like this:

SELF.FOCOUNT (the pop count) is 16 bits

SELF.FICOUNT (the push count) is 16 bits

clayne

unread,

Sep 17, 2006, 3:44:08 PM9/17/06

to

R.Amine wrote:
> Hello all,
>
> Before asking this question , i will try to be very clear:

So let me get this straight.. We have here Pascal, threading, and
inline x86 assembly. Can things get anymore unclear?

R.Amine

unread,

Sep 17, 2006, 5:16:20 PM9/17/06

to

"clayne" <cla...@anodized.com> wrote in message

> R.Amine wrote:
> > Hello all,
> >
> > Before asking this question , i will try to be very clear:
>
> So let me get this straight.. We have here Pascal

The compiler: http://www.freepascal.org/

It's not Pascal , it's a much more than Pascal, with the Object oriented
concepts.(like C++)

Architechtures like Intel x86, Amd64/x86 64, PowerPC, Sparc and more..
are supported.(it's both 32bit and 64bit)

You can use also Delphi to compile the isynch, freelist,freestak...:

>, threading,

Right:

Deadlock, priority inversion, lock convoy, ABA etc...

You have also to understand concepts and know how to implemen the following:

a non recursive critical section , a recrusive critical section, and others
like recursive critical section
that avoids priority inversion.

mutex,semaphore, optex, waitforObject ,waitForMultipleObject, setEvent,
setEvents,resetEvent,resetEvents etc.(I will try to implement those objects
over ISynch.pas)

You have also to know structurel analysis(how to calculate the invariants
equations)
and Reachability of Petri Nets to avoid deadlock on programs that use
lock sychronisation objects...and how to model the the critical_section,
setevent,
setautoevent,waitforobject, waitformultipleobjects etc.

You have also to understand the concepts and know how to implement some
lock-free datastrutures

>and inline x86 assembly.

Yes assembly to be able to write some synchronisation primitives and
external assembly
in some situations when you want to aligh on 8 bytes( you can also align
data on 8bytes
from inside Objectpascal.) etc

>Can things get anymore unclear?

If you have not the expertise above, that's true.

R.Amine

unread,

Sep 17, 2006, 5:21:04 PM9/17/06

to

>You have also to know structurel analysis(how to calculate the invariants
>equations)
>and Reachability of Petri Nets to avoid deadlock on programs that use
>lock sychronisation objects...and how to model the the critical_section,
>setevent,

>setautoevent,waitforobject, waitformultipleobjects etc. in Petri nets ...

I mean: if you want to use some lock objects into realworld programming,
you have to know a good portion of that also....

"R.Amine" <ami...@colba.net> wrote in message

news:12grb8o...@corp.supernews.com...

R.Amine

unread,

Sep 17, 2006, 5:52:53 PM9/17/06

to

And also you have to know how to calculate the Big O complexity
of some datastruture algorithms , like for example: if you want to implement
a lock or lock-free SkipList or something else...

Regards,
Amine Moulay Ramdane.

"R.Amine" <ami...@colba.net> wrote in message

news:12grbhk...@corp.supernews.com...

R.Amine

unread,

Sep 17, 2006, 6:00:24 PM9/17/06

to

I wrote:
> And also you have to know how to calculate the Big O complexity
> of some datastruture algorithms , like for example: if you want to
implement

> a lock or lock-free SkipList ( or something else..).

And if you don't have some probability theory you will not be able to affirm
something like
this:

From the theoritic point of view, the properties of the skiplist are:

insertion , deletion and search are in log(n)

average pointers: equal or even better than the tree structure

the probability that it degenerate from nonlinear to linear is so small

and it has a good locality reference

Regards,
Amine Moulay Ramdane.

"R.Amine" <ami...@colba.net> wrote in message

news:12grdd9...@corp.supernews.com...

R.Amine

unread,

Sep 17, 2006, 6:10:29 PM9/17/06

to

And finally if you have a good logical mind you can use LOGIC to glue the
things...

:)

Regards,
Amine Moulay Ramdane

R.Amine

unread,

Sep 17, 2006, 6:25:09 PM9/17/06

to

I wrote:
> The compiler: http://www.freepascal.org/
>
> It's not Pascal , it's a much more than Pascal, with the Object oriented
> concepts.(like C++)
>
>
> Architechtures like Intel x86, Amd64/x86 64, PowerPC, Sparc and more..
> are supported.(it's both 32bit and 64bit)

You can link a.out freepascal objects and also .so,.dll files
to GCC or MinGwin.

the other way is also possible ..

"R.Amine" <ami...@colba.net> wrote in message
news:12grb8o...@corp.supernews.com...
>

clayne

unread,

Sep 17, 2006, 5:41:11 PM9/17/06

to

R.Amine wrote:
> I wrote:
> > The compiler: http://www.freepascal.org/
> >
> > It's not Pascal , it's a much more than Pascal, with the Object oriented
> > concepts.(like C++)

The problem you're going to have in this group is that the majority, by
large, people speak C as the programmatic language of communication.
It's very hard to read the psuedo-psuedo-code from freepascal.

R.Amine

unread,

Sep 17, 2006, 6:46:16 PM9/17/06

to

"R.Amine" <ami...@colba.net> wrote in message

news:12grf9p...@corp.supernews.com...

> I wrote:
> > The compiler: http://www.freepascal.org/
> >
> > It's not Pascal , it's a much more than Pascal, with the Object oriented
> > concepts.(like C++)
> >
> >
> > Architechtures like Intel x86, Amd64/x86 64, PowerPC, Sparc and more..
> > are supported.(it's both 32bit and 64bit)
>
>
> You can link a.out freepascal objects and also .so,.dll files
> to GCC or MinGwin.
>
> the other way is also possible ..

Dev-C++ 5.0
Mingw/GCC 3.4.2 was written in ObjectPascal

look at the 'sourcecode' in:
http://www.bloodshed.net/dev/devcpp.html

R.Amine

unread,

Sep 17, 2006, 6:50:01 PM9/17/06

to

"clayne" <cla...@anodized.com> wrote in message

news:1158529271.2...@i42g2000cwa.googlegroups.com...

ObjectPascal is much more clear than C and C++, and much easy to understand.

And also very powerfull that even
Dev-C++ 5.0
Minngw/GCC 3.4.2

R.Amine

unread,

Sep 17, 2006, 7:02:56 PM9/17/06

to

"R.Amine" <ami...@colba.net> wrote in message

news:12grgoc...@corp.supernews.com...

And in my opinion what's more important is the 'logic' not the programming
language.

And i think you can easily extract/write the logic to/from a programming
language
like ObjectPascal...

>
>
>
>
>
>

R.Amine

unread,

Sep 17, 2006, 7:20:47 PM9/17/06

to

I wrote:
>And in my opinion what's more important is the 'logic' not the programming
>language.

>And i think you can easily extract/write the logic to/from a programming
>language
>like ObjectPascal...

You want a proof clayne ?

Look for example at me, i don't know well english , but since i am very
logic
you can understand me very well.
I know how to glue simple thinks to make meaningful things.

So, LOGIC is the GLUE that makes sense, not the language.

:)

Regards,
Amine Moulay Ramdane.

"R.Amine" <ami...@colba.net> wrote in message

news:12grhgk...@corp.supernews.com...

Chris Thomasson

unread,

Sep 18, 2006, 8:45:00 PM9/18/06

to

> But if we change it to
>
> SELF.FHEAD.NEXT is 32 bits
>
>
> SELF.FOCOUNT (the pop count) is 16 bits
>
> SELF.FICOUNT (the pop count) is 16 bits
>
>
> I think that's will work.
>
>
> What do you think ?
>
> Is there any critics and/or advise ?

Refer to one of my old ideas:

http://groups.google.com/group/comp.programming.threads/msg/c0eb5f08e858948f

Actually, as you may know, you can do "lots of interesting stuff with offset
tricks"... IMHO, they are very useful for lock-free algorithms that have to
deal with lots of activity from writer threads... (e.g., lock-free stack
anchor, skiplist anchor, ect...)

For instance, think of something like this:

quickly scribbling down pseudo-code for an atomic triple linked list:
----------------

/* 64-bit triple list anchor */
#define TLIST_DEPTH 3
struct tlist_anchor_t {
int16 list[TLIST_DEPTH]; /* list root node offsets... */
int16 aba;
};

/* node with a triple list as a next pointer */
struct node_t {
tlist_anchor_t next; /* a triple list anchor for the next link... */
int16 thisindex; /* self offset */
void *state; /* user state */
};

/* global node cache via. lock-free LIFO */
#define NODE_DEPTH USHRT_MAX
static node_t gnodes[NODE_DEPTH];
extern node_t* gnode_pop(void); /* may block when empty */
extern void gnode_push(node_t*);

/* AppCore DWCAS:
updates comprand and returns non-zero on failure */
#define DWCAS np_ac_i686_atomic_dwcas_fence

/* I will modify aba on pushes, instead of pops...
this function pushes ndepth nodes at a time.
the nxoffset determines where to link the next nodes.
n points to local array of node_t*, at least the size of ndepth.
*/
void tlist_push_local
(triple_list_t *_this,
short const nxoffset,
short const ndepth,
node_t **n) {

int i;
triple_list_t xchg, cmp;

/* sanity check */
assert(ndepth > -1 && ndepth < TLIST_DEPTH);
assert(nxoffset > -1 && nxoffset < TLIST_DEPTH);

/* init the xchg offsets with the passed offsets */
for(i = 0; i < ndepth; ++i) {
xchg.list[i] = n[i].thisindex;
}

/* make the initial read, and start the "CAS-Loop" */
cmp = *_this;
do {

/* link the next nodes with the initial read */
for(i = 0; i < ndepth; ++i) {
n[i].next[nxoffset] = cmp.list[i];
}

/* inc the monotonic aba counter of the initial read */
xchg.aba = cmp.aba + 1;

/* attempt to swap in the xchg */
} while(DWCAS(_this, &cmp, &xchg));

/* thats all folks!! ;) */
}

Humm... Pretty cool huh?

;)

We can keep the reads and writes of "three separate linked lists" atomic":

- A monotonic counter and the "root nodes" of each linked list is contained
in a "single 64-bit anchor structure"; this is the "atomic sequence point",
so to speak...

- A triple list anchor and node links are all made up of a single 64-bit
anchor structure.

- A node in a linked list can contain a single atomic link to three
different lists. These node links can be atomically modified with 64-bit
atomic instructions (e.g., DWCAS on 32-bit machine, and normal CAS on 64-bit
machine). That means that you can build complex tree structures that support
lock-free modifications' anywhere in the list...

- Every increment an anchors monotonic counter effects all its linked list
root nodes. That is, a single modification to the counter will cause any
other CAS-loops that happen to be running in parallel, to simply fail; that
is how true atomicity is achieved wrt lock-free linked lists. Basically, the
aba counter gives CAS LL/SC like abilities', which is useful for writer
based lock-free algorithms...

At the moment, I am not exactly sure what this could possibly be used for,
however, it does demonstrate the potential power of lock-free offset tricks
in general, IMHO.

Any thoughts on clever lock-free offset trickery?

One initial thought: This stuff would be used in the writer side of a
lock-free reader pattern...

Chris Thomasson

unread,

Sep 18, 2006, 9:03:32 PM9/18/06

to

"clayne" <cla...@anodized.com> wrote in message

news:1158522248.1...@h48g2000cwc.googlegroups.com...

I wonder of objects in Pascal use any internal pointer trickery/squizzling
that may interfere with the ability to use CAS on a raw pointer to a Pascal
object...?

Anyway, it would be interesting, to me at least, if the OP can successfully
utilize some of the lock-free functionality that my AppCore library
provides, in a Pascal-based programming environment...

I have never programmed in Pascal or Delphi, and have extremely little
experience with VB... So, I would not really mind if somebody else can do
the "tedious work" that is usually involved with "convincing" a
really-high-level language to use "raw atomic pointer operations"...

;)