Before asking this question , i will try to be very clear:
Please read bellow:
//#####################################################
function TFreeStack.Pop(var myObject:TObject):boolean;
var node: TSingleLnkNode;
begin
repeat
node := TSingleLnkNode(self.fhead.next);
if node = nil
then
Begin
myObject:=nil;
result:=false;
exit;
End;
{$IFDEF Delphi}
Until CASPointer(pointer(self.fhead.Next), pointer(node),
pointer(node.next));
{$ENDIF}
{$IFDEF FreePascal}
Until self.CritSyncObj.CAS(dword(self.fhead.Next), dword(node),
dword(node.next));
{$ENDIF}
myObject:=node.myObject;
self.freelist.Release(node);
result:=true;
decrementCount;
end;
//###########################################################
The ABA problem may happen since between the line
"node := TSingleLnkNode(self.fhead.next);"
and
"CASPointer(pointer(self.fhead.Next), pointer(node), pointer(node.next));"
Multiple pushes and pops may happen in parallel (on other threads), so, the
'self.fhead.Next' and 'node' pointers may point to the same element
even if node.next are not the same.
So, how to avoid this problem ?
For example, we have to create a 64 bit data filed inside the TFreestack ,
that must be aligned: and that's not a problem: i can use getmem(same as
malloc) and reserve
some memory and try to find an address that is 8 byte aligned.
In this 64bit filled inside the TFreeStack object the following must be
copied
self.fhead.Next and focount(the pop count: the output count) and they will
be
changed inside the DWCAS(with a lock compxchg8b: the 64bit destination
will reside in [ESI]).
But since i have to compare a variable ocount(old ocount: the old pop count)
with the self.focount (the new popcount), so , i have to pass the 32bit
address
of the VMT to be able to access self.focount and self.fhead.next inside
the DWCAS , but that's not a problem.
Now, follow me please:
As i said before, those conditions above are necessary but not sufficient,
and i have tried to explain it to Chriss Thomasson like this:
#------------------------------------------------
(1) push esi
push ebx
mov esi, [esp + 16]
mov eax, [esi]
mov edx, [esi + 4]
mov esi, [esp + 20]
mov ebx, [esi]
mov ecx, [esi + 4]
(2)mov esi, [esp + 12]
lock cmpxchg8b qword ptr [esi]
Suppose between (1) and (2) the thread is preempted and another thread
have executed both the Push() and the DWCAS inside , so, self.fhead.Next
will be changed , but since the CAS is not completly atomic , as soon as
the thread will run again it will find itself with the old copy of
"self.fhead.Next",
and it will execute the lock cmpxchg8b qword ptr [esi] on the old copy of
"self.fhead.Next".
#-----------------------------------------------------------------------
So, if the thread is prempted inside the DWCAS and a Push() is executed ,
we have a problem , and self.fhead.Next is not valid, so we have to ATOMICLY
compare the following:
SELF.FHEAD.NEXT
AND
SELF.FOCOUNT (the pop count)
AND
SELF.FICOUNT (the push count)
and that's more than 64 bits , mush more than the COMPXCHG8B can handle.
What's your opinion on this ?
"In this 64bit filled.." that must be read "In this 64bit field"
> SELF.FHEAD.NEXT
>
> AND
>
> SELF.FOCOUNT (the pop count)
>
> AND
>
> SELF.FICOUNT (the push count)
>
>
> and that's more than 64 bits , mush more than the COMPXCHG8B can handle.
I assumed in the above that:
SELF.FHEAD.NEXT is 32 bits
SELF.FOCOUNT (the pop count) is 32 bits
SELF.FICOUNT (the pop count) is 32 bits
But if we change it to
SELF.FHEAD.NEXT is 32 bits
SELF.FOCOUNT (the pop count) is 16 bits
SELF.FICOUNT (the pop count) is 16 bits
I think that's will work.
What do you think ?
Is there any critics and/or advise ?
"R.Amine" <ami...@colba.net> wrote in message
news:12gqv9b...@corp.supernews.com...
EVEN THAT it may still prempt inside the DWCAS between the time
that:
(we are trying to get both the new SELF.FOCOUNT and new SELF.FICOUNT)
AND
the COMPXCHG8B.
So,
How to solve this ?
Must i use a the recursive EnterCriticalSection and LeaveCriticalSection
that i wrote (inside the isync.pas), and do something like this:
EnterCriticalSection;
result:=np_ac_i686_atomic_dwcas_fence(destination,comperand,exchange)=1;
LeaveCriticalSection;
This will insure that the thread that executes the DWCAS
will not be prempted by the set of threads that uses the lock-free
structure, EVEN if the thread is prempted by high priority kernel threads
...
Or it it too expensive ?
I think the following is a nessary condition:
As soon as we call the DWCAS we have to avoid premption to
the set of other threads that are using the lock-free datastructure.
AND
the COMPXCHG8B.
So,
EnterCriticalSection;
result:=np_ac_i686_atomic_dwcas_fence(destination,comperand,exchange)=1;
LeaveCriticalSection;
I think the following is a necessary condition:
As soon as we call the DWCAS we have to avoid premption to
the set of other threads that are using the lock-free datastructure.
What do you think ?
Is there any critics and/or advise ?
>
> I think that's will work.
>
>
> What do you think ?
>
> Is there any critics and/or advise ?
>
>
>
>
> "R.Amine" <ami...@colba.net> wrote in message
> news:12gqv9b...@corp.supernews.com...
> >
read like this:
SELF.FOCOUNT (the pop count) is 16 bits
SELF.FICOUNT (the push count) is 16 bits
So let me get this straight.. We have here Pascal, threading, and
inline x86 assembly. Can things get anymore unclear?
The compiler: http://www.freepascal.org/
It's not Pascal , it's a much more than Pascal, with the Object oriented
concepts.(like C++)
Architechtures like Intel x86, Amd64/x86 64, PowerPC, Sparc and more..
are supported.(it's both 32bit and 64bit)
You can use also Delphi to compile the isynch, freelist,freestak...:
>, threading,
Right:
Deadlock, priority inversion, lock convoy, ABA etc...
You have also to understand concepts and know how to implemen the following:
a non recursive critical section , a recrusive critical section, and others
like recursive critical section
that avoids priority inversion.
mutex,semaphore, optex, waitforObject ,waitForMultipleObject, setEvent,
setEvents,resetEvent,resetEvents etc.(I will try to implement those objects
over ISynch.pas)
You have also to know structurel analysis(how to calculate the invariants
equations)
and Reachability of Petri Nets to avoid deadlock on programs that use
lock sychronisation objects...and how to model the the critical_section,
setevent,
setautoevent,waitforobject, waitformultipleobjects etc.
You have also to understand the concepts and know how to implement some
lock-free datastrutures
>and inline x86 assembly.
Yes assembly to be able to write some synchronisation primitives and
external assembly
in some situations when you want to aligh on 8 bytes( you can also align
data on 8bytes
from inside Objectpascal.) etc
>Can things get anymore unclear?
If you have not the expertise above, that's true.
I mean: if you want to use some lock objects into realworld programming,
you have to know a good portion of that also....
"R.Amine" <ami...@colba.net> wrote in message
news:12grb8o...@corp.supernews.com...
And also you have to know how to calculate the Big O complexity
of some datastruture algorithms , like for example: if you want to implement
a lock or lock-free SkipList or something else...
Regards,
Amine Moulay Ramdane.
"R.Amine" <ami...@colba.net> wrote in message
news:12grbhk...@corp.supernews.com...
And if you don't have some probability theory you will not be able to affirm
something like
this:
From the theoritic point of view, the properties of the skiplist are:
insertion , deletion and search are in log(n)
average pointers: equal or even better than the tree structure
the probability that it degenerate from nonlinear to linear is so small
and it has a good locality reference
Regards,
Amine Moulay Ramdane.
"R.Amine" <ami...@colba.net> wrote in message
news:12grdd9...@corp.supernews.com...
And finally if you have a good logical mind you can use LOGIC to glue the
things...
:)
Regards,
Amine Moulay Ramdane
You can link a.out freepascal objects and also .so,.dll files
to GCC or MinGwin.
the other way is also possible ..
"R.Amine" <ami...@colba.net> wrote in message
news:12grb8o...@corp.supernews.com...
>
The problem you're going to have in this group is that the majority, by
large, people speak C as the programmatic language of communication.
It's very hard to read the psuedo-psuedo-code from freepascal.
Dev-C++ 5.0
Mingw/GCC 3.4.2 was written in ObjectPascal
look at the 'sourcecode' in:
http://www.bloodshed.net/dev/devcpp.html
ObjectPascal is much more clear than C and C++, and much easy to understand.
And also very powerfull that even
Dev-C++ 5.0
Minngw/GCC 3.4.2
And in my opinion what's more important is the 'logic' not the programming
language.
And i think you can easily extract/write the logic to/from a programming
language
like ObjectPascal...
>
>
>
>
>
>
>And i think you can easily extract/write the logic to/from a programming
>language
>like ObjectPascal...
You want a proof clayne ?
Look for example at me, i don't know well english , but since i am very
logic
you can understand me very well.
I know how to glue simple thinks to make meaningful things.
So, LOGIC is the GLUE that makes sense, not the language.
:)
Regards,
Amine Moulay Ramdane.
"R.Amine" <ami...@colba.net> wrote in message
news:12grhgk...@corp.supernews.com...
Refer to one of my old ideas:
http://groups.google.com/group/comp.programming.threads/msg/c0eb5f08e858948f
Actually, as you may know, you can do "lots of interesting stuff with offset
tricks"... IMHO, they are very useful for lock-free algorithms that have to
deal with lots of activity from writer threads... (e.g., lock-free stack
anchor, skiplist anchor, ect...)
For instance, think of something like this:
quickly scribbling down pseudo-code for an atomic triple linked list:
----------------
/* 64-bit triple list anchor */
#define TLIST_DEPTH 3
struct tlist_anchor_t {
int16 list[TLIST_DEPTH]; /* list root node offsets... */
int16 aba;
};
/* node with a triple list as a next pointer */
struct node_t {
tlist_anchor_t next; /* a triple list anchor for the next link... */
int16 thisindex; /* self offset */
void *state; /* user state */
};
/* global node cache via. lock-free LIFO */
#define NODE_DEPTH USHRT_MAX
static node_t gnodes[NODE_DEPTH];
extern node_t* gnode_pop(void); /* may block when empty */
extern void gnode_push(node_t*);
/* AppCore DWCAS:
updates comprand and returns non-zero on failure */
#define DWCAS np_ac_i686_atomic_dwcas_fence
/* I will modify aba on pushes, instead of pops...
this function pushes ndepth nodes at a time.
the nxoffset determines where to link the next nodes.
n points to local array of node_t*, at least the size of ndepth.
*/
void tlist_push_local
(triple_list_t *_this,
short const nxoffset,
short const ndepth,
node_t **n) {
int i;
triple_list_t xchg, cmp;
/* sanity check */
assert(ndepth > -1 && ndepth < TLIST_DEPTH);
assert(nxoffset > -1 && nxoffset < TLIST_DEPTH);
/* init the xchg offsets with the passed offsets */
for(i = 0; i < ndepth; ++i) {
xchg.list[i] = n[i].thisindex;
}
/* make the initial read, and start the "CAS-Loop" */
cmp = *_this;
do {
/* link the next nodes with the initial read */
for(i = 0; i < ndepth; ++i) {
n[i].next[nxoffset] = cmp.list[i];
}
/* inc the monotonic aba counter of the initial read */
xchg.aba = cmp.aba + 1;
/* attempt to swap in the xchg */
} while(DWCAS(_this, &cmp, &xchg));
/* thats all folks!! ;) */
}
Humm... Pretty cool huh?
;)
We can keep the reads and writes of "three separate linked lists" atomic":
- A monotonic counter and the "root nodes" of each linked list is contained
in a "single 64-bit anchor structure"; this is the "atomic sequence point",
so to speak...
- A triple list anchor and node links are all made up of a single 64-bit
anchor structure.
- A node in a linked list can contain a single atomic link to three
different lists. These node links can be atomically modified with 64-bit
atomic instructions (e.g., DWCAS on 32-bit machine, and normal CAS on 64-bit
machine). That means that you can build complex tree structures that support
lock-free modifications' anywhere in the list...
- Every increment an anchors monotonic counter effects all its linked list
root nodes. That is, a single modification to the counter will cause any
other CAS-loops that happen to be running in parallel, to simply fail; that
is how true atomicity is achieved wrt lock-free linked lists. Basically, the
aba counter gives CAS LL/SC like abilities', which is useful for writer
based lock-free algorithms...
At the moment, I am not exactly sure what this could possibly be used for,
however, it does demonstrate the potential power of lock-free offset tricks
in general, IMHO.
Any thoughts on clever lock-free offset trickery?
One initial thought: This stuff would be used in the writer side of a
lock-free reader pattern...
I wonder of objects in Pascal use any internal pointer trickery/squizzling
that may interfere with the ability to use CAS on a raw pointer to a Pascal
object...?
Anyway, it would be interesting, to me at least, if the OP can successfully
utilize some of the lock-free functionality that my AppCore library
provides, in a Pascal-based programming environment...
I have never programmed in Pascal or Delphi, and have extremely little
experience with VB... So, I would not really mind if somebody else can do
the "tedious work" that is usually involved with "convincing" a
really-high-level language to use "raw atomic pointer operations"...
;)