[ANN] Wefts++ on windows


Giancarlo Niccolai

Dec 15, 2003, 4:14:03 AM
Dear friends,

I am happy to announce I have succeeded in porting Wefts++ to MS-Windows (any
version), making it 100% compatible with the POSIX implementation.

I have implemented POSIX-like condition variables using a slightly modified
version of the well-known Terekhov algorithm from pthreads-win32 (version 9):

Using the WaitForMultipleObjectsEx() function with one Windows Event object
as the cancellation request, it is possible to cleanly interrupt a condition
wait. Relevant thread data, such as the cleanup stack, is held in a
thread-local storage area, and this allows the condition to access the
cancellation event. Since the signal carrier is thread-specific, and it can
be signaled by any thread but is never reset, the event variable is a valid
means to do this.

This also makes it possible to create a set of interruptible-on-request
operations, such as read and write, or even network send and receive, via APC
calls and WaitForMultipleObjectsEx().

The cancelable wait is therefore not a viable drop-in replacement for
existing programs, but newly written applications may take advantage of the
blocking operation wrappers.
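The interruptible condition wait described above can be sketched in portable terms. A minimal illustration follows, using standard C++ primitives purely as stand-ins (Wefts++ itself builds this from a Windows Event plus WaitForMultipleObjectsEx(); all names below are invented for the sketch, not taken from the library):

```cpp
#include <condition_variable>
#include <mutex>
#include <stdexcept>

// Hypothetical names. The cancel request plays the role of the Windows Event
// object: any thread may raise it, and it is never reset.
struct CancelableCond {
    std::mutex m;
    std::condition_variable cv;
    bool signaled = false;
    bool cancelRequested = false;

    // Blocks until signaled; throws if a cancel request arrives, mirroring
    // a POSIX cancellation point.
    void wait() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this] { return signaled || cancelRequested; });
        if (cancelRequested)
            throw std::runtime_error("canceled");
    }

    void signal() {
        { std::lock_guard<std::mutex> g(m); signaled = true; }
        cv.notify_all();
    }

    // Cleanly interrupts any thread blocked in wait().
    void cancel() {
        { std::lock_guard<std::mutex> g(m); cancelRequested = true; }
        cv.notify_all();
    }
};
```

The key property, as in the post's scheme, is that the waiter observes the cancel flag atomically with the condition itself, so the interrupt cannot be lost.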

---------------------------------

This approach may be interesting ALSO for pthreads-win32: providing POSIX
cancellation points as part of pthreads-win32, wrapped around Windows
functions, is quite possible with this technique. Pthreads-win32 could well
rewrite write(), fwrite() and family using this scheme, and then a careful
linking process could hide Windows' LIBC functions and expose the
pthreads-win32 versions instead.
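Such a cancellation-point wrapper can be sketched with the POSIX self-pipe idiom, which plays the same role on Unix that the Event object plays in the Windows scheme above (the wrapper name and the -2 "canceled" return convention are invented for illustration, not part of any library):

```cpp
// Sketch of a read() wrapper that also watches a per-thread "cancel" pipe,
// so a cancel request unblocks the read -- the role the Windows Event plays
// in the APC/WaitForMultipleObjectsEx() scheme described in the post.
#include <cstddef>
#include <sys/select.h>
#include <unistd.h>

// Returns bytes read, -1 on error, or -2 when a cancel request arrived.
ssize_t cancelable_read(int fd, int cancel_fd, void* buf, size_t n) {
    fd_set set;
    FD_ZERO(&set);
    FD_SET(fd, &set);
    FD_SET(cancel_fd, &set);
    int maxfd = (fd > cancel_fd ? fd : cancel_fd) + 1;

    // Block until either data or a cancel request is available.
    if (select(maxfd, &set, nullptr, nullptr, nullptr) < 0)
        return -1;
    if (FD_ISSET(cancel_fd, &set))
        return -2;               // cancel takes priority over pending data
    return read(fd, buf, n);
}
```

Hiding the LIBC read() behind such a wrapper at link time is exactly the kind of substitution the paragraph proposes.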

I can provide more information if this is of interest.

----------------------------------

Currently, the active CVS compiles and runs well with BCC32 (from the free
command-line tools); I've also been able to produce .a libs under Windows
with MSYS & Dev-C++ (without pthreads for Windows, obviously).

I will issue an official RC release as soon as I provide those cancellation
point wrapper functions (which could happen even today).

If you are interested, mail a reply here or to me (antispam ---at---
niccolai [dot] ws). Also, you can just peek at the code at wefts
sourceforge project/cvs via web: http://wefts.sourceforge.net

Bests,
Giancarlo Niccolai.

SenderX

Dec 15, 2003, 5:16:58 AM
> I am happy to announce I have succeded in porting Wefts++ on MS-Windows (any
> version), and making it 100% compatible with posix implementation.

Wefts::Referenced
--------------------

00094 void incRef() {
00095     lockWrite();
00096     if ( m_count > 0 ) // if m_count == 0 we are going to be destroyed now.
00097         m_count++;
00098     unlock();
00099 }
00100
00106 void decRef() {
00107     bool destroy = false;
00108     lockWrite();
00109     if ( m_disposeable && m_count > 0 ) m_count--;
00110     if ( m_count <= 0 ) destroy = true;
00111     unlock();
00112     // now, an incref could arrive here, but it would be an error,
00113     // a race caused by misusing this object.
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
00114     // An assert here should do, but I don't know if asserting in this
00115     // lib would be appreciated by users; anyhow, the user will get a
00116     // segfault after incref.
00117     if ( destroy ) delete this;
00118 }


Did you know that you can totally remove this race-condition altogether?
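For reference, one well-known way to close that window is an "increment only if not already zero" loop. The following minimal sketch uses modern std::atomic purely for illustration (it is not code from this thread, and differs from the lock-free scheme discussed below):

```cpp
#include <atomic>

// A thread that observes the count already at zero never gains a reference,
// so the unlock()/delete window in the quoted decRef() disappears.
struct Counted {
    std::atomic<long> count{1};

    // Acquire a reference only if the object is still alive.
    bool incRefIfAlive() {
        long c = count.load();
        while (c > 0) {
            if (count.compare_exchange_weak(c, c + 1))
                return true;
            // on failure, c is reloaded with the current value
        }
        return false;   // object is being destroyed; caller must not touch it
    }

    // Returns true when the caller is responsible for destroying the object.
    bool decRef() {
        return count.fetch_sub(1) == 1;
    }
};
```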

--
The designer of the experimental, SMP and HyperThread friendly, AppCore
library.

http://AppCore.home.comcast.net


Giancarlo Niccolai

Dec 15, 2003, 5:34:29 AM
SenderX wrote:


> 00112 // now, an incref could arrive here, but it would be an error,
> 00113 // a race caused by misusing this object.
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>

> Did you know that you can totally remove this race-condition altogether?
>

Yes, I just haven't had the time :-); also, that class is -currently- (up to
1.0) a "side" class, so it deserves less attention.

But anyway, if you have a ready solution and a SourceForge account, adding
your change would be appreciated.

Giancarlo

SenderX

Dec 15, 2003, 5:55:41 AM
> Yes, I just had not the time :-); also, that class is -currently- (up to
> 1.0) a "side" class, so it deserves less attention.
>
> But anyway, if you have a ready solutiona and a SourceForge account, adding
> your change would be appreciated.

http://groups.google.com/groups?selm=3DDBF8B8.AD1089BF%40xemaps.com&rnum=6

This killer algo allows for threads to access shared objects, "without
previously owning a reference"!

Cool huh?

;)


Giancarlo Niccolai

Dec 15, 2003, 1:29:47 PM
SenderX wrote:

Yes, cool and safe, but a little heavy for my needs. RefCount objects are
FAR simpler (i.e. atomic_ptr must create a temp copy for expression
evaluation, which must create & destroy a temp mutex), and especially they
do not transparently encapsulate a user-defined data pointer (as atomic_ptr
does) but are instead meant to be derived into a user-defined class. The
RefCount solution gives a little more control over the data passed around
and comes at practically no cost (other than requiring the user to derive
their data from that class and, if I don't add extra protection for that
race, to use the class as the manual says).

Also, atomic_ptr is a template class, which means a completely different
implementation for each pointer type you have to atomic_ptr-ize; even if the
code is inlined, this very likely yields faster code, but it may also lead
to (by far) bigger programs.

Anyhow, the thing is SO cool that, with the permission of the author, I
would add this code to Wefts, but probably around or after the full 1.0
version.

May I, Mr. Seigh?

Giancarlo Niccolai

Giancarlo Niccolai

Dec 15, 2003, 2:23:59 PM
Giancarlo Niccolai wrote:

>
> Also, this allows to create a set of interruptable-on-request operations
> as read and write, or even internet send and receive, via APC calls and
> WaitForMultipleObjectEx().

I want to put the copyright (a fake copyright, of course) on the name of
this layer: (OS-)COFFEE, (Operating System) COoperative File Functions
Enlarged Environment.

"Enlarged" because it can also "drive" non-strictly-OS file operations, such
as ftp/sftp.

:-)

Giancarlo.

SenderX

Dec 15, 2003, 2:50:32 PM
> Yes, cool and safe, but a little heavy for my needs. RefCount objects are
> FAR simpler (i.e. atomic_ptr must create a temp copy for expression
> evaluation, which must create & destroy a temp mutex),

Actually, you don't need any mutexes for this...

You can use CAS2 or LL/SC, and you can also use CAS and substitute pointers
with offsets and indexes:

http://groups.google.com/groups?selm=3E6B7923.57B88B09%40xemaps.com&rnum=4

Wow, this is 100% lock-free...

Nice!

;)

This is yet "another example" of a stable and efficient 100% lock-free
algo...

You can also hash the atomic pointer into an array of mutexs:

http://msdn.microsoft.com/msdnmag/issues/01/08/Concur/default.aspx


> Also, they are a template class, which means a complete different
> implementation for each different pointer you have to atomic_ptrize; even
> if the code is inlined, this reflects into -very likely- faster code but
> may also lead to bigger programs (by far).

Yes, templates seem to bloat code.

;(


> Anyhow, the thing is SO cool that, with permission of the author, I would
> add this code to wefts, but probably around or after full 1.0 version.

Yes, every thread library should have a full atomic pointer.

;)


>
> May I, mr. Seigh?

I think IBM patented this algo of Joe's, but I believe it's expired...

P.S.

Wefts++ makes me want to do an AppCore++!

:P


I tweaked atomic_ptr to allow for an internal ABA count on CAS. With my ABA
tweak, you can actually use the atomic_ptr interface to implement dynamic
lock-free algos!

I can give you a link if you want.

Joseph Dionne

Dec 15, 2003, 3:02:04 PM
While new to UNIX threads, I am an old timer at UNIX development, using
processes. I have one question that I would like confirmed before
moving several applications to pthreads.

I (A)ss(U)(M)e that native threads would reduce virtual memory usage by
mapping new stack memory to each thread while sharing code, data and text
memory among all the threads in the process. In a parent/child process
model, each process allocates its own memory for each of these, which can be
quite large.

My question then is: are threads better at conserving application memory
than a parent/child process model?

Under Windows (yuk), I have found threads to be a great improvement over
separate processes, and can verify this fact using Task Manager. Are there
separate tools for evaluating threaded processes above and beyond typical
UNIX process analysis?

Please forgive this silly question. I have not had the time to do the proper
analysis myself. Guess I'm getting lazy in my old age.

Giancarlo Niccolai

Dec 15, 2003, 3:27:28 PM
SenderX wrote:


>
> P.S.
>
> Wefts++ makes me want to do a AppCore++!
>
> :P
>
>

It is easy, and I was just about to ask you: you could provide an OSTAIL
(OS Threading And Independence Layer) built with your AppCore code. As many
Wefts functions are inlined (generally up to the point where it becomes
absolutely necessary, or at least useful, to de-inline them), you would
immediately have access to higher level constructs, such as subscriptions,
ring buffers, barriers, interruptible wait mutexes, and read/write
promotable/degradable reentrant mutexes, all implemented with your extremely
fast low-level mutexes/threads.

All you need to provide is the mutex (non-reentrant and possibly with
trylock semantics), a POSIX-like condition variable (that is, an atomic
wait-and-release-mutex that may be interrupted), a thread class with minimal
capabilities (only "kind" deferred cancellation allowed, and only two
cleanup routines are required; they may be merged into one soon), and an
interruptible sleep, and you are done. If your app is system specific, no
problem: OSTAIL is meant to have one or more layers per system; but if it's
portable, even better. For example, you can rely on system signals to cancel
the sleep if you don't want to, or don't have a means to, provide an
external way to interrupt the wait; OSTAIL is designed so that it can
seamlessly use material provided by the code you write, as well as system or
underlying lib capabilities.

See wefts_os_windows.h/cpp and wefts_os_pthread.h/cpp to see how easily you
could create a wefts_os_appcore.h/cpp.

A typical OSTAIL layer is about 400 lines long, and may be shorter, so it
would be quick to put together for a test.

Think about it; I would be enthusiastic about the idea.
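As a rough illustration of what such a layer's surface might look like, here is a sketch of the mutex part only, with all names hypothetical (the real interface lives in the wefts_os_*.h/cpp files; a trivial standard-library backend stands in for AppCore):

```cpp
#include <mutex>

// Hypothetical rendering of the OSTAIL mutex requirement described above:
// non-reentrant, with trylock semantics. Names are invented for this sketch.
class OSMutexBase {
public:
    virtual ~OSMutexBase() {}
    virtual void lock() = 0;
    virtual bool trylock() = 0;   // "possibly with trylock semantics"
    virtual void unlock() = 0;
};

// A minimal backend, the way a wefts_os_appcore.h/cpp could wrap AppCore's
// fast low-level mutex instead of std::mutex.
class OSMutexStd : public OSMutexBase {
    std::mutex m;
public:
    void lock() override { m.lock(); }
    bool trylock() override { return m.try_lock(); }
    void unlock() override { m.unlock(); }
};
```

The condition variable, thread class and interruptible sleep would be filled in the same way, one small class per primitive, which is consistent with the "about 400 lines" estimate.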

Giancarlo.

Joe Seigh

Dec 15, 2003, 5:26:23 PM

SenderX wrote:
>
> > Yes, cool and safe, but a little heavy for my needs. RefCount objects are
> > FAR simpler (i.e. atomic_ptr must create a temp copy for expression
> > evaluation, which must create & destroy a temp mutex),

local_ptr is a bit lighter if you don't need to share it, i.e. as a working
pointer.

>
> Actually, you don't need any mutexs for this...
>
> You can use CAS2 or LL/SC, and you can also use CAS and substitute pointers
> with offsets and indexes:
>
> http://groups.google.com/groups?selm=3E6B7923.57B88B09%40xemaps.com&rnum=4
>

...


> >
> > May I, mr. Seigh?
>
> I think IBM patented this algo of Joes, but I believe its expired...
>

Well, it's based in part on the patent, which is expired AFAIK. It could
also be argued it's based in part on weighted reference counting.

This is the latest version, with some compares corrected and some
experimental buffer recycling. I've been fooling with it with VC++ 6.x, so
it's possible I broke it for Linux and such. Also, I need to put an open
source copyright on it if anybody actually intends to use it. It's mainly
experimental for now.

Joe Seigh

-- atomic_ptr.h --
#include <cas64.h>

template<typename T> class atomic_ptr;
template<typename T> class local_ptr;
template<typename T> class atomic_ptr_ref;
//typedef const float ** (atomic_ptr<T>:: **)() atomic_null_t;

struct refcount {
long ecount; // ephemeral count
long rcount; // reference count
};

template<typename T> struct ref {
long ecount; // ephemeral count
atomic_ptr_ref<T> *ptr;
};


//=============================================================================
// atomic_pool
//
//
//=============================================================================
template<typename T> class atomic_pool {
private:
struct queue_t {
int version;
atomic_ptr_ref<T> * next;

queue_t() : version(0), next(0) {}
}; // 64 bit lifo queue anchor

queue_t q;


public:
atomic_pool() : q() {}

~atomic_pool() {
atomic_ptr_ref<T> *p1, *p2;
p2 = q.next;
while (p2) {
p1 = p2->next;
delete p2;
p2 = p1;
}
}

atomic_ptr_ref<T> * get() {
queue_t xcmp, xchg;

xcmp.version = q.version;
xcmp.next = q.next;
while (xcmp.next != 0) {
xchg.version = xcmp.version + 1;
xchg.next = xcmp.next->next;
if (cas64(&q, &xcmp, &xchg) != 0)
break;
}

return xcmp.next;
}

void put(atomic_ptr_ref<T> * item) {
queue_t xcmp, xchg;

xchg.next = item;

xcmp.version = q.version;
xcmp.next = q.next;
for (;;) {
item->next = xcmp.next;
xchg.version = xcmp.version;
if (cas64(&q, &xcmp, &xchg) != 0)
break;
}
return;
}

};


//=============================================================================
// atomic_ptr_ref
//
//
//=============================================================================
template<typename T> class atomic_ptr_ref {
friend class atomic_ptr<T>;
friend class local_ptr<T>;
friend class atomic_pool<T>;

private:

refcount count; // reference counts
T * ptr; // ptr to actual object
atomic_pool<T> * pool;

public:

atomic_ptr_ref<T> * next;


atomic_ptr_ref(T * p = 0) {
count.ecount = 0;
count.rcount = 1;
ptr = p;
pool = 0;
next = 0;
};

private:

~atomic_ptr_ref() {
delete ptr;
}

// atomic
int adjust(long xephemeralCount, long xreferenceCount) {
refcount oldval, newval;

oldval.ecount = count.ecount;
oldval.rcount = count.rcount;
do {
newval.ecount = oldval.ecount + xephemeralCount;
newval.rcount = oldval.rcount + xreferenceCount;

}
while (cas64(&count, &oldval, &newval) == 0);

return (newval.ecount == 0 && newval.rcount == 0) ? 0 : 1;
}

}; // class atomic_ptr_ref


//=============================================================================
// local_ptr
//
//
//=============================================================================
template<typename T> class local_ptr {
friend class atomic_ptr<T>;
public:

local_ptr(T * obj = 0) {
if (obj != 0) {
refptr = new atomic_ptr_ref<T>(obj);
refptr->count.ecount = 1;
refptr->count.rcount = 0;
}
else
refptr = 0;
}

local_ptr(const local_ptr<T> & src) {
if ((refptr = src.refptr) != 0)
refptr->adjust(+1, 0);
}

local_ptr(atomic_ptr<T> & src) {
refptr = src.getrefptr();
}

~local_ptr() {
if (refptr != 0 && refptr->adjust(-1, 0) == 0) {
if (refptr->pool == 0)
delete refptr;
else
refptr->pool->put(refptr); // recyle to pool
}
}

local_ptr<T> & operator = (T * obj) {
local_ptr<T> temp(obj);
swap(temp); // non-atomic
return *this;
}

local_ptr<T> & operator = (local_ptr<T> & src) {
local_ptr<T> temp(src);
swap(temp); // non-atomic
return *this;
}

local_ptr<T> & operator = (atomic_ptr<T> & src) {
local_ptr<T> temp(src);
swap(temp); // non-atomic
return *this;
}

T * get() { return (refptr != 0) ? refptr->ptr : (T *)0; }

T * operator -> () { return get(); }
T & operator * () { return *get(); }
//operator T* () { return get(); }

bool operator == (T * rhd) { return (rhd == get() ); }
bool operator != (T * rhd) { return (rhd != get() ); }

// refptr == rhd.refptr iff refptr->ptr == rhd.refptr->ptr
bool operator == (local_ptr<T> & rhd) { return (refptr == rhd.refptr);}
bool operator != (local_ptr<T> & rhd) { return (refptr != rhd.refptr);}
bool operator == (atomic_ptr<T> & rhd) { return (refptr == rhd.xxx.ptr);}
bool operator != (atomic_ptr<T> & rhd) { return (refptr != rhd.xxx.ptr);}

private:
void * operator new (size_t); // auto only; declared but not defined, to forbid heap allocation

atomic_ptr_ref<T> * refptr;

inline void swap(local_ptr & other) { // non-atomic swap
atomic_ptr_ref<T> * temp;
temp = refptr;
refptr = other.refptr;
other.refptr = temp;
}


}; // class local_ptr


template<typename T> inline bool operator == (int lhd, local_ptr<T> & rhd)
{ return ((T *)lhd == rhd); }

template<typename T> inline bool operator != (int lhd, local_ptr<T> & rhd)
{ return ((T *)lhd != rhd); }

template<typename T> inline bool operator == (T * lhd, local_ptr<T> & rhd)
{ return (rhd == lhd); }

template<typename T> inline bool operator != (T * lhd, local_ptr<T> & rhd)
{ return (rhd != lhd); }


//=============================================================================
// atomic_ptr
//
//
//=============================================================================
template<typename T> class atomic_ptr {
friend class local_ptr<T>;

protected:
ref<T> xxx;

private:
atomic_ptr(atomic_ptr_ref<T> * src) {
xxx.ecount = 0;
xxx.ptr = src;
}

public:

atomic_ptr(T * obj = 0) {
xxx.ecount = 0;
if (obj != 0) {
xxx.ptr = new atomic_ptr_ref<T>(obj);
}
else
xxx.ptr = 0;
}

atomic_ptr(local_ptr<T> & src) { // copy constructor
xxx.ecount = 0;
if ((xxx.ptr = src.refptr) != 0)
xxx.ptr->adjust(0, +1);
}

atomic_ptr(atomic_ptr<T> & src) { // copy constructor
xxx.ecount = 0;
xxx.ptr = src.getrefptr(); // atomic

// adjust link count
if (xxx.ptr != 0)
xxx.ptr->adjust(-1, +1); // atomic
}

~atomic_ptr() { // destructor
// membar.release
if (xxx.ptr != 0 && xxx.ptr->adjust(xxx.ecount, -1) == 0) {
// membar.acquire
if (xxx.ptr->pool == 0)
delete xxx.ptr;
else
xxx.ptr->pool->put(xxx.ptr);
}
}

atomic_ptr & operator = (T * obj) {
atomic_ptr<T> temp(obj);
swap(temp); // atomic
return *this;
}

atomic_ptr & operator = (local_ptr<T> & src) {
atomic_ptr<T> temp(src);
swap(temp); // atomic
return *this;
}

atomic_ptr & operator = (atomic_ptr<T> & src) {
atomic_ptr<T> temp(src);
swap(temp); // atomic
return *this;
}


//-----------------------------------------------------------------
// atomic_ptr recycle methods
//-----------------------------------------------------------------
void putpool(atomic_pool<T> * pool, T * obj) {
atomic_ptr<T> temp(obj);
temp.xxx.ptr->pool = pool; // set pool
swap(temp); // atomic
}

bool getpool(atomic_pool<T> * pool) {
atomic_ptr_ref<T> * tref;
if ((tref = pool->get()) != 0) {
atomic_ptr<T> temp(tref);
temp.xxx.ptr->count.ecount = 0;
temp.xxx.ptr->count.rcount = 1;
swap(temp);
return true;
}

else
return false;
}

//-----------------------------------------------------------------
// generate local temp ptr to guarantee validity of ptr
// during lifetime of expression. temp is dtor'd after
// expression has finished execution.
//-----------------------------------------------------------------
inline local_ptr<T> operator -> () { return local_ptr<T>(*this); }
inline local_ptr<T> operator * () { return local_ptr<T>(*this); }

bool operator == (T * rhd) {
if (rhd == 0)
return (xxx.ptr == 0);
else
return (local_ptr<T>(*this) == rhd);
}

bool operator != (T * rhd) {
if (rhd == 0)
return (xxx.ptr != 0);
else
return (local_ptr<T>(*this) != rhd);
}

bool operator == (local_ptr<T> & rhd) {return (local_ptr<T>(*this) == rhd); }
bool operator != (local_ptr<T> & rhd) {return (local_ptr<T>(*this) != rhd); }
bool operator == (atomic_ptr<T> & rhd) {return (local_ptr<T>(*this) == local_ptr<T>(rhd)); }
bool operator != (atomic_ptr<T> & rhd) {return (local_ptr<T>(*this) != local_ptr<T>(rhd)); }

bool cas(local_ptr<T> cmp, atomic_ptr<T> xchg) {
ref<T> temp;
bool rc = false;

temp.ecount = xxx.ecount;
temp.ptr = cmp.refptr;

// membar.release
do {
if (cas64(&xxx, &temp, &xchg.xxx) != 0) {
xchg.xxx = temp;
rc = true;
break;
}

}
while (cmp.refptr == temp.ptr);

return rc;
}

//protected:
// atomic
void swap(atomic_ptr<T> & obj) { // obj is local & non-shared
ref<T> temp;

temp.ecount = xxx.ecount;
temp.ptr = xxx.ptr;

// membar.release
while(cas64(&xxx, &temp, &obj.xxx) == 0);

obj.xxx.ecount = temp.ecount;
obj.xxx.ptr = temp.ptr;
}

private:

// atomic
atomic_ptr_ref<T> * getrefptr() {
ref<T> oldval, newval;

oldval.ecount = xxx.ecount;
oldval.ptr = xxx.ptr;

do {
newval.ecount = oldval.ecount + 1;
newval.ptr = oldval.ptr;
}
while (cas64(&xxx, &oldval, &newval) == 0);
// membar.acquire

return oldval.ptr;
}

}; // class atomic_ptr

template<typename T> inline bool operator == (int lhd, atomic_ptr<T> & rhd)
{ return ((T *)lhd == rhd); }

template<typename T> inline bool operator != (int lhd, atomic_ptr<T> & rhd)
{ return ((T *)lhd != rhd); }

template<typename T> inline bool operator == (T * lhd, atomic_ptr<T> & rhd)
{ return (rhd == lhd); }

template<typename T> inline bool operator != (T * lhd, atomic_ptr<T> & rhd)
{ return (rhd != lhd); }


/*-*/

Giancarlo Niccolai

Dec 15, 2003, 5:32:50 PM
Joe Seigh wrote:

>
>
> SenderX wrote:
>>
>> > Yes, cool and safe, but a little heavy for my needs. RefCount objects
>> > are FAR simpler (i.e. atomic_ptr must create a temp copy for expression
>> > evaluation, which must create & destroy a temp mutex),
>
> local_ptr is a bit lighter if you don't need to share it, i.e. as a
> working pointer.
>
>>
>> Actually, you don't need any mutexs for this...
>>
>> You can use CAS2 or LL/SC, and you can also use CAS and substitute
>> pointers with offsets and indexes:
>>
>>
>> http://groups.google.com/groups?selm=3E6B7923.57B88B09%40xemaps.com&rnum=4
>>
> ...
>> >
>> > May I, mr. Seigh?
>>
>> I think IBM patented this algo of Joes, but I believe its expired...
>>
>
> Well, it's based in part on the patent which is expired AFAIK. It could
> also be argued it's based in part on weighted reference counting.
>
> This is the latest version with some compares corrected and some
> experimental
> buffer recycling. I've been fooling with it with vc++ 6.x so it's
> possible I
> broke it for Linux and stuff. Also I need to put an open source copyright
> on it if anybody actually intends to use it. It's mainly experimental for
> now.
>
> Joe Seigh
>

Thanks, Mr. Seigh; I will test it, and if you give me the OK signal (or if
you raise the event :-) I can insert it in Wefts before it reaches 1.0.

I would also be VERY pleased to invite you into the Wefts project, where you
could add the file directly as wefts_autoptr.xxx by your own hand, and with
the open source copyright you like.

Wefts++ is currently LGPL, but this does not prevent a part of it from being
released under a lighter license (especially a pure header file).

Giancarlo Niccolai.

SenderX

Dec 15, 2003, 6:21:10 PM
> It is easy, and I was just going to ask it to you: you can provide an OSTAIL
> (Os Threading and Independence Layer) built with your AppCore code.

My developing system code relies on a custom low-level portable threading
API that behaves like your OSTAIL. The exported AppCore API relies on the
system code.

There might be a problem with integrating this with your lib:

The lock-free algos use an object called a waitset, which manages threads
that failed their lock-free (fast) paths. Threads with time-critical
priority are placed in front of all concurrently waiting threads to get
serviced first, while lower priority threads are queued at the back and are
deferred until the higher priority threads have been dispatched.

The waitset API also provides friendly cancellation. The special
cancellation method simply informs the user that the thread has been
requested to be cancelled. The user then knows that the thread is invalid
and is out of all waitsets. The user can clean up and unwind the stack (C++
stack objects can unwind) to the thread entry function and return.

The AppCore C++ wrappers throw a cancellation exception when the AppCore C
API requests a thread cancellation.

Like this...

void* thread::entry()
{
    try
    {
        CStackObject1 o1;
        CStackObject2 o2;

        userapp::work(); // renamed: "do" is a C++ keyword
    }
    catch ( CThreadCancelled &e )
    {
        // The thread is cancelled and the stack is unwound!
    }

    return 0;
}

void userapp::work()
{
    for (;;)
    {
        // Waiting on a queue is a cancellation point.
        // If this thread is cancelled during the wait,
        // a CThreadCancelled exception is thrown...
        g_WorkQueue.Pop( AC_INFINITE );

        CStackObject1 o1;

        // If the thread has been requested to be cancelled,
        // a CThreadCancelled is thrown, and o1 is destructed...
        ac::TestCancel();

        CStackObject2 o2;
    }
}

How could this waitset object, and the custom cancellation method it
provides fit into your design?


> As many
> wefts function are inlined (generally up to the moment where it is
> absolutely need or at least useful to de-inline them), you would
> immediately have access to higher level constructs, as Subscriptions, ring
> buffers, barriers, interruptable wait mutexes, read/write
> promotable/degradable reentrant mutexes, all implemented with your
> extremely fast low level mutex/threads.

I would have to code:

barriers
read/write promotable/degradable reentrant mutexes

using special lock-free algos; they exploit the fact that the listed
primitives can be "heavily" fast-pathed...


> All you need is to provide the mutex (non-reentrant and possibly with a
> trylock semantic), a posix-like condition variable (that is, an atomic
> wait-and-release-mutex that may be interrutped) a thread class with minimal
> capabilities (only "kind" deferred cancelation allowed, and only two
> cleanup routines are required; they may be turned to one soon), an
> interruptable sleep and you are done. If your app is system specific, no
> problem: OSTAIL is meant to have one or more layer per system; but if it's
> portable it is even better. In example, you can rely on system signals to
> cancel the sleep, if you don't want to or don't have a mean to provide an
> external way to interrupt the wait; OSTail is thought so that it can
> seamlessy use material provided by the code you write and system or
> underlying lib capabilities.

I haven't really looked at wefts_os_windows.h/cpp and wefts_os_pthread.h/cpp,
but my C++ wrappers throw thread-cancelled exceptions when the thread has
been requested to be cancelled; would this break your lib?

SenderX

Dec 15, 2003, 6:36:05 PM
> This is the latest version with some compares corrected and some experimental
> buffer recycling. I've been fooling with it with vc++ 6.x so it's possible I
> broke it for Linux and stuff. Also I need to put an open source copyright
                                                   ^^^^^^^^^^^^^^^^^^^^^
> on it if anybody actually intends to use it. It's mainly experimental for now.

Can I use my ABA logic for atomic_ptr::cas? I like the fact that it renders
atomic_ptr safe for lock-free algos...

:)


P.S.

atomic_ptr screams with per-thread pooling overflowing to lock-free global
stacks!

;)


Giancarlo Niccolai

Dec 15, 2003, 7:06:03 PM
SenderX wrote:

> How could this waitset object, and the custom cancellation method it
> provides fit into your design?

Very easily: I don't schedule threads (for now; later on I may only define a
"preferred" priority). That is up to you. I just use mutexes, threads and
condition-like sync objects to do "nicer" higher level things. Give me
threads, mutexes, safe signal broadcasting for waiting objects and a
function (that YOU call) to clean up MY stack, and we're friends.

>
> I would have to code:
>
> barriers
> read/write promotable/degradable reentrant mutexes

No, I provide those things starting from mutexes; you don't have to provide
them to me. I know your library provides mostly lock-free algos, and we can
introduce some lock-free facilities in Wefts too (once I have studied it),
but I am mostly interested in the very low level synchronization it provides
for very low level threads, mutexes and broadcasts. I.e. I suppose you have
a set of waitlists, other than the scheduled thread list, that can easily be
used to provide high-performance POSIX-like condition variables.


>
> I haven't really looked at wefts_os_windows.h/cpp and
> wefts_os_pthread.h/cpp., but my C++ wrappers throw thread cancelled
> exceptions when the thread has been requested to be cancelled, would this
> screw your lib?

No; you just have to catch it in the OSTAIL code and then call a cleanup
routine I pass you. You have to provide an "osRun()" method in the
OSThreadBase-derived class; you would just do:

void OSThreadWindows::osRun(
    void (*func)(void *),
    void *data,
    void (*cleanup)(void *) )
{
    // Put ourselves in the TLS so the condition variables can access and
    // cancel us; currently useless if you have the throw mechanism.
    // (Not wrapped in assert(): the call would vanish in release builds.)
    BOOL ok = TlsSetValue( s_tlsThreadObj, static_cast<LPVOID>( this ) );
    assert( ok );

    try {
        // this is my data = Wefts::Thread, and func will call data->run
        func( data );
    }
    catch(...) {}

    // cleanup must be called in any case.
    cleanup( data );
}

Very, very simple.
Currently, I have an extra cleanup routine, outside the above, for condition
waits (to have self-cleaning mutexes), but this would just require you to
catch the cancellation request in the cond variables, run the self-cleanup
routine and then rethrow the request.

In the near future I may also drop this extra self-cleaning opportunity for
condition waits, and delegate everything to the thread cleanup() routine.

Just one thing: will the THROW mechanism interrupt a waiting thread? Is the
throw able to unblock a thread engaged in, e.g., a blocking read() call? I
want waits to be cancellation points as in POSIX, not dead points as in
pthreads-win32.


Giancarlo.

Patrick TJ McPhee

Dec 16, 2003, 1:01:58 PM
In article <0roDb.29089$Dt6.6...@twister.tampabay.rr.com>,
Joseph Dionne <n...@emailplease.org> wrote:

% I (A)ss(U)(M)e that native threads would reduce virtual memory usage by
% mapping a new code and stack memory to each thread while sharing data
% and text memory with all the threads in the process. In a parent/child
% process, each process allocates it own memory for each of these, which
% can be quite large.

I wouldn't make this assumption. Most current systems share text between all
processes, for both programs and shared libraries. The stack requirements
will be roughly the same, so the amount of memory you save will depend on
the amount of global data that you can share between threads (but then, you
could save the same memory by sharing it between processes). You don't get
automatic savings simply by replacing fork() with pthread_create().

% Under Windows (yuk), I have found threads are great improvement to
% separate processes, and can verify this fact using Taskmanager. Are
% there separate tools for evaluating threaded processes above and beyond
% typical UNIX process analysis?

Some systems provide per-thread information using, e.g., ps or system-specific
tools, but surely what you're interested in is the aggregate effect -- I
used so much resource with the old system, but did the same amount of work
using so much less resource in my new system -- so what you really want
is a system-wide measurement of cpu time and memory used.
--

Patrick TJ McPhee
East York Canada
pt...@interlog.com

SenderX

Dec 16, 2003, 6:47:16 PM
> No, I provide those things starting with mutexes, you don't have to provide
> them to me. I know your library provides mostly lock-free algos, and we can
> introduce some lock-free facility in wefts too (once I have studied it),
> but I am mostly interested in the very low level syncronization it provides
> for very low level threads, mutex and broadcasts. I.e. I suppose you have a
> set of waitlists other than the scheduled thread list, that can be easily
> use to provide high prestation posix-like condition variables.

I do have a "very" experimental lock-free waitset; it allows threads that
fail their fast path to enter their waitsets without any sync, and it still
maintains scheduling of high-priority threads before lower priority ones.

So a signal/broadcast thread can make lock-free access to the waitset, and
remove and signal waiting threads...


> Just, will the THROW mechanism interrupt a waiting thread? Is the throw
> able to unblock a thread engaged in, e.g., a blocking read() call? I
> want waits to be cancellation points as in POSIX, and not dead points
> as in pthreads-win32.

No, the throw method doesn't use pthread_cancel() even on Unix, so POSIX
cancellation points will not fire.

;(


This exception-based cancellation method is currently an experiment in how
to gracefully unwind the C++ stack on thread cancellation inside lock-free
algos. I wanted to avoid having to push and pop cleanup handlers just to
destroy stack-based objects in C++...

However, I do have non-experimental code that uses real pthread
cancellation, and it will fire at all POSIX cancellation points on
non-Windows systems. It could work on Windows as well if pthreads-win32
layered cancellation-enabled recv(), read() and other functions over the
Windows API.

:)


P.S.

I am very pleased with how well lock-free algos and thread cancellation go
together! Almost 100% of my lock-free algos are now deferred cancellation
safe using the throw method or pthread_cancel.

Also, I "don't really want to post the source" for most of the system's
non-exposed low-level, complex, lock-free algos, like the garbage collector
and the fast waitsets. I know I already posted a collector in my lock-free
vs. lock-based test, but it is outdated and experimental because it uses
sched_yield() to handle garbage overflow.

"That is NOT good when the CPU and thread numbers rise, so I would not use
the posted collector at all for production code."

I have vastly improved the "portable" version of AppCore and rendered its
garbage collector "safe, stable and non-CAS2-dependent"; it will outperform
the one posted on the lock-free vs. lock-based test site by a wide margin.
I think the current AppCore site gives enough away already!

;)


Could I just provide an OSTAIL that uses my compiled library rather than
the source?

If not, I will think about creating an OSTAIL using Joe Seigh's great
lock-free semaphore:

http://intel.forums.liveworld.com/thread.jsp?forum=242&thread=6685

Joseph Dionne

unread,
Dec 16, 2003, 10:20:38 PM12/16/03
to

Thanks

Giancarlo Niccolai

unread,
Dec 17, 2003, 3:23:00 AM12/17/03
to
SenderX wrote:


> However, I do have non-experimental code that does use real pthread
> cancellation, and will fire in all posix cancellation points on
> non-windows systems. It could be in windows systems as well if
> pthread_win32 would layer a cancellation enabled recv, read and other
> functions over windows.

I am just finishing the COFFEE for Windows now: these are wrapper functions
that mimic the read() and write() functions (and, while I am at it, other
multiplatform goodies) with POSIX-like cancellation in a C++/weftish style.
As I have already pointed out before, it is quite possible to write drop-in
read(), recv() and friends that are cancellation points, given that the
cancellation request is a Win32 event.

At the moment I would prefer Wefts users to go for the weftish, portable
C++ solution, but there is no problem in also providing replacements for
the LIBC functions on certain non-cooperative systems.

>
>
> Could I just provide an OSTAIL that uses my compiled library and not the
> source?

We are all open-source coders, but if you want to go your own way you are
free to do so: you can provide an OSTAIL with your own closed-source
library, but the library may not be included in the Wefts distribution
(the OSTAIL itself can be, but not your lib); e.g., you could provide a
downloadable version of Wefts plus your library at your site. In fact, the
OSTAIL also interfaces with the MS-Windows libraries... and I am not
asking MS to open their source for me :-)

>
> If not, I will think about creating an OSTAIL using Joe Seighs great
> lock-free semaphore:

I would prefer that solution, if only because the whole OSTAIL you provide
could then be entered in the Wefts CVS; this would probably give better
maintainability as the project grows and other developers join, and could
also make both our projects (Wefts and AppCore) more visible. With a
closed-source version, although you can do it and I would be happy too,
we will both lose some synergy.

Giancarlo.
