http://appcore.home.comcast.net/misc/rwmutex_win_hpp.html
So far, the counting algorithm looks okay. I guess at a certain level it
uses some of the same basic principles that one of Joe Seigh's wait-free
semaphore algorithms used. Everything takes place in single atomic
statements which do not depend on any prior state holding a consistent value
within a specific timeframe, unlike CAS or LL/SC. There is no need for any
sort of "compare logic" in order to commit a mutation to the shared counter.
You don't have to load; modify; store only if prior load is consistent...
Therefore, simple increments, decrements and fetch-and-add are all that is
really required. There is no timeout logic as of yet. However, I believe the
same basic scheme that Joe used can be made to work within the context of my
algorithm...
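To make the distinction concrete, here is a tiny side-by-side sketch (an
illustration only, not code from the rw-mutex) of a CAS-style retry loop
versus a single fetch-and-add, using the same Win32 interlocked calls:

#include <windows.h>

static LONG g_counter = 0;

// CAS style: load, modify, and store only if nobody else mutated the
// value in between -- otherwise loop and retry.
LONG add_with_cas(LONG addend) {
    for (;;) {
        LONG expected = g_counter;                      // load
        LONG desired = expected + addend;               // modify
        if (InterlockedCompareExchange(&g_counter,
                desired, expected) == expected) {       // store if consistent
            return expected;
        }
        // somebody beat us to it; try again
    }
}

// Fetch-and-add style: a single atomic statement, no compare, no retry.
LONG add_with_faa(LONG addend) {
    return InterlockedExchangeAdd(&g_counter, addend);  // returns prior value
}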
Anyway, it should compile fine. Please tinker around with it, and see if you
can come up with some real nasty race-conditions!
Any thoughts?
;^)
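In case it helps anyone tinker, here is a minimal stress harness of the sort
one might throw at it (a sketch only; it assumes the header above is saved as
rwmutex_win.hpp, which already pulls in <windows.h> and <cassert>):

#include "rwmutex_win.hpp"   // the class above; includes <windows.h>, <cassert>
#include <process.h>         // _beginthreadex
#include <cstdio>

static rwmutex_win g_rw;
static LONG g_readers_inside = 0;   // readers currently holding the lock
static LONG g_writers_inside = 0;   // writers currently holding the lock

unsigned __stdcall reader_thread(void*) {
    for (int i = 0; i < 100000; ++i) {
        g_rw.rdlock();
        InterlockedIncrement(&g_readers_inside);
        assert(g_writers_inside == 0);   // no writer while we hold read-access
        InterlockedDecrement(&g_readers_inside);
        g_rw.rdunlock();
    }
    return 0;
}

unsigned __stdcall writer_thread(void*) {
    for (int i = 0; i < 100000; ++i) {
        g_rw.wrlock();
        InterlockedIncrement(&g_writers_inside);
        assert(g_readers_inside == 0);   // no readers while we hold write-access
        assert(g_writers_inside == 1);   // and exactly one writer
        InterlockedDecrement(&g_writers_inside);
        g_rw.wrunlock();
    }
    return 0;
}

int main() {
    HANDLE h[5];
    for (int i = 0; i < 4; ++i)
        h[i] = (HANDLE)_beginthreadex(0, 0, reader_thread, 0, 0, 0);
    h[4] = (HANDLE)_beginthreadex(0, 0, writer_thread, 0, 0, 0);
    WaitForMultipleObjects(5, h, TRUE, INFINITE);
    for (int i = 0; i < 5; ++i) CloseHandle(h[i]);
    std::printf("no assertion fired\n");
    return 0;
}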
Reload the code. There was a race-condition. In your case rdunlock would
detect m_count < 1 and post to the event, waking the writer. Problem...
Please re-download the link, and tell me what you think:
http://appcore.home.comcast.net/~appcore/misc/rwmutex_win_hpp.html
Okay, here is a little contest... Who can identify the race-condition I
found? Hint: it has to do with rdunlock responding to wrlock... Anyway, the
corrected code is on the website.
Post followups here please:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/62cbf4851c1f63de
http://appcore.home.comcast.net/misc/rwmutex_win_hpp.html
_____________________________________________________________________
#if ! defined(RWMUTEX_WIN_HPP)
#define RWMUTEX_WIN_HPP
#if ! defined(WIN32_LEAN_AND_MEAN)
#define WIN32_LEAN_AND_MEAN
#endif
#undef _WIN32_WINNT
#define _WIN32_WINNT 0x0400
#include <windows.h>
#include <exception>
#include <cassert>
#include <climits>
#if defined(_MSC_VER)
#define RWMUTEX_WIN_UNEXPECTED unexpected
#else
#define RWMUTEX_WIN_UNEXPECTED std::unexpected
#endif
/*************************************************************/
class rwmutex_win {
rwmutex_win(rwmutex_win const&);
rwmutex_win const& operator =(rwmutex_win const&);
private:
LONG m_count;
LONG m_rdwake;
HANDLE m_wrwset;
HANDLE m_rdwset;
CRITICAL_SECTION m_wrlock;
public:
rwmutex_win()
: m_count(LONG_MAX),
m_rdwake(0),
m_wrwset(CreateEvent(0, FALSE, FALSE, 0)),
m_rdwset(CreateSemaphore(0, 0, LONG_MAX, 0)) {
if (! m_rdwset || ! m_wrwset) {
if (m_rdwset) { CloseHandle(m_rdwset); }
if (m_wrwset) { CloseHandle(m_wrwset); }
throw std::exception();
}
InitializeCriticalSection(&m_wrlock);
}
~rwmutex_win() throw() {
DeleteCriticalSection(&m_wrlock);
if (! CloseHandle(m_rdwset)) { assert(false); }
if (! CloseHandle(m_wrwset)) { assert(false); }
}
public:
void wrlock() throw() {
EnterCriticalSection(&m_wrlock);
LONG count =
InterlockedExchangeAdd(&m_count, -LONG_MAX);
if (count < LONG_MAX) {
if ( InterlockedExchangeAdd(&m_rdwake,
LONG_MAX - count) + LONG_MAX - count) {
if (WaitForSingleObject(m_wrwset,
INFINITE) != WAIT_OBJECT_0) {
assert(false); RWMUTEX_WIN_UNEXPECTED();
}
}
}
}
void wrunlock() throw() {
LONG count =
InterlockedExchangeAdd(&m_count, LONG_MAX);
if (count < 0) {
if (! ReleaseSemaphore(m_rdwset, count * -1, 0)) {
assert(false); RWMUTEX_WIN_UNEXPECTED();
}
}
LeaveCriticalSection(&m_wrlock);
}
public:
void rdlock() throw() {
LONG count = InterlockedDecrement(&m_count);
if (count < 0) {
if (WaitForSingleObject(m_rdwset,
INFINITE) != WAIT_OBJECT_0) {
assert(false); RWMUTEX_WIN_UNEXPECTED();
}
}
}
void rdunlock() throw() {
LONG count = InterlockedIncrement(&m_count);
if (count < 1) {
if (! InterlockedDecrement(&m_rdwake)) {
if (! SetEvent(m_wrwset)) {
assert(false); RWMUTEX_WIN_UNEXPECTED();
}
}
}
}
public:
bool tryrdtowr() throw() {
if (TryEnterCriticalSection(&m_wrlock)) {
LONG count =
InterlockedExchangeAdd(&m_count, (-LONG_MAX) + 1) + 1;
if (count < LONG_MAX) {
if (InterlockedExchangeAdd(&m_rdwake,
LONG_MAX - count) + LONG_MAX - count) {
if (WaitForSingleObject(m_wrwset,
INFINITE) != WAIT_OBJECT_0) {
assert(false); RWMUTEX_WIN_UNEXPECTED();
}
}
}
return true;
}
return false;
}
};
/*************************************************************/
#endif
_____________________________________________________________________
This was in response to the following query by Alexander Grigoriev:
http://groups.google.com/group/microsoft.public.win32.programmer.kernel/msg/63182c5c425e41f5
Does this look okay to you guys? If not, please help me fix it!
;^)
Thank you all.
What's experimental about it?
Outline the experiment: what are you experimenting with? Are you
testing for something?
Well, experiments aside, there are numerous design issues with this
code, but the most glaring one is that in its vain attempt to protect resources,
it fails to protect its entry points.
You need to protect not only what's in the box, but all interface points
to the box.
--
HLS
The algorithm itself.
>
> Outline the experiment: what are you experimenting with? Are you testing
> for something?
>
> Well, experiments aside, there are numerous design issues with this
> code, but the most glaring one is that in its vain attempt to protect resources,
> it fails to protect its entry points.
If you found a bug point it out please. I can't find any so far.
Your trolling attempts are not going to work.
> You need to protect not only what's in the box, but all interface points to
> the box.
Are you referring to the fact that I did not include a lock helper class for
scoped locking? Why can't you make your own? It's trivial.
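For what it is worth, such a helper really is a few lines (a sketch only,
built on the rwmutex_win class above; the name here is made up, and a proper
lockguard version shows up later in the thread):

// Bare-bones scoped read-lock helper: acquire in the constructor,
// release in the destructor, even on early return or exception unwind.
class scoped_rdlock {
    rwmutex_win& m_rw;
    scoped_rdlock(scoped_rdlock const&);             // non-copyable
    scoped_rdlock& operator =(scoped_rdlock const&);
public:
    explicit scoped_rdlock(rwmutex_win& rw) : m_rw(rw) { m_rw.rdlock(); }
    ~scoped_rdlock() { m_rw.rdunlock(); }
};

// usage:
//   {
//       scoped_rdlock guard(rw);    // rdlock() here
//       /* read the shared data */
//   }                               // rdunlock() here, automatically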
Seems to me, it already did!
>> You need to protect not only what's in the box, but all interface
>> points to the box.
>
> Are you referring to the fact that I did not include a lock helper class for
> scoped locking? Why can't you make your own? It's trivial.
Whoa, you posted code here BEGGING for help. The glaring problem with it
was pointed out to you, with a PROVEN example of unstable application
operation of your code, and now you don't wish to heed the
advice to fix it? Passing the buck to others? Give me a break!
Who's trolling now?
--
HLS
Show me exactly how my rw-mutex does not work?
I believe the example you gave was something like this:
{
rwmutex_win rw;
rw.rdlock();
}
Well, of course that is busted. Let me fix it:
{
rwmutex_win rw;
rw.rdlock();
rw.rdunlock();
}
There, now what's wrong with my code? If you can help me perfect my
rwmutex_win class, I will appreciate it. So far, from what I can gather,
you want me to implement a scoped locking helper class for it. Okay, I will
add one and post it here shortly.
Thanks.
Okay. Here is the full example of the code you say proves my rwmutex_win
class does not work:
>> void main(char argc, char *argv[])
>> {
>> rwmutex_win rw;
>> rw.rdunlock();
>> rw.rdunlock();
>> {
>> rwmutex_win rw;
>> rw.rdlock();
>> }
>>
>> getch();
>>
>> rw.rdlock();
>> }
>>
That is clearly an incorrect use of a mutual exclusion synchronization
object in general. Why are you releasing a read-lock when you have not
acquired one? In other words, you create a rwmutex_win object on the stack
and the first call you make to it is to unlock read-access. Why do you think
this is okay?
For some reason I think you wrote that code in error on purpose to see if
you could get some sort of rise out of me? Na...
>>>> Well, experiments aside, there are numerous design issues with
>>>> this code, but the most glaring one is that in its vain attempt to protect
>>>>
>>>> You need to protect not only what's in the box, but all interface
>>>> points to the box.
>>>
>>> Are you referring to the fact that I did not include a lock helper class
>>> for scoped locking? Why can't you make your own? It's trivial.
>>
>> Whoa, you posted code here BEGGING for help. The glaring problem with it
>> was pointed out to you, with a PROVEN example of unstable application
>> operation of your code, and now you don't wish to heed the
>> advice to fix it? Passing the buck to others? Give me a break!
> Show me exactly how my rw-mutex does not work?
ah come on! Don't you know anything about software engineering, never
mind SE, that's a class all of its own, Programming? Do you know what
entry points are? Interface points? Black box concepts? Can't you even
test YOUR own programs? Really, it seems like you are one of those
wannabe programmers, probably uneducated, with no professional degree,
self taught, which by no means implies there are no natural great
ones, but you are not one of them.
I posted the example fault code already in the other thread where you
already responded to with your silly *fix it yourself* response.
Need it again? Fine!
void main(char argc, char *argv[])
{
rwmutex_win rw;
rw.rdunlock();
rw.rdunlock();
getch();
rw.rdlock();
}
You need to Ctrl+C/Break out of this code.
That is as opposed to the method that would be used by our R/W classes, with
inherent initialization, and locks and unlocks.
void main(char argc, char *argv[])
{
TReaderWriter mutex;
...
{
TReaderGrabber grab(mutex);
...
}
}
Which would NOT lend itself to the easy programming or synchronization-order
mistakes quite possible with your code.
Seriously, if you wish to exhibit yourself as even a close facsimile of
a great multi-threaded developer "expert", at least, please, don't
embarrass me with this baloney of yours. And stop trying to pit that
other usenet GROUP against this Microsoft GROUP. That's nonsense and
isn't going to impress anyone. The participants couldn't care less, but you
didn't insult them yet either. Nonetheless, I could give a hoot about
what your friends out there think. It is meaningless to me.
Are you ready to play nice?
==
HLS
void main(char argc, char *argv[])
{
rwmutex_win rw;
rw.rdunlock();
rw.rdunlock();
getch();
rw.rdlock();
}
Chris Thomasson wrote:
> For some reason I think you wrote that code in error on purpose to see
> if you could get some sort of rise out of me? Na...
If that doesn't raise the hair on your back, or you do not recognize the
possible dualities of improper synchronization, then god help anyone who
would use and trust your code! Just fix it by checking your entry
points or provide a local scope class wrapper for implementors to use.
The bottom line is you are a hypocrite, and it clearly shows your lack of
software engineering knowledge. You have not earned the right to
criticize anyone else's work and insult other people's knowledge.
--
HLS
void main is a classic mistake. Anyway:
> {
> rwmutex_win rw;
> rw.rdunlock();
> rw.rdunlock();
> getch();
> rw.rdlock();
> }
>
Your code unlocks a read lock when it has not already acquired one. Now
come on! Here is the proper way to do it:
int main(int argc, char *argv[]) {
rwmutex_win rw;
rw.rdlock();
rw.rdlock();
getch();
rw.rdunlock();
rw.rdunlock();
return 0;
}
Read here:
http://groups.google.com/group/comp.programming.threads/msg/cede2e550058c2ff
>> The following code (using Chris's RW library) will lock up this simple
>> application requiring an abortive ^C/break:
>>
>> void main(char argc, char *argv[])
>> {
>> rwmutex_win rw;
>> rw.rdunlock();
>> rw.rdunlock();
>> getch();
>> rw.rdlock();
>> }
>
> Your code unlocks a read lock when it has not already acquired one.
> Now come on! Here is the proper way to do it:
>
> int main(int argc, char *argv[]) {
> rwmutex_win rw;
> rw.rdlock();
> rw.rdlock();
> getch();
> rw.rdunlock();
> rw.rdunlock();
> return 0;
> }
You really don't have a clue? How can you complain about our
constructor not checking return values, with a VERY LOW OCCURRENCE
rate of failure, essentially a NIL possibility, yet have no qualms with
your own code's VERY HIGH OCCURRENCE of unprotected functional entry points?
Honestly, you don't have a clue?
--
HLS
You release a lock that has not been previously acquired. You're making a big
mistake there. Read here:
http://groups.google.com/group/comp.programming.threads/msg/cede2e550058c2ff
If you insist on misusing the class, well, that's your problem not mine.
The stack-based programmer friendly helper class is only as good as the
underlying primitives. Chris hasn't argued with the value of the helper
class, nor implied that his implementation replaces it. He is working on
the meat of the problem, beneath the surface and not even visible to the end
user.
>
> void main(char argc, char *argv[])
> {
> TReaderWriter mutex;
> ...
> {
> TReaderGrabber grab(mutex);
> ...
> }
> }
>
> Which would NOT lend itself to the easy programming or synchronization-order
> mistakes quite possible with your code.
>
> Seriously, if you wish to exhibit yourself as even a close facsimile of a
> great multi-threaded developer "expert", at least, please, don't embarrass me
> with this baloney of yours. And stop trying to pit that other usenet
> GROUP against this Microsoft GROUP. That's nonsense and isn't going to
> impress anyone. The participants couldn't care less, but you didn't insult
> them yet either. Nonetheless, I could give a hoot about what your friends
> out there think. It is meaningless to me.
I believe that the other group you are referring to
(comp.programming.threads) is moderated while the Microsoft ones are open to
anyone who wants to mouth off in public.
>
> Are you ready to play nice?
Chris always plays nice, as nearly as I can tell.
In fact, I think he was the one I had a very similar discussion with some
time ago, because he told me a way to make my algorithm faster which in fact
added overhead in my particular case. The key differences were that our
exchange ended relatively quickly when I explained I was discussing a case
with only one reader (and I never claimed it was an r/w-mutex or r/w queue)
and that neither of us ever resorted to ad-hominem attacks (nevermind
attacks thoroughly without basis which seem to be the norm in your threads
here).
And BTW your code is appallingly inefficient for the single reader case; you
should be using interlocked singly-linked lists, which give single
reader, multiple writer, lockless, low overhead operation. Microsoft even
has provided an implementation for you.
http://msdn2.microsoft.com/en-us/library/ms686962.aspx
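For reference, the API behind that link is the Win32 "interlocked singly
linked list" (SList). A minimal push/pop sketch looks roughly like this; note
that an SList is a LIFO stack rather than a FIFO queue, and entries must be
aligned to MEMORY_ALLOCATION_ALIGNMENT:

#include <windows.h>
#include <malloc.h>    // _aligned_malloc / _aligned_free
#include <cstdio>

struct node {
    SLIST_ENTRY entry; // must be the first member for the simple cast below
    int payload;
};

int main() {
    SLIST_HEADER head;            // SLIST_HEADER carries the required
    InitializeSListHead(&head);   // alignment in its declaration

    node* n = (node*)_aligned_malloc(sizeof(node), MEMORY_ALLOCATION_ALIGNMENT);
    if (! n) return 1;
    n->payload = 42;
    InterlockedPushEntrySList(&head, &n->entry);      // lock-free push

    PSLIST_ENTRY e = InterlockedPopEntrySList(&head); // lock-free pop
    if (e) {
        node* popped = (node*)e;   // entry is the first member of node
        std::printf("%d\n", popped->payload);
        _aligned_free(popped);
    }
    return 0;
}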
>
> ==
> HLS
> And BTW your code is appallingly inefficient for the single reader case; you
> should be using interlocked singly-linked lists, which give single
> reader, multiple writer, lockless, low overhead operation. Microsoft even
> has provided an implementation for you.
>
> http://msdn2.microsoft.com/en-us/library/ms686962.aspx
Benny, this EXACT point of EFFICIENCY above was *ALREADY* pointed out by
yours truly on multiple occasions - from the onset. So there is nothing
new being stated here. The posted code with the MFC class was a WORKING
starting point for exploration to provide the OP a quick FIFO queue and
I clearly stated the case where multi-readers might be required.
I appreciate the link, though; a glance tells me it will be very useful.
PS: I am happy Chris was able to help you and not insult you at the same
time. I disagree with your other comments though, but there is no need
to repeat it.
--
HLS
> If you insist on misusing the class, well, that's your problem
> not mine.
No, I'm "worry about the NEWBIE" who might end up writing mult-threaded
applications and is not sychronizing his locks and unlock calls which
your library, clearly, obviously, uncategorically promotes with a VERY
HIGH occurence - just like the SIMPLE example illustrated.
Let me show it again:
void main(char argc, char *argv[])
{
rwmutex_win rw;
rw.rdunlock();
rw.rdunlock();
getch();
rw.rdlock();
}
The above will LOCK UP the application using Chris's library.
--
HLS
Ok, good enough, but advertising it as an R/W-queue was rather disingenuous
if you were fully aware that it supports only one simultaneous reader.
Not entirely. It would be an improvement to check for such mistakes in an
appropriate debug version protected by #if DEBUG
Sorry, I think it is worth noting:
But it is indeed visible to the end user, clearly visible. Any software
engineer worth his salt will clearly tell you that protecting your
entry points is clearly a fundamental aspect of getting to the "meat of
the problem", which is akin to and all part of "Black Box", A.K.A. functional
programming, even with its C/C++ substrates.
--
HLS
Shall Chris also prevent the user from writing:
int main(void)
{
while (1);
{
printf("Press Q to quit.\n");
int c = getch();
if (c == 'Q' || c == 'q') return 0;
}
}
which also hangs, requiring CTRL+C to break?
> Ok, good enough, but advertising it as an R/W-queue was rather disingenuous
> if you were fully aware that it supports only one simultaneous reader.
Only in the example posted, and it was CLEARLY stated in the original
response to the OP.
Just to be clear, it was well pointed out that in multiple reader
environment, a simple change to a TCriticalSectionGrabber() will do the
job, and clearly stated that a MORE efficient FIFO other than MFC based
one was more appropriate.
Anything else?
--
HLS
Ben, I nearly missed the introduction of some actual discussion into
this battle. Shame on you. ;-)
This is an excellent point, but I'll add more. A READ lock guard object
construct can be a great boon -- it adds convenience with no real risk.
A WRITE lock (or normal mutex) guard object introduces substantial risk
of its own because it'll unlock implicitly when the destructor fires no
matter what the cause or where in the code the unwind occurred. An
unlock with broken shared data predicates is a serious bug, and the
guard object can make the bug really hard to find. Obviously this risk
can be controlled -- but it's not trivial for beginners.
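A contrived, self-contained sketch of that failure mode (the lock and the
guard here are throwaway stand-ins, not the classes from this thread):

#include <stdexcept>
#include <cstdio>

struct rwlock { void wrlock() {} void wrunlock() {} };  // stand-in lock
struct write_guard {                                    // unlock-in-destructor guard
    rwlock& m_rw;
    explicit write_guard(rwlock& rw) : m_rw(rw) { m_rw.wrlock(); }
    ~write_guard() { m_rw.wrunlock(); }                 // runs during unwind, too
};

static rwlock g_lock;
static long g_from = 100, g_to = 0;   // invariant: g_from + g_to == 100

void transfer(long amount) {
    write_guard guard(g_lock);        // wrlock() here
    g_from -= amount;                 // invariant temporarily broken
    if (amount > 50)                  // suppose the next step fails...
        throw std::runtime_error("credit failed");
    g_to += amount;                   // never reached on the throw
}   // ~write_guard() wrunlock()s even though g_from + g_to != 100 anymore

int main() {
    try { transfer(75); } catch (std::exception const&) {}
    // other threads could now read the broken state; here we just print it:
    std::printf("%ld + %ld = %ld\n", g_from, g_to, g_from + g_to); // 25 + 0 = 25
    return 0;
}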
Whereas "straight C style" code like Chris' example is more likely to
exit still holding the lock. Also bad, clearly; but generally much
easier to diagnose. Any decent lock analysis package will be able to
tell you where the lock was taken after it leads to deadlock, and the
state of the data will be available for analysis. No analysis package is
going to be able to tell you where your guard implicitly released a lock
when it shouldn't have.
Guard objects are "cool", and quite often appropriate and convenient. But
they can also be extremely dangerous if not completely understood. And,
in particular, it's cheesy for someone to be criticizing code merely
because it chooses not to use that style of programming.
> I believe that the other group you are referring to
> (comp.programming.threads) is moderated while the Microsoft ones are open to
> anyone who wants to mouth off in public.
No, sadly, (sometimes), comp.programming.threads is NOT moderated.
Okay. I will post the error checking version later on today. Thanks for
that.
>> The above will LOCK UP the application using Chris's library.
>
> Shall Chris also prevent the user from writing:
>
> int main(void)
> {
> while (1);
> {
> printf("Press Q to quit.\n");
> int c = getch();
> if (c == 'Q' || c == 'q') return 0;
> }
> }
>
> which also hangs, requiring CTRL+C to break?
Do you see anything there related to Chris's library?
Oh come on, not you too! What happened to this Microsoft GROUP!! Do they
let just anyone become a C++ MVP!? Or, as Chris's friend, are you both at
the same school level?
If you missed the entire point about his "MEAT BOX" and protecting
his entry points, which is a FAR worse situation than not checking for
return values in our constructor, then this has become a BIG joke!
Geesz!
I'm sorry to be rude to you, but why are you wasting my time with that
baloney?
--
HLS
I would advise against insulting the work that Ben, and many others, put into
obtaining their MVP titles. I say, good for them, and keep up the good work.
> If you missed the entire point about his "MEAT BOX" and protecting his
> entry points, which is a FAR worse situation than not checking for return
> values in our constructor, then this has become a BIG joke!
>
> Geesz!
>
> I'm sorry to be rude to you, but why are you wasting my time with that
> baloney?
Your trolling attempts are not going to work any more.
I am going to add a scoped locking utility class and some further "sanity"
checking in my rwmutex_win class and post it here later on today. I hope
that you could give it another chance after that? Anyway, I was really
focused on the "rw-mutex algorithm itself" I created. I wanted to know if
anybody can see something I cannot. You mentioned that it allows programmers
to make explicit mistakes. Well, I will add some
#if defined(DEBUG)
sanity checks.
You informed me that it would be a good idea to include scoped locking
objects. Okay, I will add those as well.
Thanks for your help Hector (aka, Pops).
Chris,
Since the implementation of your simple library requires the exposure of
user entry points for usage, these need to be protected in release code.
There is a high potential for synchronization errors that will not get
caught.
Please trust me on that.
And finally, here let me throw another bone out to you:
We will add the error checking for our constructor, we don't believe it
will offer any value, but will add a logger/tracer to see if anything
has silently occurred unbeknownst to us in the last 12 years of working
operations.
What we can't do is just throw exceptions without a new design review
and a prolonged QA testing period to see how any new throws may alter any
EH/SEH handling currently in place.
--
Hector Santos, CTO
Santronics Software, Inc
http://www.santronics.com
Okay. I am making the proper modifications to my code now. I will post soon.
> And finally, here let me throw another bone out to you:
>
> We will add the error checking for our constructor, we don't believe it
> will offer any value, but will add a logger/tracer to see if anything has
> silently occurred unbeknownst to us in the last 12 years of working
> operations.
Perfect.
> What we can't do is just throw exceptions without a new design review and
> a prolonged QA testing period to see how any new throws may alter any EH/SEH
> handling currently in place.
Okay. I see. However, I think the added error checking via logging will be
worthwhile.
Thanks.
My point was simply that bad code will do bad things. We would hardly say
that it is up to the printf or getch writer to prevent (ab)use in the
fashion of the example I gave. Why then would we expect Chris to prevent
clear abuse of his functions? And even if he prevented abuse in the manner
of your example, would it actually do any good? I can still create a
deadlock by using your guard objects improperly (pseudocode):
rwlock a, b;
thread1()
{
writerguard w1(a);
writerguard w2(b);
...
}
thread2()
{
writerguard w1(b);
writerguard w2(a);
...
}
Is this a bug in your version, because you do not protect against this
abuse?
Ok, the obvious. GIGO! Garbage In, Garbage out.
> Why then would we expect Chris to prevent
> clear abuse of his functions?
That was a clear attempt to break it, and that alone should be enough to
show the entry points need protection.
But we are talking the real world, and more appropriately the complex
world of multi-threading and synchronization designs.
It doesn't have to be abuse. It can be an "honest" coding mistake; I
mean, the sky is the limit on the coding possibilities that can easily
create a situation where unlock is prematurely called. Maybe the guy
had an if-condition function exit that skipped a lock, and the exit
called the unlock; maybe he was using try/finally blocks and didn't
quite understand the flow yet.
The fact that it is allowed to occur opens the door that it can, and
more than likely will, happen.
> And even if he prevented abuse in the manner
> of your example, would it actually do any good?
Sure, better coding, fewer support issues, confidence in design; there
are all kinds of benefits which ultimately spell less cost, fewer
dollars across the board.
> I can still create a
> deadlock by using your guard objects improperly (pseudocode):
>
> ...
>
> Is this a bug in your version, because you do not protect against this
> abuse?
Sure, there are clear-cut points where proper implementation is expected.
No doubt.
No one is immune to it. My point is when you have explicit entry points
for usage, where more entry points are required to release it, it is
far better to remove the potential issues as much as you can than to
just leave it to the whims of a programmer.
In our case, which means nothing, it is unlikely to happen since we
don't use globals like you showed and most of our resources to be protected
are all modularized, a class per resource, etc. But sure, in principle, it
can happen.
So no, I don't expect Chris or anyone else to solve the world's problems,
just the black box they own and expose as much as they *practically*
can. There is always a point where the
"Press Any Key / Where is the Any Key?"
user syndrome comes into play.
Is that obvious? I think so. Or maybe not. :-)
(Oh, time for dinner! See ya later)
--
HLS
>>> Okay. I will post the error checking version later on today. Thanks
>>> for that.
My opinion, there is no right/wrong, just how I was trained. Also keep
in mind that I have not done this level of coding/thinking in many
years. Our treasure chest is full of reusable code designed over many
years, and the finer details are well forgotten. Plus age is catching up
fast. :-) All I am saying is take all this with an open mind; there is
nothing "locked" in stone. I just wish to avoid useless philosophical
debates. Ok?
1) Another Lockup situation:
void BadTest2()
{
rwmutex_win rw;
rw.wrlock();
getch();
rw.wrlock();
}
One might say this is a programmer mistake, YES, but it is possible for it
to occur through obscure and abstract ways in the programmer's world,
especially as I explain in #4 below, "Same Thread Writer Access".
Also, this might be related to another problem with the "depth", which I
will show more of below in #3. So keep this in mind.
2) xxunlock() / xxlock() entry points.
For explicit usage (users calling directly), this is a perfect situation
where you might wish to use a throw, because it is a fundamental user
coding error to resolve asap. It is critical to perfecting the
synchronization and serialization of resources.
So I will be interested to see how you do this, because there is an issue
with the depth (#3) that might be used in checking the entry point. It
depends on how you do this.
Anyway, when you add the local scope class helper (and if you like,
borrow the similar logic in critsect.h), this will tremendously clean up
user implementation. For example, when I used your library for the
multi-writer/single-reader pump example, the Get() looks like this:
BOOL Get(TWCFWatchEvent &o)
{
Mutex.wrlock();
if (IsEmpty()) {
Mutex.wrunlock();
return FALSE;
}
o = RemoveTail();
Mutex.wrunlock();
return TRUE;
}
There is nothing wrong with it, but one might be tempted to use
try/finally, and now SEH might be required. So getting the local scope
class helper is going to be a great addition to your rw code, and
greatly appreciated by end-users for very simple and common
serialization of resources.
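For instance, once such a helper exists, the same Get() collapses to
something along these lines (a sketch only; write_guard is a hypothetical
scoped writer guard over your rw-mutex, and the other names are from my
snippet above):

BOOL Get(TWCFWatchEvent &o)
{
    write_guard guard(Mutex);   // wrlock() now, wrunlock() on every return path
    if (IsEmpty()) {
        return FALSE;           // no explicit unlock needed
    }
    o = RemoveTail();
    return TRUE;                // and no try/finally or SEH needed either
}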
3) Semaphore counts (depth).
I found another mite with the depth usage.
Again, in the spirit of minimizing software engineering coding issues,
an allowance for the end user to enter a depth value in the constructor
is available, and because of that allowance, it can open the door to
incorrect usage.
Per MSDN, the minimum value for depth is 1.
If a value of zero, it is caught with a run time exception:
rwmutex_win Mutex(0); // <-- exception thrown
This is good so far,
But a value of 1 is not correctly considered in the if conditions and
this will cause a permanent lockup as well.
Design consideration questions:
1) Why is depth needed? Can it be removed?
2) Can it stay at LONG_MAX? Think Pareto Principle. If 80-90% is the
common usage and most appropriate usage, then you might consider fixing
it at LONG MAX.
3) Will it hurt keeping it at LONG MAX or some other value?
Again, these are considerations to solidify the code and reduce any
software engineering issues. In either case, needed, desired or not,
its internal usage needs to be correct.
4) Same thread WRITER access.
Here I admit there is a philosophical debate depending on one's school of
thought. I am definitely open to open-minded discussions here.
But overall, I don't believe same-thread WRITER access should be
blocked. I believe this is what's causing the lockup in #1 above.
In our critsect.h, this is possible to have global controller:
class TFoobar
{
public:
void memberX()
{
TWriterGrabber grab(Mutex);
memberY(); //Call some other class memberY
}
void memberY()
{
TWriterGrabber grab(Mutex);
}
private:
TReaderWriter Mutex;
} foobar;
Again, think black box; the functional modularity may be used
autonomously or with other functions. The applicability and reasons are
plentiful. This writer mutex should not block the same thread.
Unless I am missing something Chris, I am not seeing this with the
rwmutex_win library.
5) Support Functional Programming.
I would consider using a BOOL return on the xxxunlock/lock functions to
help avoid having to use exception issues:
rwmutex_win rw;
if (!rw.wrlock()) {
MYLOGERROR(GetLastError(),"Critical Design Error");
}
You want to call and set your own SetLastError(RWMUTEX_ERROR_BASE+xxxx)
for any FALSE condition here.
Summary:
Overall, I think correcting the logic and usage of the depth variable is
key to removing most if not all of the issues I see.
- Check for entry points misuse,
- Checking for a depth value of 1 and using it correctly is needed.
- Consider deprecating the rddepth optional parameter unless required.
- Consider same thread WRITE access allowance,
- Add a Local Scope Class Helper,
- Consider functional programming support.
--
HLS
All right with me. BTW, thanks for your comments on my code. I appreciate
them.
> 1) Another Lockup situation:
>
> void BadTest2()
> {
> rwmutex_win rw;
> rw.wrlock();
> getch();
> rw.wrlock();
> }
>
> One might say this is a programmer mistake, YES, but it is possible to
> occur through obscure and abstract ways in the programmers world,
> especially as I explain in #4 below "Same Thread Writer Access".
Yes. This is an example of Murphy's Law. If shit can happen, it will happen
right in the middle of the place where you don't seem to think it will...
;^)
I have removed the m_rddepth variable. Check here:
http://appcore.home.comcast.net/misc/rwmutex_win_hpp.html
You are 100% correct in that it added unneeded complexity. I subsituted
m_rddepth with LONG_MAX. I think that processors with more than LONG_MAX
cores are far off in the future...
:^)
> Again, in the spirit of minimizing software engineering coding issues, an
> allowance for the end user to enter a depth value in the constructor is
> available and because of that allowance, it can open the door to incorrect
> usage.
>
> Per MSDN, the minimum value for depth is 1.
>
> If a value of zero, it is caught with a run time exception:
>
> rwmutex_win Mutex(0); // <-- exception thrown
>
> This is good so far,
>
> But a value of 1 is not correctly considered in the if conditions and this
> will cause a permanent lockup as well.
>
> Design consideration questions:
>
> 1) Why is depth needed? Can it be removed?
Yes it can, and it has been removed. Thanks for pointing that out.
> 2) Can it stay at LONG_MAX? Think Pareto Principle. If 80-90% is the
> common usage and most appropriate usage, then you might consider fixing it
> at LONG MAX.
Yes. I fixed it at LONG_MAX. Perfect.
>
> 3) Will it hurt keeping it at LONG MAX or some other value?
>
> Again, these are considerations to solidify the code and reduce any
> software engineering issues. In either case, needed, desired or not,
> its internal usage needs to be used corrected.
I have corrected the issue.
>
> 4) Same thread WRITER access.
Okay. You want me to make my wrlock function recursive. You are correct in
that recursive locking is not needed to create correct code. However, I will
go ahead and try to recurse the writer, as it adds to the overall
functionality of the synchronization object; I mean, the POSIX extension
supports a recursive attribute for mutexes, so why should I not provide
recursive write locking? Thanks. That should be a trivial task for me to
do.
>
> Here I admit there is a philosophical debate depending on one's school of
> thought. I am definitely open to open-minded discussions here.
Indeed.
[...]
> Unless I am missing something Chris, I am not seeing this with the
> rwmutex_win library.
You missed nothing at all. The rw-mutex implementation I posted as-is does
NOT support recursive writer locking in any way, shape or form. I think I
will add it to further support the enhanced functionality of my rw-mutex
class.
> 5) Support Functional Programming.
>
> I would consider using a BOOL return on the xxxunlock/lock functions to
> help avoid having to use exception issues:
>
> rwmutex_win rw;
> if (!rw.wrlock()) {
> MYLOGERROR(GetLastError(),"Critical Design Error");
> }
I don't think I need a bool return... Humm... Well, I am interested in what
others think on this point. I think I would only need to return bool if there
was the qualifier "try" in the name of the function called. Otherwise, I would
expect blocking behavior to ensue.
> You want to call and set your own SetLastError(RWMUTEX_ERROR_BASE+xxxx)
> for any FALSE condition here.
>
> Summary:
>
> Overall, I think correcting the logic and usage of the depth variable is key
> to removing most if not all of the issues I see.
>
> - Check for entry points misuse,
> - Checking for a depth value of 1 and using it correctly is needed.
> - Consider deprecating the rddepth optional parameter unless required.
> - Consider same thread WRITE access allowance,
> - Add a Local Scope Class Helper,
> - Consider functional programming support.
Thanks so much for that. I really appreciate the time you took to look
over my code and try to help it out.
I am not going to be able to post the code today. Please check back sometime
tomorrow though. I will take your comments to heart and apply them in my
refactored code.
Thanks Hector.
Sincerely,
Chris Thomasson
http://appcore.home.comcast.net
:^)
Humm... Well, perhaps you could go ahead and think about renaming the class
to something along the lines of:
SingleReaderMultiWriterQueue
Just to document the actual usage requirements and/or functionality in the
name of the object itself, or something to get the point across more
directly?
Would that be helpful at all?
[...]
I will add a trywrlock function. I will also consider and probably use
SetLastError like errno. Fine.
>> Just to be clear, it was well pointed out that in multiple reader
>> environment, a simple change to a TCriticalSectionGrabber() will do
>> the job, and clearly stated that a MORE efficient FIFO other than MFC
>> based one was more appropriate.
>>
>> Anything else?
>
> Humm... Well, perhaps you could go ahead and think about renaming the
> class to something along the lines of:
>
> SingleReaderMultiWriterQueue
Maybe Chris, but I think the more relevant point is my propensity to presume
the obvious will be understood. It is a bad habit, but I have to admit
this kernel32 group has changed. There is a new generation here; the
veterans are no longer here, or are more mellow like I have become. But the
vets all know me here. I was a regular during the 90s and early 2000s,
but now I only pop up once in a blue moon to get an answer and then move
on. While I'm here, sometimes I will answer some questions too. I love
and enjoy helping people if I can, like I did with the OP. But if you
hadn't begun to disrupt the thread, rest assured, I was near ready to
move on. :-)
Anyway, many times, I will draft a message and decide that it could be
cut down, or I just won't worry about the finer details because I
presume in this highly technical forum full of professionals and long
time veterans, it will be understood.
That of course can often get you in trouble when a "young turk" who
doesn't know you begins challenging you. :-) I think you went overboard,
but let's put that to rest.
Anyway, the point is that it is WELL UNDERSTOOD the MFC CList for a FIFO
was not ideal; it is not thread ready and requires serialization. MFC
does offer CNoTrackObject and CProcessLocal/CThreadLocal template
classes to make it thread ready at the process and thread level. Off
hand, if I recall, I believe it uses TLS. Let me see if I can find a
quick example usage.... ok, here is one:
Example usage:
struct CDllGlobalData : public CNoTrackObject
{
... your data ...
DWORD dwCount;
CString sModuleName;
};
CThreadLocal<CDllGlobalData> dllGlobalData;
int GetGlobalDebugLevel()
{
return dllGlobalData->dwCount;
}
CString GetGlobalModuleName()
{
return dllGlobalData->sModuleName;
}
(please don't jump on the example, that's all it is. :-))
But I didn't want to waste time pointing this out. Efficiency and thread
counts came to mind.
I also knew a more efficient, low-overhead doubly-linked queue would
be better, and we have code for this too, but for other things. I wasn't
about to post that. But I did mention this point in one of the posts,
which you skipped over. :-)
And believe it or not the code I posted was first designed for
multi-readers, multi-writers, but there were several reasons why I cut
it down to a multi-writer/single reader example:
1) Cut the size of the posted code, and more importantly,
2) Give the poster the MINIMUM BASELINE for him to
analyze the loading requirement.
I knew that would be very important in this "pump" design, so I cut it
down, simplified it to a multi-writer/single reader design.
Honest, this is a key consideration and I know how important it is
because it is fundamental in our communications product flow control
synchronization needs. That is why my original statement to the OP 100%
implied this concept and point:
"If your testing shows the queue is growing too fast,
then you add additional data threads."
That was 100% stated with the implication that if he added more reader
threads, then the code as it was posted would have to be changed to
serialize the multiple readers, as you noted, with a writer lock.
I guess my mistake is that I either pulled that clarification from my
draft message, or I didn't bother to write it, or, since I just finished
exploring the same idea with RDCW() missed events and used a similar MFC
FIFO queue for a proof of concept, I used this more recognizable idea.
So I never disagreed with that point. But for some reason you wanted to
ignore everything that was said and preferred to militantly focus on it
being a problem and I was a terrible person because of it. :)
Anyway, its a dead issue.
Peace?
We really do need to stop this at this point, but if you feel the urge to
reply, go ahead with the final say.
--
HLS
I'll try to make this discussion a bit more interesting and maybe
restore some peace ;-)
A read/write lock is actually tricky to use because the following is
also a bug, but a much more slippery one (hence deadlier):
thread1()
{
readguard rlock1(a);
readguard rlock2(a);
}
Though a read lock allows multiple readers, it's actually not recursive
(i.e., only illusively recursive).
[hint - another thread tries to acquire an exclusive lock in the middle].
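A concrete interleaving that produces it, using the rwmutex_win class from
this thread (a sketch; the Sleep calls only force the ordering, and the header
is assumed to be saved as rwmutex_win.hpp):

#include "rwmutex_win.hpp"  // pulls in <windows.h>
#include <process.h>        // _beginthreadex

static rwmutex_win g_rw;

unsigned __stdcall reader(void*) {
    g_rw.rdlock();          // first read lock: granted
    Sleep(100);             // give the writer time to queue up
    g_rw.rdlock();          // second read lock: blocks behind the pending
                            // writer -> deadlock (the writer waits for us,
                            // we wait for the writer)
    g_rw.rdunlock();
    g_rw.rdunlock();
    return 0;
}

unsigned __stdcall writer(void*) {
    Sleep(50);              // arrive between the reader's two rdlock() calls
    g_rw.wrlock();          // waits for the outstanding reader to leave
    g_rw.wrunlock();
    return 0;
}

int main() {
    HANDLE h[2];
    h[0] = (HANDLE)_beginthreadex(0, 0, reader, 0, 0, 0);
    h[1] = (HANDLE)_beginthreadex(0, 0, writer, 0, 0, 0);
    WaitForMultipleObjects(2, h, TRUE, INFINITE);   // never returns
    return 0;
}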
In our read RW lock (which is similar to one of Chris's lock-free read
locks) we have several defect detections in debug mode:
1) Recursive read lock detection for up to N concurrent accesses
2) Recursive write lock detection
3) Recursive write lock detection (includes write-read combinations)
Write lock promotion is not allowed (I think that it complicates the
code).
4) 2.5-minute deadlock detection for other cases (using a "hibernate
safe" timer).
Rani
"Ben Voigt [C++ MVP]" <r...@nospam.nospam> wrote in message
news:uY57H3jJ...@TK2MSFTNGP05.phx.gbl...
http://appcore.home.comcast.net/misc/rwmutex_win_hpp.html
________________________________________________________________
#if ! defined(RWMUTEX_WIN_HPP)
#define RWMUTEX_WIN_HPP()0x0001UL
#undef WIN32_LEAN_AND_MEAN
#undef _WIN32_WINNT
#define _WIN32_WINNT 0x0400
#include <windows.h>
#include <exception>
#include <cassert>
#include <climits>
#if ! defined(RWMUTEX_WIN_UNEXPECTED)
#if defined(_MSC_VER)
#define RWMUTEX_WIN_UNEXPECTED() \
assert(false), unexpected()
#else
#define RWMUTEX_WIN_UNEXPECTED() \
assert(false), std::unexpected()
#endif
#endif
#if ! defined(RWMUTEX_WIN_CTOR_UNEXPECTED)
#define RWMUTEX_WIN_CTOR_UNEXPECTED() \
assert(false); throw std::exception()
#endif
#if ! defined(RWMUTEX_WIN_DTOR_UNEXPECTED)
#define RWMUTEX_WIN_DTOR_UNEXPECTED()assert(false)
#endif
/*************************************************************/
/* Simple Scoped Lock Helpers
_________________________________________________________*/
namespace lockguard {
template<typename T>
class read {
T* m_state;
public:
read(T& state) throw()
: m_state(&state) {
m_state->rdlock();
}
~read() throw() {
if (m_state) {
m_state->rdunlock();
}
}
public:
void cancel() throw() {
m_state = 0;
}
};
template<typename T>
class write {
T* m_state;
public:
write(T& state) throw()
: m_state(&state) {
m_state->wrlock();
}
~write() throw() {
if (m_state) {
m_state->wrunlock();
}
}
public:
void cancel() throw() {
m_state = 0;
}
};
}
/* Wait-Free Fast-Pathed Read/Write Mutex
_________________________________________________________*/
class rwmutex_win {
rwmutex_win(rwmutex_win const&);
rwmutex_win const& operator =(rwmutex_win const&);
public:
typedef lockguard::read<rwmutex_win> lock_read;
typedef lockguard::write<rwmutex_win> lock_write;
private:
LONG m_count;
LONG m_rdwake;
CRITICAL_SECTION m_wrlock;
HANDLE m_wrwset;
HANDLE m_rdwset;
public:
rwmutex_win()
: m_count(LONG_MAX),
m_rdwake(0),
m_wrwset(CreateEvent(0, FALSE, FALSE, 0)),
m_rdwset(CreateSemaphore(0, 0, LONG_MAX, 0)) {
if (! m_rdwset || ! m_wrwset) {
if (m_rdwset) { CloseHandle(m_rdwset); }
if (m_wrwset) { CloseHandle(m_wrwset); }
RWMUTEX_WIN_CTOR_UNEXPECTED();
}
InitializeCriticalSection(&m_wrlock);
}
~rwmutex_win() throw() {
if (m_count != LONG_MAX || m_rdwake) {
RWMUTEX_WIN_DTOR_UNEXPECTED();
}
DeleteCriticalSection(&m_wrlock);
if (! CloseHandle(m_rdwset) ||
! CloseHandle(m_wrwset)) {
RWMUTEX_WIN_DTOR_UNEXPECTED();
}
}
public:
void wrlock() throw() {
EnterCriticalSection(&m_wrlock);
assert(m_count > -1);
LONG count =
InterlockedExchangeAdd(&m_count, -LONG_MAX);
if (count < LONG_MAX) {
if (InterlockedExchangeAdd(&m_rdwake,
LONG_MAX - count) + LONG_MAX - count) {
if (WaitForSingleObject(m_wrwset,
INFINITE) != WAIT_OBJECT_0) {
RWMUTEX_WIN_UNEXPECTED();
}
}
}
}
void wrunlock() throw() {
assert(m_count < 1);
LONG count =
InterlockedExchangeAdd(&m_count, LONG_MAX);
if (count < 0) {
if (! ReleaseSemaphore(m_rdwset, -count, 0)) {
RWMUTEX_WIN_UNEXPECTED();
}
}
LeaveCriticalSection(&m_wrlock);
}
public:
void rdlock() throw() {
assert(m_count <= LONG_MAX);
LONG count = InterlockedDecrement(&m_count);
if (count < 0) {
if (WaitForSingleObject(m_rdwset,
INFINITE) != WAIT_OBJECT_0) {
assert(false); RWMUTEX_WIN_UNEXPECTED();
}
}
}
void rdunlock() throw() {
assert(m_count < LONG_MAX);
LONG count = InterlockedIncrement(&m_count);
if (count < 1) {
if (! InterlockedDecrement(&m_rdwake)) {
if (! SetEvent(m_wrwset)) {
RWMUTEX_WIN_UNEXPECTED();
}
}
}
}
public:
bool tryrdtowr() throw() {
assert(m_count < LONG_MAX);
if (TryEnterCriticalSection(&m_wrlock)) {
assert(m_count > -1);
LONG count =
InterlockedExchangeAdd(&m_count, (-LONG_MAX) + 1) + 1;
if (count < LONG_MAX) {
if (InterlockedExchangeAdd(&m_rdwake,
LONG_MAX - count) + LONG_MAX - count) {
if (WaitForSingleObject(m_wrwset,
INFINITE) != WAIT_OBJECT_0) {
RWMUTEX_WIN_UNEXPECTED();
}
}
}
return true;
}
return false;
}
};
/*************************************************************/
#endif
________________________________________________________________
Enjoy!
BTW, have any of you been tinkering around with this particular design?
Thanks.
http://appcore.home.comcast.net/misc/rwmutex_win_hpp.html
________________________________________________________________
bool m_upgraded;
public:
read(T& state) throw()
: m_state(&state),
m_upgraded(false) {
m_state->rdlock();
}
~read() throw() {
if (m_state) {
if (! m_upgraded) {
m_state->rdunlock();
} else {
m_state->wrunlock();
}
}
}
public:
void cancel() throw() { m_state = 0; }
bool tryrdtowr() throw() {
return (m_state && ! m_upgraded) ?
(m_upgraded = m_state->tryrdtowr()) : false;
}
};
template<typename T>
class write {
T* m_state;
public:
write(T& state) throw() : m_state(&state) {
m_state->wrlock();
}
~write() throw() {
if (m_state) { m_state->wrunlock(); }
}
public:
void cancel() throw() { m_state = 0; }
};
}
/* Wait-Free Fast-Pathed Read/Write Mutex
_________________________________________________________*/
class rwmutex_win {
rwmutex_win(rwmutex_win const&);
rwmutex_win const& operator =(rwmutex_win const&);
public:
typedef lockguard::read<rwmutex_win> lock_read;
typedef lockguard::write<rwmutex_win> lock_write;
private:
LONG m_count;
LONG m_rdwake;
LONG m_wrrecurse;
CRITICAL_SECTION m_wrlock;
HANDLE m_wrwset;
HANDLE m_rdwset;
public:
rwmutex_win()
: m_count(LONG_MAX),
m_rdwake(0),
m_wrrecurse(0),
m_wrwset(CreateEvent(0, FALSE, FALSE, 0)),
m_rdwset(CreateSemaphore(0, 0, LONG_MAX, 0)) {
if (! m_rdwset || ! m_wrwset) {
if (m_rdwset) { CloseHandle(m_rdwset); }
if (m_wrwset) { CloseHandle(m_wrwset); }
RWMUTEX_WIN_CTOR_UNEXPECTED();
}
InitializeCriticalSection(&m_wrlock);
}
~rwmutex_win() throw() {
if (m_count != LONG_MAX ||
m_rdwake || m_wrrecurse) {
RWMUTEX_WIN_DTOR_UNEXPECTED();
}
DeleteCriticalSection(&m_wrlock);
if (! CloseHandle(m_rdwset) ||
! CloseHandle(m_wrwset)) {
RWMUTEX_WIN_DTOR_UNEXPECTED();
}
}
public:
void wrlock() throw() {
EnterCriticalSection(&m_wrlock);
if (! (m_wrrecurse++)) {
assert(m_count > -1);
LONG count = InterlockedExchangeAdd(&m_count, -LONG_MAX);
if (count < LONG_MAX) {
if (InterlockedExchangeAdd(&m_rdwake,
LONG_MAX - count) + LONG_MAX - count) {
if (WaitForSingleObject(m_wrwset,
INFINITE) != WAIT_OBJECT_0) {
RWMUTEX_WIN_UNEXPECTED();
}
}
}
}
}
void wrunlock() throw() {
assert(m_count < 1);
if (! (--m_wrrecurse)) {
LONG count = InterlockedExchangeAdd(&m_count, LONG_MAX);
if (count < 0) {
if (! ReleaseSemaphore(m_rdwset, -count, 0)) {
RWMUTEX_WIN_UNEXPECTED();
}
}
}
LeaveCriticalSection(&m_wrlock);
}
public:
void rdlock() throw() {
assert(m_count <= LONG_MAX);
LONG count = InterlockedDecrement(&m_count);
if (count < 0) {
if (WaitForSingleObject(m_rdwset,
INFINITE) != WAIT_OBJECT_0) {
assert(false); RWMUTEX_WIN_UNEXPECTED();
}
}
}
void rdunlock() throw() {
assert(m_count < LONG_MAX);
LONG count = InterlockedIncrement(&m_count);
if (count < 1) {
if (! InterlockedDecrement(&m_rdwake)) {
if (! SetEvent(m_wrwset)) {
RWMUTEX_WIN_UNEXPECTED();
}
}
}
}
public:
bool tryrdtowr() throw() {
assert(m_count < LONG_MAX);
if (TryEnterCriticalSection(&m_wrlock)) {
assert(! m_wrrecurse);
++m_wrrecurse;
if (m_wrrecurse == 1) {
assert(m_count > -1);
LONG count =
InterlockedExchangeAdd(&m_count, (-LONG_MAX) + 1) + 1;
if (count < LONG_MAX) {
if (InterlockedExchangeAdd(&m_rdwake,
LONG_MAX - count) + LONG_MAX - count) {
if (WaitForSingleObject(m_wrwset,
INFINITE) != WAIT_OBJECT_0) {
RWMUTEX_WIN_UNEXPECTED();
}
}
}
}
return true;
}
return false;
}
};
/*************************************************************/
#endif
________________________________________________________________
Enjoy! ;^)
BTW, any comments?
I have your previous copy, and the one I used for my context switch rate
profile testing (did you see that message in the ISLL thread?)
I just wanted to ask you a few things.
I believe I did see a slight improvement when I changed
InitializeCriticalSection(&m_wrlock);
to
InitializeCriticalSectionAndSpinCount(&m_wrlock,4);
Also, in my evaluation and comments regarding deprecating the
depth=LONG_MAX, you stated something that implied there was a
relationship with the # of cpus.
If so, then why not get the CPU count and use that? I did this in
my copy of your previous posted version:
int rwMutextCpuCount()
{
int iTotalCpus = LONG_MAX;
#ifdef WIN32
SYSTEM_INFO systemInfo;
GetSystemInfo(&systemInfo);
iTotalCpus = systemInfo.dwNumberOfProcessors;
#endif
return iTotalCpus;
}
and in your constructor:
rwmutex_win()
: m_cpus(rwMutextCpuCount()),
m_count(m_cpus),
m_rdwake(0),
m_wrwset(CreateEvent(0, FALSE, FALSE, 0)),
m_rdwset(CreateSemaphore(0, 0, m_cpus, 0)) {
if (! m_rdwset || ! m_wrwset) {
if (m_rdwset) { CloseHandle(m_rdwset); }
if (m_wrwset) { CloseHandle(m_wrwset); }
throw;
}
InitializeCriticalSection(&m_wrlock);
}
I don't recall seeing any real difference on my machine/testing.
But for the profile testing I did and posted, I used the LONG_MAX for
the comparison testing.
--
HLS
Using a CS with a spin count only makes sense if the CS is usually held for
very little time, to justify wait in a tight loop.
"Pops" <du...@nospamnospamnospam.com> wrote in message
news:%23csdw6J...@TK2MSFTNGP03.phx.gbl...
Agreed.
>
> "Pops" <du...@nospamnospamnospam.com> wrote in message
> news:%23csdw6J...@TK2MSFTNGP03.phx.gbl...
>> Chris,
>>
>> I have your previous copy, and the one I used for my context switch rate
>> profile testing (did you see that message in the ISLL thread?)
>>
>> I just wanted to ask you a few things.
>>
>> I believe I did see a slight improvement when I changed
>>
>> InitializeCriticalSection(&m_wrlock);
>>
>> to
>>
>> InitializeCriticalSectionAndSpinCount(&m_wrlock,4);
>>
>>
>> Also, in my evaluation and comments regarding deprecating the
>> depth=LONG_MAX, you stated something that implied there was a
>> relationship with the # of cpus.
>>
>> If so, then why not get the CPU count and use that? I did this in my
>> copy of your previous posted version:
I use LONG_MAX because I want the algorithm to be able to support a large
number of readers. I cannot set it to the number of CPUs simply because a
user can decide to create 10 threads on a 2-CPU system. The depth of the
counter is directly proportional to the total number of reader threads, not
CPUs, that can access the rw-mutex at any one time.
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:i5idndsOPO1ifLra...@comcast.com...
> This simplistic reader/writer algorithm is wait-free in the sense that
> there are no loops; no CAS or LL/SC:
>
>
> http://appcore.home.comcast.net/misc/rwmutex_win_hpp.html
>
>
> So far, the counting algorithm looks okay. I guess at a certain level it
> uses some of the same basic principles that one of Joe Seigh's wait-free
> semaphore algorithms used. Everything takes place in single atomic
> statements which do not depend on any prior state holding a consistent
> value within a specific timeframe, unlike CAS or LL/SC. There is no need
> for any sort of "compare logic" in order to commit a mutation to the
> shared counter. You don't have to load; modify; store only if prior load
> is consistent... Therefore, simple increments, decrements and
> fetch-and-add are all that is really required. There is no timeout logic as
> of yet. However, I believe the same basic scheme that Joe used can be made
> to work within the context of my algorithm...
>
> Anyway, it should compile fine. Please tinker around with it, and see if
> you can come up with some real nasty race-conditions!
[...]
I have updated the code some more. I removed some local variables, and I
made some of the shared ones mutable in order to decorate the read-access
functions as const. I tweaked the code for the writer locking. Basically, I
moved the functionality out into two private functions (e.g.,
rwmutex::sys_wracquire/release).
Also, I created a stupid simple versioning thing based on namespaces; a
simplistic analog of the following technique:
http://groups.google.com/group/comp.lang.c++.moderated/msg/bb45c17cdd3c04a3
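Presumably the point of the version-select macro below is that client code
names the class through it, so a later v0002 could be dropped in without
touching call sites. A guess at the intended client-side usage (the file name
here is assumed):

#include "winsync_v0001_rwmutex.hpp"   // assumed name for the header below

// RWMUTEX_WIN_VER_SELECT() expands to v0001 by default, so this typedef
// resolves to winsyncsys::v0001::rwmutex; redefining RWMUTEX_WIN_VER_SELECT
// before the include would retarget it to another version's namespace.
typedef winsyncsys::RWMUTEX_WIN_VER_SELECT()::rwmutex rwmutex;

static rwmutex g_lock;   // == winsyncsys::v0001::rwmutex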
Here is the latest version:
http://appcore.home.comcast.net/misc/winsync_v0001_rwmutex_hpp.html
__________________________________________________________________
/*************************************************************/
/*************************************************************
==============================================================
Experimental, Fast-Pathed Thread Synchronization for Windows
___________________________________________________________
// \\
\\ Created By //
_________//______________\\__________
// \\
*| |*
*| Chris M. Thomasson |*
*| |*
*| cri...@comcast.net |*
*| news://comp.programming.threads |*
*| http://appcore.home.comcast.net |*
*| |*
\\___________________________________//
==============================================================
/*************************************************************/
/*************************************************************/
// validate that we are using a C++ compiler
#if ! defined(__cplusplus)
#error RWMUTEX_WIN_BUILD - C++ compiler is required!!!
// check include guard
#elif ! defined(RWMUTEX_WIN_V0001_HPP)
// include guard file version 0001
#define RWMUTEX_WIN_V0001_HPP()0x1UL
// setup some windows build flags
#undef WIN32_LEAN_AND_MEAN
#undef _WIN32_WINNT
#define WIN32_LEAN_AND_MEAN
#define _WIN32_WINNT 0x0400 // for TryEnterCriticalSection
// include required support
#include <windows.h> // for a handfull windows API calls
#include <exception> // for std::exception, std::unexpected
#include <cassert> // for RWMUTEX_WIN_ASSERT
#include <climits> // for LONG_MAX
// the assertion macro
#if ! defined(RWMUTEX_WIN_ASSERT)
#define RWMUTEX_WIN_ASSERT(mp_this, mp_exp) \
assert((mp_exp))
#endif
// the constructor failure macro
#if ! defined(RWMUTEX_WIN_CTOR_UNEXPECTED)
#define RWMUTEX_WIN_CTOR_UNEXPECTED(mp_this) \
RWMUTEX_WIN_ASSERT(mp_this, false); \
throw std::exception()
#endif
// the macros called when the shi% hits the fan! ;^0
#if ! defined(RWMUTEX_WIN_UNEXPECTED)
#if defined(_MSC_VER)
#define RWMUTEX_WIN_UNEXPECTED(mp_this) \
RWMUTEX_WIN_ASSERT(mp_this, false), unexpected()
#else
#define RWMUTEX_WIN_UNEXPECTED(mp_this) \
RWMUTEX_WIN_ASSERT(mp_this, false), std::unexpected()
#endif
#endif
#if ! defined(RWMUTEX_WIN_DTOR_UNEXPECTED)
#define RWMUTEX_WIN_DTOR_UNEXPECTED(mp_this) \
RWMUTEX_WIN_ASSERT(mp_this, false)
#endif
#if ! defined(RWMUTEX_WIN_DTOR_SANITY_UNEXPECTED)
#define RWMUTEX_WIN_DTOR_SANITY_UNEXPECTED(mp_this) \
RWMUTEX_WIN_ASSERT(mp_this, false)
#endif
// the version default macro
#if ! defined(RWMUTEX_WIN_VER_DEFAULT)
#define RWMUTEX_WIN_VER_DEFAULT()v0001
#endif
// the version selection macro
#if ! defined(RWMUTEX_WIN_VER_SELECT)
#define RWMUTEX_WIN_VER_SELECT RWMUTEX_WIN_VER_DEFAULT
#endif
/*************************************************************/
/*************************************************************/
/* Windows Syncronization System Library Version 0001
_________________________________________________________*/
/* Public Forward Declarations
___________________________________________________*/
namespace winsyncsys {
namespace v0001 {
class rwmutex;
namespace raii {
namespace lock {
template<typename T> class read;
template<typename T> class write;
}} // namespace raii::lock
}} // namespace winsyncsys::v0001
/* Public Definitions/Declarations
___________________________________________________*/
namespace winsyncsys {
namespace v0001 {
/* RAII Scoped Helper Objects
___________________________________________________*/
namespace raii {
namespace lock {
// scoped read-access helper
template<typename T>
class read {
T* m_state;
mutable bool m_upgraded;
public:
inline read(T& state) throw()
: m_state(&state),
m_upgraded(false) {
m_state->rdlock();
}
inline ~read() throw() {
if (m_state) {
if (! m_upgraded) {
m_state->rdunlock();
} else {
m_state->wrunlock();
}
}
}
public:
inline void cancel() throw() {
m_state = 0;
}
inline bool is_active() const throw() {
return (m_state);
}
inline bool is_upgraded() const throw() {
RWMUTEX_WIN_ASSERT(this, (m_upgraded) ? m_state : false);
return m_upgraded;
}
inline bool tryrdtowr() const throw() {
RWMUTEX_WIN_ASSERT(this, ! m_upgraded && m_state);
return (m_state && ! m_upgraded) ?
(m_upgraded = m_state->tryrdtowr()) : false;
}
}; // class read
// scoped write-access helper
template<typename T>
class write {
T* m_state;
public:
inline write(T& state, bool trylock = false) throw()
: m_state(&state) {
if (! trylock) {
m_state->wrlock();
} else {
if (! m_state->trywrlock()) {
m_state = 0;
}
}
}
inline ~write() throw() {
if (m_state) { m_state->wrunlock(); }
}
public:
inline void cancel() throw() {
m_state = 0;
}
inline bool is_active() const throw() {
return (m_state);
}
}; // class write
}} // namespace raii::lock
/* Wait-Free Fast-Pathed Read/Write Mutex Object
___________________________________________________*/
class rwmutex {
// no copy/assign
rwmutex(rwmutex const&);
rwmutex const& operator =(rwmutex const&);
// Public Scoped-Locking Utility (RAII)
public:
// obtain scoped read-access
typedef raii::lock::read<rwmutex> read;
// obtain scoped write-access
typedef raii::lock::write<rwmutex> write;
// Private State
private:
// read/write-access (atomic count: fast/slow-path)
mutable LONG m_count;
// read-epoch (atomic differental-count: fast/slow-path)
mutable LONG m_rdepoch;
// write recursion (simple count: fast/slow-path)
LONG m_wrrecurse;
// writer-access overall lock (mutex: fast/slow-path)
mutable CRITICAL_SECTION m_wrlock;
// waiting writer (waitset: bin-sema slow-path)
HANDLE m_wrwset;
// waiting reader(s) (waitset: sema slow-path)
HANDLE m_rdwset;
// Public Constructor/Destructor
public:
// initialize the object for LONG_MAX total readers...
rwmutex()
: m_count(LONG_MAX),
m_rdepoch(0),
m_wrrecurse(0),
m_wrwset(CreateEvent(0, FALSE, FALSE, 0)),
m_rdwset(CreateSemaphore(0, 0, LONG_MAX, 0)) {
// ensure we acquired our sync resources
if (! m_rdwset || ! m_wrwset) {
if (m_rdwset) { CloseHandle(m_rdwset); }
if (m_wrwset) { CloseHandle(m_wrwset); }
RWMUTEX_WIN_CTOR_UNEXPECTED(this); // Out-Of-Resources!
}
InitializeCriticalSection(&m_wrlock);
}
// destroy the object...
~rwmutex() throw() {
// a basic sanity check! ;^)
if (m_count != LONG_MAX ||
m_rdepoch ||
m_wrrecurse) {
/* the shi% hit the fuc%king fan because this is
being destroyed, while it's being used!!! :^0
*/
RWMUTEX_WIN_DTOR_SANITY_UNEXPECTED(this); // DOH!
}
// make sure we can release our sync resources
DeleteCriticalSection(&m_wrlock);
if (! CloseHandle(m_rdwset) ||
! CloseHandle(m_wrwset)) {
RWMUTEX_WIN_DTOR_UNEXPECTED(this); // YIKES!
}
}
// Private System Writer API
private:
// adjust state wrt introducing writer-access...
void sys_wracquire(LONG CONST addend = 0) throw() {
// inc-and-test writer recursion depth
if (! (m_wrrecurse++)) {
// ensure we are the only writer
RWMUTEX_WIN_ASSERT(this, m_count > -1);
// introduce writer request
LONG CONST count =
InterlockedExchangeAdd(
&m_count,
(-(LONG_MAX)) + addend) + addend;
// check for readers
if (count < LONG_MAX) {
// increment the differential read-epoch count
if (InterlockedExchangeAdd(
&m_rdepoch,
LONG_MAX - count) + LONG_MAX - count) {
// wait for the readers in the current read-epoch
if (WaitForSingleObject(
m_wrwset,
INFINITE) != WAIT_OBJECT_0) {
RWMUTEX_WIN_UNEXPECTED(this); // SON-OF-A-BIT%CH!
}
}
}
}
}
// adjust state wrt revoking writer-access...
void sys_wrrelease(LONG CONST addend = LONG_MAX) throw() {
// validate prior write-access
RWMUTEX_WIN_ASSERT(this, m_count < 1);
// dec-and-test writer recursion depth
if (! (--m_wrrecurse)) {
// revoke prior writer request
LONG CONST count =
InterlockedExchangeAdd(
&m_count,
addend);
// check for waiting readers
if (count < 0) {
/* signal all of the reader(s) waiting on the
next read-epoch
*/
if (! ReleaseSemaphore(
m_rdwset,
-count,
0)) {
RWMUTEX_WIN_UNEXPECTED(this); // OMFG!!
}
}
}
}
// Public Writer API
public:
// lock for writer-access...
inline void wrlock() throw() {
// obtain overall mutual exclusion
EnterCriticalSection(&m_wrlock);
// let everybody know a writer is in the pipe
sys_wracquire();
// we have write-access!
}
// try-lock for writer-access...
inline bool trywrlock() throw() {
// try to obtain overall mutual exclusion
if (TryEnterCriticalSection(&m_wrlock)) {
// let everybody know a writer is in the pipe
sys_wracquire();
// we have been granted write-access
return true;
}
// CRAP! somebody else beat us here ;^(...
return false;
}
// unlock for writer-access...
inline void wrunlock() throw() {
// let everybody know the writer is leaving
sys_wrrelease();
// release overall mutual exclusion
LeaveCriticalSection(&m_wrlock);
// we have released the prior write-access!
}
// Public Reader API
public:
// lock for read-access...
inline void rdlock() const throw() {
// validate the count
RWMUTEX_WIN_ASSERT(this, m_count <= LONG_MAX);
// introduce a read-access request
if (InterlockedDecrement(&m_count) < 0) {
// there is writer-access at this time, we need to
// wait on the current read-epoch
if (WaitForSingleObject(
m_rdwset,
INFINITE) != WAIT_OBJECT_0) {
RWMUTEX_WIN_UNEXPECTED(this); // NOT GOOD!!
}
}
// we have read-access!
}
// unlock for read-access...
inline void rdunlock() const throw() {
// validate a prior read-access
RWMUTEX_WIN_ASSERT(this, m_count < LONG_MAX);
// revoke the prior read-access
if (InterlockedIncrement(&m_count) < 1) {
/* there is a waiting writer, therefore, we need to
decrement the differential read-epoch count
*/
if (! InterlockedDecrement(&m_rdepoch)) {
/* we are last reader in the current read-epoch,
therefore, we need to signal the waiting writer
*/
if (! SetEvent(m_wrwset)) {
RWMUTEX_WIN_UNEXPECTED(this); // CRAP!
}
}
}
// we have released the prior read-access!
}
// try to upgrade read-access to writer-access...
inline bool tryrdtowr() const throw() {
// validate prior read-access
RWMUTEX_WIN_ASSERT(this, m_count < LONG_MAX);
// try to obtain overall mutual exclusion
if (TryEnterCriticalSection(&m_wrlock)) {
/* let everybody know a writer is in the pipe,
and atomically revoke the prior read-access
*/
const_cast<rwmutex*>(this)->sys_wracquire(1);
/* we have atomically rendered the prior read-access
into a write-access! :^)
*/
return true;
}
/* another write-request beat us to the punch;
sh%t happens! ;^)
*/
return false;
}
}; // class rwmutex
}} // namespace winsyncsys::v0001
/* Basic Windows Synchronization User Library Abstraction
_________________________________________________________*/
namespace winsync = winsyncsys::RWMUTEX_WIN_VER_SELECT();
/*************************************************************/
/*************************************************************/
#endif // #if ! defined(RWMUTEX_WIN_V0001_HPP)
__________________________________________________________________
I am creating some simple usage examples. Basically, you can use it like:
__________________________________________________________________
static winsync::rwmutex g_rwmtx;
void readers() {
for(;;) {
[...]
{
winsync::rwmutex::read rdlock(g_rwmtx);
[read-access granted]
}
[...]
}
}
void writers() {
for(;;) {
[...]
{
winsync::rwmutex::write wrlock(g_rwmtx);
[write-access granted]
}
[...]
}
}
__________________________________________________________________
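Here is one more quick sketch, this time showing the try-upgrade path through the
scoped read helper (the function name and the bracketed placeholders are just for
illustration):
__________________________________________________________________
static winsync::rwmutex g_rwmtx;
void reader_maybe_writer() {
winsync::rwmutex::read rdlock(g_rwmtx);
[read-access granted]
if (rdlock.tryrdtowr()) {
[upgraded; write-access granted; the dtor now calls wrunlock()]
} else {
[another writer beat us; we still hold read-access]
}
}
__________________________________________________________________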
It's fairly simple to use indeed... BTW, sorry for some of the spelling
mistakes in the comments. I need to run those through a spell checker. Humm,
do any of you know of a tool that can spell check comments in C/C++ code? I
think I could write a macro for ms-word or something...
> I have updated the code some more. I removed some local variables, and I
> made some of the shared ones mutable in order to decorate the read-access
> functions as const. I tweaked the code for the writer locking. Basically, I
> moved the functionality out into two private functions (e.g.,
> rwmutex::sys_wracquire/release).
I would not like to upset you, but Win32 API already includes such
API. Along with condition variable, lock-free mpmc stack and one-time
initialization. Good news for you is that API is available only since
Vista/Longhorn :)
For details see in MSDN:
InitializeSRWLock()/AcquireSRWLockExclusive()/AcquireSRWLockShared()
And maybe it will be interesting for you too:
InitializeConditionVariable()/SleepConditionVariableCS()/
SleepConditionVariableSRW()
InitOnceInitialize()/InitOnceExecuteOnce()
InitializeSListHead()/InterlockedPopEntrySList()/
InterlockedPushEntrySList()
SRWLock is very slim - only sizeof pointer. Thus is not recursive. And
not upgradeable. Shared lock/unlock - one interlocked operation as
expected.
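Usage is roughly like this (a minimal sketch; needs <windows.h> with
_WIN32_WINNT targeting Vista, i.e. 0x0600 or later):

SRWLOCK g_srw;

void init() {
InitializeSRWLock(&g_srw);
}

void reader() {
AcquireSRWLockShared(&g_srw);
/* read-access granted */
ReleaseSRWLockShared(&g_srw);
}

void writer() {
AcquireSRWLockExclusive(&g_srw);
/* write-access granted */
ReleaseSRWLockExclusive(&g_srw);
}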
See this blog for details:
http://www.bluebytesoftware.com/blog/2006/06/22/NewVistaConcurrencyFeatures.aspx
Dmitriy V'jukov
Yeah. I can't believe it took them that long to include a condvar! Anyway...
They are still missing EventCount's! Unless I am missing something here, I
don't think keyed events are as versatile as eventcounts with regard to
their use in non-blocking algorithms.
> Good news for you is that API is available only since
> Vista/Longhorn :)
Indeed. :^)
I was trying to do something for all the people who want to keep using NT/XP
for as long as possible. Vista, IMVHO, does not have what it takes to
justify a switch from NT/XP...
> For details see in MSDN:
> InitializeSRWLock()/AcquireSRWLockExclusive()/AcquireSRWLockShared()
>
> And maybe it will be interesting for you too:
> InitializeConditionVariable()/SleepConditionVariableCS()/
> SleepConditionVariableSRW()
> InitOnceInitialize()/InitOnceExecuteOnce()
> InitializeSListHead()/InterlockedPopEntrySList()/
> InterlockedPushEntrySList()
Yeah. Here is something you can do with the InterlockedXXXSList API:
http://groups.google.com/group/microsoft.public.win32.programmer.kernel/msg/dc5a8ab20816b56c
Simplistic Lock-Free LIFO/FIFO IPC. You're going to have to use something to
make it waitable. EventCount makes this trivial. You can also use something
like Joe Seigh's fast semaphore implementation he posted on cpt several years
ago.
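For reference, the raw SList side of that looks something like this (a minimal
sketch; the node layout is just for illustration, and the eventcount/waiting
part is left out):
_____________________________________________________________________
#include <windows.h>
#include <malloc.h>

struct node {
SLIST_ENTRY sle; /* keep this first so the cast back is trivial */
void* state;
};

static SLIST_HEADER g_stack;

void stack_init() {
InitializeSListHead(&g_stack);
}

void stack_push(void* state) {
/* SList entries must be MEMORY_ALLOCATION_ALIGNMENT aligned */
node* n = (node*)_aligned_malloc(sizeof(node), MEMORY_ALLOCATION_ALIGNMENT);
if (! n) { return; }
n->state = state;
InterlockedPushEntrySList(&g_stack, &n->sle);
}

void* stack_pop() {
SLIST_ENTRY* e = InterlockedPopEntrySList(&g_stack);
if (! e) { return 0; } /* empty; this is where the eventcount would wait */
void* state = ((node*)e)->state;
_aligned_free(e);
return state;
}
_____________________________________________________________________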
> SRWLock is very slim - only sizeof pointer. Thus is not recursive. And
> not upgradeable. Shared lock/unlock - one interlocked operation as
> expected.
I believe the last rwmutex implementation of Microsoft's was in some .NET CLI
code here:
http://groups.google.com/group/comp.programming.threads/msg/b73084939aac60f0
I believe its structure was something like the one Michel outlined in the
post above:
_____________________________________
uchar - readers
bit - readers signaled
bit - writers signaled
uchar - waiting readers
uchar - waiting writers
_____________________________________
It seems that the links are no longer active, anyway IIRC, the code used
event/sema to do the waitsets. Humm, I am not sure if Windows Vista uses
keyed events as waitsets for everything now. I also think they use CAS to
mutate the state. The algorithm I created for NT/XP does not use CAS at all.
It also has wait-free properties for uncontended readers. Even though the
code is in an early stage, IMVHO, I think it's more versatile than the one
provided by Vista. What do you think?
> See this blog for details:
> http://www.bluebytesoftware.com/blog/2006/06/22/NewVistaConcurrencyFeatures.aspx
Yup; I believe I have already read that sometime ago. I guess the main point
I want to make here is that there is no "real" need to get Vista just to
have access to a condvar, or rwmutex. As you know, Windows 9x/XP/NT already
exposes all of the API's that are needed to get fast-pathed synchronization
primitives up and running; Vista NOT required for this at all.
It's kind of funny how some people seem to get so excited about Microsoft
sticking a condvar and rwmutex implementation in Windows Vista, when they
most certainly could have had them all the way back in Windows 9x. They also
could have easily had an "initialize-once" implementation; why they waited
all this time, I don't know. IMHO, POSIX still has them beat wrt
threading/synchronization API...
Not even close to your eventcount ;)
Dmitriy V'jukov
Yeah.
In client applications WinXP will be dropped off maybe in 5-10 years.
But in server applications it's a different picture. I can create a server app
specifically for Windows Server 2003 R2 now, and for Windows Longhorn
in a year or two.
Win API is very attractive because it's "standard", and I can just
write "#include <windows.h>" and use this standard, reliable, stable
and tested API out of the box.
Dmitriy V'jukov
> Yeah. Here is something you can do with the InterlockedXXXSList API:
>
> http://groups.google.com/group/microsoft.public.win32.programmer.kern...
>
> Simplistic Lock-Free LIFO/FIFO IPC. Your going to have to use something to
> make it waitable. EventCount makes this trivial. You can also use something
> like Joe Seighs fast semaphore implementation he posted on cpt several years
> ago.
Yes. Their API is not "complete". One must bolt something on top
anyway...
I think that they will create a more usable API in .NET on top of
this...
BTW SList copes with ABA in an interesting way.
They use an ABA counter along with an SEH __try/__except block to prevent
reading from freed memory.
Something like:
slist_node*
slist_anchor_pop(
slist_anchor* const _this
) {
slist_anchor cmp, xchg; /* declared here so cmp stays in scope after the __try */
begin:
bool bad_thing = false;
__try
{
do {
cmp = *_this;
if (! cmp.head) { return 0; }
/* hazard! refer to PDR algorithm for more details... */
/* not hazard any more */
xchg.head = cmp.head->nx;
/* increment ABA */
xchg.aba = cmp.aba + 1;
} while (! DWCAS(_this, &cmp, &xchg));
}
__except (EXCEPTION_EXECUTE_HANDLER)
{
bad_thing = true;
}
if (bad_thing)
goto begin;
return cmp.head;
}
Dmitriy V'jukov
> It seems that the links are no longer active, anyway IIRC, the code used
> event/sema to do the waitsets. Humm, I am not sure if Windows Vista uses
> keyed events as waitsets for everything now. I also think they use CAS to
> mutate the state. The algorithm I created for NT/XP does not use CAS at all.
> It also has wait-free properties for uncontended readers. Even though the
> code is in a early stage, IMVHO, I think its more versatile than the one
> provided by Vista. What do you think?
I wonder why they use CAS instead of InterlockedIncrement/
InterlockedDecrement...
BTW did you see this:
http://www.accu.org/index.php/journals/1324
The new boost.thread implementation will have a solution similar to yours.
And it will be available on all Windows versions along with Linux etc.
And I propose a little improvement for this implementation :) It seems
that Anthony Williams just overlooked this little moment:
Instead of:
bool lock()
{
long old_flag=0;
while(true)
Must be:
bool lock()
{
long old_flag=flag;
while(true)
Hmmm.... Now I notice that he uses InterlockedCompareExchange on the fast-
path too...
So your implementation with InterlockedIncrement/InterlockedDecrement
is somewhat better ;)
Maybe Anthony Williams will be interested in your algorithm to
incorporate into boost...
Dmitriy V'jukov
[...]
Right. We have had a discussion on this here in cpt. One of the participants
in the discussion, "Oliver S." or something, posted many messages with the
"No-Archive-X" flag raised in the headers; his posts are deleted. He
proposed Microsoft's solution. There can be several issues; I can only hope
that Microsoft has addressed them all in their SEH implementation. This is
similar to counting on a Load-Linked reservation to fail when the memory
that represents it gets freed within the LL/SC scope; e.g., the memory gets freed
after the LL, and before the SC. That can raise some exceptional conditions. SEH is
a "bit" of a hack, IMVVVHO that is...
"Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
news:7e695ca8-ab73-4ada...@s36g2000prg.googlegroups.com...
With InterlockedCompareExchange() you have to organize a do{}while(!
CAS()) loop. So InterlockedCompareExchange() can be executed several
times instead of once.
I am not sure how significant this advantage is in practice...
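For example, something like this (a hypothetical shared counter, just to show
the difference in interlocked traffic):

/* wait-free: exactly one interlocked RMW, never retried */
LONG acquire_fast(LONG volatile* count) {
return InterlockedDecrement(count);
}

/* CAS-based: the interlocked op may run several times under
contention before it finally succeeds */
LONG acquire_cas(LONG volatile* count) {
LONG cmp;
do {
cmp = *count;
} while (InterlockedCompareExchange(count, cmp - 1, cmp) != cmp);
return cmp - 1;
}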
Dmitriy V'jukov
"Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
news:c87222d0-214b-4203...@n20g2000hsh.googlegroups.com...
On Dec 2, 01:55, "Alexander Grigoriev" <al...@earthlink.net> wrote:
> InterlockedCompareExchange by itself is LOCK XCHG instruction. Of course, it
> allows to implement arbitrary atomic read-modify-write functions, including
> OR, AND, XOR and conditionals, at a cost of possible retrying the operation.
>
> "Dmitriy Vyukov" <dvyu...@gmail.com> wrote in message
>
> news:c87222d0-214b-4203...@n20g2000hsh.googlegroups.com...
If the memory is freed then the CAS in the lock-free stack algorithm will fail.
A freed node can't be the head of the stack. I'm sure that they have addressed
all such issues.
> SEH is
> a "bit" of a hack?, IMVVVHO that is...
Yeah. A bit. But I'm more concerned with performance and scalability
of such an approach. I think that such an approach has medium performance
(because of SEH frame installation and deinstallation for every pop
operation) and perfect scalability (because only thread-local
operations are involved).
And the SEH frame installation/deinstallation cost can be eliminated if
one big frame is installed for every thread when the thread starts.
V'jukov Dmitriy
If the CAS even gets executed. The thing can crash on the hazard. I am not
sure how good it is to base your non-blocking memory management scheme on
the fact that your threads will be hitting segmentation faults. Microsoft
doesn't seem to mind. That's why I think it's a bit of a hack.
> Freed node can't be head of stack. I'm sure that they have addressed
> all such issues.
Yeah. The basic result is seg-fault, and they use SEH to handle that
situation, and they make the OS, so I bet it works.
[...]
> And SEH frame installation/deinstallation cost can be eliminated if
> one big frame will be installed for every thread when thread starts.
A big catch all... Not sure if I like that or not... Humm...
> > If memory is freed then CAS in lock-free stack algorithm will fail.
>
> If the CAS even gets executed. The thing can crash on the hazard. I am not
> sure how good it is to base your non-blocking memory management scheme on
> the fact that your threads will be hitting segmentation faults. Microsoft
> doesn't seem to mind. That's why I think its a bit of a hack.
In the Unix world a paging fault (they call this a segmentation fault :) ) is
a really fearful thing without appropriate means to cope with it.
But in the Windows world a paging fault (they call this an access violation) is
an ordinary thing with appropriate means to cope with it. You can install
SEH frames per thread, you can nest SEH frames, you can filter
exceptions, you can even repair the environment and retry, and raise any
exception programmatically with RaiseException(). And raising
exceptions with particular codes is the official method to communicate
with the debugger.
> > And SEH frame installation/deinstallation cost can be eliminated if
> > one big frame will be installed for every thread when thread starts.
>
> A big catch all... Not sure if I like that or not... Humm...
Yes, a big catch, but not catch-all. Only AV exceptions in the pop() function.
The exception handler must restart the pop() function from the beginning.
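A rough sketch of the shape I mean (lockfree_pop() here is a hypothetical pop
that may fault on the hazard):

#include <windows.h>

extern void* lockfree_pop(); /* may touch freed memory on the hazard */

void* guarded_pop() {
for (;;) {
__try {
return lockfree_pop();
}
__except (GetExceptionCode() == EXCEPTION_ACCESS_VIOLATION
? EXCEPTION_EXECUTE_HANDLER
: EXCEPTION_CONTINUE_SEARCH) {
/* AV on the hazard: just restart the pop from the beginning */
}
}
}

The same filter/restart could sit at the top of the thread function instead,
so that only one frame is installed per thread.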
Dmitriy V'jukov
A seg-fault is a bug in my book... :^0
Well, I guess you can use SEH for strong thread-safe atomic ref-counting
as well:
_____________________________________________________________
rc* rcinc_strong(rc** const pshared) {
rc* local;
restart:
local = *pshared;
if (local) {
_try {
int rcval;
do {
rcval = local->rcval;
if (rcval < 1) {
goto restart;
}
} while (! CAS(&local->rcval, rcval, rcval + 1));
} _catch_everything {
goto restart;
}
}
return local;
}
_____________________________________________________________
I don't want/need to use exceptions and/or debugging interfaces in order to
get correct code. Virtually zero-overhead PDR works fine, without
seg-faulting... Humm.
> I don't want/need to use exceptions and/or debugging interfaces in order to
> get correct code
You and another 3 people who hold PDR patents :)
> Virtually zero-overhead PDR works fine, without sef-faulting... Humm.
Good name for the thing - SF-PDR - self-faulting PDR :)
Dmitriy V'jukov
NO! Here is a big difference. SEH only prevents access to unreserved
memory in the process address space. But SEH doesn't prevent reading/
writing new objects in the old place (or just trash).
So here you can not only read some unknown object, but write some
random bytes into an unknown object too!
Dmitriy V'jukov
> BTW did you see this:
> http://www.accu.org/index.php/journals/1324
>
> New boost.thread implementation will have solution similar to your.
> And it will be available on all Windows versions along with Linux etc.
Currently the linux version uses pthreads, but I do intend to rewrite it at
some point.
> And I propose a little improvement for this implementation :) It seems
> that Anthony Williams just overlooked this little moment:
>
> Instead of:
> bool lock()
> {
> long old_flag=0;
> while(true)
>
> Must be:
> bool lock()
> {
> long old_flag=flag;
> while(true)
I did spot that after I wrote the article. The current boost trunk
implementation does this.
> Hmmm.... Now I notice that he use InterlockedCompareExchange on fast-
> path too...
>
> So your implementation with InterlockedIncrement/InterlockedDecrement
> is somewhat better ;)
> Maybe Anthony Williams will be interested in your algorithm to
> incorporate into boost...
As explained in the article, the CAS in the lock() is to allow a "waiters"
count as well as a "locked" flag. We could remove the count (and just use
InterlockedExchange in lock()) if unlock() always did an unconditional
SetEvent call, or we used another scheme for the count (such as an
entirely-separate counter).
We could do this, for example:
class mutex
{
long locked;
long counter;
void* event;
public:
mutex():
locked(0),counter(0),event(CreateEvent(0,0,0,0))
{}
void lock()
{
long const was_locked=InterlockedExchange(&locked,1);
if(was_locked)
{
InterlockedIncrement(&counter);
while(InterlockedExchange(&locked,1))
{
WaitForSingleObject(event,INFINITE);
}
InterlockedDecrement(&counter);
}
}
void unlock()
{
store_release(locked,0);
_ReadWriteBarrier(); // don't move the read of counter before the
// write of locked
if(load_acquire(counter))
{
SetEvent(event);
}
}
};
Is that really an improvement? Well, I suppose it doesn't use any interlocked
instructions in unlock().
What does Chris's InterlockedIncrement/InterlockedDecrement implementation
look like?
Anthony
--
Anthony Williams
Just Software Solutions Ltd - http://www.justsoftwaresolutions.co.uk
Registered in England, Company Number 5478976.
Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL
If the memory gets freed, it's the same as someone standing on a chair, and
another person kicks the chair from under the first one's feet. A properly designed
program just should not do that.
> >> Freed node can't be head of stack. I'm sure that they have addressed
> >> all such issues.
>
> If the memory gets freed, it's the same as someone standing on a chair, and
> another kicks the chair from under the first's feet. Properly designed
> program just should not do it.
But if someone puts down a pillow and waits for the kick... :)
RCU works badly in user space. And requires participation from all
threads. And prohibits holding references in a quiescent state.
SMR/ROP/SMR+RCU prohibits holding an arbitrary number of references.
vZOOM requires participation from all threads. And has overhead per
long-term reference.
Differential reference-counting has big overhead. And requires
intrusiveness.
Not counting that they are all patented...
So what do you propose to do?
IMVHO they have a very nice solution.
Interesting how M$ implements the lock-free stack on x86-64? There is no
double-word (128-bit) CAS...
Dmitriy V'jukov
Here is the code:
http://appcore.home.comcast.net/misc/winsync_v0001_rwmutex_hpp.html
YIKES! You're absolutely correct. WTF was I thinking!
;^0
> Here is big difference. SEH only prevents access to unreserved
> memory in process address space. But SEH don't prevent from reading/
> writing new objects in old place (or just trash).
> So here you can not only read some unknown object, but write some
> random bytes into unknown object too!
Check these posts out:
http://groups.google.com/group/comp.programming.threads/msg/93421cc16fcfc072
http://groups.google.com/group/comp.programming.threads/msg/32bf9bc5c5aa85ee
http://groups.google.com/group/comp.programming.threads/msg/24b6ef3944657343
http://groups.google.com/group/comp.programming.threads/msg/447b1397aa7ee7fa
Those posts were about Joe Seigh's atomic_ptr for PPC. The SC can operate on
reclaimed memory; it will fail, but it raises exceptions and it's
implementation-defined behavior on the PPC. I wonder how Microsoft gets
around the hazard in the lock-free stack for PPC (e.g., XBOX)...
The lock-free LIFO stack algorithm (the one described in the 370/Z-arch POp) has
the same restriction, since it fetches a "next" pointer from what may
no longer be a node. Not new.
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software.
That's a read-write mutex, not a plain mutex (which is what my article
was about). Also, the write-lock portion (which is the closest to a
normal mutex) starts with "EnterCriticalSection" --- not exactly the
optimized code I expected.
I would be interested in your comments on the new scheme I just
suggested:
class mutex
{
long locked;
long counter;
void* event;
public:
mutex():
locked(0),counter(0),event(CreateEvent(0,0,0,0))
{}
void lock()
{
long const was_locked=InterlockedExchange(&locked,1);
if(was_locked)
{
InterlockedIncrement(&counter);
while(InterlockedExchange(&locked,1))
{
WaitForSingleObject(event,INFINITE);
}
InterlockedDecrement(&counter);
}
}
void unlock()
{
store_release(locked,0);
_ReadWriteBarrier(); // don't move the read of counter before
// the write of locked
if(load_acquire(counter))
{
SetEvent(event);
}
}
};
Anthony
I assume that _ReadWriteBarrier is a #StoreLoad | #StoreStore, right?
> if(load_acquire(counter))
> {
> SetEvent(event);
> }
> }
>
> };
That barrier is too strong for the mutex unlock function because it only
requires release semantics. Since you do a store, followed by a load to
another location, you need a fence in between. So, IMO, you have not
actually optimized the unlock function... You made it more expensive because
it has to have the full membar semantics instead of just release semantics.
I like Alex Terekhov's mutex implementation myself. IIRC, it was something
like:
_________________________________________________________________
class mutex {
long m_state;
HANDLE m_wset;
enum state_e {
UNLOCKED = 0,
LOCKED = 1,
CONTENTION = 2
};
public:
mutex() :
m_state(UNLOCKED),
m_wset(CreateEvent(0, FALSE, FALSE, 0)) {
if (! m_wset) { throw std::exception(); }
}
~mutex() throw() {
if (! CloseHandle(m_wset)) { assert(false); }
}
public:
void lock() throw() {
if (InterlockedExchangeAcquire(&m_state, LOCKED)) {
Sleep(1);
while (InterlockedExchangeAcquire(&m_state, CONTENTION)) {
if (WaitForSingleObject(m_wset, INFINITE) != WAIT_OBJECT_0) {
assert(false); std::unexpected();
}
}
}
}
void unlock() throw() {
if (InterlockedExchangeRelease(&m_state, UNLOCKED) == CONTENTION) {
if (! SetEvent(m_wset)) {
assert(false); std::unexpected();
}
}
}
};
_________________________________________________________________
What about the read functions? I did not bother to heavily optimize the
write functions. The writer fast-paths have to execute two interlocked
instructions. The read functions only have to execute a single interlocked
instruction. Also, uncontended readers are wait-free; there are no loops in
the entire algorithm.
This might work:
___________________________________________________________
enum flags_e {
LOCKED = 0x1,
WAITERS = (LOCKED | 0x2),
UNLOCKED = 0
};
void lock() {
while(ATOMIC_BTS(&m_state, LOCKED)) {
if(ATOMIC_SWAP(&m_state, WAITERS) & LOCKED){
WaitForSingleObject(m_wset);
}
}
membar #StoreLoad | #StoreStore;
}
void unlock() {
long const state = ATOMIC_LOAD(&m_state);
membar #LoadStore | #StoreStore;
ATOMIC_STORE(m_state, UNLOCKED);
long const cmp = ATOMIC_LOAD(&m_state);
if (state & WAITERS || cmp & WAITERS) {
SetEvent(m_wset);
}
}
___________________________________________________________
You don't need the full membar in the unlock function here because the store
is followed by a load to the same location.
No, it's a compiler barrier that prevents rearranging the code, rather
than a membar.
That's pretty good, but I'm not sure I like it. If thread A has the
lock, and thread B is waiting, m_state is CONTENTION. If thread C then
sets the state to LOCKED and gets preempted in the Sleep, and thread A
unlocks whilst thread C is sleeping, thread A will see a value of
LOCKED rather than CONTENTION, so it won't trigger the event. Thread B
will then not be woken, and can't take the lock, even though thread C
has just gone to Sleep.
Anthony
How is that better than my original code? You have one or two atomic
RMW ops in the lock before the wait vs one CAS loop, and the unlock
looks like it's trying to do ATOMIC_SWAP in a non-atomic fashion (I
think it would be better if it *did* just use ATOMIC_SWAP), whereas my
version does a single InterlockedExchangeAdd.
Anthony
Actually there is 128 bit CAS, but not on IA64 (as if anybody cares).
"Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
news:98e81acd-3e5e-4428...@d4g2000prg.googlegroups.com...
But they have to pack a full-fledged user pointer *and* an ABA counter into
64 bits...
> Actually there is 128 bit CAS, but not on IA64 (as if anybody cares).
AFAIK IA64 has cmp8bxchg16b, but the first x64 processors don't. AFAIK only the
latest x64 processors have a 128-bit CAS...
Dmitriy V'jukov
Having Sleep() there makes no sense. Chris' IIRC is not quite C. :-)
However, if the only atomic RMW available is swap then we could indeed
have a temporary loss of contention indication. Atomic bit test and
set on x86 (to get rid of that problem) aside for a moment, a bit more
powerful archs (like the current Xbox for example) can do something
along the lines of the following:
// doesn't provide "POSIX-safety" with respect to destruction
class mutex_for_Xbox_360 {
atomic<int> m_lock_status; // 0: free, 1/-1: locked/contention
auto_reset_event m_retry_event; // prohibitively slow bin.sema/gate
int change_reserved(int old, int next) {
while (!m_lock_status.store_conditional(next)) {
int fresh = m_lock_status.load_and_reserve();
if (fresh != old) return fresh;
}
return old;
}
public:
void lock() {
int old = m_lock_status.load_and_reserve();
if (old || (old = change_reserved(0, 1))) {
do {
while (old < 0 || (old = change_reserved(1, -1))) {
m_retry_event.wait();
if (!(old = m_lock_status.load_and_reserve())) break;
}
} while ((old = change_reserved(0, -1)));
}
isync();
}
void unlock() {
int old = m_lock_status.load_and_reserve();
assert(old);
lwsync();
if (old < 0 || !m_lock_status.store_conditional(0)) {
m_lock_status.store(0);
m_retry_event.set();
}
}
};
regards,
alexander.
A compiler barrier is not going to prevent the x86 from hoisting the load
above the store.
__________________________________________________
void unlock()
{
1: store_release(locked,0);
_ReadWriteBarrier();
2: if(load_acquire(counter))
{
SetEvent(event);
}
}
__________________________________________________
Step 2 can be executed before step 1:
http://groups.google.com/group/comp.programming.threads/msg/731cd06e8c04f744
Unless I am missing something, this will cause problems in the algorithm
as-is. I think you're going to need to use a fence:
__________________________________________________
void unlock()
{
store_fence(&locked,0);
if(load_naked(&counter))
{
SetEvent(event);
}
}
__________________________________________________
Humm.
Yeah. Here is official information:
http://groups.google.com/group/comp.programming.threads/msg/e01455d08325c71c
Neill Clift works on the kernel. According to him, they use a pointer
compression technique.
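Something along these lines, perhaps; this is purely an illustration of the
general idea (it is not a claim about the actual kernel layout), assuming the
nodes live in user space so a pointer fits in the low 44 bits:
_____________________________________________________________________
#include <windows.h>

struct node_t { node_t* nx; };

typedef unsigned __int64 anchor_t;
static const int PTR_BITS = 44;
static const anchor_t PTR_MASK = (((anchor_t)1) << PTR_BITS) - 1;

/* pack a compressed pointer and an ABA sequence into one 64-bit word */
inline anchor_t pack(node_t* node, anchor_t aba) {
return ((anchor_t)node & PTR_MASK) | (aba << PTR_BITS);
}
inline node_t* unpack_ptr(anchor_t a) { return (node_t*)(a & PTR_MASK); }
inline anchor_t unpack_aba(anchor_t a) { return a >> PTR_BITS; }

/* pop via a plain 64-bit CAS over the packed anchor */
node_t* anchor_pop(volatile LONGLONG* panchor) {
for (;;) {
LONGLONG cmp = *panchor;
node_t* head = unpack_ptr((anchor_t)cmp);
if (! head) { return 0; }
/* the usual pop hazard on head->nx still applies here */
LONGLONG xchg = (LONGLONG)pack(head->nx, unpack_aba((anchor_t)cmp) + 1);
if (InterlockedCompareExchange64(panchor, xchg, cmp) == cmp) {
return head;
}
}
}
_____________________________________________________________________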
>> Actually there is 128 bit CAS, but not on IA64 (as if anybody cares).
This is a pain in the neck for some people... Yes, they care about this
indeed. SPARC has the same problem:
http://groups.google.com/group/comp.arch/browse_frm/thread/71f8e0094e353e5
> Afaik IA64 have cmp8bxchg16b, but first x64 doesn't have. Afaik only
> latest x64 have 128 bit CAS...
Yup.
DOH! You're right of course! Shi%t.
FWIW, LOCK is already implied by the XCHG instruction; it is not needed
explicitly...
Indeed it can read from memory that has been reclaimed, and subsequently
reallocated, within the scope of the CAS-loop or LL/SC. However, I think
that the SC part, or the CAS w/ ABA counter, should prevent the read of
garbage from being committed into the data-structure. The reservation is lost
for sure wrt LL/SC, and the ABA counter would detect the operation that
removed and reclaimed the node wrt the CAS-loop.
Ahh, why not just avoid the seg-fault like this:
_________________________________________________
void*
slist_anchor_pop(
slist_anchor* const _this
) {
PROXYGC_ACQUIRE();
slist_node* cmp;
do {
cmp = _this->head;
/* (the proxy release on the empty path is elided in this sketch) */
if (! cmp) { return 0; }
} while (! CAS(&_this->head, cmp, cmp->nx));
void* state = cmp->state;
PROXYGC_RELEASE_AND_DEFER(cmp);
return state;
}
_________________________________________________
Using PDR within the context of a lock-free stack means that you no longer
need to worry about ABA, or any reclamation issue for that matter. This seems
kosher to me. I don't like my apps to seg-fault!
;^0
Well, unless your code can avoid the fence in between the store_release and
subsequent load_acquire in the unlock function, it's not any better at
all:
http://groups.google.com/group/comp.programming.threads/msg/e15e510849374f97
> Unless I am missing something, this will cause problems in the algorithm
> as-is. I think your going to need to do a fence:
You're right. Any ideas how a store/fence/load compares to
InterlockedExchangeAdd?
Anthony
> Using PDR within the context of a lock-free stack means that you non longer
> need to worry about ABA or any reclamation issue for that matter. This seems
> Kosher to me. I don't like my apps to seg-fault!
I think it will not be very kosher if Microsoft requires that
*ALL* user programs must call something like PdrSynchronize() in *ALL*
threads *PERIODICALLY*. And all old code on Win32 will be broken,
including all scripts, Java, Delphi etc.
Dmitriy V'jukov
http://groups.google.com/group/comp.programming.threads/msg/9134192beca91c5a
http://groups.google.com/group/comp.programming.threads/msg/890bb145ba5606ef
Windows is a platform in which you can do automatic epoch detection. You
don't have to explicitly call PdrSynchronize at all.
> And all old code on Win32 will be broken,
> including all scripts, java, delphi etc.
This is Microsoft's problem.
Since InterlockedExchangeAdd executes a fence, it's most likely just as
expensive. However, when you compare it to InterlockedExchangeAddRelease,
then the fence will start to cause unneeded overhead...
Let try any refine this a little bit:
> ___________________________________________________________
[...]
enum state_e {
UNLOCKED = 0,
LOCKED = 0x1,
WAITER = 0x2,
CONTENTION = (LOCKED | WAITER)
};
void lock() {
if (ATOMIC_BTS(&m_state, LOCKED)) {
while (ATOMIC_SWAP(&m_state, CONTENTION) & LOCKED) {
WaitForSingleObject(m_wset);
}
}
membar #StoreLoad | #StoreStore;
}
void unlock() {
long const state = ATOMIC_LOAD(&m_state);
membar #LoadStore | #StoreStore;
ATOMIC_STORE(m_state, UNLOCKED);
long const cmp = ATOMIC_LOAD(&m_state);
if (state & WAITER || cmp & WAITER) {
SetEvent(m_wset);
}
}
> ___________________________________________________________
[...]
Humm. Can you see a race-condition in the unlock function? I can't seem to
find one right now.
Let me try an refine this a little bit here:
>> ___________________________________________________________
[...]
It sounded kind of retarded before.
Humm. I am not sure I like the LL/SC version compared to the interlocked
swap/bts one. I think the latter can outperform it because the code to
perform the swap instruction on a system that only has LL/SC is something
like:
__________________________________________
extern "C" void*
ATOMIC_SWAPPTR(
void* volatile* const shared,
void* const local
) {
void* prev;
do {
prev = LL(shared);
} while(! SC(shared, local));
return prev;
}
__________________________________________
instead of:
__________________________________________
extern "C" void*
ATOMIC_SWAPPTR(
void* volatile* const shared,
void* const local
) {
return XCHG(shared, local);
}
__________________________________________
The LL/SC version has more overhead, and it's not wait-free...
Humm...
If you're going to correct it, ITYM:
Let me try and refine this a little bit here:
--
Later,
Jerry.
The universe is a figment of its own imagination.
Think of waiters coming in between m_state store and first load.
Take a break. :-)
regards,
alexander.
Yup.
void lock() {
L1: if (ATOMIC_BTS(&m_state, LOCKED)) {
L2: while (ATOMIC_SWAP(&m_state, CONTENTION) & LOCKED) {
L3: WaitForSingleObject(m_wset);
}
}
membar #StoreLoad | #StoreStore;
}
void unlock() {
U1: long const state = ATOMIC_LOAD(&m_state);
membar #LoadStore | #StoreStore;
U2: ATOMIC_STORE(m_state, UNLOCKED);
U3: long const cmp = ATOMIC_LOAD(&m_state);
U4: if (state & WAITER || cmp & WAITER) {
SetEvent(m_wset);
}
}
The following execution sequence will miss a waiter:
ThreadA: U1 - loads LOCKED
ThreadB: L1 - sets LOCKED
ThreadB: L2 - stores CONTENTION
ThreadB: L3 - waits
ThreadA: U2 - stores UNLOCKED
ThreadA: U3 - loads UNLOCKED
ThreadA: U4 - misses waiter.
Is that the condition you are referring to?
> Take a break. :-)
;^)
Well, I think your SWAP version is about as good as it's going to get. Even
though it can issue a superfluous wakeup at the end of a contention
interval:
ThreadA: SWAP(LOCKED) - Has the lock
ThreadB: SWAP(LOCKED) - Contention!
ThreadA: SWAP(UNLOCKED) - Issues nothing
ThreadB: SWAP(CONTENTION) - Has the lock
ThreadB: SWAP(UNLOCKED) - Issues superfluous wakeup
That's fine with me.
To correct the following:
> void unlock() {
> long const state = ATOMIC_LOAD(&m_state);
> membar #LoadStore | #StoreStore;
> ATOMIC_STORE(m_state, UNLOCKED);
> long const cmp = ATOMIC_LOAD(&m_state);
> if (state & WAITER || cmp & WAITER) {
> SetEvent(m_wset);
> }
> }
void unlock() {
membar #LoadStore | #StoreStore;
if (ATOMIC_SWAP(&m_state, UNLOCKED) & WAITER) {
SetEvent(m_wset);
}
}
>> ___________________________________________________________
> [...]
I wonder how much better a waiters-bit solution would actually be, other
than POSIX safety on sync object destruction.
> Windows is a platform in which you can do automatic epoch detection. You
> don't have to explicitly call PdrSynchronize at all.
Interesting. Don't forget that SList is intrusive. How are you going
to detect that the user's code doesn't hold any references, in arbitrary
form, to an arbitrary object without any additional helper calls like
release_object()?
By analyzing the stack? This won't work in C/C++ :)
> > And all old code on Win32 will be broken,
> > including all scripts, java, delphi etc.
>
> This is Microsoft's problem.
And exactly because of this they don't rely on PDR.
Dmitriy V'jukov