I would like know what is the difference between spinlock and mutex.
I think know, there are a mutiprocessor story but I don't know exactly what
is it.
If I take this exemple :
KeAcquireSpinLock <- CPU2 is here and it have the hand.
... <- CPU1 is here
KeReleaseSpinLock
What is the result ? CPU2 is locked until the CPU1 go on KeReleaseSpinLock ?
And for Mutex it's the same result ?
Thank you in advance,
Thierry.
1. When CPU2 waits the thread is blocked and another thread runs on CPU2
with a mutex, versus CPU2 consuming all its time spinning on the lock wth a
spin lock
2. Mutexs have a lot more overhead, spin locks are low overhead
3. Mutexs do not raise IRQL to DISPATCH_LEVEL, while spinlocks do.
--
Don Burn (MVP, Windows DKD)
Windows Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr
"o0Zz" <ooz...@gmail.com> wrote in message
news:%233$VwpEdK...@TK2MSFTNGP05.phx.gbl...
> __________ Information from ESET NOD32 Antivirus, version of virus
> signature database 4658 (20091203) __________
>
> The message was checked by ESET NOD32 Antivirus.
>
> http://www.eset.com
>
>
>
__________ Information from ESET NOD32 Antivirus, version of virus signature database 4658 (20091203) __________
The message was checked by ESET NOD32 Antivirus.
spinlock acquire psudocode: (ignoring recursive acquisition and race
conditions)
while(true)
{
if(TryToAcquireLock()) // this is a non-blocking test of a
variable using InterlockedXXX
{
break;
}
}
The loop will consume all 100% of a single CPU until the lock is acquired,
the thread is interrupted by a higher priority thread or the thread quantum
ends (and the OS schedules another thread).
Mutex acquire psudocode: (ignoring recursive acquisition and race
conditions)
SuspendThreadAndRegisterWaitForObject() // this is a blocking call
The scheduler will suspend this thread and schedule other threads on this
CPU until the mutex is signaled by another thread and this thread is marked
as ready to run. Once the thread is ready to run, the scheduler will
schedule it to run on an available CPU it it will continue normally.
"o0Zz" <ooz...@gmail.com> wrote in message
news:#3$VwpEdK...@TK2MSFTNGP05.phx.gbl...
That information is incorrect, on Windows a spinlock raises to
DISPATCH_LEVEL (or higher) so it is not possible to get interrupted by any
thread. It's not even possible for the thread to finish its quantum while it
spins or for the dispatcher to schedule another thread on the same CPU. It
can get interrupted by an ISR but it will return (to the spinning state) as
soon as the ISR has executed and returns control.
//Daniel
"Daniel Terhell" <dan...@resplendence.com> a �crit dans le message de groupe
de discussion : emRp7$LdKHA...@TK2MSFTNGP04.phx.gbl...
"Daniel Terhell" <dan...@resplendence.com> wrote in message
news:emRp7$LdKHA...@TK2MSFTNGP04.phx.gbl...
The kind of operations that might be suited to a spinlock protection are:
1) Updating several fields atomically in some shared structure.
2) Adding/removing an entry from a shared list.
As Don says a Mutex is much more sophisticated and expensive and although
very uself in Win32 programming is no match for a spinlock when it comes to
raw speed (this is why device drivers use them).
The real question here is why has MS not provided developers with a spinlock
API for user mode code?
I had to hand craft such a beast in C, it has to support:
1) Nested locking (a spinlock can be acquired 'n' times and must then be
released 'n' times).
2) Read/Write locking modes.
3) Not end up dragging its heels because of these needs.
If you do a lot of work with shared/mapped memory as we do, then such an API
is very important, if you rely on Mutexes it will cost dearly, also the
interlock functions alone wont let you easily do the above (but they do
provide the foundation).
I've been interested for some time in seeing open source or public domain
implementations of a user-mode spinlock API (one that is truly tested on
multicore H/W) but there seems to be absolutely nothing, what we have seems
to be pretty good but it is pretty complex and I worry sometimes...
;)
"o0Zz" wrote:
> Thank you very much !
>
> "Daniel Terhell" <dan...@resplendence.com> a écrit dans le message de groupe
> .
>
I bet there is a reason. I bet by the time you do a switching from
user mode to kernel mode, the whole performance gain becomes negligible
in comparison.
I thought mutex is already quite an efficient way of doing locking.
Once you are in user mode trying to use a kernel mode spinlock
equivalent somehow does not make much sense.
MS recommends to hold a spin lock for no more than 50 microsecs
if I recall correctly.
>I had to hand craft such a beast in C, it has to support:
>
>1) Nested locking (a spinlock can be acquired 'n' times and must then be
>released 'n' times).
>2) Read/Write locking modes.
>3) Not end up dragging its heels because of these needs.
>
>If you do a lot of work with shared/mapped memory as we do, then such an API
>is very important, if you rely on Mutexes it will cost dearly, also the
>interlock functions alone wont let you easily do the above (but they do
>provide the foundation).
>
>I've been interested for some time in seeing open source or public domain
>implementations of a user-mode spinlock API (one that is truly tested on
>multicore H/W) but there seems to be absolutely nothing, what we have seems
>to be pretty good but it is pretty complex and I worry sometimes...
Do you have estimates on performance comparison between your app level
spin lock and a regular mutex?
>;)
>
>
>"o0Zz" wrote:
>
>> Thank you very much !
>>
>> "Daniel Terhell" <dan...@resplendence.com> a écrit dans le message de groupe
>
>> de discussion : emRp7$LdKHA...@TK2MSFTNGP04.phx.gbl...
>> > "m" <m@b.c> wrote in message news:eP9kilId...@TK2MSFTNGP02.phx.gbl...
>> >> The loop will consume all 100% of a single CPU until the lock is
>> >> acquired, the thread is interrupted by a higher priority thread or the
>> >> thread quantum ends (and the OS schedules another thread).
>> >>
>> >
>> > That information is incorrect, on Windows a spinlock raises to
>> > DISPATCH_LEVEL (or higher) so it is not possible to get interrupted by any
>> > thread. It's not even possible for the thread to finish its quantum while
>> > it spins or for the dispatcher to schedule another thread on the same CPU.
>> > It can get interrupted by an ISR but it will return (to the spinning
>> > state) as soon as the ISR has executed and returns control.
>> >
>> > //Daniel
>> >
>> >
>> .
--
Programmer's Goldmine collections:
Tens of thousands of code examples and expert discussions on
C++, MFC, VC, ATL, STL, templates, Java, Python, Javascript,
organized by major topics of language, tools, methods, techniques.
CRITICAL_SECTION provide an option of limited spinning in case of
contention, if you expect that the CS will mostly be held for very short
time.
"Hugo gle...@hotmail.com>" <hugh<underbar> wrote in message
news:574400EE-AFAF-4620...@microsoft.com...
>> "Daniel Terhell" <dan...@resplendence.com> a ecrit dans le message de
Good point.
Actually, when I was doing a contract with Intel on a multi-gigabit
network card, there was one guy that was suggesting to do something
totally out of the wall in my view. Pretty much the same idea as in
this thread.
They were trying to squeze out as much performance as possible.
So, this "smart", fat blob, came up with the idea to manipulate
the kernel level synchronization objects from the app level, such
as resetting the count of kernel level semaphores under some contitions.
On a weekly project meeting I told him: you will never prove this
thing is going to work. That is the whole point with semaphores.
They have been mathematically proven to work. Once you start resetting
the counts at will, outside of the normal semaphore mechanics,
you'll be fixing this thing for years to come.
The result?
They fired me. Not him.
I wonder if they were ever able to make that network card to work.
When they gave me this project to create a hack to manipulate the
kernel level structures from the user mode, and gave me a couple
of weeks to do that, I could not believe these smart donkeys.
It takes YEARS to work on things like these. These are some of
the central concepts in O/S design and the very idea of separation
of kernel and user mode. I nearly flipped out when manager told
me my "project".
Some people really need their brains examined.
Where did they get their estimates on "improving performance"
by slaughtering some of the most fundamental concepts in o/s
design and architecture just escapes me.
--
"tanix" <ta...@mongo.net> wrote in message
news:hggafe$ot7$2...@news.eternal-september.org...
If you work with shared mapped memory a great deal as we do, then you need
very fast means of locking shared data structures, especially of this is
happening very often.
For example we have a shared heap library that we designed, it obvioulsy
must handle safe updates to heap control metadata by many threads and many
processes at the same time, perhaps tens of thousands of times a second.
A mutex simply can't deliver the speed and efficiency.
Getting/releaseing the Mutex costs far far more CPU cycles then updating
four fields in a little structure and a pointer or two.
Most people perhaps use shared memory for the odd neat trick or some useful
little mechanism, but if you rely on it as a vehicle for sharing/storing many
gigs of data then you really do need speedy locks.
Hugo
"tanix" wrote:
> .
>
However, I have to ask how is that fool's suggestion "pretty much the same
idea as in this thread"?
I am totally opposed to the kind of thing he was suggesting.
All I said is that user-mode code sometimes has a real need for a very fast
locking mechanism, I never once said that user mode code should in any way
access or use or interfere with kernel mode mechanisms, I would never suggest
such a thing.
But it is naive to believe that non-kernel mode developers never need very
fast read/write locks.
Even the OS kernel does not support read/write spinlocks, but would it not
be good if it did? Surely there are scenarios within driver design where a
read spinlock would be useful?
Hugo
IMHO, a minimal implementation can be created with three volatile long
variable in less than 50 lines of code so the APIs provided by MS
(AcquireSRWLockExclusive and friends) are almost redundant
http://msdn.microsoft.com/en-us/library/aa904937(VS.85).aspx
"Hugog...@hotmail.com>" <hugh<underbar> wrote in message
news:574400EE-AFAF-4620...@microsoft.com...
"Hugo gle...@hotmail.com>" <hugh<underbar> wrote in message
news:D3ADE2F2-4CC9-474A...@microsoft.com...
"Pavel A." <pav...@12fastmail34.fm> wrote in message
news:DC1BDC51-74D5-49E7...@microsoft.com...
What implementation are you referring to?
FWIW, here is the best I could do so far:
http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/65822
(please read all...)
This algorithm is 100% fair and starvation free, and has wait-free
fast-paths for readers. It even has wait-free starvation freedom for writers
as well in the case in which I removed the CRITICAL_SECTION:
http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/65822/reply/87849
Also, this performs better than a lot of native POSIX rw-mutex
implementations out there. Unfortunately, the Windows implementation is
completely non-deterministic wrt access patterns...
And here is the kicker followup story.
I think you will appreciate this one.
When I signed a contract with Intel to do some kenel level work
on that network card project, I noticed this fat guy behaving so loosely,
that he looked like a bum from the Haight St. in San Francisco, than
to top level kernel level developer.
That got me interested.
That was in Moutain View, CA, USA.
A walking distance from the downtown and all the restaurants.
So, one day, when I was walking to downtown to have my lunch,
I met him and asked: btw, what kind of thing are you doing here?
And he said: Well, nothing really. Just screwing around here and there
and raising hell here.
That was just a cunning way to tell me that he is so great, that
he can bluntly tell me that he gives a flying dead chicken what ANYONE
might think of him saying things like that because he is SO great,
that THEY need him more than he needs THEM.
Wow. I am not exactly a present myself, and to hear such a thing
was the first time in my life, and I worked for the biggest and
baddest of them in the Silicon Valley doing top level work,
being paid so much money, many would not believe it.
And now the kicker:
You know what? That fat moron with 2 feet arms was making there...
Tadaaaam.....
Half a million bux!
Get that?
I can tell you another story when I, single handedly, saved about
$200 mills to one of the biggest and baddest players in the industry.
If I did not do what I did and did not stop the production of a new
product, which was their entry into a lucrative server market,
they would have a flop on a presentation on some major TV networks
and that would cause AT LEAST $200 mills. fall of their stock price
in a single trading session.
A couple of weeks before the product release, when everythign was
scheduled to go and they had a last meeting with their markerting
parner, I sent one email that got the director of development
of the whole company fired apparently.
That was a 100% correct and true email that did address the problem
and did propose a specific solution that was about the only solution
that could get those lethal problems resolved.
The result: the product was in tip top condition and the initial
release went with flying colors.
And then?
Well, instead of giving me a pink Rolls Royce silver spour,
they kicked my ass, because that stoopid manager just got frightened
of me, telling things like they in fact were.
Because he was also a screwball and he knew all too well
that one of these days he might be next to get his arse kicked
because I might write some other email or tell someone something,
that may get him fired.
And THAT is the reality of this vicious shark tank called
Da Silicon Valley.
Have fun.
>However, I have to ask how is that fool's suggestion "pretty much the same
>idea as in this thread"?
Well, because it was essentially the same thing.
Doing spinlocks from a user mode is UTTERLY out of the wall idea.
That is NOT what spinlocks are meant for.
Otherwise, it would be already done ages ago.
People do not seem to comprehend the basic ideas in o/s architecture,
and that is the separation of user and kernel mode.
Don't you EVER try to short circuit something, what took generations
to comprehend, design and develop and don't you challange the fundamental
concepts.
After all, who do you think you are Jigstra (sic)?
Can you prove ANYTHING that whatever you think you have done is
going to work?
>I am totally opposed to the kind of thing he was suggesting.
>
>All I said is that user-mode code sometimes has a real need for a very fast
>locking mechanism,
SCREW that idea.
Do you think you are the fist one to come up with THESE kinds of excuses?
Do you think no one before you looked into this thing,
and, for some strange reason, it is still not accepted by anybody else?
Do you think you are the first one that experiences the performance
issues?
Are you willing to redesign the generations old O/S principles
and propose something so revolutionary that it will increase the
performance drastically?
There is FOREVER a "need" to increase performance.
Since the day one of computers.
And in the case I was talking about, they had a number of alternatives
to increase the performance to achieve that target bandwidth.
One of those alternatives was Intel going with smaller die size.
Because they were still using dies that were 50% bigger than others
in the industry. I can not get into details here for obvious reasons.
There were other alternatives.
And the most ridiculous thing is: how much performance do you thnink
you are going to achive by simply resetting the semaphore count to 0
at some points?
Well, probably < 1%, no matter what that smaart moron tells them.
How OFTEN do you need to reset the semaphore count and at which points
of execution of what, trying to achieve what?
I bet, if I sat on that meeting and requested this moron to give
me FULL details of his idea and the estimate on performance,
he'd get fired. That is about the only thing I can see from that,
at least from my previous experience.
Because he was simply peddling crap, and of the lowest grade,
and since he knows all too well that the manager is not sufficiently
technically competent to even comprehend things like these,
and other people in the room could care less about HIS problems,
cause they were loaded up the hilt with THEIR problems.
Plus, like that dude said on this thread:
"... pure technical merits too often are not enough to stay in.
Need to have strong "communication skills"
This is funken ugliest insult I have ever heard!
I can communicate better than this sucker by ORDERS of magnitude.
Well, what he actually means is this:
You need to know whose ass to lick
and to whom to smile with a plastic smile
on your crocolile face.
And THAT is what they call: Strng communication skills".
The ability and cunningness to know which ass to lick
and to whom to smile, and what kind of lie to say and to whom,
and where to keep quiet when you actually have something to say,
and when to say something, even if it is the wrongest thing
in the worl, and things like that.
Clear enough?
> I never once said that user mode code should in any way
>access or use or interfere with kernel mode mechanisms, I would never suggest
>such a thing.
Well, it CAN use the INTERFACE. After all, that is what system calls are.
But it can NOT poke into kernel mode structures and interfere in
kernel mode mechanisms for ALL sorts of reasons.
>But it is naive to believe that non-kernel mode developers never need very
>fast read/write locks.
It is not even clear what you need.
Do you have the SPECIFIC estimates on what do you think is going to
improve what and to what extent, or it is just a purely mental excesize
in futility?
Do you know what are the bottlenecks in your architecture?
Do you know how well is your system structured?
Sorry, I view programming from the systems standpoint, even if you do
the app level code. Not in the strict sense of a system as it is known
in sw industry. But in a sense that every program is a system.
It is an extremely complex conglameration of all sorts of things
that cooperate in all sorts of ways.
Are you sure your operations are architectured well enough to make
sure the overall target of the entire project is achived?
What IS the "goals" of your project?
What ARE you trying to achive?
To simply run faster?
Even if your entire system is totally screwed up and users have a pain
on the neck to use it? Sorry for being so drastic, but that IS what
this boils down to.
You have holes ALL over the place in your design,
and instead of looking at the overall idea,
you keep inventing these utterly useless tricks
and then run to your manager and say: look how great I am.
I can increase performance like you never seen before.
And sure, he is happy to hear that, and probably considers you
as having "strong communication skills", peddling this bullshit
of the lowest grade, knowing full well he has no clue what you
are talking about, and even if he does, he is loaded SO much
with all other crap on his shoulders, that he has no more than
a few seconds to even look at your crap.
I bet what you are proposing to do is going to create more proglems
for your project, for your product and for your company, than the
amount of benefits it is going to bring.
I just bet you with about 99% certainty.
Because you simply do not know what you are doing.
You should go work at macdonalds with these kinds of approaches
or get a job as a snake oil salesman instead.
Sorry, have no more time for this kind of crap.
See if you can figure somethings out.
If you can, fine. If you can't fine.
Cya.
Are you a pervert?
I am not communicating to him when I say this.
I am communicating to YOU. Haven't you noticed?
And how would you describe someone that weighs about 3 times the
size of a "normal" human being?
That thing can hardly walk!
What do you call this, mr. smart?
>Calling a coworker "fat blob" is not the best way to communicate.
Btw, top posting (writing your followup at the top of the post
instead of making your comment in-line is what, mr. smart?
Means your brains are of a size of a peanut to figure out such
a simple thing?
Taking things out of context is what?
Talking about the kernel level here
while salivating at the same time?
Impressinve, I tellya.
:--}
Fine. I had to deal with that kind of thing.
And then?
>For example we have a shared heap library that we designed, it obvioulsy
>must handle safe updates to heap control metadata by many threads and many
>processes at the same time, perhaps tens of thousands of times a second.
Understood. I had to deal with a similar situation on a global
voice messaging system, just like many people do with their products.
>A mutex simply can't deliver the speed and efficiency.
How do you know where is your REAL bottlenecks?
Are you sure your system architecture is up to snuff?
May be the whole thing is not correctly archtected?
You can not just go for any critical point without seeing the whole
picture. ESPECIALLY, once you start suggesting things that FUNDAMENTALLY
violate the generations old principles of separation of kernel and
user mode?
Do you think people that created these concepts are stoopid?
Infantile?
You know better?
Well, can YOU propose the solution then?
Do you think the idea of separating the user mode and kernel mode
is flawed? How so?
>Getting/releaseing the Mutex costs far far more CPU cycles then updating
>four fields in a little structure and a pointer or two.
Yep, that is what ALL of them say.
It is called looking for the trees and not seeing the forest.
OBVIOUSLY.
Do you have an estimate on what kinds of problems your approach may
cause the the whole LIVELIHOOD of your system?
Do you have an estimate on how many light years is it going to take
to deal with problems are you going to create as a result of your
great "improvement"?
How many man years others would have to spend, forever fixing the
problems are you going to create, pretty much inevitably?
Do you have a mathematical proof that your concept is going to be
stable?
What DO you have beyond these small greedy things, tring to
squeeze some CPU cycles even from the places that clearly have
a label:
DO NOT ENTER. KERNEL LEVEL. NO MORONS ALLOWED.
Enough.
"tanix" <ta...@mongo.net> wrote in message
news:hghvjd$cj5$3...@news.eternal-september.org...
> And how would you describe someone that weighs about 3 times the
> size of a "normal" human being?
>
> That thing can hardly walk!
Hmm. How about "a wide-profiled professional" ?
Apologies, my previous advice was wrong... you probably need to see a
doctor.
--pa
"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:Z0_Wm.9220$ft1....@newsfe10.iad...
> It is a custom proprietary algorithm and I am not at liberty to reveal the
> source or owner.
Okay.
> It is not a fair algorithm and guarantees only forward progress in a
> non-deterministic way. In practice, it works well for the specific job it
> was designed to do - guarding access by _many_ fast readers and _few_ slow
> writers to in memory indices for a data cache with remote coherency.
Have you tried out a distributed rw-mutex? The idea is simple in that each
thread has a lock and read-access is comprised of a thread taking it's own
lock. Write access is achieved when a thread takes all the locks.
[...]
Hugh
"Alexander Grigoriev" wrote:
> .
>
You seem to think that a spinlock is is something that only the kernel can
do? is that your understanding of sofware engineering? Yes, the kernel has a
spinlock API designed for use within the kernel.
But anyone can write an equivalent API for user mode, all you need to do is
use the Win32 InterlockXXX functions and you can easily create a real,
working spinlock API.
This is what I have been discussing.
What if my shared memory contains thousands of distinct data items? What if
we have a need to read/update these in real-time from multiple processes each
running many threads? perhaps tens of thousands of times per second?
Do you naively think a Mutex or two or a hundred is going to help?
Is it not better to simply better to define a tny struct typedef "SpinLock"
and have a tiny API that uses InterlockXXX to "lock" and the struct? spinning
if the item is already locked?
Is this not a very good solution especially if we KNOW that the updates are
very short indeed?
Once again I am not in way suggesting access too the kernel, you have failed
to read my posts.
Hugh
Hugh
"tanix" wrote:
> .
>
The problem definition is this:
1. Must be a spinlock, using Interlock operations only, for concurrency
handling.
2. Must not use Mutex, Critical Sections, Semaphores etc.
3. Must supported nested lock acquisition.
4. Must support Read/Write modes.
5. Must work in multi-process settings , not just multithreaded settings in
a singe process.
I found nothing at all on this subject, I have seen zero published on the
subject of read/write spin locks. I have created an API that does this on
Windows, the API also has a vicious test harness. Interestingly an early
version appeared to work fine but when I tested it on a true dual processor
box it failed, I debugged and fixed the failure and now it seems to be stable
(assuming my test harness is truly rigorous).
I also once asked id anyone could dream up a good test harness to prove the
code, but nobody seemed interested, I'd still like to see anyone's
suggestions for a test shell because I may have missed something.
Either way, the subject of read/write spinlocks is genuinely interesting and
it seems neglected by current movers & shakers.
Hugh
"m" wrote:
> .
>
For number 5, how are you handling the case when a thread dies while it has
acquired either read or write access?
> I found nothing at all on this subject, I have seen zero published on the
> subject of read/write spin locks. I have created an API that does this on
> Windows, the API also has a vicious test harness. Interestingly an early
> version appeared to work fine but when I tested it on a true dual
> processor
> box it failed, I debugged and fixed the failure and now it seems to be
> stable
> (assuming my test harness is truly rigorous).
You should probably model the algorithm in Relacy Race Detector:
http://groups.google.com/group/relacy/web
It will quickly find all types of bugs. It can even simulate every possible
thread interleaving, which will find all errors. Can you post the algorithm?
> I also once asked id anyone could dream up a good test harness to prove
> the
> code, but nobody seemed interested, I'd still like to see anyone's
> suggestions for a test shell because I may have missed something.
>
> Either way, the subject of read/write spinlocks is genuinely interesting
> and
> it seems neglected by current movers & shakers.
FWIW, have you seen what is probably the most clever rw-spinlock out there?
Well,
IMHO that is:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/be3871ad661efa73
I applied eventcount's to Joe's bakery algorithm in order to allow waiters
to block on a kernel waitset, but you can easily strip all of that out and
add simple backoff scheme in the wait loop. It's 100% fair,
starvation-free, and has wait-free fast-paths. It was invented by Joe Seigh
and presented on `comp.programming.threads' some years ago, IIRC. However, I
don't think it supports read or write recursion. I need to revisit the
algorithm. You can always add read/write recursion using thread local
data-structures.
The idea is that after a number of failed tries (e.g. 100 spins) call
IsProcessRunning (the lock struct currently records the PID and
DateTimeStarted of the locker, these two items uniquely identify a process)
and if it has died, execute an "unlock on behalf of" type operation.
Then (in effect) retry the lock operation. If it was read locked by many
readers in that process, then eventually this approach will release each read
lock and resolve the problem.
However, I have not inserted this code or worked it out at the code level of
detail yet, the reason is that I am undecided as to how to handle the dead
locker anyway. If a locker died with a spin lock held then we may have a
corrupt system.
These locks are held purely within a larger API, the locking is never
exposed to consumers of the main API, any API function that gets a lock will
release it before it exits, so locks are never held unless executing inside
one of the main APIs.
I'm not at liberty unfortunately to reveal the algorithm or source code.
Let me look at this relay race thing....
Thx
Hugh
"Chris M. Thomasson" wrote:
> .
>
Curently the test code creates four writer threads and ten reader threads,
the writer threads run a writer-test function and the reader threads run a
reader-test function.
These are similar, the tests execute a loop, getting the lock then
repeatedly checking some shared vars. The vars should never take on
unexpected values whilst a lock is held. The reader threads also briefly lock
in write lock mode to further stress the code.
The core testing consists of each thread ensuring that while it has a lock,
the shared variables never take on any unexpected or inavlid value as would
be the case if some thread "thought" it had the lock when technically it did
not.
In addition the threads all sleep for a short but random time, whilst the
havey the lock and whilst the dont, further ensuring that no cyclic patters
emerge that could mask problems.
Ive run this test many times, soemtimes for hours and never seen any issue,
after a number of iterations each thread ends and the test shell thread does
a WaitForMultipleObjects on them.
So at the end of the test shell main function, I can examine the state of
the lock struct to see if it too looks OK.
The overall design of the lock algorithms began life as a FSM and was
defined in a spreadsheet as a state table with transitions etc, I pored over
this for weeks before writing any code.
Should I post the test code here as a straightforward message?
Thx
Hugh
"Chris M. Thomasson" wrote:
> .
>
FWIW, here is a very simple mutex implementation that only relies on atomic
swap and can be used in shared memory:
<pseudo-code typed in news reader>
_________________________________________________________
struct mutex
{
LONG m_state; // = 0
HANDLE m_waitset; // auto-reset event
void lock()
{
if (InterlockedExchange(&m_state, 1))
{
while (InterlockedExchange(&m_state, 2))
{
WaitForSingleObject(m_waitset, INFINITE);
}
}
}
void unlock()
{
if (InterlockedExchange(&m_state, 0) == 2)
{
SetEvent(m_waitset);
}
}
};
_________________________________________________________
However, it's not robust...
Are you packing all that information into a 32-bit or 64-bit word? I guess
you could use the low-part of the
typedef struct _FILETIME {
DWORD dwLowDateTime;
DWORD dwHighDateTime;
}FILETIME, *PFILETIME;
And the PID in a lock anchor like:
struct anchor
{
DWORD time;
DWORD pid;
};
And use DWCAS to atomically mutate it on a 32-bit system, and normal CAS to
mutate it on a 64-bit system. Is that what you are doing?
[...]
Not really, if you comprehend what was being said.
> and did not bother to read my orignal
>post
Not sure how original was the post I was following up on.
> (I have said three times here that I do not in an way suggest user mode
>code should use kernel mode mechanisms,
Spinlocks ARE a kernel mode mechanism.
There is a reason Microsoft suggests to hold them for extremely
short periods of time. These guys are not exactly dumb.
Anyway, I really have not much time or interest to deal with this.
Just one point: realize that by top posting (writing your response
BEFORE the stuff you are following up, is one of the most despised
ways to post. You are presenting your position out of context
and readers do not know what in your response relates to what
in the article you are following up.
That single thing by itself indicates to me you have a problem
with slopiness, and I would not be surprised it will eventually
translate into how you design and code things.
It is best to reply to some specific statement in place.
Beyond that, do as thou wilt.
Cya.
IMHO, it is not possible to implement a reliable handler for process
termination in this kind of algorithm without either KM support (slow) or a
watchdog (fragile). The better way is usually to implement lock timeout /
ABEND in the other tasks when data corruption might otherwise occur; but
that depends on your particular operating environment.
"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:aQPXm.72902$de6....@newsfe21.iad...
At a high level, the algorithm relies on signals (writer pending, reader
pending) + a status code to maintain the lock. The only concern in my mind
about this approach is the poor performance on NUMA. But as I said, it is a
minimal implementation! The best attribute is that reader acquisition when
uncontended by writers (99%+ of actual cases) requires only a single atomic
increment.
"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:qLFXm.1313$8e4...@newsfe03.iad...
"Hugo gle...@hotmail.com>" <hugh<underbar> wrote in message
news:697530AB-81F0-4B1B...@microsoft.com...
Yes, exactly.
> and wait for a sequence point to inject its new lock, before accessing the
> object for the first time.
I might be totally misunderstanding you but a thread can actually use it's
own lock right after registration occurs. Pseudo-code for dynamic
registration might look like:
_____________________________________________________________
struct thread
{
mutex m_mutex;
void read_lock()
{
m_mutex.lock();
}
void read_unlock()
{
m_mutex.unlock();
}
};
struct rwmutex
{
collection<thread*> m_threads;
mutex m_mutex;
void register(thread* t)
{
mutex::scoped_lock lock(m_mutex);
m_threads.push(t);
}
void unregister(thread* t)
{
mutex::scoped_lock lock(m_mutex);
m_threads.pop(t);
}
void write_lock()
{
m_mutex.lock();
for each thread in `m_threads' as i
{
i->read_lock();
}
}
void write_unlock()
{
for each thread in `m_threads' as i
{
i->read_unlock();
}
m_mutex.unlock();
}
};
_____________________________________________________________
This has some fairly decent scalability characteristics and, registration
aside for a moment, moves the heavy lifting to writers which should be few
and far between. IIRC, this type of lock is called a `brlock' in Linux.
> At a high level, the algorithm relies on signals (writer pending, reader
> pending) + a status code to maintain the lock. The only concern in my
> mind about this approach is the poor performance on NUMA. But as I said,
> it is a minimal implementation! The best attribute is that reader
> acquisition when uncontended by writers (99%+ of actual cases) requires
> only a single atomic increment.
Indeed. A simple atomic increment is pretty good for a fast-path. Humm. Have
you experimented with so-called asymmetric synchronization?
http://groups.google.com/group/comp.programming.threads/msg/6d6c97d11e432db8
The cool thing about this is that Windows Vista and onward provides
everything one needs to get this done! The reader acquisition and release
are made up plain loads and stores without any memory barriers. The writer
invokes membar on behalf of the readers by executing something like
`FlushProcess WriteBuffers()':
http://msdn.microsoft.com/en-us/library/ms683148(VS.85).aspx
FWIW, in Linux, the locks are kept on a per-CPU basis.
The implementation that you referenced for asymmetric locks is similar to
one I have used before.
What is chiefly bothering me at the moment is how to deal with NUMA effects
(access speed) on single volatiles in a general way. I know that if I mess
with thread affinity, and use a NUMA aware allocator, I can allocate node
local sub-structures and guard them with node local locks, but it would be
better if a solution that did not depend on known preconditions could be
found.
"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:RZZXm.28825$_b5....@newsfe22.iad...
Second: A spinlock is NOT a "kernel mode mechanism" period. Yes the NT
kernel uses spinlocks and has a small API to support them, but a spinlock is
not something that one can only use or have in kernel mode, this the first of
many errors you make.
Read and learn somthing:
http://en.wikipedia.org/wiki/Spinlock
Note the sentence "In software engineering, a spinlock is a lock where the
thread simply waits in a loop ("spins") repeatedly checking until the lock
becomes available."
Where does that say "kernel mode"?
In fact it says "For this reason, spinlocks are often used inside operating
system kernels" note the term "often" clearly stating that they are also used
elsewhere.
If you are going to post rude and disparaging remarks simply when someone
dares to disagree with you, then you fully deserve any humiliating responses
such as this.
H
Correct. And so it spends 100% of your processing time
and nothing else may happen mean while.
That is NOT the solution for user mode.
That is all I am willing to say on this.
Cya.
Have you ever heard of a backoff algorithm?
> and nothing else may happen mean while.
Please explain why nothing else could happen in the mean time? It kind of
seems like you don't have all that much experience with synchronization
algorithms in general; sorry...
> It is a custom proprietary algorithm and I am not at liberty to reveal the
> source or owner.
Okay.
> It is not a fair algorithm and guarantees only forward progress in a
> non-deterministic way. In practice, it works well for the specific job it
> was designed to do - guarding access by _many_ fast readers and _few_ slow
> writers to in memory indices for a data cache with remote coherency.
Have you tried out a distributed rw-mutex? The idea is simple in that each
> It is a custom proprietary algorithm and I am not at liberty to reveal the
> source or owner.
Okay.
> It is not a fair algorithm and guarantees only forward progress in a
> non-deterministic way. In practice, it works well for the specific job it
> was designed to do - guarding access by _many_ fast readers and _few_ slow
> writers to in memory indices for a data cache with remote coherency.
Have you tried out a distributed rw-mutex? The idea is simple in that each