I have a function that is supposed to run only once in my program,
though it might be called "simultaneously" by two or more different
threads. Is the below implementation guaranteed to work? I ask because
while googling for PTHREAD_MUTEX_INITIALIZER, I read that in some
cases this might not be a real static initialization but just a hint
to do initialization on the first call to pthread_mutex_lock. If this
is the case, what happens if two threads enter this "hidden"
initialization routine? Is it supposed to be thread-safe?
Second question: is pthread_once supposed to call a void ( void )
function with "C" linkage? Is it portable to pass as an argument to
pthread_once a C++ static member function?
void DoSomethingOnce( int inNumb )
{
    static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_lock( &mutex );
    static bool init = false;
    if( !init )
    {
        // do stuff
        init = true;
    }
    pthread_mutex_unlock( &mutex );
}
Thanks in advance for any help.
Francesco.
> I have a function that is supposed to run only once in my program,
> though it might be called "simultaneously" by two or more different
> threads. Is the below implementation guaranteed to work? I ask because
> while googling for PTHREAD_MUTEX_INITIALIZER, I read that in some
> cases this might not be a real static initialization but just a hint
> to do initialization on the first call to pthread_mutex_lock. If this
> is the case, what happens if two threads enter this "hidden"
> initialization routine? Is it supposed to be thread-safe?
If the initialization is not truly static, the library must take care
of that behind the scenes, so it must be thread-safe. Yes, your code
is "guaranteed" to work, except that pthread_mutex_lock can fail for a
reason internal to the implementation, so you need to check the return
value for failure.
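
For illustration, a minimal sketch of what checking the return value could look like. The name DoSomethingOnceChecked, the int return type and the g_runs counter are assumptions made for this example and are not part of the original code.

```cpp
#include <pthread.h>

static int g_runs = 0;  // illustration only: counts executions of the guarded body

// Returns 0 on success, otherwise the error number from the failing pthread call.
int DoSomethingOnceChecked( int /*inNumb*/ )
{
    static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    static bool init = false;

    int rc = pthread_mutex_lock( &mutex );
    if( rc != 0 )
        return rc;          // lock not acquired: do not touch 'init'

    if( !init )
    {
        ++g_runs;           // "do stuff" would go here
        init = true;
    }
    return pthread_mutex_unlock( &mutex );
}
```

What the caller does with a nonzero result is a separate policy decision; the sketch only makes sure the protected flag is never touched without the lock.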
> Second question: is pthread_once supposed to call a void ( void )
> function with "C" linkage? Is it portable to pass as an argument to
> pthread_once a C++ static member function?
POSIX currently has nothing to say about C++ (though a POSIX C++
binding is in the works), so it's up to your compiler.
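
One common way to sidestep the linkage question is to hand pthread_once a small extern "C" function that forwards to the static member. The sketch below assumes a hypothetical class CConfig and helper names invented for this example.

```cpp
#include <pthread.h>

class CConfig                       // hypothetical class, for illustration only
{
public:
    static void InitOnce() { ++sInitCount; }   // C++ linkage
    static int sInitCount;
};
int CConfig::sInitCount = 0;

// Trampoline with C linkage: this is the function pthread_once receives.
extern "C" void ConfigInitTrampoline( void )
{
    CConfig::InitOnce();
}

static pthread_once_t sConfigOnce = PTHREAD_ONCE_INIT;

// Returns 0 on success, an error number otherwise.
int EnsureConfigInitialized()
{
    return pthread_once( &sConfigOnce, ConfigInitTrampoline );
}
```

Many compilers accept the static member function directly, but the trampoline does not depend on that.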
Anthony
--
Anthony Williams | Just Software Solutions Ltd
Custom Software Development | http://www.justsoftwaresolutions.co.uk
> If the initialization is not truly static, the library must take care
> of that behind the scenes, so it must be thread-safe. Yes, your code
> is "guaranteed" to work, except that pthread_mutex_lock can fail for a
> reason internal to the implementation, so you need to check the return
> value for failure.
I wouldn't bother. There's nothing sane you can do when
pthread_mutex_lock fails. This would likely be an indication of some
kind of memory corruption or fatal error condition. You can't do
anything sane if locking a mutex fails for no application-level
reason.
Do you want to log the error? Odds are the log is protected by a
mutex.
Do you want to call 'abort'? Guess what many 'abort' implementations
do first? (Hint: They don't want more than one thread trying to
'abort' at the same time.)
Of course, 'exit' has to manipulate the list of exit handlers. Guess
what that's protected by. And the same argument about flushing stdio.
What about '_exit'? Guess what, it usually calls 'abort'.
So if your argument is that pthreads standards-compliant code can't
assume that pthread_mutex_lock won't fail for some mysterious reason,
please tell me what pthreads standards-compliant functions you can
reasonably expect to work after such a failure. Good luck with that.
DS
> On Jun 23, 6:16 am, Anthony Williams <anthony....@gmail.com> wrote:
>
>> If the initialization is not truly static, the library must take care
>> of that behind the scenes, so it must be thread-safe. Yes, your code
>> is "guaranteed" to work, except that pthread_mutex_lock can fail for a
>> reason internal to the implementation, so you need to check the return
>> value for failure.
>
> I wouldn't bother. There's nothing sane you can do when
> pthread_mutex_lock fails. This would likely be an indication of some
> kind of memory corruption or fatal error condition. You can't do
> anything sane if locking a mutex fails for no application-level
> reason.
Possibly. However, you certainly don't want to continue as if
everything is OK.
> Do you want to log the error? Odds are the log is protected by a
> mutex.
Possibly. Locking that mutex might not fail, though. e.g. if the
problem is memory corruption, that mutex's memory might be fine.
> Do you want to call 'abort'? Guess what many 'abort' implementations
> do first? (Hint: They don't want more than one thread trying to
> 'abort' at the same time.)
Likewise.
> Of course, 'exit' has to manipulate the list of exit handlers. Guess
> what that's protected by. And the same argument about flushing stdio.
> What about '_exit'? Guess what, it usually calls 'abort'.
Likewise.
> So if your argument is that pthreads standards-compliant code can't
> assume that pthread_mutex_lock won't fail for some mysterious reason,
> please tell me what pthreads standards-compliant functions you can
> reasonably expect to work after such a failure. Good luck with that.
Locking a mutex may fail, just like many library calls may fail. If
locking a mutex fails this is always a serious error. Sometimes it's
an application design error, sometimes it's an OS error, but it's
always serious. Continuing normally is unlikely to be a good idea. The
application should cease operation as fast as possible, possibly
logging the error on the way, in order to avoid incorrect results. In
order to cease it may well have to call other functions, such as
abort. If they don't work either, the application is truly stuffed,
and it had *definitely* better not try to perform normal work.
> Locking a mutex may fail, just like many library calls may fail. If
> locking a mutex fails this is always a serious error. Sometimes it's
> an application design error, sometimes it's an OS error, but it's
> always serious. Continuing normally is unlikely to be a good idea.
Right, but calling 'abort' is unlikely to be a good idea either, since
that requires the very same operation as the one that just failed.
> The
> application should cease operation as fast as possible, possibly
> logging the error on the way, in order to avoid incorrect results.
Not possible. Any attempt at logging will likely require accessing a
mutex. Even if it doesn't, if the problem is memory corruption, maybe
the file descriptor that tells you where to log is corrupt. You might
wind up sending sensitive data over a network connection to a hostile
entity.
> In
> order to cease it may well have to call other functions, such as
> abort. If they don't work either, the application is truly stuffed,
> and it had *definitely* better not try to perform normal work.
The problem is, your argument that it shouldn't try normal work
applies equally well to anything it might do. It should definitely not
call 'abort' or try to log anything. But you have to do something.
DS
You're assuming that any failure from pthread_mutex_lock() is likely to
mean that ALL and ANY mutex activity is hosed.
It's far more likely, however, that either the pthread_mutex_t or some
auxiliary data for THAT PARTICULAR mutex was hosed, and that unrelated
mutex operations will proceed as normal.
You might as well argue that any SIGSEGV means that every memory
location has been corrupted. It's possible you're right, and it would be
irresponsible to assume that EVERY other location, or every other mutex,
is guaranteed to be OK; but it's far more likely that most of the time
the failure is isolated and specific, and reasonable shutdown procedures
will proceed cleanly.
If you've got secure data in memory that can be compromised, that, and
the tradeoffs ensuing, are a whole different issue.
> You're assuming that any failure from pthread_mutex_lock() is likely to
> mean that ALL and ANY mutex activity is hosed.
Yep, since this mutex is as run-of-the-mill as can be, and no
different from any other.
> It's far more likely, however, that either the pthread_mutex_t or some
> auxiliary data for THAT PARTICULAR mutex was hosed, and that unrelated
> mutex operations will proceed as normal.
You imagine some sort of mutex-specific hoser that focuses its hosing
on one specific mutex while other nearby mutexes are immune?
> You might as well argue that any SIGSEGV means that every memory
> location has been corrupted. It's possible you're right, and it would be
> irresponsible to assume that EVERY other location, or every other mutex,
> is guaranteed to be OK; but it's far more likely that most of the time
> the failure is isolated and specific, and reasonable shutdown procedures
> will proceed cleanly.
Except when they don't, and then the shutdown procedures cause as much
harm as continuing normally.
> If you've got secure data in memory that can be compromised, that, and
> the tradeoffs ensuing, are a whole different issue.
I just can't imagine anything you can do that can even be argued to be
better than continuing. This is a definite "can't ever happen" error.
DS
> You imagine some sort of mutex-specific hoser that focuses its hosing
> on one specific mutex while other nearby mutexes are immune?
Sure...there are many ways that a small section of memory could get
corrupted. Array indexing bug, dereferencing a stale pointer, bad DMA
address passed to hardware, etc.
> I just can't imagine anything you can do that can even be argued to be
> better than continuing. This is a definite "can't ever happen" error.
Obviously not...it could be caused by coding bugs elsewhere as seen above.
If you know something is corrupt, it's often better to wipe the slate
clean and start over.
Chris
Must it be a bug? If pthread_mutex_init() needs some resource (like
memory or an OS limit), I imagine PTHREAD_MUTEX_INITIALIZER can simply
be a value which tells pthread_mutex_lock() to call pthread_mutex_init()
(thread-safely) before proceeding. Then pthread_mutex_lock() could
fail the same ways that pthread_mutex_init() can fail.
Except the POSIX spec says init can fail this way with EAGAIN/ENOMEM,
but it doesn't say that lock can fail for that reason. Does that mean
it can't? (And that implementations can be trusted with that?) If so,
how does PTHREAD_MUTEX_INITIALIZER work on an implementation where
pthread_mutex_init() does take resources?
--
Hallvard
None of these things are necessarily specific to a related mutex. That
mutex A is corrupted may be a problem with code B. This is especially
true if the mutex is statically-allocated.
> > I just can't imagine anything you can do that can even be argued to be
> > better than continuing. This is a definite "can't ever happen" error.
> Obviously not...it could be caused by coding bugs elsewhere as seen above.
>
> If you know something is corrupt, it's often better to wipe the slate
> clean and start over.
The problem is that your attempt to "wipe the slate clean" can do
damage. There is no reason to expect wiping the slate clean to work.
In fact, in the source code I just looked at, calling 'abort' attempts
to lock a mutex without checking the return value.
If proceeding normally after a failed attempt to acquire a non-buggy
mutex is unacceptable, then calling 'abort' is unacceptable too, since
it does exactly that.
DS
All true. However, in the above scenarios if mutex A is corrupted
there's a chance that mutex Z is still okay.
> The problem is that your attempt to "wipe the slate clean" can do
> damage. There is no reason to expect wiping the slate clean to work.
> In fact, in the source code I just looked at, calling 'abort' attempts
> to lock a mutex without checking the return value.
Just about any attempt to clean things up is likely better than
continuing on when you already know that there is something corrupted.
If you don't want to risk trying to take another mutex, how about
calling raise(SIGKILL)?
Chris
No. :-)
"In cases where default mutex attributes are appropriate, the macro
PTHREAD_MUTEX_INITIALIZER can be used to initialize mutexes that are
statically allocated. The effect shall be equivalent to dynamic
initialization by a call to pthread_mutex_init() with parameter attr
specified as NULL, except that no error checks are performed." The thing
is that many programs using PTHREAD_MUTEX_INITIALIZER may well exhibit
undefined behaviour, without any bug in the application, under the current
wording. It would be helpful if the standard made it clear that the
*initial* call to pthread_mutex_lock() for a PTHREAD_MUTEX_INITIALIZER'd
mutex shall fail for all the same reasons as pthread_mutex_init() (in
addition to the current pthread_mutex_lock() "shall fail" failure
modes)...
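
To make the quoted equivalence concrete, here is a small sketch of the two forms side by side. The variable and function names are made up for this example; only the dynamic form has a call site where an initialization error can be reported.

```cpp
#include <pthread.h>

// Static form: there is no call that could report an initialization error.
static pthread_mutex_t sStaticMutex = PTHREAD_MUTEX_INITIALIZER;

// Dynamic form: same default attributes, but failure is visible to the caller.
static pthread_mutex_t sDynamicMutex;

// Returns 0 on success, or the error number from pthread_mutex_init().
int InitDynamicMutex()
{
    return pthread_mutex_init( &sDynamicMutex, 0 );   // null attr = defaults
}

// Locks and unlocks once; returns the first nonzero error number, else 0.
int LockUnlock( pthread_mutex_t * inMutex )
{
    int rc = pthread_mutex_lock( inMutex );
    if( rc != 0 )
        return rc;
    return pthread_mutex_unlock( inMutex );
}
```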
regards,
alexander.
> All true. However, in the above scenarios if mutex A is corrupted
> there's a chance that mutex Z is still okay.
Right, but this is precisely the chance everyone is saying they're
*NOT* willing to take. Yes, you're right, there's a chance everything
might work normally. So in that case, why do anything special?
If you get lucky and everything is fine and this was some one-in-a-
million fluke, then your code won't die for no reason. Maybe the error
code was itself the error and you actually have the mutex.
If you get unlucky and you are hosed, trying to acquire another hosed
mutex won't make things better.
So the argument that you should try to acquire some other mutex rather
than continue normally makes no sense.
> > The problem is that your attempt to "wipe the slate clean" can do
> > damage. There is no reason to expect wiping the slate clean to work.
> > In fact, in the source code I just looked at, calling 'abort' attempts
> > to lock a mutex without checking the return value.
> Just about any attempt to clean things up is likely better than
> continuing on when you already know that there is something corrupted.
Except you have no choice. As I said, even calling 'abort' entails
acquiring a mutex. There is no "perfect way out". This is a "can't
possibly happen" case. You can make no assumptions. You may or may not
hold the mutex. The process context may or may not be sane.
> If you don't want to risk trying to take another mutex, how about
> calling raise(SIGKILL)?
Sure, you can. And who knows, maybe this is the best possible thing
you could do and it will make everything fine. On the other hand,
maybe this is the worst possible thing you can do and it will cause
serious data corruption. We can't know -- we are in a "can't ever
happen" case. We literally have no idea what's going on or what the
consequences of our actions might be.
On many platforms (including Linux), this calls
pthread_kill(pthread_self(),SIGKILL). Suppose the memory that is read
to find our own thread ID is corrupted and the namespace for thread
IDs and process IDs is shared (again, all possible on Linux). This may
kill a completely innocent process. It may kill our entire process
group.
No, I'm sorry, there is no rational justification for calling
raise(SIGKILL). It is as likely to do harm as good. We are in a "can't
ever possibly happen under any conceivable circumstances" case.
DS
> Right, but this is precisely the chance everyone is saying they're
> *NOT* willing to take. Yes, you're right, there's a chance everything
> might work normally. So in that case, why do anything special?
We know some specific mutex is corrupt. We have two choices...try and
clean things up or try and continue as normal. Either way we have the
possibility of hitting corrupt information, but if we continue as normal
we could potentially operate for hours, days, or months with that
corrupt information, possibly corrupting other information as well. On
the other hand, we could shut down the corrupt app, restart it, and
potentially run for a long time without hitting the bug again.
> If you get lucky and everything is fine and this was some one-in-a-
> million fluke, then your code won't die for no reason. Maybe the error
> code was itself the error and you actually have the mutex.
>
> If you get unlucky and you are hosed, trying to acquire another hosed
> mutex won't make things better.
There's no guarantee that the other mutex is hosed. The problem could
easily be localized in memory, in which case it would be quite likely
that we would be able to shut down without any further errors.
>>If you don't want to risk trying to take another mutex, how about
>>calling raise(SIGKILL)?
> Sure, you can. And who knows, maybe this is the best possible thing
> you could do and it will make everything fine. On the other hand,
> maybe this is the worst possible thing you can do and it will cause
> serious data corruption. We can't know -- we are in a "can't ever
> happen" case. We literally have no idea what's going on or what the
> consequences of our actions might be.
We're not in a "can't ever happen" case, we're in an error case where
behaviour is undefined. It makes no sense to talk about a "can't ever
happen" case that has obviously happened.
> On many platforms (including Linux), this calls
> pthread_kill(pthread_self(),SIGKILL). Suppose the memory that is read
> to find our own thread ID is corrupted and the namespace for thread
> IDs and process IDs is shared (again, all possible on Linux). This may
> kill a completely innocent process. It may kill our entire process
> group.
Valid point. So call _exit() instead. On Linux it immediately calls
the appropriate syscall, no locking required.
Chris
> > Right, but this is precisely the chance everyone is saying they're
> > *NOT* willing to take. Yes, you're right, there's a chance everything
> > might work normally. So in that case, why do anything special?
> We know some specific mutex is corrupt. We have two choices...try and
> clean things up or try and continue as normal. Either way we have the
> possibility of hitting corrupt information, but if we continue as normal
> we could potentially operate for hours, days, or months with that
> corrupt information, possibly corrupting other information as well.
The same could happen if we try to clean things up and fail. So that
possibility exists either way.
> On
> the other hand, we could shut down the corrupt app, restart it, and
> potentially run for a long time without hitting the bug again.
Except we can't, since we are in an insane situation. We have no
mechanism that is assured to shut down the corrupt app, and they all
have risks.
> > If you get lucky and everything is fine and this was some one-in-a-
> > million fluke, then your code won't die for no reason. Maybe the error
> > code was itself the error and you actually have the mutex.
>
> > If you get unlucky and you are hosed, trying to acquire another hosed
> > mutex won't make things better.
> There's no guarantee that the other mutex is hosed. The problem could
> easily be localized in memory, in which case it would be quite likely
> that we would be able to shut down without any further errors.
And it could quite easily not be localized in memory, in which case
the kill might send the signal to the wrong place.
We have no idea what's going on. Every argument of the form "X may be
better than Y" is perfectly countered by a "Y might be better than X"
argument. If we have no idea what's going on, we have no reason to
consider any course superior to any other course.
> >>If you don't want to risk trying to take another mutex, how about
> >>calling raise(SIGKILL)?
> > Sure, you can. And who knows, maybe this is the best possible thing
> > you could do and it will make everything fine. On the other hand,
> > maybe this is the worst possible thing you can do and it will cause
> > serious data corruption. We can't know -- we are in a "can't ever
> > happen" case. We literally have no idea what's going on or what the
> > consequences of our actions might be.
> We're not in a "can't ever happen" case, we're in an error case where
> behaviour is undefined. It makes no sense to talk about a "can't ever
> happen" case that has obviously happened.
It is a "can't ever happen" case. If it happens, we are screwed. There
is nothing sane we can do. We have no rational reason to prefer one
course of conduct over another because we have no way to balance
them. It's like asking "if 2 plus 2 was 5 instead of four, should we
celebrate Christmas on the 25th or the 26th?"
> > On many platforms (including Linux), this calls
> > pthread_kill(pthread_self(),SIGKILL). Suppose the memory that is read
> > to find our own thread ID is corrupted and the namespace for thread
> > IDs and process IDs is shared (again, all possible on Linux). This may
> > kill a completely innocent process. It may kill our entire process
> > group.
> Valid point. So call _exit() instead. On Linux it immediately calls
> the appropriate syscall, no locking required.
I think this is perhaps the closest thing to a safe thing to do.
However, I still think this is a mistake. If the only safe thing to do
was to call _exit, why would the pthreads library return an error
code? Wouldn't the library just call _exit?
If we agree that the only safe thing you can do is call _exit, then I
think the library is buggy if it returns an error code to you and puts
you, knowingly, in such a bad state.
In any event, I have had bad things happen on FreeBSD in the past
calling _exit in insane cases (for example, after catching SIGSEGV due
to memory corruption). Sometimes it just doesn't exit and instead
hangs. The code that's hanging may be capable of causing damage, I
don't know. There is no way you can know this is safe from a pthreads-
standards only view.
I still maintain that the safest thing to do with no platform-specific
knowledge is not test for the error. Anything specific you might try
to do would be as likely to do harm as good. (And it may not even
terminate the process.)
DS
So perhaps you should specifically handle ENOMEM. And if you
deliberately create a low memory situation (or are designed to operate
normally in that situation), you may wish to ensure your first access
to the mutex occurs before any heavy lifting.
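
A rough sketch of that idea follows. It assumes an implementation that initializes lazily and reports the shortage from the first lock, which, as noted above, is not among the errors POSIX documents for pthread_mutex_lock. WarmUpMutex and sWorkMutex are hypothetical names.

```cpp
#include <pthread.h>
#include <cerrno>

static pthread_mutex_t sWorkMutex = PTHREAD_MUTEX_INITIALIZER;

// Hypothetical startup hook: lock and unlock once, early, so that any lazy
// initialization happens before memory gets tight.
// Returns 0 on success, otherwise the error number.
int WarmUpMutex( pthread_mutex_t * inMutex )
{
    int rc = pthread_mutex_lock( inMutex );
    if( rc == ENOMEM || rc == EAGAIN )
        return rc;      // assumed resource shortage: caller may free memory and retry
    if( rc != 0 )
        return rc;      // anything else: caller decides on a policy
    return pthread_mutex_unlock( inMutex );
}
```

Calling WarmUpMutex( &sWorkMutex ) at the start of main, before any large allocations, is the intended use.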
DS
This line of reasoning is fallacious. When this error happens and one
does not check the return code, but goes on modifying the data that
this mutex protects, it is certain that the programmer is further
corrupting the state of the system.
Unfortunately it seems like most of the software out there is written
by people who subscribe to your point of view. Ptmalloc goes to great
lengths in this area (casting results of mutex_* calls to void and so
on). It's not that hard to write synthetic code that will corrupt its
internal mutex while leaving everything else in place, and then enjoy
the havoc this corrupt mutex brings to a multithreaded app that uses
dynamic memory (and by the way, a cosmic ray can do this too: all it
takes is one flipped bit).
>> >>If you don't want to risk trying to take another mutex, how about
>> >>calling raise(SIGKILL)?
>> > Sure, you can. And who knows, maybe this is the best possible thing
>> > you could do and it will make everything fine. On the other hand,
>> > maybe this is the worst possible thing you can do and it will cause
>> > serious data corruption. We can't know -- we are in a "can't ever
>> > happen" case. We literally have no idea what's going on or what the
>> > consequences of our actions might be.
>
>> We're not in a "can't ever happen" case, we're in an error case where
>> behaviour is undefined. It makes no sense to talk about a "can't ever
>> happen" case that has obviously happened.
>
> It is a "can't ever happen" case. If it happens, we are screwed. There
> is nothing sane we can do. We have no rational reason to prefer one
> course of conduct over another because we have no way to balance
> them. It's like asking "if 2 plus 2 was 5 instead of four, should we
> celebrate Christmas on the 25th or the 26th?"
If this happens and we are not even attempting to do anything, no matter
how futile, we are just being lazy and have excuses.
[..snip..]
>
> I still maintain that the safest thing to do with no platform-specific
> knowledge is not test for the error. Anything specific you might try
> to do would be as likely to do harm as good. (And it may not even
> terminate the process.)
Can you somehow prove this assertion?
--
mailto:av1...@comtv.ru
> On Jun 23, 5:41 pm, Chris Friesen <cbf...@mail.usask.ca> wrote:
>
>> > Right, but this is precisely the chance everyone is saying they're
>> > *NOT* willing to take. Yes, you're right, there's a chance everything
>> > might work normally. So in that case, why do anything special?
>
>> We know some specific mutex is corrupt. We have two choices...try and
>> clean things up or try and continue as normal. Either way we have the
>> possibility of hitting corrupt information, but if we continue as normal
>> we could potentially operate for hours, days, or months with that
>> corrupt information, possibly corrupting other information as well.
>
> The same could happen if we try to clean things up and fail. So that
> possibility exists either way.
> I still maintain that the safest thing to do with no platform-specific
> knowledge is not test for the error. Anything specific you might try
> to do would be as likely to do harm as good. (And it may not even
> terminate the process.)
If locking the mutex failed, the one thing that is guaranteed not to
be safe is accessing the protected data, since you would be accessing
it without a lock. On that basis, anything else is an improvement,
since there is a chance that it is safe.
If you're really truly concerned that calling abort() or _exit() is
not safe, you have a serious problem. If this is a multithreaded
program (which one assumes it is, since there's a mutex being used),
then the other threads are going to continue their merry way calling
library functions all over the place. These are surely just as unsafe
as abort() or _exit().
I stand by my assertion that you need to check the return code for an
error, and terminate the process as quickly as possible if there is an
error code you cannot handle.
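
As a sketch of what "terminate as quickly as possible" might look like, the helper below avoids stdio, exit handlers and the heap, and relies only on write() and _exit(), both of which POSIX lists as async-signal-safe. The name LockOrDie, the message and the exit status are choices made for this example; whether this is safe enough on a given platform is exactly what is being debated above.

```cpp
#include <pthread.h>
#include <unistd.h>

// Hypothetical helper: lock the mutex or leave the process at once.
void LockOrDie( pthread_mutex_t * inMutex )
{
    if( pthread_mutex_lock( inMutex ) != 0 )
    {
        static const char msg[] = "fatal: pthread_mutex_lock failed\n";
        // No stdio, no atexit handlers, no heap allocation on this path.
        ssize_t ignored = write( STDERR_FILENO, msg, sizeof msg - 1 );
        (void)ignored;
        _exit( 1 );
    }
}
```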
Thanks a lot for the answers.
May I kindly ask you guys to comment on the following code?
Please take into account that I'm a newbie on both threading and
atomics.
I'm trying to understand the concepts, here I'm interested in creating
a template for singletons where the access point protects the object
from multiple "simultaneous" access to the constructor.
I'm interested especially in understanding:
- can this thing actually work? :-)
- is the use of the atomic compare and swap correct to ensure that the
sPtr static always has a "correct" value?
- is there anything that can go wrong in the destructor of the object?
- is the critical section concept implemented correctly? apart from the
"evil" macro, using a static mutex and scope locking...
Any other comment is vastly appreciated.
Thanks in advance and sorry for the poor English.
Bye,
Francesco
P.S.
the important part is where the CSingleton template is, the rest is
just to reproduce approx. what I'm doing....
#include <exception>
#include <iostream>
// do not have pthreads here...
typedef int pthread_mutex_t;
int pthread_mutex_lock( pthread_mutex_t * ) { return 0; }
int pthread_mutex_unlock( pthread_mutex_t * ) { return 0; }
int PTHREAD_MUTEX_INITIALIZER = 0;
// simulate mac os x atomic op
void OSCompareAndSwap( int inOldVal, int inNewVal, int volatile * inPtr )
{
    if( *inPtr == inOldVal )
        *inPtr = inNewVal;
}
#define CONCAT_AUX( ARG1, ARG2 ) ARG1 ## ARG2
#define CONCAT( ARG1, ARG2 ) CONCAT_AUX( ARG1, ARG2 )
//
class CPThrExc : public std::exception
{
public:
    // Assigning a nonzero pthread error code throws this exception.
    CPThrExc & operator=( int inError )
    { if( ( mError = inError ) != 0 ) throw *this; return *this; }
private:
    int mError;
};
//
class CPThrMxLock
{
public:
    CPThrMxLock( pthread_mutex_t * inAddr ) : mPtr( inAddr )
    { CPThrExc() = pthread_mutex_lock( mPtr ); }
    ~CPThrMxLock()
    { CPThrExc() = pthread_mutex_unlock( mPtr ); }
private:
    pthread_mutex_t * mPtr;
};
//
// handy only if code does not contain "free" commas
#define CRITICAL_SECTION( ARG_Code ) \
{ \
    static pthread_mutex_t CONCAT( sCritSectMx, __LINE__ ) \
        = PTHREAD_MUTEX_INITIALIZER; \
    CPThrMxLock CONCAT( sCritSectLock, __LINE__ ) \
        ( & CONCAT( sCritSectMx , __LINE__ ) ); \
    ARG_Code \
}
//
//============= SINGLETON ==============
template< typename T >
class CSingleton
{
public:
    static T & Get()
    {
        if( not sPtr )
            CRITICAL_SECTION
            (
                OSCompareAndSwap( 0, (int)&GetPriv(), (int*)&sPtr );
            )
        return *sPtr;
    }
protected:
    CSingleton() {}
private:
    static T * sPtr;
    static T & GetPriv()
    { static T sObj; return sObj; }
    CSingleton( CSingleton const & );
    CSingleton & operator=( CSingleton const & );
};
template< typename T >
T * CSingleton< T >::sPtr = 0;
//
class CTest : public CSingleton< CTest >
{ public: void Do( void ){} };
//
// just testing...
int main() try
{
    CTest::Get().Do();
    std::cin.get();
}
catch(...)
{
    std::cin.get();
}
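
On the scope-locking question, one point worth illustrating: a destructor that throws is dangerous, because an exception leaving a destructor during stack unwinding terminates the program. A possible alternative is sketched below against real pthreads. CScopedLock is a hypothetical name, and aborting on a failed unlock is just one policy among those argued over in this thread.

```cpp
#include <pthread.h>
#include <cstdlib>
#include <stdexcept>

class CScopedLock   // hypothetical replacement for CPThrMxLock
{
public:
    explicit CScopedLock( pthread_mutex_t * inAddr ) : mPtr( inAddr )
    {
        // Throwing from a constructor is fine: no lock is held yet.
        if( pthread_mutex_lock( mPtr ) != 0 )
            throw std::runtime_error( "pthread_mutex_lock failed" );
    }
    ~CScopedLock()
    {
        // A destructor must not throw; a failed unlock is treated as fatal here.
        if( pthread_mutex_unlock( mPtr ) != 0 )
            std::abort();
    }
private:
    CScopedLock( CScopedLock const & );             // not copyable
    CScopedLock & operator=( CScopedLock const & );
    pthread_mutex_t * mPtr;
};
```

The lock is held from construction until the end of the enclosing block, so no macro is needed to delimit the critical section.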
> static T & Get()
> {
>     if( not sPtr )
>         CRITICAL_SECTION
>         (
>             OSCompareAndSwap( 0, (int)&GetPriv(), (int*)&sPtr );
>         )
>     return *sPtr;
> }
This is the Double-Checked-Locking pattern. If your atomic accesses to
sPtr are sufficiently synchronized to ensure that the newly-created
object is visible to all threads that read sPtr as non-NULL after the
CAS is used to set the pointer, this can work. However, as-written
sPtr accesses are not atomic, so I would doubt that this is the case.
If you're going to try and do this, check your compiler documentation
carefully and ensure you use the appropriate syntaxes for the required
level of atomic access. Atomic accesses make everything
platform/compiler-specific.
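
For comparison, here is a sketch of double-checked locking with explicitly atomic accesses. It assumes a C++11 compiler with std::atomic and std::mutex, which is an assumption of this example and not something the code in this thread relies on. CAtomicSingleton is a hypothetical name.

```cpp
#include <atomic>
#include <mutex>

template< typename T >
class CAtomicSingleton
{
public:
    static T & Get()
    {
        // Acquire load: if the pointer is seen as non-null, the object's
        // construction is visible to this thread as well.
        T * p = sPtr.load( std::memory_order_acquire );
        if( !p )
        {
            std::lock_guard< std::mutex > lock( sMutex );
            p = sPtr.load( std::memory_order_relaxed );   // re-check under the lock
            if( !p )
            {
                p = new T();    // intentionally never deleted in this sketch
                sPtr.store( p, std::memory_order_release );
            }
        }
        return *p;
    }
private:
    static std::atomic< T * > sPtr;
    static std::mutex sMutex;
};
template< typename T >
std::atomic< T * > CAtomicSingleton< T >::sPtr( nullptr );
template< typename T >
std::mutex CAtomicSingleton< T >::sMutex;
```

The release store paired with the acquire load is what makes the unlocked first check well defined; with a plain pointer there is no such guarantee.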
> This line of reasoning is fallacious. When this error happens and one
> does not check the return code, but goes on modifying the data that
> this mutex protects, it is certain that the programmer is further
> corrupting the state of the system.
Certain? It is not certain -- you might hold the mutex. The error
might be the generation of the error return. The correct thing to do
might be to continue normally.
> Unfortunately it seems like most of the software out there is written
> by people who subscribe to your point of view. Ptmalloc goes to great
> lengths in this area (casting results of mutex_* calls to void and so
> on). It's not that hard to write synthetic code that will corrupt its
> internal mutex while leaving everything else in place, and then enjoy
> the havoc this corrupt mutex brings to a multithreaded app that uses
> dynamic memory (and by the way, a cosmic ray can do this too: all it
> takes is one flipped bit).
This argument is nonsense. Yes, in that case, you know the right thing
to do. But we are talking about the case where you *don't* know what
to do. Clearly, if you are in the case where you know the right thing
to do, then go ahead and do that. But if you're in the case where you
don't know the right thing to do, there is no reason you should do the
thing that happened to be right in some other case.
If a doctor has a patient with a fever, he does not know what to do.
He doesn't just give the patient antibiotics because that works for
some causes of fever. He is in the case where he doesn't know the
cause of the fever, and he can't justify giving antibiotics on the
chance that he might be in a case where that helps. He might be in a
case where that hurts.
> > It is a "can't ever happen" case. If it happens, we are screwed. There
> > is nothing sane we can do. We have no rational reason to prefer one
> > course of conduct over another because we have no way to balance
> > them. It's like asking "if 2 plus 2 was 5 instead of four, should we
> > celebrate Christmas on the 25th or the 26th?"
> If this happens and we are not even attempting to do anything, no matter
> how futile, we are just being lazy and have excuses.
Right, my arguments are excuses.
> > I still maintain that the safest thing to do with no platform-specific
> > knowledge is not test for the error. Anything specific you might try
> > to do would be as likely to do harm as good. (And it may not even
> > terminate the process.)
> Can you somehow prove this assertion?
I can prove that it's simplest and has the least overhead. I can prove
that it might be the right thing some of the time. And that's about
all anyone can prove. So if you demand strict proof, I win flat
out.
The argument that it's better to do something else, something that has
a non-zero cost, requires a demonstration that the benefits outweigh
the cost. That's nearly impossible to prove because there is no
rational way to estimate the benefits.
DS
> If locking the mutex failed, the one thing that is guaranteed not to
> be safe is accessing the protected data, since you would be accessing
> it without a lock. On that basis, anything else is an improvement,
> since there is a chance that it is safe.
Your argument is completely based on an incorrect assumption. We don't
know that locking the mutex failed. It may have succeeded. The error
may have occurred in testing whether the lock operation succeeded, not
in the operation itself.
You are pretending that we are in a "we know what happened" case. We
are not. So your "anything else is an improvement" argument is, again,
incorrect.
DS
> static T & Get()
> {
> if( not sPtr )
> CRITICAL_SECTION
> (
> OSCompareAndSwap( 0, (int)&GetPriv(), (int*)&sPtr );
> )
> return *sPtr;
> }
You tested 'sPtr', a potentially shared variable, without a lock. The
behavior of this is undefined.
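One way to avoid the unlocked test altogether is to let pthread_once do the gating. The following is only a sketch: the names widget, widget_init and widget_get are made up for illustration and stand in for T and Get() in the quoted code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical singleton payload; stands in for T in the quoted code. */
typedef struct { int value; } widget;

static widget         the_widget;
static widget        *widget_ptr  = NULL;
static pthread_once_t widget_once = PTHREAD_ONCE_INIT;

/* Runs exactly once, no matter how many threads race into widget_get(). */
static void widget_init(void)
{
    the_widget.value = 42;
    widget_ptr = &the_widget;
}

/* Every caller goes through pthread_once, so widget_ptr is never read
 * before the initialization has completed. */
widget *widget_get(void)
{
    int rc = pthread_once(&widget_once, widget_init);
    assert(rc == 0);   /* debug-only check; the call itself is always made */
    (void)rc;
    return widget_ptr;
}
```

The shared pointer is only read after pthread_once has returned, so no thread tests it while another thread may still be writing it.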
DS
> On Jun 23, 11:18 pm, m...@pulsesoft.com wrote:
>
>> This line of reasoning is fallacious, when this error happens and one
>> does not check the return code then goes on modifying the data that
>> this mutex protects it is certain that the programmer is further
>> corrupting the state of the system.
>
> Certain? It is not certain -- you might hold the mutex. The error
> might be the generation of the error return. The correct thing to do
> might be to continue normally.
Huh?
RETURN VALUE
If successful, the pthread_mutex_lock() and pthread_mutex_unlock()
functions shall return zero; otherwise, an error number shall be
returned to indicate the error.
The pthread_mutex_trylock() function shall return zero if a lock on the
mutex object referenced by mutex is acquired. Otherwise, an error num-
ber is returned to indicate the error.
You do NOT hold the mutex.
>
>> Unfortunatelly it seems like most of the software out there is written
>> by people who subscribe to your point of view. Ptmalloc goes to great
>> lengths in this area (casting results of mutex_* calls to void and so
>> on), it's not that hard to write synthetic code that will corrupt it's
>> internal mutex while leaving everything else in place and then enjoy
>> the havoc this corrupt mutex brings to a mulithreaded app that uses
>> dynamic memory (and btw cosmic ray can do this thing to all it takes
>> to flip a bit)
>
> This argument is nonsense. Yes, in that case, you know the right thing
> to do. But we are talking about the case where you *don't* know what
> to do. Clearly, if you are in the case where you know the right thing
> to do, then go ahead and do that. But if you're in the case where you
> don't know the right thing to do, there is no reason you should do the
> thing that happened to be right in some other case.
What argument? That was just an observation of the sad state of affairs
in this area.
[..snip..]
>
>> If this happens and we are not even attempting to do anything, no matter
>> how futile, we are just being lazy and have excuses.
>
> Right, my arguments are excuses.
>
>> > I still maintain that the safest thing to do with no platform-specific
>> > knowledge is not test for the error. Anything specific you might try
>> > to do would be as likely to do harm as good. (And it may not even
>> > terminate the process.)
>
>> Can you somehow prove this assertion?
>
> I can prove that it's simplest and has the least overhead. I can prove
> that it might be the right thing some of the time. And that's about
> all you anyone can prove. So if you demand strict proof, I win flat
> out.
And my own experience with this shows that in 100% of the cases when
I checked and got an error code out of the call to pthread_mutex_lock, it
was my own fault, no other mutex was affected, abort was the only
sane way out, and merrily continuing the execution would have resulted
in.. well.. anything at all.
> The argument that it's better to do something else, something that has
> a non-zero cost, requires a demonstration that the benefits outweigh
> the cost. That's nearly impossible to prove because there is no
> rational way to estimate the benefits.
>
> DS
--
mailto:av1...@comtv.ru
> > Certain? It is not certain -- you might hold the mutex. The error
> > might be the generation of the error return. The correct thing to do
> > might be to continue normally.
> Huh?
How hard is what I said to understand? The error might have occurred
after the mutex was acquired, when the code was trying to confirm that
it had in fact acquired the mutex.
This is a fairly common pattern for mutex implementation. You perform
some operation that will acquire the mutex IF IT WAS PREVIOUSLY UNHELD.
You then check to see if you did in fact get the mutex. If the error
occurs in the latter step, you may or may not hold the mutex.
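To make the two steps concrete, here is a toy test-and-set lock written with C11 atomics. It is a sketch for illustration only, not the code of any real pthread implementation, and toy_lock, toy_trylock and toy_unlock are invented names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock: 0 = unheld, 1 = held. */
typedef struct {
    atomic_int state;
} toy_lock;

static bool toy_trylock(toy_lock *l)
{
    /* Step 1: unconditionally mark the lock as held and fetch the old
     * value. If the lock was previously unheld, it now belongs to us. */
    int previous = atomic_exchange(&l->state, 1);

    /* Step 2: look at the old value to find out whether we got it.
     * Anything that goes wrong between step 1 and this return could
     * leave the lock taken while the caller is told it was not. */
    return previous == 0;
}

static void toy_unlock(toy_lock *l)
{
    assert(atomic_load(&l->state) == 1);  /* only a held lock is released */
    atomic_store(&l->state, 0);
}
```

The acquisition and the report of the acquisition are separate operations here, which is the window being described.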
> RETURN VALUE
> If successful, the pthread_mutex_lock() and pthread_mutex_unlock()
> functions shall return zero; otherwise, an error number shall be
> returned to indicate the error.
>
> The pthread_mutex_trylock() function shall return zero if a lock on the
> mutex object referenced by mutex is acquired. Otherwise, an error num-
> ber is returned to indicate the error.
>
> You do NOT hold the mutex.
One of the things that can happen when there's an error is that the
implementation, unintentionally, violates the standard.
You are assuming that everything is fine and nothing has gone wrong.
The whole point is, we are not in the "everything is fine, nothing has
gone wrong" state. The mutex might be corrupted. The standard might be
broken. Our program might be, accidentally, non-compliant.
> > This argument is nonsense. Yes, in that case, you know the right thing
> > to do. But we are talking about the case where you *don't* know what
> > to do. Clearly, if you are in the case where you know the right thing
> > to do, then go ahead and do that. But if you're in the case where you
> > don't know the right thing to do, there is no reason you should do the
> > thing that happened to be right in some other case.
> What argument? That was just an observation of the sad state of affairs
> in this area.
The argument that this state of affairs is sad.
> > I can prove that it's simplest and has the least overhead. I can prove
> > that it might be the right thing some of the time. And that's about
> > all you anyone can prove. So if you demand strict proof, I win flat
> > out.
> And my own experience with this shows that in 100% of the cases when
> i checked and got error code out of the call to pthread_mutex_lock it
> was my own fault, no other mutex was affected, the abort was the only
> sane way out and merrily continuing the execution would have resulted
> in.. well.. anything at all.
That says nothing about platforms on which you haven't tested. If you
want to code based on how the platforms you happen to have used happen
to have worked, that's fine. But I wouldn't say this is grounds to
take some kind of high road "sad state of affairs" argument.
DS
> On Jun 24, 2:04 pm, m...@pulsesoft.com wrote:
>
>> > Certain? It is not certain -- you might hold the mutex. The error
>> > might be the generation of the error return. The correct thing to do
>> > might be to continue normally.
>
>> Huh?
>
> How hard is what I said to understand? The error might have occurred
> after the mutex was acquired, when the code was trying to confirm that
> it had in fact acquired the mutex.
>
By the same token, malloc can return memory that was never allocated,
write can report a number of bytes that were never really written, and so on.
> This is a fairly common pattern for mutex implementation. You perform
> some operation that will acquire the mutex IF IT WAS PREVIOUS UNHELD.
> You then check to see if you did in fact get the mutex. If the error
> occurs in the latter step, you may or may not hold the mutex.
Really? Care to construct an example where this will work the way you
describe on any system (of your choice)? I.e.
ret = pthread_mutex_lock (...);
if (ret) {
/* The mutex was acquired by this thread */
}
Saying that this is possible is not good enough. Many things are
possible, including this one, but we code against standards, not
against possibilities.
>
>> RETURN VALUE
>> If successful, the pthread_mutex_lock() and pthread_mutex_unlock()
>> functions shall return zero; otherwise, an error number shall be
>> returned to indicate the error.
>>
>> The pthread_mutex_trylock() function shall return zero if a lock on the
>> mutex object referenced by mutex is acquired. Otherwise, an error num-
>> ber is returned to indicate the error.
>>
>> You do NOT hold the mutex.
>
> One of the things that can happen when there's an error is that the
> implementation, unintentionally, violates the standard.
Once the standard has been violated, the only safe option is to abort as
soon as possible: all bets are off at that point, and proceeding to do
anything at all is the worst possible option. Arguing that, since a
violation did in fact happen, abort can do even more harm is silly.
>
> You are assuming that everything is fine and nothing has gone wrong.
> The whole point is, we are not in the "everything is fine, nothing has
> gone wrong" state. The mutex might be corrupted. The standard might be
> broken. Our program might be, accidentally, non-compliant.
>
>> > This argument is nonsense. Yes, in that case, you know the right thing
>> > to do. But we are talking about the case where you *don't* know what
>> > to do. Clearly, if you are in the case where you know the right thing
>> > to do, then go ahead and do that. But if you're in the case where you
>> > don't know the right thing to do, there is no reason you should do the
>> > thing that happened to be right in some other case.
>
>> What argument? That was just an observation of the sad state of affairs
>> in this area.
>
> The argument that this state of affairs is sad.
Well, not being a native English speaker, I was under the impression that I
wasn't really arguing anything.
>
>> > I can prove that it's simplest and has the least overhead. I can prove
>> > that it might be the right thing some of the time. And that's about
>> > all you anyone can prove. So if you demand strict proof, I win flat
>> > out.
>
>> And my own experience with this shows that in 100% of the cases when
>> i checked and got error code out of the call to pthread_mutex_lock it
>> was my own fault, no other mutex was affected, the abort was the only
>> sane way out and merrily continuing the execution would have resulted
>> in.. well.. anything at all.
>
> That says nothing about platforms on which you haven't tested. If you
> want to code based on how the platforms you happen to have used happen
> to have worked, that's fine. But I wouldn't say this is grounds to
> take some kind of high road "sad state of affairs" argument.
On the platforms where I tested, had the error checking been omitted, I
would never have known that I was corrupting anything. Not checking the
return code just means I would have been happily oblivious of the
bugs elsewhere in my code.
--
mailto:av1...@comtv.ru
m...@pulsesoft.com wrote:
> > How hard is what I said to understand? The error might have occurred
> > after the mutex was acquired, when the code was trying to confirm that
> > it had in fact acquired the mutex.
> By the same token malloc can return memory that was never allocated,
> write number of bytes that have never been really written and so on.
Of course, but there are well-understood failure modes. We know that
they actually happen, and we know how to handle them.
> > This is a fairly common pattern for mutex implementation. You perform
> > some operation that will acquire the mutex IF IT WAS PREVIOUS UNHELD.
> > You then check to see if you did in fact get the mutex. If the error
> > occurs in the latter step, you may or may not hold the mutex.
> Really? Care to construct an example where this will work the way you
> describe on any system (of your choice)? I.e.
>
> ret = pthread_mutex_lock (...);
> if (ret) {
> /* The mutex was acquired by this thread */
> }
As I said, this is a "can't ever happen" case. No, I can't make it
happen, because it can't happen. If it happens, we have no idea what
happened.
> Saying that this is possible is not good enough, many things are
> possible including this one, but we do code against standards not
> against possibilities.
Exactly. That's why there's no reason to check the return value.
> Once the standard has been violated the only safe option is to abort as
> soon as possible since all bets at this point are off and proceeding
> doing anything at all is worst possible option, arguing that since
> violation did in fact happen and abort can do even more harm is silly.
Why do you repeat a claim I've already refuted without addressing the
refutation?! I don't get it.
I've already shown that aborting as soon as possible is *NOT* safe. So
how can it be the "only safe option"?
> >> > I can prove that it's simplest and has the least overhead. I can prove
> >> > that it might be the right thing some of the time. And that's about
> >> > all you anyone can prove. So if you demand strict proof, I win flat
> >> > out.
> >
> >> And my own experience with this shows that in 100% of the cases when
> >> i checked and got error code out of the call to pthread_mutex_lock it
> >> was my own fault, no other mutex was affected, the abort was the only
> >> sane way out and merrily continuing the execution would have resulted
> >> in.. well.. anything at all.
> > That says nothing about platforms on which you haven't tested. If you
> > want to code based on how the platforms you happen to have used happen
> > to have worked, that's fine. But I wouldn't say this is grounds to
> > take some kind of high road "sad state of affairs" argument.
> On platforms were i tested, should the error checking be omitted, i
> would have never known that i corrupt anything, not checking the
> return code just means i would have been happily oblivious of the
> bugs elsewhere in my code.
I was really only talking about release code. I don't see a problem
with checking for this error in debug code. In debug code, we accept
that we may corrupt data or permit incorrect operation, and in
exchange we gain an increased ability to improve the stability for
release code. There is no harm in doing the wrong thing in debug code,
and we frequently do it if it improves debuggability.
However, that's not a good tradeoff for release code.
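A debug-only check of this kind could look like the sketch below. The helper names are made up for illustration. Note that the lock call is kept outside assert(): with NDEBUG defined, assert(expr) expands to nothing, so putting the call inside it would remove the locking itself from release builds.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical helper: always lock; insist on success only in debug builds. */
static inline void lock_checked_in_debug(pthread_mutex_t *m)
{
    int rc = pthread_mutex_lock(m);  /* executed in every build */
    assert(rc == 0);                 /* compiled out when NDEBUG is defined */
    (void)rc;                        /* silences the unused warning in release */
}

/* Hypothetical helper: the matching unlock with the same policy. */
static inline void unlock_checked_in_debug(pthread_mutex_t *m)
{
    int rc = pthread_mutex_unlock(m);
    assert(rc == 0);
    (void)rc;
}
```

In a release build the return value is read and discarded, which amounts to the "don't test for the error" position; in a debug build a failure stops the program at the point where it was detected.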
DS
> m...@pulsesoft.com wrote:
[..snip..]
>
>> Saying that this is possible is not good enough, many things are
>> possible including this one, but we do code against standards not
>> against possibilities.
>
> Exactly. That's why there's no reason to check the return value.
>
>> Once the standard has been violated the only safe option is to abort as
>> soon as possible since all bets at this point are off and proceeding
>> doing anything at all is worst possible option, arguing that since
>> violation did in fact happen and abort can do even more harm is silly.
>
> Why do you repeat a claim I've already refuted without addressing the
> refutation?! I don't get it.
>
> I've already shown that aborting as soon as possible is *NOT* safe. So
> how can it be the "only safe option"?
By your own logic, aborting CAN be safe; not aborting cannot be:
a) The return code is not zero and the mutex IS NOT acquired:
You are then modifying shared data without protection - standard
violation
b) The return code is not zero and the mutex IS acquired:
Standard violation.
If one takes the liberty of thinking that the system-provided pthread
library doesn't violate the standard just for the heck of it, then
the only safe thing to assume is that something is seriously
broken, and continuing to use a threading library with broken
internals is not safe.
But scratch the above paragraph: proceeding while assuming that
the standard was violated is just plain wrong, no justification is
needed.
So the safest thing to do is to call abort (in one way or another),
thus ensuring that you, the programmer, have done the minimal amount of
(possibly harmful) things.
Snippets from
http://www.opengroup.org/onlinepubs/007908799/xsh/pthread_mutex_lock.html
The pthread_mutex_lock() and pthread_mutex_trylock() functions will fail if:
[EINVAL]
The mutex was created with the protocol attribute having the value
PTHREAD_PRIO_PROTECT and the calling thread's priority is higher
than the mutex's current priority ceiling.
...
The pthread_mutex_lock() function may fail if:
[EDEADLK]
The current thread already owns the mutex.
Neither of the above cases depends on anything being corrupted, so one
must check for them and act accordingly. Furthermore here one might
even go as far as to try and perform the cleanup and then exit/abort.
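The EDEADLK case is easy to reproduce without any corruption. The sketch below assumes a platform that supports PTHREAD_MUTEX_ERRORCHECK; relock_result is a made-up name, and the feature-test macro is there only to request the POSIX.1-2008 declarations.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX.1-2008 declarations */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Returns what the second pthread_mutex_lock call reports when the
 * calling thread already owns an error-checking mutex. */
int relock_result(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&m, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&m);          /* first lock succeeds */
    assert(rc == 0);
    int second = pthread_mutex_lock(&m);  /* relock: an error, not a deadlock */

    pthread_mutex_unlock(&m);             /* we still own it exactly once */
    pthread_mutex_destroy(&m);
    return second;
}
```

Here a non-zero return is an ordinary, well-defined report of a programming mistake, and the caller knows exactly what state the mutex is in.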
Judging by what you have said before, one should not check
WaitForXXXObject(s) with a mutex handle for errors either.
Unfortunately, in the win32 version of my code I do get WAIT_ABANDONED
on a mutex that is shared between processes. The WAIT_ABANDONED case was
triggered for the first time half a year after the relevant code was
written; if the error checking wasn't there the code would still run,
but would have been wrong, silently. (I guess the same applies to
POSIX robust mutexes.)
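For the robust mutex case, a rough sketch of the recovery path follows. It assumes a platform that implements pthread_mutexattr_setrobust and pthread_mutex_consistent from POSIX.1-2008; the function names die_holding_lock and lock_after_owner_death are invented, and a thread inside one process stands in for the dead owner to keep the example short.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX.1-2008 declarations */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t robust_mtx;

/* Thread body: take the lock and terminate without releasing it. */
static void *die_holding_lock(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&robust_mtx);
    return NULL;
}

/* Returns what pthread_mutex_lock reported after the owner went away. */
int lock_after_owner_death(void)
{
    pthread_mutexattr_t attr;
    pthread_t t;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    assert(rc == 0);
    rc = pthread_mutex_init(&robust_mtx, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_create(&t, NULL, die_holding_lock, NULL);
    assert(rc == 0);
    pthread_join(t, NULL);

    int reported = pthread_mutex_lock(&robust_mtx);
    if (reported == EOWNERDEAD) {
        /* The lock is ours, but the protected data may be half-updated.
         * Repair it here, then mark the mutex usable again. */
        rc = pthread_mutex_consistent(&robust_mtx);
        assert(rc == 0);
    }
    if (reported == 0 || reported == EOWNERDEAD)
        pthread_mutex_unlock(&robust_mtx);
    pthread_mutex_destroy(&robust_mtx);
    return reported;
}
```

Ignoring the return value here would mean carrying on with possibly inconsistent data and never calling pthread_mutex_consistent, which is the silent failure described above.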
>> >> > I can prove that it's simplest and has the least overhead. I can prove
>> >> > that it might be the right thing some of the time. And that's about
>> >> > all you anyone can prove. So if you demand strict proof, I win flat
>> >> > out.
>> >
>> >> And my own experience with this shows that in 100% of the cases when
>> >> i checked and got error code out of the call to pthread_mutex_lock it
>> >> was my own fault, no other mutex was affected, the abort was the only
>> >> sane way out and merrily continuing the execution would have resulted
>> >> in.. well.. anything at all.
>
>> > That says nothing about platforms on which you haven't tested. If you
>> > want to code based on how the platforms you happen to have used happen
>> > to have worked, that's fine. But I wouldn't say this is grounds to
>> > take some kind of high road "sad state of affairs" argument.
>
>> On platforms were i tested, should the error checking be omitted, i
>> would have never known that i corrupt anything, not checking the
>> return code just means i would have been happily oblivious of the
>> bugs elsewhere in my code.
>
> I was really only talking about release code. I don't see a problem
> with checking for this error in debug code. In debug code, we accept
> that we may corrupt data or permit incorrect operation, and in
> exchange we gain an increased ability to improve the stability for
> release code. There is no harm in doing the wrong thing in debug code,
> and we frequently do it if it improves debuggability.
>
> However, that's not a good tradeoff for release code.
We work in different ways. I'm of the opinion that if something can
happen in "debug" code then it can and will happen in "release" code
too. As a matter of fact I don't even make a debug/release
distinction, but that's just me.
--
mailto:av1...@comtv.ru
> > I've already shown that aborting as soon as possible is *NOT* safe. So
> > how can it be the "only safe option"?
> By your own logic, it CAN be safe, not aborting can not be:
Huh? How can not aborting not be safe?
> a) The return code is not zero and the mutex IS NOT acquired:
>
> You are then modifying shared data without protection - standard
> violation
At least that won't kill an innocent process, which aborting might do.
> b) The return code is not zero and the mutex IS acquired:
>
> Standard violation.
So what? We're in a case of probable memory corruption. A standard
violation is just as likely as not.
> If one takes the liberty of thinking that system provided pthread
> library doesn't violated standard just for the heck of it, then
> the only safe thing to assume is that something is seriously
> broken and continuing using threading library with broken
> internals is not safe.
I agree, but there is nothing else you can do. Any attempt to stop
things consists of using the threading library with broken internals.
Nothing is safe. The only question is what is less dangerous, and I
submit that there is no way to know that without knowing what failure
modes are more probable. We don't know that for an arbitrary platform.
> But scratch the above paragraph, proceeding while assuming that
> standard was violated is just plain wrong, no justification is
> needed.
>
> So the safest thing to do is to call an abort(in one way or the other)
> thus ensuring that you - the programmer has done the minimal amount of
> (posssibly harmful) things.
The problem is that calling 'abort' is an inherently dangerous thing.
It sends a fatal signal to a process. If you don't trust the process
context to attempt something dangerous, you have a big problem.
DS
> On Jun 25, 1:02 am, m...@pulsesoft.com wrote:
>
>> > I've already shown that aborting as soon as possible is *NOT* safe. So
>> > how can it be the "only safe option"?
>
>> By your own logic, it CAN be safe, not aborting can not be:
>
> Huh? How can not aborting not be safe?
This is addressed in a) and b) below; in either case the standard
is violated and all bets are off.
>> a) The return code is not zero and the mutex IS NOT acquired:
>>
>> You are then modifying shared data without protection - standard
>> violation
>
> At least that won't kill an innocent process, which aborting might do.
>
>> b) The return code is not zero and the mutex IS acquired:
>>
>> Standard violation.
>
> So what? We're in a case of probably memory corruption. A standard
> violation is just as likely as not.
Sigh. If pthread_mutex_lock returned non-zero and yet the mutex is acquired,
"likely" has nothing to do with it; the standard is plainly violated.
And therefore continuing after pthread_mutex_lock returned non-zero is
never safe since, whether the mutex was acquired or not, the behaviour of
the program is outside the scope of the standard.
>> If one takes the liberty of thinking that system provided pthread
>> library doesn't violated standard just for the heck of it, then
>> the only safe thing to assume is that something is seriously
>> broken and continuing using threading library with broken
>> internals is not safe.
>
> I agree, but there is nothing else you can do. Any attempt to stop
> things consists of using the threading library with broken internals.
> Nothing is safe. The only questions is what is less dangerous, and I
> submit that there is no way to know that without knowing what failure
> modes are more probable. We don't know that for an arbitrary platform.
If you don't agree with what i wrote above further arguing is pointless.
>> But scratch the above paragraph, proceeding while assuming that
>> standard was violated is just plain wrong, no justification is
>> needed.
>>
>> So the safest thing to do is to call an abort(in one way or the other)
>> thus ensuring that you - the programmer has done the minimal amount of
>> (posssibly harmful) things.
>
> The problem is that calling 'abort' is an inherently dangerous thing.
> It sends a fatal signal to a process. If you don't trust the process
> context to attempt something dangerous, you have a big problem.
And once again, continuing is NEVER safe, and not continuing means
aborting.
I'm kill-filing this thread; it has entered an infinite loop.
--
mailto:av1...@comtv.ru
NEWS FLASH:
David Schwartz announces that software engineering is ultimately futile
and hopeless.
We should all give up and go home. Turn off your news readers, folks;
it's all over.
> This is addressed in a) and b) bellow, in either case the standard
> is violated and all bets are off.
Right, so if all bets are off, you cannot argue that your recommended
course of conduct is better than mine. All bets are off. Yet mine is
plainly faster than yours.
> Sigh. pthread_mutex_lock returned non zero and yet the mutex is acquired,
> likely has nothing to do with it, standard is plainly violated.
> And therefore continuing after pthread_mutex_lock returned non zero is
> never safe since, wether mutex was acquired or not, the behaviour of
> the program is outside the scope of the standard.
I agree. But you have no choice. There is no safe way to stop. Your
only choice is to continue. Even calling 'abort' is a form of
continuing, as your program continues to run, calls 'abort', and who
knows what that will do.
> And once again, continuing is NEVER safe, and not continuing means
> aborting.
1) Continuing is safe in the case where we actually acquired the
mutex.
2) Aborting is not safe in the case where the process ID is corrupted.
The next line of code in normal operation might be a call to 'getpid'
which might restore the cached PID copy.
How do you know which case we are in? You don't. You're just guessing
that one case is more likely than another.
DS
> NEWS FLASH:
>
> David Schwartz announces that software engineering is ultimately futile
> and hopeless.
>
> We should all give up and go home. Turn off your news readers, folks;
> it's all over.
I am not arguing that it is always futile and hopeless. I am arguing
that there are cases where it is futile and hopeless. For example:
if( (2+2) != 4 )
{
/* what code can you possibly put here? */
}
This is a case where there is nothing sane to put there, and your best
bet is simply to eliminate the test. Why waste time calculating the
sum and comparing it if there is nothing sane you can do if the
comparison fails?
I am saying that this case is just like that case. (With the sole
exception of ENOMEM on first acquisition.)
DS
> On Jun 25, 4:23 am, Dave Butenhof <david.buten...@hp.com> wrote:
>
>> NEWS FLASH:
>>
>> David Schwartz announces that software engineering is ultimately futile
>> and hopeless.
> I am not arguing that it is always futile and hopeless. I am arguing
> that there are cases where it is futile and hopeless. For example:
>
> if( (2+2) != 4 )
> {
> /* what code can you possibly put here? */
> }
abort();
:-)
You're unlikely to ever see this happen, of course (never mind the fact
that it's so unlikely that writing this code in the first place is a
bizarre waste of time and we're in "silly territory" no matter where we
go with it). Even so, if it happens it doesn't mean "life as we know it
is over"; it just means you have a broken compiler or machine.
Then again, you got this far. The OS started; and that's an enormous
amount of code. It mapped your process address space and loaded your
binary presumably more or less as you had compiled it, and began
execution. Unless this is the first statement in main() (and there are
no static constructors, etc.), even your application has probably been
"doing stuff" successfully for what in computer terms amounts to "a
while". Why would you conceivably want to imagine that there's some
sudden universal failure mode and nothing SUBSEQUENT to this statement
will work, after so much before has?
Now maybe just this nanosecond the computer got gamma-rayed into "the
Computerized Hulk" and all is truly hopeless. Nothing you do will be
worse than doing nothing, however.
And most of the time it probably means some much more subtle Pentium add
bug that will allow reasonable cleanup actions to proceed normally.
Look at it this way. A customer might well accept "I did everything
reasonable to try to handle the failure, but really what would you
expect me to accomplish when 2 + 2 doesn't equal 4?" ("Sorry I crashed
your plane, or your spaceship", has a less happy ring to it, but let's
not go there.) I expect the customer might be just slightly less happy
with "The error that just destroyed your financial database is
impossible, so I didn't bother trying to deal with it."
Covering your eyes may save you from the Ravenous Bugblatter Beast of
Traal; but it's rarely a good strategy anywhere else. ;-)
> }
>
> This is case where there is nothing sane to put there, and your best
> bet is simply to eliminate the test. Why waste time calculating the
> sum and comparing it if there is nothing sane you can do if the
> comparison fails?
>
> I am saying that this case is just like that case. (With the sole
> exception of ENOMEM on first acquisition.)
Sure, in some radically extreme cases any error (or for that matter a
LACK of any error) might really mean "the sky is falling". If you're
going to live your life and design your code assuming that, then the
message is exactly as I somewhat whimsically paraphrased it. You can't
count on anything, you can't do anything, "game over".
The point is that most of the time the failure is local and constrained,
and you CAN clean up. You owe it to yourself and your users to take the
time to analyze the consequences and design a solution. Sometimes an immediate
core dump is the safest failure mode, if you can manage that. Sometimes
you're best off trying a graceful shutdown. Making those evaluations and
decisions, and implementing and testing them, is the soul of software
engineering. That's the point, the whole point, and nothing but the point.
There simply isn't any single universal rule for these cases; and "throw
up your hands in despair" doesn't sound like a very good strategy for
much of anything, engineering or not.
And you seem to be proceeding from the axiom that there is no middle
ground between "something might still be done" and "any attempt to react
to the problem will fail". But even then, why it would be less futile to
continue running than to try to react to the error is still a mystery.
You said earlier that even when checking for mutex lock error, one
should explicitly pick out the error which "can happen" (ENOMEM) - and
presumably deliberately ignore the rest? Since ignoring "impossible"
errors is what you say one should do. Except you missed one other
"possible" error condition, even ignoring bugs in the program.
>> For example:
>>
>> if( (2+2) != 4 )
>> {
>> /* what code can you possibly put here? */
>> }
>
> abort();
Which happens to be an entirely normal thing to do - it's called
assert() and is commonly used to test for errors that "can't happen".
Amazingly, I've seen an assert(<mutex operation succeeded as it must>)
successfully report an error. For some reason the hosed mutex didn't
prevent abort() from working, and it didn't even hunt down stderr's
mutex and prevent the error message. Which is apparently what is so
likely that one might as well quit trying.
--
Hallvard
That's a really bad idea. The 'abort' command sends a fatal signal to
a process. It's supposed to send it to this process, but if the cached
copy of the current process' PID was corrupted, this will send a fatal
signal to an arbitrary process.
The 'abort' call might help, or it might hurt. I defy you to make a
coherent argument about what the odds of it hurting are and what the
odds of it helping are.
DS
> You're unlikely to ever see this happen, of course, (nevermind the fact
> that its so unlikely that writing this code in the first place is a
> bizarre waste of time and we're in "silly territory" no matter where we
> go with it). Even so, if it happens it doesn't mean "life as we know it
> is over"; it just means you have a broken compiler or machine.
You already made my point for me. Writing code that checks for "silly
territory" errors is "a bizarre waste of time". That's all I was
trying to say from the beginning.
DS
> > If the initialization is not truly static, the library must take care
> > of that behind the scenes, so it must be thread-safe. Yes, your code
> > is "guaranteed" to work, except that pthread_mutex_lock can fail for a
> > reason internal to the implementation, so you need to check the return
> > value for failure.
> I wouldn't bother. There's nothing sane you can do when
> pthread_mutex_lock fails.
Humm...
> This would likely be an indication of some
> kind of memory corruption or fatal error condition. You can't do
> anything sane if locking a mutex fails for no application-level
> reason.
Is EOWNERDEAD going to be standardized?
http://groups.google.com/group/comp.lang.c++/msg/90f8dfcc01d387ef
[...]
I hesitate to get into this discussion because my posts are going to be
separated by a number of days... Anyway, IMHO, you NEED to be able to at
least attempt a recovery from process-wide mutex locking failures; it's not a
bizarre waste of time, indeed it's required... EOWNERDEAD and WAIT_ABANDONED
can be recoverable errors; please read this entire thread before responding:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/b5775d829f3f1259
Thanks.
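For readers following the EOWNERDEAD point: robust mutexes were later added to POSIX (the 2008 edition), so the recovery path can be written with standard calls. The following is only a minimal sketch. The helper names init_robust_mutex and lock_with_recovery, and the repair callback, are hypothetical and not from this thread; only the pthread_* calls and error codes are the real API.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Hypothetical helper: initialize *m as a robust mutex.
// Returns 0 on success or an errno-style error code.
int init_robust_mutex(pthread_mutex_t* m)
{
    assert(m != nullptr);
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (rc == 0) rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

// Hypothetical helper: lock *m and attempt recovery if the previous
// owner died while holding it. `repair` is an application-supplied
// callback (an assumption of this sketch) that restores the protected
// data and returns true if it succeeded.
// Returns 0 only when the caller holds the lock and the data is usable.
int lock_with_recovery(pthread_mutex_t* m, bool (*repair)(void*), void* state)
{
    assert(m != nullptr);
    int rc = pthread_mutex_lock(m);
    if (rc != EOWNERDEAD) return rc;  // 0, or an error we cannot recover from here

    // We do hold the lock now, but the protected data may be inconsistent.
    if (repair != nullptr && repair(state)) {
        rc = pthread_mutex_consistent(m);  // mark the mutex usable again
        if (rc == 0) return 0;
    } else {
        rc = ENOTRECOVERABLE;
    }
    // Unlocking without marking the mutex consistent leaves it permanently
    // unusable: later lock attempts fail with ENOTRECOVERABLE.
    pthread_mutex_unlock(m);
    return rc;
}
```

Whether a repair is actually possible depends entirely on the data the mutex protects; the sketch only shows where the decision point sits.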
You argue that anything can and will happen, that the least conceivable
and most bizarre universal failure mode is at least as likely as the
trivial and obvious local failure mode, and therefore that there's no
point in doing anything; all is lost before you start.
That's just silly. Why are you even saying it? Are you trying to make
some joke we're not getting? Or are you just having a really bad week of
debugging??? ;-)
> You argue that anything can and will happen, that the least conceivable
> and most bizarre universal failure mode is at least as likely as the
> trivial and obvious local failure mode, and therefore that there's no
> point in doing anything; all is lost before you start.
Yes, in this one particular case. If you look at the code, the
"trivial and obvious local failure mode" is also impossible.
> That's just silly. Why are you even saying it? Are you trying to make
> some joke we're not getting? Or are you just having a really bad week of
> debugging??? ;-)
I am saying that there is no point in testing for "can't ever happen"
errors. It's a waste of time and effort. And if you did test for such
an error, you couldn't do anything rational once you detected them.
The point I am responding to is the claim that in this particular
case, you "need to" check for errors.
DS
You're being obtuse, and I don't know why.
PTHREAD_MUTEX_INITIALIZER is applied at compile/load time, long before
program execution may reach the scope where the mutex is first used.
(The OP was using a function-scoped static.)
There are a wide variety of things that could corrupt that data without
affecting, for example, any libc data.
It absolutely is NOT a "cannot happen" (I've seen this often enough in
real code), and in nearly all instances it will be perfectly feasible to
follow normal error handling and graceful shutdown procedures. While
there are certainly going to be some cases where you can't clean up,
that's true for anything. If something scribbles all libc data, you're
not going to be able to recover (or continue) from a malloc or fopen
failure, either.
You seem to be arguing, therefore, and quite insistently, that because
there will always be some small class of errors from which you cannot
recover, that you should never attempt recovery, or even bother testing
for, ANY error at all. I don't see any boundaries or conditions in your
"advice" that make even the slightest bit of sense. There's simply
nothing "special" about locking a mutex, or even the first lock of a
statically initialized mutex, that would make any difference here. You
can't use malloc'ed memory that didn't malloc, or an opened file that
didn't open, or a pipe that didn't create.
If you fail to acquire ANY resource, you can't continue as if nothing
happened. Some failure modes may imply catastrophic corruption of "the
system" from which recovery will prove impossible or infeasible; but
there's no general class of errors for which this will always be true,
and you really can't know the implications of any particular instance
until you try. If recovery fails, at least you've made the attempt. And,
yes, there will even be some extreme failures where any attempt to clean
up might make things even worse. But failing to TRY isn't going to make
it better, either.
So either throw your hands up in the air and shout "I can't write any
reliable code because it might fail", or do the engineering.
"To fail to try is to try to fail," to quote an old aphorism.
> I have a function that is supposed to run only once in my program.
> Though it might be called "simultaneously" by two or more different
> threads. Is the below implementation guaranteed to work? I ask because
> while googling for PTHREAD_MUTEX_INITIALIZER, I read that in some
> cases this might not be a real static initialization but just a hint
> to do initialization on the first call to pthread_mutex_lock. If this
> is the case, what happens if two threads enter this "hidden"
> initialization routine? Is it supposed to be thread-safe?
Yes, as already indicated by others. It would be rather pointless if not.
> Second question: is pthread_once supposed to call a void ( void )
> function with "C" linkage? Is it portable to pass as an argument to
> pthread_once a C++ static member function?
That is compiler dependent. Theoretically, the calling convention of
static C++ member functions and functions with C linkage could be
different, even if both got compiled by the same compiler. However, in
practice my experience is that if you use the same compiler (and the
same version, same settings...) then this works - on all platforms
I worked with.
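If you want to avoid relying on the compiler here, one common approach is a small trampoline with C linkage that forwards to the static member. This is a sketch only: the Config class, the init_calls counter and the function names are made up for illustration.

```cpp
#include <pthread.h>

// Hypothetical class whose one-time setup lives in a static member.
class Config {
public:
    // Static member function: has C++ language linkage.
    static void InitOnce() { ++init_calls; }
    static int init_calls;  // counts how often InitOnce actually ran
};
int Config::init_calls = 0;

// Trampoline with C language linkage, which is the kind of function
// pthread_once is declared to take. It just forwards to the member.
extern "C" void config_init_trampoline(void)
{
    Config::InitOnce();
}

static pthread_once_t config_once = PTHREAD_ONCE_INIT;

// May be called from any number of threads; InitOnce runs exactly once.
// Returns 0 on success or an error number from pthread_once.
int EnsureConfigInitialized()
{
    return pthread_once(&config_once, config_init_trampoline);
}
```

The extra function costs one indirect call at most once per thread entry, and it removes the question of whether the two calling conventions happen to match.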
So long,
Thomas
> You seem to be arguing, therefore, and quite insistently, that because
> there will always be some small class of errors from which you cannot
> recover, that you should never attempt recovery, or even bother testing
> for, ANY error at all. I don't see any boundaries or conditions in your
> "advice" that make even the slightest bit of sense. There's simply
> nothing "special" about locking a mutex, or even the first lock of a
> statically initialized mutex, that would make any difference here. You
> can't use malloc'ed memory that didn't malloc, or an opened file that
> didn't open, or a pipe that didn't create.
The boundary is the error being one that can't ever happen. You do not
need to check for "can't ever happen" errors and it is absurd to
insist on doing so. Specifically, the claim that you "must" check for
errors from pthread_mutex_lock, even in a case where clearly no errors
are possible, is absurd.
You must check for errors when there are defined errors that could
happen in your code. It may be nice to check for errors that could
happen if there are bugs in your code, but it's not required. Being
bug-resistant or corruption-resistant may be nice, but it's not a
must.
> If you fail to acquire ANY resource, you can't continue as if nothing
> happened. Some failure modes may imply catastrophic corruption of "the
> system" from which recovery will prove impossible or infeasible; but
> there's no general class of errors for which this will always be true,
> and you really can't know the implications of any particular instance
> until you try. If recovery fails, at least you've made the attempt. And,
> yes, there will even be some extreme failures where any attempt to clean
> up might make things even worse. But failing to TRY isn't going to make
> it better, either.
This argument can't be right. It would suggest that any program that
catches whatever fatal signal an OS sends when it gets a page fault it
can't resolve due to physical memory exhaustion is broken. That's
clearly not true.
It is certainly important to check for errors you actually expect due
to foreseeable events, such as 'malloc' returning NULL. It is useful
to check for errors that might indicate corruption in your code or
buggy states in a library, but it is absurd to argue that you "must".
> So either throw your hands up in the air and shout "I can't write any
> reliable code because it might fail", or do the engineering.
>
> "To fail to try is to try to fail," to quote an old aphorism.
You've already agreed that double-checking arithmetic (after all, the
adder might have failed) is absurd. It's just a question of where the
line is drawn.
DS
>>If you fail to acquire ANY resource, you can't continue as if nothing
>>happened. ... If recovery fails, at least you've made the attempt. And,
>>yes, there will even be some extreme failures where any attempt to clean
>>up might make things even worse. But failing to TRY isn't going to make
>>it better, either.
> This argument can't be right. It would suggest that any program that
> catches whatever fatal signal an OS sends when it gets a page fault it
> can't resolve due to physical memory exhaustion is broken. That's
> clearly not true.
I don't follow your train of thought. How does "verify whether you
successfully acquired a resource" lead to "don't register to be notified
if we can't satisfy a page fault"?
If we consider pthread_mutex_lock() as "trying to acquire a resource",
it makes sense to check whether it actually succeeded if we care about
correct operation.
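As a rough illustration of what "check whether it actually succeeded" could look like for the OP's function, here is a sketch that reports a failed lock as an exception so normal error handling can take over. The CheckedLock class and the bool return value are my own additions, not something from the original code.

```cpp
#include <pthread.h>
#include <system_error>

// Hypothetical RAII guard: locks in the constructor and throws if the
// lock could not be acquired, so a failure cannot be silently ignored.
class CheckedLock {
public:
    explicit CheckedLock(pthread_mutex_t& m) : mutex_(m)
    {
        const int rc = pthread_mutex_lock(&mutex_);
        if (rc != 0) {
            // pthread functions return the error number directly.
            throw std::system_error(rc, std::generic_category(),
                                    "pthread_mutex_lock");
        }
    }
    ~CheckedLock()
    {
        // Destructors should not throw, so an unlock error is dropped here.
        pthread_mutex_unlock(&mutex_);
    }
    CheckedLock(const CheckedLock&) = delete;
    CheckedLock& operator=(const CheckedLock&) = delete;

private:
    pthread_mutex_t& mutex_;
};

// The OP's function with the lock result checked.
// Returns true if this call performed the one-time work.
bool DoSomethingOnce(int inNumb)
{
    static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    static bool init = false;

    CheckedLock guard(mutex);  // throws std::system_error on failure
    if (init) return false;
    (void)inNumb;              // do stuff
    init = true;
    return true;
}
```

What the caller does with the exception (log and shut down, retry, escalate to a watchdog) is exactly the policy question being argued in this thread; the sketch only makes the failure visible.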
> It is certainly important to check for errors you actually expect due
> to foreseeable events, such as 'malloc' returning NULL. It is useful
> to check for errors that might indicate corruption in your code or
> buggy states in a library, but it is absurd to argue that you "must".
Now you're saying it's useful but not required, but before you said,
"It's a waste of time and effort. And if you did test for such an error,
you couldn't do anything rational once you detected them." Have you
changed your mind based on this thread?
In my own view it all depends what level of correctness we're aiming
for. I work on systems where we monitor for PCI device faults,
uncorrectable ECC faults on memory and CPU buses, SONET and Ethernet
faults, disk controller faults, etc. Checking that we actually got the
lock that we asked for makes sense in those circumstances.
For a quick-n-dirty prototype, I probably wouldn't bother.
Chris
Except that, as has mostly been mentioned already (several times):
- There practically is no such thing, since there can be a compiler or
hardware bug if nothing else.
- You are the only one pretending that a test like ((2+2) != 4) is
particularly relevant, the rest of us are talking about code people
might actually write.
- Less silly tests for errors that clearly "can't happen" can and do
catch bugs - in the program or some component it depends on.
Because programmers are wrong all the time about what "can't happen".
Or they were right at the time they wrote the code, but then the
OS/hardware/version/whatever invented a new failure mode.
- As you demonstrated by recommending to ignore errors which _can_
happen without any bugs, in the code which started this thread.
- And demonstrated again by then recommending that even code which
handles a detected error in that mutex lock, should only handle "the"
error which validly "can happen" - except you missed that there may be
several such valid errors.
- Which would be purely silly anyway: If it makes no sense to test for
an error just because it "can't happen", it makes even less sense to
explicitly test for and exclude it from code which handles a detected
error.
> Specifically, the claim that you "must" check for errors from
> pthread_mutex_lock, even in a case where clearly no errors are
> possible, is absurd.
Another straw man. A quick grep for "must"s in this thread finds no such
claim.
> You must check for errors when there are defined errors that could
> happen in your code.
That's the way "must" has been used a few times in this thread. Which
isn't strictly true either. Depends on how sloppy it's OK for the
program to be vs how serious the error would be. And you yourself have
been saying, on whether you are able to do anything sensible if you do
detect an error.
>> If you fail to acquire ANY resource, you can't continue as if nothing
>> happened. Some failure modes may imply catastrophic corruption of "the
>> system" from which recovery will prove impossible or infeasible; but
>> there's no general class of errors for which this will always be true,
>> and you really can't know the implications of any particular instance
>> until you try. If recovery fails, at least you've made the attempt. And,
>> yes, there will even be some extreme failures where any attempt to clean
>> up might make things even worse. But failing to TRY isn't going to make
>> it better, either.
>
> This argument can't be right. It would suggest that any program that
> catches whatever fatal signal an OS sends when it gets a page fault it
> can't resolve due to physical memory exhaustion is broken.
Huh? A program which doesn't do that doesn't do much recovery in
such a situation either, it just crashes. OTOH if you do catch such
signals, presumably you have a reason for it. If your reason for doing
something - anything - makes it unreasonably hard to catch some types of
errors, then you don't catch those errors. Or if that's not good
enough, you have a watcher program which deals with the program failure.
You keep inventing weird near-theoretical examples which seem to have
little to do with what anyone is talking about. Or maybe I
misunderstood which way you did so - I presume the "class of errors"
above should be "class" as in "classifiable by the program". As
opposed to the error class "the computer was obliterated", for example.
--
Hallvard
This
<http://en.wikipedia.org/wiki/Assertion_(computing)>
might help. :-)
"Since assertions are primarily a development tool, they are often
disabled when a program is released to the public. ... The removal of
assertions from production code is almost always done automatically."
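One practical consequence of that quote for the mutex case: if the lock call itself sits inside assert(), it vanishes along with the check when NDEBUG is defined. A small sketch of two ways around that follows; the helper names require_ok, lock_debug_checked and guarded_increment are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <pthread.h>

// Risky: with -DNDEBUG the whole expression is removed, including the
// lock call itself, so the critical section runs unprotected:
//     assert(pthread_mutex_lock(&m) == 0);

// Hypothetical always-on check: stays active in release builds.
inline void require_ok(int rc, const char* what)
{
    if (rc != 0) {
        std::fprintf(stderr, "%s failed: error %d\n", what, rc);
        std::abort();
    }
}

// Debug-only variant: the lock is always taken, only the check of its
// result disappears when NDEBUG is defined.
inline void lock_debug_checked(pthread_mutex_t* m)
{
    const int rc = pthread_mutex_lock(m);
    assert(rc == 0);
    (void)rc;  // avoids an unused-variable warning under NDEBUG
}

// Example use: increment a counter under the mutex, with checks that
// survive a release build. Returns the new counter value.
int guarded_increment(pthread_mutex_t* m, int* counter)
{
    require_ok(pthread_mutex_lock(m), "pthread_mutex_lock");
    const int value = ++*counter;
    require_ok(pthread_mutex_unlock(m), "pthread_mutex_unlock");
    return value;
}
```

Whether abort() is the right reaction is of course the point under dispute above; the sketch is only about keeping the call and the check separate.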
To DS: care to file a bug report against
PTHREAD_MUTEX_INITIALIZER->pthread_mutex_lock() to allow
pthread_mutex_lock() a non-assertion failure mode for lazily initialized
PTHREAD_MUTEX_INITIALIZER'd mutexes?
regards,
alexander.