
timed_mutex -- but it acquires the lock


Frederick Virchanza Gotham

Feb 7, 2023, 4:25:19 AM

Since C++11, we already have "std::timed_mutex". It will try to acquire the lock for up to N seconds; if it fails, you can handle the failure.
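
For illustration, here's the sort of usage I mean (a minimal sketch; the names are just placeholders):

#include <mutex>  // std::timed_mutex
#include <chrono> // std::chrono_literals

std::timed_mutex m;

void worker()
{
    using namespace std::chrono_literals;

    if ( m.try_lock_for(5s) ) // wait up to 5 seconds for the lock
    {
        // ... critical section ...
        m.unlock();
    }
    else
    {
        // failed to acquire the lock within 5 seconds -- handle it
    }
}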

I want to make another kind of timed mutex, called something like "std::timed_rescuable_mutex", but the difference will be that when you invoke "try_lock_for", it will wait for N seconds, and then if the mutex is still locked, it will kill the thread that locked it, and then unlock the mutex.

Last week I posted code here for a mutex that could be unlocked by a thread other than the one that locked it:

https://groups.google.com/g/comp.lang.c++/c/HtUKQlNUI50

Similarly, I would use an "std::atomic_flag" as a mutex that can be unlocked by any thread.
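
Something along these lines, roughly (an untested sketch; the class name is just a placeholder):

#include <atomic>

class any_thread_unlockable_mutex {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;

public:
    void lock() noexcept
    {
        while ( flag.test_and_set(std::memory_order_acquire) )
            ; // spin until whoever holds it clears the flag
    }

    void unlock() noexcept // note: deliberately callable from any thread
    {
        flag.clear(std::memory_order_release);
    }
};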

A well-written program should work properly 99.9% of the time. There will always be unforeseen problems, like a hardware failure, a yet-to-be-discovered bug in a third-party library, an operating system bug, or another process running at high priority with heavy CPU usage. This is why, when I'm finished writing a program that works very well, I try to add extra robustness to it, to try to make it recover gracefully from unforeseen errors. This means I might add in code to kill a thread that has become unresponsive. This is why I'm thinking I should write a "timed_rescuable_mutex". In my code I would then use it with "std::unique_lock::try_lock_for".

Mut...@dastardlyhq.com

Feb 7, 2023, 4:40:20 AM
On Tue, 7 Feb 2023 01:25:11 -0800 (PST)
Frederick Virchanza Gotham <cauldwel...@gmail.com> wrote:
>Since C++11, we already have "std::timed_mutex". It will try to acquire the
>lock for up to N seconds; if it fails, you can handle the failure.
>
>I want to make another kind of timed mutex, called something like
>"std::timed_rescuable_mutex", but the difference will be that when you invoke
>"try_lock_for", it will wait for N seconds, and then if the mutex is still
>locked, it will kill the thread that locked it, and then unlock the mutex.
>
>Last week I posted code here for a mutex that could be unlocked by a thread
>other than the one that locked it:
>
> https://groups.google.com/g/comp.lang.c++/c/HtUKQlNUI50
>
>Similarly, I would use an "std::atomic_flag" as a mutex that can be unlocked
>by any thread.

Huh? If any thread can unlock a mutex then what's the point of having it in
the first place?

Bo Persson

Feb 7, 2023, 4:48:24 AM
On 2023-02-07 at 10:25, Frederick Virchanza Gotham wrote:
>
> Since C++11, we already have "std::timed_mutex". It will try to acquire the lock for up to N seconds; if it fails, you can handle the failure.
>
> I want to make another kind of timed mutex, called something like "std::timed_rescuable_mutex", but the difference will be that when you invoke "try_lock_for", it will wait for N seconds, and then if the mutex is still locked, it will kill the thread that locked it, and then unlock the mutex.
>

Killing a thread at a random position seems to be a perfect way to enter
an unspecified state for the program.

What if that thread is halfway through reallocating a vector?

Paavo Helde

Feb 7, 2023, 5:24:11 AM
07.02.2023 11:25 Frederick Virchanza Gotham wrote:

>
> A well-written program should work properly 99.9% of the time.
So if our customer wants to analyze yet another batch of their 100,000
microscope images, the program would randomly fail on 100 images. Not a
great selling point.

Frederick Virchanza Gotham

Feb 7, 2023, 8:47:02 AM
On Tuesday, February 7, 2023 at 9:48:24 AM UTC, Bo Persson wrote:
>
> Killing a thread at a random position seems to be a perfect way to enter
> an unspecified state for the program.
>
> What if that thread is halfway through reallocating a vector?

> Paavo Helde wrote:
>
> So if our customer wants to analyze yet another batch of their 100,000
> microscope images, the program would randomly fail on 100 images. Not a
> great selling point.


I would consider what the estimated time is for the task. If one thread is processing 800 images, and if each image should take between 7 and 9 milliseconds, then the whole lot should take between 5.6 seconds and 7.2 seconds. After 10 seconds I can check if the thread is still responsive. After another 5 seconds I can kill it and discard all of its work.
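
Roughly the sort of check I have in mind (an illustrative sketch only; the counter and the numbers are made up to match the example above):

#include <atomic>
#include <chrono>
#include <thread>

std::atomic<unsigned> images_done{0}; // incremented by the worker after each image

void monitor()
{
    using namespace std::chrono;
    auto const deadline = steady_clock::now() + seconds(10);

    while ( steady_clock::now() < deadline )
    {
        if ( images_done.load() >= 800u )
            return; // finished within the expected time
        std::this_thread::sleep_for(milliseconds(100));
    }

    // Deadline passed: treat the worker as unresponsive
    // (what to do about it is the contentious part).
}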


Chris M. Thomasson

Feb 7, 2023, 3:13:14 PM
The thread that locks a mutex MUST be the thread that unlocks it.
Period. If you need something different than that, use a binary
semaphore or something.

Chris M. Thomasson

Feb 7, 2023, 3:14:31 PM
Humm... Are you just starting out with threading?

Chris M. Thomasson

Feb 7, 2023, 3:22:37 PM
It's horrible. Well, I had to deal with robust mutexes many moons ago.
If a process dies while holding a mutex, we can try to recover and
repair its state:

https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_setrobust.html

Windows even has WAIT_ABANDONED.

Have fun! It can be done, but it is not exactly fun...
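
From memory, the recovery dance looks roughly like this (a sketch only, assuming a process-shared robust mutex living in shared memory, with the repair step supplied by the application):

#include <pthread.h>
#include <errno.h> // EOWNERDEAD

void init_robust(pthread_mutex_t *m) // 'm' assumed to live in shared memory
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
}

void lock_with_recovery(pthread_mutex_t *m)
{
    if ( pthread_mutex_lock(m) == EOWNERDEAD )
    {
        /* The previous owner died while holding the lock:
           repair the protected data here, then mark the
           mutex consistent so it remains usable. */
        pthread_mutex_consistent(m);
    }
}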


>
> What if that thread is halfway through reallocating a vector?
>

Yikes!

Frederick Virchanza Gotham

Feb 7, 2023, 5:18:02 PM
On Tuesday, February 7, 2023 at 8:22:37 PM UTC, Chris M. Thomasson wrote:

> It's horrible. Well, I had to deal with robust mutexes many moons ago.


I've worked with robust mutexes in Linux; if the thread that acquired the lock dies, the lock is released.

However I'm trying to accommodate a situation in which a worker thread freezes -- the thread's still alive but it's caught in an infinite loop or a blocking function won't return.

In your previous post you said that the thread that unlocks a mutex must be the one that locked it ... well, I'll use an atomic_flag like I described in the link I gave in my original post.

Chris M. Thomasson

Feb 7, 2023, 5:24:47 PM
On 2/7/2023 2:17 PM, Frederick Virchanza Gotham wrote:
> On Tuesday, February 7, 2023 at 8:22:37 PM UTC, Chris M. Thomasson wrote:
>
>> It's horrible. Well, I had to deal with robust mutexes many moons ago.
>
>
> I've worked with robust mutexes in Linux; if the thread that acquired the lock dies, the lock is released.

No. Not threads. Processes. If a thread dies in a process it will take
down the whole process.

Chris M. Thomasson

Feb 7, 2023, 5:29:10 PM
On 2/7/2023 2:17 PM, Frederick Virchanza Gotham wrote:
Huh? Are you creating a mutex or a binary semaphore? They are quite
different.

Chris M. Thomasson

Feb 7, 2023, 5:49:44 PM
On 2/7/2023 2:24 PM, Chris M. Thomasson wrote:
> On 2/7/2023 2:17 PM, Frederick Virchanza Gotham wrote:
>> On Tuesday, February 7, 2023 at 8:22:37 PM UTC, Chris M. Thomasson wrote:
>>
>>> It's horrible. Well, I had to deal with robust mutexes many moons ago.
>>
>>
>> I've worked with robust mutexes in Linux; if the thread that acquired
>> the lock dies, the lock is released.
>
> No. Not threads. Processes. If a thread dies in a process it will take
> down the whole process.
>
>
>>
>> However I'm trying to accommodate a situation in which a worker thread
>> freezes -- the thread's still alive but it's caught in an infinite
>> loop or a blocking function won't return.
[...]

That is a lot different than a thread up and crashing in a process. Why
do you think a mutex would help you with that? Something reeks of a bad
design. Using robust mutexes for a single process does not make any
sense at all.

Frederick Virchanza Gotham

Feb 7, 2023, 5:51:42 PM
On Tuesday, February 7, 2023 at 10:24:47 PM UTC, Chris M. Thomasson wrote:

> No. Not threads. Processes. If a thread dies in a process it will take
> down the whole process.

I used the following function:

https://man7.org/linux/man-pages/man3/pthread_mutexattr_setrobust.3.html

Here's an excerpt:

"The robustness attribute specifies the behavior of the mutex when the owning thread dies without unlocking the mutex."

Chris M. Thomasson

Feb 7, 2023, 5:53:17 PM
You might be interested in cancellation points and pthread_cancel, but
it's a bitch and a half to use. There's asynchronous or deferred
cancelling. There are better ways; start with the actual design.
Asynchronous cancelability should be avoided at all costs! Deferred
makes more sense, but still...
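
A rough sketch of the deferred flavour, from memory and untested (the cancel is only acted on at a cancellation point, and the cleanup handler is your one chance to release what the thread holds):

#include <pthread.h>
#include <unistd.h> // pause

static void cleanup(void *arg)
{
    pthread_mutex_unlock((pthread_mutex_t *)arg); // release the lock if we get cancelled
}

void *worker(void *arg)
{
    pthread_mutex_t *m = (pthread_mutex_t *)arg;

    int oldtype;
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &oldtype); // the default anyway

    pthread_mutex_lock(m);
    pthread_cleanup_push(cleanup, m);

    for (;;)
        pause(); // pause() is a cancellation point; a deferred cancel lands here

    pthread_cleanup_pop(1); // unreachable, but push/pop must be lexically paired
    return nullptr;
}

// elsewhere: pthread_cancel(tid); then pthread_join(tid, nullptr);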

Chris M. Thomasson

Feb 7, 2023, 5:54:00 PM
If the owning thread dies, it means the process has died. robust mutexes
are for inter process sync.

Scott Lurndal

Feb 7, 2023, 6:01:45 PM
The owning thread could exit (or be cancelled) without releasing the mutex
and without killing the process.

That said, I think the application should be designed such that robust
mutexes are not required; Just because a robust mutex automatically
becomes unlocked - that doesn't prevent the state protected by the mutex
from being partially updated and hence buggy when the robust mutex was released.

If the dead thread had been in the middle of a linked list update, for
example, the list could end up with a loop or sans a bunch of entries.


Chris M. Thomasson

Feb 7, 2023, 7:06:46 PM
On 2/7/2023 3:01 PM, Scott Lurndal wrote:
> "Chris M. Thomasson" <chris.m.t...@gmail.com> writes:
>> On 2/7/2023 2:51 PM, Frederick Virchanza Gotham wrote:
>>> On Tuesday, February 7, 2023 at 10:24:47 PM UTC, Chris M. Thomasson wrote:
>>>
>>>> No. Not threads. Processes. If a thread dies in a process it will take
>>>> down the whole process.
>>>
>>> I used the following function:
>>>
>>> https://man7.org/linux/man-pages/man3/pthread_mutexattr_setrobust.3.html
>>>
>>> Here's an excerpt:
>>>
>>> "The robustness attribute specifies the behavior of the mutex when the owning thread dies without unlocking the mutex."
>>
>> If the owning thread dies, it means the process has died. robust mutexes
>> are for inter process sync.
>
> The owning thread could exit (or be cancelled) without releasing the mutex
> and without killing the process.

Touché. Yeah, well, I was thinking of "died" as in a seg fault without
handling it. I have never had a use for the following:

https://linux.die.net/man/3/pthread_kill

Never had any use for pthread_cancel. I have only used robust mutexes
for inter process communication. Using them for anything else is really
strange to me. Threads just up and dying in a process is NOT kosher, in
my humble opinion. However, a process dying is a different story.


> That said, I think the application should be designed such that robust
> mutexes are not required; Just because a robust mutex automatically
> becomes unlocked - that doesn't prevent the state protected by the mutex
> from being partially updated and hence buggy when the robust mutex was released.
>
> If the dead thread had been in the middle of a linked list update, for
> example, the list could end up with a loop or sans a bunch of entries.

Yup. Iirc, I had to do some tricky rollback operations back when I had
to use them. A process dying had to be handled. I never had to rely on
robust mutexes for intra process sync, only inter process.

Chris M. Thomasson

Feb 7, 2023, 7:08:20 PM
On 2/7/2023 1:25 AM, Frederick Virchanza Gotham wrote:
>
> Since C++11, we already have "std::timed_mutex". It will try to acquire the lock for up to N seconds; if it fails, you can handle the failure.
>
> I want to make another kind of timed mutex, called something like "std::timed_rescuable_mutex", but the difference will be that when you invoke "try_lock_for", it will wait for N seconds, and then if the mutex is still locked, it will kill the thread that locked it, and then unlock the mutex.
[...]

My god, that is a bad idea.

Frederick Virchanza Gotham

Feb 8, 2023, 1:37:25 PM
On Wednesday, February 8, 2023 at 12:08:20 AM UTC, Chris M. Thomasson wrote:
> > I want to make another kind of timed mutex, called something like "std::timed_rescuable_mutex", but the difference will be that when you invoke "try_lock_for", it will wait for N seconds, and then if the mutex is still locked, it will kill the thread that locked it, and then unlock the mutex.
> [...]
>
> My god, that is a bad idea.

is it worse than the entire program locking up because one thread has frozen?

I'm proposing a solution that is less bad than the problem. Better to kill a frozen thread and discard its work, than have the entire program lock up.

Paavo Helde

Feb 8, 2023, 1:52:43 PM
If your program has a bug so that a thread gets stuck, then find and fix
the bug. Do not try to hide it or to work around it, it will just cause
endless pain and in the end you still need to fix the bug.

Do not forget to create an automated test case to ensure the bug is gone
and does not come back.

Ah, I know you won't listen to me anyway, but maybe others do.

Chris M. Thomasson

Feb 8, 2023, 3:36:31 PM
On 2/8/2023 10:37 AM, Frederick Virchanza Gotham wrote:
> On Wednesday, February 8, 2023 at 12:08:20 AM UTC, Chris M. Thomasson wrote:
>>> I want to make another kind of timed mutex, called something like "std::timed_rescuable_mutex", but the difference will be that when you invoke "try_lock_for", it will wait for N seconds, and then if the mutex is still locked, it will kill the thread that locked it, and then unlock the mutex.
>> [...]
>>
>> My god, that is a bad idea.
>
> is it worse than the entire program locking up because one thread has frozen?

When you wrote that a thread can wind up getting itself into an infinite
loop, what exactly do you mean? A loop that is just running, burning up
the CPU, doing nothing?

for (;;)
{

}


?

> I'm proposing a solution that is less bad than the problem. Better to kill a frozen thread and discard its work, than have the entire program lock up.

Why are your threads getting "frozen" and how does it end up locking up
the entire program? Don't tell me you are holding a mutex while the
thread blocks on io... That is a "classic" bad design issue.

Chris M. Thomasson

Feb 8, 2023, 3:38:16 PM
On 2/8/2023 10:52 AM, Paavo Helde wrote:
> 08.02.2023 20:37 Frederick Virchanza Gotham wrote:
>> On Wednesday, February 8, 2023 at 12:08:20 AM UTC, Chris M. Thomasson
>> wrote:
>>>> I want to make another kind of timed mutex, called something like
>>>> "std::timed_rescuable_mutex", but the difference will be that when
>>>> you invoke "try_lock_for", it will wait for N seconds, and then if
>>>> the mutex is still locked, it will kill the thread that locked it,
>>>> and then unlock the mutex.
>>> [...]
>>>
>>> My god, that is a bad idea.
>>
>> is it worse than the entire program locking up because one thread has
>> frozen?
>>
>> I'm proposing a solution that is less bad than the problem. Better to
>> kill a frozen thread and discard its work, than have the entire
>> program lock up.
>
> If your program has a bug so that a thread gets stuck, then find and fix
> the bug. Do not try to hide it or to work around it, it will just cause
> endless pain and in the end you still need to fix the bug.

Big time.


> Do not forget to create an automated test case to ensure the bug is gone
> and does not come back.

Taking the time to model a synchronization scheme in Relacy is a good thing:

https://github.com/dvyukov/relacy

Frederick Virchanza Gotham

Feb 8, 2023, 6:14:45 PM
On Wednesday, February 8, 2023 at 6:52:43 PM UTC, Paavo Helde wrote:

> If your program has a bug so that a thread gets stuck, then find and fix
> the bug. Do not try to hide it or to work around it, it will just cause
> endless pain and in the end you still need to fix the bug.


There hasn't been a bug reported yet.


> Do not forget to create an automated test case to ensure the bug is gone
> and does not come back.


I'm trying to make my program more robust and resilient, and so even though it seems to work fine in the handful of hours I test it, I don't assume that it always works perfectly in the dozens of hours that other people use it. I haven't had a thread freeze up on me yet......... but if it does happen in the future, I'd like to be ready for it.

Of course, though, I might make my program *less* reliable by adding in precautionary stuff.

First I'll write the code for 'timeout_mutex' and see how it looks.

Frederick Virchanza Gotham

Feb 8, 2023, 8:33:57 PM
On Wednesday, February 8, 2023 at 11:14:45 PM UTC, Frederick Virchanza Gotham wrote:

> First I'll write the code for 'timeout_mutex' and see how it looks.


Here's the beginnings:

https://godbolt.org/z/6rj355G7c

Of course there are race conditions in the setting of 'id' and 't' but I'll spend more time on it.

Öö Tiib

Feb 8, 2023, 8:45:59 PM
On Thursday, 9 February 2023 at 01:14:45 UTC+2, Frederick Virchanza Gotham wrote:
> On Wednesday, February 8, 2023 at 6:52:43 PM UTC, Paavo Helde wrote:
>
> > If your program has a bug so that a thread gets stuck, then find and fix
> > the bug. Do not try to hide it or to work around it, it will just cause
> > endless pain and in the end you still need to fix the bug.
>
> There hasn't been a bug reported yet.

If there is a bug, then such tricks can only make discovering it harder.
Your program has diagnosed its own madness of being stuck. Instead
of alerting and asking for aid, it tries to hide it and do brain surgery on
itself. The result of that is very unlikely to be correct.

Paavo Helde

Feb 9, 2023, 2:14:25 AM
08.02.2023 20:37 Frederick Virchanza Gotham wrote:
BTW, having a program locked up is perfect, because then you can attach
a debugger, study the thread backtraces and can figure out the bug
easily. I just did that a couple of days ago. The most problematic part
was the setup of a remote debugger as the bug only appeared on a single
particular Windows machine which did not have VS installed.

An alternative would be to kill the whole process and create a core
dump or a minidump for later examination. Killing a single thread on
the spot is not a solution, because it would leave the process in an
unknown and potentially invalid state, plus it would hide the bug and
make fixing it harder.

There are ways to cleanly shut down a thread, but this requires
cooperation from the thread itself. In C++ this can be done by e.g.
std::jthread::request_stop().
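
For example, a minimal sketch of that cooperative approach (illustrative only):

#include <thread>     // std::jthread (C++20)
#include <stop_token> // std::stop_token
#include <chrono>

void worker(std::stop_token st)
{
    using namespace std::chrono_literals;
    while ( !st.stop_requested() )
    {
        // do one bounded chunk of work, then check the token again
        std::this_thread::sleep_for(10ms);
    }
    // clean up and return normally
}

int main()
{
    std::jthread t(worker); // jthread passes its stop_token as the first argument
    std::this_thread::sleep_for(std::chrono::seconds(1));
    t.request_stop();       // polite request; the destructor would also do this and then join
}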

David Brown

Feb 9, 2023, 3:23:38 AM
On 09/02/2023 00:14, Frederick Virchanza Gotham wrote:
> On Wednesday, February 8, 2023 at 6:52:43 PM UTC, Paavo Helde wrote:
>
>> If your program has a bug so that a thread gets stuck, then find
>> and fix the bug. Do not try to hide it or to work around it, it
>> will just cause endless pain and in the end you still need to fix
>> the bug.
>
>
> There hasn't been a bug reported yet.
>
>
>> Do not forget to create an automated test case to ensure the bug is
>> gone and does not come back.
>
>
> I'm trying to make my program more robust and resilient, and so even
> though it seems to work fine in the handful of hours I test it, I
> don't assume that it always works perfectly in the dozens of hours
> that other people use it. I haven't had a thread freeze up on me
> yet......... but if it does happen in the future, I'd like to be
> ready for it.
>

Certainly bugs can happen in code - no developer gets things right /all/
the time. Put your emphasis on making it easier to write correct code,
harder to write incorrect code, and easier to find the bugs, rather than
trying to add software methods to handle software errors.

> Of course, though, I might make my program *less* reliable by adding
> in precautionary stuff.
>

Exactly - it's good that you realise that. Such check features are
usually untestable, and they will almost certainly make things worse.

> First I'll write the code for 'timeout_mutex' and see how it looks.

A half dozen experienced developers have told you it is a terrible idea.
Consider taking the hint.

Frederick Virchanza Gotham

Feb 9, 2023, 4:24:36 AM

I do extreme testing on my programs. I compile them with "-fsanitize=address" and then deliberately try to get them to crash (for example, send them character strings when they're expecting numbers, or click a push button 20 times in 8 seconds, or delete the open file while they're reading from it, or pull out the USB cable while they're communicating).

But let's say that I write a program perfectly. Let's say that I've accounted for every possible corner case, and that my program will always do its job properly. Now let's say in one of the worker threads, I have:

#include <sys/select.h> // select, fd_set, FD_ZERO, FD_SET
#include <unistd.h>     // read
#include <stdint.h>     // intptr_t
#include <cstdio>       // perror, printf

char global_buf[4096];  // stand-in for the program's real global buffer

int Thread_Entry_Point(void *const arg)
{
    intptr_t const f = reinterpret_cast<intptr_t>(arg); // file descriptor

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(f, &readfds); // watch f for readability

    struct timeval timeout;
    timeout.tv_sec = 0;
    timeout.tv_usec = 10000;

    int const rv = select(f + 1, &readfds, nullptr, nullptr, &timeout); // nfds is highest fd + 1

    if (rv == -1)
        perror("select"); /* an error occurred */
    else if (rv == 0)
        printf("timeout"); /* a timeout occurred */
    else
        read(f, global_buf, sizeof global_buf); /* there was data to read */

    return 0;
}

Here's what it says in the Linux programmer's manual about the 'select' function:

"The timeout argument is a timeval structure
that specifies the interval that select() should block
waiting for a file descriptor to become ready. The call
will block until either:
• a file descriptor becomes ready;
• the call is interrupted by a signal handler; or
• the timeout expires."

Now let's say that the 'select' function, around 1 time in 650,000, never returns. Let's say the user plugs a USB stick into their laptop just as 'select' is called, or there's a CPU spike at that moment, or some other unforeseen event causes 'select' to block forever. I'm trying to account for circumstances such as these -- stuff like operating system issues. Computers aren't as perfect as some people think they are; if you copy one terabyte of data from one place to another, you'll have at least 1 bit wrong.

If I write code for a worker thread that never ever ever should take more than 5 seconds to do its job, then killing it at 10 seconds, discarding its data and releasing its locks is better than allowing it to keep the lock and keep the entire program frozen.

On Thursday, February 9, 2023 at 8:23:38 AM UTC, David Brown wrote:
>
> A half dozen experienced developers have told you it is a terrible idea.
> Consider taking the hint.

I do take advice and listen to other people's ideas, but I'm also aware of my own creativity. There's stuff I never would have pulled off if I'd acquiesced to everyone else -- the 'fat binary' for Windows and Linux being one example.

Frederick Virchanza Gotham

Feb 9, 2023, 5:06:22 AM
On Thursday, February 9, 2023 at 9:24:36 AM UTC, Frederick Virchanza Gotham wrote:
>
> If I write code for a worker thread that never ever ever should take more than 5 seconds to do its job,
> then killing it at 10 seconds, discarding its data and releasing its locks is better than allowing it to keep
> the lock and keep the entire program frozen.

I did a little more work on it. You can see it on GodBolt here:

https://godbolt.org/z/8G6s6Y3dx

and also I've copy-pasted it:

#include <cassert> // assert
#include <cstdint> // uint_fast32_t
#include <atomic> // atomic_flag, atomic<T>
#include <mutex> // mutex, condition_variable
#include <chrono> // steady_clock, milliseconds, years
#include <condition_variable> // condition_variable

#include <pthread.h> // pthread_t, pthread_equal, pthread_self
#include <signal.h> // pthread_kill

class timeout_mutex {

    // Flag to be used as the main lock (like a mutex)
    std::atomic_flag flag = ATOMIC_FLAG_INIT;

    // Combination of mutex and condition_variable
    // for waiting for a specific time interval
    std::mutex mtx_for_cv{};
    std::condition_variable cv{};

    // Information about the currently-held lock
    std::atomic<pthread_t> id{};
    std::atomic<std::chrono::time_point<std::chrono::steady_clock>> t =
        std::chrono::steady_clock::now() + std::chrono::years(65535u);

    // A flag to be used as a lock to ensure that two threads
    // don't simultaneously try to kill another thread
    std::atomic_flag flag_for_killing = ATOMIC_FLAG_INIT;

public:
    void lock_for_max(std::uint_fast32_t const millis) noexcept(false)
    {
        for (;;)
        {
            using namespace std::chrono;

            if ( false == flag.test_and_set() ) // Try to acquire main lock
            {
                id = pthread_self();
                t = steady_clock::now() + milliseconds(millis);
                return;
            }

            // If control reaches here, we failed to acquire the lock
            // So now we wait until one of two things happen:
            // 1) The lock becomes available
            // 2) The timeout expires

            { // Braces to ensure reduced scope for objects within
                std::unique_lock<std::mutex> mylock(mtx_for_cv);
                cv.wait_until(mylock, t.load()); // might get spurious wake-up here
            }

            if ( t.load() >= steady_clock::now() ) continue; // No killing before timeout expired
            if ( false == flag.test() ) continue; // No killing if lock has become available

            // If control reaches here, we should kill the thread,
            // however two threads might simultaneously reach this
            // point, so we need another atomic_flag as a lock
            if ( false == flag_for_killing.test_and_set() )
            {
                // One last check that main lock hasn't been released
                if ( flag.test() )
                {
                    pthread_kill(id, SIGKILL); // or maybe pthread_cancel ?
                    // maybe put a delay here and check that the thread is dead
                    flag.clear();
                }

                flag_for_killing.clear();
            }
        }
    }

    void unlock(void) noexcept
    {
        // The calling thread might be in the middle
        // of being killed right now, so acquire
        // the 'killing lock' before proceeding
        if ( flag_for_killing.test_and_set() ) return;

        assert( pthread_equal(pthread_self(), id) ); // Ensure current thread is the one that locked it
        assert( flag.test() ); // Ensure lock is still locked

        using namespace std::chrono;
        t = steady_clock::now() + years(65535u);
        id = {};
        flag_for_killing.clear();
        flag.clear();
        cv.notify_all();
    }
};

int main(void)
{

}

Chris M. Thomasson

Feb 9, 2023, 7:02:20 AM
On 2/8/2023 3:14 PM, Frederick Virchanza Gotham wrote:
> On Wednesday, February 8, 2023 at 6:52:43 PM UTC, Paavo Helde wrote:
>
>> If your program has a bug so that a thread gets stuck, then find and fix
>> the bug. Do not try to hide it or to work around it, it will just cause
>> endless pain and in the end you still need to fix the bug.
>
>
> There hasn't been a bug reported yet.
[...]

The plane went down... In this highly contrived scenario, the people who
died can talk to each other after they died! Bear with me...

Hey man, "There hasn't been a bug reported yet."

Hummm....

Frederick Virchanza Gotham

Feb 9, 2023, 7:16:49 AM
On Thursday, February 9, 2023 at 12:02:20 PM UTC, Chris M. Thomasson wrote:
>
> The plane went down... In this highly contrived scenario, the people who
> died can talk to each other after the died! Bear with me...
>
> Hey man, "There hasn't been a bug reported yet."
>
> Hummm....


I'm in a theatre group so I can appreciate your input. But sometimes I want less poetry and more fact.

I posted code earlier today showing the beginnings of my "timeout_mutex". If you want to point out a specific weakness, please do so.

Chris M. Thomasson

Feb 9, 2023, 7:28:45 AM
Model it in Relacy.

Paavo Helde

Feb 9, 2023, 7:37:07 AM
You used pthread_kill() there. It will kill the thread, but with these
specific weaknesses, and more:

- any dynamically allocated memory which was going to be released by
this thread, is leaked.

- any mutexes locked by the thread (other than the single one you
bothered to unlock) remain locked and cannot be unlocked, potentially
causing further deadlocks.

- any files and other external resources opened by the thread remain
open and may cause access denials, or consume unneeded resources.

- any content written to output files and other IO might remain
inconsistent or in non-flushed buffers.

- any changes which the thread made under the now-unlocked mutex
might have been left uncompleted, leaving the data structure in an
invalid state. As the mutex is now forcefully unlocked, there is nothing
to prevent other threads from accessing the invalid data.

There are probably more. For me personally, a single such "specific
weakness" is more than enough to abandon the whole idea.


Frederick Virchanza Gotham

Feb 9, 2023, 11:34:35 AM
On Thursday, February 9, 2023 at 12:37:07 PM UTC, Paavo Helde wrote:
>
> You used pthread_kill() there.


I'll also take a look at pthread_cancel.


> It will kill the thread, but with these
> specific weaknesses, and more:
>
> - any dynamically allocated memory which was going to be released by
> this thread, is leaked.


I could overload 'operator new' and maintain a list of memory allocations made by "std::this_thread::get_id()". So then when I kill the thread, I can free the thread's memory.


> - any mutexes locked by the thread (other than the single one you
> bothered to unlock) remain locked and cannot be unlocked, potentially
> causing further deadlocks.


There's less than 5 locks in the entire program so this won't be difficult to make sure of.


> - any files and other external resources opened by the thread remain
> open and may cause access denials, or consume unneeded resources.


Again, I can keep an eye on devices opened, like COM ports, and any open files, and any global resource.


> - any content written to output files and other IO might remain
> inconsistent or in non-flushed buffers.


This will be the case anyway if the thread freezes.


> - any changes which the thread made under the now-unlocked mutex
> might have been left uncompleted, leaving the data structure in an
> invalid state.
> As the mutex is now forcefully unlocked, there is nothing
> to prevent other threads from accessing the invalid data.


I would make sure to discard all of the thread's work after killing it so that other threads don't access garbage. I would do this before releasing the lock.


> There are probably more. For me personally, a single such "specific
> weakness" is more than enough to abandon the whole idea.


I think all of these perils you've mentioned are better than the application just locking up, and MS-Windows showing a progress bar saying "We're contacting Microsoft to tell them this program crashed".

David Brown

Feb 9, 2023, 2:25:59 PM
On 09/02/2023 17:34, Frederick Virchanza Gotham wrote:
> On Thursday, February 9, 2023 at 12:37:07 PM UTC, Paavo Helde wrote:
>>
>> You used pthread_kill() there.
>
>
> I'll also take a look at pthread_cancel.
>
>
>> It will kill the thread, but with these specific weaknesses, and
>> more:
>>
>> - any dynamically allocated memory which was going to be released
>> by this thread, is leaked.
>
>
> I could overload 'operator new' and maintain a list of memory
> allocations made by "std::this_thread::get_id()". So then when I kill
> the thread, I can free the thread's memory.
>
>

That is not sufficient. When you are killing a thread from the outside,
it could be killed at any time. That can include while it is in the
middle of an allocation, which may leave the various memory-tracking
structures in an inconsistent state. The same applies to all kinds of
resources and data structures.

The OS will clear up most resources if a process is killed. But if you
just kill a thread, you have no way to be sure of re-establishing
consistency and a working process. You can get some level of assurance
by implementing your own memory management system (a lot more than just
overriding new), and other resource tracking - but with all the effort
involved, and the overheads, you would be far better off using multiple
processes instead of multiple threads.

Note that no matter how fantastic your test procedures are, you will
probably not find inconsistency bugs in testing. But they can still
happen - traditionally, they occur after you have deployed thousands of
systems or when you are demonstrating the program for your biggest
potential customer. Testing can only ever prove the /existence/ of
bugs, never their /absence/ - for that, you need to design the software
so that they can't happen.

Basically, you are trying to do your robustness here at the wrong level.
Threads live and die together. If you need a "watchdog" system to
monitor and kill a misbehaving task, kill the process, not a thread.

Scott Lurndal

Feb 9, 2023, 2:39:59 PM
David Brown <david...@hesbynett.no> writes:
>On 09/02/2023 17:34, Frederick Virchanza Gotham wrote:
>> On Thursday, February 9, 2023 at 12:37:07 PM UTC, Paavo Helde wrote:
>>>
>>> You used pthread_kill() there.
>>
>>
>> I'll also take a look at pthread_cancel.
>>
>>
>>> It will kill the thread, but with these specific weaknesses, and
>>> more:
>>>
>>> - any dynamically allocated memory which was going to be released
>>> by this thread, is leaked.
>>
>>
>> I could overload 'operator new' and maintain a list of memory
>> allocations made by "std::this_thread::get_id()". So then when I kill
>> the thread, I can free the thread's memory.
>>
>>
>
>That is not sufficient. When you are killing a thread from the outside,
>it could be killed at any time. That can include while it is in the
>middle of an allocation, which may leave the various memory-tracking
>structures in an inconsistent state. The same applies to all kinds of
>resources and data structures.

Particularly malloc/free and struct FILE (the corresponding mutexes in libc
used for thread-safe access to struct FILE or the heap allocator won't be
using Frederick's new soi-disant killable mutex).

In the datacenters I've visited, they use a watchdog process that
detects unexpected inactivity[*] from a server process and kills then restarts
the entire server process when the watchdog fires.

[*] A ping via a pipe, updated timestamp on disk file, inbound UDP packet,
updated field in mmap(MAP_SHARED) or shmat() region, etc.
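
Roughly that shape, in the pipe-heartbeat variant (an illustrative sketch only; the server's real work is elided):

#include <sys/wait.h> // waitpid
#include <unistd.h>   // fork, pipe, read, close, _exit
#include <poll.h>     // poll
#include <signal.h>   // kill, SIGKILL
#include <cstdlib>    // EXIT_SUCCESS

int main()
{
    for (;;) // restart loop
    {
        int fds[2];
        pipe(fds); // child writes a heartbeat byte, watchdog reads it

        pid_t const pid = fork();
        if ( pid == 0 ) // child: the real server process
        {
            close(fds[0]);
            // ... server loop: write(fds[1], "x", 1) periodically ...
            _exit(EXIT_SUCCESS);
        }

        close(fds[1]); // parent: the watchdog
        struct pollfd p = { fds[0], POLLIN, 0 };

        for (;;)
        {
            int const n = poll(&p, 1, 30000); // expect a heartbeat at least every 30 s
            if ( n <= 0 || !(p.revents & POLLIN) )
                break; // no heartbeat in time: give up on this child
            char c;
            if ( read(fds[0], &c, 1) <= 0 )
                break; // child exited or closed the pipe
        }

        kill(pid, SIGKILL);       // kill the whole process, not a thread
        waitpid(pid, nullptr, 0); // reap it, then loop round and restart
        close(fds[0]);
    }
}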

Paavo Helde

Feb 9, 2023, 4:25:16 PM
09.02.2023 18:34 Frederick Virchanza Gotham wrote:
> On Thursday, February 9, 2023 at 12:37:07 PM UTC, Paavo Helde wrote:
>>
>> You used pthread_kill() there.
>
>
> I'll also take a look at pthread_cancel.
>
>
>> It will kill the thread, but with these
>> specific weaknesses, and more:
>>
>> - any dynamically allocated memory which was going to be released by
>> this thread, is leaked.
>
>
> I could overload 'operator new' and maintain a list of memory allocations made by "std::this_thread::get_id()". So then when I kill the thread, I can free the thread's memory.

You sure have lots of free time for doing meaningless things. But hey,
that's why we call them hobbies, right?

>
>> - any mutexes locked by the thread (other than the single one you
>> bothered to unlock) remain locked and cannot be unlocked, potentially
>> causing further deadlocks.
>
>
> There's less than 5 locks in the entire program so this won't be difficult to make sure of.


I got the impression you proposed your timed_mutex as a universal
solution (to some not so clear problems); why else would you advertise
it in a public space? Note that other people might have more than 5
locks in their programs.

>
> I think all of these perils you've mentioned are better than the application just locking up, and MS-Windows showing a progress bar saying "We're contacting Microsoft to tell them this program crashed".

Why would you think that? Has Microsoft called you back and reprimanded
you?

Besides, I believe pthread_kill() does not even compile on Windows, does it?


Chris M. Thomasson

Feb 9, 2023, 5:09:11 PM
Actually... Iirc, pthreads-win32 has a special kernel hack to do async
thread cancellation:

https://sourceware.org/pthreads-win32/

God, it's been ages since I have used that library! Actually, it is pretty
damn good!


Chris M. Thomasson

Feb 9, 2023, 5:17:29 PM
On 2/8/2023 11:14 PM, Paavo Helde wrote:
> 08.02.2023 20:37 Frederick Virchanza Gotham wrote:
>> On Wednesday, February 8, 2023 at 12:08:20 AM UTC, Chris M. Thomasson
>> wrote:
>>>> I want to make another kind of timed mutex, called something like
>>>> "std::timed_rescuable_mutex", but the difference will be that when
>>>> you invoke "try_lock_for", it will wait for N seconds, and then if
>>>> the mutex is still locked, it will kill the thread that locked it,
>>>> and then unlock the mutex.
>>> [...]
>>>
>>> My god, that is a bad idea.
>>
>> is it worse than the entire program locking up because one thread has
>> frozen?
>>
>> I'm proposing a solution that is less bad than the problem. Better to
>> kill a frozen thread and discard its work, than have the entire
>> program lock up.
>
> BTW, having a program locked up is perfect, because then you can attach
> a debugger, study the thread backtraces and can figure out the bug
> easily.

Indeed! Well, perhaps not "easy", so to speak. For instance, I found and
had to debug a full-blown nightmare deadlock in some multi-threaded code
I was asked to consult on, many moons ago. They told me it might run for
days before the deadlock occurred. The funny part is that I found it
during a night of stress testing certain parts of the code. It would only
deadlock when a certain chain of events occurred, in a specific
order... It was a nightmare! It had to do with recursive locking...
YIKES! I had to pause threads at certain points.

Chris M. Thomasson

Feb 9, 2023, 6:07:22 PM
On 2/9/2023 1:24 PM, Paavo Helde wrote:
> 09.02.2023 18:34 Frederick Virchanza Gotham wrote:
>> On Thursday, February 9, 2023 at 12:37:07 PM UTC, Paavo Helde wrote:
>>>
>>> You used pthread_kill() there.
>>
>>
>> I'll also take a look at pthread_cancel.
>>
>>
>>> It will kill the thread, but with these
>>> specific weaknesses, and more:
>>>
>>> - any dynamically allocated memory which was going to be released by
>>> this thread, is leaked.
>>
>>
>> I could overload 'operator new' and maintain a list of memory
>> allocations made by "std::this_thread::get_id()". So then when I kill
>> the thread, I can free the thread's memory.
>
> You sure have lots of free time for doing meaningless things. But hey,
> that's why we call them hobbies, right?
[...]

He should use some of that free time to model his sync scheme in Relacy:

https://www.1024cores.net/home/relacy-race-detector

>

Chris M. Thomasson

Feb 9, 2023, 6:10:42 PM
On 2/9/2023 2:02 AM, Frederick Virchanza Gotham wrote:
> On Thursday, February 9, 2023 at 9:24:36 AM UTC, Frederick Virchanza Gotham wrote:
>>
>> If I write code for a worker thread that never ever ever should take more than 5 seconds to do its job,
>> then killing it at 10 seconds, discarding its data and releasing its locks is better than allowing it to keep
>> the lock and keep the entire program frozen.
[...]

Now, are you trying to deal with real time sync? Look up wait-free sync.

Frederick Virchanza Gotham

Feb 10, 2023, 11:06:58 AM

Okay I'm starting to think that a process might not be salvageable
after any of its threads is killed or cancelled.

So instead maybe I could program a mutex watchdog system something
like as follows:
(1) Periodically check that all the worker threads are still alive.
(2) If one of the watchdog timers times out, do the following:
(2.a) Take a photograph of the GUI window on the screen, i.e. take
a screenshot of it and save it as a bitmap.
(2.b) Save the values of all the widgets to a file (e.g. the text in
text boxes, whether a tick box was ticked, the position
of a slider).
(3) Start a new process that does the following:
(3.a) Displays the aforementioned bitmap on the screen over the real dialog
(3.b) Kills the main program
(3.c) Restarts the main program and loads the widget values from the file
(3.d) Removes the fake bitmap off the screen

The idea here is that the main program would be killed and restarted
without so much as a flicker on the screen.

David Brown

Feb 10, 2023, 11:30:26 AM
On 10/02/2023 17:06, Frederick Virchanza Gotham wrote:
>
> Okay I'm starting to think that a process might not be salvageable
> after any of its threads is killed or cancelled.

Progress!

>
> So instead maybe I could program a mutex watchdog system something
> like as follows:
> (1) Periodically check that all the worker threads are still alive.
> (2) If one of the watchdog timers times out, do the following:
> (2.a) Take a photograph of the GUI window on the screen, i.e. take
> a screenshot of it and save it as a bitmap.
> (2.b) Save the values of all the widgets to a file (e.g. the text in
> text boxes, whether a tick box was ticked, the position
> of a slider).
> (3) Start a new process that does the following:
> (3.a) Displays the aforementioned bitmap on the screen over the real dialog
> (3.b) Kills the main program
> (3.c) Restarts the main program and loads the widget values from the file
> (3.d) Removes the fake bitmap off the screen
>
> The idea here is that the main program would be killed and restarted
> without so much as a flicker on the screen.
>

You are over-thinking this all.

Make a supervisor process that monitors the working process, and kills
then restarts it if necessary. That's fine, and a commonly used
strategy for high availability systems.

Having a method of tracking the existing settings or configuration, then
replicating them on restart is perhaps useful, but perhaps worse than
useless. The process hung and was killed because of a software error -
replicating the state could lead to exactly the same fault. (This
effect can be a PITA with web browsers that crash due to problems with a
web page, then try to restore the session on restart and crash again.)

Faffing around with bitmaps and screenshots in order to avoid flicker
is, however, utterly ridiculous. If flicker is annoying users, then
find and fix the software bug that is causing the hang - don't waste
effort on a complicated system to try to hide the problem. You'll just
introduce even more bugs.



Scott Lurndal

Feb 10, 2023, 12:03:11 PM
Indeed. Such an approach is less viable for the interactive GUI
program that Frederick is proposing, which may require that he
rethink the application and perhaps disaggregate the worker
parts of the application from the display management elements.

Paavo Helde

Feb 10, 2023, 1:58:08 PM
10.02.2023 18:06 Frederick Virchanza Gotham wrote:
>
> Okay I'm starting to think that a process might not be salvageable
> after any of its threads is killed or cancelled.
>
> So instead maybe I could program a mutex watchdog system something
> like as follows:
> (1) Periodically check that all the worker threads are still alive.

I'm not sure why you are fixated on thread deadlocks. Why don't you
worry about NULL dereferences or other violations which will typically
kill the process on the spot?

> (2) If one of the watchdog timers times out, do the following:
> (2.a) Take a photograph of the GUI window on the screen, i.e. take
> a screenshot of it and save it as a bitmap.
> (2.b) Save the values of all the widgets to a file (e.g. the text in
> text boxes, whether a tick box was ticked, the position
> of a slider).

It would be much easier and more robust (and would also work in case of a
crash) to save the settings in a file or in the registry as soon as they are changed.

> (3) Start a new process that does the following:
> (3.a) Displays the aforementioned bitmap on the screen over the real dialog
> (3.b) Kills the main program
> (3.c) Restarts the main program and loads the widget values from the file
> (3.d) Removes the fake bitmap off the screen
>
> The idea here is that the main program would be killed and restarted
> without so much as a flicker on the screen.

I'm just curious - what is the intended purpose? You don't really expect
that the user is so dumb that they will not notice when your buggy program
freezes or misbehaves, yet will somehow notice some flicker on the
screen?


Richard Damon

Feb 10, 2023, 8:02:56 PM
I would think the simple answer is to make the GUI process as bulletproof
as possible and not need to do things that need these mutexes, but to spin
up killable processes to do the work, which send answers back on
something like a pipe to the GUI.

If a computation process hangs, you can then just kill it and restart,
maybe letting the user know things are running slow.

Chris M. Thomasson

Feb 17, 2023, 3:58:41 PM
Humm... I think there is a timer in Windows that might "stop" a shader
program after some time...