Parts of the patch go a little further than level 2, but we're nowhere
near level 3 (or antything higher). The major obstacle is the umtx
interface, which in my eyes is fundamentally broken.
The _umtx_op() syscall is intended to replace _umtx_lock() and
_umtx_unlock(), and support other operations like sleep, wakeup etc.
This is the wrong way to go. If we applied this kind of thinking
universally, we'd just define all our system calls as macro wrappers
for __syscall().
Beyond these purely philosophical aspects, _umtx_op() seems designed
to encourage poor coding practices. It's impossible to use in a type-
safe manner: its first argument is supposed to be a struct umtx *, but
I don't think there's a single instance in libthr where it is called
with an actual struct umtx *. Instead, libthr uses umtx_t, which is
defined as long. Sometimes, an umtx_t is passed as the third argument
to _umtx_op() as well. Various fields in struct pthread, struct
pthread_barrier and struct pthread_cond are declared as umtx_t. Some
are used purely as cookies for passing to _umtx_op(); some are used as
counters or state variables.
It's a miracle libthr works at all. The kernel expects a pointer to a
struct umtx; instead, it gets (mostly) a pointer to long. Luckily,
struct umtx (which contains a single pointer) is the same size as long
on all our platforms.
_umtx_op() needs to be split into four system calls:
int _umtx_lock_timeout(struct umtx *mtx,
const struct timespec * __restrict timeout);
int _umtx_unlock(struct umtx *mtx);
int _umtx_wait_timeout(const void *cookie, struct umtx *mtx,
const struct timespec * __restrict timeout);
int _umtx_wake(const void *cookie);
_umtx_unlock() already exists with the correct semantics; the rest are
new. Note that we can't just add a struct timespec to _umtx_lock(),
as that would break the upgrade path from RELENG_5; hence the _timeout
suffix.
I'm not sure the current implementation of UMTX_OP_WAIT / UMTX_OP_WAKE
in the kernel is correct. Normally, a wait primitive (like msleep(),
pthread_cond_wait() etc.) takes a cookie and a mutex, and unlocks the
mutex while it's sleeping on the cookie. It looks like the umtx code
originally worked like this, but I'm not sure it does anymore; it's
hard to unravel, partly because I haven't quite figured out the queues
yet and partly because I haven't had breakfast.
DES
--
Dag-Erling Smørgrav - d...@des.no
When libthr was redesigned, things were not clear, I have figured out
all semantics needed to implement libthr, the above functions are the
interfaces current internally implemented by libthr, and functions in
umtx.h is not used because it can generate bloat code than the
versions in libthr, current libthr shared library only has 64K size.
I really dont need struct umtx at all, an integer is enough, basic idea
is using atomic operations to test in userland and wait in kernel.
the above functions should be changed to:
int _umtx_lock_timeout(umtx_t *,
const struct timespec * __restrict timeout);
int _umtx_unlock(umtx_t *mtx);
int _umtx_wait_timeout(umtx_t *, umtx_t expect,
const struct timespec * __restrict timeout);
int _umtx_wake(umtx_t *i, int number);
the umtx_t could be an integer type, but to maintain binary
compatibility, it has to be a long integer type.
> I'm not sure the current implementation of UMTX_OP_WAIT / UMTX_OP_WAKE
> in the kernel is correct. Normally, a wait primitive (like msleep(),
> pthread_cond_wait() etc.) takes a cookie and a mutex, and unlocks the
> mutex while it's sleeping on the cookie. It looks like the umtx code
> originally worked like this, but I'm not sure it does anymore; it's
> hard to unravel, partly because I haven't quite figured out the queues
> yet and partly because I haven't had breakfast.
>
> DES
>
Well, I am not sure you know the history of umtx, the orignal work only
did a lock and unlock semantic, there is no general sleep and wait
semantic, orignal umtx code has obscure race, I think in early days,
valgrind suffered from this bug. I am not sure you understand how a
userland synchronization works, these two operations are used to
implement compare and wait for a integer to be changed, you should
check source code before talking a lot, saying that you doubt the
UMTX_OP_WAIT and UMTX_OP_WAKE's correctness is not professional, there
are users using libthr in daily work and stress testing had been done.
I have put lots of work and time on libthr, I know the problems,
though the _umtx interface is a bit ulgly because it was unclear when it
was being designed, but I don't think it really hurt you or other
people, it can be fixed in few days, I just was hesitating to add the
new interfaces.
David Xu
To maintain compatibility and ensure type safety, it should be a
struct umtx.
In effect, the wait / wake operations implement a condition variable.
You should not use the same struct (or type) to describe the condition
variable as the one you use to describe a mutex.
Condition variables are always used in conjunction with a mutex. The
mutex must be passed to the wait function so it can be unlocked while
the waiting thread waits. It must be held by the thread calling the
wake method. Neither the existing interface nor the one you propose
do this.
> Well, I am not sure you know the history of umtx, the orignal work
> only did a lock and unlock semantic, there is no general sleep and
> wait semantic, orignal umtx code has obscure race, I think in early
> days, valgrind suffered from this bug. I am not sure you understand
> how a userland synchronization works, these two operations are used
> to implement compare and wait for a integer to be changed,
You cannot "wait for an integer to be changed" (at least not without
using hardware debugging facilities), and that is not what
UMTX_OP_WAIT does. It is a botched condition variable. It waits for
another thread to perform an UMTX_OP_WAKE operation with the correct
arguments; the fact that an integer has changed is incidental, and the
test for that change could be implemented in userland: look up
condition variables in any good CS textbook and you will see an
example of this, likely in the guise of a sample message queue (or
mailbox) implementation.
I can understand wanting to move the check into the kernel to avoid
spurious context switches, but it has to be done right.
> you
> should check source code before talking a lot,
That is precisely what I have done.
> saying that you doubt
> the UMTX_OP_WAIT and UMTX_OP_WAKE's correctness is not professional,
> there are users using libthr in daily work and stress testing had
> been done.
> I have put lots of work and time on libthr, I know the problems,
> though the _umtx interface is a bit ulgly because it was unclear
> when it was being designed, but I don't think it really hurt you or
> other people, it can be fixed in few days, I just was hesitating to
> add the new interfaces.
The amount of work you have put into libthr and its importance to your
self-esteem do not guarantee its correctness. What is unprofessional
here is 1) the quality of your code and 2) your refusal to consider
that I might have a point.
> In effect, the wait / wake operations implement a condition variable.
> You should not use the same struct (or type) to describe the condition
> variable as the one you use to describe a mutex.
>
> Condition variables are always used in conjunction with a mutex. The
> mutex must be passed to the wait function so it can be unlocked while
> the waiting thread waits. It must be held by the thread calling the
> wake method. Neither the existing interface nor the one you propose
> do this.
>
You are talking about how to use it, not how to implement it.
> You cannot "wait for an integer to be changed" (at least not without
> using hardware debugging facilities), and that is not what
> UMTX_OP_WAIT does. It is a botched condition variable. It waits for
> another thread to perform an UMTX_OP_WAKE operation with the correct
> arguments; the fact that an integer has changed is incidental, and the
> test for that change could be implemented in userland: look up
> condition variables in any good CS textbook and you will see an
> example of this, likely in the guise of a sample message queue (or
> mailbox) implementation.
>
again, there are many ways to implement it, I think you are talking
about how to use it, not how to implement it.
> I can understand wanting to move the check into the kernel to avoid
> spurious context switches, but it has to be done right.
>
>
>> you
>>should check source code before talking a lot,
>
>
> That is precisely what I have done.
>
>
>> saying that you doubt
>>the UMTX_OP_WAIT and UMTX_OP_WAKE's correctness is not professional,
>>there are users using libthr in daily work and stress testing had
>>been done.
>>I have put lots of work and time on libthr, I know the problems,
>>though the _umtx interface is a bit ulgly because it was unclear
>>when it was being designed, but I don't think it really hurt you or
>>other people, it can be fixed in few days, I just was hesitating to
>>add the new interfaces.
>
>
> The amount of work you have put into libthr and its importance to your
> self-esteem do not guarantee its correctness. What is unprofessional
> here is 1) the quality of your code and 2) your refusal to consider
> that I might have a point.
>
> DES
I don't trust you, I don't think your work is useful to me, you are
rushing, and command everyone to catch up with you, but we have life,
wife, and children, I am old, I should leave FreeBSD.
David Xu
No, I am talking about how to implement it. If you try to implement
condition variable primitives which do not require a mutex to be held,
your condition variables will not work; they may work most of the
time, but there will be races.
> > You cannot "wait for an integer to be changed" (at least not without
> > using hardware debugging facilities), and that is not what
> > UMTX_OP_WAIT does. It is a botched condition variable. It waits for
> > another thread to perform an UMTX_OP_WAKE operation with the correct
> > arguments; the fact that an integer has changed is incidental, and the
> > test for that change could be implemented in userland: look up
> > condition variables in any good CS textbook and you will see an
> > example of this, likely in the guise of a sample message queue (or
> > mailbox) implementation.
> again, there are many ways to implement it, I think you are talking
> about how to use it, not how to implement it.
Again, I am talking about how to implement them. See above.
The counter ("integer to be changed") is necessary to avoid races when
multiple threads are sleeping on the same condition variable - but the
counter itself must be protected, which is what the mutex is for. You
can't sleep with that mutex held, because the signalling thread needs
to hold the mutex while incrementing the counter and signalling the
sleeping threads. The only way to achieve this is to pass the mutex
to the wait function, so the wait function can drop it (and later
reacquire it) in a critical section.
If you remove the mutex, you need to protect your counter with a
critical section; but then you no longer have a condition variable,
but a counting semaphore. If you remove the counter, all you have
left is some kind of multicast asynchronous signal.
> I don't trust you, I don't think your work is useful to me, you are
> rushing, and command everyone to catch up with you, but we have life,
> wife, and children, I am old, I should leave FreeBSD.
???
FYI, I have a life, a job and a wife, too; and while I don't have
children, I do have a cat with chronic renal failure which requires
constant attention; but I don't use them as excuses to brush off
criticism of my work.