Kernel Module: Locking and Canceling

Neil T. Dantam

unread,

Jan 19, 2015, 4:56:32 PM1/19/15

to ach-...@googlegroups.com, kim...@gmail.com

Hi Kim,

I took a closer look at the kernel module locks and canceling and see
a couple (minor) potential issues. Would you mind commenting on these?

ach_klinux rdlock/wrlock()
==========================

Wait queue behavior
-------------------

At first glance, it looks like kernel wait queues may to lead to the
following race:

1. reader checks condition, gets false
2. writer sets condition to true
3. writer wakes the wait queue
4. readers waits queue

Now, it seems the reader has missed the writers update. Pthreads gets
around this race by holding a mutex while the condition is updated and
checked. However, from the kernel's wait.h, it seems the actual
operations instead would be:

0. reader adds itself to queue
1. reader checks condition, gets false
2. writer sets condition to true
3. writer wakes the wait queue
4. readers goes to wait in queue, immediately returning due to wake
from writer

Kim, could you please confirm that this is correct?

Interrupted calls
-----------------

Another small issue: mutex_lock_interruptible() may return -EINTR if a
signal is received. If this happens, we should probably retry the
lock (or just use mutex_lock() instead) or return a suitable status
code to userspace. Similarly for wait_event_interruptible(),
wait_event_interruptible_timeout(), and wrlock().

A quick test shows that when a signal is received during a wait in
ach_get(), the caller receives ACH_FAILED_SYSCALL with errno=EINTR.
To be consistent with user-mode channels (which continue waiting
through signals), we would need to retry the system call. Does that
seem like the best behavior? One could still break on signals by
waiting for kernel channels in select(), etc.

ach_klinux ach_cancel()
=======================

I think the kernel mode ach_cancel() may be susceptible to deadlocks.

The user mode ach_cancel() is really a fancy way of interrupting a
pthread condition wait. If called from another thread
(async_unsafe=true), one can just take the mutex, set a flag, and
broadcast the condition to wakeup the other threads. However, if
called from signal handler (async_unsafe=false), we cannot issue any
pthread_* calls because they are not in the list of async safe
functions. Specifically, we cannot take the mutex because if also
held outside the signal handler, this will deadlock. Updating the
condition without the mutex held is racy. The workaround is to call
fork() (which is async safe) from the signal handler and then take the
mutex_lock in the child process.

In the kernel mode ach_cancel(), taking the lock from a signal handler
should have a similar deadlock risk. This is what happens when
unsafe=0, so I think it is incorrect to take the lock. Actually, if
Linux wait queues do the test-and-wait atomically (as above), then it
seems that we can always omit the mutex lock here, just setting
chan->cancel=1 and calling wake_up(). Is this correct?

Cheers,
--
Neil T. Dantam
http://www.Neil.Dantam.name

Kim Bøndergaard

unread,

Jan 26, 2015, 5:11:19 AM1/26/15

to ach-...@googlegroups.com, kim...@gmail.com

Hi Neil

I've now had some time to look into you review comments.

Regarding 'Wait Queue Behaviour'
=========================
It's my understanding that you are actually questioning the kernels wait queue behaviour regarding what's called the "lost wakeup problem" (according to this article: http://www.linuxjournal.com/article/8144?page=0,0 ). I think the article very well describes how the race is handles by changing thread state and do 'condition check' in the right order.

That said I think we might have a problem if we have more 'readers' on the wait queue. As it is implemented (calling wake_up() ) only one reader will be scheduled in and process the new data.
Even though the aforementioned article mention 'the thundering herd' problem when waking all waiting processes (using wake_up_all() ) I think we have do do something to ensure all readers get informed about new data.
I'll suggest calling wake_up() from within rdlock().

Furthermore I think we'll have to have means to go back to the wait queue. It might happen that right after wakeup, but before having acquired the mutex another thread has flushed the stream. In this case we ought go back and wait again.

Interrupted calls
============
I agree we do have an issue with the return code of wrlock() not being read.
rdlock() is called in both the ach_get() hierarchy and in ach_flush(). In both cases the calls originate from functions returning ERESTARTSYS in case rdlock() fails. This is how the kernel driver signals to 'kernel' that the driver call should be reissued when the signal handling has finished.

Getting EINTR back from mutex locking in wrlock() must lead to same behaviour

ach_klinux and ach_cancel()
=====================
Good description of the cancel problem. You ought consider adding it into ach.c. I'm not sure I truly understood the problem when I read the source

Isn't the problem in the kernel module that I've checked for the inverse condition of unsafe?
In the situation where I choose not to lock I simply assign the flag (int assignment assumed atomic) and wake_up who ever may be sleeping.
As you suggest we might totally skip the locking. Can't really see an issue with that because this is the only place where the cancel flag is written.

Please notice that we also here have an issue with wakening up multiple readers

Let me know what you think

/Kim

Kim Bøndergaard

unread,

Jan 26, 2015, 6:00:49 AM1/26/15

to ach-...@googlegroups.com, kim...@gmail.com

FYI:

All comments in my previous posting where based on the versions I had in my github git.

I wasn't aware of your huge work on integrating, reorganizing and combining functions from the kernel and the user space code.

Don't know if your work has any impact on my comments. I see many changes especially related to return codes (and also a few error fixes, like missing copy_to_user() - that's good)

One thing though; Maybe the cancel flag ought be declared volatile.

/Kim

/Kim

Den mandag den 19. januar 2015 kl. 22.56.32 UTC+1 skrev Neil Dantam:

Neil T. Dantam

unread,

Jan 27, 2015, 10:14:29 PM1/27/15

to Kim Bøndergaard, ach-...@googlegroups.com

Hi Kim,

Thank you for the comments. I think the signals and synchronization
issues are resolved, and I've merged the kernel module to the master
branch.

I added a "stress tester" that forks a ton of processes and hammers
channels with data, and publishers and subscribers with signals. Data
validity is checked with a CRC. This found a couple errors in
checking for EINTR, and now runs without error on my workstation
(4-cores) with 1000 processes.

Detailed response to your comments below.

> http://www.linuxjournal.com/article/8144?page=0,0

Thanks for the reference. Yeah, different semantics from pthreads.

> That said I think we might have a problem if we have more 'readers'
> on the wait queue. As it is implemented (calling wake_up() ) only
> one reader will be scheduled in and process the new data.

Yeah, this is right. I added a call to wakeup() in unrdlock().

> Furthermore I think we'll have to have means to go back to the wait
> queue. It might happen that right after wakeup, but before having
> acquired the mutex another thread has flushed the stream. In this
> case we ought go back and wait again.

That's another race... We can prevent this without any extra locking,
so that seems good to do. There's an additional thread race in the
ioctl to set the mode. That may require an additional lock to
prevent, so I'm not sure if it's worth it. Different threads can just
ach_get() from different channel descriptors.

> ach_klinux and ach_cancel()
> ==========================

> Good description of the cancel problem. You ought consider adding it into
> ach.c.

Done.

> Isn't the problem in the kernel module that I've checked for the inverse
> condition of unsafe?

Yes, this is where async-safety becomes different from
thread-safety...

> In the situation where I choose not to lock I simply assign the flag
> (int assignment assumed atomic) and wake_up who ever may be
> sleeping. As you suggest we might totally skip the locking

Agree, since kernel wait_queues don't need a mutex, we should be able
to just set the cancel flag and wake everyone up.

> Please notice that we also here have an issue with wakening up multiple

> readers.

There is a thundering herd issue here, and it would be better to just
wake the specific process we want to cancel (maybe
wake_up_process()?). However, I don't think this is a major
performance issue since ach_cancel() calls should be rare. Personally,
I really only use ach_cancel() to terminate a process when signaled.
Normally calls to ach_get() would be aborted with timeouts, or avoided
with poll()/select().

> Interrupted calls...

This took some cleanup. I think EINTR is propagating back to userspace,
and we are now robust to signaling.

Cheers,
-ntd

Reply all

Reply to author

Forward