Suspending HWT - interrupting blocked delegate thread

Tim Hansmeier

Apr 5, 2018, 4:10:34 PM
to ReconOS
Hi all,

I am experimenting with partial reconfiguration using ReconOS and Vivado. Before reconfiguring a slot, I suspend the running hardware thread with reconos_thread_suspend_block() (see reconos.c). When a thread is suspended, its delegate thread is likely to be blocked in a function call (e.g. mbox_get). To interrupt the blocked call, the suspend function sends a signal to the delegate thread with pthread_kill(). However, it can happen that the signal is sent and the signal handler is executed, but the blocked function call is not interrupted and/or the delegate thread is not resumed.

In more detail:
I modified the hardware thread sources (VHDL) of the sortdemo so that the thread exits when HWT_signal is set. This is necessary to use the suspend function. To test it, I simply run the sortdemo and, at the end, call the suspend function on all hardware threads. Sometimes all HWTs are successfully suspended, sometimes not. After some test runs, I would say that using software threads significantly decreases the chance of success.

The code that sends the interrupt signal looks like this (from hwslot_suspendthread(struct hwslot *slot) in reconos.c):

    do {
        switch (slot->dt_state) {
        case DELEGATE_STATE_BLOCKED_OSIF:
            reconos_osif_break(slot->osif);
            break;
        case DELEGATE_STATE_BLOCKED_SYSCALL:
            pthread_kill(slot->dt, SIGUSR1);
            break;
        }
        sched_yield();
    } while (slot->dt_flags & DELEGATE_FLAG_SUSPEND);

I have never used signals in my code before, but I suppose the rationale behind the snippet is to send the signal to interrupt the blocked function and then yield in the hope that the delegate is executed next, so that the suspension can continue. The signal handler prints a debug message, so with debug output enabled I can observe how often the handler is called. The delegate thread, in turn, also prints a debug message when the blocked function call is interrupted.

Even when everything works fine, the signal handler is called roughly 800 to 8000 times while suspending a HWT. Once the function call has been interrupted, the signal handler is no longer called, and the remainder of the suspend function runs to completion and successfully suspends the thread.
In many cases, especially when using software threads, the suspend function does not terminate. In these situations the function call is never interrupted; instead, the signal handler is called continuously.

Does anyone have suggestions or ideas on how to solve the problem that the signal does not interrupt the blocked function? Since the problem seems to get worse as the number of additional software threads increases, I suspect that the behavior of the scheduler has something to do with it, but I don't know whether this is a plausible explanation.

Thanks in advance! (And for reading this rather long text...)
Tim


 

Sebastian Meisner

Apr 7, 2018, 6:14:06 AM
to rec...@googlegroups.com
Hi Tim!

I had a quick look at your situation. The mbox implementation uses standard
pthread calls such as pthread_mutex_lock() (see [2] and [3]). The documentation
for that call says:

"If a signal is delivered to a thread waiting for a mutex, upon return from
the signal handler the thread resumes waiting for the mutex as if it was not
interrupted." [1]

I.e., if a (delegate) thread is waiting for a mutex, it won't wake up. One
idea would be to rewrite the code to use pthread_mutex_trylock() in a loop.
This would effectively turn the mutexes into spinlocks and waste CPU time.
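
A minimal sketch of the trylock idea, not taken from the ReconOS sources: the
helper lock_unless_suspended() and the suspend_requested flag are made-up names
for illustration. The loop polls pthread_mutex_trylock() and gives up as soon
as the flag is set, yielding the CPU between attempts.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of ReconOS).
 * Returns 0 once the mutex is held, -1 if a suspend was requested first,
 * or a positive errno value if pthread_mutex_trylock() fails otherwise.
 */
int lock_unless_suspended(pthread_mutex_t *m, atomic_bool *suspend_requested)
{
    assert(m != NULL && suspend_requested != NULL);

    for (;;) {
        int rc = pthread_mutex_trylock(m);
        if (rc == 0)
            return 0;                 /* lock acquired */
        if (rc != EBUSY)
            return rc;                /* unexpected error */
        if (atomic_load(suspend_requested))
            return -1;                /* give up: caller should suspend */
        sched_yield();                /* let other threads run before retrying */
    }
}
```

The suspending thread would set the flag instead of relying on a signal to
break the wait. The cost is the busy polling mentioned above.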

Another idea would be to put something into the mboxes to force a wake up of
the delegate thread.
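
A rough sketch of the wake-up message idea, using a toy mailbox rather than the
real ReconOS mbox: struct toy_mbox, its functions, and the reserved value
WAKEUP_MSG are all assumptions made for illustration. The suspending side puts
a reserved value into the mailbox, and the delegate treats it as a request to
stop waiting.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define TOY_MBOX_SIZE 16
#define WAKEUP_MSG 0xFFFFFFFFu  /* hypothetical reserved value, never a real payload */

struct toy_mbox {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    uint32_t buf[TOY_MBOX_SIZE];
    unsigned head, count;
};

void toy_mbox_init(struct toy_mbox *mb)
{
    assert(mb != NULL);
    pthread_mutex_init(&mb->lock, NULL);
    pthread_cond_init(&mb->nonempty, NULL);
    mb->head = mb->count = 0;
}

/* Non-blocking put: returns 0 on success, -1 if the mailbox is full. */
int toy_mbox_put(struct toy_mbox *mb, uint32_t msg)
{
    int rc = -1;
    pthread_mutex_lock(&mb->lock);
    if (mb->count < TOY_MBOX_SIZE) {
        mb->buf[(mb->head + mb->count) % TOY_MBOX_SIZE] = msg;
        mb->count++;
        pthread_cond_signal(&mb->nonempty);
        rc = 0;
    }
    pthread_mutex_unlock(&mb->lock);
    return rc;
}

/* Blocking get: waits until a message is available. */
uint32_t toy_mbox_get(struct toy_mbox *mb)
{
    pthread_mutex_lock(&mb->lock);
    while (mb->count == 0)
        pthread_cond_wait(&mb->nonempty, &mb->lock);
    uint32_t msg = mb->buf[mb->head];
    mb->head = (mb->head + 1) % TOY_MBOX_SIZE;
    mb->count--;
    pthread_mutex_unlock(&mb->lock);
    return msg;
}

/* Delegate side: 0 = real message stored in *out, -1 = woken up to suspend. */
int delegate_get(struct toy_mbox *mb, uint32_t *out)
{
    uint32_t msg = toy_mbox_get(mb);
    if (msg == WAKEUP_MSG)
        return -1;
    *out = msg;
    return 0;
}
```

In this sketch the wake-up message queues behind any pending messages, and the
reserved value must never occur as regular data.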

Hope that helps somehow...

Kind regards,
Sebastian


[1] http://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_mutex_lock.html
[2] https://github.com/ReconOS/reconos/blob/master/lib/runtime/comp/mbox.c
[3] https://github.com/ReconOS/reconos/blob/master/lib/runtime/reconos.c



Tim Hansmeier

Apr 12, 2018, 9:50:24 AM
to ReconOS
Hi Sebastian,

thanks for your quick reply!
It was indeed the pthread_mutex_lock() that caused the problem. When I investigated the issue before creating this post, I assumed that mutex_lock would be interruptible, because it worked in some cases. In hindsight, it is more likely that in those cases the delegate thread was stuck in a sem_wait(), which is interruptible. At least this would explain the observed behavior.
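
A small stand-alone sketch of the sem_wait() side of this, assuming Linux with
glibc and a SIGUSR1 handler installed without SA_RESTART. None of the names
below come from ReconOS. As in the suspend loop, the signal is sent repeatedly,
because a signal that arrives before the thread has entered sem_wait() does not
interrupt anything.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

static sem_t sem;              /* initialised to 0, so sem_wait() blocks */
static atomic_int got_eintr;   /* set once sem_wait() fails with EINTR */

/* Empty handler: its only purpose is to interrupt the blocked call. */
static void on_sigusr1(int sig) { (void)sig; }

static void *waiter(void *arg)
{
    (void)arg;
    if (sem_wait(&sem) == -1 && errno == EINTR)
        atomic_store(&got_eintr, 1);
    return NULL;
}

/* Returns 1 if the blocked sem_wait() was interrupted by the signal. */
int sem_wait_is_interrupted(void)
{
    struct sigaction sa;
    struct timespec pause = { 0, 1000000 };  /* 1 ms between attempts */
    pthread_t t;
    int rc;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                         /* deliberately no SA_RESTART */
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);
    rc = sem_init(&sem, 0, 0);
    assert(rc == 0);
    atomic_store(&got_eintr, 0);
    rc = pthread_create(&t, NULL, waiter, NULL);
    assert(rc == 0);
    (void)rc;

    /* Keep signalling until the waiter reports EINTR (at most ~5 s). */
    for (int i = 0; i < 5000 && !atomic_load(&got_eintr); i++) {
        pthread_kill(t, SIGUSR1);
        nanosleep(&pause, NULL);
    }
    if (!atomic_load(&got_eintr))
        sem_post(&sem);                      /* fallback: release the waiter */
    pthread_join(t, NULL);
    sem_destroy(&sem);
    return atomic_load(&got_eintr);
}
```

Replacing the sem_wait() with a pthread_mutex_lock() on a mutex held by another
thread would, per the documentation quoted earlier in the thread, resume
waiting after the handler returns instead of returning to the caller.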

As suggested, I now have an implementation that relies on mbox_tryget() in a loop instead of mbox_get(). The HWTs can now be suspended reliably. As you mentioned, this wastes a bit of CPU time, but I guess the use of software threads is the dominant factor here, as the SWTs also have to use mbox_tryget() to avoid deadlock scenarios.

While implementing this, I found two smaller bugs in the ReconOS sources that lead to deadlocks and segmentation faults when suspending/resuming HWTs. I will upload a new demo soon and include the fixed sources with it.

Thanks again for your help!
Tim