Barret Rhoden
Oct 20, 2015, 2:15:42 PM
to aka...@googlegroups.com
Hi -
(this bug is fixed, i'm emailing it out for those interested in such
things).
I had an app that uses eventfd and epoll, and it can trigger:
uthread.c:621: run_uthread: Assertion `uthread->state == 2'
failed.
and
pthread.c:246: pth_sched_entry: Assertion `new_thread->state ==
2' failed.
The first one is usually a sign of running a uthread more than once,
concurrently (2LS / parlib bug). The second is a similar catch.
When one fails for my app, they usually both do, which probably means
we're dealing with multiple cores (o/w one assert would kill the
process, and we couldn't get to the second one). It's also racy.
I poked around in uth_blockon_evqs to see if there is anything
obviously wrong that could lead to this. Like waking a thread multiple
times, etc. Nothing obvious, and a few printfs in the area didn't help.
After failure, I printed some pthread 2LS debugging info. Sometimes
uthreads get put onto the same list multiple times, or the lists
otherwise get corrupted. Result:
uth 0x3622300, type 2 (debugging crap from the uth assert location)
uth 0x3623700, type 2
PTH 0x3622300, state 8 (debugging crap from the pth assert location)
ready q
PTH 0x3623700, state 8 (BLK_MUTEX)
PTH 0x3622300, state 8
active q
PTH 0x3620f00, state 2 (RUNNABLE)
PTH 0x3621e00, state 8
PTH 0x3623700, state 8
PTH 0x3622300, state 8
[user] pthread.c:260, vcore 3, Assertion failed: 0
The same uth/pth pops up a few times. 3622300 should be on the active
list, but not the ready queue. that's messed up. likewise, anything
on the ready q should be pth state 2 (RUNNABLE), not 8.
likewise, anything on the active q should be PTH_RUNNING (3). the
latter is actually a minor bug in pthreads, where we don't set that
state at any point (except for thread0, once). it's not actually
critical that we do that, but most things in the pthread 2LS are for
debugging anyways, so we ought to do it.
So it looks like we screwed up some of our 2LS callbacks, causing a
thread to be added to the same list a couple of times.
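
For illustration, the two invariants above (right state for the queue,
and no thread on a list twice) can be checked mechanically. This is
only a rough sketch and not the actual pthread 2LS code: toy_pth and
toy_q_is_sane are made-up names, and the queue is a plain array rather
than the real list type. The state numbers are the ones from the debug
output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* State values as they show up in the debug output. */
#define PTH_RUNNABLE  2
#define PTH_RUNNING   3
#define PTH_BLK_MUTEX 8

/* Hypothetical stand-in for a pthread; the real struct has much more. */
struct toy_pth {
	int state;
};

/* Returns true if every thread on the queue is in want_state and no
 * thread appears on the queue more than once.  Prints each violation. */
bool toy_q_is_sane(struct toy_pth **q, size_t nr, int want_state,
                   const char *name)
{
	bool ok = true;

	assert(q != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		if (q[i]->state != want_state) {
			printf("%s: PTH %p, state %d (wanted %d)\n", name,
			       (void *)q[i], q[i]->state, want_state);
			ok = false;
		}
		for (size_t j = i + 1; j < nr; j++) {
			if (q[i] == q[j]) {
				printf("%s: PTH %p is on the list twice\n",
				       name, (void *)q[i]);
				ok = false;
			}
		}
	}
	return ok;
}
```

Calling it with PTH_RUNNABLE for the ready q and PTH_RUNNING for the
active q would flag every entry in the dump above except 0x3620f00 on
the ready-q check.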
The most likely reason for this is pth_thread_runnable was called on
the same uthread repeatedly. though we check the state in there too,
so if someone called it twice, we should have had a printf at least
(which should be an assert/panic, really). perhaps someone is both
mucking with the pth->state (via the has_blocked callback) and calling
runnable around the same time. that could explain why the ready q has
entries that are not state 2 (runnable), though since it's racy, i'd
expect to have errored out once in a while.
Checked for that. It doesn't appear that pth_thread_runnable
is adding a pthread to the ready q that is already on the q.
On a hunch, let me make sure that it's not on the active queue
at that point. By the time we get to thread_runnable, it should have
been removed from the active q a long time ago.
While this test is running, I can already see the bug. The
only places a pthread gets removed from the active q are thread_paused,
thread_blockon_sysc, thread_refl_fault, and generic_yield. However,
pth_thread_has_blocked does *not* remove it.
The test (checking the active q during pth_thread_runnable)
quickly confirms the problem.
With that change (removing the pth from the active q during
has_blocked), the problem went away.
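
To make the sequence concrete, here is a toy model of the bug and the
fix. It is a sketch under assumptions, not the parlib code: every
toy_* name is hypothetical, the queues are arrays that tolerate
duplicates (so the double add is visible instead of corrupting list
pointers), and unlike the real 2LS at the time it sets PTH_RUNNING
when a thread is picked to run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PTH_RUNNABLE  2
#define PTH_RUNNING   3
#define PTH_BLK_MUTEX 8
#define TOY_MAX       8

struct toy_pth { int state; };

struct toy_q {
	struct toy_pth *thr[TOY_MAX];
	size_t nr;
};

struct toy_sched {
	struct toy_q ready;
	struct toy_q active;
	bool fix_applied;	/* remove from active q in has_blocked? */
};

void toy_q_add(struct toy_q *q, struct toy_pth *t)
{
	assert(q->nr < TOY_MAX);
	q->thr[q->nr++] = t;
}

/* Removes the first occurrence of t, if any, keeping the order. */
void toy_q_remove(struct toy_q *q, struct toy_pth *t)
{
	for (size_t i = 0; i < q->nr; i++) {
		if (q->thr[i] != t)
			continue;
		for (size_t j = i + 1; j < q->nr; j++)
			q->thr[j - 1] = q->thr[j];
		q->nr--;
		return;
	}
}

size_t toy_q_count(const struct toy_q *q, const struct toy_pth *t)
{
	size_t cnt = 0;

	for (size_t i = 0; i < q->nr; i++)
		cnt += (q->thr[i] == t);
	return cnt;
}

/* Scheduler entry: move the first ready thread to the active q. */
struct toy_pth *toy_sched_entry(struct toy_sched *s)
{
	struct toy_pth *t;

	if (!s->ready.nr)
		return NULL;
	t = s->ready.thr[0];
	toy_q_remove(&s->ready, t);
	t->state = PTH_RUNNING;
	toy_q_add(&s->active, t);
	return t;
}

/* Thread blocked on a mutex.  The buggy version only sets the state. */
void toy_has_blocked(struct toy_sched *s, struct toy_pth *t)
{
	t->state = PTH_BLK_MUTEX;
	if (s->fix_applied)
		toy_q_remove(&s->active, t);
}

/* Thread woke up.  Returns false if it was still on the active q,
 * which is the check that confirmed the problem. */
bool toy_thread_runnable(struct toy_sched *s, struct toy_pth *t)
{
	bool off_active = toy_q_count(&s->active, t) == 0;

	t->state = PTH_RUNNABLE;
	toy_q_add(&s->ready, t);
	return off_active;
}

/* One run/block/wake/run cycle.  Returns how many times the thread
 * ends up on the active q. */
size_t toy_block_wake_run(bool fix_applied)
{
	struct toy_sched s = { .fix_applied = fix_applied };
	struct toy_pth t = { .state = PTH_RUNNABLE };

	toy_q_add(&s.ready, &t);
	toy_sched_entry(&s);
	toy_has_blocked(&s, &t);
	toy_thread_runnable(&s, &t);
	toy_sched_entry(&s);
	return toy_q_count(&s.active, &t);
}
```

In this model, toy_block_wake_run(false) leaves the thread on the
active q twice, and toy_block_wake_run(true) leaves it there once.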
Barret