Kernel deadlock with PMC


Andrew Brampton

Oct 16, 2009, 9:42:51 PM
to PMCTools Discuss
I've been using pmcstat for a few months now on FreeBSD 7.2 and
FreeBSD 8. Just recently I've been encountering deadlocks
which I am able to reproduce, but I am unable to find their
source. This happens on a dual quad-core Xeon running an amd64
FreeBSD 8.0 RC1 kernel (I have not had the chance to test on any other
version). It only seems to happen when I have all 8 cores of the
machine processing interrupts (from a multi-queue network card).

The deadlock involves one thread sleeping inside pmclog_flush and
another sleeping inside pmclog_loop. Using ddb I printed out this:
2426 2407 2407 0 S+ pmc-sx 0xffffffff80c28600 elfdump
2425 845 845 0 S pmc-sx 0xffffffff80c28600 sh
2424 1619 1619 1001 S+ pmc-sx 0xffffffff80c28600 cat
2410 0 0 0 SL pmcloop 0xffffff00026fc500 [hwpmc: proc(2409)]
2409 2407 2407 0 S+ pmcflush 0xffffff0048957460 pmcstat

The system grinds to a halt because the pmclog_flush thread is holding
the pmc-sx lock, which blocks new processes in execv.

I think this is a lost-wakeup problem: pmclog_flush is
waiting for pmclog_loop to clear the PMC_PO_IN_FLUSH flag,
but pmclog_loop is never woken. I have hacked around the
problem by putting a timeout of one second on the msleep in
pmclog_loop. This is of course not the correct solution, but after
staring at this unfamiliar code for a couple of hours I've been unable
to see the cause of the lost wakeup.
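
To illustrate the kind of workaround meant here, a minimal userland
sketch follows. It uses POSIX threads rather than the kernel's
msleep/wakeup, and the names flush_state, worker_wait and post_work are
hypothetical: none of this is taken from the hwpmc sources. The point it
tries to show is that a sleeper which bounds each sleep and re-checks
its condition under the lock recovers from a missed wakeup, at the cost
of up to one timeout of extra latency.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical shared state between a "flusher" and a "worker" thread. */
struct flush_state {
    pthread_mutex_t mtx;
    pthread_cond_t  cv;
    bool            work_pending; /* set by the flusher, cleared by the worker */
    int             timeouts;     /* sleeps that ended by timeout, not by signal */
};

void flush_state_init(struct flush_state *s)
{
    int rc = pthread_mutex_init(&s->mtx, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&s->cv, NULL);
    assert(rc == 0);
    s->work_pending = false;
    s->timeouts = 0;
}

/*
 * Worker side: sleep until work is pending, but never longer than
 * timeout_ms per sleep, and give up after max_sleeps sleeps.
 * Returns true if work was found and consumed.
 */
bool worker_wait(struct flush_state *s, long timeout_ms, int max_sleeps)
{
    bool got = false;

    pthread_mutex_lock(&s->mtx);
    /* The condition is re-checked under the lock after every sleep. */
    for (int i = 0; !s->work_pending && i < max_sleeps; i++) {
        struct timespec ts;
        clock_gettime(CLOCK_REALTIME, &ts);
        ts.tv_sec += timeout_ms / 1000;
        ts.tv_nsec += (timeout_ms % 1000) * 1000000L;
        if (ts.tv_nsec >= 1000000000L) {
            ts.tv_sec++;
            ts.tv_nsec -= 1000000000L;
        }
        if (pthread_cond_timedwait(&s->cv, &s->mtx, &ts) == ETIMEDOUT)
            s->timeouts++;
    }
    if (s->work_pending) {
        s->work_pending = false;
        got = true;
    }
    pthread_mutex_unlock(&s->mtx);
    return got;
}

/*
 * Flusher side: publish work. Passing signal == false emulates the
 * wakeup going missing; the worker then only notices on its next timeout.
 */
void post_work(struct flush_state *s, bool signal)
{
    pthread_mutex_lock(&s->mtx);
    s->work_pending = true;
    if (signal)
        pthread_cond_signal(&s->cv);
    pthread_mutex_unlock(&s->mtx);
}
```

With a worker thread blocked in worker_wait and another thread calling
post_work(&s, false), the worker still picks up the work once its
current sleep times out. As in the kernel case, this only hides the
missing wakeup; it does not explain where it went.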

I hope someone will be able to spot the true cause of this problem. If
more information is needed I'll be happy to provide it.
thanks
Andrew

Joseph Koshy

Oct 19, 2009, 8:43:41 AM
to pmctools...@googlegroups.com

> I've been using pmcstat for a few months now on FreeBSD 7.2 and
> FreeBSD 8, however just recently I've been encountering deadlocks
> which I am able to reproduce. I am however, unable to find the source
> of the deadlock. This happens on a dual quad-core xeon running a amd64
> FreeBSD 8.0 rc1 kernel (I have not had the chance to test on any other
> version). It only seems to happen when I have all 8 cores of the
> machine processing interrupts (from a multi-queue network card).

Q1: Which CPU are you running? Could you post the initial output from
dmesg(8) and the messages printed by hwpmc(4)?

Q2: What command line are you using? Does the deadlock happen if you
turn off user space callchain capture (option -N)?

Koshy

Andrew Brampton

Oct 19, 2009, 10:22:13 AM
to pmctools...@googlegroups.com, Joseph Koshy
2009/10/19 Joseph Koshy <jko...@freebsd.org>:

>> I've been using pmcstat for a few months now on FreeBSD 7.2 and
>> FreeBSD 8, however just recently I've been encountering deadlocks
>> which I am able to reproduce. I am however, unable to find the source
>> of the deadlock. This happens on a dual quad-core xeon running a amd64
>> FreeBSD 8.0 rc1 kernel (I have not had the chance to test on any other
>> version). It only seems to happen when I have all 8 cores of the
>> machine processing interrupts (from a multi-queue network card).
>
> Q1: Which CPU are you running?  Could you post the initial output from
>    dmesg(8) and the messages printed by hwpmc(4)?

FreeBSD reports the CPU as an "Intel(R) Xeon(R) CPU E5335 @ 2.00GHz
(1997.03-MHz K8-class CPU)". I have put the output of dmesg at
http://bramp.pastebin.com/f1f55f03 to avoid spamming this mailing
list.

Additionally, the problem first occurred with hwpmc compiled into
the kernel. Since then I have compiled it as a module, and the problem
still occurs. The output when I load the module is:

hwpmc: TSC/1/64/0x20<REA>
IAP/2/40/0x3ff<INT,USR,SYS,EDG,THR,REA,WRI,INV,QUA,PRC>
IAF/0/0/0x61<INT,REA,WRI>


>
> Q2: What command line are you using?  Does the deadlock happen if you
>    turn off user space callchain capture (option -N)?

pmcstat -S unhalted-core-cycles -O blah

Before I reported this problem I was able to deadlock the machine
every single time. Today, however, it took me 8 attempts before
the machine deadlocked :(. As you asked, I tested with the -N flag,
and it deadlocked on the first attempt.

>
> Koshy
>

thanks for any help
Andrew

Joseph Koshy

Oct 25, 2009, 9:11:27 AM
to Andrew Brampton, pmctools...@googlegroups.com, Joseph Koshy

> hwpmc: TSC/1/64/0x20<REA>
> IAP/2/40/0x3ff<INT,USR,SYS,EDG,THR,REA,WRI,INV,QUA,PRC>
> IAF/0/0/0x61<INT,REA,WRI>

> Before I reported this problem I was able to deadlock the machine
> every single time, however, today it has taken me 8 attempts before
> the machine deadlocked :(. As you asked, I tested with the -N flag,
> and it deadlocked first time.

Q3: What's the "deadlock" like: is it a "hard" hang (i.e., the NUMLOCK
key fails to work, the network stack becomes unresponsive), or is
it a "soft" hang, i.e., that every process gets stuck inside
HWPMC(4), but the kernel is still operational?

If the hang was of the "soft" variety, could you try a recent -current,
in particular r198464, and let us know if that fixes the behaviour?

Thanks,
Koshy

Andrew Brampton

Oct 25, 2009, 9:20:59 AM
to Joseph Koshy, pmctools...@googlegroups.com
2009/10/25 Joseph Koshy <jko...@freebsd.org>:

I think it is a soft hang: the machine still responds to pings,
and my SSH connections still work until I try to execute any command.
When I execute a command, it blocks waiting for pmc-sx.

I will update to r198464 and report my findings back later in the week.

thanks
Andrew

Fabien Thomas

Nov 12, 2009, 11:28:24 AM
to pmctools...@googlegroups.com, Andrew Brampton
Can you try this patch and tell me if it fixes the problem?

cd /usr/src && patch -p1 < .../patch-fixlor+deadlock

It was made against head but should apply to 8.x and 7.x.

Kind regards,
Fabien

patch-fixlor+deadlock