Re: Instant panic while trying run ports-mgmt/poudriere

K. Macy

unread,

Aug 6, 2015, 3:33:29 PM8/6/15

to

Is this still happening?
On Jul 15, 2015 1:41 PM, "Pawel Pekala" <pa...@freebsd.org> wrote:

> Hi John-Mark,
>
> On 2015-07-15 11:05 -0700, John-Mark Gurney <j...@funkthat.com> wrote:
> >Please repost the entire panic message, and the back trace w/o X
> >running... Also, if you could share the core and kernel w/ me (you can
> >email me directly if you'd like), that'd help.
>
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 1; apic id = 01
> instruction pointer = 0x20:0xffffffff809338c0
> stack pointer = 0x28:0xfffffe046c818a00
> frame pointer = 0x28:0xfffffe046c818a50
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 1491 (sh)
>
> Reading symbols from /boot/kernel/zfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/zfs.ko.symbols
> Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
> Loaded symbols for /boot/kernel/opensolaris.ko.symbols
> Reading symbols from /boot/kernel/amdtemp.ko.symbols...done.
> Loaded symbols for /boot/kernel/amdtemp.ko.symbols
> Reading symbols from /boot/modules/cuse4bsd.ko...done.
> Loaded symbols for /boot/modules/cuse4bsd.ko
> Reading symbols from /boot/kernel/fuse.ko.symbols...done.
> Loaded symbols for /boot/kernel/fuse.ko.symbols
> Reading symbols from /boot/kernel/tmpfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/tmpfs.ko.symbols
> Reading symbols from /boot/kernel/radeonkms.ko.symbols...done.
> Loaded symbols for /boot/kernel/radeonkms.ko.symbols
> Reading symbols from /boot/kernel/iicbb.ko.symbols...done.
> Loaded symbols for /boot/kernel/iicbb.ko.symbols
> Reading symbols from /boot/kernel/iicbus.ko.symbols...done.
> Loaded symbols for /boot/kernel/iicbus.ko.symbols
> Reading symbols from /boot/kernel/iic.ko.symbols...done.
> Loaded symbols for /boot/kernel/iic.ko.symbols
> Reading symbols from /boot/kernel/drm2.ko.symbols...done.
> Loaded symbols for /boot/kernel/drm2.ko.symbols
> Reading symbols from /boot/kernel/radeonkmsfw_BARTS_pfp.ko.symbols...done.
> Loaded symbols for /boot/kernel/radeonkmsfw_BARTS_pfp.ko.symbols
> Reading symbols from /boot/kernel/radeonkmsfw_BARTS_me.ko.symbols...done.
> Loaded symbols for /boot/kernel/radeonkmsfw_BARTS_me.ko.symbols
> Reading symbols from /boot/kernel/radeonkmsfw_BTC_rlc.ko.symbols...done.
> Loaded symbols for /boot/kernel/radeonkmsfw_BTC_rlc.ko.symbols
> Reading symbols from /boot/kernel/radeonkmsfw_BARTS_mc.ko.symbols...done.
> Loaded symbols for /boot/kernel/radeonkmsfw_BARTS_mc.ko.symbols
> Reading symbols from /boot/kernel/fdescfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/fdescfs.ko.symbols
> Reading symbols from /boot/kernel/ums.ko.symbols...done.
> Loaded symbols for /boot/kernel/ums.ko.symbols
> Reading symbols from /boot/kernel/uhid.ko.symbols...done.
> Loaded symbols for /boot/kernel/uhid.ko.symbols
> Reading symbols from /boot/kernel/linux.ko.symbols...done.
> Loaded symbols for /boot/kernel/linux.ko.symbols
> Reading symbols from /boot/kernel/linux_common.ko.symbols...done.
> Loaded symbols for /boot/kernel/linux_common.ko.symbols
> Reading symbols from /boot/kernel/nullfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/nullfs.ko.symbols
> Reading symbols from /boot/kernel/linprocfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/linprocfs.ko.symbols
> Reading symbols from /boot/kernel/sem.ko.symbols...done.
> Loaded symbols for /boot/kernel/sem.ko.symbols
> #0 doadump (textdump=0) at pcpu.h:221
> 221 pcpu.h: No such file or directory.
> in pcpu.h
> (kgdb) #0 doadump (textdump=0) at pcpu.h:221
> #1 0xffffffff8035b45e in db_dump (dummy=<value optimized out>,
> dummy2=false,
> dummy3=0, dummy4=0x0) at /hdd/src/sys/ddb/db_command.c:533
> #2 0xffffffff8035afd1 in db_command (cmd_table=0x0)
> at /hdd/src/sys/ddb/db_command.c:440
> #3 0xffffffff8035ac64 in db_command_loop ()
> at /hdd/src/sys/ddb/db_command.c:493
> #4 0xffffffff8035d7fb in db_trap (type=<value optimized out>, code=0)
> at /hdd/src/sys/ddb/db_main.c:251
> #5 0xffffffff809b4094 in kdb_trap (type=9, code=0, tf=<value optimized
> out>)
> at /hdd/src/sys/kern/subr_kdb.c:654
> #6 0xffffffff80d9e065 in trap_fatal (frame=0xfffffe046c818950,
> eva=<value optimized out>) at /hdd/src/sys/amd64/amd64/trap.c:848
> #7 0xffffffff80d9dd33 in trap (frame=<value optimized out>)
> at /hdd/src/sys/amd64/amd64/trap.c:201
> #8 0xffffffff80d7ecb2 in calltrap ()
> at /hdd/src/sys/amd64/amd64/exception.S:235
> #9 0xffffffff809338c0 in knote (list=0xfffff80013ae4408, hint=2147483648,
> lockflags=<value optimized out>) at /hdd/src/sys/kern/kern_event.c:1920
> #10 0xffffffff80938ef1 in exit1 (td=0xfffff800135c5980,
> rv=<value optimized out>) at /hdd/src/sys/kern/kern_exit.c:559
> #11 0xffffffff809383be in sys_sys_exit (td=0x0, uap=<value optimized out>)
> at /hdd/src/sys/kern/kern_exit.c:177
> #12 0xffffffff80d9e8d2 in amd64_syscall (td=0xfffff800135c5980, traced=0)
> at subr_syscall.c:133
> #13 0xffffffff80d7ef9b in Xfast_syscall ()
> at /hdd/src/sys/amd64/amd64/exception.S:395
> #14 0x0000000800922f3a in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> Current language: auto; currently minimal
>
> --
> pozdrawiam / with regards
> Paweł Pękala
> _______________________________________________
> freebsd...@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-curre...@freebsd.org"
_______________________________________________
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-curre...@freebsd.org"

Pawel Pekala

unread,

Aug 6, 2015, 5:38:19 PM8/6/15

to

Hi K.,

On 2015-08-06 12:33 -0700, "K. Macy" <km...@freebsd.org> wrote:
>Is this still happening?

Still crashes:

Thu Aug 6 23:22:05 CEST 2015

FreeBSD blaviken.slowicza.org 11.0-CURRENT FreeBSD 11.0-CURRENT #50 r286370: Thu Aug 6 19:55:29 CEST 2015 ro...@blaviken.slowicza.org:/usr/obj/hdd/src/sys/GENERIC amd64

panic:

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:

Fatal trap 9: general protection fault while in kernel mode

cpuid = 2; apic id = 02
instruction pointer = 0x20:0xffffffff809d6b80
stack pointer = 0x28:0xfffffe046cc68a00
frame pointer = 0x28:0xfffffe046cc68a50

code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0

current process = 2147 (sh)

Reading symbols from /boot/modules/vboxnetflt.ko...done.
Loaded symbols for /boot/modules/vboxnetflt.ko
Reading symbols from /boot/kernel/netgraph.ko.symbols...done.
Loaded symbols for /boot/kernel/netgraph.ko.symbols
Reading symbols from /boot/modules/vboxdrv.ko...done.
Loaded symbols for /boot/modules/vboxdrv.ko
Reading symbols from /boot/kernel/ng_ether.ko.symbols...done.
Loaded symbols for /boot/kernel/ng_ether.ko.symbols
Reading symbols from /boot/modules/vboxnetadp.ko...done.
Loaded symbols for /boot/modules/vboxnetadp.ko

Reading symbols from /boot/kernel/linux.ko.symbols...done.
Loaded symbols for /boot/kernel/linux.ko.symbols
Reading symbols from /boot/kernel/linux_common.ko.symbols...done.
Loaded symbols for /boot/kernel/linux_common.ko.symbols
Reading symbols from /boot/kernel/nullfs.ko.symbols...done.
Loaded symbols for /boot/kernel/nullfs.ko.symbols
Reading symbols from /boot/kernel/linprocfs.ko.symbols...done.
Loaded symbols for /boot/kernel/linprocfs.ko.symbols
Reading symbols from /boot/kernel/sem.ko.symbols...done.
Loaded symbols for /boot/kernel/sem.ko.symbols
#0 doadump (textdump=0) at pcpu.h:221
221 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) #0 doadump (textdump=0) at pcpu.h:221

#1 0xffffffff80377f5e in db_dump (dummy=<value optimized out>, dummy2=false,

dummy3=0, dummy4=0x0) at /hdd/src/sys/ddb/db_command.c:533

#2 0xffffffff80377ad1 in db_command (cmd_table=0x0)
at /hdd/src/sys/ddb/db_command.c:440
#3 0xffffffff80377764 in db_command_loop ()
at /hdd/src/sys/ddb/db_command.c:493
#4 0xffffffff8037a31b in db_trap (type=<value optimized out>, code=0)
at /hdd/src/sys/ddb/db_main.c:251
#5 0xffffffff80a57074 in kdb_trap (type=9, code=0, tf=<value optimized out>)
at /hdd/src/sys/kern/subr_kdb.c:654
#6 0xffffffff80e643b5 in trap_fatal (frame=0xfffffe046cc68950,

eva=<value optimized out>) at /hdd/src/sys/amd64/amd64/trap.c:848

#7 0xffffffff80e64083 in trap (frame=<value optimized out>)
at /hdd/src/sys/amd64/amd64/trap.c:201
#8 0xffffffff80e44652 in calltrap ()
at /hdd/src/sys/amd64/amd64/exception.S:235
#9 0xffffffff809d6b80 in knote (list=0xfffff801dbebfea0, hint=2147483648,

lockflags=<value optimized out>) at /hdd/src/sys/kern/kern_event.c:1920

#10 0xffffffff809dc424 in exit1 (td=0xfffff802bd0559a0,
rval=<value optimized out>, signo=0) at /hdd/src/sys/kern/kern_exit.c:564
#11 0xffffffff809db8cd in sys_sys_exit (td=0x0, uap=<value optimized out>)
at /hdd/src/sys/kern/kern_exit.c:178
#12 0xffffffff80e64c22 in amd64_syscall (td=0xfffff802bd0559a0, traced=0)
at subr_syscall.c:133
#13 0xffffffff80e4493b in Xfast_syscall ()
at /hdd/src/sys/amd64/amd64/exception.S:395
#14 0x0000000800922eea in ?? ()

Previous frame inner to this frame (corrupt stack?)
Current language: auto; currently minimal

(kgdb)

Mateusz Guzik

unread,

Aug 6, 2015, 5:44:52 PM8/6/15

to

> #8 0xffffffff80e44652 in calltrap ()
> at /hdd/src/sys/amd64/amd64/exception.S:235
> #9 0xffffffff809d6b80 in knote (list=0xfffff801dbebfea0, hint=2147483648,
> lockflags=<value optimized out>) at /hdd/src/sys/kern/kern_event.c:1920
> #10 0xffffffff809dc424 in exit1 (td=0xfffff802bd0559a0,
> rval=<value optimized out>, signo=0) at /hdd/src/sys/kern/kern_exit.c:564
> #11 0xffffffff809db8cd in sys_sys_exit (td=0x0, uap=<value optimized out>)
> at /hdd/src/sys/kern/kern_exit.c:178
> #12 0xffffffff80e64c22 in amd64_syscall (td=0xfffff802bd0559a0, traced=0)
> at subr_syscall.c:133
> #13 0xffffffff80e4493b in Xfast_syscall ()
> at /hdd/src/sys/amd64/amd64/exception.S:395
> #14 0x0000000800922eea in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> Current language: auto; currently minimal
> (kgdb)
>

Sorry, I completely forgot about this.

Can you please modify debug flags in your kernel config file to be "-O0 -g3"
and reproduce with that? This should allow kgdb to obtain full info
(along with exact rash site for inspection) without further tinkering or
guessing.

--
Mateusz Guzik <mjguzik gmail.com>

Pawel Pekala

unread,

Aug 10, 2015, 5:52:44 PM8/10/15

to

Hi Mateusz,

On 2015-08-06 23:44 +0200, Mateusz Guzik <mjg...@gmail.com> wrote:
>Sorry, I completely forgot about this.
>
>Can you please modify debug flags in your kernel config file to be
>"-O0 -g3" and reproduce with that? This should allow kgdb to obtain
>full info (along with exact rash site for inspection) without further
>tinkering or guessing.

I'm unable to provide this for you, kernel compiled with this flags
panics during boot at zfs root mount.

--
pozdrawiam / with regards
Paweł Pękala

Bryan Drewery

unread,

Aug 10, 2015, 5:58:26 PM8/10/15

to

On 8/10/15 2:47 PM, Pawel Pekala wrote:
> Hi Mateusz,
>
> On 2015-08-06 23:44 +0200, Mateusz Guzik <mjg...@gmail.com> wrote:
>> Sorry, I completely forgot about this.
>>
>> Can you please modify debug flags in your kernel config file to be
>> "-O0 -g3" and reproduce with that? This should allow kgdb to obtain
>> full info (along with exact rash site for inspection) without further
>> tinkering or guessing.
>
> I'm unable to provide this for you, kernel compiled with this flags
> panics during boot at zfs root mount.
>

Try raising kern.kstack_pages to 5 or 6 in the loader prompt too.

--
Regards,
Bryan Drewery

Lawrence Stewart

unread,

Aug 12, 2015, 10:12:24 AM8/12/15

to

On 08/07/15 07:33, Pawel Pekala wrote:
> Hi K.,
>
> On 2015-08-06 12:33 -0700, "K. Macy" <km...@freebsd.org> wrote:
>> Is this still happening?
>
> Still crashes:

+1 for me running r286617

Cheers,
Lawrence

Bryan Drewery

unread,

Aug 13, 2015, 2:59:23 PM8/13/15

to

On 8/12/15 7:11 AM, Lawrence Stewart wrote:
> On 08/07/15 07:33, Pawel Pekala wrote:
>> Hi K.,
>>
>> On 2015-08-06 12:33 -0700, "K. Macy" <km...@freebsd.org> wrote:
>>> Is this still happening?
>>
>> Still crashes:
>
> +1 for me running r286617
>

r286510 has been stable in the package build cluster. r286593 is stable
on my own system.

--
Regards,
Bryan Drewery
_______________________________________________
freebsd...@freebsd.org mailing list

https://lists.freebsd.org/mailman/listinfo/freebsd-current

Andriy Gapon

unread,

Aug 23, 2015, 2:55:58 AM8/23/15

to

On 12/08/2015 17:11, Lawrence Stewart wrote:
> On 08/07/15 07:33, Pawel Pekala wrote:
>> Hi K.,
>>
>> On 2015-08-06 12:33 -0700, "K. Macy" <km...@freebsd.org> wrote:
>>> Is this still happening?
>>
>> Still crashes:
>
> +1 for me running r286617

Here is another +1 with r286922.
I can add a couple of bits of debugging data:

(kgdb) fr 8
#8 0xffffffff80639d60 in knote (list=0xfffff8019a733ea0,

hint=2147483648, lockflags=<value optimized out>) at

/usr/src/sys/kern/kern_event.c:1964
1964 } else if ((lockflags & KNF_NOKQLOCK) != 0) {
(kgdb) p *list
$2 = {kl_list = {slh_first = 0x0}, kl_lock = 0xffffffff8063a1e0
<knlist_mtx_lock>, kl_unlock = 0xffffffff8063a200 <knlist_mtx_unlock>,
kl_assert_locked = 0xffffffff8063a220 <knlist_mtx_assert_locked>,
kl_assert_unlocked = 0xffffffff8063a240 <knlist_mtx_assert_unlocked>,
kl_lockarg = 0xfffff8019a733bb0}
(kgdb) disassemble
Dump of assembler code for function knote:
0xffffffff80639d00 <knote+0>: push %rbp
0xffffffff80639d01 <knote+1>: mov %rsp,%rbp
0xffffffff80639d04 <knote+4>: push %r15
0xffffffff80639d06 <knote+6>: push %r14
0xffffffff80639d08 <knote+8>: push %r13
0xffffffff80639d0a <knote+10>: push %r12
0xffffffff80639d0c <knote+12>: push %rbx
0xffffffff80639d0d <knote+13>: sub $0x18,%rsp
0xffffffff80639d11 <knote+17>: mov %edx,%r12d
0xffffffff80639d14 <knote+20>: mov %rsi,-0x30(%rbp)
0xffffffff80639d18 <knote+24>: mov %rdi,%rbx
0xffffffff80639d1b <knote+27>: test %rbx,%rbx
0xffffffff80639d1e <knote+30>: je 0xffffffff80639ef6 <knote+502>
0xffffffff80639d24 <knote+36>: mov %r12d,%eax
0xffffffff80639d27 <knote+39>: and $0x1,%eax
0xffffffff80639d2a <knote+42>: mov %eax,-0x3c(%rbp)
0xffffffff80639d2d <knote+45>: mov 0x28(%rbx),%rdi
0xffffffff80639d31 <knote+49>: je 0xffffffff80639d38 <knote+56>
0xffffffff80639d33 <knote+51>: callq *0x18(%rbx)
0xffffffff80639d36 <knote+54>: jmp 0xffffffff80639d42 <knote+66>
0xffffffff80639d38 <knote+56>: callq *0x20(%rbx)
0xffffffff80639d3b <knote+59>: mov 0x28(%rbx),%rdi
0xffffffff80639d3f <knote+63>: callq *0x8(%rbx)
0xffffffff80639d42 <knote+66>: mov %rbx,-0x38(%rbp)
0xffffffff80639d46 <knote+70>: mov (%rbx),%rbx
0xffffffff80639d49 <knote+73>: test %rbx,%rbx
0xffffffff80639d4c <knote+76>: je 0xffffffff80639ee5 <knote+485>
0xffffffff80639d52 <knote+82>: and $0x2,%r12d
0xffffffff80639d56 <knote+86>: nopw %cs:0x0(%rax,%rax,1)
0xffffffff80639d60 <knote+96>: mov 0x28(%rbx),%r14

Panic is in the last quoted instruction.
And:
(kgdb) i reg
rax 0x246 582
rbx 0xdeadc0dedeadc0de -2401050962867404578
rcx 0x0 0
rdx 0x12e 302
rsi 0xffffffff80a26a5a -2136839590
rdi 0xffffffff80e81b80 -2132272256
rbp 0xfffffe02b7efea20 0xfffffe02b7efea20
rsp 0xfffffe02b7efe9e0 0xfffffe02b7efe9e0
r8 0xffffffff80a269ce -2136839730
r9 0xffffffff80e82838 -2132269000
r10 0x10000 65536
r11 0xffffffff80fabd10 -2131051248
r12 0x0 0
r13 0xfffff801ff84a818 -8787511171048
r14 0xfffff801ff84a800 -8787511171072
r15 0xfffff8019a6974f0 -8789207452432
rip 0xffffffff80639d60 0xffffffff80639d60 <knote+96>
eflags 0x10286 66182

I think that $rbx stands out here (this is a kernel with INVARIANTS).

Looking at the code, is it possible that one of the calls from within
the loop's body modifies the list? If that is so and provided that is a
valid behavior, then maybe using SLIST_FOREACH_SAFE would help.

--
Andriy Gapon

Konstantin Belousov

unread,

Aug 23, 2015, 5:08:53 AM8/23/15

to

This is first time a useful debugging data was posted.

The 0x28 offset may indicate either kn_kq member access of the struct
knote, or kq_list of the struct kqueue.

kl_list.slh_first of the list parameter is NULL, how would a list
iteration loop even start ? Can you look up the list argument value
from the previous frame (%rdi is overwritten, so debugger might be
confused) ?

Konstantin Belousov

unread,

Aug 23, 2015, 8:55:14 AM8/23/15

to

After looking at your data closely, I think you are right. The panic
occurs when the exit1(9) does KNOTE_LOCKED(NOTE_EXIT). This is the
only case in the tree where filter uses knlist_remove_inevent() to detach
processed note, so indeed the slist is modified under the iterator.

Below is the patch with the suggested change and unrelated cleanup of
the uma(9) KPI use. Please test, everybody who has a panic with the
backtrace pointing to the sys_exit().

diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
index a4388aa..2f15f7f 100644
--- a/sys/kern/kern_event.c
+++ b/sys/kern/kern_event.c
@@ -1106,7 +1106,12 @@ kqueue_register(struct kqueue *kq, struct kevent *kev, struct thread *td, int wa
return EINVAL;

if (kev->flags & EV_ADD)
- tkn = knote_alloc(waitok); /* prevent waiting with locks */
+ /*
+ * Prevent waiting with locks. Non-sleepable
+ * allocation failures are handled in the loop, only
+ * if the spare knote appears to be actually required.
+ */
+ tkn = knote_alloc(waitok);
else
tkn = NULL;

@@ -1310,8 +1315,7 @@ done:
FILEDESC_XUNLOCK(td->td_proc->p_fd);
if (fp != NULL)
fdrop(fp, td);
- if (tkn != NULL)
- knote_free(tkn);
+ knote_free(tkn);
if (fops != NULL)
kqueue_fo_release(filt);
return (error);
@@ -1507,10 +1511,6 @@ kqueue_scan(struct kqueue *kq, int maxevents, struct kevent_copyops *k_ops,
} else
asbt = 0;
marker = knote_alloc(1);
- if (marker == NULL) {
- error = ENOMEM;
- goto done_nl;
- }
marker->kn_status = KN_MARKER;
KQ_LOCK(kq);

@@ -1929,7 +1929,7 @@ void
knote(struct knlist *list, long hint, int lockflags)
{
struct kqueue *kq;
- struct knote *kn;
+ struct knote *kn, *tkn;
int error;

if (list == NULL)
@@ -1948,7 +1948,7 @@ knote(struct knlist *list, long hint, int lockflags)
* only safe if you want to remove the current item, which we are
* not doing.
*/
- SLIST_FOREACH(kn, &list->kl_list, kn_selnext) {
+ SLIST_FOREACH_SAFE(kn, &list->kl_list, kn_selnext, tkn) {
kq = kn->kn_kq;
KQ_LOCK(kq);
if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) == KN_INFLUX) {
@@ -2385,15 +2385,16 @@ SYSINIT(knote, SI_SUB_PSEUDO, SI_ORDER_ANY, knote_init, NULL);
static struct knote *
knote_alloc(int waitok)
{
- return ((struct knote *)uma_zalloc(knote_zone,
- (waitok ? M_WAITOK : M_NOWAIT)|M_ZERO));
+
+ return (uma_zalloc(knote_zone, (waitok ? M_WAITOK : M_NOWAIT) |
+ M_ZERO));
}

static void
knote_free(struct knote *kn)
{
- if (kn != NULL)
- uma_zfree(knote_zone, kn);
+
+ uma_zfree(knote_zone, kn);
}

/*

John-Mark Gurney

unread,

Aug 24, 2015, 1:36:20 AM8/24/15

to

if you add this comment, please add curly braces around the block...

> else
> tkn = NULL;
>
> @@ -1310,8 +1315,7 @@ done:
> FILEDESC_XUNLOCK(td->td_proc->p_fd);
> if (fp != NULL)
> fdrop(fp, td);
> - if (tkn != NULL)
> - knote_free(tkn);
> + knote_free(tkn);

Probably should just change knote_free to a static inline that does
a uma_zfree as uma_zfree also does nothing is the input is NULL...

Clearly you didn't read the comment that preceeds this line, or at
least didn't update it:
* SLIST_FOREACH, SLIST_FOREACH_SAFE is not safe in our case, it is

* only safe if you want to remove the current item, which we are
* not doing.

So, you'll need to be more specific in why this needs to change...
When I wrote this code, I spent a lot of time looking at this, and
reasoned as to why SLIST_FOREACH_SAFE was NOT correct usage here...

> kq = kn->kn_kq;
> KQ_LOCK(kq);
> if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) == KN_INFLUX) {
> @@ -2385,15 +2385,16 @@ SYSINIT(knote, SI_SUB_PSEUDO, SI_ORDER_ANY, knote_init, NULL);
> static struct knote *
> knote_alloc(int waitok)
> {
> - return ((struct knote *)uma_zalloc(knote_zone,
> - (waitok ? M_WAITOK : M_NOWAIT)|M_ZERO));
> +
> + return (uma_zalloc(knote_zone, (waitok ? M_WAITOK : M_NOWAIT) |
> + M_ZERO));
> }
>
> static void

per above, we should add inline here...

> knote_free(struct knote *kn)
> {
> - if (kn != NULL)
> - uma_zfree(knote_zone, kn);
> +
> + uma_zfree(knote_zone, kn);
> }
>
> /*

I agree w/ the all the non-SLIST changes, but I disagree w/ the SLIST
change as I don't believe that all cases was considered...

Anyways, the other changes shouldn't be committed w/ the SLIST change
as they are unrelated...

--
John-Mark Gurney Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."

Andriy Gapon

unread,

Aug 24, 2015, 2:14:09 AM8/24/15

to

On 23/08/2015 15:54, Konstantin Belousov wrote:
> After looking at your data closely, I think you are right. The panic
> occurs when the exit1(9) does KNOTE_LOCKED(NOTE_EXIT). This is the
> only case in the tree where filter uses knlist_remove_inevent() to detach
> processed note, so indeed the slist is modified under the iterator.
>
> Below is the patch with the suggested change and unrelated cleanup of
> the uma(9) KPI use. Please test, everybody who has a panic with the
> backtrace pointing to the sys_exit().

Thank you very much!
I no longer get the panic in the test case that previously triggered it.

--
Andriy Gapon

Konstantin Belousov

unread,

Aug 24, 2015, 4:10:50 AM8/24/15

to

On Sun, Aug 23, 2015 at 10:35:44PM -0700, John-Mark Gurney wrote:
> Konstantin Belousov wrote this message on Sun, Aug 23, 2015 at 15:54 +0300:
> > if (kev->flags & EV_ADD)
> > - tkn = knote_alloc(waitok); /* prevent waiting with locks */
> > + /*
> > + * Prevent waiting with locks. Non-sleepable
> > + * allocation failures are handled in the loop, only
> > + * if the spare knote appears to be actually required.
> > + */
> > + tkn = knote_alloc(waitok);
>
> if you add this comment, please add curly braces around the block...

Ok.

>
> > else
> > tkn = NULL;
> >
> > @@ -1310,8 +1315,7 @@ done:
> > FILEDESC_XUNLOCK(td->td_proc->p_fd);
> > if (fp != NULL)
> > fdrop(fp, td);
> > - if (tkn != NULL)
> > - knote_free(tkn);
> > + knote_free(tkn);
>
> Probably should just change knote_free to a static inline that does
> a uma_zfree as uma_zfree also does nothing is the input is NULL...

This was already done in the patch (the removal of the NULL check in
knote_free()). I usually do not add excessive inline keywords. Compilers
are good, sometimes even too good, at figuring out the possibilities for
inlining. knote_free() is inlined automatically.

> > @@ -1948,7 +1948,7 @@ knote(struct knlist *list, long hint, int lockflags)
> > * only safe if you want to remove the current item, which we are
> > * not doing.
> > */
> > - SLIST_FOREACH(kn, &list->kl_list, kn_selnext) {
> > + SLIST_FOREACH_SAFE(kn, &list->kl_list, kn_selnext, tkn) {
>
> Clearly you didn't read the comment that preceeds this line, or at
> least didn't update it:
> * SLIST_FOREACH, SLIST_FOREACH_SAFE is not safe in our case, it is
> * only safe if you want to remove the current item, which we are
> * not doing.
>
> So, you'll need to be more specific in why this needs to change...
> When I wrote this code, I spent a lot of time looking at this, and
> reasoned as to why SLIST_FOREACH_SAFE was NOT correct usage here...

I explained what happens in the message. The knote list is modified
by the filter, see knlist_remove_inevent() call in filt_proc().

>
> > kq = kn->kn_kq;
> > KQ_LOCK(kq);
> > if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) == KN_INFLUX) {
> > @@ -2385,15 +2385,16 @@ SYSINIT(knote, SI_SUB_PSEUDO, SI_ORDER_ANY, knote_init, NULL);
> > static struct knote *
> > knote_alloc(int waitok)
> > {
> > - return ((struct knote *)uma_zalloc(knote_zone,
> > - (waitok ? M_WAITOK : M_NOWAIT)|M_ZERO));
> > +
> > + return (uma_zalloc(knote_zone, (waitok ? M_WAITOK : M_NOWAIT) |
> > + M_ZERO));
> > }
> >
> > static void
>
> per above, we should add inline here...
>
> > knote_free(struct knote *kn)
> > {
> > - if (kn != NULL)
> > - uma_zfree(knote_zone, kn);
> > +
> > + uma_zfree(knote_zone, kn);
> > }
> >
> > /*
>
> I agree w/ the all the non-SLIST changes, but I disagree w/ the SLIST
> change as I don't believe that all cases was considered...

What cases do you mean ?

The patch does not unlock knlist lock in the iteration. As such, the
only thread which could remove elements from the knlist, or rearrange
the list, while loop is active, is the current thread. So I claim that
the only the current iterating element can be removed, and the next list
element stays valid. This is enough for _SAFE loop to work.

Why do you think that _SAFE is incorrect ? Comment talks about very
different case, where the knlist lock is dropped. Then indeed, other
thread may iterate in parallel, and invalidate the memoized next element
while KN_INFLUX is set for the current element and knlist is dropped.
But _SAFE in sys/queue.h never means 'safe for parallel mutators', it
only means 'safe for the current iterator removing current element'.

I preferred not to touch the comment until it is confirmed that the
change help. I reformulated it now, trying to keep the note about
unlock (but is it useful ?).

>
> Anyways, the other changes shouldn't be committed w/ the SLIST change
> as they are unrelated...

Sure, I posted the diff against the WIP branch. The commits will be split.

diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
index a4388aa..0e26a78 100644
--- a/sys/kern/kern_event.c
+++ b/sys/kern/kern_event.c
@@ -1105,10 +1105,16 @@ kqueue_register(struct kqueue *kq, struct kevent *kev, struct thread *td, int wa
if (fops == NULL)
return EINVAL;

- if (kev->flags & EV_ADD)

- tkn = knote_alloc(waitok); /* prevent waiting with locks */

- else
+ if (kev->flags & EV_ADD) {

+ /*
+ * Prevent waiting with locks. Non-sleepable
+ * allocation failures are handled in the loop, only
+ * if the spare knote appears to be actually required.
+ */
+ tkn = knote_alloc(waitok);

+ } else {
tkn = NULL;
+ }

findkn:
if (fops->f_isfd) {
@@ -1310,8 +1316,7 @@ done:

FILEDESC_XUNLOCK(td->td_proc->p_fd);
if (fp != NULL)
fdrop(fp, td);
- if (tkn != NULL)
- knote_free(tkn);
+ knote_free(tkn);

if (fops != NULL)
kqueue_fo_release(filt);
return (error);

@@ -1507,10 +1512,6 @@ kqueue_scan(struct kqueue *kq, int maxevents, struct kevent_copyops *k_ops,

} else
asbt = 0;
marker = knote_alloc(1);
- if (marker == NULL) {
- error = ENOMEM;
- goto done_nl;
- }
marker->kn_status = KN_MARKER;
KQ_LOCK(kq);

@@ -1929,7 +1930,7 @@ void

knote(struct knlist *list, long hint, int lockflags)
{
struct kqueue *kq;
- struct knote *kn;
+ struct knote *kn, *tkn;
int error;

if (list == NULL)

@@ -1941,14 +1942,13 @@ knote(struct knlist *list, long hint, int lockflags)
list->kl_lock(list->kl_lockarg);

/*
- * If we unlock the list lock (and set KN_INFLUX), we can eliminate
- * the kqueue scheduling, but this will introduce four
- * lock/unlock's for each knote to test. If we do, continue to use
- * SLIST_FOREACH, SLIST_FOREACH_SAFE is not safe in our case, it is
- * only safe if you want to remove the current item, which we are
- * not doing.
+ * If we unlock the list lock (and set KN_INFLUX), we can
+ * eliminate the kqueue scheduling, but this will introduce
+ * four lock/unlock's for each knote to test. Also, marker
+ * would be needed to keep iteration position, since filters
+ * or other threads could remove events.

*/
- SLIST_FOREACH(kn, &list->kl_list, kn_selnext) {
+ SLIST_FOREACH_SAFE(kn, &list->kl_list, kn_selnext, tkn) {

kq = kn->kn_kq;
KQ_LOCK(kq);
if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) == KN_INFLUX) {
@@ -2385,15 +2385,16 @@ SYSINIT(knote, SI_SUB_PSEUDO, SI_ORDER_ANY, knote_init, NULL);
static struct knote *
knote_alloc(int waitok)
{
- return ((struct knote *)uma_zalloc(knote_zone,
- (waitok ? M_WAITOK : M_NOWAIT)|M_ZERO));
+
+ return (uma_zalloc(knote_zone, (waitok ? M_WAITOK : M_NOWAIT) |
+ M_ZERO));
}

static void

knote_free(struct knote *kn)
{
- if (kn != NULL)
- uma_zfree(knote_zone, kn);
+
+ uma_zfree(knote_zone, kn);
}

/*

Pawel Pekala

unread,

Aug 26, 2015, 6:45:01 PM8/26/15

to

Hi Konstantin,

On 2015-08-23 15:54 +0300, Konstantin Belousov <kost...@gmail.com>

wrote:
>After looking at your data closely, I think you are right. The panic
>occurs when the exit1(9) does KNOTE_LOCKED(NOTE_EXIT). This is the
>only case in the tree where filter uses knlist_remove_inevent() to
>detach processed note, so indeed the slist is modified under the
>iterator.
>
>Below is the patch with the suggested change and unrelated cleanup of
>the uma(9) KPI use. Please test, everybody who has a panic with the
>backtrace pointing to the sys_exit().

This patch fixes issue for me. Thank you.

--
pozdrawiam / with regards
Paweł Pękala

John-Mark Gurney

unread,

Aug 26, 2015, 7:43:15 PM8/26/15

to

Andriy Gapon wrote this message on Sun, Aug 23, 2015 at 09:54 +0300:
> On 12/08/2015 17:11, Lawrence Stewart wrote:
> > On 08/07/15 07:33, Pawel Pekala wrote:
> >> Hi K.,
> >>
> >> On 2015-08-06 12:33 -0700, "K. Macy" <km...@freebsd.org> wrote:
> >>> Is this still happening?
> >>
> >> Still crashes:
> >
> > +1 for me running r286617
>
> Here is another +1 with r286922.
> I can add a couple of bits of debugging data:
>
> (kgdb) fr 8
> #8 0xffffffff80639d60 in knote (list=0xfffff8019a733ea0,
> hint=2147483648, lockflags=<value optimized out>) at
> /usr/src/sys/kern/kern_event.c:1964
> 1964 } else if ((lockflags & KNF_NOKQLOCK) != 0) {
> (kgdb) p *list
> $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0xffffffff8063a1e0

We should/cannot get here w/ an empty list. If we do, then there is
something seriously wrong... The current kn (which we must have as we
are here) MUST be on the list, but as you just showed, there are no
knotes on the list.

Can you get me a print of the knote? That way I can see what flags
are on it?

Yeh, it was probably r284861 that I added to catch use after free bugs
like this... You could try reverting r284861 to see if the bug goes
away... If it does, then this is most likely a use after free bug...

> Looking at the code, is it possible that one of the calls from within
> the loop's body modifies the list? If that is so and provided that is a
> valid behavior, then maybe using SLIST_FOREACH_SAFE would help.

--
John-Mark Gurney Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."

John-Mark Gurney

unread,

Aug 26, 2015, 7:51:45 PM8/26/15

to

Though it is, if we really change knote_free to a bare uma_free, then
either mark it inline (to be explicit about it's behavior), or make a
macro out of it... I don't particularly like functions that contain one
line of simple code...

I can't think of the reason right now, but I do remeber puzzling over
this issue for some hours when I wrote this code, and I had proved
to myself that _SAFE was NOT _SAFE for this use case...

In the quick look I just had, I have not been able to decide one way
or the other, but I'm suspicious that this is a recent issue, as this
code has been running for close to a decade w/o any issues, and wonder
if there was some other change that trigger the issue...

The reason I'm cautious about changing this is that the code has been
running fine for over a decade... Have you done a full test to
validate that nothing else breaks?

Ok, after looking more at the original dump, this is a use after free
bug... As I said in another email, it should not be possible to get
into the _FOREACH loop where knlist is an empty list. If it does,
then there is another major bug that needs to be found... A simple
change to _SAFE will not fix this use after free bug...

> different case, where the knlist lock is dropped. Then indeed, other
> thread may iterate in parallel, and invalidate the memoized next element
> while KN_INFLUX is set for the current element and knlist is dropped.
> But _SAFE in sys/queue.h never means 'safe for parallel mutators', it
> only means 'safe for the current iterator removing current element'.
>
> I preferred not to touch the comment until it is confirmed that the
> change help. I reformulated it now, trying to keep the note about
> unlock (but is it useful ?).

I prefer to prove through logic that the change is correct...

The comment is about a possible future optimization to eliminate
the kqueue_schedtask, though as that is less common (kq in kq) than
standard knote...

Lawrence Stewart

unread,

Aug 27, 2015, 2:39:47 AM8/27/15

to

On 08/27/15 09:36, John-Mark Gurney wrote:
> Andriy Gapon wrote this message on Sun, Aug 23, 2015 at 09:54 +0300:
>> On 12/08/2015 17:11, Lawrence Stewart wrote:
>>> On 08/07/15 07:33, Pawel Pekala wrote:
>>>> Hi K.,
>>>>
>>>> On 2015-08-06 12:33 -0700, "K. Macy" <km...@freebsd.org> wrote:
>>>>> Is this still happening?
>>>>
>>>> Still crashes:
>>>
>>> +1 for me running r286617
>>
>> Here is another +1 with r286922.
>> I can add a couple of bits of debugging data:
>>
>> (kgdb) fr 8
>> #8 0xffffffff80639d60 in knote (list=0xfffff8019a733ea0,
>> hint=2147483648, lockflags=<value optimized out>) at
>> /usr/src/sys/kern/kern_event.c:1964
>> 1964 } else if ((lockflags & KNF_NOKQLOCK) != 0) {
>> (kgdb) p *list
>> $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0xffffffff8063a1e0
>
> We should/cannot get here w/ an empty list. If we do, then there is
> something seriously wrong... The current kn (which we must have as we
> are here) MUST be on the list, but as you just showed, there are no
> knotes on the list.
>
> Can you get me a print of the knote? That way I can see what flags
> are on it?

I quickly tried to get this info for you by building my kernel with -O0
and reproducing, but I get an insta-panic on boot with the new kernel:

Fatal double fault
rip = 0xffffffff8218c794
rsp = 0xfffffe044cdc9fe0
rbp = 0xfffffe044cdca110

cpuid = 2; apic id = 02

panic: double fault
cpuid = 2
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
0xfffffe03dcfffe30
vpanic() at vpanic+0x189/frame 0xfffffe03dcfffeb0
panic() at panic+0x43/frame 0xfffffe03dcffff10
dblfault_handler() at dblfault_handler+0xa2/frame 0xfffffe03dcffff30
Xdblfault() at Xdblfault+0xac/frame 0xfffffe03dcffff30
--- trap 0x17, rip = 0xffffffff8218c794, rsp = 0xfffffe044cdc9fe0, rbp =
0xfffffe044cdca110 ---
vdev_queue_aggregate() at vdev_queue_aggregate+0x34/frame 0xfffffe044cdca110
vdev_queue_io_to_issue() at vdev_queue_io_to_issue+0x1f5/frame
0xfffffe044cdca560
vdev_queue_io() at vdev_queue_io+0x19a/frame 0xfffffe044cdca5b0
zio_vdev_io_start() at zio_vdev_io_start+0x81f/frame 0xfffffe044cdca6e0
zio_execute() at zio_execute+0x23b/frame 0xfffffe044cdca730
zio_nowait() at zio_nowait+0xbe/frame 0xfffffe044cdca760
vdev_mirror_io_start() at vdev_mirror_io_start+0x140/frame
0xfffffe044cdca800
zio_vdev_io_start() at zio_vdev_io_start+0x12f/frame 0xfffffe044cdca930
zio_execute() at zio_execute+0x23b/frame 0xfffffe044cdca980
zio_nowait() at zio_nowait+0xbe/frame 0xfffffe044cdca9b0
spa_load_verify_cb() at spa_load_verify_cb+0x37d/frame 0xfffffe044cdcaa50
traverse_visitbp() at traverse_visitbp+0x5a5/frame 0xfffffe044cdcac60
traverse_dnode() at traverse_dnode+0x98/frame 0xfffffe044cdcacd0
traverse_visitbp() at traverse_visitbp+0xc66/frame 0xfffffe044cdcaee0
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfffffe044cdcb0f0
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfffffe044cdcb300
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfffffe044cdcb510
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfffffe044cdcb720
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfffffe044cdcb930
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfffffe044cdcbb40
traverse_dnode() at traverse_dnode+0x98/frame 0xfffffe044cdcbbb0
traverse_visitbp() at traverse_visitbp+0xe59/frame 0xfffffe044cdcbdc0
traverse_impl() at traverse_impl+0x79d/frame 0xfffffe044cdcbfd0
traverse_dataset() at traverse_dataset+0x93/frame 0xfffffe044cdcc040
traverse_pool() at traverse_pool+0x1f2/frame 0xfffffe044cdcc140
spa_load_verify() at spa_load_verify+0xf3/frame 0xfffffe044cdcc1f0
spa_load_impl() at spa_load_impl+0x2069/frame 0xfffffe044cdcc610
spa_load() at spa_load+0x320/frame 0xfffffe044cdcc6d0
spa_load_impl() at spa_load_impl+0x150b/frame 0xfffffe044cdccaf0
spa_load() at spa_load+0x320/frame 0xfffffe044cdccbb0
spa_load_best() at spa_load_best+0xc6/frame 0xfffffe044cdccc50
spa_open_common() at spa_open_common+0x246/frame 0xfffffe044cdccd40
spa_open() at spa_open+0x35/frame 0xfffffe044cdccd70
dsl_pool_hold() at dsl_pool_hold+0x2d/frame 0xfffffe044cdccdb0
dmu_objset_own() at dmu_objset_own+0x2e/frame 0xfffffe044cdcce30
zfsvfs_create() at zfsvfs_create+0x100/frame 0xfffffe044cdcd050
zfs_domount() at zfs_domount+0xa7/frame 0xfffffe044cdcd0e0
zfs_mount() at zfs_mount+0x6c3/frame 0xfffffe044cdcd390
vfs_donmount() at vfs_donmount+0x1330/frame 0xfffffe044cdcd660
kernel_mount() at kernel_mount+0x62/frame 0xfffffe044cdcd6c0
parse_mount() at parse_mount+0x668/frame 0xfffffe044cdcd810
vfs_mountroot() at vfs_mountroot+0x85c/frame 0xfffffe044cdcd9d0
start_init() at start_init+0x62/frame 0xfffffe044cdcda70
fork_exit() at fork_exit+0x84/frame 0xfffffe044cdcdab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe044cdcdab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

Didn't get a core because it panics before dumpdev is set.

Is anyone else able to run -O0 kernels or do I have something set to evil?

Cheers,
Lawrence

Don Lewis

unread,

Aug 27, 2015, 3:16:21 AM8/27/15

to

On 27 Aug, Don Lewis wrote:

> As I recall, double faults are commonly caused by overflowing the kernel
> stack. If I subtract the values of the first and last frame pointers, I
> get 14752, which is getting pretty large, and rsp rbp in the trap point
> to different 4K pages, so a stack overflow certainly looks possible.
>
> Try bumping up KSTACK_PAGES in your kernel config.

Actually, that's not necessary anymore since it was made into a tunable
in -CURRENT fairly recently. Just set kern.kstack_pages to something
larger in loader.conf.

Andriy Gapon

unread,

Aug 27, 2015, 3:23:20 AM8/27/15

to

On 27/08/2015 02:36, John-Mark Gurney wrote:
> We should/cannot get here w/ an empty list. If we do, then there is
> something seriously wrong... The current kn (which we must have as we
> are here) MUST be on the list, but as you just showed, there are no
> knotes on the list.
>
> Can you get me a print of the knote? That way I can see what flags
> are on it?

Apologies if the following might sound a little bit patronizing, but it
seems that you have got all the facts correctly, but somehow the
connection between them did not become clear.

So:
1. The list originally is NOT empty. I guess that it has one entry, but
that's an unimportant detail.
2. This is why the loop is entered. It's a fact that it is entered.
3. The list becomes empty precisely because the entry is removed during
the iteration in the loop (as kib has explained). It's a fact that the
list became empty at least in the panic that I reported.
4. The element is not only unlinked from the list, but its memory is
also freed.
5. That's why we have the use after free: SLIST_FOREACH is trying to get
a pointer to a next element from the freed memory.
6. This is why the commit for trashing the freed memory made all the
difference: previously the freed memory was unlikely to be re-used /
modified, so the use-after-free had a high chance of succeeding. It's a
fact that in my panic there was an attempt to dereference a trashed pointer.
7. Finally, this is why SLIST_FOREACH_SAFE helps here: we stash the
pointer to the next element beforehand and, thus, we do not access the
freed memory.

Please let me know if you see any fault in above reasoning or if
something is still no clear.

--
Andriy Gapon

Don Lewis

unread,

Aug 27, 2015, 3:38:48 AM8/27/15

to

On 27 Aug, Lawrence Stewart wrote:
> On 08/27/15 09:36, John-Mark Gurney wrote:
>> Andriy Gapon wrote this message on Sun, Aug 23, 2015 at 09:54 +0300:
>>> On 12/08/2015 17:11, Lawrence Stewart wrote:
>>>> On 08/07/15 07:33, Pawel Pekala wrote:
>>>>> Hi K.,
>>>>>
>>>>> On 2015-08-06 12:33 -0700, "K. Macy" <km...@freebsd.org> wrote:
>>>>>> Is this still happening?
>>>>>
>>>>> Still crashes:
>>>>
>>>> +1 for me running r286617
>>>
>>> Here is another +1 with r286922.
>>> I can add a couple of bits of debugging data:
>>>
>>> (kgdb) fr 8
>>> #8 0xffffffff80639d60 in knote (list=0xfffff8019a733ea0,
>>> hint=2147483648, lockflags=<value optimized out>) at
>>> /usr/src/sys/kern/kern_event.c:1964
>>> 1964 } else if ((lockflags & KNF_NOKQLOCK) != 0) {
>>> (kgdb) p *list
>>> $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0xffffffff8063a1e0
>>

>> We should/cannot get here w/ an empty list. If we do, then there is
>> something seriously wrong... The current kn (which we must have as we
>> are here) MUST be on the list, but as you just showed, there are no
>> knotes on the list.
>>
>> Can you get me a print of the knote? That way I can see what flags
>> are on it?
>

Konstantin Belousov

unread,

Aug 27, 2015, 4:07:20 AM8/27/15

to

On Thu, Aug 27, 2015 at 10:21:47AM +0300, Andriy Gapon wrote:
> On 27/08/2015 02:36, John-Mark Gurney wrote:

> > We should/cannot get here w/ an empty list. If we do, then there is
> > something seriously wrong... The current kn (which we must have as we
> > are here) MUST be on the list, but as you just showed, there are no
> > knotes on the list.
> >
> > Can you get me a print of the knote? That way I can see what flags
> > are on it?
>

> Apologies if the following might sound a little bit patronizing, but it
> seems that you have got all the facts correctly, but somehow the
> connection between them did not become clear.
>
> So:
> 1. The list originally is NOT empty. I guess that it has one entry, but
> that's an unimportant detail.
> 2. This is why the loop is entered. It's a fact that it is entered.
> 3. The list becomes empty precisely because the entry is removed during
> the iteration in the loop (as kib has explained). It's a fact that the
> list became empty at least in the panic that I reported.

The only detail I can add to this explanation, which is probably third (?)
time, is that the removal is done in the filt_proc() event method, by
the call to knlist_remove_inevent().

> 4. The element is not only unlinked from the list, but its memory is
> also freed.
> 5. That's why we have the use after free: SLIST_FOREACH is trying to get
> a pointer to a next element from the freed memory.
> 6. This is why the commit for trashing the freed memory made all the
> difference: previously the freed memory was unlikely to be re-used /
> modified, so the use-after-free had a high chance of succeeding. It's a
> fact that in my panic there was an attempt to dereference a trashed pointer.
> 7. Finally, this is why SLIST_FOREACH_SAFE helps here: we stash the
> pointer to the next element beforehand and, thus, we do not access the
> freed memory.

The additional, eighth item, should explain why the change to _SAFE() is
the correct fix, and not just a papering over the problem. Nobody except
the current thread can modify the knlist, because knlist is locked. As
a consequence, only the current element can be unlinked and removed. So
the _SAFE() iterator actually work.

>
> Please let me know if you see any fault in above reasoning or if
> something is still no clear.

John-Mark Gurney

unread,

Aug 27, 2015, 2:10:19 PM8/27/15

to

Andriy Gapon wrote this message on Thu, Aug 27, 2015 at 10:21 +0300:
> On 27/08/2015 02:36, John-Mark Gurney wrote:
> > We should/cannot get here w/ an empty list. If we do, then there is
> > something seriously wrong... The current kn (which we must have as we
> > are here) MUST be on the list, but as you just showed, there are no
> > knotes on the list.
> >
> > Can you get me a print of the knote? That way I can see what flags
> > are on it?
>
> Apologies if the following might sound a little bit patronizing, but it
> seems that you have got all the facts correctly, but somehow the
> connection between them did not become clear.
>
> So:
> 1. The list originally is NOT empty. I guess that it has one entry, but
> that's an unimportant detail.
> 2. This is why the loop is entered. It's a fact that it is entered.
> 3. The list becomes empty precisely because the entry is removed during
> the iteration in the loop (as kib has explained). It's a fact that the
> list became empty at least in the panic that I reported.

On you're latest dump, you said:
Here is another +1 with r286922.
I can add a couple of bits of debugging data:

(kgdb) fr 8
#8 0xffffffff80639d60 in knote (list=0xfffff8019a733ea0,
hint=2147483648, lockflags=<value optimized out>) at
/usr/src/sys/kern/kern_event.c:1964
1964 } else if ((lockflags & KNF_NOKQLOCK) != 0) {

First off, that can't be r286922, per:
https://svnweb.freebsd.org/base/stable/10/sys/kern/kern_event.c?annotate=286922

line 1964 is blank... The line of code above should be at line 1884,
so not sure what is wrong here...

Assuming that the pc really is at the line, f_event has not yet been
called, which is why I said that the list cannot be empty yet, as
f_event hasn't been called yet to remove the knote... It could be that
optimization moved stuff around, but if that is the case, then the
above wasn't useful..

> 4. The element is not only unlinked from the list, but its memory is
> also freed.

Where is the memory freed? A knote MUST NOT be freed in an f_event
handler. The only location that a list element is allowed to be
freed is in knote_drop, which must happen after f_detach is called,
but that can't/won't happen from knote (I believe the timer handles
this specially, but we are talking about normal knlist type filters)..

The rest of your explination is invalid due to the invalid assumption
of this point...

If you can provide to me where the knote is free'd in knote, w/
function/line number stack trace (does not have to be dump, but a
sample call path), then I'll reconsider, and fix that bug...

> 5. That's why we have the use after free: SLIST_FOREACH is trying to get
> a pointer to a next element from the freed memory.
> 6. This is why the commit for trashing the freed memory made all the
> difference: previously the freed memory was unlikely to be re-used /
> modified, so the use-after-free had a high chance of succeeding. It's a
> fact that in my panic there was an attempt to dereference a trashed pointer.
> 7. Finally, this is why SLIST_FOREACH_SAFE helps here: we stash the
> pointer to the next element beforehand and, thus, we do not access the
> freed memory.
>

> Please let me know if you see any fault in above reasoning or if
> something is still no clear.

--
John-Mark Gurney Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."

Andriy Gapon

unread,

Aug 27, 2015, 4:22:40 PM8/27/15

to

No, it can not be indeed, because I am running head.
r286922 was the latest version of the repository, not the head branch,
at the moment when I pulled the repository via git.

> Assuming that the pc really is at the line, f_event has not yet been
> called,

Even on the second loop iteration?

>which is why I said that the list cannot be empty yet, as
> f_event hasn't been called yet to remove the knote... It could be that
> optimization moved stuff around, but if that is the case, then the
> above wasn't useful..

I provided the disassembly of the code as well, it's very obvious how
the code was translated.

>> 4. The element is not only unlinked from the list, but its memory is
>> also freed.
>
> Where is the memory freed? A knote MUST NOT be freed in an f_event
> handler. The only location that a list element is allowed to be
> freed is in knote_drop, which must happen after f_detach is called,
> but that can't/won't happen from knote (I believe the timer handles
> this specially, but we are talking about normal knlist type filters)..

Well, right. knote()->filt_proc()->knlist_remove_inevent() just removes
the knote from the list. But then there is KNOTE_ACTIVATE() that passes
the knote to a different owner (so to say). And given that the knote has
EV_ONESHOT set on it (in filt_proc) and that poudriere can put quite a
stress load on a system, I am not surprised that another thread gets a
chance to call knote_drop() on the knote before the original thread
proceeds to the next iteration.

> The rest of your explination is invalid due to the invalid assumption
> of this point...

Eagerly waiting for your explanation...

> If you can provide to me where the knote is free'd in knote, w/
> function/line number stack trace (does not have to be dump, but a
> sample call path), then I'll reconsider, and fix that bug...
>> 5. That's why we have the use after free: SLIST_FOREACH is trying to get
>> a pointer to a next element from the freed memory.
>> 6. This is why the commit for trashing the freed memory made all the
>> difference: previously the freed memory was unlikely to be re-used /
>> modified, so the use-after-free had a high chance of succeeding. It's a
>> fact that in my panic there was an attempt to dereference a trashed pointer.
>> 7. Finally, this is why SLIST_FOREACH_SAFE helps here: we stash the
>> pointer to the next element beforehand and, thus, we do not access the
>> freed memory.
>>
>> Please let me know if you see any fault in above reasoning or if
>> something is still no clear.
>

--

Andriy Gapon

Konstantin Belousov

unread,

Aug 27, 2015, 4:25:07 PM8/27/15

to

> Assuming that the pc really is at the line, f_event has not yet been

> called, which is why I said that the list cannot be empty yet, as

> f_event hasn't been called yet to remove the knote... It could be that
> optimization moved stuff around, but if that is the case, then the
> above wasn't useful..
>

> > 4. The element is not only unlinked from the list, but its memory is
> > also freed.
>
> Where is the memory freed? A knote MUST NOT be freed in an f_event
> handler. The only location that a list element is allowed to be
> freed is in knote_drop, which must happen after f_detach is called,
> but that can't/won't happen from knote (I believe the timer handles
> this specially, but we are talking about normal knlist type filters)..
>

> The rest of your explination is invalid due to the invalid assumption
> of this point...
>

> If you can provide to me where the knote is free'd in knote, w/
> function/line number stack trace (does not have to be dump, but a
> sample call path), then I'll reconsider, and fix that bug...

Sigh. Did you ever read the mails I sent ?

Look at the filt_proc()->knlist_remove_inevent().

> > 5. That's why we have the use after free: SLIST_FOREACH is trying to get
> > a pointer to a next element from the freed memory.
> > 6. This is why the commit for trashing the freed memory made all the
> > difference: previously the freed memory was unlikely to be re-used /
> > modified, so the use-after-free had a high chance of succeeding. It's a
> > fact that in my panic there was an attempt to dereference a trashed pointer.
> > 7. Finally, this is why SLIST_FOREACH_SAFE helps here: we stash the
> > pointer to the next element beforehand and, thus, we do not access the
> > freed memory.
> >
> > Please let me know if you see any fault in above reasoning or if
> > something is still no clear.
>
> --

> John-Mark Gurney Voice: +1 415 225 5579
>
> "All that I will do, has been done, All that I have, has not."

Andriy Gapon

unread,

Aug 27, 2015, 4:44:29 PM8/27/15

to

On 27/08/2015 23:21, Andriy Gapon wrote:
>> > First off, that can't be r286922, per:
>> > https://svnweb.freebsd.org/base/stable/10/sys/kern/kern_event.c?annotate=286922
>> >
>> > line 1964 is blank... The line of code above should be at line 1884,
>> > so not sure what is wrong here...

> No, it can not be indeed, because I am running head.
> r286922 was the latest version of the repository, not the head branch,
> at the moment when I pulled the repository via git.

Hrm, a small - irrelevant for me, but probably not for you - nit:
r286922 is actually a head commit:
https://svnweb.freebsd.org/base?view=revision&revision=286922

And:
https://svnweb.freebsd.org/base/head/sys/kern/kern_event.c?annotate=286922#l1964

Not sure why you chose to look at stable/10 (given the mailing list).

--
Andriy Gapon

Lawrence Stewart

unread,

Aug 27, 2015, 9:58:17 PM8/27/15

to

On 08/27/15 17:15, Don Lewis wrote:

> On 27 Aug, Don Lewis wrote:
>> On 27 Aug, Lawrence Stewart wrote:
>>> On 08/27/15 09:36, John-Mark Gurney wrote:
>>>> Andriy Gapon wrote this message on Sun, Aug 23, 2015 at 09:54 +0300:
>>>>> On 12/08/2015 17:11, Lawrence Stewart wrote:
>>>>>> On 08/07/15 07:33, Pawel Pekala wrote:
>>>>>>> Hi K.,
>>>>>>>
>>>>>>> On 2015-08-06 12:33 -0700, "K. Macy" <km...@freebsd.org> wrote:
>>>>>>>> Is this still happening?
>>>>>>>
>>>>>>> Still crashes:
>>>>>>
>>>>>> +1 for me running r286617
>>>>>

>>>>> Here is another +1 with r286922.
>>>>> I can add a couple of bits of debugging data:
>>>>>
>>>>> (kgdb) fr 8
>>>>> #8 0xffffffff80639d60 in knote (list=0xfffff8019a733ea0,
>>>>> hint=2147483648, lockflags=<value optimized out>) at
>>>>> /usr/src/sys/kern/kern_event.c:1964
>>>>> 1964 } else if ((lockflags & KNF_NOKQLOCK) != 0) {

>>>>> (kgdb) p *list
>>>>> $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0xffffffff8063a1e0
>>>>

>>>> We should/cannot get here w/ an empty list. If we do, then there is
>>>> something seriously wrong... The current kn (which we must have as we
>>>> are here) MUST be on the list, but as you just showed, there are no
>>>> knotes on the list.
>>>>
>>>> Can you get me a print of the knote? That way I can see what flags
>>>> are on it?
>>>

> Actually, that's not necessary anymore since it was made into a tunable
> in -CURRENT fairly recently. Just set kern.kstack_pages to something
> larger in loader.conf.
>

Thanks Don, I thought the call chain looked deep but didn't realise I
could do the calculation you demonstrate above to see how close to the 4
pack kstack things were. Bumping kstack_pages from 4->5 resolved the issue.

Cheers,
Lawrence

Lawrence Stewart

unread,

Aug 28, 2015, 12:36:53 AM8/28/15

to

On 08/23/15 22:54, Konstantin Belousov wrote:
> On Sun, Aug 23, 2015 at 12:08:16PM +0300, Konstantin Belousov wrote:
>> On Sun, Aug 23, 2015 at 09:54:28AM +0300, Andriy Gapon wrote:

>>> On 12/08/2015 17:11, Lawrence Stewart wrote:
>>>> On 08/07/15 07:33, Pawel Pekala wrote:
>>>>> Hi K.,
>>>>>
>>>>> On 2015-08-06 12:33 -0700, "K. Macy" <km...@freebsd.org> wrote:
>>>>>> Is this still happening?
>>>>>
>>>>> Still crashes:
>>>>
>>>> +1 for me running r286617
>>>
>>> Here is another +1 with r286922.
>>> I can add a couple of bits of debugging data:
>>>
>>> (kgdb) fr 8
>>> #8 0xffffffff80639d60 in knote (list=0xfffff8019a733ea0,
>>> hint=2147483648, lockflags=<value optimized out>) at
>>> /usr/src/sys/kern/kern_event.c:1964
>>> 1964 } else if ((lockflags & KNF_NOKQLOCK) != 0) {
>>> (kgdb) p *list
>>> $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0xffffffff8063a1e0

>>> Looking at the code, is it possible that one of the calls from within
>>> the loop's body modifies the list? If that is so and provided that is a
>>> valid behavior, then maybe using SLIST_FOREACH_SAFE would help.
>>

>> This is first time a useful debugging data was posted.
>>
>> The 0x28 offset may indicate either kn_kq member access of the struct
>> knote, or kq_list of the struct kqueue.
>>
>> kl_list.slh_first of the list parameter is NULL, how would a list
>> iteration loop even start ? Can you look up the list argument value
>> from the previous frame (%rdi is overwritten, so debugger might be
>> confused) ?
>

> After looking at your data closely, I think you are right. The panic
> occurs when the exit1(9) does KNOTE_LOCKED(NOTE_EXIT). This is the
> only case in the tree where filter uses knlist_remove_inevent() to detach
> processed note, so indeed the slist is modified under the iterator.
>
> Below is the patch with the suggested change and unrelated cleanup of
> the uma(9) KPI use. Please test, everybody who has a panic with the
> backtrace pointing to the sys_exit().

Fixes the panic for me too, thanks Kostik.

John-Mark Gurney

unread,

Sep 1, 2015, 2:24:33 PM9/1/15

to

Andriy Gapon wrote this message on Thu, Aug 27, 2015 at 23:21 +0300:

> On 27/08/2015 21:09, John-Mark Gurney wrote:
> > Andriy Gapon wrote this message on Thu, Aug 27, 2015 at 10:21 +0300:
> >> On 27/08/2015 02:36, John-Mark Gurney wrote:

> >>> We should/cannot get here w/ an empty list. If we do, then there is
> >>> something seriously wrong... The current kn (which we must have as we
> >>> are here) MUST be on the list, but as you just showed, there are no
> >>> knotes on the list.
> >>>
> >>> Can you get me a print of the knote? That way I can see what flags
> >>> are on it?
> >>

> >> Apologies if the following might sound a little bit patronizing, but it
> >> seems that you have got all the facts correctly, but somehow the
> >> connection between them did not become clear.
> >>
> >> So:
> >> 1. The list originally is NOT empty. I guess that it has one entry, but
> >> that's an unimportant detail.
> >> 2. This is why the loop is entered. It's a fact that it is entered.
> >> 3. The list becomes empty precisely because the entry is removed during
> >> the iteration in the loop (as kib has explained). It's a fact that the
> >> list became empty at least in the panic that I reported.
> >
> > On you're latest dump, you said:

> > Here is another +1 with r286922.
> > I can add a couple of bits of debugging data:
> >
> > (kgdb) fr 8
> > #8 0xffffffff80639d60 in knote (list=0xfffff8019a733ea0,
> > hint=2147483648, lockflags=<value optimized out>) at
> > /usr/src/sys/kern/kern_event.c:1964
> > 1964 } else if ((lockflags & KNF_NOKQLOCK) != 0) {
> >

> > First off, that can't be r286922, per:
> > https://svnweb.freebsd.org/base/stable/10/sys/kern/kern_event.c?annotate=286922
> >
> > line 1964 is blank... The line of code above should be at line 1884,
> > so not sure what is wrong here...
>
> No, it can not be indeed, because I am running head.
> r286922 was the latest version of the repository, not the head branch,
> at the moment when I pulled the repository via git.
>

> > Assuming that the pc really is at the line, f_event has not yet been
> > called,
>

> Even on the second loop iteration?
>

> >which is why I said that the list cannot be empty yet, as
> > f_event hasn't been called yet to remove the knote... It could be that
> > optimization moved stuff around, but if that is the case, then the
> > above wasn't useful..
>

> I provided the disassembly of the code as well, it's very obvious how
> the code was translated.
>

> >> 4. The element is not only unlinked from the list, but its memory is
> >> also freed.
> >
> > Where is the memory freed? A knote MUST NOT be freed in an f_event
> > handler. The only location that a list element is allowed to be
> > freed is in knote_drop, which must happen after f_detach is called,
> > but that can't/won't happen from knote (I believe the timer handles
> > this specially, but we are talking about normal knlist type filters)..
>

> Well, right. knote()->filt_proc()->knlist_remove_inevent() just removes
> the knote from the list. But then there is KNOTE_ACTIVATE() that passes
> the knote to a different owner (so to say). And given that the knote has
> EV_ONESHOT set on it (in filt_proc) and that poudriere can put quite a
> stress load on a system, I am not surprised that another thread gets a
> chance to call knote_drop() on the knote before the original thread
> proceeds to the next iteration.

Ok, I think I have identified the race that you guys were trying to
tell me about, and though the _SAFE macro would be a similar fix, I'm
going to rewrite the loop so that this is more explicit on what
is happening here...

So, the race is this... In knote, when the note is removed by
f_event, things are find until the KQ lock is dropped... Once this
lock is dropped, effective ownership of the knote is transfered
from the knlist to the kq lock as the _DETACHED flag is now set,
which means that reading any fields from that note is undefined..

Once the kq lock is released in knote, then it is possible for a
functional like kqueue_scan to endup knote_drop'ing the note...

Upon further examination, we may have another race as in knote_drop,
when we call f_detach, we don't have the list locked, nor kq, which
means that knlist_remove_inevent could be modifing the list at the same
time that kqueue_register could be modifing it to remove a _DELETED
note...

I'd like to close both races at the same time since they go
hand in hand...

> > The rest of your explination is invalid due to the invalid assumption
> > of this point...
>

> Eagerly waiting for your explanation...
>

> > If you can provide to me where the knote is free'd in knote, w/
> > function/line number stack trace (does not have to be dump, but a
> > sample call path), then I'll reconsider, and fix that bug...

> >> 5. That's why we have the use after free: SLIST_FOREACH is trying to get
> >> a pointer to a next element from the freed memory.
> >> 6. This is why the commit for trashing the freed memory made all the
> >> difference: previously the freed memory was unlikely to be re-used /
> >> modified, so the use-after-free had a high chance of succeeding. It's a
> >> fact that in my panic there was an attempt to dereference a trashed pointer.
> >> 7. Finally, this is why SLIST_FOREACH_SAFE helps here: we stash the
> >> pointer to the next element beforehand and, thus, we do not access the
> >> freed memory.
> >>
> >> Please let me know if you see any fault in above reasoning or if
> >> something is still no clear.

--
John-Mark Gurney Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."

Konstantin Belousov

unread,

Sep 1, 2015, 2:45:10 PM9/1/15

to

Did you read the commit message and my previous N messages about the
subject ? Can you point me at a difference between the commit message
and the above text ?

I object against the your pointless and fact-less backout request
and have no intention of complying with it.

John-Mark Gurney

unread,

Sep 1, 2015, 3:02:16 PM9/1/15

to

Well, when I sent this email, I had not read the commit message,
because I was hurried to send this email, because of your other
email... And this email was rushed because of your threat to
commit email, I was unable to completely review the email like
I would have liked...

As for your other emails, only it looks like one of the emails
contained a proper description of the issue... Simply saying, oh, it
panics because it accessed previously free'd memory does not at all
describe the race and most of your emails were like that, even once I
asked for a description of the race, you could not provide exactly
HOW the race occured...

> I object against the your pointless and fact-less backout request
> and have no intention of complying with it.

I remove my request to back out, but not because you objected to it,
but because after further thought this morning, I'm fine w/ the
change... I would still like to see it made explicit, but that can
be added after this...

But I would ask you to respect my maintainership of the code... Just
because you get paid to work on FreeBSD full time does not mean you
get to run roughshod over other people's work and force them to work
on your time frame... Other people have jobs, and families and
responsiblities too...

Would you like me to ignore your requests for review, or put short
deadlines on reviews? I doubt you do, though the above attitude
sure seems like you don't care about it...

> > Upon further examination, we may have another race as in knote_drop,
> > when we call f_detach, we don't have the list locked, nor kq, which
> > means that knlist_remove_inevent could be modifing the list at the same
> > time that kqueue_register could be modifing it to remove a _DELETED
> > note...

I was sleepy when I was writing this email, and put it off till later
when I could properly go over it again... Upon further review, this is
not an issue as knlist_remove_kq does lock the knlist lock before
removing it, and knote_drop doesn't touch the list... I probably got
confused w/ the kn_link vs kn_selnext lists...

Ed Maste

unread,

Sep 1, 2015, 3:31:05 PM9/1/15

to

On 1 September 2015 at 15:01, John-Mark Gurney <j...@funkthat.com> wrote:
>
> But I would ask you to respect my maintainership of the code... Just
> because you get paid to work on FreeBSD full time does not mean you
> get to run roughshod over other people's work and force them to work
> on your time frame... Other people have jobs, and families and
> responsiblities too...

A quick comment on this point, on behalf of the FreeBSD Foundation
(and not core): working for the Foundation as either permanent staff
or on a project grant conveys no special status with respect to making
changes in FreeBSD. Staff and project developers are expected to abide
by the same rules and social conventions when interacting with the
FreeBSD community.

That said, the discussion and diagnosis of this issue has been ongoing
for about ten days, and avg provided a detailed sequence of events
five days ago. In this case the patch fixed a panic that several
people were experiencing, was tested by several people who experienced
the panic, and received review. In my opinion r287366 was handled in a
fair and reasonable fashion.

Andriy Gapon

unread,

Sep 1, 2015, 4:54:52 PM9/1/15

to

John-Mark,

with all the due respect I have to invoke the forest-vs-trees argument here:

- it is established that in the knote() loop the current knote member of the
klist can be removed
- it's a fact that getting a pointer to a next element from a removed element is
an illegal operation
- FOREACH_SAFE is specifically designed to handle exactly this kind of the iteration

--
Andriy Gapon