
[Bug 204764] Filesystem deadlock, process in vodead state


bugzilla...@freebsd.org

Nov 23, 2015, 1:53:35 PM
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

Mark Linimon <lin...@FreeBSD.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Assignee|freebs...@FreeBSD.org |freeb...@FreeBSD.org

--
You are receiving this mail because:
You are the assignee for the bug.
_______________________________________________
freeb...@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to "freebsd-fs-...@freebsd.org"

bugzilla...@freebsd.org

Jan 8, 2016, 6:47:23 AM
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

--- Comment #1 from dan...@blodan.se ---
I'm actually having the exact same problem. It started when we upgraded to
10.2 (we were running 10.1 before and it never happened there). It happens
about once a month, and after a reboot the machine is good for about another
month.

It seems that it's always the periodic find that gets stuck first; after
that, a lot of other I/O processes get stuck too.

Here's the periodic find:
23293 100188 find - mi_switch+0xe1 sleepq_wait+0x3a
_sleep+0x287 vnode_create_vobject+0x100 ufs_lookup_ino+0xa0
VOP_CACHEDLOOKUP_APV+0xa1 vfs_cache_lookup+0xd6 VOP_LOOKUP_APV+0xa1
lookup+0x5a1 namei+0x4d4 kern_statat_vnhook+0xae sys_fstatat+0x2c
amd64_syscall+0x357 Xfast_syscall+0xfb

And here's an rsync call that's stuck:
83577 100644 rsync - mi_switch+0xe1 sleepq_wait+0x3a
_sleep+0x287 vnode_create_vobject+0x100 ufs_open+0x6d VOP_OPEN_APV+0xa1
vn_open_vnode+0x234 vn_open_cred+0x33e kern_openat+0x26f amd64_syscall+0x357
Xfast_syscall+0xfb

One difference in my setup is that I'm running an mfi RAID, not ZFS.

[root@backup-se ~]# mfiutil show volumes
mfi0 Volumes:
Id Size Level Stripe State Cache Name
mfid0 ( 12T) RAID-5 64K OPTIMAL Disabled
mfid1 ( 931G) RAID-0 64K OPTIMAL Disabled


Filesystem Size Used Avail Capacity Mounted on
/dev/mfid1p2 898G 4.8G 821G 1% /
/dev/mfid0p1 12T 8.9T 1.8T 83% /backups
devfs 1.0K 1.0K 0B 100% /dev
fdescfs 1.0K 1.0K 0B 100% /dev/fd

bugzilla...@freebsd.org

Jan 9, 2016, 1:09:30 PM
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

Konstantin Belousov <k...@FreeBSD.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |k...@FreeBSD.org

--- Comment #2 from Konstantin Belousov <k...@FreeBSD.org> ---
(In reply to daniel from comment #1)
Recompile the kernel with INVARIANTS etc.; see the guide at
https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html

When the situation occurs, execute 'kgdb <path>/kernel.debug /dev/mem'. After
that, switch to the thread which hung, with the kgdb command 'thread <id>'. A
backtrace ('bt') will show you the arguments; find the vnode address which
caused the hang (most likely it is shown as vp), and do 'p *vp' and
'p *(vp->v_bufobj.bo_object)'.
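[For reference, the debug-kernel setup the handbook describes boils down to a
kernel config fragment along these lines. This is a sketch, not part of the
bug report; the config name "DEBUG" is arbitrary, and the option set is the
usual deadlock-debugging selection, so trim to taste:]

```
include GENERIC
ident   DEBUG

options KDB                 # kernel debugger framework (required by DDB)
options DDB                 # in-kernel interactive debugger
options INVARIANTS          # runtime consistency checks
options INVARIANT_SUPPORT   # support code the INVARIANTS checks rely on
options WITNESS             # lock order verification
options WITNESS_SKIPSPIN    # skip spin mutexes to reduce overhead
options DEBUG_LOCKS         # record lockmgr lock holders
options DEBUG_VFS_LOCKS     # vnode locking assertions
```

[Built and installed in the usual way: make buildkernel KERNCONF=DEBUG &&
make installkernel KERNCONF=DEBUG, then reboot.]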

bugzilla...@freebsd.org

Jan 10, 2016, 6:49:23 AM
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

--- Comment #3 from dan...@blodan.se ---
Thanks for the pointers, I'll get back to you with the output as soon as it
happens.

bugzilla...@freebsd.org

Jan 28, 2016, 7:36:46 AM
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

--- Comment #4 from Daniel Ylitalo <dan...@blodan.se> ---
Okay! Got a proc stuck again. However, I'm having trouble understanding "find
the vnode address which caused the hang": how do I know what caused the hang?

Here's all the output I've gathered so far:


procstat -kk -a
=====================================
94614 100627 python2.7 - mi_switch+0x179
sleepq_switch+0x152 sleepq_wait+0x43 _sleep+0x366 vnode_create_vobject+0xe9
ufs_lookup_ino+0xa0 VOP_CACHEDLOOKUP_APV+0x10f vfs_cache_lookup+0xd6
VOP_LOOKUP_APV+0x10f lookup+0x5c2 namei+0x504 kern_statat_vnhook+0xae
sys_lstat+0x30 amd64_syscall+0x278 Xfast_syscall+0xfb
=====================================
[root@hub-se /usr/obj/usr/src/sys/GENERIC]# kgdb kernel.debug /dev/mem
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
Reading symbols from /boot/kernel/aio.ko.symbols...done.
Loaded symbols for /boot/kernel/aio.ko.symbols
Reading symbols from /boot/kernel/accf_data.ko.symbols...done.
Loaded symbols for /boot/kernel/accf_data.ko.symbols
Reading symbols from /boot/kernel/accf_http.ko.symbols...done.
Loaded symbols for /boot/kernel/accf_http.ko.symbols
Reading symbols from /boot/kernel/fdescfs.ko.symbols...done.
Loaded symbols for /boot/kernel/fdescfs.ko.symbols
Reading symbols from /boot/kernel/nullfs.ko.symbols...done.
Loaded symbols for /boot/kernel/nullfs.ko.symbols
#0 sched_switch (td=0xffffffff81850720, newtd=<value optimized out>,
flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945
1945 cpuid = PCPU_GET(cpuid);

(kgdb) info threads
505 Thread 100627 (PID=94614: python2.7) sched_switch
(td=0xfffff80379b5e940, newtd=<value optimized out>, flags=<value optimized
out>) at /usr/src/sys/kern/sched_ule.c:1945

(kgdb) thread 505
[Switching to thread 505 (Thread 100627)]#0 sched_switch
(td=0xfffff80379b5e940, newtd=<value optimized out>, flags=<value optimized
out>) at /usr/src/sys/kern/sched_ule.c:1945
1945 cpuid = PCPU_GET(cpuid);
(kgdb) bt
#0 sched_switch (td=0xfffff80379b5e940, newtd=<value optimized out>,
flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945
#1 0xffffffff8094e149 in mi_switch (flags=260, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:491
#2 0xffffffff8098bb02 in sleepq_switch (wchan=<value optimized out>,
pri=<value optimized out>) at /usr/src/sys/kern/subr_sleepqueue.c:538
#3 0xffffffff8098b963 in sleepq_wait (wchan=0xfffff808cdd41e00, pri=84) at
/usr/src/sys/kern/subr_sleepqueue.c:617
#4 0xffffffff8094da56 in _sleep (ident=<value optimized out>, lock=<value
optimized out>, priority=<value optimized out>, wmesg=<value optimized out>,
sbt=<value optimized out>, pr=<value optimized out>)
at /usr/src/sys/kern/kern_synch.c:255
#5 0xffffffff80be8759 in vnode_create_vobject (vp=0xfffff8036ee569c0,
isize=512, td=0xfffff80379b5e940) at /usr/src/sys/vm/vnode_pager.c:120
#6 0xffffffff80bb1010 in ufs_lookup_ino (vdp=0xfffff8036ee569c0,
vpp=0xfffffe0a5bdfd9a8, cnp=0xfffffe0a5bdfd9d0, dd_ino=0x0) at
/usr/src/sys/ufs/ufs/ufs_lookup.c:259
#7 0xffffffff80e791ef in VOP_CACHEDLOOKUP_APV (vop=<value optimized out>,
a=<value optimized out>) at vnode_if.c:197
#8 0xffffffff809dba26 in vfs_cache_lookup (ap=<value optimized out>) at
vnode_if.h:80
#9 0xffffffff80e7902f in VOP_LOOKUP_APV (vop=<value optimized out>, a=<value
optimized out>) at vnode_if.c:129
#10 0xffffffff809e40c2 in lookup (ndp=0xfffffe0a5bdfd948) at vnode_if.h:54
#11 0xffffffff809e3764 in namei (ndp=0xfffffe0a5bdfd948) at
/usr/src/sys/kern/vfs_lookup.c:302
#12 0xffffffff809f911e in kern_statat_vnhook (td=0xfffff80379b5e940,
flag=<value optimized out>, fd=-100, path=0x8067103d4 <Error reading address
0x8067103d4: Bad address>, pathseg=UIO_USERSPACE, sbp=0xfffffe0a5bdfda60,
hook=0)
at /usr/src/sys/kern/vfs_syscalls.c:2298
#13 0xffffffff809f92b0 in sys_lstat (td=0x0, uap=0xfffffe0a5bdfdb80) at
/usr/src/sys/kern/vfs_syscalls.c:2278
#14 0xffffffff80d51b38 in amd64_syscall (td=0xfffff80379b5e940, traced=0) at
subr_syscall.c:134
#15 0xffffffff80d359ab in Xfast_syscall () at
/usr/src/sys/amd64/amd64/exception.S:396
#16 0x000000080157f8aa in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb)


I'm letting the server stay stuck until I know how to get the info you need out
of it :)

bugzilla...@freebsd.org

Jan 28, 2016, 8:31:21 AM
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

--- Comment #5 from Konstantin Belousov <k...@FreeBSD.org> ---
In frame 5, the argument vp points to the vnode. Also do 'p *td' in that
frame.

bugzilla...@freebsd.org

Jan 28, 2016, 9:49:17 AM
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

--- Comment #6 from Daniel Ylitalo <dan...@blodan.se> ---
Sorry, but this is the first time I'm doing any kind of kernel debugging.

So am I supposed to run this after the backtrace?

(kgdb) p *0xfffff8036ee569c0
$1 = -2130911031


But then the second command fails
(kgdb) p *(vp->v_bufobj.bo_object)
No symbol "vp" in current context.

bugzilla...@freebsd.org

Jan 28, 2016, 10:44:02 AM
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

--- Comment #7 from Konstantin Belousov <k...@FreeBSD.org> ---
(In reply to Daniel Ylitalo from comment #6)
frame 5
p *vp
p *td
etc

You must use symbols to access the data; otherwise gdb cannot deduce the
types.
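[As a concrete illustration of the point (a sketch, not taken from the bug
log): given only a raw address, a cast supplies the type that gdb needs,
which is why the earlier 'p *0xfffff8036ee569c0' printed a meaningless
integer. Selecting the frame and using the symbol is the cleaner route:]

```gdb
(kgdb) p *0xfffff8036ee569c0                   <- no type; gdb prints an int
(kgdb) p *(struct vnode *)0xfffff8036ee569c0   <- cast gives it the type
(kgdb) frame 5                                 <- or select the frame...
(kgdb) p *vp                                   <- ...and use the symbol
```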

bugzilla...@freebsd.org

Jan 28, 2016, 10:56:05 AM
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

--- Comment #8 from Daniel Ylitalo <dan...@blodan.se> ---
Created attachment 166244
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=166244&action=edit
threads frame output

bugzilla...@freebsd.org

Jan 28, 2016, 10:56:45 AM
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

--- Comment #9 from Daniel Ylitalo <dan...@blodan.se> ---
I've attached the output, let me know if you need something else or if I can
reboot without the kernel debugging in place.

bugzilla...@freebsd.org

Jan 28, 2016, 11:39:06 AM
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

--- Comment #10 from Konstantin Belousov <k...@FreeBSD.org> ---
(In reply to Daniel Ylitalo from comment #9)
This is seemingly what I need to get started, but I probably will not be able
to look closer for a while, most likely until tomorrow.

bugzilla...@freebsd.org

Jan 28, 2016, 4:29:40 PM
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

--- Comment #11 from Konstantin Belousov <k...@FreeBSD.org> ---
(In reply to Konstantin Belousov from comment #10)
Is there a procstat -kk output somewhere? I want to see the state of all
threads when the problem occurs.

bugzilla...@freebsd.org

Jan 29, 2016, 6:02:37 AM
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

--- Comment #12 from Daniel Ylitalo <dan...@blodan.se> ---
Unfortunately I rebooted the server already, but I still have debug compiled in
so I'll get one for you the next time it happens.

bugzilla...@freebsd.org

Jan 30, 2016, 5:43:09 AM
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

--- Comment #13 from Konstantin Belousov <k...@FreeBSD.org> ---
Created attachment 166295
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=166295&action=edit
Debugging patch

I attached a debugging patch that tries to reveal something about the code
which sets the OBJ_DEAD flag. In fact, I do not expect that this will catch
the offender, but we will see.

Anybody who wants to help with the bug should apply the patch, reproduce the
issue, and then provide the same information I already asked for; most
important are the 'p *vp' and 'p *bo_object' printouts. If I see what I am
looking for, I will provide further instructions on how to proceed.

bugzilla...@freebsd.org

Feb 2, 2016, 10:34:01 AM
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

--- Comment #14 from Daniel Ylitalo <dan...@blodan.se> ---
I've recompiled the kernel with the patch applied, I'll wait for it to happen
again.

bugzilla...@freebsd.org

Feb 26, 2016, 7:52:37 AM
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

johans <jo...@300.nl> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |jo...@300.nl

--- Comment #15 from johans <jo...@300.nl> ---
We've just run into the same bug on one of our machines.

# procstat -kk 52827
PID TID COMM TDNAME KSTACK
52827 100187 ruby21 - mi_switch+0xe1
thread_suspend_switch+0x170 thread_single+0x4e5 exit1+0xbe sigexit+0x925
postsig+0x286 ast+0x427 doreti_ast+0x1f
52827 100389 ruby21 - mi_switch+0xe1 sleepq_wait+0x3a
_sleep+0x287 vnode_create_vobject+0x100 zfs_freebsd_open+0xf5 VOP_OPEN_APV+0xa1
vn_open_vnode+0x234 vn_open_cred+0x33e kern_openat+0x26f amd64_syscall+0x357
Xfast_syscall+0xfb


(kgdb) print *object
$10 = {
lock = {
lock_object = {
lo_name = 0xffffffff80ff8196 "vm object",
lo_flags = 90374144,
lo_data = 0,
lo_witness = 0x0
},
rw_lock = 1
},
object_list = {
tqe_next = 0xfffff80135aaa600,
tqe_prev = 0xfffff80135aaa420
},
shadow_head = {
lh_first = 0x0
},
shadow_list = {
le_next = 0x0,
le_prev = 0xfffff8019f85f030
},
memq = {
tqh_first = 0x0,
tqh_last = 0xfffff80135aaa548
},
rtree = {
rt_root = 0,
rt_flags = 0 '\0'
},
size = 0,
generation = 1,
ref_count = 0,
shadow_count = 0,
memattr = 6 '\006',
type = 2 '\002',
flags = 24584,
pg_color = 1569,
paging_in_progress = 0,
resident_page_count = 0,
backing_object = 0x0,
backing_object_offset = 0,
pager_object_list = {
tqe_next = 0x0,
tqe_prev = 0x0
},
rvq = {
lh_first = 0x0
},
cache = {
rt_root = 0,
rt_flags = 0 '\0'
},
handle = 0xfffff8010bc08760,
un_pager = {
vnp = {
vnp_size = 0,
writemappings = 0
},
devp = {
devp_pglist = {
tqh_first = 0x0,
tqh_last = 0x0
},
ops = 0x0,
dev = 0x0
},
sgp = {
sgp_pglist = {
tqh_first = 0x0,
tqh_last = 0x0
}
},
swp = {
swp_tmpfs = 0x0,
swp_bcount = 0
}
},
cred = 0x0,
charge = 0
}

(kgdb) p *vp
$11 = {
v_tag = 0xffffffff81acefb6 "zfs",
v_op = 0xffffffff81ae12f0,
v_data = 0xfffff80031109450,
v_mount = 0xfffff8000801a330,
v_nmntvnodes = {
tqe_next = 0xfffff801c912a1d8,
tqe_prev = 0xfffff8022bfa0780
},
v_un = {
vu_mount = 0x0,
vu_socket = 0x0,
vu_cdev = 0x0,
vu_fifoinfo = 0x0
},
v_hashlist = {
le_next = 0x0,
le_prev = 0x0
},
v_cache_src = {
lh_first = 0x0
},
v_cache_dst = {
tqh_first = 0xfffff801a260f4d0,
tqh_last = 0xfffff801a260f4f0
},
v_cache_dd = 0x0,
v_lock = {
lock_object = {
lo_name = 0xffffffff81acefb6 "zfs",
lo_flags = 117112832,
lo_data = 0,
lo_witness = 0x0
},
lk_lock = 1,
lk_exslpfail = 0,
lk_timo = 51,
lk_pri = 96
},
v_interlock = {
lock_object = {
lo_name = 0xffffffff80fd2f2d "vnode interlock",
lo_flags = 16973824,
lo_data = 0,
lo_witness = 0x0
},
mtx_lock = 4
},
v_vnlock = 0xfffff8010bc087c8,
v_actfreelist = {
tqe_next = 0xfffff80079317000,
tqe_prev = 0xfffff80092a9e820
},
v_bufobj = {
bo_lock = {
lock_object = {
lo_name = 0xffffffff80fd2f3d "bufobj interlock",
lo_flags = 86179840,
lo_data = 0,
lo_witness = 0x0
},
rw_lock = 1
},
bo_ops = 0xffffffff8149bff0,
bo_object = 0xfffff80135aaa500,
bo_synclist = {
le_next = 0x0,
le_prev = 0x0
},
bo_private = 0xfffff8010bc08760,
__bo_vnode = 0xfffff8010bc08760,
bo_clean = {
bv_hd = {
tqh_first = 0x0,
tqh_last = 0xfffff8010bc08880
},
bv_root = {
pt_root = 0
},
bv_cnt = 0
},
bo_dirty = {
bv_hd = {
tqh_first = 0x0,
tqh_last = 0xfffff8010bc088a0
},
bv_root = {
pt_root = 0
},
bv_cnt = 0
},
bo_numoutput = 0,
bo_flag = 0,
bo_bsize = 131072
},
v_pollinfo = 0x0,
v_label = 0x0,
v_lockf = 0x0,
v_rl = {
rl_waiters = {
tqh_first = 0x0,
tqh_last = 0xfffff8010bc088e8
},
rl_currdep = 0x0
},
v_cstart = 0,
v_lasta = 0,
v_lastw = 0,
v_clen = 0,
v_holdcnt = 2,
v_usecount = 2,
v_iflag = 512,
v_vflag = 0,
v_writecount = 0,
v_hash = 17547399,
v_type = VREG
}

(kgdb) print vp->v_cache_dst->tqh_first->nc_name
$14 = 0xfffff801a260f512 "puppet20160226-52827-x6ea8y"

The file where things go wrong for us is: /tmp/puppet20160226-52827-x6ea8y

Performing a cat on this file also hangs on vodead.

Any relevant waiting channels:

# ps -o lwp -laxwwwSH | awk '{ if ($10 !~
"^(-|ttyin|lockf|wait|select|kqread|tx\->)") print; }'
LWP UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT
TIME COMMAND
100000 0 0 0 0 -16 0 0 285088 swapin DLs -
0:41.09 [kernel/swapper]
100048 0 0 0 0 -92 0 0 285088 vtbslp DLs -
0:00.00 [kernel/virtio_ballo]
100018 0 2 0 0 -16 0 0 16 crypto_w DL -
0:00.00 [crypto]
100019 0 3 0 0 -16 0 0 16 crypto_r DL -
0:00.00 [crypto returns]
100057 0 5 0 0 20 0 0 6176 arc_recl DL -
13:07.13 [zfskern/arc_reclaim]
100058 0 5 0 0 -8 0 0 6176 l2arc_fe DL -
0:40.44 [zfskern/l2arc_feed_]
102486 0 5 0 0 -8 0 0 6176 zio->io_ DL -
8:24.66 [zfskern/txg_thread_]
100062 0 7 0 0 -16 0 0 32 psleep DL -
17:00.73 [pagedaemon/pagedaem]
100068 0 7 0 0 -16 0 0 32 umarcl DL -
0:00.00 [pagedaemon/uma]
100063 0 8 0 0 -16 0 0 16 psleep DL -
0:00.00 [vmdaemon]
100064 0 9 0 0 155 0 0 16 pgzero DL -
0:00.17 [pagezero]
100001 0 10 0 0 -16 0 0 16 audit_wo DL -
14:36.35 [audit]
100065 0 16 0 0 -16 0 0 16 psleep DL -
0:51.50 [bufdaemon]
100066 0 17 0 0 -16 0 0 16 vlruwt DL -
1:27.70 [vnlru]
100067 0 18 0 0 16 0 0 16 syncer DL -
63:11.82 [syncer]
100406 65534 921 1 0 52 0 71104 33604 uwait Is -
0:00.00 /usr/local/bin/memcached -l 127.0.0.1 -m 1024 -p 11211 -C -d -u nobody
-P /var/run/memcached/memcached.pid
100387 0 958 1 0 23 0 16612 2224 nanslp Is -
97:24.14 /usr/sbin/cron -s
100394 0 1194 0 0 -16 0 0 16 pftm DL -
30:48.46 [pf purge]
100388 0 1201 1 0 52 0 14700 2256 sbwait Is -
0:00.00 pflogd: [priv] (pflogd)
100375 64 1202 1201 0 20 0 14700 2312 bpf S -
5:36.59 pflogd: [running] -s 116 -i pflog0 -f /dev/null (pflogd)
100389 0 52827 1 0 52 0 213452 112968 vodead T -
0:08.69 ruby21: puppet agent: applying configuration (ruby21)
100400 0 55604 0 0 -16 0 0 16 ftcl DL -
0:00.00 [ftcleanup]
101076 0 58064 1 0 20 0 16588 2168 auditd Is -
0:07.02 /usr/sbin/auditd
101349 0 37651 37601 0 20 0 12356 2132 vodead D+ 0-
0:00.00 cat puppet20160226-52827-x6ea8y

This is the first time we're seeing this.

bugzilla...@freebsd.org

Feb 26, 2016, 7:54:52 AM
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

--- Comment #16 from johans <jo...@300.nl> ---
I have to reboot the machine now; I hope this gives enough extra information
(though I doubt it). This was all the relevant information I could come up
with.

bugzilla...@freebsd.org

Mar 30, 2016, 7:44:52 PM
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

--- Comment #17 from Jason Wolfe <j...@nitrology.com> ---
We are seeing this issue at Limelight across a good number of boxes on
stable/10 @ r285800. We have a newer 10 version in testing @ r296969, but
have not used it in anger long enough to know empirically whether the issue
is fixed. It should be deployed further in the next month or so.

Unfortunately we don't yet have a way to reproduce the problem so that we
could introduce INVARIANTS into a kernel to help capture some info. I'll
report back if we find one, but in the meantime I did want to +1 this.

bugzilla...@freebsd.org

Apr 1, 2016, 8:04:53 AM
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

Rudolf Čejka <cej...@fit.vutbr.cz> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |cej...@fit.vutbr.cz

--- Comment #18 from Rudolf Čejka <cej...@fit.vutbr.cz> ---
Hello,
another +1 for us.

bugzilla...@freebsd.org

Apr 1, 2016, 8:14:30 AM
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

--- Comment #19 from Rudolf Čejka <cej...@fit.vutbr.cz> ---
And now the debugging info. Unfortunately, we are new to this bug, so our
kernel does not yet have INVARIANTS or the debugging patch.

We have a locked bacula-fd, a backup client whose access pattern is similar
to find & stat; the hang occurred during an incremental backup operation:

procstat -kk (I can send the full procstat -kk -a privately; it is 429876
bytes):

1331 101013 bacula-fd - mi_switch+0xbe
sleepq_catch_signals+0xab sleepq_wait_sig+0xc _cv_wait_sig+0x17a seltdwait+0xae
kern_select+0x8fa sys_select+0x54 amd64_syscall+0x321 Xfast_syscall+0xfb
1331 101035 bacula-fd - mi_switch+0xbe
sleepq_catch_signals+0xab sleepq_timedwait_sig+0xf _sleep+0x238
umtxq_sleep+0x125 do_wait+0x387 __umtx_op_wait_uint_private+0x83
amd64_syscall+0x321 Xfast_syscall+0xfb
1331 102066 bacula-fd - mi_switch+0xbe sleepq_wait+0x3a
_sleep+0x287 vnode_create_vobject+0x100 ufs_lookup_ino+0xa0
VOP_CACHEDLOOKUP_APV+0x7a vfs_cache_lookup+0xd6 VOP_LOOKUP_APV+0x7a
lookup+0x571 namei+0x344 kern_statat_vnhook+0xae sys_lstat+0x30
amd64_syscall+0x321 Xfast_syscall+0xfb
1331 103370 bacula-fd - mi_switch+0xbe
sleepq_catch_signals+0xab sleepq_timedwait_sig+0xf _cv_timedwait_sig_sbt+0x19e
seltdwait+0xa4 kern_select+0x8fa sys_select+0x54 amd64_syscall+0x321
Xfast_syscall+0xfb

where the important thread is TID 102066, with vnode_create_vobject.
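[Picking the interesting thread out of a large procstat dump can be done
mechanically by grepping for the wait function. A minimal sketch; the sample
file and its two stand-in lines are hypothetical, abbreviated from the real
output above:]

```shell
# Filter procstat -kk style output for threads sleeping in
# vnode_create_vobject (the "vodead" wait channel).
cat <<'EOF' > /tmp/procstat.sample
1331 101013 bacula-fd - mi_switch+0xbe seltdwait+0xae kern_select+0x8fa
1331 102066 bacula-fd - mi_switch+0xbe _sleep+0x287 vnode_create_vobject+0x100
EOF
# Only the stuck thread (TID 102066) matches.
grep 'vnode_create_vobject' /tmp/procstat.sample
```

[On a live system the equivalent would be
`procstat -kk -a | grep vnode_create_vobject`.]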

(kgdb) info threads
...
701 Thread 102066 (PID=1331: bacula-fd) sched_switch (td=0xfffff803ec526960,
newtd=0x0, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945
...

(kgdb) thread 701
[Switching to thread 701 (Thread 102066)]#0 sched_switch
(td=0xfffff803ec526960, newtd=0x0, flags=<value optimized out>) at
/usr/src/sys/kern/sched_ule.c:1945
1945 cpuid = PCPU_GET(cpuid);

(kgdb) bt
#0 sched_switch (td=0xfffff803ec526960, newtd=0x0, flags=<value optimized
out>) at /usr/src/sys/kern/sched_ule.c:1945
#1 0xffffffff80580d4e in mi_switch (flags=260, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:491
#2 0xffffffff805c155a in sleepq_wait (wchan=0x0, pri=0) at
/usr/src/sys/kern/subr_sleepqueue.c:618
#3 0xffffffff80580837 in _sleep (ident=<value optimized out>, lock=<value
optimized out>, priority=<value optimized out>, wmesg=<value optimized out>,
sbt=<value optimized out>, pr=<value optimized out>, flags=<value optimized
out>) at /usr/src/sys/kern/kern_synch.c:255
#4 0xffffffff80766e60 in vnode_create_vobject (vp=0xfffff816fe4f3588,
isize=512, td=0xfffff803ec526960) at /usr/src/sys/vm/vnode_pager.c:120
#5 0xffffffff8072fe80 in ufs_lookup_ino (vdp=0xfffff816fe4f3588,
vpp=0xfffffe17f2e63858, cnp=0xfffffe17f2e63880, dd_ino=0x0) at
/usr/src/sys/ufs/ufs/ufs_lookup.c:259
#6 0xffffffff807f897a in VOP_CACHEDLOOKUP_APV (vop=<value optimized out>,
a=<value optimized out>) at vnode_if.c:197
#7 0xffffffff806092e6 in vfs_cache_lookup (ap=<value optimized out>) at
vnode_if.h:80
#8 0xffffffff807f883a in VOP_LOOKUP_APV (vop=<value optimized out>, a=<value
optimized out>) at vnode_if.c:129
#9 0xffffffff806111f1 in lookup (ndp=0xfffffe17f2e637f8) at vnode_if.h:54
#10 0xffffffff80610a04 in namei (ndp=0xfffffe17f2e637f8) at
/usr/src/sys/kern/vfs_lookup.c:302
#11 0xffffffff8062631e in kern_statat_vnhook (td=0xfffff803ec526960,
flag=<value optimized out>, fd=-100, path=0x43c18128 <Error reading address
0x43c18128: Bad address>, pathseg=UIO_USERSPACE, sbp=0xfffffe17f2e63910,
hook=0) at /usr/src/sys/kern/vfs_syscalls.c:2300
#12 0xffffffff80626420 in sys_lstat (td=0x0, uap=0xfffffe17f2e63a40) at
/usr/src/sys/kern/vfs_syscalls.c:2280
#13 0xffffffff8078acd1 in amd64_syscall (td=0xfffff803ec526960, traced=0) at
subr_syscall.c:141
#14 0xffffffff8076fffb in Xfast_syscall () at
/usr/src/sys/amd64/amd64/exception.S:396
#15 0x00000000425a731a in ?? ()
Previous frame inner to this frame (corrupt stack?)

(kgdb) frame 4
#4 0xffffffff80766e60 in vnode_create_vobject (vp=0xfffff816fe4f3588,
isize=512, td=0xfffff803ec526960) at /usr/src/sys/vm/vnode_pager.c:120
120 VM_OBJECT_SLEEP(object, object, PDROP | PVM, "vodead", 0);

(kgdb) p *vp
$1 = {v_tag = 0xffffffff8085c651 "ufs", v_op = 0xffffffff80b0f950, v_data =
0xfffff81488311160, v_mount = 0xfffff8019a27d000, v_nmntvnodes = {tqe_next =
0xfffff816894dc3b0, tqe_prev = 0xfffff80b0d4991f8}, v_un = {vu_mount = 0x0,
vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next =
0xfffff80824c363b0, le_prev = 0xfffff81742127b48}, v_cache_src = {lh_first =
0x0}, v_cache_dst = {tqh_first = 0xfffff80e5f6a5af0,
tqh_last = 0xfffff80e5f6a5b10}, v_cache_dd = 0xfffff80e5f6a5af0, v_lock =
{lock_object = {lo_name = 0xffffffff8085c651 "ufs", lo_flags = 117112832,
lo_data = 0, lo_witness = 0x0}, lk_lock = 1, lk_exslpfail = 0, lk_timo = 51,
lk_pri = 96}, v_interlock = {lock_object = {lo_name = 0xffffffff80870687
"vnode interlock", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0},
mtx_lock = 4}, v_vnlock = 0xfffff816fe4f35f0, v_actfreelist = {
tqe_next = 0xfffff80b0d4991d8, tqe_prev = 0xfffff816894dc470}, v_bufobj =
{bo_lock = {lock_object = {lo_name = 0xffffffff80870697 "bufobj interlock",
lo_flags = 86179840, lo_data = 0, lo_witness = 0x0}, rw_lock = 1},
bo_ops = 0xffffffff80afc070, bo_object = 0xfffff802f8423d00, bo_synclist =
{le_next = 0x0, le_prev = 0x0}, bo_private = 0xfffff816fe4f3588, __bo_vnode =
0xfffff816fe4f3588, bo_clean = {bv_hd = {tqh_first = 0x0,
tqh_last = 0xfffff816fe4f36a8}, bv_root = {pt_root = 0}, bv_cnt = 0},
bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xfffff816fe4f36c8}, bv_root =
{pt_root = 0}, bv_cnt = 0}, bo_numoutput = 0, bo_flag = 0, bo_bsize = 32768},
v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters =
{tqh_first = 0x0, tqh_last = 0xfffff816fe4f3710}, rl_currdep = 0x0}, v_cstart =
0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_holdcnt = 8, v_usecount = 7,
v_iflag = 512, v_vflag = 0, v_writecount = 0, v_hash = 76467851, v_type =
VDIR}

(kgdb) p *(vp->v_bufobj.bo_object)
$2 = {lock = {lock_object = {lo_name = 0xffffffff8088aeb6 "vm object", lo_flags
= 90374144, lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, object_list =
{tqe_next = 0xfffff802f8423e00, tqe_prev = 0xfffff802f8423c20}, shadow_head = {
lh_first = 0x0}, shadow_list = {le_next = 0x0, le_prev =
0xfffff801744afa30}, memq = {tqh_first = 0xfffff8183ea423d0, tqh_last =
0xfffff8183ea423e0}, rtree = {rt_root = 18446735381746688977, rt_flags = 0
'\0'}, size = 1,
generation = 1, ref_count = 0, shadow_count = 0, memattr = 6 '\006', type = 2
'\002', flags = 16392, pg_color = 1608, paging_in_progress = 0,
resident_page_count = 1, backing_object = 0x0, backing_object_offset = 0,
pager_object_list = {tqe_next = 0x0, tqe_prev = 0x0}, rvq = {lh_first = 0x0},
cache = {rt_root = 0, rt_flags = 0 '\0'}, handle = 0xfffff816fe4f3588, un_pager
= {vnp = {vnp_size = 512, writemappings = 0}, devp = {devp_pglist = {
tqh_first = 0x200, tqh_last = 0x0}, ops = 0x0, dev = 0x0}, sgp =
{sgp_pglist = {tqh_first = 0x200, tqh_last = 0x0}}, swp = {swp_tmpfs = 0x200,
swp_bcount = 0}}, cred = 0x0, charge = 0}

(kgdb) p *td
$3 = {td_lock = 0xffffffff80b9d4d0, td_proc = 0xfffff8025dd694f0, td_plist =
{tqe_next = 0xfffff801744ac4b0, tqe_prev = 0xfffff805374dd970}, td_runq =
{tqe_next = 0x0, tqe_prev = 0xffffffff80b728b0}, td_slpq = {
tqe_next = 0xfffff80389d11960, tqe_prev = 0xfffff80223243e80}, td_lockq =
{tqe_next = 0x0, tqe_prev = 0xfffffe00003849b8}, td_hash = {le_next = 0x0,
le_prev = 0xfffffe00007d7590}, td_cpuset = 0xfffff801100593a8,
td_sel = 0xfffff806ce36eb00, td_sleepqueue = 0x0, td_turnstile =
0xfffff80174d07e40, td_rlqe = 0xfffff8017477ba28, td_umtxq =
0xfffff803a5a44f00, td_tid = 102066, td_sigqueue = {sq_signals = {__bits = {0,
0, 0, 0}}, sq_kill = {
__bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 0x0, tqh_last =
0xfffff803ec526a18}, sq_proc = 0xfffff8025dd694f0, sq_flags = 1},
td_lend_user_pri = 255 '˙', td_flags = 4, td_inhibitors = 2, td_pflags = 0,
td_dupfd = 0,
td_sqqueue = 0, td_wchan = 0xfffff802f8423d00, td_wmesg = 0xffffffff8088be0a
"vodead", td_lastcpu = 13 '\r', td_oncpu = 255 '˙', td_owepreempt = 0 '\0',
td_tsqueue = 0 '\0', td_locks = 26721, td_rw_rlocks = 0, td_lk_slocks = 0,
td_stopsched = 0, td_blocked = 0x0, td_lockname = 0x0, td_contested =
{lh_first = 0x0}, td_sleeplocks = 0x0, td_intr_nesting_level = 0, td_pinned =
0, td_ucred = 0xfffff8017438e800, td_estcpu = 0, td_slptick = 1266348642,
td_blktick = 0, td_swvoltick = 1266348642, td_cow = 0, td_ru = {ru_utime =
{tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, ru_maxrss =
7952, ru_ixrss = 21233268, ru_idrss = 1299996, ru_isrss = 13866624,
ru_minflt = 1, ru_majflt = 0, ru_nswap = 0, ru_inblock = 2105608,
ru_oublock = 0, ru_msgsnd = 1531599, ru_msgrcv = 100, ru_nsignals = 0, ru_nvcsw
= 1685809, ru_nivcsw = 9633}, td_rux = {rux_runtime = 2226241352945,
rux_uticks = 46227, rux_sticks = 62106, rux_iticks = 0, rux_uu = 365363300,
rux_su = 490865795, rux_tu = 856229096}, td_incruntime = 0, td_runtime =
2226241352945, td_pticks = 0, td_sticks = 0, td_iticks = 0, td_uticks = 0,
td_intrval = 0, td_oldsigmask = {__bits = {0, 0, 0, 0}}, td_generation =
1695442, td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 0}, td_xsig = 0,
td_profil_addr = 0, td_profil_ticks = 0,
td_name = "bacula-fd\000\000\000\000\000\000\000\000\000\000", td_fpop = 0x0,
td_dbgflags = 0, td_dbgksi = {ksi_link = {tqe_next = 0x0, tqe_prev = 0x0},
ksi_info = {si_signo = 0, si_errno = 0, si_code = 0, si_pid = 0, si_uid = 0,
si_status = 0, si_addr = 0x0, si_value = {sival_int = 0, sival_ptr = 0x0,
sigval_int = 0, sigval_ptr = 0x0}, _reason = {_fault = {_trapno = 0}, _timer =
{_timerid = 0, _overrun = 0}, _mesgq = {_mqd = 0}, _poll = {_band = 0},
__spare__ = {__spare1__ = 0, __spare2__ = {0, 0, 0, 0, 0, 0, 0}}}},
ksi_flags = 0, ksi_sigq = 0x0}, td_ng_outbound = 0, td_osd = {osd_nslots = 0,
osd_slots = 0x0, osd_next = {le_next = 0x0, le_prev = 0x0}},
td_map_def_user = 0x0, td_dbg_forked = 0, td_vp_reserv = 0, td_no_sleeping =
0, td_dom_rr_idx = 0, td_sigmask = {__bits = {0, 0, 0, 0}}, td_rqindex = 48
'0', td_base_pri = 152 '\230', td_priority = 152 '\230',
td_pri_class = 3 '\003', td_user_pri = 183 'ˇ', td_base_user_pri = 183 'ˇ',
td_pcb = 0xfffffe17f2e63b80, td_state = TDS_INHIBITED, td_retval = {0,
1084620448}, td_slpcallout = {c_links = {le = {le_next = 0x0,
le_prev = 0xfffffe0000ad9288}, sle = {sle_next = 0x0}, tqe = {tqe_next
= 0x0, tqe_prev = 0xfffffe0000ad9288}}, c_time = 5430875852315327, c_precision
= 26843, c_arg = 0xfffff803ec526960,
c_func = 0xffffffff805c1370 <sleepq_timeout>, c_lock = 0x0, c_flags = 2,
c_iflags = 272, c_cpu = 11}, td_frame = 0xfffffe17f2e63ac0, td_kstack_obj =
0xfffff8076ec32a00, td_kstack = 18446741977545703424, td_kstack_pages = 4,
td_critnest = 1, td_md = {md_spinlock_count = 1, md_saved_flags = 582,
md_spurflt_addr = 1131679744}, td_sched = 0xfffff803ec526de0, td_ar = 0x0,
td_lprof = {{lh_first = 0x0}, {lh_first = 0x0}}, td_dtrace = 0x0, td_errno = 0,
td_vnet = 0x0, td_vnet_lpush = 0x0, td_intr_frame = 0x0, td_rfppwait_p =
0xfffff80616f0a000, td_ma = 0x0, td_ma_cnt = 0, td_su = 0x0, td_dbg_sc_code =
0, td_dbg_sc_narg = 0, td_emuldata = 0x0}

(kgdb) p *object
$4 = {lock = {lock_object = {lo_name = 0xffffffff8088aeb6 "vm object", lo_flags
= 90374144, lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, object_list =
{tqe_next = 0xfffff802f8423e00, tqe_prev = 0xfffff802f8423c20}, shadow_head = {
lh_first = 0x0}, shadow_list = {le_next = 0x0, le_prev =
0xfffff801744afa30}, memq = {tqh_first = 0xfffff8183ea423d0, tqh_last =
0xfffff8183ea423e0}, rtree = {rt_root = 18446735381746688977, rt_flags = 0
'\0'}, size = 1,
generation = 1, ref_count = 0, shadow_count = 0, memattr = 6 '\006', type = 2
'\002', flags = 16392, pg_color = 1608, paging_in_progress = 0,
resident_page_count = 1, backing_object = 0x0, backing_object_offset = 0,
pager_object_list = {tqe_next = 0x0, tqe_prev = 0x0}, rvq = {lh_first = 0x0},
cache = {rt_root = 0, rt_flags = 0 '\0'}, handle = 0xfffff816fe4f3588, un_pager
= {vnp = {vnp_size = 512, writemappings = 0}, devp = {devp_pglist = {
tqh_first = 0x200, tqh_last = 0x0}, ops = 0x0, dev = 0x0}, sgp =
{sgp_pglist = {tqh_first = 0x200, tqh_last = 0x0}}, swp = {swp_tmpfs = 0x200,
swp_bcount = 0}}, cred = 0x0, charge = 0}


bugzilla...@freebsd.org

Apr 1, 2016, 10:35:25 AM
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

--- Comment #20 from Konstantin Belousov <k...@FreeBSD.org> ---
(In reply to Rudolf Čejka from comment #19)
From your report, I am almost certain (at the ~95% level) that this issue and
the issue reported in PR 204426 are the same. I will follow up on that PR,
since its reporter seems to need little time to reproduce, and more involved
debugging patches are already in flight there.

bugzilla...@freebsd.org

unread,
Apr 14, 2016, 8:30:29 AM4/14/16
to
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

--- Comment #21 from Robert Schulze <r...@bytecamp.net> ---
Just to clarify: after rebooting the machine once, the deadlock did not occur
again. I'm currently unable to reproduce this issue, but I hope that the
workloads of the other reporters can lead to a better diagnosis.

bugzilla...@freebsd.org

unread,
Apr 20, 2016, 7:29:03 AM4/20/16
to
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

ro...@qsp.nl changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |ro...@qsp.nl

--- Comment #22 from ro...@qsp.nl ---
And another +1. The deadlock occurs on a raidz1 pool with four 1.2 TB members.
This system is running r294572. The issue started after approximately 59 days
of production use with an average load of about 150 I/O transactions per
second.

The deadlock is on a directory located on a ~600 GB ZFS dataset; it contains
7 files and two subdirectories, a few kB in total.

procstat -kk for a stuck du process:

PID TID COMM TDNAME KSTACK

97300 101745 du - mi_switch+0xe1 sleepq_wait+0x3a
_sleep+0x287 vnode_create_vobject+0x100 zfs_freebsd_open+0xf5 VOP_OPEN_APV+0xa1
vn_open_vnode+0x234 vn_open_cred+0x36a kern_openat+0x26f amd64_syscall+0x40f
Xfast_syscall+0xfb

And here is an rsync that is stuck:

25689 101762 rsync-backup - mi_switch+0xe1 sleepq_wait+0x3a
_sleep+0x287 vnode_create_vobject+0x100 zfs_freebsd_open+0xf5 VOP_OPEN_APV+0xa1
vn_open_vnode+0x234 vn_open_cred+0x36a kern_openat+0x26f amd64_syscall+0x40f
Xfast_syscall+0xfb

This filesystem is exported over NFS. We have a few stuck nfsd threads as
well:

root@nfs01:/home/roel # procstat -kk 857


PID TID COMM TDNAME KSTACK

857 100485 nfsd nfsd: master mi_switch+0xe1
sleepq_catch_signals+0xab sleepq_timedwait_sig+0x10 _cv_timedwait_sig_sbt+0x18b
svc_run_internal+0x46d svc_run+0x182 nfsrvd_nfsd+0x242 nfssvc_nfsd+0x107
sys_nfssvc+0x9c amd64_syscall+0x40f Xfast_syscall+0xfb
857 100505 nfsd nfsd: service mi_switch+0xe1
sleepq_catch_signals+0xab sleepq_wait_sig+0xf _cv_wait_sig+0x16a
svc_run_internal+0x86e svc_thread_start+0xb fork_exit+0x9a fork_trampoline+0xe
... lots of similar threads, except for: ...
857 100541 nfsd nfsd: service mi_switch+0xe1 sleepq_wait+0x3a
_sleep+0x287 vnode_create_vobject+0x100 zfs_fhtovp+0x45b nfsvno_fhtovp+0x7c
nfsd_fhtovp+0xc8 nfsrvd_dorpc+0x12c nfssvc_program+0x54e svc_run_internal+0xd2b
svc_thread_start+0xb fork_exit+0x9a fork_trampoline+0xe
857 100588 nfsd nfsd: service mi_switch+0xe1 sleepq_wait+0x3a
_sleep+0x287 vnode_create_vobject+0x100 zfs_fhtovp+0x45b nfsvno_fhtovp+0x7c
nfsd_fhtovp+0xc8 nfsrvd_dorpc+0x12c nfssvc_program+0x54e svc_run_internal+0xd2b
svc_thread_start+0xb fork_exit+0x9a fork_trampoline+0xe
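When many processes pile up like this, it can help to enumerate everything parked on the vodead wait channel before attaching kgdb. A small sketch using the stock ps(1) and procstat(1) utilities (the `wchan` column keyword is standard on FreeBSD; the filtering below assumes ps does not truncate it):

```shell
#!/bin/sh
# List every process whose wait channel is "vodead", then dump the
# kernel stacks of each one with procstat -kk.
ps -axo pid,state,wchan,comm | awk '$3 == "vodead" { print $1 }' |
while read -r pid; do
    echo "=== pid ${pid} ==="
    procstat -kk "${pid}"
done
```

The awk filter keys on the third column, so the same pipeline works for any other wait channel name (e.g. "zfs", "newnfs") by changing the string.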

Some kgdb output for one of the stuck du threads:

[Switching to thread 555 (Thread 101720)]#0 sched_switch
(td=0xfffff8033a17e000, newtd=<value optimized out>, flags=<value optimized
out>) at /usr/src/sys/kern/sched_ule.c:1945
1945 cpuid = PCPU_GET(cpuid);
(kgdb) bt

#0 sched_switch (td=0xfffff8033a17e000, newtd=<value optimized out>,
flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945
#1 0xffffffff80959ff1 in mi_switch (flags=260, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:491
#2 0xffffffff8099b31a in sleepq_wait (wchan=0x0, pri=0) at
/usr/src/sys/kern/subr_sleepqueue.c:617
#3 0xffffffff80959a17 in _sleep (ident=<value optimized out>, lock=<value
optimized out>, priority=<value optimized out>, wmesg=<value optimized out>,
sbt=<value optimized out>, pr=<value optimized out>,
flags=<value optimized out>) at /usr/src/sys/kern/kern_synch.c:255

#4 0xffffffff80bed620 in vnode_create_vobject (vp=0xfffff801d3e11b10,
isize=16, td=0xfffff8033a17e000) at /usr/src/sys/vm/vnode_pager.c:120
#5 0xffffffff81a7d5f5 in zfs_freebsd_open (ap=0xfffffe0466890658) at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:6076
#6 0xffffffff80e7e651 in VOP_OPEN_APV (vop=<value optimized out>, a=<value
optimized out>) at vnode_if.c:469
#7 0xffffffff80a05694 in vn_open_vnode (vp=0xfffff801d3e11b10, fmode=1179653,
cred=0xfffff803682bd400, td=0xfffff8033a17e000, fp=0xfffff80240d403c0) at
vnode_if.h:196
#8 0xffffffff80a0528a in vn_open_cred (ndp=0xfffffe0466890870,
flagp=0xfffffe046689094c, cmode=<value optimized out>, vn_open_flags=<value
optimized out>, cred=0x0, fp=0xfffff80240d403c0)
at /usr/src/sys/kern/vfs_vnops.c:268
#9 0xffffffff809fe44f in kern_openat (td=0xfffff8033a17e000, fd=-100,
path=0x801422d48 <Error reading address 0x801422d48: Bad address>,
pathseg=UIO_USERSPACE, flags=1179653, mode=<value optimized out>)
at /usr/src/sys/kern/vfs_syscalls.c:1093
#10 0xffffffff80d55aff in amd64_syscall (td=0xfffff8033a17e000, traced=0) at
subr_syscall.c:141
#11 0xffffffff80d3ad7b in Xfast_syscall () at
/usr/src/sys/amd64/amd64/exception.S:396
#12 0x0000000800b7570a in ?? ()


Previous frame inner to this frame (corrupt stack?)

(kgdb) frame 4
#4 0xffffffff80bed620 in vnode_create_vobject (vp=0xfffff801d3e11b10,
isize=16, td=0xfffff8033a17e000) at /usr/src/sys/vm/vnode_pager.c:120


120             VM_OBJECT_SLEEP(object, object, PDROP | PVM, "vodead", 0);

(kgdb) print vp
$4 = (struct vnode *) 0xfffff801d3e11b10
(kgdb) print *vp
$5 = {v_tag = 0xffffffff81aecb3b "zfs", v_op = 0xffffffff81aff790, v_data =
0xfffff801a2d1ca10, v_mount = 0xfffff8011a46e660, v_nmntvnodes = {tqe_next =
0xfffff801a53281d8, tqe_prev = 0xfffff801a6b8e5a8}, v_un = {
vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0},
v_hashlist = {le_next = 0x0, le_prev = 0x0}, v_cache_src = {lh_first =
0xfffff80019708850}, v_cache_dst = {tqh_first = 0xfffff80257063850,
tqh_last = 0xfffff80257063870}, v_cache_dd = 0xfffff80257063850, v_lock =
{lock_object = {lo_name = 0xffffffff81aecb3b "zfs", lo_flags = 117112832,
lo_data = 0, lo_witness = 0x0}, lk_lock = 1, lk_exslpfail = 0,
lk_timo = 51, lk_pri = 96}, v_interlock = {lock_object = {lo_name =
0xffffffff80fe1a6c "vnode interlock", lo_flags = 16973824, lo_data = 0,
lo_witness = 0x0}, mtx_lock = 4}, v_vnlock = 0xfffff801d3e11b78,
v_actfreelist = {tqe_next = 0xfffff801a6b8e588, tqe_prev =
0xfffff801a5328298}, v_bufobj = {bo_lock = {lock_object = {lo_name =
0xffffffff80fe1a7c "bufobj interlock", lo_flags = 86179840, lo_data = 0,
lo_witness = 0x0},
rw_lock = 1}, bo_ops = 0xffffffff814b3900, bo_object =
0xfffff801fd0ea100, bo_synclist = {le_next = 0x0, le_prev = 0x0}, bo_private =
0xfffff801d3e11b10, __bo_vnode = 0xfffff801d3e11b10, bo_clean = {bv_hd = {
tqh_first = 0x0, tqh_last = 0xfffff801d3e11c30}, bv_root = {pt_root =
0}, bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last =
0xfffff801d3e11c50}, bv_root = {pt_root = 0}, bv_cnt = 0}, bo_numoutput = 0,
bo_flag = 0, bo_bsize = 131072}, v_pollinfo = 0x0, v_label = 0x0, v_lockf =
0x0, v_rl = {rl_waiters = {tqh_first = 0x0, tqh_last = 0xfffff801d3e11c98},
rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0,
v_clen = 0, v_holdcnt = 127, v_usecount = 126, v_iflag = 512, v_vflag = 0,
v_writecount = 0, v_hash = 7955980, v_type = VDIR}
(kgdb) print *td
$6 = {td_lock = 0xffffffff81619670, td_proc = 0xfffff8006233f4f0, td_plist =
{tqe_next = 0x0, tqe_prev = 0xfffff8006233f500}, td_runq = {tqe_next = 0x0,
tqe_prev = 0xffffffff815e3998}, td_slpq = {
tqe_next = 0xfffff803f7d36960, tqe_prev = 0xfffff802836ed030}, td_lockq =
{tqe_next = 0x0, tqe_prev = 0xfffffe03e44e7908}, td_hash = {le_next = 0x0,
le_prev = 0xfffffe000087aac0}, td_cpuset = 0xfffff800083cd3a8,
td_sel = 0xfffff80039695e00, td_sleepqueue = 0x0, td_turnstile =
0xfffff8015b0cacc0, td_rlqe = 0xfffff80117ffa4d8, td_umtxq =
0xfffff802d770a780, td_tid = 101720, td_sigqueue = {sq_signals = {__bits = {0,
0, 0, 0}},
sq_kill = {__bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 0x0, tqh_last =
0xfffff8033a17e0b8}, sq_proc = 0xfffff8006233f4f0, sq_flags = 1},
td_lend_user_pri = 255 'ÿ', td_flags = 133124, td_inhibitors = 2,
td_pflags = 0, td_dupfd = -1, td_sqqueue = 0, td_wchan = 0xfffff801fd0ea100,
td_wmesg = 0xffffffff81006e8d "vodead", td_lastcpu = 1 '\001', td_oncpu = 255
'ÿ', td_owepreempt = 0 '\0', td_tsqueue = 0 '\0',
td_locks = 4348, td_rw_rlocks = 0, td_lk_slocks = 0, td_stopsched = 0,
td_blocked = 0x0, td_lockname = 0x0, td_contested = {lh_first = 0x0},
td_sleeplocks = 0x0, td_intr_nesting_level = 0, td_pinned = 0,
td_ucred = 0xfffff803682bd400, td_estcpu = 0, td_slptick = 444244584,
td_blktick = 0, td_swvoltick = 444244584, td_cow = 39, td_ru = {ru_utime =
{tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0},
ru_maxrss = 1656, ru_ixrss = 192, ru_idrss = 64, ru_isrss = 2048, ru_minflt
= 119, ru_majflt = 0, ru_nswap = 0, ru_inblock = 749, ru_oublock = 0, ru_msgsnd
= 0, ru_msgrcv = 0, ru_nsignals = 0, ru_nvcsw = 739,
ru_nivcsw = 2}, td_rux = {rux_runtime = 514120625, rux_uticks = 1,
rux_sticks = 15, rux_iticks = 0, rux_uu = 12852, rux_su = 192791, rux_tu =
205644}, td_incruntime = 0, td_runtime = 514120625, td_pticks = 0,
td_sticks = 0, td_iticks = 0, td_uticks = 0, td_intrval = 0, td_oldsigmask =
{__bits = {0, 0, 0, 0}}, td_generation = 741, td_sigstk = {ss_sp = 0x0, ss_size
= 0, ss_flags = 4}, td_xsig = 0, td_profil_addr = 0,
td_profil_ticks = 0, td_name = "du", '\0' <repeats 17 times>, td_fpop = 0x0,
td_dbgflags = 0, td_dbgksi = {ksi_link = {tqe_next = 0x0, tqe_prev = 0x0},
ksi_info = {si_signo = 0, si_errno = 0, si_code = 0, si_pid = 0,
si_uid = 0, si_status = 0, si_addr = 0x0, si_value = {sival_int = 0,
sival_ptr = 0x0, sigval_int = 0, sigval_ptr = 0x0}, _reason = {_fault =
{_trapno = 0}, _timer = {_timerid = 0, _overrun = 0}, _mesgq = {_mqd = 0},
_poll = {_band = 0}, __spare__ = {__spare1__ = 0, __spare2__ = {0, 0,
0, 0, 0, 0, 0}}}}, ksi_flags = 0, ksi_sigq = 0x0}, td_ng_outbound = 0, td_osd =
{osd_nslots = 0, osd_slots = 0x0, osd_next = {le_next = 0x0,
le_prev = 0x0}}, td_map_def_user = 0x0, td_dbg_forked = 0, td_vp_reserv =
0, td_no_sleeping = 0, td_dom_rr_idx = 0, td_sigmask = {__bits = {0, 0, 0, 0}},
td_rqindex = 31 '\037', td_base_pri = 120 'x',
td_priority = 120 'x', td_pri_class = 3 '\003', td_user_pri = 126 '~',
td_base_user_pri = 126 '~', td_pcb = 0xfffffe0466890b80, td_state =
TDS_INHIBITED, td_retval = {0, 1}, td_slpcallout = {c_links = {le = {
le_next = 0x0, le_prev = 0xfffffe00008d0a10}, sle = {sle_next = 0x0},
tqe = {tqe_next = 0x0, tqe_prev = 0xfffffe00008d0a10}}, c_time =
20354765951718809, c_precision = 268435437, c_arg = 0xfffff8033a17e000,
c_func = 0xffffffff8099b0d0 <sleepq_timeout>, c_lock = 0x0, c_flags = 0,
c_iflags = 272, c_cpu = 2}, td_frame = 0xfffffe0466890ac0, td_kstack_obj =
0xfffff803af9b7200, td_kstack = 18446741893586407424,
td_kstack_pages = 4, td_critnest = 1, td_md = {md_spinlock_count = 1,
md_saved_flags = 582, md_spurflt_addr = 0}, td_sched = 0xfffff8033a17e480,
td_ar = 0x0, td_lprof = {{lh_first = 0x0}, {lh_first = 0x0}},
td_dtrace = 0xfffff8037c56e700, td_errno = 0, td_vnet = 0x0, td_vnet_lpush =
0x0, td_intr_frame = 0x0, td_rfppwait_p = 0xfffff802656e09e0, td_ma = 0x0,
td_ma_cnt = 0, td_su = 0x0, td_dbg_sc_code = 0, td_dbg_sc_narg = 0,
td_emuldata = 0x0}


And another stuck du process:

(kgdb) thread 795
[Switching to thread 795 (Thread 101745)]#0 sched_switch
(td=0xfffff8006f2e84b0, newtd=<value optimized out>, flags=<value optimized
out>) at /usr/src/sys/kern/sched_ule.c:1945
1945 cpuid = PCPU_GET(cpuid);
(kgdb) bt

#0 sched_switch (td=0xfffff8006f2e84b0, newtd=<value optimized out>,
flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945
#1 0xffffffff80959ff1 in mi_switch (flags=260, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:491
#2 0xffffffff8099b31a in sleepq_wait (wchan=0x0, pri=0) at
/usr/src/sys/kern/subr_sleepqueue.c:617
#3 0xffffffff80959a17 in _sleep (ident=<value optimized out>, lock=<value
optimized out>, priority=<value optimized out>, wmesg=<value optimized out>,
sbt=<value optimized out>, pr=<value optimized out>,
flags=<value optimized out>) at /usr/src/sys/kern/kern_synch.c:255

#4 0xffffffff80bed620 in vnode_create_vobject (vp=0xfffff801d3e11b10,
isize=16, td=0xfffff8006f2e84b0) at /usr/src/sys/vm/vnode_pager.c:120
#5 0xffffffff81a7d5f5 in zfs_freebsd_open (ap=0xfffffe046690d658) at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:6076
#6 0xffffffff80e7e651 in VOP_OPEN_APV (vop=<value optimized out>, a=<value
optimized out>) at vnode_if.c:469
#7 0xffffffff80a05694 in vn_open_vnode (vp=0xfffff801d3e11b10, fmode=1179653,
cred=0xfffff802f6e2f000, td=0xfffff8006f2e84b0, fp=0xfffff80104fcc5a0) at
vnode_if.h:196
#8 0xffffffff80a0528a in vn_open_cred (ndp=0xfffffe046690d870,
flagp=0xfffffe046690d94c, cmode=<value optimized out>, vn_open_flags=<value
optimized out>, cred=0x0, fp=0xfffff80104fcc5a0)
at /usr/src/sys/kern/vfs_vnops.c:268
#9 0xffffffff809fe44f in kern_openat (td=0xfffff8006f2e84b0, fd=-100,
path=0x801422d48 <Error reading address 0x801422d48: Bad address>,
pathseg=UIO_USERSPACE, flags=1179653, mode=<value optimized out>)
at /usr/src/sys/kern/vfs_syscalls.c:1093
#10 0xffffffff80d55aff in amd64_syscall (td=0xfffff8006f2e84b0, traced=0) at
subr_syscall.c:141
#11 0xffffffff80d3ad7b in Xfast_syscall () at
/usr/src/sys/amd64/amd64/exception.S:396
#12 0x0000000800b7570a in ?? ()


Previous frame inner to this frame (corrupt stack?)
(kgdb) frame 4

#4 0xffffffff80bed620 in vnode_create_vobject (vp=0xfffff801d3e11b10,
isize=16, td=0xfffff8006f2e84b0) at /usr/src/sys/vm/vnode_pager.c:120


120             VM_OBJECT_SLEEP(object, object, PDROP | PVM, "vodead", 0);
(kgdb) p *vp

$7 = {v_tag = 0xffffffff81aecb3b "zfs", v_op = 0xffffffff81aff790, v_data =
0xfffff801a2d1ca10, v_mount = 0xfffff8011a46e660, v_nmntvnodes = {tqe_next =
0xfffff801a53281d8, tqe_prev = 0xfffff801a6b8e5a8}, v_un = {
vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0},
v_hashlist = {le_next = 0x0, le_prev = 0x0}, v_cache_src = {lh_first =
0xfffff80019708850}, v_cache_dst = {tqh_first = 0xfffff80257063850,
tqh_last = 0xfffff80257063870}, v_cache_dd = 0xfffff80257063850, v_lock =
{lock_object = {lo_name = 0xffffffff81aecb3b "zfs", lo_flags = 117112832,
lo_data = 0, lo_witness = 0x0}, lk_lock = 1, lk_exslpfail = 0,
lk_timo = 51, lk_pri = 96}, v_interlock = {lock_object = {lo_name =
0xffffffff80fe1a6c "vnode interlock", lo_flags = 16973824, lo_data = 0,
lo_witness = 0x0}, mtx_lock = 4}, v_vnlock = 0xfffff801d3e11b78,
v_actfreelist = {tqe_next = 0xfffff801a6b8e588, tqe_prev =
0xfffff801a5328298}, v_bufobj = {bo_lock = {lock_object = {lo_name =
0xffffffff80fe1a7c "bufobj interlock", lo_flags = 86179840, lo_data = 0,
lo_witness = 0x0},
rw_lock = 1}, bo_ops = 0xffffffff814b3900, bo_object =
0xfffff801fd0ea100, bo_synclist = {le_next = 0x0, le_prev = 0x0}, bo_private =
0xfffff801d3e11b10, __bo_vnode = 0xfffff801d3e11b10, bo_clean = {bv_hd = {
tqh_first = 0x0, tqh_last = 0xfffff801d3e11c30}, bv_root = {pt_root =
0}, bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last =
0xfffff801d3e11c50}, bv_root = {pt_root = 0}, bv_cnt = 0}, bo_numoutput = 0,
bo_flag = 0, bo_bsize = 131072}, v_pollinfo = 0x0, v_label = 0x0, v_lockf =
0x0, v_rl = {rl_waiters = {tqh_first = 0x0, tqh_last = 0xfffff801d3e11c98},
rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0,
v_clen = 0, v_holdcnt = 127, v_usecount = 126, v_iflag = 512, v_vflag = 0,
v_writecount = 0, v_hash = 7955980, v_type = VDIR}
(kgdb) p *td
$8 = {td_lock = 0xffffffff81619670, td_proc = 0xfffff80009eb59e0, td_plist =
{tqe_next = 0x0, tqe_prev = 0xfffff80009eb59f0}, td_runq = {tqe_next = 0x0,
tqe_prev = 0xffffffff815e87a8}, td_slpq = {
tqe_next = 0xfffff803cfb27960, tqe_prev = 0xfffff8006f2e8990}, td_lockq =
{tqe_next = 0x0, tqe_prev = 0xfffffe04662f90c8}, td_hash = {le_next = 0x0,
le_prev = 0xfffffe000087ab88}, td_cpuset = 0xfffff800083cd3a8,
td_sel = 0xfffff80290088500, td_sleepqueue = 0x0, td_turnstile =
0xfffff800b09c4840, td_rlqe = 0xfffff8015a73a910, td_umtxq =
0xfffff8012d91b280, td_tid = 101745, td_sigqueue = {sq_signals = {__bits = {0,
0, 0, 0}},
sq_kill = {__bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 0x0, tqh_last =
0xfffff8006f2e8568}, sq_proc = 0xfffff80009eb59e0, sq_flags = 1},
td_lend_user_pri = 255 'ÿ', td_flags = 133124, td_inhibitors = 2,
td_pflags = 0, td_dupfd = -1, td_sqqueue = 0, td_wchan = 0xfffff801fd0ea100,
td_wmesg = 0xffffffff81006e8d "vodead", td_lastcpu = 7 '\a', td_oncpu = 255
'ÿ', td_owepreempt = 0 '\0', td_tsqueue = 0 '\0', td_locks = 3768,
td_rw_rlocks = 0, td_lk_slocks = 0, td_stopsched = 0, td_blocked = 0x0,
td_lockname = 0x0, td_contested = {lh_first = 0x0}, td_sleeplocks = 0x0,
td_intr_nesting_level = 0, td_pinned = 0, td_ucred = 0xfffff802f6e2f000,
td_estcpu = 0, td_slptick = 729400807, td_blktick = 0, td_swvoltick =
729400807, td_cow = 41, td_ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime
= {tv_sec = 0, tv_usec = 0}, ru_maxrss = 1696, ru_ixrss = 312,
ru_idrss = 104, ru_isrss = 3328, ru_minflt = 121, ru_majflt = 0, ru_nswap =
0, ru_inblock = 840, ru_oublock = 0, ru_msgsnd = 0, ru_msgrcv = 0, ru_nsignals
= 0, ru_nvcsw = 795, ru_nivcsw = 0}, td_rux = {
rux_runtime = 547891773, rux_uticks = 1, rux_sticks = 25, rux_iticks = 0,
rux_uu = 8428, rux_su = 210723, rux_tu = 219152}, td_incruntime = 0, td_runtime
= 547891773, td_pticks = 0, td_sticks = 0, td_iticks = 0,
td_uticks = 0, td_intrval = 0, td_oldsigmask = {__bits = {0, 0, 0, 0}},
td_generation = 795, td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 4},
td_xsig = 0, td_profil_addr = 0, td_profil_ticks = 0,
td_name = "du", '\0' <repeats 17 times>, td_fpop = 0x0, td_dbgflags = 0,
td_dbgksi = {ksi_link = {tqe_next = 0x0, tqe_prev = 0x0}, ksi_info = {si_signo
= 0, si_errno = 0, si_code = 0, si_pid = 0, si_uid = 0,
si_status = 0, si_addr = 0x0, si_value = {sival_int = 0, sival_ptr = 0x0,
sigval_int = 0, sigval_ptr = 0x0}, _reason = {_fault = {_trapno = 0}, _timer =
{_timerid = 0, _overrun = 0}, _mesgq = {_mqd = 0}, _poll = {
_band = 0}, __spare__ = {__spare1__ = 0, __spare2__ = {0, 0, 0, 0, 0,
0, 0}}}}, ksi_flags = 0, ksi_sigq = 0x0}, td_ng_outbound = 0, td_osd =
{osd_nslots = 0, osd_slots = 0x0, osd_next = {le_next = 0x0,
le_prev = 0x0}}, td_map_def_user = 0x0, td_dbg_forked = 0, td_vp_reserv =
0, td_no_sleeping = 0, td_dom_rr_idx = 0, td_sigmask = {__bits = {0, 0, 0, 0}},
td_rqindex = 32 ' ', td_base_pri = 120 'x',
td_priority = 120 'x', td_pri_class = 3 '\003', td_user_pri = 130 '\202',
td_base_user_pri = 130 '\202', td_pcb = 0xfffffe046690db80, td_state =
TDS_INHIBITED, td_retval = {0, 1}, td_slpcallout = {c_links = {le = {
le_next = 0x0, le_prev = 0xfffffe0000ac29b0}, sle = {sle_next = 0x0},
tqe = {tqe_next = 0x0, tqe_prev = 0xfffffe0000ac29b0}}, c_time =
21579501454196634, c_precision = 268435437, c_arg = 0xfffff8006f2e84b0,
c_func = 0xffffffff8099b0d0 <sleepq_timeout>, c_lock = 0x0, c_flags = 0,
c_iflags = 272, c_cpu = 9}, td_frame = 0xfffffe046690dac0, td_kstack_obj =
0xfffff80198fc0e00, td_kstack = 18446741893586919424,
td_kstack_pages = 4, td_critnest = 1, td_md = {md_spinlock_count = 1,
md_saved_flags = 582, md_spurflt_addr = 0}, td_sched = 0xfffff8006f2e8930,
td_ar = 0x0, td_lprof = {{lh_first = 0x0}, {lh_first = 0x0}},
td_dtrace = 0xfffff802a4f1fc00, td_errno = 0, td_vnet = 0x0, td_vnet_lpush =
0x0, td_intr_frame = 0x0, td_rfppwait_p = 0xfffff803cb6dd4f0, td_ma = 0x0,
td_ma_cnt = 0, td_su = 0x0, td_dbg_sc_code = 0, td_dbg_sc_narg = 0,
td_emuldata = 0x0}


We are replicating this filesystem using incremental ZFS snapshots onto a
secondary host (using zrep). I can mount the replicated fs and read the file
there without issue.

I am currently compiling the latest -STABLE kernel and am scheduling a reboot
tonight.

bugzilla...@freebsd.org

unread,
Apr 20, 2016, 7:51:55 AM4/20/16
to
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

--- Comment #23 from Andriy Gapon <a...@FreeBSD.org> ---
(In reply to roel from comment #22)
It would be interesting to see *vp->v_bufobj.bo_object.
Also, it's better to add a really long output as an attachment rather than
pasting it into a comment.

bugzilla...@freebsd.org

unread,
Apr 20, 2016, 8:08:57 AM4/20/16
to
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

--- Comment #24 from ro...@qsp.nl ---
Here it is, for both du processes listed above:

(kgdb) thread 555
(kgdb) frame 4
#4 0xffffffff80bed620 in vnode_create_vobject (vp=0xfffff801d3e11b10,
isize=16, td=0xfffff8033a17e000) at /usr/src/sys/vm/vnode_pager.c:120
120             VM_OBJECT_SLEEP(object, object, PDROP | PVM, "vodead", 0);
(kgdb) p *vp->v_bufobj.bo_object
$1 = {lock = {lock_object = {lo_name = 0xffffffff810062d6 "vm object", lo_flags
= 90374144, lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, object_list =
{tqe_next = 0xfffff801fd0ea200, tqe_prev = 0xfffff801fd0ea020},
shadow_head = {lh_first = 0x0}, shadow_list = {le_next = 0x0, le_prev =
0xfffff8022d69bb38}, memq = {tqh_first = 0x0, tqh_last = 0xfffff801fd0ea148},
rtree = {rt_root = 0, rt_flags = 0 '\0'}, size = 1, generation = 1,
ref_count = 0, shadow_count = 0, memattr = 6 '\006', type = 2 '\002', flags =
16392, pg_color = 21, paging_in_progress = 0, resident_page_count = 0,
backing_object = 0x0, backing_object_offset = 0, pager_object_list = {
tqe_next = 0x0, tqe_prev = 0x0}, rvq = {lh_first = 0x0}, cache = {rt_root =
0, rt_flags = 0 '\0'}, handle = 0xfffff801d3e11b10, un_pager = {vnp = {vnp_size
= 16, writemappings = 0}, devp = {devp_pglist = {
tqh_first = 0x10, tqh_last = 0x0}, ops = 0x0, dev = 0x0}, sgp =
{sgp_pglist = {tqh_first = 0x10, tqh_last = 0x0}}, swp = {swp_tmpfs = 0x10,
swp_bcount = 0}}, cred = 0x0, charge = 0}
(kgdb) thread 795
[Switching to thread 795 (Thread 101745)]#0 sched_switch
(td=0xfffff8006f2e84b0, newtd=<value optimized out>, flags=<value optimized
out>) at /usr/src/sys/kern/sched_ule.c:1945
1945 cpuid = PCPU_GET(cpuid);
(kgdb) frame 4
#4 0xffffffff80bed620 in vnode_create_vobject (vp=0xfffff801d3e11b10,
isize=16, td=0xfffff8006f2e84b0) at /usr/src/sys/vm/vnode_pager.c:120
120             VM_OBJECT_SLEEP(object, object, PDROP | PVM, "vodead", 0);
(kgdb) p *vp->v_bufobj.bo_object
$2 = {lock = {lock_object = {lo_name = 0xffffffff810062d6 "vm object", lo_flags
= 90374144, lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, object_list =
{tqe_next = 0xfffff801fd0ea200, tqe_prev = 0xfffff801fd0ea020},
shadow_head = {lh_first = 0x0}, shadow_list = {le_next = 0x0, le_prev =
0xfffff8022d69bb38}, memq = {tqh_first = 0x0, tqh_last = 0xfffff801fd0ea148},
rtree = {rt_root = 0, rt_flags = 0 '\0'}, size = 1, generation = 1,
ref_count = 0, shadow_count = 0, memattr = 6 '\006', type = 2 '\002', flags =
16392, pg_color = 21, paging_in_progress = 0, resident_page_count = 0,
backing_object = 0x0, backing_object_offset = 0, pager_object_list = {
tqe_next = 0x0, tqe_prev = 0x0}, rvq = {lh_first = 0x0}, cache = {rt_root =
0, rt_flags = 0 '\0'}, handle = 0xfffff801d3e11b10, un_pager = {vnp = {vnp_size
= 16, writemappings = 0}, devp = {devp_pglist = {
tqh_first = 0x10, tqh_last = 0x0}, ops = 0x0, dev = 0x0}, sgp =
{sgp_pglist = {tqh_first = 0x10, tqh_last = 0x0}}, swp = {swp_tmpfs = 0x10,
swp_bcount = 0}}, cred = 0x0, charge = 0}

bugzilla...@freebsd.org

unread,
Apr 21, 2016, 4:51:11 AM4/21/16
to
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

Erik <fre...@erik.eu> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |fre...@erik.eu

--- Comment #25 from Erik <fre...@erik.eu> ---
+1

Same/similar environment as the others:
- zfs mirrored pool with multiple ssd disks
- exported over NFS
- random directories are 'stuck' after a couple of days
- ls works fine, but accessing them results in a vodead state on the server
and a newnfs state on the client

This is on FreeBSD 10.3-RELEASE.

I'll try to get the debug info later.

bugzilla...@freebsd.org

unread,
Jun 21, 2016, 1:50:36 PM6/21/16
to
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

--- Comment #26 from commi...@freebsd.org ---
A commit references this bug:

Author: kib
Date: Tue Jun 21 17:49:33 UTC 2016
New revision: 302063
URL: https://svnweb.freebsd.org/changeset/base/302063

Log:
The vmtotal sysctl handler marks active vm objects to calculate
statistics. Marking is done by setting the OBJ_ACTIVE flag. The
flags change is locked, but the problem is that many parts of system
assume that vm object initialization ensures that no other code could
change the object, and thus performed lockless. The end result is
corrupted flags in vm objects, most visible is spurious OBJ_DEAD flag,
causing random hangs.

Avoid the active object marking, instead provide equally inexact but
immutable is_object_alive() definition for the object mapped state.

Avoid iterating over the processes mappings altogether by using
arguably improved definition of the paging thread as one which sleeps
on the v_free_count.

PR: 204764
Diagnosed by: pho
Tested by: pho (previous version)
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (gjb)

Changes:
head/sys/vm/vm_meter.c
head/sys/vm/vm_object.h

bugzilla...@freebsd.org

unread,
Jul 4, 2016, 4:32:16 AM7/4/16
to
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

--- Comment #27 from Daniel Ylitalo <dan...@blodan.se> ---
Is anyone aware whether this will be released as a patch for 10.3 through
freebsd-update, or should one apply the diff manually? :)

bugzilla...@freebsd.org

unread,
Jul 27, 2016, 8:45:55 AM7/27/16
to
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

--- Comment #28 from Robert Schulze <r...@bytecamp.net> ---
Created attachment 173037
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=173037&action=edit
patch to apply r302063 on FreeBSD-10

Due to the lack of r263260 ("Rename global cnt to vm_cnt to avoid shadowing.")
the patch for r302063 cannot be applied directly to FreeBSD-10. I created a
patch that does not rely on r263260, so users of FreeBSD-10 can test it, too.

with kind regards,
Robert Schulze

bugzilla...@freebsd.org

unread,
Jul 28, 2016, 4:36:16 AM7/28/16
to
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

--- Comment #29 from Jason Wolfe <j...@nitrology.com> ---
(In reply to Robert Schulze from comment #28)

kib MFC'd r302063 into stable/10 a few weeks ago as r302243, so custom
patching isn't needed if you bring your sources up to date. You are correct,
though: cnt.v_free_count was renamed, and that should be the only difference
between 10 and 11.