
zfs resilver keeps restarting


Marc UBM Bocklet via freebsd-stable

Sep 18, 2016, 9:09:48 AM

Hi all,

due to two bad cables, I had two drives drop from my striped raidz2
pool (built on top of geli-encrypted drives). I replaced one of the
drives before I realized that the cabling was at fault - that's the
drive which is being replaced in the output of zpool status below.

I have just installed the new cables and all SATA errors are gone.
However, the resilver of the pool keeps restarting.

I see no errors in /var/log/messages, but zpool history -i says:

2016-09-18.14:56:21 [txg:1219501] scan setup func=2 mintxg=3 maxtxg=1219391
2016-09-18.14:56:51 [txg:1219505] scan done complete=0
2016-09-18.14:56:51 [txg:1219505] scan setup func=2 mintxg=3 maxtxg=1219391
2016-09-18.14:57:20 [txg:1219509] scan done complete=0
2016-09-18.14:57:20 [txg:1219509] scan setup func=2 mintxg=3 maxtxg=1219391
2016-09-18.14:57:49 [txg:1219513] scan done complete=0
2016-09-18.14:57:49 [txg:1219513] scan setup func=2 mintxg=3 maxtxg=1219391
2016-09-18.14:58:19 [txg:1219517] scan done complete=0
2016-09-18.14:58:19 [txg:1219517] scan setup func=2 mintxg=3 maxtxg=1219391
2016-09-18.14:58:45 [txg:1219521] scan done complete=0
2016-09-18.14:58:45 [txg:1219521] scan setup func=2 mintxg=3 maxtxg=1219391

I assume that "scan done complete=0" means that the resilver didn't
finish?
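
If that reading is right, one quick way to count the restarts so far
would be something like this (the pool is simply named "pool"):

zpool history -i pool | grep -c 'scan setup'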

The pool layout is as follows:

pool: pool
state: DEGRADED
status: One or more devices is currently being resilvered. The pool
will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Sun Sep 18 14:51:39 2016
235G scanned out of 9.81T at 830M/s, 3h21m to go
13.2M resilvered, 2.34% done
config:

  NAME                        STATE     READ WRITE CKSUM
  pool                        DEGRADED     0     0     0
    raidz2-0                  ONLINE       0     0     0
      da6.eli                 ONLINE       0     0     0
      da7.eli                 ONLINE       0     0     0
      ada1.eli                ONLINE       0     0     0
      ada2.eli                ONLINE       0     0     0
      da10.eli                ONLINE       0     0     2
      da11.eli                ONLINE       0     0     0
      da12.eli                ONLINE       0     0     0
      da13.eli                ONLINE       0     0     0
    raidz2-1                  DEGRADED     0     0     0
      da0.eli                 ONLINE       0     0     0
      da1.eli                 ONLINE       0     0     0
      da2.eli                 ONLINE       0     0     1  (resilvering)
      replacing-3             DEGRADED     0     0     1
        10699825708166646100  UNAVAIL      0     0     0  was /dev/da3.eli
        da4.eli               ONLINE       0     0     0  (resilvering)
      da3.eli                 ONLINE       0     0     0
      da5.eli                 ONLINE       0     0     0
      da8.eli                 ONLINE       0     0     0
      da9.eli                 ONLINE       0     0     0

errors: No known data errors

The system is
FreeBSD xxx 10.1-BETA1 FreeBSD 10.1-BETA1 #27 r271633:
Mon Sep 15 22:34:05 CEST 2014
root@xxx:/usr/obj/usr/src/sys/xxx amd64

The controller is a
SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor]

The drives are connected via four four-port SATA cables.

Should I upgrade to 10.3-RELEASE, or did I make some sort of
configuration error / overlook something?

Thanks in advance!

Cheers,
Marc
_______________________________________________
freebsd...@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stabl...@freebsd.org"

Alan Somers

Sep 18, 2016, 12:06:19 PM
A resilver will start over any time there's new damage. In your case,
with two failed drives, the resilver should've begun after you replaced
the first drive and restarted after you replaced the second. Have
you seen it restart more than that? If so, keep an eye on the error
counters in "zpool status"; they might give you a clue. You could
also raise the loglevel of devd to "info" in /etc/syslog.conf and see
what gets logged to /var/log/devd.log. That will tell you if drives are
dropping out and automatically rejoining the pool, for example.
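
Something along these lines in /etc/syslog.conf should do it (assuming
the stock devd stanza; the log path may differ on your system):

!devd
*.>=info        /var/log/devd.log

followed by something like "touch /var/log/devd.log" (syslogd won't
create the file itself) and "service syslogd restart".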

-Alan

Marc UBM Bocklet via freebsd-stable

Sep 18, 2016, 12:47:08 PM
Thanks a lot for your fast reply! Unfortunately (or not), devd is silent
and the error count for the pool remains at zero. The resilver, however,
just keeps restarting. The furthest it got was about 68% resilvered;
usually it gets to 2-3%, then restarts.

Next, I plan on offlining the pool, upgrading to 10.3, and then
reimporting it. Does that make sense?
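
In other words, roughly this (assuming the geli providers get
re-attached at boot or by hand before the import):

# quiesce and export the pool before the upgrade
zpool export pool
# ... upgrade to 10.3, reboot, re-attach the geli providers ...
zpool import pool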

Cheers,
Marc

--
Marc "UBM" Bocklet <etern...@gmail.com>

Alan Somers

Sep 18, 2016, 1:42:23 PM
On Sun, Sep 18, 2016 at 10:46 AM, Marc UBM Bocklet via freebsd-stable wrote:
I suspect an upgrade won't make a difference, but it certainly won't
hurt. Did you remember to change devd's loglevel to "info" and
restart syslogd?

Marc UBM Bocklet via freebsd-stable

Sep 19, 2016, 2:43:30 PM
On Sun, 18 Sep 2016 11:41:54 -0600, Alan Somers wrote:
I just upgraded to
FreeBSD xxx 10.3-STABLE FreeBSD 10.3-STABLE #29 r305937: Sun Sep 18
19:48:32 CEST 2016 xxx amd64

and the resilver is apparently finishing on the first try (currently at
97% and continuing).
I changed devd's loglevel and restarted syslogd, but caught no messages
in the log except these:

Sep 19 19:55:54 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=15989198335786279524''
Sep 19 19:55:54 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=15989198335786279524
Sep 19 19:55:54 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=7556163349212343309''
Sep 19 19:55:54 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=7556163349212343309
Sep 19 19:55:54 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=11941432590348604434''
Sep 19 19:55:54 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=11941432590348604434
Sep 19 19:55:54 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=417984871670910036''
Sep 19 19:55:54 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=417984871670910036
Sep 19 19:55:54 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=9683634954111285635''
Sep 19 19:55:54 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=9683634954111285635
Sep 19 19:55:54 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=13914729548038345942''
Sep 19 19:55:54 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=13914729548038345942
Sep 19 19:55:54 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=11390813927692356269''
Sep 19 19:55:54 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=11390813927692356269
Sep 19 19:55:54 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=17789791879708299257''
Sep 19 19:55:54 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=17789791879708299257
Sep 19 19:55:54 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=3906761392810563354''
Sep 19 19:55:54 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=3906761392810563354
Sep 19 19:55:54 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=14587480937528490734''
Sep 19 19:55:54 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=14587480937528490734
Sep 19 19:55:54 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=10215581546520896687''
Sep 19 19:55:54 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=10215581546520896687
Sep 19 19:55:56 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=15040171490277026943''
Sep 19 19:55:56 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=15040171490277026943
Sep 19 19:55:56 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=1334619275191205373''
Sep 19 19:55:56 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=1334619275191205373
Sep 19 19:55:56 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=10400807283721245288''
Sep 19 19:55:56 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=10400807283721245288
Sep 19 19:55:56 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=12117703925014396267''
Sep 19 19:55:56 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=12117703925014396267
Sep 19 19:55:56 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=1855871756760387608''
Sep 19 19:55:56 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=1855871756760387608
Sep 19 19:55:57 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=15989198335786279524''
Sep 19 19:55:57 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=15989198335786279524
Sep 19 19:55:57 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=7556163349212343309''
Sep 19 19:55:57 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=7556163349212343309
Sep 19 19:55:57 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=11941432590348604434''
Sep 19 19:55:57 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=11941432590348604434
Sep 19 19:55:57 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=417984871670910036''
Sep 19 19:55:57 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=417984871670910036
Sep 19 19:55:57 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=9683634954111285635''
Sep 19 19:55:57 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=9683634954111285635
Sep 19 19:55:57 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=13914729548038345942''
Sep 19 19:55:57 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=13914729548038345942
Sep 19 19:55:57 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=11390813927692356269''
Sep 19 19:55:57 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=11390813927692356269
Sep 19 19:55:57 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=17789791879708299257''
Sep 19 19:55:57 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=17789791879708299257
Sep 19 19:55:57 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=3906761392810563354''
Sep 19 19:55:57 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=3906761392810563354
Sep 19 19:55:57 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=14587480937528490734''
Sep 19 19:55:57 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=14587480937528490734
Sep 19 19:55:57 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=10215581546520896687''
Sep 19 19:55:57 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=10215581546520896687
Sep 19 19:55:58 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=15040171490277026943''
Sep 19 19:55:58 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=15040171490277026943
Sep 19 19:55:58 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=1334619275191205373''
Sep 19 19:55:58 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=1334619275191205373
Sep 19 19:55:58 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=10400807283721245288''
Sep 19 19:55:58 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=10400807283721245288
Sep 19 19:55:58 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=12117703925014396267''
Sep 19 19:55:58 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=12117703925014396267
Sep 19 19:55:58 hamstor devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=17098460560433284558 vdev_guid=1855871756760387608''
Sep 19 19:55:58 hamstor ZFS: vdev state changed, pool_guid=17098460560433284558 vdev_guid=1855871756760387608

These were logged while already on 10.3-stable.

Thanks a lot for taking the time to help me!


Cheers,
Marc