Cannot attach or detach drives from mirrors


Bryan

Jun 28, 2010, 11:25:03 PM
to zfs-fuse
I'm running 0.6.9 recently upgraded from 0.6.0 on Ubuntu 9.10. I
cannot attach or detach a mirror with any drive identifier that I can
find. The only identifier that allowed me to detach a drive was a
long numeric name that was assigned when I removed a failing drive
(sdc) and it was automatically replaced by the new sdc. This was
after I had trashed zpool.cache and re-imported using '-d /dev/disk/by-
id', so that surprised me a bit. I have tried using the sd[a-z]
identifiers to attach/detach/replace. I always get "no such device
in pool". I would appreciate any ideas - I've assumed up to now that
this is a known issue and searched extensively, but haven't found
anything on point.

Currently, this is what I have:


root@storage:~# zpool status
pool: canaanpool
state: ONLINE
scrub: none requested
config:

    NAME                                                              STATE     READ WRITE CKSUM
    canaanpool                                                        ONLINE       0     0     0
      mirror-0                                                        ONLINE       0     0     0
        disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0K4W8B-part1   ONLINE       0     0     0
        disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0PDPDB-part1   ONLINE       0     0     0
        disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0N6U3B-part1   ONLINE       0     0     0

errors: No known data errors

pool: newpool
state: ONLINE
scrub: scrub completed after 2h23m with 0 errors on Mon Jun 28 00:20:55 2010
config:

    NAME                                                       STATE     READ WRITE CKSUM
    newpool                                                    ONLINE       0     0     0
      disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0R299B    ONLINE       0     0     0

errors: No known data errors

I have replicated all filesystems to newpool to try to work around
this, as I thought the partitions might be the problem. Now I can't
get a mirror to attach to newpool. What drive identifiers should I be
using?

sgheeren

Jun 29, 2010, 3:00:35 AM
to zfs-...@googlegroups.com
Hi Bryan,

I get some of the picture, but not all of it is clear. Could you simply show the commands that are failing? The prose version has me confused :)

Meanwhile

a. did you notice that zpool.cache has moved to /var/lib/zfs/ ?

b. Here is a pointer on the changed default behaviour in 0.6.9:
http://zfs-fuse.net/documentation/upgrading/pool-discovery
You might see if you can get the old behaviour back by running zpool import with '-d /dev' explicitly (zap zpool.cache again first) - rough commands are sketched at the end of this message.

c. disk partitions are _recommended_ even with full disks (a single primary partition, then) for portability/protection against other tools

d. the especially unclear bits in your story:

"the long numeric ID"
- What does that look like? /dev/disk/by-id are _not_ numeric

automatically replaced?
- What automatic? Are you using /etc/zfs/zfs_pool_alert and if you disable it do things work?
- if things have been replaced, have you waited for resilver to complete?


"Now I can't get a mirror to attach to newpool"
- What command are you using to make this happen, and what is its output?

In general, be as specific as you can be. I _know_ you have a problem but I'm sure you want to tell me more than just that :)
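
For (b), the sequence I have in mind is roughly this - untested here, so adjust the pool name to taste, and note the cache location from point (a):

zpool export canaanpool
rm -f /var/lib/zfs/zpool.cache
zpool import -d /dev canaanpool

(or '-d /dev/disk/by-id' if you want to keep the by-id names)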

Regards,
Seth

Bryan

Jun 29, 2010, 2:10:26 PM
to zfs-fuse
a. /var/lib/zfs? Didn't notice. I was going to rm it, but it
disappeared when I exported all pools. Re-importing in any manner
hasn't helped.

b. Thanks, that helps. I really want to stick with /dev/disk/by-id
(I had re-imported with that before upgrading anyway). I just tried:

root@storage:/var/lib/zfs# zpool import -d /dev canaanpool
sh: exportfs: not found
exportfs -o fsid=100,no_subtree_check '*:/canaanpool/persephane' -> 32512
cannot share 'canaanpool/persephane': exportfs failed
sh: exportfs: not found
exportfs -o fsid=100,no_subtree_check '*:/canaanpool/timemachine' -> 32512
cannot share 'canaanpool/timemachine': exportfs failed
root@storage:/var/lib/zfs# zpool status
pool: canaanpool
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
canaanpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sdb1 ONLINE 0 0 0
sdd1 ONLINE 0 0 0
sde1 ONLINE 0 0 0

errors: No known data errors

pool: newpool
state: ONLINE
scrub: none requested
config:

    NAME                                                       STATE     READ WRITE CKSUM
    newpool                                                    ONLINE       0     0     0
      disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0R299B    ONLINE       0     0     0

errors: No known data errors
root@storage:/var/lib/zfs# zpool detach canaanpool sdd1
cannot detach sdd1: no such device in pool
root@storage:/var/lib/zfs# zpool detach canaanpool sdd
cannot detach sdd: no such device in pool
root@storage:/var/lib/zfs# zpool detach canaanpool /dev/sdd1
cannot detach /dev/sdd1: no such device in pool
root@storage:/var/lib/zfs# zpool detach canaanpool /dev/sdd
cannot detach /dev/sdd: no such device in pool


c. I had read on some forum post from a couple of years ago that
partitions are recommended - couldn't tell if that was still the
case. From my zpool status, you can tell that the original pool was
using partitions. All these drives use GUID Partition Tables created
under either OpenSolaris or Nexenta - is that relevant? When I tried
to examine the partition structure with gparted, it wanted to repair
the partitions, perhaps because of endianness? Could this be part of
the trouble with the device names?

d. After I had upgraded to 0.6.9, pulled the failing disk, and
rebooted, another unused disk automatically replaced the failed disk
(as if I had physically replaced the disk). I'm sure this occurred b/
c the old disk had been sdc and the new disk was given the sdc label
(even though I had already switched to using /dev/disk/by-id). I
thought this was weird behavior, but maybe it was b/c I had switched
to /dev/disk/by-id before upgrading version, and the old version was
still using "sdc" internally?

- the "long numeric id" - in zpool status, the disk name was
something like 8372648596947368235 - in other words, just a string of
digits, I believe 19 digits long. Using that, I was able to detach
the new drive from the mirror, but not able to reattach it. I've
never seen such a disk identifier except in some forum post about
zfs. The post was old, on Solaris, and off-point (somebody was trying
to detach a drive from a raidz).

It was similar to the pool id from here, except it was identifying a
single drive not belonging to a pool:

root@storage:/var/lib/zfs# zpool import
pool: newpool
id: 4487651816773099050
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

    newpool                                                    ONLINE
      disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0R299B    ONLINE

pool: canaanpool
id: 16178643373490038152
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

    canaanpool                                                        ONLINE
      mirror-0                                                        ONLINE
        disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0K4W8B-part1   ONLINE
        disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0PDPDB-part1   ONLINE
        disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0N6U3B-part1   ONLINE


- here is a string of commands and output using the by-id names:

root@storage:~# zpool detach canaanpool disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0PDPDB-part1
cannot detach disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0PDPDB-part1: no such device in pool
root@storage:~# zpool export canaanpool
root@storage:~# zpool attach newpool disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0R299B /dev/disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0PDPDB
cannot attach /dev/disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0PDPDB to disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0R299B: no such device in pool
root@storage:~# zpool attach newpool sdc /dev/disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0PDPDB
cannot attach /dev/disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0PDPDB to sdc: no such device in pool
root@storage:~# zpool attach newpool /dev/sdc /dev/disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0PDPDB
cannot attach /dev/disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0PDPDB to /dev/sdc: no such device in pool
root@storage:~# zpool attach newpool /disk/sdc /dev/disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0PDPDB
cannot attach /dev/disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0PDPDB to /disk/sdc: no such device in pool
root@storage:~# zpool attach newpool /dev/disk/sdc /dev/disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0PDPDB
cannot attach /dev/disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0PDPDB to /dev/disk/sdc: no such device in pool

(incidentally, sdc is the same disk as STF604MH0R299B, the only disk
in newpool)

Literally every command I've tried, using any name I can think of or
find for a disk, under 0.6.0 or 0.6.9 to replace, attach, or detach a
drive has resulted in "no such device in pool". The only successful
detach was detaching STF604MH0R299B from canaanpool using the string
of digits as its name.

I'm really hoping there's something I've overlooked. I really
appreciate your work on this project. There is nothing like zfs
available and usable right now.

Bryan


sgheeren

Jun 29, 2010, 3:06:34 PM
to zfs-...@googlegroups.com
On 06/29/2010 08:10 PM, Bryan wrote:
> root@storage:/var/lib/zfs# zpool detach canaanpool /dev/sdd1
>
I think that needs to be

root@storage:/var/lib/zfs# zpool detach canaanpool mirror-0 /dev/sdd1

I'll be testing this, but off the top of my head this is your problem.

sgheeren

Jun 29, 2010, 3:31:46 PM
to zfs-...@googlegroups.com
Disregard my previous post: it should work the way you tried, not mine [1]


On 06/29/2010 08:10 PM, Bryan wrote:
I had read that partitions are recommended mentioned on some forum
post from a couple of years ago - couldn't tell if that was still the
case.  From my zpool status, you can tell that the original pool was
using partitions.  All these drives use GUID Partition Tables created
under either OpenSolaris or Nexenta - is that relevant?  When I tried
to examine the partition structure with gparted, it wanted to repair
the partitions, perhaps because of endianness?  Could this be part of
the trouble with the device names?
  
Yes, this could be problematic. See http://zfs-fuse.net/issues/50. I'll link to that one from the pool import/discovery FAQ item.
I had /some/ success with kpartx, but I'd be sure to have backups. Personally, I'd first replicate the pool onto disks with regular IBM/DOS (MBR) partition tables.
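
For the replication itself, something along these lines should do - untested, the snapshot name is made up, and it assumes your zfs-fuse build supports the -R replication stream:

zfs snapshot -r canaanpool@migrate
zfs send -R canaanpool@migrate | zfs recv -d newpool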


d.  After I had upgraded to 0.6.9, pulled the failing disk, and
rebooted, another unused disk automatically replaced the failed disk
(as if I had physically replaced the disk).  I'm sure this occurred b/
c the old disk had been sdc and the new disk was given the sdc label
(even though I had already switched to using /dev/disk/by-id).  
This should _not_ happen. ZFS will note the absence of the _correct_ disklabel and ID and report the old device as 'UNAVAIL'. Only by actually dd-ing the raw data from the first drive to the second should you be able to 'fool' ZFS into thinking it was the correct device. Anyway, any such confusion should be avoidable by zapping the zpool.cache.
I
thought this was weird behavior, but maybe it was b/c I had switched
to /dev/disk/by-id before upgrading version, and the old version was
still using "sdc" internally?
  
Yeah, sort of. It shouldn't really matter because, obviously, ZFS looks at the vdevs themselves to see whether each one really is the expected disk - accessible, readable _and_ not corrupted...


/var/lib/zfs?  Didn't notice.  I was going to rm it, but it
disappeared when I exported all pools.  Re-importing in any manner
hasn't helped.
  
zpool.cache only exists while pools are imported (and only when they weren't imported with a temporary altroot or an explicit cachefile; see the -R and -c options of 'zpool import' in the man page).
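
Roughly, from memory (paths here are just examples):

zpool import -R /mnt/recovery -d /dev/disk/by-id canaanpool    # altroot import; no cache file gets written
zpool import -c /var/lib/zfs/zpool.cache canaanpool            # import using an explicit cache file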


8372648596947368235 - in other words, just a string of
digits, I believe 19 digits long.  Using that, I was able to detach
the new drive from the mirror, but not able to reattach it.  I've
never seen such a disk identifier except in some forum post about
zfs.  The post was old, on Solaris, and off-point (somebody was trying
to detach a drive from a raidz).
  
The only way I can see this happening is when two devices have duplicate names. Under POSIX and a /dev filesystem this can't normally happen. I'd be happy if you could share the link to that forum post you mentioned.



[1]
  pool: canaanpool
 state: ONLINE
 scrub: none requested
config:

    NAME                         STATE     READ WRITE CKSUM
    canaanpool                   ONLINE       0     0     0
      mirror-0                   ONLINE       0     0     0
        /tmp/canaanpool_blk/za1  ONLINE       0     0     0
        /tmp/canaanpool_blk/za2  ONLINE       0     0     0
        /tmp/canaanpool_blk/za3  ONLINE       0     0     0
        /tmp/canaanpool_blk/za4  ONLINE       0     0     0
        /tmp/canaanpool_blk/za5  ONLINE       0     0     0
        /tmp/canaanpool_blk/za6  ONLINE       0     0     0
      mirror-1                   ONLINE       0     0     0
        /tmp/canaanpool_blk/zb1  ONLINE       0     0     0
        /tmp/canaanpool_blk/zb2  ONLINE       0     0     0
        /tmp/canaanpool_blk/zb3  ONLINE       0     0     0
        /tmp/canaanpool_blk/zb4  ONLINE       0     0     0
        /tmp/canaanpool_blk/zb5  ONLINE       0     0     0
        /tmp/canaanpool_blk/zb6  ONLINE       0     0     0
      mirror-2                   ONLINE       0     0     0
        /tmp/canaanpool_blk/zc1  ONLINE       0     0     0
        /tmp/canaanpool_blk/zc2  ONLINE       0     0     0
        /tmp/canaanpool_blk/zc3  ONLINE       0     0     0
        /tmp/canaanpool_blk/zc4  ONLINE       0     0     0
        /tmp/canaanpool_blk/zc5  ONLINE       0     0     0
        /tmp/canaanpool_blk/zc6  ONLINE       0     0     0


errors: No known data errors
NAME         USED  AVAIL  REFER  MOUNTPOINT
canaanpool  88.5K   188G    21K  /tmp/canaanpool

root@lucid:~# zpool detach canaanpool /tmp/canaanpool_blk/za5
root@lucid:~# zpool detach canaanpool mirror-0 /tmp/canaanpool_blk/za3
cannot detach mirror-0: only applicable to mirror and replacing vdevs

Bryan

Jun 29, 2010, 6:45:38 PM
to zfs-fuse
root@storage:~# zpool detach canaanpool mirror-0 sdd1
cannot detach mirror-0: only applicable to mirror and replacing vdevs
root@storage:~# zpool detach canaanpool mirror-0 sdd
cannot detach mirror-0: only applicable to mirror and replacing vdevs

I had previously tried this - the zfs documentation never mentions
specifying which mirror to detach from, and attach takes the existing
device name as the first argument to create a mirror (as it should:
since the devices in any single pool necessarily have unique names,
you should never need to specify a mirror set by name).
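
For reference, the general forms I'm working from (per the zpool man page) are:

zpool attach [-f] <pool> <existing-device> <new-device>
zpool detach <pool> <device>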

I believe the partition tables may be the real problem, and I will be
experimenting with changing that aspect next.

Bryan

Jun 29, 2010, 7:45:13 PM
to zfs-fuse

> Yes, this could be problematic. See http://zfs-fuse.net/issues/50. I'll
> link to that one from the pool import/discovery FAQ item.
> I had /some/ success with kpartx, but I'd be sure to have backups.
> Personally, I'd first replicate the pool onto disks with regular
> IBM/DOS (MBR) partition tables.

I've looked at that issue briefly before. However, I have never had
difficulty importing a pool created in Nexenta - I suppose I haven't
tried with one created in OpenSolaris. I used the pool and scrubbed
it regularly for a few months before the one disk started failing. In
fact, I might have destroyed and recreated the pool, copying the data
back to it, when I switched to Ubuntu - many months ago. The drives
definitely retain their partitions from Nexenta or possibly even
OpenSolaris since I used it briefly.

One thing I plan to try is allowing gparted to fix the partitions,
after I get another drive detached and create another mirror.


> > d.  After I had upgraded to 0.6.9, pulled the failing disk, and
> > rebooted, another unused disk automatically replaced the failed disk
> > (as if I had physically replaced the disk).  I'm sure this occurred b/
> > c the old disk had been sdc and the new disk was given the sdc label
> > (even though I had already switched to using /dev/disk/by-id).  
>
> This should _not_ happen. ZFS will note the absence of the _correct_
> disklabel and ID and report the old device as 'UNAVAIL'. Only by
> actually dd-ing the raw data from the first drive to the second should
> you be able to 'fool' ZFS into thinking it was the correct device.
> Anyway, any such confusion should be avoidable by zapping the zpool.cache.

This definitely happened, no dd-ing involved. The zpool.cache was not
zapped, but the new drive was reassigned to the sdc disk label
(corrupted sdc pulled). And this is when the numeric identifier was
given to the disk.

> > 8372648596947368235 - in other words, just a string of
> > digits, I believe 19 digits long.  Using that, I was able to detach
> > the new drive from the mirror, but not able to reattach it.  I've
> > never seen such a disk identifier except in some forum post about
> > zfs.  The post was old, on Solaris, and off-point (somebody was trying
> > to detach a drive from a raidz).
>
> The only way in which I can see this happen is when two devices have
> duplicate names. Now under POSIX and /dev fs this can't normally happen.
> I'd be happy if you had the link to that forum post you mentioned

This is exactly what happened - the new drive was given the name sdc
upon reboot after pulling the old drive - thus, duplicate names and
the numeric identifier. At least the numeric identifier worked to
detach it from the pool (I didn't give it a chance to resilver).

Aha, I found it:

root@storage:/dev# zpool status
pool: canaanpool
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
see: http://www.sun.com/msg/ZFS-8000-2Q
scrub: none requested
config:

    NAME                                                              STATE     READ WRITE CKSUM
    canaanpool                                                        DEGRADED     0     0     0
      mirror-0                                                        ONLINE       0     0     0
        disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0K4W8B-part1   ONLINE       0     0     0
        disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0PDPDB-part1   ONLINE       0     0     0
      mirror-1                                                        DEGRADED     0     0     0
        17792451104132812401                                          UNAVAIL      0     0     0  was /dev/disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0R858B-part1
        disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0N6U3B-part1   ONLINE       0     0     0

errors: No known data errors
root@storage:/dev# zpool detach canaanpool 17792451104132812401
root@storage:/dev# zpool status
pool: canaanpool
state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on older software versions.
scrub: none requested
config:

    NAME                                                              STATE     READ WRITE CKSUM
    canaanpool                                                        ONLINE       0     0     0
      mirror-0                                                        ONLINE       0     0     0
        disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0K4W8B-part1   ONLINE       0     0     0
        disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0PDPDB-part1   ONLINE       0     0     0
        disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0N6U3B-part1   ONLINE       0     0     0

Yeah, any non-raidz pool is essentially a raid0 of vdevs (resulting in
a raid10 if mirroring is involved); mirror-0 would be one of the vdevs
in that raid10, and detaching or removing it would result in complete
data loss. But I'm not sure why you brought that up.

I believe I'm going to export the pool, re-partition sdd, create a new
pool from it, replicate to it, and then try letting gparted repair the
partition table on the original pool. Maybe then I can attach
something new to it to see if the problem is resolved. ZFS is
supposed to be endian-agnostic, but if the kernel is having trouble
with the partition table, it could adversely affect ZFS's ability to
identify partitions correctly. This is my current guess, but it'll
take a little time.

I'll try to document all this when I'm done.

sgheeren

Jun 29, 2010, 7:51:13 PM
to zfs-...@googlegroups.com
On 06/30/2010 01:45 AM, Bryan wrote:
> This definitely happened, no dd-ing involved. The zpool.cache was not
> zapped, but the new drive was reassigned to the sdc disk label
> (corrupted sdc pulled). And this is when the numeric identifier was
> given to the disk.
>
Come to think of it, this makes sense: if zpool 'knows' that a device
'should be there' (formerly named sdc, but not available right now
because you pulled it :)), then zpool status cannot display a normal
device node name and has to show the internal ID instead. So that
actually goes to show that the other disk did _not_ magically replace
the failing device, because then there would have been no reason to
show the internal ID ...

Does that make sense?
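
If you want to double-check, you could dump the on-disk vdev labels directly - assuming the zdb tool shipped with your zfs-fuse build works for you - e.g.:

zdb -l /dev/disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0R858B-part1

That should print the label, including the guid (the long number) and the path ZFS last knew the device by.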

Now.... if only we could understand why the b***rd won't detach the
other vdev ...

sgheeren

Jun 29, 2010, 7:57:54 PM
to zfs-...@googlegroups.com
On 06/30/2010 01:45 AM, Bryan wrote:
>   mirror-1                 DEGRADED     0     0     0
>     17792451104132812401   UNAVAIL      0     0     0  was /dev/disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0R858B-part1
>
I should have kept reading: this illustrates my latest point. It clearly
says "there OUGHT to be a disk with internal ID 17792451104132812401,
which I _formerly_ knew as
/dev/disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0R858B-part1"

This would actually give me hope that you should be able to similarly
detach the other vdev by physically removing it first...?
You might, of course, zap the zpool.cache, create a /dev/disk/subset
folder containing just symlinks to the devices you want zpool to notice
and then

zpool import -d /dev/disk/subset poolname

That should have the same effect as yanking the disk but without the
physical exercise and risk of damaging your harddrive in the process :)
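
In concrete terms, something like this (untested; device names taken from your status output, and any directory will do - it doesn't have to live under /dev). Leave out the device you want to end up UNAVAIL:

zpool export canaanpool
mkdir -p /dev/disk/subset
ln -s /dev/disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0K4W8B-part1 /dev/disk/subset/
ln -s /dev/disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0N6U3B-part1 /dev/disk/subset/
zpool import -d /dev/disk/subset canaanpool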

> But I'm not sure why you brought that up.

Well, like I promised I was going to _test_ my first idea on how to
change the syntax on detach. I was simply showing [1] that I was wrong :)

[1] as a footnote, as well


sgheeren

Jun 29, 2010, 8:04:05 PM
to zfs-...@googlegroups.com
On 06/30/2010 01:45 AM, Bryan wrote:
>> The only way in which I can see this happen is when two devices have
>> > duplicate names. Now under POSIX and /dev fs this can't normally happen.
>> > I'd be happy if you had the link to that forum post you mentioned
>>
> This is exactly what happened - the new drive was given the name sdc
> upon reboot after pulling the old drive - thus, duplicate names and
> the numeric identifier. At least the numeric identifier worked to
> detach it from the pool (I didn't give it a chance to resilver).
>
Erm... I'm tired: I'm forgetting things

I meant duplicate device names _simultaneously visible_. This is not
technically feasible and therefore not what happened. Besides, we have
found a perfectly reasonable explanation for the long numeric IDs
appearing in my other response, based on the actual output of zpool that
you had saved (bravo!).

I was fantasizing along the lines of having two identically-named pools
visible at one given time. Never mind

Bryan

Jun 30, 2010, 10:36:01 AM
to zfs-fuse
After GNU Parted failed to fix the partitions it was having trouble
with, I used gparted to re-partition one of the mirrored drives in the
pool (sdb, i.e. STF604MH0K4W8B-part1), using GPT and making a single
unformatted partition. This resulted in:

root@storage:/dev/disk/by-id# zpool status
pool: canaanpool
state: ONLINE
status: One or more devices could not be used because the label is
missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-4J
scrub: none requested
config:

    NAME                                                              STATE     READ WRITE CKSUM
    canaanpool                                                        ONLINE       0     0     0
      mirror-0                                                        ONLINE       0     0     0
        18047331280846060305                                          UNAVAIL      0     0     0  was /dev/disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0K4W8B-part1
        disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0PDPDB-part1   ONLINE       0     0     0
        disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0N6U3B-part1   ONLINE       0     0     0

errors: No known data errors

root@storage:/dev/disk/by-id# zpool replace canaanpool 18047331280846060305 /dev/disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0K4W8B-part1
root@storage:/dev/disk/by-id# zpool status
pool: canaanpool
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scrub: resilver in progress for 0h0m, 0.24% done, 1h35m to go
config:

    NAME                                                                STATE     READ WRITE CKSUM
    canaanpool                                                          DEGRADED     0     0     0
      mirror-0                                                          DEGRADED     0     0     0
        replacing-0                                                     DEGRADED     0     0     0
          18047331280846060305                                          UNAVAIL      0     0     0  was /dev/disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0K4W8B-part1/old
          disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0K4W8B-part1   ONLINE       0     0     0  707M resilvered
        disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0PDPDB-part1     ONLINE       0     0     0
        disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0N6U3B-part1     ONLINE       0     0     0

errors: No known data errors

root@storage:~# zpool status
pool: canaanpool
state: ONLINE
scrub: resilver completed after 1h19m with 0 errors on Tue Jun 29 23:39:52 2010
config:

    NAME                                                              STATE     READ WRITE CKSUM
    canaanpool                                                        ONLINE       0     0     0
      mirror-0                                                        ONLINE       0     0     0
        disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0K4W8B-part1   ONLINE       0     0     0  284G resilvered
        disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0PDPDB-part1   ONLINE       0     0     0
        disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0N6U3B-part1   ONLINE       0     0     0

errors: No known data errors

That was a very fast resilver - approximately 60 MB/s. zpool iostat
reported the write speed to be very, very low. I believe the resilver
retained the data already on the disk and basically just verified
checksums. Interesting.

Unfortunately:

root@storage:~# zpool detach canaanpool disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0PDPDB-part1
cannot detach disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0PDPDB-part1: no such device in pool
root@storage:~# zpool offline canaanpool disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0PDPDB-part1
cannot offline disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0PDPDB-part1: no such device in pool
root@storage:~# zpool offline canaanpool disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0K4W8B-part1
cannot offline disk/by-id/ata-Hitachi_HDT721010SLA360_STF604MH0K4W8B-part1: no such device in pool

The problem is not fixed at all. But GNU Parted no longer chokes on
the partition table. I guess I should try just using an msdos partition
table, although I would think GPT poses no problem, since support for
it is compiled into the kernel.
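
If I go the msdos route, I'd expect something along these lines to do it (untested parted invocation, so double-check against your parted version; it wipes the partition table, so only on a disk that's already been detached or backed up - /dev/sdX is a placeholder):

parted -s /dev/sdX mklabel msdos
parted -s /dev/sdX mkpart primary 0% 100%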

Any other ideas?

sgheeren

Jun 30, 2010, 11:37:38 AM
to zfs-...@googlegroups.com
On 06/30/2010 04:36 PM, Bryan wrote:
That was a very fast resilver - approximately 60MBps.  zpool iostat
reported write speed to be very very low.  I believe the resilver
retained the data on the disk and basically verified checksums.
Interesting.
  
most interesting


The problem is not fixed at all.  But gnu parted no longer chokes on
the partition table. 
By now I'd file a bug. File it at http://zfs-fuse.net/issues

Perhaps you could build a debug=2 build and run 'zfs-fuse -n |& tee zfs-fuse.log' in the foreground. There might be an issue with the long device names.
Did you retry with a zapped zpool.cache and an import from -d /dev (so that the long names are gone)? Perhaps an extra export/import is in order to make sure all the disklabels get rewritten. [1]

[1] this is paranoia, because I know the code is quite vigilant about always syncing layout changes to all labels...


The problem is not fixed at all.  But gnu parted no longer chokes on
the partition table.  I guess I should try just using msdos partition
table.  I would think gpt would pose no problem, support is compiled
into the kernel.
  
While that may be true, there are different flavours/interpretations/usage patterns to it that have caused trouble in the past. I'd need to google them up, and I don't have the time now.
