checksum errors not reported or fixed until scrub is run


wcoenen

25 Jan 2010, 20:55:05
to zfs-fuse
Hi,

I just did an experiment to see what happens when corruption occurs
on one disk of a mirror vdev. I have noticed that while ZFS will pick
the healthy disk when reading data, it will do so without repairing or
even reporting errors. The only way to make ZFS do either is to start
a scrub. Is this normal? I would expect such errors to be at least
shown by zpool status after reading the file.

Note that the file is too big to fit in this system's memory. I also
tried to reboot after the corruption to make 100% sure that nothing
was cached.

Here are the details of my experiment:

# create 1GB test pool with mirror vdev
zpool create testpool mirror /dev/sdc1 /dev/sdc2
zfs create testpool/testfs

# generate test file, print sha1sum fingerprint
cd /testpool/testfs
dd if=/dev/urandom of=testfile bs=1M count=920
sha1sum testfile

# generate corruption on /dev/sdc1
/home/wim/corruption.py

# reread file, checksum is OK
sha1sum testfile

# shows NO errors detected or repaired, except after a scrub
zpool status testpool
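
For reference, this is the only sequence that did make the errors show up
(and get repaired); zpool status -v should additionally list any files with
permanent errors, though with one healthy mirror half there shouldn't be any:

# only after an explicit scrub do the checksum errors appear in the output
zpool scrub testpool
zpool status -v testpool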

This is the (dangerous!) corruption.py script I used to generate a
corruption every 10MB:

#!/usr/bin/python2.5
# Repeatedly seek ~10MB further into /dev/sdc1 and overwrite a few bytes
# there, until a seek/write past the end of the device fails and stops it.
openedDevice = open('/dev/sdc1', 'w+b')
interval = 10000000
while True:
    openedDevice.seek(interval, 1)   # move 10MB past the current position
    print str(openedDevice.tell())   # log the offset being clobbered
    openedDevice.write('corrupt')    # overwrite 7 bytes at that offset

Julius Roberts

26 Jan 2010, 04:26:38
to zfs-...@googlegroups.com
Hi,

Yes, this is normal, and it is how ZFS also behaves under Solaris. I'm
not sure whether it's considered a bug, however; it may well behave like
this intentionally.

I recently performed a live demo (on OpenSolaris) which consisted in
part of demonstrating just this point: that hardware failure or other
corruption on a device isn't reported until *after* a scrub has been
performed:

### Demo repair zpool ###
mkfile 65m ~/zfsdemo/file1 ~/zfsdemo/file2 ~/zfsdemo/file3 \
    ~/zfsdemo/file4 ~/zfsdemo/file5 ~/zfsdemo/file6
sudo zpool create testpool raidz1 ~/zfsdemo/file1 ~/zfsdemo/file2 \
    ~/zfsdemo/file3 ~/zfsdemo/file4 ~/zfsdemo/file5
sudo zpool status testpool
dd if=/dev/random of=~/zfsdemo/file3 bs=1024 count=100
sudo zpool status testpool <- no corruption shown here.
sudo zpool scrub testpool
sudo zpool status testpool <- corruption is shown correctly here.
sudo zpool replace testpool ~/zfsdemo/file3 ~/zfsdemo/file6
sudo zpool status testpool

2010/1/26 wcoenen <wco...@gmail.com>:


> Hi,
>
> I just did an experiment to see what happens when corruption occurs
> on one disk of a mirror vdev. I have noticed that while ZFS will pick
> the healthy disk when reading data, it will do so without repairing or
> even reporting errors. The only way to make ZFS do either is to start
> a scrub. Is this normal? I would expect such errors to be at least
> shown by zpool status after reading the file.

--
Kind regards, Jules

Zen left me, then I remembered, nothing to forget.

sgheeren

26 Jan 2010, 04:42:45
to zfs-...@googlegroups.com
Julius Roberts wrote:
> Hi,
>
> Yes, this is normal, and it is how ZFS also behaves under Solaris. I'm
> not sure whether it's considered a bug, however; it may well behave like
> this intentionally.
>
> I recently performed a live demo (on OpenSolaris) which consisted in
> part of demonstrating just this point: that hardware failure or other
> corruption on a device isn't reported until *after* a scrub has been
> performed:
>
Note the disparity with this well-known ZFS self-heal demo on YouTube:

http://www.youtube.com/watch?v=VlFGTtU65Xo [fiddle with the high-quality
and fullscreen settings so you can read it]

I know this is the BSD port, but it shouldn't behave differently, should it?

[Note that after a scrub, a lot _more_ errors had been reported. Could it
be that the errors reported _before_ the scrub are limited to metadata
corruption as opposed to object data corruption? I was generally assuming
that ZFS really does not treat its metadata any differently from the
contents themselves?]
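
One way to test that guess might be to compare what zpool status -v lists
before and after the scrub; if I remember correctly it prints file paths for
damaged file data and <metadata>-style entries for damaged metadata, so the
pre-scrub list (if there is one) would show which kind of object is being
flagged. Untested sketch against the pool from the first post:

# read the corrupted file, then look at the verbose error list before scrubbing
sha1sum /testpool/testfs/testfile
zpool status -v testpool

# now scrub and compare
zpool scrub testpool
zpool status -v testpool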

Emmanuel Anne

26 Jan 2010, 05:19:16
to zfs-...@googlegroups.com
In the video they also export/import the pool after the corruption; it might help to update its state...

2010/1/26 sgheeren <sghe...@hotmail.com>

--
zfs-fuse git repository : http://rainemu.swishparty.co.uk/cgi-bin/gitweb.cgi?p=zfs;a=summary

sgheeren

26 Jan 2010, 05:21:17
to zfs-...@googlegroups.com
Emmanuel Anne wrote:
> In the video they also export/import the pool after the corruption; it might help to update its state...
I don't follow. Are you suggesting that ZFS detects the errors early, but stores them "somewhere else" and only updates the counts at certain times? That is definitely not my expectation.

Emmanuel Anne

26 Jan 2010, 05:26:06
to zfs-...@googlegroups.com
Not sure, but exporting/importing the pool definitely updates the counters.
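
For what it's worth, the sequence I have in mind is just the following
(untested against the mirror pool from the first post; zpool import may need
-d /dev under zfs-fuse if the pool isn't found automatically):

# after reading the corrupted file, cycle the pool and recheck the counters
zpool export testpool
zpool import testpool
zpool status testpool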

2010/1/26 sgheeren <sghe...@hotmail.com>
> Emmanuel Anne wrote:
>> In the video they also export/import the pool after the corruption; it might help to update its state...
> I don't follow. Are you suggesting that ZFS detects the errors early, but stores them "somewhere else" and only updates the counts at certain times? That is definitely not my expectation.
