Repeated kernel panics on Lion, 74.1 & also 74.2

Dave Cottlehuber

Sep 9, 2012, 7:46:34 PM
to zfs-...@googlegroups.com
Hi,

I'm getting frequent KPs (sometimes every 2-3 minutes), which is
debilitating. They only started last week after trying to install XCode
4.4. I had to disable ZFS to do this, & have now turned ZFS back on again.
This may or may not have anything to do with the 74.2 upgrade; I have moved
back to 74.1 and still have the same issues. OSX Lion, SSD only (all my
other bits disconnected atm).
Pool scrub comes clean under maczfs; I'll do one under smartos or omnios tomorrow.

grep of all KPs: https://friendpaste.com/5gFQw9wPBFOgNjue6495YW
Neither panic-decode nor gdb revealed anything more interesting to me
than "add %al,(%eax)".

## kit

OSX: Software Mac OS X Lion 10.7.4 (11E53),
HW is MBP Feb 2011 15" screen.
uname: Darwin akai.local 11.4.0 Darwin Kernel Version 11.4.0: Mon Apr
9 19:32:15 PDT 2012; root:xnu-1699.26.8~1/RELEASE_X86_64 x86_64

## zpool list
  pool: tub
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tub         ONLINE       0     0     0
          disk0s4   ONLINE       0     0     0

errors: No known data errors

## zoink

ZFS footprint: 53M used, 53M peak, 123M goal 17 threads
ARC footprint: 28M used, 28M peak, 682M goal

                         obj    slab  active   total    peak   total
kmem_cache name         size    size    objs    objs    objs     mem
--------------------------------------------------------------------
kmem_magazine_1           16    4096     137     252     252      3K
kmem_magazine_3           32    4096     563     630     630     19K
kmem_magazine_7           64    4096       3     126     126      7K
kmem_magazine_15         128    4096       2      31      31      3K
kmem_slab_cache           56    4096    2041    2088    2088    114K
kmem_bufctl_cache         24    4096    6927    7056    7056    165K
taskq_ent_cache           96    4096     342     378     378     35K
taskq_cache              296    4096      13      13      13      3K
zfs_znode_cache          336   32768    1430    1552    1552    509K
zio_bufs 664             664    4096    2178    2768    2768    342K
zio_bufs 512             512   32768    2226    2304    2304   1120K
zio_bufs 1024           1024   16384     103     528     528    512K
zio_bufs 1536           1536   12288      38      48      48     36K
zio_bufs 2048           2048    4096      30      38      38     28K
zio_bufs 2560           2560   20480      17      24      24     40K
zio_bufs 3072           3072   12288      19      28      28     48K
zio_bufs 3584           3584   28672      12      48      48    140K
zio_bufs 4096           4096    4096      24      34      34     92K
zio_bufs 5120           5120   20480      13      20      20     40K
zio_bufs 6144           6144   12288      20      22      22     48K
zio_bufs 7168           7168   28672      11      16      16     28K
zio_bufs 8192           8192    8192       8      10      10     24K
zio_bufs 10240         10240   20480      21      24      24     40K
zio_bufs 12288         12288   12288      13      15      15     36K
zio_bufs 14336         14336   28672      13      30      30    252K
zio_bufs 16384         16384   16384     944     946     946  15024K
zio_bufs 20480         20480   20480      21      22      22    100K
zio_bufs 24576         24576   24576       8       9       9     48K
zio_bufs 28672         28672   28672       9      10      10     28K
zio_bufs 32768         32768   32768       7       9       9     64K
zio_bufs 36864         36864   36864       4       5       5     36K
zio_bufs 40960         40960   40960       3       4       4     40K
zio_bufs 45056         45056   45056       2       3       3     44K
zio_bufs 49152         49152   49152       5       6       6     96K
zio_bufs 53248         53248   53248       2       3       3     52K
zio_bufs 57344         57344   57344       2       3       3     56K
zio_bufs 65536         65536   65536     162     163     163  10368K
zio_bufs 69632         69632   69632       2       3       3     68K
zio_bufs 73728         73728   73728       1       2       2     72K
zio_bufs 114688       114688  114688       0       2       2    224K
zio_bufs 131072       131072  131072      52      52      52    384K
dmu_buf_impl_t           320   32768    3371    3468    3468   1083K
dnode_t                  960    4096    1961    1964    1964   1841K
arc_buf_hdr_t            272    8192    2111    2117    2117    562K
arc_buf_t                 40    4096    1907    1919    1919     74K
--------------------------------------------------------------------
kmem_cache total: 41M 43M



## zfs list

NAME                                        USED   AVAIL   REFER  MOUNTPOINT
tub                                        61.2G   21.4G    294K  /zfs
tub/builds                                 17.1M   21.4G     22K  /zfs/builds
tub/builds/couchdb                           19K   21.4G     19K  /zfs/builds/couchdb
tub/builds/spidermonkey                    17.0M   21.4G   17.0M  /zfs/builds/spidermonkey
tub/dropbox                                11.7G   21.4G   11.5G  /zfs/dropbox
tub/dropbox@20120806                       2.05M       -   11.1G  -
tub/dropbox@20120818-pre-sync                  0       -   11.1G  -
tub/dropbox@20120818                           0       -   11.1G  -
tub/dropbox@20120905                        163M       -   11.6G  -
tub/fusion                                 19.1G   21.4G     26K  /zfs/fusion
tub/fusion@20120905                          23K       -     26K  -
tub/fusion/debian                           385M   21.4G    383M  /zfs/fusion/debian
tub/fusion/debian@blank                     224K       -    230K  -
tub/fusion/debian@installed-backports      1.81M       -    383M  -
tub/fusion/debian@20120905                     0       -    383M  -
tub/fusion/freebsd                          898M   21.4G    896M  /zfs/fusion/freebsd
tub/fusion/freebsd@blank                     28K       -    896M  -
tub/fusion/freebsd@installed                   0       -    896M  -
tub/fusion/freebsd@20120905                    0       -    896M  -
tub/fusion/openbsd                          328M   21.4G    327M  /zfs/fusion/openbsd
tub/fusion/openbsd@installed                 26K       -    327M  -
tub/fusion/openbsd@20120905                  24K       -    327M  -
tub/fusion/win7                            5.71G   21.4G     21K  /zfs/fusion/win7
tub/fusion/win7/base                       5.71G   21.4G   5.15G  /zfs/fusion/win7/base
tub/fusion/win7/base@installed-no-apps      570M       -   4.70G  -
tub/fusion/win7/base@20120905                  0       -   5.15G  -
tub/fusion/win8                            11.8G   21.4G   8.39G  /zfs/fusion/win8
tub/fusion/win8@installed                  3.45G       -   3.47G  -
tub/fusion/wi...@installed-sdk7.1-vs2012   1.50M       -   8.39G  -
tub/fusion/win8@20120905                       0       -   8.39G  -
tub/home                                   4.85G   21.4G     23K  /zfs/home
tub/home@20120905                            19K       -     23K  -
tub/home/dch                               4.35G   21.4G     20K  /zfs/home/dch
tub/home/dch@20120818                        18K       -     22K  -
tub/home/dch@20120821                        18K       -     22K  -
tub/home/dch@20120905                        18K       -     21K  -
tub/home/dch/repos                         4.35G   21.4G   3.81G  /zfs/home/dch/repos
tub/home/dch/repos@20120818                12.0M       -   3.24G  -
tub/home/dch/repos@20120821                4.35M       -   3.24G  -
tub/home/dch/repos@20120822                14.7M       -   3.24G  -
tub/home/dch/repos@20120905                21.4M       -   2.84G  -
tub/home/dch/repos@20120907                1.41M       -   3.81G  -
tub/home/veronika                           508M   21.4G    508M  /zfs/home/veronika
tub/home/veronika@20120806                     0       -    508M  -
tub/home/veronika@20120818                     0       -    508M  -
tub/home/veronika@20120905                     0       -    508M  -
tub/homebrew                                502M   21.4G    312M  /usr/local
tub/homebrew@20120818                       188M       -    335M  -
tub/homebrew@20120905                      1.15M       -    311M  -
tub/shared                                 25.0G   21.4G     26K  /zfs/shared
tub/shared@20120818                          21K       -     25K  -
tub/shared@20120905                          19K       -     25K  -
tub/shared/games                            984M   21.4G    984M  /zfs/shared/games
tub/shared/itunes                           554M   21.4G    345M  /zfs/shared/itunes
tub/shared/itunes@20120818                 1.04M       -    308M  -
tub/shared/itunes@20120905                  166M       -    551M  -
tub/shared/movies                            19K   21.4G     19K  /zfs/shared/movies
tub/shared/movies@20120818                     0       -     19K  -
tub/shared/movies@20120905                     0       -     19K  -
tub/shared/music                           23.5G   21.4G   23.5G  /zfs/shared/music
tub/shared/music@merged                        0       -   23.4G  -
tub/shared/music@20120821                      0       -   23.4G  -
tub/shared/music@20120905                      0       -   23.5G  -

I've 164K of kernel dumps if these are of use, let me know to whom &
how to send them.

Any tips on getting back to usable again?

A+
Dave

Alex Blewitt

Sep 9, 2012, 7:57:30 PM
to zfs-...@googlegroups.com, zfs-...@googlegroups.com
On 10 Sep 2012, at 00:46, Dave Cottlehuber <d...@jsonified.com> wrote:

> NAME STATE READ WRITE CKSUM
> tub ONLINE 0 0 0
> disk0s4 ONLINE 0 0 0

You have a single disk for your ZFS filesystem. If that disk experiences problems this will trigger a KP. It may be on the way out.

If it's an external disk you might want to check the power supply and connection as these can fluctuate causing momentary disconnects, which zfs doesn't like (especially if they are USB disks).

Alex

Jason

Sep 9, 2012, 7:59:47 PM
to zfs-...@googlegroups.com
What Alex said in spades.

Jason
Sent from my iPad

Björn Kahl

Sep 9, 2012, 8:07:18 PM
to zfs-...@googlegroups.com

Hi Dave,

Sorry for the bad experience. Usually maczfs is stable and doesn't
panic without reason.

See more comments inline.

On 10.09.12 01:46, Dave Cottlehuber wrote:
> Hi,
>
> I'm getting frequent (every 2-3 minutes sometimes) KPs --
> debilitating, only started last week after trying to install XCode
> 4.4. I had to disable ZFS to do this, & now turned ZFS back on again.
> May or may not have anything to do with 74.2 upgrade, I have moved
> back to 74.1 and still have same issues. OSX Lion, SSD only (all my
> other bits disconnected atm).

Not clear to me what you did.

Are you using the installer packages from www.maczfs.org (or the
code.google.com equivalent), or are you using a self-compiled version?

Did you have a working MacZFS installation before, or is this your first
try? If you had a working one, which installer package was it, or was
it compiled from source?

Have you installed other software (besides the mentioned XCode 4.4)
before the kernel panics started?


> pool scrub comes clean under maczfs, I'll do a smartos or omnios tomorrow.

How can you scrub, if you get a panic every 2-3 minutes as you wrote
above?


> grep of all KPs: https://friendpaste.com/5gFQw9wPBFOgNjue6495YW
> panic-decode didn't, nor did gdb reveal anything more interesting than
> "add %al,(%eax)" to me.
>
> ## kit
>
> OSX: Software Mac OS X Lion 10.7.4 (11E53),
> HW is MBP Feb 2011 15" screen.
> uname: Darwin akai.local 11.4.0 Darwin Kernel Version 11.4.0: Mon Apr
> 9 19:32:15 PDT 2012; root:xnu-1699.26.8~1/RELEASE_X86_64 x86_64
>
> ## zpool list
> pool: tub
> state: ONLINE
> scrub: none requested
> config:
>
> NAME STATE READ WRITE CKSUM
> tub ONLINE 0 0 0
> disk0s4 ONLINE 0 0 0
>
> errors: No known data errors

I think you mixed "zpool status" and "zpool list" above. Can you also
provide "zpool list" output, for completeness?
Please first check your drive's data and power connections. Unstable
connections or a dying drive can cause kernel panics.

If you are sure your drive is fine, then please open an issue in our
issue tracker and attach two or three representative dumps. Make sure
the attached dumps are those that have the MacZFS kext in their
backtrace.

Alternatively, you can send me two or three such dumps by mail.

> Any tips on getting back to usable again?

As said, one possible cause is a drive which produces IO errors.
MacZFS will panic in case of drive instabilities. Make sure your
drive is securely connected and powered. Besides that guess,
unfortunately none yet.


Best regards

Björn

--
| Bjoern Kahl +++ Siegburg +++ Germany |
| "googlelogin@-my-domain-" +++ www.bjoern-kahl.de |
| Languages: German, English, Ancient Latin (a bit :-)) |


Dave Cottlehuber

Sep 10, 2012, 3:11:58 AM
to zfs-...@googlegroups.com
On 10 September 2012 02:07, Björn Kahl <googl...@bjoern-kahl.de> wrote:
>
> Hi Dave,
>
> Sorry for the bad experience. Usually maczfs is stable and doesn't
> panic without reason.

Yes, it's been great last 2 months, very suddenly started this last week.

Many thanks to all for their advice so far.

> See more comments inline.
>
> On 10.09.12 01:46, Dave Cottlehuber wrote:
>> Hi,
>>
>> I'm getting frequent (every 2-3 minutes sometimes) KPs --
>> debilitating, only started last week after trying to install XCode
>> 4.4. I had to disable ZFS to do this, & now turned ZFS back on again.
>> May or may not have anything to do with 74.2 upgrade, I have moved
>> back to 74.1 and still have same issues. OSX Lion, SSD only (all my
>> other bits disconnected atm).
>
> Not clear to me what you did.

I'm not compiling myself, give me some time though!

- Install ZFS 74.1 some time ago, all good. Heavy use, running
concurrent VMs and lots of git/compilation. Regular scrubbing like a
pirate swabbing the decks!

- Install XCode from app store. After download & during install it
KPs. Repeat 10x, notice pattern

- Deinstall MacZFS (mv /System/Library/{Filesystems,Extensions}/zfs.*
/tmp) & reboot

- Run disk utility and check system volume.

- Install XCode, no crashes

- Reboot, install MacZFS 74.2 & reboot again

- zpool scrub comes up clean

- Run things like find / or locate.updatedb & get an instant panic

So if I don't access the ZFS partitions, life's good. But anything
that touches the filesystem & we are down for the count.


> Are you using the installer packages from www.maczfs.org (or the
> code.google.com equivalent), or are you using a self-compiled version?
>
> Did you have a working MacZFS installation before, or is this your first
> try? If you had a working one, which installer package was it, or was
> it compiled from source?
>
> Have you installed other software (besides the mentioned XCode 4.4)
> before the kernel panics started?
>
>
>> pool scrub comes clean under maczfs, I'll do a smartos or omnios tomorrow.
>
> How can you scrub, if you get a panic every 2-3 minutes as you wrote
> above?

By using an admin-only account with all my usual zsh and git goodies disabled.

>> grep of all KPs: https://friendpaste.com/5gFQw9wPBFOgNjue6495YW
>> panic-decode didn't, nor did gdb reveal anything more interesting than
>> "add %al,(%eax)" to me.
>>
>> ## kit
>>
>> OSX: Software Mac OS X Lion 10.7.4 (11E53),
>> HW is MBP Feb 2011 15" screen.
>> uname: Darwin akai.local 11.4.0 Darwin Kernel Version 11.4.0: Mon Apr
>> 9 19:32:15 PDT 2012; root:xnu-1699.26.8~1/RELEASE_X86_64 x86_64
>>
>> ## zpool list
>> pool: tub
>> state: ONLINE
>> scrub: none requested
>> config:
>>
>> NAME STATE READ WRITE CKSUM
>> tub ONLINE 0 0 0
>> disk0s4 ONLINE 0 0 0
>>
>> errors: No known data errors
>
> I think you mixed "zpool status" and "zpool list" above. Can you also
> provide "zpool list" output, for completeness?

# zpool list

NAME   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
tub     84G  61.2G  22.8G  72%  ONLINE  -

Will do after scrub from opensolaris based OS.

> Alternatively, you can send me two or three such dumps by mail.
>
>> Any tips on getting back to usable again?
>
> As said, one possible cause is a drive which produces IO errors.
> MacZFS will panic in case of drive instabilities. Make sure your
> drive is securely connected and powered. Besides that guess,
> unfortunately none yet.
>

If there are IO errors, how can I check for them, or how do they
manifest themselves?
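On how I/O errors show up: ZFS keeps per-device READ, WRITE and CKSUM counters, and `zpool status -v` prints them in the config table quoted earlier in this thread. As a minimal sketch only (the helper name `zpool_error_rows` is made up for illustration, and it assumes the usual five-column NAME/STATE/READ/WRITE/CKSUM layout), this filter prints just the device rows with a non-zero counter:

```shell
#!/bin/sh
# Hypothetical helper: reads `zpool status` output on stdin and prints
# "device READ WRITE CKSUM" for every row with a non-zero error counter.
zpool_error_rows() {
    awk '
        # The column header marks the start of the device table.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # The "errors:" summary line marks the end of it.
        $1 == "errors:" { in_config = 0 }
        in_config && NF >= 5 && ($3 != 0 || $4 != 0 || $5 != 0) {
            print $1, $3, $4, $5
        }
    '
}
```

Usage would be `zpool status -v tub | zpool_error_rows`. For the status output posted above it prints nothing, since every counter is 0. The counters only cover errors that ZFS itself has observed and recorded, so an empty result does not prove the drive is healthy.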

If a specific file or block *is* the issue, how could I track it down?
I've got backups galore off system so I'm not too concerned atm, it's
just not being able to use/access files.
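One low-tech way to narrow this down, sketched under assumptions: walk the datasets one at a time and write a marker to a log on a non-ZFS volume before and after each walk, so that the entry left unfinished after a panic points at the suspect dataset. The function names and the log path are hypothetical, and `find` only exercises directory and metadata reads, much like the `find /` that triggered panics above.

```shell
#!/bin/sh
# Hypothetical helper: reads mountpoints on stdin and walks them one by one.
# $1 is a log file that must live on a non-ZFS volume so it survives a panic.
walk_one_by_one() {
    log=$1
    while IFS= read -r mp; do
        [ -d "$mp" ] || continue          # skips "-", "none", "legacy"
        printf 'START %s\n' "$mp" >> "$log"
        sync                              # flush the marker before touching ZFS
        find "$mp" -xdev > /dev/null 2>&1
        printf 'DONE %s\n' "$mp" >> "$log"
    done
}

# Prints every mountpoint that has a START marker without a matching DONE.
unfinished_walks() {
    awk '
        $1 == "START" { sub(/^START /, ""); pending[$0] = 1; next }
        $1 == "DONE"  { sub(/^DONE /, "");  delete pending[$0]; next }
        END { for (mp in pending) print mp }
    ' "$1"
}
```

A run might look like `zfs list -H -t filesystem -o mountpoint | walk_one_by_one /var/tmp/zfs-walk.log`, followed after the reboot by `unfinished_walks /var/tmp/zfs-walk.log`. This assumes /var/tmp sits on the HFS+ boot volume rather than on the pool.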

A+
Dave

Jason

Sep 10, 2012, 7:20:42 AM
to zfs-...@googlegroups.com
That really sounds more like a drive issue, as I've done all that recently on a new SSD without issue. I'd get another SSD and move to it. Then make sure the firmware on the old one is up to date and nuke it, then try everything again.

Jason
Sent from my iPad


Alex Bowden

Sep 10, 2012, 7:28:51 AM
to zfs-...@googlegroups.com

So what if it's a drive issue?

If he's not booted off it, and not paging to it, then failure of a single-disk zfs filesystem may clearly cause loss of data (though hopefully not if it's a power glitch), but it should not cause a kernel panic. That is an absurd bug to pass off as if it were some sort of obviously absurd user expectation or error.

And for what it's worth that does not cause ZFS on Solaris to panic.

Alex

Alex Blewitt

Sep 10, 2012, 4:06:16 PM
to zfs-...@googlegroups.com
On 10 Sep 2012, at 12:28, Alex Bowden wrote:

> So what if it's a drive issue?
>
> If he's not booted off it, and not paging to it, then failure of a single-disk zfs filesystem may clearly cause loss of data (though hopefully not if it's a power glitch), but it should not cause a kernel panic. That is an absurd bug to pass off as if it were some sort of obviously absurd user expectation or error.
>
> And for what it's worth that does not cause ZFS on Solaris to panic.

The original Solaris implementation used to panic as well. It wasn't until build onnv_120 that the 'failmode' property was introduced, which included the option to not panic upon the sign of failure (and instead, say, switch the ZFS pool to read-only).

I look forward to your patch which takes us from onnv_74 to onnv_120. I'm sure it will be greatly received by everyone!

Thanks,

Alex

Alex Bowden

Sep 10, 2012, 5:15:08 PM
to zfs-...@googlegroups.com

You don't need onnv_120, you just need CR #6322646 from about October 2007, about 2 years before onnv_120

Personally I'm perfectly happy running an up to date ZFS under Solaris under VMware Fusion under MacOS 10.8 where I import the file system back.

I just look in every couple of years to see if zfs-macos shows any sign of ever making progress.

Perhaps I should leave it longer this time?

Alex

Alex Bowden

Sep 10, 2012, 5:24:47 PM
to zfs-...@googlegroups.com

Alex Blewitt

Sep 10, 2012, 5:32:12 PM
to zfs-...@googlegroups.com, zfs-...@googlegroups.com
I'm sorry, I was unable to apply your patch as it was formatted in "random linked e-mail blog post" format. Please feel free to submit a tested pull request against the GitHub repository since you clearly know what the fix should be.

Alex

Sent from my iPhone 4S

Richard Elling

Sep 10, 2012, 6:59:10 PM
to zfs-...@googlegroups.com
On Sep 10, 2012, at 1:06 PM, Alex Blewitt <alex.b...@gmail.com> wrote:
> The original Solaris implementation used to panic as well. It wasn't until build onnv_120 that the 'failmode' property was introduced, which included the option to not panic upon the sign of failure (and instead, say, switch the ZFS pool to read-only).

I can categorically say that switching a file system to read-only is an extraordinarily bad
thing to do, and ZFS does not do that. The failmode property exists to allow you to make
a reasonably decent choice and there are good case studies for each option.
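For reference, the three values that the `failmode` pool property takes in the zpool documentation can be summarised as below. This is only a lookup sketch: `describe_failmode` is a made-up helper name, the wording paraphrases the man page, and going by this thread the MacZFS 74.x builds apparently predate the property.

```shell
#!/bin/sh
# Hypothetical helper: summarises the documented `failmode` values.
describe_failmode() {
    case "$1" in
        wait)     echo "block all I/O until the device returns and errors are cleared (default)" ;;
        continue) echo "return EIO to new write requests; reads from healthy devices still work" ;;
        panic)    echo "print a console message and generate a system crash dump" ;;
        *)        echo "unknown failmode: $1" >&2; return 1 ;;
    esac
}

describe_failmode continue
```

Where the property exists, `zpool get failmode tub` shows the current setting and `zpool set failmode=continue tub` changes it.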
 -- richard

--
illumos Day & ZFS Day, Oct 1-2, 2012 San Francisco

Alex Bowden

Sep 10, 2012, 7:12:18 PM
to zfs-...@googlegroups.com
True, more realistically it allows a continue option that returns an IO error on non-system-critical IO rather than panicking on system-critical IO.

I think Alex just got the wrong end of the concept. 

Alex


Dave Cottlehuber

Sep 11, 2012, 2:27:02 AM
to zfs-...@googlegroups.com
On 10 September 2012 13:20, Jason <jason...@belecmartin.com> wrote:
> That really sounds more like a drive issue, as I've done all that recently on a new SSD without issue. I'd get another SSD and move to it. Then make sure the firmware on the old one is up to date and nuke it, then try everything again.
>
> Jason

Scrub in freebsd was fine, I couldn't get omniOS or SmartOS to boot on
my mac tho. XCode installs as expected just fine on my other mac with
tin disk.

Other than the "it's probably not ZFS but the drive", is there any
actual way to identify what's failing? In particular I'm interested
to learn why a zpool scrub should succeed but not other operations.
Does the whole disk not get read during scrub?

Given I've got a full set of snaps on a different host, would it make
sense to zap this partition & then try to restore it?

I'm a little reluctant to shell out 300+ euros for another SSD without
something a little more concrete to go on :-)

A+
Dave

Jason

Sep 11, 2012, 6:51:00 AM
to zfs-...@googlegroups.com
If you have a backup, yes, zap and restore to see if things are rosy. Once zapped, check for firmware updates for the SSD and apply, reformat, restore and see if you're good. Some SSDs have issues, but almost all can be and have been fixed with firmware. It just depends on what really might be wrong, and the SSD may actually need to be RMA'd.

Jason
Sent from my iPad


Graham Perrin

Nov 19, 2012, 10:57:25 PM
to zfs-...@googlegroups.com
On Tuesday, 11 September 2012 07:27:03 UTC+1, dch wrote:

> … Does the whole disk not get read during scrub? …

A complete scrub examines all data that can be checksummed, excluding parts of the disk where no such data is stored. 
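Put in numbers for the pool in this thread, as a rough sketch: `zpool list` reported 84G total and 61.2G used, so a scrub walks roughly the 61.2G of allocated blocks and never reads the rest. The helper name below is illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: space a scrub would not read, i.e. pool size minus
# allocated space. Both arguments must be in the same unit.
unscrubbed_space() {
    awk -v size="$1" -v used="$2" 'BEGIN { printf "%.1f\n", size - used }'
}

# Figures (in G) taken from the zpool list output earlier in the thread.
unscrubbed_space 84 61.2
```

The result, 22.8, matches the AVAIL column of that `zpool list` output, so about a quarter of the partition is outside anything a scrub can vouch for.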