Debug kernel panics

Alex Wasserman

Jul 27, 2012, 11:19:51 PM
to zfs-...@googlegroups.com
So, since building out a new zpool I'm getting frequent kernel panics.

I've had a single 3TB drive acting as a single ZFS pool for a while and never had a KP.

Now I have that drive and another 3TB drive, plus a pair of 2TB drives, with both pairs mirrored for a total of 5TB (3/3 + 2/2).

I created the new pool with ashift=12, as all the drives are 4K-sector drives.

I see in Disk Utility that the 3TB drives have a positive Seek Error Rate. The 2TB drives are fine. I never had a problem with the existing 3TB drive before; it's been running for a couple of months.

Do I need to RMA both drives?

A scrub shows no errors.


So, how can I find out what's causing the kernel panics?

Once after a restart I logged into the console (>console) at the login prompt. That shows a ZFS error 16 above the usual textual login prompt. What's that?


Jason

Jul 28, 2012, 7:48:15 AM
to zfs-...@googlegroups.com
Check your power and connections, the usual culprit, then check again.

Jason
Sent from my iPad

GREGG WONDERLY

Jul 28, 2012, 10:18:29 AM
to zfs-...@googlegroups.com
I've been having more and more problems with new drives on my Solaris ZFS installations. I've started using my external USB SATA dock to hook the drives up to a Windows computer and check them with SeaTools for Seagate drives. I've also taken to running a full format on drives that are acting up, to see if I can correct some sector errors by getting the bad sectors mapped out. I've had mixed results, and have RMA'd about 5 drives of different ages (some older for sure) in the past week. I've RMA'd one new Seagate 2TB drive, and I received a replacement 1.5TB WD drive which appears not to be working.

Gregg Wonderly

Jason

Jul 28, 2012, 11:56:19 AM
to zfs-...@googlegroups.com
Did you zero the drives before use? I've had issues in the past, but almost all drives work, just not out of the box. Almost all issues are power related and lie with one's setup, not the drive. I'm not defending drive manufacturers, just passing on experience. I've had drives not even show up on the SATA connectors, but appear in another case connected by FireWire 800 or USB, which tells me where to start. I zero these ones twice, reformat, and try again as originally planned. That usually works, assuming you're not having power issues. Hopefully this helps some people. If you are sure of the power and the drive is a dud, by all means send it back. The replacement may have other issues though, as most manufacturers are going to send you an already reconditioned drive. ;)

Jason
Sent from my iPad

Alex Wasserman

Jul 28, 2012, 12:44:43 PM
to zfs-...@googlegroups.com
I did double-check the power and SATA connections. I pulled them all out and plugged them back in.

SeaTools says the disks are fine, although I'm still seeing a high error rate.

I'm just not sure what is actually causing the KPs. If I could find that out, it would help a lot.

As for installation, I just ran through the getting started guide: formatted the drives and added them to the pool. They're mirrored, so I can always pull a drive out, format it and zero it, then re-add it. It just takes a while, as the pool has about 2TB of data on it, so any scrub, resilver, migration, etc. takes hours.

Alex Wasserman

Jul 28, 2012, 12:47:30 PM
to zfs-...@googlegroups.com
Also, what's the best format/zero procedure? Just use Disk Utility to format with security set to a 3-pass zero?

GREGG WONDERLY

Jul 28, 2012, 1:28:40 PM
to zfs-...@googlegroups.com
The early versions of the ZFS error handling would panic anytime that corrupted data was detected, instead of returning data with errors, in certain situations. If you unplug one drive at a time and make no changes to the file system while it is out, ZFS should resilver only the changed data when you plug it back in. That might let you see, drive by drive, which one is actually causing the panic.

The high error rate might be the indicator you need to look at.

Gregg

Jason

Jul 28, 2012, 1:35:55 PM
to zfs-...@googlegroups.com
Gregg is correct, this is your next step. 

Jason
Sent from my iPhone

Jason

Jul 28, 2012, 1:37:44 PM
to zfs-...@googlegroups.com
Yup that should do it. The drive should swap its back up bits for any bad ones encountered. 


Jason
Sent from my iPhone

Alex Wasserman

Jul 28, 2012, 10:57:09 PM
to zfs-...@googlegroups.com
> The drive should swap its back up bits for any bad ones encountered.

I ran the full SeaTools repair test today, which is supposed to test block by block and reallocate as necessary. Both drives passed, so I assume I'm good here.

Jason

Jul 29, 2012, 9:07:54 AM
to zfs-...@googlegroups.com
Well, in absolutely every single case I have had thus far, about 10, it has been due to power issues: a bad cable, interference on the cables (SATA cables are not shielded and should not touch), a bad power supply, bad power coming into the supply, surges, brownouts, etc. I've been lucky in narrowing it down in my mad science lab and with clients.

Have you tried each drive in a different enclosure?

If you're still working on the same pool, the error may already have been introduced and you will have to revert to an earlier snapshot, not sure how far back. You can try copying the latest stuff written to the drive off to another; sooner or later it will fail, and this will tell you where things most likely went wrong.

If you're just starting, destroy the pool, create it again and test.


Jason
Sent from my iPad

Björn Kahl

Jul 30, 2012, 12:58:49 PM
to zfs-...@googlegroups.com
On 28.07.12 19:28, GREGG WONDERLY wrote:
> The early versions of the ZFS error handling would panic anytime that corrupted data was detected, instead of returning data with errors, in certain situations.


While I am quite sure in this case Gregg is right, I'd like to add
some general advice on debugging and reporting Kernel Panics. Sorry
for kind of hijacking the thread, but the title is just too good.

Every problem report can help to improve mac-ZFS.

However, just saying "my box panicked while..." is of limited use. To
find out what went wrong, we need more information. A Kernel Panic
report should contain at least:

- the type of Mac you are using

- the MacOSX you are using. Not just the code name like "Lion", but
the release, like "Lion 10.7.4"

- the architecture: PPC, i386 or x86_64 (Check Activity Monitor,
process list, then locate "kernel_task", it should say something
like "Intel", "Intel (64-Bit)" or "PPC")

- amount of memory in the box

- the version of mac-ZFS installed

- output of zpool status -v, if you can obtain this without new panic

- output of zfs list, if you can obtain this without new panic

- what you tried to do while the panic happened
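
Most of the checklist above could be collected by a small script. This is only a minimal sketch in Python 3, assuming `sysctl`, `sw_vers` and `uname` are on the PATH and that the mac-ZFS `zpool` and `zfs` tools are installed. The names `COMMANDS`, `run` and `collect_report` are made up for illustration. The mac-ZFS version and what you were doing at the time still have to be added by hand.

```python
import subprocess

# Label -> command. sysctl, sw_vers and uname ship with Mac OS X;
# zpool and zfs are assumed to come from the mac-ZFS install.
COMMANDS = [
    ("Mac model", ["sysctl", "-n", "hw.model"]),
    ("OS release", ["sw_vers", "-productVersion"]),
    ("Architecture", ["uname", "-m"]),
    ("Memory (bytes)", ["sysctl", "-n", "hw.memsize"]),
    ("zpool status -v", ["zpool", "status", "-v"]),
    ("zfs list", ["zfs", "list"]),
]


def run(cmd, timeout=30):
    """Run cmd and return its output, or a placeholder if it cannot be run."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "(unavailable: %s)" % exc
    # Fall back to stderr so that error messages end up in the report too.
    output = (result.stdout or result.stderr).strip()
    return output or "(no output)"


def collect_report(commands=COMMANDS):
    """Build a plain-text report with one section per command."""
    sections = []
    for label, cmd in commands:
        sections.append("== %s ==\n%s" % (label, run(cmd)))
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(collect_report())
```

If running `zpool status` or `zfs list` itself triggers a new panic, drop those two entries from `COMMANDS` and say so in the report.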


Various MacOSX versions produce different kinds of panic reports. Run
"Console.app" and look for things like "CrashReport", "DiagnosticReport"
or similar. If you find a file matching the time of your Kernel
Panic, attach it. (You may want to check and clean the content first
using your favorite text editor.) These crash reports have all the
information needed, except the output of zpool and zfs and what you
were doing, from the list above.
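
If Console.app shows nothing, the report files can also be searched for on disk. A rough sketch in Python 3 follows. The directories in `REPORT_DIRS` are an assumption (the location differs between MacOSX releases, so adjust them for your system), and `find_reports` is a made-up helper name.

```python
import os
import time

# Assumed locations of crash and panic reports; adjust for your OS release.
REPORT_DIRS = [
    "/Library/Logs/DiagnosticReports",
    "/Library/Logs/PanicReporter",
    os.path.expanduser("~/Library/Logs/DiagnosticReports"),
]


def find_reports(panic_time, window=3600, dirs=REPORT_DIRS):
    """Return sorted (mtime, path) pairs for files modified within
    `window` seconds of `panic_time` (seconds since the epoch)."""
    hits = []
    for directory in dirs:
        if not os.path.isdir(directory):
            continue  # this location does not exist on this system
        for name in os.listdir(directory):
            path = os.path.join(directory, name)
            if not os.path.isfile(path):
                continue
            mtime = os.path.getmtime(path)
            if abs(mtime - panic_time) <= window:
                hits.append((mtime, path))
    return sorted(hits)


if __name__ == "__main__":
    # Example: list everything written in the last 24 hours.
    now = time.time()
    for mtime, path in find_reports(now, window=24 * 3600):
        print(time.ctime(mtime), path)
```

The report may only be written once the machine has come back up, so a generous window around the time of the panic is safer than an exact match.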


These crash reports look like this:
(This one is from Ten's Complement's ZFS implementation (yes, that one
panics too, and actually more often than mac-ZFS).)

-----------------
Interval Since Last Panic Report: 10794501 sec
Panics Since Last Report: 12
Anonymous UUID: 036BFAD2-6CAF-415E-AD5B-6481662E4F9D

Tue Feb 21 01:07:27 2012
panic(cpu 1 caller 0x1ad754a):
"/Volumes/depot/repo/z410/src/uts/common/fs/zfs/arc.c:3270 Z-410
assertion failed:
BP_GET_DEDUP(zio->io_bp)"@/Volumes/depot/repo/z410/src/uts/darwin/os/printf.c:42
Backtrace (CPU 1), Frame : Return Address (4 potential args on stack)
0x83ec3b88 : 0x21b837 (0x5dd7fc 0x83ec3bbc 0x223ce1 0x0)
0x83ec3bd8 : 0x1ad754a (0x1baffb8 0x1bb1158 0xcc6 0x1bb1d68)
0x83ec3bf8 : 0x1ae4fca (0x1bb1d68 0x1bb1158 0xcc6 0x1ad3b91)
0x83ec3c68 : 0x1b98faa (0xb8cec7e8 0xc00 0x0 0x21)
...
-----------------
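
A report in this format can be picked apart mechanically, for example to pull out the panic message and the return addresses from the backtrace. A minimal sketch in Python 3, assuming the layout shown above (a `panic(` line followed by the message, then a `Backtrace` header and `frame : return address (args)` lines). `parse_panic` is a hypothetical helper, not part of any existing tool, and `SAMPLE` is a shortened copy of the report above.

```python
import re

# One backtrace line, e.g. "0x83ec3b88 : 0x21b837 (0x5dd7fc ...)"
FRAME_RE = re.compile(r"^(0x[0-9a-fA-F]+)\s*:\s*(0x[0-9a-fA-F]+)")

SAMPLE = """Tue Feb 21 01:07:27 2012
panic(cpu 1 caller 0x1ad754a):
"/Volumes/depot/repo/z410/src/uts/common/fs/zfs/arc.c:3270 Z-410
assertion failed:
BP_GET_DEDUP(zio->io_bp)"@/Volumes/depot/repo/z410/src/uts/darwin/os/printf.c:42
Backtrace (CPU 1), Frame : Return Address (4 potential args on stack)
0x83ec3b88 : 0x21b837 (0x5dd7fc 0x83ec3bbc 0x223ce1 0x0)
0x83ec3bd8 : 0x1ad754a (0x1baffb8 0x1bb1158 0xcc6 0x1bb1d68)
"""


def parse_panic(text):
    """Return (message, [(frame, return_address), ...]) from a panic report."""
    message_lines = []
    frames = []
    in_message = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("panic("):
            # The message starts here and may wrap over several lines.
            in_message = True
            message_lines.append(line)
        elif line.startswith("Backtrace"):
            in_message = False
        elif in_message:
            message_lines.append(line)
        else:
            match = FRAME_RE.match(line)
            if match:
                frames.append((match.group(1), match.group(2)))
    return " ".join(message_lines), frames


if __name__ == "__main__":
    message, frames = parse_panic(SAMPLE)
    print(message)
    for frame, ret in frames:
        print(frame, "->", ret)
```

The file name and line number in the message (here `arc.c:3270`) and the first few return addresses are usually the most useful parts to quote in a report.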


Best

Björn
--
| Bjoern Kahl +++ Siegburg +++ Germany |
| "googlelogin@-my-domain-" +++ www.bjoern-kahl.de |
| Languages: German, English, Ancient Latin (a bit :-)) |


pub

Jul 30, 2012, 2:42:38 PM
to zfs-...@googlegroups.com
When my machine comes back up after a panic, it offers to "report the crash to apple," with the crash report you mentioned below. I never signed up for a developer account, so I have no idea what Apple does with the reports for third-party software. Does anyone developing MacZFS get those reports? If not, would you like us to email or post them somewhere for later debugging? It might be a good idea to segment them into a separate channel, rather than have them all go to this list.

Björn Kahl

Jul 30, 2012, 5:42:18 PM
to zfs-...@googlegroups.com
On 30.07.12 20:42, pub wrote:
> When my machine comes back up after a panic, it offers to "report the crash to apple," with the crash report you mentioned below. I never signed up for a developer account, so I have no idea what Apple does with the reports for third-party software. Does anyone developing MacZFS get those reports? If not, would you like us to email or post them somewhere for later debugging? It might be a good idea to segment them into a separate channel, rather than have them all go to this list.


I'd suspect Apple simply throws these reports away.

So my usual procedure is to "Select All" & "Copy" and then save to a
file for later reference. (Note, in my case, the menu items for
"select all" and "copy" are always grayed out, but luckily the
shortcuts Cmd-A and Cmd-C work nevertheless.)

Regarding where to send these reports, I would think the best idea is
to simply attach the crash report to the mail to this list, if you are
going to report the crash by mail anyway.

But maybe that is something we should run a poll for, to find out what
the list members' preference is.

Of course, if you are going to open an issue report in our bug tracker
at http://code.google.com/p/maczfs/issues/list , then I suggest the
crash report belongs there, and not on the list.


Björn

Alex Wasserman

Jul 30, 2012, 10:58:56 PM
to zfs-...@googlegroups.com
So, the correct thing here is to document the procedure the devs feel would most help with reporting issues.

At the moment there is no template or other description of what information should be supplied. Provide one, and people can file accurate and helpful bug reports.

> - the type of Mac you are using

Hackintosh Mac Pro (Core i7)

> - the MacOSX you are using. Not just the code name like "Lion", but
>   the release, like "Lion 10.7.4"

Lion 10.7.4

> - the architecture: PPC, i386 or x86_64 (Check Activity Monitor,
>   process list, then locate "kernel_task", it should say something
>   like "Intel", "Intel (64-Bit)" or "PPC")

Intel 64-bit

> - amount of memory in the box

12 GB RAM

> - the version of mac-ZFS installed

Current production. 74.

> - output of zpool status -v, if you can obtain this without new panic

I can't.

> - output of zfs list, if you can obtain this without new panic

Last login: Mon Jul 30 02:23:45 from crack.home
alex@smiley ~ $ zfs list                                      [ruby-1.9.3-p194]
NAME         USED  AVAIL  REFER  MOUNTPOINT
Data        1.95T  2.53T  1.93T  /Volumes/Data
Data/Users  19.0G  2.53T  19.0G  /Volumes/Data/Users

> - what you tried to do while the panic happened

Copying files, or running zpool status.

Alex Wasserman

Jul 30, 2012, 10:59:29 PM
to zfs-...@googlegroups.com
I should also note that I checked Console and saw no crash dumps or kernel panic logs.

Daniel Bethe

Jul 31, 2012, 1:13:58 AM
to zfs-...@googlegroups.com
Whoa, how'd this happen?  I just noticed a new document which just magically sprang up after Bjoern wrote that detailed article to the mailing list:


It's a URL that you can paste to people to help them formulate a problem report for the mailing list.  Hopefully the process of gathering the information will be helpful too.

Alex Wasserman

Jul 31, 2012, 10:41:19 AM
to zfs-...@googlegroups.com, Daniel Bethe
Can you magically create other documents by writing up detailed articles?

Cheques? Lottery tickets? Cash?