A box of mine running RELENG_7_0 and ZFS over a couple of disks (6
disks, 3 mirrors) seems to have gotten stuck. From Ctrl-T:
load: 0.50 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.43 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.10 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.10 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.11 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
Ctrl-T worked for a while, then that stopped working too (this was over
ssh). When trying a local login I only got:
load: 0.09 cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k
I found one post like this earlier (by Xin LI), but nobody seemed to
have replied...
In my current config, I think my kmem/kmem_max is at 512MB (not sure
though, since I edited my file yesterday for the next reboot), with 2G
of system RAM. Normally I'd run kmem(max) at 1G with an arcsize of 512M
(currently it is at the default), but since I only just got back to 2G
total mem after some hardware problems, I've been running at those low
values (1G total is kind of tight with ZFS).
Well, just wanted to report. The box is not totally dead yet, i.e. I
can still do Ctrl-T on the console, but that's it. I don't really know
what more I can do; I don't have KDB/DDB.
I'll wait another hour or so before I hard reboot it, unless it
"unlocks" or anyone has any suggestions.
Thanks
--
Johan Ström
Stromnet
jo...@stromnet.se
http://www.stromnet.se/
_______________________________________________
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stabl...@freebsd.org"
I don't think there are any suggestions left to give. Many people,
including myself, have experienced this kind of problem. It's well-
documented both on my Common Issues page, and the official FreeBSD ZFS
Wiki.
ZFS is still considered highly experimental, so if your data is at all
important to you, perform backups or switch to another filesystem
provider.
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |
The key is to increase your kmem and prevent it from being exhausted. I
think more recent OpenSolaris's ZFS code has some improvements but I do
not have spare devices at hand to test and debug :(
Maybe pjd@ would get a new import at some point? I have cc'ed him.
Cheers,
--
Xin LI <del...@delphij.net> http://www.delphij.net/
FreeBSD - The Power to Serve!
Ah, I guess I was just too restrictive with the googling on
"zfs:&buf_hash_table.ht_locks[i].ht_lock".
>
>
> ZFS is still considered highly experimental, so if your data is at all
> important to you, perform backups or switch to another filesystem
> provider.
That I am aware of.
Thanks.
This situation is not recoverable and you can trust ZFS that you will
not lose data if they are already sync'ed.
Yep, I never had the problem when I was running with 2G total mem, but
then one stick (damn consumer crap) failed and I was left with 1G, and
I started to have random problems. Going to tune kmem back up now that
I've got more mem again; thinking about putting in 4G too.
>
>
> Maybe pjd@ would get a new import at some point? I have cc'ed him.
>
> Cheers,
> --
> Xin LI <del...@delphij.net> http://www.delphij.net/
> FreeBSD - The Power to Serve!
>
> For your question: just reboot would be fine, you may want to tune
> your arc size (to be smaller) and kmem space (to be larger), which
> would reduce the chance that this would happen, or eliminate it,
> depending on your workload.
Back online now, with kmem/kmem_max to 1G and arcsize to 512M. Are
those reasonable on a 2G machine? I think I've read that from
somewhere, but cannot find that (arc at least) in the TuningGuide now.
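For reference, a sketch of what those settings would look like in
/boot/loader.conf. vm.kmem_size and vm.kmem_size_max are the tunables
that also appear in the loader.conf posted elsewhere in this thread;
vfs.zfs.arc_max is assumed here to be the tunable behind "arcsize".
The values are simply the ones mentioned above, not a recommendation:

```conf
# /boot/loader.conf (sketch; takes effect on next boot)
vm.kmem_size="1024M"
vm.kmem_size_max="1024M"
# Assumption: vfs.zfs.arc_max is the ARC size limit ("arcsize")
vfs.zfs.arc_max="512M"
```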
>
> This situation is not recoverable and you can trust ZFS that you
> will not lose data if they are already sync'ed.
>
Actually, I've had a lot of hard crashes lately on this machine (bad
hw), but not a single time have I lost data (to my knowledge at
least...). In that regard, compared to UFS, ZFS is waaay better! :)
> --
> Xin LI <del...@delphij.net> http://www.delphij.net/
> FreeBSD - The Power to Serve!
>
Depending on your workload you are just buying more time, so
"reasonable" is a matter of perspective. :( I didn't see whether you said
you are on 32-bit or 64-bit. Keep in mind the kmem max is 1.5-2G on amd64
regardless of how much memory you have. If a 512M arcsize crashes too soon
for your tastes you can always lower it to 256M, or 128M, etc.
I tried for several weeks to get ZFS stable on a 64bit system with a
1.5G kernel. The best uptime I ever got was 72 hours, the worst was 2,
the average about 24. Interestingly, most of the hangs were at off
hours, when the system was lightly loaded, had lots of free memory, etc.
That suggests to me a slow leak of some sort.
Anyway, ZFS is not ready for production. Some people may get lucky, but
you can't count on it.
Spike
----- Original message ----
From: Spike Ilacqua <sp...@indra.com>
To: Ender <en...@enderzone.com>
CC: freeb...@freebsd.org; freebsd...@freebsd.org; Johan Ström <jo...@headweb.com>
Sent: Tuesday, April 8, 2008 18:13:32
Subject: Re: ZFS deadlock
Info on my system below, if anyone's interested.
Vince
(20:12:28 </usr/home/jhary>) 0 $ more /boot/loader.conf
geom_mirror_load=YES
vm.kmem_size="768M"
vm.kmem_size_max="768M"
snd_emu10k1_load=YES
jhary@crab
(20:12:39 </usr/home/jhary>) 0 $ uptime
8:12PM up 13 days, 19:16, 5 users, load averages: 1.21, 0.86, 0.44
jhary@crab
(20:12:50 </usr/home/jhary>) 0 $ zfs list
NAME       USED  AVAIL  REFER  MOUNTPOINT
data       164G  64.8G    18K  /data
data/usr   163G  64.8G   163G  /usr
data/var   306M  64.8G   306M  /var
jhary@crab
(20:13:00 </usr/home/jhary>) 0 $ zpool status
  pool: data
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad6s2   ONLINE       0     0     0
            ad4s2   ONLINE       0     0     0

errors: No known data errors
Relevant bits from dmesg:
CPU: AMD Opteron(tm) Processor 242 (1594.18-MHz K8-class CPU)
Origin = "AuthenticAMD" Id = 0xf5a Stepping = 10
Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
AMD Features=0xe0500800<SYSCALL,NX,MMX+,LM,3DNow!+,3DNow!>
usable memory = 3210489856 (3061 MB)
avail memory = 3103461376 (2959 MB)
ACPI APIC Table: <A M I OEMAPIC >
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
cpu0 (BSP): APIC ID: 0
cpu1 (AP): APIC ID: 1