A box of mine running RELENG_7_0 and ZFS over a couple of disks (6
disks, 3 mirrors) seems to have gotten stuck. From Ctrl-T:
load: 0.50 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.43 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.10 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.10 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.11 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
That worked for a while, then it stopped working too (this was over
ssh). When trying a local login I only got:
load: 0.09 cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k
I found one post like this earlier (by Xin LI), but nobody seemed to
have replied...
In my current conf, I think my kmem/kmem_max is at 512MB (not sure
though, since I edited my file yesterday for the next reboot), with 2G
of system RAM. Normally I'd run kmem(max) at 1G (with arcsize of 512M;
currently it is at default), but since I just got back to 2G total mem
after some hardware problems I've been running at those lows (1G total
is kind of tight with ZFS).
Well, just wanted to report... The box is not totally dead yet, i.e. I
can still do Ctrl-T on the console, but that's it. I don't really know
what more I can do, since I don't have KDB/DDB.
I'll wait another hour or so before I hard reboot it, unless it
"unlocks" or if anyone has any suggestions.
Thanks
--
Johan Ström
Stromnet
jo...@stromnet.se
http://www.stromnet.se/
_______________________________________________
freeb...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to "freebsd-fs-...@freebsd.org"
I don't think there are any suggestions left to give. Many people,
including myself, have experienced this kind of problem. It's well-
documented both on my Common Issues page, and the official FreeBSD ZFS
Wiki.
ZFS is still considered highly experimental, so if your data is at all
important to you, perform backups or switch to another filesystem
provider.
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |
Johan Ström wrote:
> I'll wait another hour or so before I hard reboot it, unless it
> "unlocks" or if anyone has any suggestions.
The key is to increase your kmem and prevent it from being exhausted. I
think the more recent OpenSolaris ZFS code has some improvements, but I
do not have spare devices at hand to test and debug :(
Maybe pjd@ would get a new import at some point? I have cc'ed him.
Cheers,
--
Xin LI <del...@delphij.net> http://www.delphij.net/
FreeBSD - The Power to Serve!
> On Tue, Apr 08, 2008 at 08:17:38AM +0200, Johan Ström wrote:
>> I'll wait another hour or so before I hard reboot it, unless it
>> "unlocks" or if anyone has any suggestions.
>
> I don't think there are any suggestions left to give. Many people,
> including myself, have experienced this kind of problem. It's well-
> documented both on my Common Issues page, and the official FreeBSD ZFS
> Wiki.
Ah.. I guess I was just too restrictive with the googling on
"zfs:&buf_hash_table.ht_locks[i].ht_lock".
>
>
> ZFS is still considered highly experimental, so if your data is at all
> important to you, perform backups or switch to another filesystem
> provider.
That I am aware of.
Thanks.
For your question: just rebooting would be fine. You may want to tune
your ARC size (to be smaller) and kmem space (to be larger), which would
reduce the chance of this happening, or eliminate it, depending on
your workload.
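As a rough illustration of that tuning, a /boot/loader.conf fragment
could look like the sketch below. The tunable names are the standard
FreeBSD ones; the values are only an assumption, taken from the numbers
discussed in this thread for a 2G machine (kmem at 1G, ARC capped at
512M), and are not a recommendation for any particular workload.

```
# /boot/loader.conf (illustrative values only, assumed for a 2G machine)

# Enlarge the kernel memory map so ZFS is less likely to exhaust it.
vm.kmem_size="1024M"
vm.kmem_size_max="1024M"

# Cap the ZFS ARC well below the kmem size; lower this further
# (256M, 128M, ...) if the machine still hangs.
vfs.zfs.arc_max="512M"
```

The settings take effect at the next boot; afterwards
"sysctl vm.kmem_size vfs.zfs.arc_max" shows what the kernel actually
picked up.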
This situation is not recoverable, but you can trust ZFS not to lose
data that has already been sync'ed.
--
Xin LI <del...@delphij.net> http://www.delphij.net/
FreeBSD - The Power to Serve!
> Johan Ström wrote:
>> I'll wait another hour or so before I hard reboot it, unless it
>> "unlocks" or if anyone has any suggestions.
>
> The key is to increase your kmem and prevent it from being
> exhausted. I think more recent OpenSolaris's ZFS code has some
> improvements but I do not have spare devices at hand to test and
> debug :(
Yep, I never had the problem when I was running with 2G total mem, but
then one stick (damn consumer crap) failed and I was left with 1G, and
I started to have random problems. Going to tune kmem back up now that
I have more mem again; thinking about putting in 4G too.
>
>
> Maybe pjd@ would get a new import at some point? I have cc'ed him.
> For your question: just reboot would be fine, you may want to tune
> your arc size (to be smaller) and kmem space (to be larger), which
> would reduce the chance that this would happen, or eliminate it,
> depending on your workload.
Back online now, with kmem/kmem_max to 1G and arcsize to 512M. Are
those reasonable on a 2G machine? I think I've read that from
somewhere, but cannot find that (arc at least) in the TuningGuide now.
>
> This situation is not recoverable and you can trust ZFS that you
> will not lose data if they are already sync'ed.
>
Actually, I've had a lot of hard crashes lately on this machine (bad
hw), but I have not lost data a single time (to my knowledge at
least...). In that regard, compared to UFS, ZFS is waaay better! :)
Depending on your work load you are just buying more time, so
"reasonable" is a matter of perspective. :( I didn't see if you said
you are on 32bit or 64bit? Keep in mind the kmem max is 1.5-2G on amd64
regardless of how much memory you have. If 512M arcsize crashes too soon
for your tastes you can always lower it down to 256M, or 128M, etc.
I tried for several weeks to get ZFS stable on a 64bit system with a
1.5G kernel. The best uptime I ever got was 72 hours, the worst was 2,
the average about 24. Interestingly, most of the hangs were at off
hours, when the system was lightly loaded, had lots of free memory, etc.
That suggests to me a slow leak of some sort.
Anyway, ZFS is not ready for production. Some people may get lucky, but
you can't count on it.
Spike