ZFS deadlock

Johan Ström

Apr 8, 2008, 2:35:48 AM

Hello

A box of mine running RELENG_7_0 and ZFS over a couple of disks (6
disks, 3 mirrors) seems to have gotten stuck. From Ctrl-T:

load: 0.50 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.43 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.10 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.10 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.11 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k

Worked for a while, then that stopped working too (was over ssh). When
trying a local login I only got

load: 0.09 cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k
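
For anyone collecting several of these Ctrl-T status lines, a small parser can make it easier to see that a process stays parked on the same bracketed field (taken here to be the wait channel or lock name) without accumulating CPU time. This is only a sketch: the field layout is inferred from the lines above rather than from any specification, and the names `parse_siginfo_line` and `looks_stuck` are made up for illustration.

```python
import re

# Field layout inferred from the Ctrl-T lines above (an assumption, not a spec):
# load: <load> cmd: <name> <pid> [<wait channel>] <user>u <sys>s <cpu>% <rss>k
SIGINFO_RE = re.compile(
    r"^load:\s+(?P<load>[\d.]+)\s+cmd:\s+(?P<cmd>\S+)\s+(?P<pid>\d+)\s+"
    r"\[(?P<wchan>.*)\]\s+(?P<user>[\d.]+)u\s+(?P<sys>[\d.]+)s\s+"
    r"(?P<cpu>\d+)%\s+(?P<rss>\d+)k$"
)


def parse_siginfo_line(line):
    """Return the fields of one status line as a dict, or None if it does not match."""
    m = SIGINFO_RE.match(line.strip())
    if m is None:
        return None
    return {
        "load": float(m["load"]),
        "cmd": m["cmd"],
        "pid": int(m["pid"]),
        "wchan": m["wchan"],  # may itself contain brackets, e.g. ht_locks[i]
        "user": float(m["user"]),
        "sys": float(m["sys"]),
        "cpu": int(m["cpu"]),
        "rss_kb": int(m["rss"]),
    }


def looks_stuck(lines):
    """Heuristic: every sample shows the same pid, wait channel and CPU times."""
    samples = [parse_siginfo_line(line) for line in lines]
    if len(samples) < 2 or any(s is None for s in samples):
        return False
    keys = {(s["pid"], s["wchan"], s["user"], s["sys"]) for s in samples}
    return len(keys) == 1
```

Feeding the five zsh lines above to `looks_stuck` would flag them, since only the load average changes between samples. It is a heuristic and cannot tell a deadlock from a very long wait.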

I found one post like this earlier (by Xin LI), but nobody seemed to
have replied...
In my current conf, I think my kmem/kmem_max is at 512MB (not sure
though, since I edited my file yesterday for the next reboot), with 2G
of system RAM. Normally I'd run kmem(max) at 1G (with arcsize of 512M;
currently it is at default), but since I just got back to 2G total mem
after some hardware problems I've been running at those lows (1G total
is kind of tight with zfs..)

Well, just wanted to report... The box is not totally dead yet, i.e. I
can still do Ctrl-T on console, but that's it. I don't really know
what more I can do, so.. I don't have KDB/DDB.
I'll wait another hour or so before I hard reboot it, unless it
"unlocks" or if anyone has any suggestions.

Thanks

--
Johan Ström
Stromnet
jo...@stromnet.se
http://www.stromnet.se/


_______________________________________________
freeb...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to "freebsd-fs-...@freebsd.org"

Jeremy Chadwick

Apr 8, 2008, 3:34:56 AM

On Tue, Apr 08, 2008 at 08:17:38AM +0200, Johan Ström wrote:
> Hello
>
> A box of mine running RELENG_7_0 and ZFS over a couple of disks (6 disks, 3
> mirrors) seems to have gotten stuck. From Ctrl-T:
>
> load: 0.50 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
> load: 0.43 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
> load: 0.10 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
> load: 0.10 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
> load: 0.11 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
>
> Worked for a while, then that stopped working too (was over ssh). When
> trying a local login I only got
>
> load: 0.09 cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k
>
> I found one post like this earlier (by Xin LI), but nobody seemed to have
> replied...
> In my current conf, I think my kmem/kmem_max is at 512MB (not sure though,
> since I edited my file yesterday for the next reboot), with 2G of system
> RAM. Normally I'd run kmem(max) at 1G (with arcsize of 512M; currently it is
> at default), but since I just got back to 2G total mem after some hardware
> problems I've been running at those lows (1G total is kind of tight with
> zfs..)
>
> Well, just wanted to report... The box is not totally dead yet, i.e. I can
> still do Ctrl-T on console, but that's it. I don't really know what more I
> can do, so.. I don't have KDB/DDB.
> I'll wait another hour or so before I hard reboot it, unless it "unlocks"
> or if anyone has any suggestions.

I don't think there are any suggestions left to give. Many people,
including myself, have experienced this kind of problem. It's
well-documented both on my Common Issues page and on the official
FreeBSD ZFS Wiki.

ZFS is still considered highly experimental, so if your data is at all
important to you, perform backups or switch to another filesystem
provider.

--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |

LI Xin

Apr 8, 2008, 3:38:48 AM

Johan Ström wrote:
> Hello
>
> A box of mine running RELENG_7_0 and ZFS over a couple of disks (6
> disks, 3 mirrors) seems to have gotten stuck. From Ctrl-T:
>
> load: 0.50 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
> load: 0.43 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
> load: 0.10 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
> load: 0.10 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
> load: 0.11 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
>
> Worked for a while, then that stopped working too (was over ssh). When
> trying a local login I only got
>
> load: 0.09 cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k
>
> I found one post like this earlier (by Xin LI), but nobody seemed to
> have replied...
> In my current conf, I think my kmem/kmem_max is at 512MB (not sure
> though, since I edited my file yesterday for the next reboot), with 2G
> of system RAM. Normally I'd run kmem(max) at 1G (with arcsize of 512M;
> currently it is at default), but since I just got back to 2G total mem
> after some hardware problems I've been running at those lows (1G total
> is kind of tight with zfs..)
>
> Well, just wanted to report... The box is not totally dead yet, i.e. I
> can still do Ctrl-T on console, but that's it. I don't really know
> what more I can do, so.. I don't have KDB/DDB.
> I'll wait another hour or so before I hard reboot it, unless it
> "unlocks" or if anyone has any suggestions.

The key is to increase your kmem and prevent it from being exhausted. I
think more recent OpenSolaris's ZFS code has some improvements but I do
not have spare devices at hand to test and debug :(

Maybe pjd@ would get a new import at some point? I have cc'ed him.

Cheers,
--
Xin LI <del...@delphij.net> http://www.delphij.net/
FreeBSD - The Power to Serve!



Johan Ström

Apr 8, 2008, 3:41:16 AM

On Apr 8, 2008, at 9:32 AM, Jeremy Chadwick wrote:

> On Tue, Apr 08, 2008 at 08:17:38AM +0200, Johan Ström wrote:
>> Hello
>>
>> A box of mine running RELENG_7_0 and ZFS over a couple of disks (6
>> disks, 3 mirrors) seems to have gotten stuck. From Ctrl-T:
>>
>> load: 0.50 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
>> load: 0.43 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
>> load: 0.10 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
>> load: 0.10 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
>> load: 0.11 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
>>
>> Worked for a while, then that stopped working too (was over ssh). When
>> trying a local login I only got
>>
>> load: 0.09 cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k
>>
>> I found one post like this earlier (by Xin LI), but nobody seemed to
>> have replied...
>> In my current conf, I think my kmem/kmem_max is at 512MB (not sure
>> though, since I edited my file yesterday for the next reboot), with 2G
>> of system RAM. Normally I'd run kmem(max) at 1G (with arcsize of 512M;
>> currently it is at default), but since I just got back to 2G total mem
>> after some hardware problems I've been running at those lows (1G total
>> is kind of tight with zfs..)
>>
>> Well, just wanted to report... The box is not totally dead yet, i.e. I
>> can still do Ctrl-T on console, but that's it. I don't really know
>> what more I can do, so.. I don't have KDB/DDB.
>> I'll wait another hour or so before I hard reboot it, unless it
>> "unlocks" or if anyone has any suggestions.
>
> I don't think there are any suggestions left to give. Many people,
> including myself, have experienced this kind of problem. It's well-
> documented both on my Common Issues page, and the official FreeBSD ZFS
> Wiki.

Ah.. I guess I was just too restrictive with the googling on
"zfs:&buf_hash_table.ht_locks[i].ht_lock".


>
>
> ZFS is still considered highly experimental, so if your data is at all
> important to you, perform backups or switch to another filesystem
> provider.

That I am aware of.

Thanks.

LI Xin

Apr 8, 2008, 3:43:11 AM

For your question: just rebooting would be fine. You may want to tune your
ARC size (to be smaller) and kmem space (to be larger), which would
reduce the chance that this happens, or eliminate it, depending on
your workload.

This situation is not recoverable, but you can trust ZFS not to lose
data that has already been synced.

--
Xin LI <del...@delphij.net> http://www.delphij.net/
FreeBSD - The Power to Serve!



Johan Ström

Apr 8, 2008, 3:44:50 AM

On Apr 8, 2008, at 9:37 AM, LI Xin wrote:

> Johan Ström wrote:
>> Hello
>>
>> A box of mine running RELENG_7_0 and ZFS over a couple of disks (6
>> disks, 3 mirrors) seems to have gotten stuck. From Ctrl-T:
>>
>> load: 0.50 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
>> load: 0.43 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
>> load: 0.10 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
>> load: 0.10 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
>> load: 0.11 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
>>
>> Worked for a while, then that stopped working too (was over ssh). When
>> trying a local login I only got
>>
>> load: 0.09 cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k
>>
>> I found one post like this earlier (by Xin LI), but nobody seemed to
>> have replied...
>> In my current conf, I think my kmem/kmem_max is at 512MB (not sure
>> though, since I edited my file yesterday for the next reboot), with 2G
>> of system RAM. Normally I'd run kmem(max) at 1G (with arcsize of 512M;
>> currently it is at default), but since I just got back to 2G total mem
>> after some hardware problems I've been running at those lows (1G total
>> is kind of tight with zfs..)
>>
>> Well, just wanted to report... The box is not totally dead yet, i.e. I
>> can still do Ctrl-T on console, but that's it. I don't really know
>> what more I can do, so.. I don't have KDB/DDB.
>> I'll wait another hour or so before I hard reboot it, unless it
>> "unlocks" or if anyone has any suggestions.
>

> The key is to increase your kmem and prevent it from being
> exhausted. I think more recent OpenSolaris's ZFS code has some
> improvements but I do not have spare devices at hand to test and
> debug :(

Yep, never had the problem when I was running with 2G total mem, but
then one stick (damn consumer crap) failed and I was left with 1G, and
I started to have random problems. Going to tune kmem back up now that
I've got more mem again; thinking about putting in 4G too..

>
>
> Maybe pjd@ would get a new import at some point? I have cc'ed him.
>
> Cheers,

> --
> Xin LI <del...@delphij.net> http://www.delphij.net/
> FreeBSD - The Power to Serve!
>


Johan Ström

Apr 8, 2008, 3:57:38 AM

On Apr 8, 2008, at 9:40 AM, LI Xin wrote:

> For your question: just reboot would be fine, you may want to tune
> your arc size (to be smaller) and kmem space (to be larger), which
> would reduce the chance that this would happen, or eliminate it,
> depending on your workload.

Back online now, with kmem/kmem_max to 1G and arcsize to 512M. Are
those reasonable on a 2G machine? I think I've read that from
somewhere, but cannot find that (arc at least) in the TuningGuide now.
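
As a rough sketch of how those settings could be written out for /boot/loader.conf, the snippet below formats the three values and checks that the ARC limit stays below the kmem size. It assumes that "kmem", "kmem_max" and "arcsize" refer to the loader tunables vm.kmem_size, vm.kmem_size_max and vfs.zfs.arc_max; the helper names are hypothetical, and the values are simply the ones mentioned above, not a recommendation.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a size such as '512M' or '1G' into bytes."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def loader_conf_lines(kmem_size, kmem_size_max, arc_max):
    """Return loader.conf lines after basic sanity checks on the sizes."""
    if parse_size(kmem_size) > parse_size(kmem_size_max):
        raise ValueError("kmem_size must not exceed kmem_size_max")
    # Assumption: the ARC is allocated from kmem, so its cap should be smaller.
    if parse_size(arc_max) >= parse_size(kmem_size):
        raise ValueError("arc_max should be smaller than kmem_size")
    return [
        f'vm.kmem_size="{kmem_size}"',
        f'vm.kmem_size_max="{kmem_size_max}"',
        f'vfs.zfs.arc_max="{arc_max}"',
    ]


if __name__ == "__main__":
    # Values from the message above: kmem/kmem_max at 1G, ARC at 512M.
    print("\n".join(loader_conf_lines("1024M", "1024M", "512M")))
```

Lowering the ARC cap, as suggested elsewhere in this thread, only means passing a smaller third argument such as "256M".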

>
> This situation is not recoverable and you can trust ZFS that you
> will not lose data if they are already sync'ed.
>

Actually, I've had a lot of hard crashes lately on this machine (bad
hw), but not a single time have I lost data (to my knowledge at
least...). In that regard, compared to UFS, ZFS is waaay better! :)


Ender

Apr 8, 2008, 10:33:11 AM

Johan Ström wrote:
> On Apr 8, 2008, at 9:40 AM, LI Xin wrote:
>
>> For your question: just reboot would be fine, you may want to tune
>> your arc size (to be smaller) and kmem space (to be larger), which
>> would reduce the chance that this would happen, or eliminate it,
>> depending on your workload.
>
> Back online now, with kmem/kmem_max to 1G and arcsize to 512M. Are
> those reasonable on a 2G machine? I think I've read that from
> somewhere, but cannot find that (arc at least) in the TuningGuide now.
>

Depending on your workload you are just buying more time, so
"reasonable" is a matter of perspective. :( I didn't see whether you said
you are on 32-bit or 64-bit. Keep in mind the kmem max is 1.5-2G on amd64
regardless of how much memory you have. If a 512M arcsize crashes too soon
for your tastes you can always lower it to 256M, or 128M, etc.

Ender

Apr 8, 2008, 12:28:07 PM

Spike Ilacqua wrote:
>> Depending on your work load you are just buying more time, so
>> "reasonable" is a matter of perspective. :( I didn't see if you said
>> you are on 32bit or 64bit? Keep in mind the kmem max is 1.5-2G on
>> amd64 regardless of how much memory you have. If 512M arcsize crashes
>> too soon for your tastes you can always lower it down to 256M, or
>> 128M, etc.
>
> I tried for several weeks to get ZFS stable on a 64bit system with a
> 1.5G kernel. The best uptime I ever got was 72 hours, the worst was
> 2, the average about 24. Interestingly, most of the hangs were at off
> hours, when the system was lightly loaded, had lots of free memory,
> etc. That suggests to me a slow leak of some sort.
>
> Anyway, ZFS is not ready for production. Some people may get lucky,
> but you can't count on it.
>
> Spike

Very interesting. With 1.5G of kmem and a 64M arc_max the best uptime I
had was 5 days, the worst 1 day. Most of my crashes are at off hours as
well. Another tidbit of information: running things out of /tank instead
of /tank/foo/bar/foo seems to lead to longer uptime, so you might want to
try that as well.

Spike Ilacqua

Apr 8, 2008, 12:42:50 PM

> Depending on your work load you are just buying more time, so
> "reasonable" is a matter of perspective. :( I didn't see if you said
> you are on 32bit or 64bit? Keep in mind the kmem max is 1.5-2G on amd64
> regardless of how much memory you have. If 512M arcsize crashes too soon
> for your tastes you can always lower it down to 256M, or 128M, etc.

I tried for several weeks to get ZFS stable on a 64bit system with a
1.5G kernel. The best uptime I ever got was 72 hours, the worst was 2,
the average about 24. Interestingly, most of the hangs were at off
hours, when the system was lightly loaded, had lots of free memory, etc.
That suggests to me a slow leak of some sort.

Anyway, ZFS is not ready for production. Some people may get lucky, but
you can't count on it.

Spike
