
Next Steps to Debug ZFS Hang?


Nick Sivo
Oct 7, 2014, 9:48:51 PM

Hello,


I've been having trouble with ZFS on my server. For the most part it works splendidly, but occasionally I'll experience permanent hangs.


For example, right now on one of my ZFS filesystems (the others are fine), I can read, write, and stat files, but if I run ls in any directory, ls and the terminal will hang. Ctrl-C and kill -9 can't kill it:


In top:

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 5868 nsivo         1  20    0 14456K  1016K zfs     0   0:00  0.00% ls

In ps:

USER      PID  %CPU %MEM     VSZ     RSS TT  STAT STARTED        TIME COMMAND
nsivo    5868   0.0  0.0   14456    1016  2- D+    2:35PM     0:00.00 ls


Eventually the entire system hangs and can't be shut down cleanly.


What are the next steps to debug this? I'm a software developer, but I'm not familiar with kernel debugging. Is there a way to discover which syscall ls is stuck in, ideally without requiring a crash dump?


Thanks for reading,
Nick


Nick Sivo
Oct 8, 2014, 1:08:49 AM

Thanks Chad. I looked through 20 pages of Google search results restricted to the archives of this list, and didn't see anything I could tell was related to the problem I'm having. It's not a complete I/O stall, or even a pool-wide stall. It only affects listing directory contents on a single filesystem, which is why I'm baffled.


The pool is less than 40% utilized, the ARC isn't under memory pressure, and as far as I can tell everything should be fine.


I did find https://wiki.freebsd.org/AvgZfsDeadlockDebug, which confirms I want kernel stack traces. I was hoping to get some guidance on that, especially on a remote system with only SSH access. I'm not sure I can just enter DDB over SSH. Maybe tricks with DTrace and stack()?
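
For instance, something like this might print the kernel stack the next time the stuck process blocks (an untested sketch; 5868 is the ls PID from above, and it won't fire if the thread is already permanently parked):

  # as root: grab the kernel stack of pid 5868 when it next goes off-CPU
  dtrace -n 'sched:::off-cpu /pid == 5868/ { stack(); exit(0); }'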

[nsivo@hn3 sysutils]$ zpool get all ssd | grep -v default
NAME  PROPERTY                                      VALUE                                         SOURCE
ssd   size                                          182G                                          -
ssd   capacity                                      38%                                           -
ssd   health                                        ONLINE                                        -
ssd   failmode                                      panic                                         local
ssd   dedupratio                                    1.00x                                         -
ssd   free                                          111G                                          -
ssd   allocated                                     70.7G                                         -
ssd   readonly                                      off                                           -
ssd   expandsize                                    0                                             -
ssd   feature@async_destroy                         enabled                                       local
ssd   feature@empty_bpobj                           active                                        local
ssd   feature@lz4_compress                          enabled                                       local
ssd   unsup...@com.joyent:multi_vdev_crash_dump  inactive                                      local

[nsivo@hn3 sysutils]$ zfs get all | grep ssd | grep -v default | grep -v 'arc@2'
ssd                                         type                  filesystem               -
ssd                                         creation              Thu Aug 28 16:33 2014    -
ssd                                         used                  70.7G                    -
ssd                                         available             108G                     -
ssd                                         referenced            144K                     -
ssd                                         compressratio         1.00x                    -
ssd                                         mounted               no                       -
ssd                                         mountpoint            none                     local
ssd                                         checksum              sha256                   local
ssd                                         atime                 off                      local
ssd                                         canmount              off                      local
ssd                                         version               5                        -
ssd                                         utf8only              on                       -
ssd                                         normalization         formKC                   -
ssd                                         casesensitivity       sensitive                -
ssd                                         usedbysnapshots       0                        -
ssd                                         usedbydataset         144K                     -
ssd                                         usedbychildren        70.7G                    -
ssd                                         usedbyrefreservation  0                        -
ssd                                         mlslabel                                       -
ssd                                         refcompressratio      1.00x                    -
ssd                                         written               144K                     -
ssd                                         logicalused           31.1G                    -
ssd                                         logicalreferenced     43.5K                    -
ssd/arc                                     type                  filesystem               -
ssd/arc                                     creation              Wed Sep 17 17:07 2014    -
ssd/arc                                     used                  70.5G                    -
ssd/arc                                     available             108G                     -
ssd/arc                                     referenced            47.8G                    -
ssd/arc                                     compressratio         1.00x                    -
ssd/arc                                     mounted               yes                      -
ssd/arc                                     mountpoint            /usr/arc                 received
ssd/arc                                     checksum              sha256                   inherited from ssd
ssd/arc                                     atime                 off                      inherited from ssd
ssd/arc                                     setuid                off                      received
ssd/arc                                     snapdir               visible                  received
ssd/arc                                     xattr                 off                      temporary
ssd/arc                                     version               5                        -
ssd/arc                                     utf8only              on                       -
ssd/arc                                     normalization         formKC                   -
ssd/arc                                     casesensitivity       sensitive                -
ssd/arc                                     usedbysnapshots       22.7G                    -
ssd/arc                                     usedbydataset         47.8G                    -
ssd/arc                                     usedbychildren        0                        -
ssd/arc                                     usedbyrefreservation  0                        -
ssd/arc                                     mlslabel                                       -
ssd/arc                                     sync                  always                   local
ssd/arc                                     refcompressratio      1.00x                    -
ssd/arc                                     written               262M                     -
ssd/arc                                     logicalused           31.0G                    -
ssd/arc                                     logicalreferenced     15.3G                    -

[nsivo@hn3 ~]$ zfs-stats -a

------------------------------------------------------------------------
ZFS Subsystem Report                            Tue Oct  7 21:58:37 2014
------------------------------------------------------------------------

System Information:

Kernel Version: 902001 (osreldate)
Hardware Platform: amd64
Processor Architecture: amd64

ZFS Storage pool Version: 5000
ZFS Filesystem Version: 5

FreeBSD 9.2-RELEASE-p12 #0: Mon Sep 15 18:46:46 UTC 2014 root
 9:58PM  up 16 days,  3:39, 2 users, load averages: 0.24, 0.33, 0.35

------------------------------------------------------------------------

System Memory:

16.80% 10.42 GiB Active, 0.15% 94.19 MiB Inact
72.82% 45.15 GiB Wired, 0.12% 74.04 MiB Cache
10.11% 6.27 GiB Free, 0.00% 2.46 MiB Gap

Real Installed: 64.00 GiB
Real Available: 99.91% 63.94 GiB
Real Managed: 96.97% 62.00 GiB

Logical Total: 64.00 GiB
Logical Used: 89.95% 57.56 GiB
Logical Free: 10.05% 6.44 GiB

Kernel Memory: 15.70 GiB
Data: 99.83% 15.67 GiB
Text: 0.17% 27.37 MiB

Kernel Memory Map: 52.92 GiB
Size: 21.21% 11.22 GiB
Free: 78.79% 41.69 GiB

------------------------------------------------------------------------

ARC Summary: (HEALTHY)
Memory Throttle Count: 0

ARC Misc:
Deleted: 500.41m
Recycle Misses: 270.90m
Mutex Misses: 27.63m
Evict Skips: 5.12b

ARC Size: 25.95% 15.83 GiB
Target Size: (Adaptive) 45.64% 27.84 GiB
Min Size (Hard Limit): 12.50% 7.63 GiB
Max Size (High Water): 8:1 61.00 GiB

ARC Size Breakdown:
Recently Used Cache Size: 10.56% 2.94 GiB
Frequently Used Cache Size: 89.44% 24.91 GiB

ARC Hash Breakdown:
Elements Max: 16.33m
Elements Current: 20.84% 3.40m
Collisions: 571.54m
Chain Max: 41
Chains: 750.10k

------------------------------------------------------------------------

ARC Efficiency: 4.77b
Cache Hit Ratio: 81.44% 3.88b
Cache Miss Ratio: 18.56% 884.45m
Actual Hit Ratio: 80.36% 3.83b

Data Demand Efficiency: 29.50% 966.53m
Data Prefetch Efficiency: 29.08% 21.22m

CACHE HITS BY CACHE LIST:
  Most Recently Used: 3.47% 134.64m
  Most Frequently Used: 95.20% 3.69b
  Most Recently Used Ghost: 5.18% 200.94m
  Most Frequently Used Ghost: 5.58% 216.52m

CACHE HITS BY DATA TYPE:
  Demand Data: 7.35% 285.16m
  Prefetch Data: 0.16% 6.17m
  Demand Metadata: 90.91% 3.53b
  Prefetch Metadata: 1.58% 61.40m

CACHE MISSES BY DATA TYPE:
  Demand Data: 77.04% 681.37m
  Prefetch Data: 1.70% 15.05m
  Demand Metadata: 15.86% 140.24m
  Prefetch Metadata: 5.40% 47.79m

------------------------------------------------------------------------

L2ARC is disabled

------------------------------------------------------------------------

File-Level Prefetch: (HEALTHY)

DMU Efficiency: 11.05b
Hit Ratio: 59.82% 6.61b
Miss Ratio: 40.18% 4.44b

Colinear: 4.44b
  Hit Ratio: 0.01% 317.87k
  Miss Ratio: 99.99% 4.44b

Stride: 6.62b
  Hit Ratio: 99.57% 6.59b
  Miss Ratio: 0.43% 28.41m

DMU Misc:
Reclaim: 4.44b
  Successes: 0.81% 35.91m
  Failures: 99.19% 4.40b

Streams: 16.54m
  +Resets: 0.25% 41.20k
  -Resets: 99.75% 16.50m
  Bogus: 0

------------------------------------------------------------------------

VDEV cache is disabled

------------------------------------------------------------------------

ZFS Tunables (sysctl):
kern.maxusers                           384
vm.kmem_size                            66575511552
vm.kmem_size_scale                      1
vm.kmem_size_min                        0
vm.kmem_size_max                        329853485875
vfs.zfs.l2c_only_size                   0
vfs.zfs.mfu_ghost_data_lsize            483016704
vfs.zfs.mfu_ghost_metadata_lsize        1192360960
vfs.zfs.mfu_ghost_size                  1675377664
vfs.zfs.mfu_data_lsize                  3821800448
vfs.zfs.mfu_metadata_lsize              1144714240
vfs.zfs.mfu_size                        9304926208
vfs.zfs.mru_ghost_data_lsize            7731420672
vfs.zfs.mru_ghost_metadata_lsize        19021883392
vfs.zfs.mru_ghost_size                  26753304064
vfs.zfs.mru_data_lsize                  1530433536
vfs.zfs.mru_metadata_lsize              390488064
vfs.zfs.mru_size                        3144110080
vfs.zfs.anon_data_lsize                 0
vfs.zfs.anon_metadata_lsize             0
vfs.zfs.anon_size                       12001792
vfs.zfs.l2arc_norw                      1
vfs.zfs.l2arc_feed_again                1
vfs.zfs.l2arc_noprefetch                1
vfs.zfs.l2arc_feed_min_ms               200
vfs.zfs.l2arc_feed_secs                 1
vfs.zfs.l2arc_headroom                  2
vfs.zfs.l2arc_write_boost               8388608
vfs.zfs.l2arc_write_max                 8388608
vfs.zfs.arc_meta_limit                  16375442432
vfs.zfs.arc_meta_used                   11641238168
vfs.zfs.arc_min                         8187721216
vfs.zfs.arc_max                         65501769728
vfs.zfs.dedup.prefetch                  1
vfs.zfs.mdcomp_disable                  0
vfs.zfs.nopwrite_enabled                1
vfs.zfs.write_limit_override            0
vfs.zfs.write_limit_inflated            205969035264
vfs.zfs.write_limit_max                 8582043136
vfs.zfs.write_limit_min                 33554432
vfs.zfs.write_limit_shift               3
vfs.zfs.no_write_throttle               0
vfs.zfs.zfetch.array_rd_sz              1048576
vfs.zfs.zfetch.block_cap                256
vfs.zfs.zfetch.min_sec_reap             2
vfs.zfs.zfetch.max_streams              8
vfs.zfs.prefetch_disable                0
vfs.zfs.no_scrub_prefetch               0
vfs.zfs.no_scrub_io                     0
vfs.zfs.resilver_min_time_ms            3000
vfs.zfs.free_min_time_ms                1000
vfs.zfs.scan_min_time_ms                1000
vfs.zfs.scan_idle                       50
vfs.zfs.scrub_delay                     4
vfs.zfs.resilver_delay                  2
vfs.zfs.top_maxinflight                 32
vfs.zfs.write_to_degraded               0
vfs.zfs.mg_alloc_failures               8
vfs.zfs.check_hostid                    1
vfs.zfs.deadman_enabled                 1
vfs.zfs.deadman_synctime                1000
vfs.zfs.recover                         0
vfs.zfs.txg.synctime_ms                 1000
vfs.zfs.txg.timeout                     5
vfs.zfs.vdev.cache.bshift               16
vfs.zfs.vdev.cache.size                 0
vfs.zfs.vdev.cache.max                  16384
vfs.zfs.vdev.trim_on_init               1
vfs.zfs.vdev.write_gap_limit            4096
vfs.zfs.vdev.read_gap_limit             32768
vfs.zfs.vdev.aggregation_limit          131072
vfs.zfs.vdev.ramp_rate                  2
vfs.zfs.vdev.time_shift                 29
vfs.zfs.vdev.min_pending                4
vfs.zfs.vdev.max_pending                10
vfs.zfs.vdev.bio_delete_disable         0
vfs.zfs.vdev.bio_flush_disable          0
vfs.zfs.vdev.trim_max_pending           64
vfs.zfs.vdev.trim_max_bytes             2147483648
vfs.zfs.cache_flush_disable             0
vfs.zfs.zil_replay_disable              0
vfs.zfs.sync_pass_rewrite               2
vfs.zfs.sync_pass_dont_compress         5
vfs.zfs.sync_pass_deferred_free         2
vfs.zfs.zio.use_uma                     0
vfs.zfs.snapshot_list_prefetch          0
vfs.zfs.version.ioctl                   3
vfs.zfs.version.zpl                     5
vfs.zfs.version.spa                     5000
vfs.zfs.version.acl                     1
vfs.zfs.debug                           0
vfs.zfs.super_owner                     0
vfs.zfs.trim.enabled                    1
vfs.zfs.trim.max_interval               1
vfs.zfs.trim.timeout                    30
vfs.zfs.trim.txg_delay                  32

------------------------------------------------------------------------


-Nick

On Tue, Oct 7, 2014 at 9:15 PM, Chad Leigh Shire.Net LLC <ch...@shire.net> wrote:

> On Oct 7, 2014, at 7:48 PM, Nick Sivo <ni...@ycombinator.com> wrote:
>> Hello,
>>
>>
>> I've been having trouble with ZFS on my server. For the most part it works splendidly, but occasionally I'll experience permanent hangs.
>>
>>
>> For example, right now on one of my ZFS filesystems (the others are fine), I can read, write, and stat files, but if I run ls in any directory, ls and the terminal will hang. CTRL-C, and kill -9 can't kill it:
>>

> How much free space do you have, percentage-wise? I found out (and have had others confirm) that when you get below a certain amount of free space you can get these symptoms (the exact percentage may vary per system and with how the ZFS config is set up [kernel parameters]).
> Also, depending on what you are doing, the various parameters may need to be tweaked. Look in the archives for similar posts (including from me).

Daniel Staal
Oct 8, 2014, 11:49:09 PM

--As of October 7, 2014 6:48:51 PM -0700, Nick Sivo is alleged to have said:

> I've been having trouble with ZFS on my server. For the most part it
> works splendidly, but occasionally I'll experience permanent hangs.
>
>
> For example, right now on one of my ZFS filesystems (the others are
> fine), I can read, write, and stat files, but if I run ls in any
> directory, ls and the terminal will hang. CTRL-C, and kill -9 can't kill
> it:

--As for the rest, it is mine.

Not sure if this will be helpful, but at least it shouldn't hurt: when was
the last time you ran a scrub? Also, how much RAM do you have, and where is
your swap? (The only times I've had permanent hangs from ZFS were when I
ran out of RAM and was trying to swap to ZFS...)
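
For reference, kicking one off and checking for errors is just (using the
pool name from your earlier mail):

  zpool scrub ssd        # start a scrub in the background
  zpool status -v ssd    # watch progress and look for errors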

Daniel T. Staal

---------------------------------------------------------------
This email copyright the author. Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes. This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---------------------------------------------------------------

Nick Sivo
Oct 9, 2014, 12:28:05 AM

Hi Daniel,

> Not sure if this will be helpful, but at least it shouldn't hurt: When was
> the last time you ran a scrub? Also, how much RAM do you have and where is
> your swap? (The only times I've had permanent hangs from ZFS was when I ran
> out of RAM and was trying to swap to ZFS...)

The server has 64GB ECC RAM, and no swap at the moment. I've since
rebooted the box, but a scrub today revealed no errors, and there was
nothing in the console or any log files about disk or controller
errors or timeouts.

I was able to get kernel call stacks with procstat:
https://gist.github.com/kogir/49ff76f95b0b3be3e80e
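
For anyone who finds this later, the incantation was roughly (5868 being
the hung ls; run as root to see everything):

  procstat -kk 5868    # kernel stacks for one process
  procstat -kka        # kernel stacks for every thread on the system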

Thanks,
Nick

Daniel Staal
Oct 9, 2014, 6:53:52 PM

--As of October 8, 2014 9:28:05 PM -0700, Nick Sivo is alleged to have said:

>> Not sure if this will be helpful, but at least it shouldn't hurt: When
>> was the last time you ran a scrub? Also, how much RAM do you have and
>> where is your swap? (The only times I've had permanent hangs from ZFS
>> was when I ran out of RAM and was trying to swap to ZFS...)
>
> The server has 64GB ECC RAM, and no swap at the moment. I've since
> rebooted the box, but a scrub today revealed no errors, and there was
> nothing in the console or any log files about disk or controller
> errors or timeouts.

--As for the rest, it is mine.

Note that I'm still just going by 'standard checks', but why don't you have
any swap? I know you probably wouldn't use it with that much RAM around,
but FreeBSD still performs better with it - and it wouldn't surprise me
completely if that was causing your issue. (It shouldn't, in an ideal
world - but it's an oddity on your system that might be causing issues, if
there's an uncovered corner case.)

Setting up a small ramdisk for swap - or even putting some small
swap-on-zfs - might be worth checking to see if it seems to prevent the
issue. (Note that swap-on-zfs has a 'worst case' scenario that crashes the
box if you run out of RAM. It's happened to me, and I was able to recover,
but it took a while.)
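
Roughly, for either option (sizes here are illustrative, not a
recommendation; see mdconfig(8), swapon(8), and zfs(8)):

  # a memory-backed disk used as swap (the "ramdisk for swap" idea)
  mdconfig -a -t malloc -s 512m -u 10
  swapon /dev/md10

  # or a small swap zvol on the existing pool
  zfs create -V 2G ssd/swap
  swapon /dev/zvol/ssd/swap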

Anyway, I'm mostly trying to keep your question alive in hopes that someone
who's more knowledgeable can answer it. ;)

Daniel T. Staal


Valeri Galtsev
Oct 9, 2014, 7:17:02 PM

On Thu, October 9, 2014 5:53 pm, Daniel Staal wrote:
> --As of October 8, 2014 9:28:05 PM -0700, Nick Sivo is alleged to have
> said:
>
>>> Not sure if this will be helpful, but at least it shouldn't hurt: When
>>> was the last time you ran a scrub? Also, how much RAM do you have and
>>> where is your swap? (The only times I've had permanent hangs from ZFS
>>> was when I ran out of RAM and was trying to swap to ZFS...)
>>
>> The server has 64GB ECC RAM, and no swap at the moment. I've since
>> rebooted the box, but a scrub today revealed no errors, and there was
>> nothing in the console or any log files about disk or controller
>> errors or timeouts.
>
> --As for the rest, it is mine.
>
> Note that I'm still just going by 'standard checks', but why don't you
> have
> any swap? I know you probably wouldn't use it with that much RAM around,
> but FreeBSD still performs better with it - and it wouldn't surprise me
> completely if that was causing your issue.

I'm petrified. Is that so? I mean, as I understood you, a 64 GB RAM machine
running under FreeBSD (say, 9.3) still needs some amount of swap for
better performance, right?

Valeri
++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++

Daniel Staal
Oct 9, 2014, 8:50:33 PM

--As of October 9, 2014 6:17:02 PM -0500, Valeri Galtsev is alleged to have
said:

>> Note that I'm still just going by 'standard checks', but why don't you
>> have
>> any swap? I know you probably wouldn't use it with that much RAM around,
>> but FreeBSD still performs better with it - and it wouldn't surprise me
>> completely if that was causing your issue.
>
> I'm petrified. Is that so? I mean, as I understood you, 64 GB RAM machine
> running under FreeBSD (say, 9.3) still needs some amount of SWAP for
> better performance, right?

--As for the rest, it is mine.

Well, I suppose a better way to say it would be 'expects to have for normal
operations'. Basically any modern OS expects to have some swap, even with
abundant RAM. In theory you should be able to run without it if you have
the RAM, but if swap is available, normal housekeeping and operations will
use it in small amounts. The tuning(7) man page under 9.3 recommends at
least 256M of swap in all cases. (Though I note that recommendation is
no longer in the man page under FreeBSD 10.)

For a relatively new feature like ZFS, which uses RAM intensely and
aggressively, it would not surprise me if running without swap uncovered
bugs in the implementation.

Adrian Chadd
Oct 12, 2014, 3:41:23 AM

Hi!

A bunch of ZFS hangs were found / fixed in FreeBSD-HEAD and I -think-
backported to FreeBSD-10.

I don't know if they've been backported to -9. Certainly not to 9.2; I
think I found / reported some after 9.2 was released.

In the kernel debugger (ddb), you can try "show allproc" to get a list of procs.

See https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-online-ddb.html for more information.
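
If you have a serial or IPMI console (plain SSH won't survive it), you can
drop into DDB at runtime. A sketch, assuming the kernel was built with KDB
and DDB:

  sysctl debug.kdb.enter=1    # the console lands at the db> prompt
  db> show allproc
  db> ps
  db> c                       # continue; the system resumes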

Why can't you get crashdumps?
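
If it's just that they're not configured, the usual setup is (a sketch;
see dumpon(8) and savecore(8) - note the dump device is normally a swap
partition at least as large as RAM, which a no-swap box doesn't have):

  # /etc/rc.conf
  dumpdev="AUTO"

  # after the next panic and reboot, savecore(8) writes the dump
  # to /var/crash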

Are you able to:

* update to freebsd-9-stable?
* or, if you can, update to freebsd-10-stable? 10.1 is about to be released soon.

You could try "procstat -ta" to see the threads running. Do it as root
to see all the threads. TDNAME is the thread name; WCHAN is what's
important to figure out why it's sleeping.
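
For example (the "zfs" wait channel matches the STATE your ls showed in top):

  procstat -ta | grep -w zfs    # threads blocked on a ZFS wchan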

I hope this helps!

-a


On 7 October 2014 18:48, Nick Sivo <ni...@ycombinator.com> wrote:
> [snip: original message, quoted in full above]