ZFS debugging with zdb yields "signal 6 (no core dump - bad address)"

mic...@cannel.la

Mar 8, 2026, 11:07:49 AM
to us...@hardenedbsd.org

HBSD: 14.3 stable

Context: I'm poking at some odd-seeming ZFS behaviour with bclone and want debug info on a zpool and dataset, specifically the count of cloned blocks (-T).

Issue: Any time I execute zdb with arguments (thorough but non-exhaustive testing), I am the lucky winner of the following:

root@deathtongue:/var/log # zdb -s unclemeat/drives
ASSERT at /usr/src/sys/contrib/openzfs/module/zcommon/zfs_fletcher.c:429:fletcher_4_impl_get()
fletcher_4_initialized
  PID: 2325      COMM: zdb
  TID: 145535    NAME:
Abort trap

and:

[252438] [HBSD INTERNAL] zdb (jid 0, uid 0) exited on signal 6 (no core dump - bad address)
[252438]  -> pid: 2325 ppid: 94045 p_pax: 0x58655<PAGEEXEC,MPROTECT,SEGVGUARD,ASLR,NOSHLIBRANDOM,DISALLOWMAP32BIT,<f15>,<f16>,<f18>>

and the same result with mitigations disabled:

[253355] [HBSD INTERNAL] zdb (jid 0, uid 0) exited on signal 6 (no core dump - bad address)
[253355]  -> pid: 81585 ppid: 94045 p_pax: 0x58aaa<NOPAGEEXEC,NOMPROTECT,NOSEGVGUARD,NOASLR,NOSHLIBRANDOM,NODISALLOWMAP32BIT,<f15>,<f16>,<f18>>

Any suggestions on:

  • What's Happening?
  • How to reliably run zdb?

Kindest regards,

Mike

Joe

Mar 9, 2026, 6:25:38 PM
to us...@hardenedbsd.org
On 3/8/26 16:07, mic...@cannel.la wrote:
> root@deathtongue:/var/log # zdb -s unclemeat/drives
> ASSERT at /usr/src/sys/contrib/openzfs/module/zcommon/zfs_fletcher.c:429:fletcher_4_impl_get()
> fletcher_4_initialized
>   PID: 2325      COMM: zdb
>   TID: 145535    NAME:
> Abort trap
> Any suggestions on:
>
> * What's Happening?
> * How to reliably run zdb?
>

I'm not sure what makes HBSD different here, but the default fletcher4
checksum implementation expects its input to be 32-bit aligned. That is
generally also the case in OpenZFS, but there are some exceptions.

The resume token code, for example, expects to be able to use it on
variable-length data, which means the checksum won't cover the last 0-3
bytes (0 when the data length is a multiple of 4 bytes). I do not
remember whether it does aligned access, but that could(?) be related to
what you're seeing. I'm not aware of us having anything in place that
would trap there, though; maybe someone else on the list knows more.
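
To make the point about the uncovered tail bytes concrete, here is a minimal Python sketch of a fletcher4-style checksum. It is not the OpenZFS code: the function name `fletcher4` is hypothetical, the little-endian word order is an assumption chosen for illustration, and it says nothing about the `fletcher_4_initialized` assertion itself. It only shows that a checksum computed over whole 32-bit words never reads the last 0-3 bytes of a variable-length buffer.

```python
MASK64 = (1 << 64) - 1  # the four running sums are kept as 64-bit values


def fletcher4(data: bytes):
    """Fletcher4-style sums over whole 32-bit words (illustrative only)."""
    a = b = c = d = 0
    nwords = len(data) // 4  # the trailing len(data) % 4 bytes are never read
    for i in range(nwords):
        # Assumption: little-endian words, picked only for this example.
        w = int.from_bytes(data[4 * i:4 * i + 4], "little")
        a = (a + w) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d


token = b"resume-token-bytes!"      # 19 bytes: 4 whole words + 3 leftover bytes
tampered = token[:-1] + b"?"        # change only the final, uncovered byte
assert fletcher4(token) == fletcher4(tampered)
```

Changing any of the last three bytes of `token` leaves the result unchanged, which is the "won't cover the last 0-3 bytes" behaviour described above.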

1) Are you using resume tokens?

2) Are you able to reproduce on a zpool you can share with us?

3) Can you get a core dump that you would feel comfortable sharing? If
not the whole thing, perhaps just a stack trace?

Joe
