SIGBUS when paging in new extent on ext4-DAX

George Hodgkins

Mar 9, 2022, 2:46:23 PM

to pmem

Hi all,

I've been running some benchmarks using PMEM-backed ext4-DAX files as heap memory, and I've encountered a non-deterministic error that I am unsure how to fix.

The benchmarks use a multi-threaded append-only PMEM allocator that works as follows: at thread creation, a PMEM file (the "heap file") is created on an ext4-DAX FS, given a large fixed size (~ 500 MB) with fallocate, and then the entire file is mapped shared read/write with MAP_SYNC. Each thread's allocations just increment a pointer into that thread's heap file mapping and return the previous value; memory is not released after allocation until the application exits. The benchmark itself runs various YCSB workloads on a concurrent hashmap; the error was observed for a 4-thread workload. The system is x86-64 (Cascade Lake), running Ubuntu 18.04/kernel 4.15. The PMEM is first-gen Optane.
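For readers who want to reproduce the setup, the allocator described above might look roughly like the sketch below. This is a minimal illustration, not the benchmark's actual code: the names `pmem_heap`, `pmem_heap_open` and `pmem_heap_alloc` are hypothetical, `posix_fallocate` stands in for whichever fallocate variant is used, and the plain `MAP_SHARED` fallback is an assumption added only so the sketch also runs on a file system without DAX.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Older libc headers may lack these; the values match the Linux UAPI headers. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/* Hypothetical per-thread heap: one file mapping plus a bump offset. */
struct pmem_heap {
    char  *base;   /* start of the mapping */
    size_t size;   /* length of the mapping in bytes */
    size_t used;   /* bytes handed out so far */
    int    synced; /* 1 if the mapping was created with MAP_SYNC */
};

/* Create, preallocate and map a heap file. Returns 0 on success, -1 on error. */
static int pmem_heap_open(struct pmem_heap *h, const char *path, size_t size)
{
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    /* Reserve the blocks up front, as the post describes. */
    int err = posix_fallocate(fd, 0, (off_t)size);
    if (err != 0) {
        close(fd);
        errno = err;
        return -1;
    }

    int synced = 1;
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p == MAP_FAILED) {
        /* Assumption: fall back to a plain shared mapping when MAP_SYNC is
         * unavailable (no DAX, older kernel). A real PMEM allocator would
         * probably treat this as an error instead. */
        synced = 0;
        p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    }
    int saved = errno;
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) {
        errno = saved;
        return -1;
    }

    h->base = p;
    h->size = size;
    h->used = 0;
    h->synced = synced;
    return 0;
}

/* Append-only allocation: bump the offset and return the old position.
 * Nothing is ever freed. Returns NULL when the heap is exhausted. */
static void *pmem_heap_alloc(struct pmem_heap *h, size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    size_t start = (h->used + align - 1) & ~(align - 1);
    if (start > h->size || n > h->size - start)
        return NULL;
    h->used = start + n;
    return h->base + start;
}
```

Since each thread owns its own `struct pmem_heap`, the bump itself needs no locking in this sketch.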

The error is as follows: on about 20% of runs, SIGBUS is delivered when a thread pages in the page at the beginning of the second extent in its heap file (checked using debugfs), which is uninitialized at that point. I have verified that the faulting address is within a valid mapping according to /proc/<pid>/smaps, and that the access is aligned. There is enough space on the device for the files (they are usually the only files on the ~ 700 GB device when the benchmarks are running). Using ftrace, I was able to determine that these faults reach dax_iomap_pte_fault and return from there with (VM_FAULT_MAJOR | VM_FAULT_NEEDDSYNC) set for write faults and (VM_FAULT_NOPAGE) for read faults, but I was not able to isolate the cause beyond that.

I'd appreciate pointers to explanations or potential solutions (or kernel patches, if this is an issue that has been fixed since 4.15), or any ideas for how to further isolate the cause. I plan to update this thread by the end of the week, once I have tested whether the error reproduces with different workload configurations (single thread, smaller working set size, etc.).

Thanks,

George

ppbb...@gmail.com

Mar 18, 2022, 6:46:35 AM

to pmem

Hi George,

This sounds like a kernel bug, and given that you are using an Ubuntu LTS version, I recommend filing an issue on their bug tracker (https://bugs.launchpad.net/ubuntu). You might also consider asking on #pmem on OFTC whether someone there has seen a similar issue.

As a temporary workaround, you can also force page allocation at startup (https://github.com/pmem/pmdk/blob/master/src/common/set.c#L349).
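A minimal sketch of what forcing page allocation could look like for a single mapping, assuming the general idea is to touch every page once at startup. The helper name `prefault_mapping` is hypothetical and this is not a copy of the PMDK code linked above.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Touch one byte in every page of [base, base + len) so that each page is
 * faulted in for writing before the region is used.
 * Returns the number of pages touched. */
static size_t prefault_mapping(void *base, size_t len)
{
    assert(base != NULL);
    long page = sysconf(_SC_PAGESIZE);
    /* Assumption: fall back to 4 KiB if the page size cannot be queried. */
    size_t step = page > 0 ? (size_t)page : 4096;

    /* volatile keeps the compiler from dropping the self-assignment. */
    volatile char *p = base;
    size_t touched = 0;
    for (size_t off = 0; off < len; off += step) {
        p[off] = p[off]; /* read, then write back the same value */
        touched++;
    }
    return touched;
}
```

With an allocator like the one described in the first post, this would be called once per heap file right after mmap and before any allocation is handed out, since writing the value back is not safe against concurrent writers. If the fault still occurs, it should then show up during startup rather than in the middle of a run.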