My testing has turned up another case where we can end up doing both page
cache I/O and DAX I/O to the same raw block device. Here is the stack trace
of the error, passed through kasan_symbolize.py:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 598 at mm/filemap.c:590 __add_to_page_cache_locked+0x30d/0x3c0()
Modules linked in: nd_pmem nd_btt nd_e820 libnvdimm
CPU: 1 PID: 598 Comm: systemd-udevd Tainted: G W 4.5.0-rc1+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014
0000000000000000 0000000001e02977 ffff88040ff9fa88 ffffffff81576db2
0000000000000000 ffff88040ff9fac0 ffffffff810a65a2 ffff8804103d9020
0000000000000000 ffff8804103d9008 00000000ffffffea ffffea0002ebf080
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff81576db2>] dump_stack+0x44/0x62 lib/dump_stack.c:50
[<ffffffff810a65a2>] warn_slowpath_common+0x82/0xc0 kernel/panic.c:482
[<ffffffff810a66ea>] warn_slowpath_null+0x1a/0x20 kernel/panic.c:515
[< inline >] page_cache_tree_insert mm/filemap.c:590
[<ffffffff811ca32d>] __add_to_page_cache_locked+0x30d/0x3c0 mm/filemap.c:649
[<ffffffff811ca449>] add_to_page_cache_lru+0x49/0xd0 mm/filemap.c:697
[< inline >] __read_cache_page mm/filemap.c:2300
[<ffffffff811ca87b>] do_read_cache_page+0x6b/0x300 mm/filemap.c:2330
[<ffffffff811cab2c>] read_cache_page+0x1c/0x20 mm/filemap.c:2377
[< inline >] read_mapping_page include/linux/pagemap.h:391
[<ffffffff81558d74>] read_dev_sector+0x34/0xf0 block/partition-generic.c:558
[< inline >] read_part_sector block/partitions/check.h:37
[<ffffffff81560c4e>] read_lba+0x18e/0x290 block/partitions/efi.c:264
[< inline >] find_valid_gpt block/partitions/efi.c:610
[<ffffffff81561542>] efi_partition+0xf2/0x7d0 block/partitions/efi.c:692
[<ffffffff8155b47e>] check_partition+0x13e/0x220 block/partitions/check.c:166
[<ffffffff81559670>] rescan_partitions+0xc0/0x2b0 block/partition-generic.c:434
[<ffffffff81553aeb>] __blkdev_reread_part+0x6b/0xa0 block/ioctl.c:171
[<ffffffff81553b45>] blkdev_reread_part+0x25/0x40 block/ioctl.c:191
[<ffffffff815547b9>] blkdev_ioctl+0x5a9/0xab0 block/ioctl.c:624
[<ffffffff812a2303>] block_ioctl+0x43/0x50 fs/block_dev.c:1624
[< inline >] vfs_ioctl fs/ioctl.c:43
[<ffffffff81274452>] do_vfs_ioctl+0xa2/0x6a0 fs/ioctl.c:674
[< inline >] SYSC_ioctl fs/ioctl.c:689
[<ffffffff81274ac9>] SyS_ioctl+0x79/0x90 fs/ioctl.c:680
[<ffffffff81a6a732>] entry_SYSCALL_64_fastpath+0x12/0x76 arch/x86/entry/entry_64.S:185
---[ end trace 96171b39eee8b31b ]---
Here is my minimal reproducer:
#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#define PAGE(a) ((a)*0x1000)
int main(int argc, char *argv[])
{
int i, fd;
char *data_array = (char*) 0x10200000; /* request a 2MiB aligned address with mmap() */
int a;
fd = open("/dev/pmem0", O_RDWR);
if (fd < 0) {
perror("fd");
return 1;
}
data_array = mmap(data_array, PAGE(0x300), PROT_READ|PROT_WRITE,
MAP_SHARED, fd, PAGE(0));
data_array[PAGE(0x0)] = 1;
close(fd);
return 0;
}
I believe what is happening is that I am doing a DAX PMD fault, and that fault
is happening at the same time that an IOCTL is doing a scan for partition
tables. The above stack trace is for a scan of an efi_partition(), but I get a
similar stack trace going through sgi_partition(), ldm_partition(),
msdos_partition(), etc. This looping is happening in check_partition(), as it
loops through the various partition types.
I think the issue essentially is that the partition scanning code doesn't have
a direct I/O code path, at least from what I have found. This means that it's
not calling into DAX, and instead is just using the page cache.