
[PATCH v5 0/5] MAP_DIRECT and block-map-atomic files


Dan Williams

Aug 16, 2017, 4:00:06 AM
Changes since v4 [1]:
* Drop the new vma ->fs_flags field, it can be replaced by just checking
->vm_ops locally in the filesystem. This approach also allows
non-MAP_DIRECT vmas to be vma_merge() capable since vmas with
vm_ops->close() disable vma merging. (Jan)

* Drop the new ->fmmap() operation, instead convert all ->mmap()
implementations tree-wide to take an extra 'map_flags' parameter.
(Jan)

* Drop the cute (MAP_SHARED|MAP_PRIVATE) hack/mechanism for adding new
validated flags to mmap(2) and instead just define a new mmap syscall
variant (sys_mmap_pgoff_strict). (Andy)

* Fix the fact that MAP_PRIVATE|MAP_DIRECT would silently fall back to
MAP_SHARED (addressed by the new syscall). (Kirill)

* Require CAP_LINUX_IMMUTABLE for MAP_DIRECT to close any unforeseen
denial of service for unmanaged + unprivileged MAP_DIRECT usage.
(Kirill)

* Switch MAP_DIRECT fault failures to SIGBUS (Kirill)

* Add an fcntl mechanism to allow an unprivileged process to use
MAP_DIRECT on an fd setup by a privileged process.

* Rework the MAP_DIRECT description to allow for future hardware where
it may not be required to software-pin the file offset to physical
address relationship.

Given the tree-wide touches in this revision the patchset is starting to
feel more like -mm material than strictly xfs.

[1]: https://lkml.org/lkml/2017/8/15/39

---

This is the next revision of a patch series that aims to enable
applications that otherwise need to resort to DAX mapping a raw device
file to instead move to a filesystem.

In the course of reviewing a previous posting, Christoph said:

    That being said I think we absolutely should support RDMA memory
    registrations for DAX mappings. I'm just not sure how S_IOMAP_IMMUTABLE
    helps with that. We'll want a MAP_SYNC | MAP_POPULATE to make sure all
    the blocks are populated and all ptes are set up. Second we need to
    make sure get_user_page works, which for now means we'll need a struct
    page mapping for the region (which will be really annoying for PCIe
    mappings, like the upcoming NVMe persistent memory region), and we need
    to guarantee that the extent mapping won't change while the
    get_user_pages holds the pages inside it. I think that is true due to
    side effects even with the current DAX code, but we'll need to make it
    explicit. And maybe that's where we need to converge - "sealing" the
    extent map makes sense as such a temporary measure that is not persisted
    on disk, which automatically gets released when the holding process
    exits, because we sort of already do this implicitly. It might also
    make sense to have explicitly breakable seals similar to what I do for
    the pNFS blocks kernel server, as any userspace RDMA file server would
    also need those semantics.

So, this is an attempt to converge on the idea that we need an explicit
and process-lifetime-temporary mechanism for a process to be able to
make assumptions about the relationship between a mapping, its physical
pages, and the dax-file-offset. The "explicitly breakable seals" aspect
is not addressed in these patches, but I wonder if it might be a
voluntary mechanism that can be implemented via userfaultfd.

---

Dan Williams (5):
vfs: add flags parameter to ->mmap() in 'struct file_operations'
fs, xfs: introduce S_IOMAP_SEALED
mm: introduce mmap3 for safely defining new mmap flags
fs, xfs: introduce MAP_DIRECT for creating block-map-atomic file ranges
fs, fcntl: add F_MAP_DIRECT


Diffstat without patch1:

arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
fs/attr.c | 10 +++
fs/fcntl.c | 15 +++++
fs/open.c | 6 ++
fs/read_write.c | 3 +
fs/xfs/libxfs/xfs_bmap.c | 5 ++
fs/xfs/xfs_bmap_util.c | 3 +
fs/xfs/xfs_file.c | 115 +++++++++++++++++++++++++++++++--
fs/xfs/xfs_inode.h | 1 +
fs/xfs/xfs_ioctl.c | 6 ++
fs/xfs/xfs_super.c | 1 +
include/linux/fs.h | 10 ++-
include/linux/mm.h | 2 +-
include/linux/mman.h | 25 +++++++
include/linux/syscalls.h | 3 +
include/uapi/asm-generic/mman.h | 1 +
include/uapi/linux/fcntl.h | 5 ++
mm/filemap.c | 5 ++
mm/mmap.c | 56 +++++++++++++++-
20 files changed, 263 insertions(+), 11 deletions(-)

Diffstat with patch1:

arch/arc/kernel/arc_hostlink.c | 3 -
arch/powerpc/kernel/proc_powerpc.c | 3 -
arch/powerpc/kvm/book3s_64_vio.c | 3 -
arch/powerpc/platforms/cell/spufs/file.c | 21 +++-
arch/powerpc/platforms/powernv/opal-prd.c | 3 -
arch/um/drivers/mmapper_kern.c | 3 -
arch/x86/entry/syscalls/syscall_32.tbl | 1
arch/x86/entry/syscalls/syscall_64.tbl | 1
drivers/android/binder.c | 3 -
drivers/char/agp/frontend.c | 3 -
drivers/char/bsr.c | 3 -
drivers/char/hpet.c | 6 +
drivers/char/mbcs.c | 3 -
drivers/char/mem.c | 11 +-
drivers/char/mspec.c | 9 +-
drivers/char/uv_mmtimer.c | 6 +
drivers/dax/device.c | 3 -
drivers/dma-buf/dma-buf.c | 4 +
drivers/firewire/core-cdev.c | 3 -
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 5 +
drivers/gpu/drm/arc/arcpgu_drv.c | 5 +
drivers/gpu/drm/ast/ast_drv.h | 3 -
drivers/gpu/drm/ast/ast_ttm.c | 3 -
drivers/gpu/drm/drm_gem.c | 3 -
drivers/gpu/drm/drm_gem_cma_helper.c | 2
drivers/gpu/drm/etnaviv/etnaviv_gem.c | 2
drivers/gpu/drm/exynos/exynos_drm_gem.c | 2
drivers/gpu/drm/i810/i810_dma.c | 3 -
drivers/gpu/drm/i915/i915_gem_dmabuf.c | 2
drivers/gpu/drm/mediatek/mtk_drm_gem.c | 2
drivers/gpu/drm/mgag200/mgag200_drv.h | 3 -
drivers/gpu/drm/mgag200/mgag200_ttm.c | 3 -
drivers/gpu/drm/msm/msm_gem.c | 2
drivers/gpu/drm/omapdrm/omap_gem.c | 2
drivers/gpu/drm/radeon/radeon_drv.c | 3 -
drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 2
drivers/gpu/drm/tegra/gem.c | 2
drivers/gpu/drm/udl/udl_gem.c | 2
drivers/gpu/drm/vc4/vc4_bo.c | 2
drivers/gpu/drm/vgem/vgem_drv.c | 7 +
drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 3 -
drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c | 3 -
drivers/hsi/clients/cmt_speech.c | 3 -
drivers/hwtracing/intel_th/msu.c | 3 -
drivers/hwtracing/stm/core.c | 3 -
drivers/infiniband/core/uverbs_main.c | 3 -
drivers/infiniband/hw/hfi1/file_ops.c | 6 +
drivers/infiniband/hw/qib/qib_file_ops.c | 5 +
drivers/media/v4l2-core/v4l2-dev.c | 3 -
drivers/misc/aspeed-lpc-ctrl.c | 3 -
drivers/misc/cxl/file.c | 3 -
drivers/misc/genwqe/card_dev.c | 3 -
drivers/misc/mic/scif/scif_fd.c | 3 -
drivers/misc/mic/vop/vop_vringh.c | 3 -
drivers/misc/sgi-gru/grufile.c | 3 -
drivers/mtd/mtdchar.c | 3 -
drivers/pci/proc.c | 3 -
drivers/rapidio/devices/rio_mport_cdev.c | 3 -
drivers/sbus/char/flash.c | 3 -
drivers/sbus/char/jsflash.c | 3 -
drivers/scsi/cxlflash/superpipe.c | 3 -
drivers/scsi/sg.c | 3 -
drivers/staging/android/ashmem.c | 3 -
drivers/staging/comedi/comedi_fops.c | 3 -
drivers/staging/lustre/lustre/llite/llite_mmap.c | 2
drivers/staging/vme/devices/vme_user.c | 3 -
drivers/uio/uio.c | 3 -
drivers/usb/core/devio.c | 3 -
drivers/usb/mon/mon_bin.c | 3 -
drivers/vfio/vfio.c | 7 +
drivers/video/fbdev/core/fbmem.c | 3 -
drivers/video/fbdev/pxa3xx-gcu.c | 3 -
drivers/xen/gntalloc.c | 3 -
drivers/xen/gntdev.c | 3 -
drivers/xen/privcmd.c | 3 -
drivers/xen/xenbus/xenbus_dev_backend.c | 3 -
drivers/xen/xenfs/xenstored.c | 3 -
fs/9p/vfs_file.c | 10 +-
fs/aio.c | 3 -
fs/attr.c | 10 ++
fs/btrfs/file.c | 3 -
fs/cifs/file.c | 4 -
fs/coda/file.c | 5 +
fs/ecryptfs/file.c | 5 +
fs/ext2/file.c | 5 +
fs/ext4/file.c | 3 -
fs/f2fs/file.c | 3 -
fs/fcntl.c | 15 +++
fs/fuse/file.c | 8 +-
fs/gfs2/file.c | 3 -
fs/hugetlbfs/inode.c | 3 -
fs/kernfs/file.c | 3 -
fs/nfs/file.c | 5 +
fs/nfs/internal.h | 2
fs/nilfs2/file.c | 3 -
fs/open.c | 6 +
fs/orangefs/file.c | 5 +
fs/proc/inode.c | 7 +
fs/proc/vmcore.c | 6 +
fs/ramfs/file-nommu.c | 6 +
fs/read_write.c | 3 +
fs/romfs/mmap-nommu.c | 3 -
fs/ubifs/file.c | 5 +
fs/xfs/libxfs/xfs_bmap.c | 5 +
fs/xfs/xfs_bmap_util.c | 3 +
fs/xfs/xfs_file.c | 114 +++++++++++++++++++++-
fs/xfs/xfs_inode.h | 1
fs/xfs/xfs_ioctl.c | 6 +
fs/xfs/xfs_super.c | 1
include/drm/drm_gem.h | 3 -
include/linux/fs.h | 21 +++-
include/linux/mm.h | 2
include/linux/mman.h | 25 +++++
include/linux/syscalls.h | 3 +
include/uapi/asm-generic/mman.h | 1
include/uapi/linux/fcntl.h | 5 +
ipc/shm.c | 5 +
kernel/events/core.c | 3 -
kernel/kcov.c | 3 -
kernel/relay.c | 3 -
mm/filemap.c | 19 +++-
mm/mmap.c | 56 ++++++++++-
mm/nommu.c | 4 -
mm/shmem.c | 3 -
net/socket.c | 6 +
security/selinux/selinuxfs.c | 6 +
sound/core/compress_offload.c | 3 -
sound/core/hwdep.c | 3 -
sound/core/info.c | 3 -
sound/core/init.c | 3 -
sound/core/oss/pcm_oss.c | 3 -
sound/oss/soundcard.c | 3 -
sound/oss/swarm_cs4297a.c | 3 -
virt/kvm/kvm_main.c | 3 -
134 files changed, 553 insertions(+), 174 deletions(-)

Dan Williams

Aug 16, 2017, 4:00:08 AM
MAP_DIRECT is an mmap(2) flag with the following semantics:

MAP_DIRECT
When specified with MAP_SHARED a successful fault in this range
indicates that the kernel is maintaining the block map (user linear
address to file offset to physical address relationship) in a manner
that no external agent can observe any inconsistent changes. In other
words, the block map of the mapping is effectively pinned, or the kernel
is otherwise able to exchange a new physical extent atomically with
respect to any hardware / software agent. As implied by this definition
a successful fault in a MAP_DIRECT range bypasses kernel indirections
like the page-cache, and all updates are carried directly through to the
underlying file physical blocks (modulo cpu cache effects).

ETXTBSY may be returned to any third party operation on the file that
attempts to update the block map (allocate blocks / convert unwritten
extents / break shared extents). However, whether a filesystem returns
ETXTBSY for a certain state of the block relative to a MAP_DIRECT mapping
is filesystem and kernel version dependent.

Some filesystems may extend these operation restrictions outside the
mapped range and return ETXTBSY to any file operations that might mutate
the block map. MAP_DIRECT faults may fail with a SIGBUS if the
filesystem needs to write the block map to satisfy the fault, for
example, if the mapping was established over a hole in a sparse file.

ERRORS
EACCES A MAP_DIRECT mapping was requested and PROT_WRITE was not set,
or the requesting process is missing CAP_LINUX_IMMUTABLE.

EINVAL MAP_ANONYMOUS or MAP_PRIVATE was specified with MAP_DIRECT.

EOPNOTSUPP The filesystem explicitly does not support the flag.

SIGBUS Attempted to write a MAP_DIRECT mapping at a file offset that
might require block-map updates.
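
As a rough illustration of the ERRORS list above, the argument checks can be
modelled in plain userspace C. This is only a sketch and not the kernel code:
every sketch_ name is hypothetical, the flag values other than 0x80000 for
MAP_DIRECT are placeholders, and the CAP_LINUX_IMMUTABLE test is reduced to a
boolean argument.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Placeholder values for the model; only MAP_DIRECT's 0x80000 is from the patch. */
#define SKETCH_PROT_WRITE    0x2UL
#define SKETCH_MAP_SHARED    0x01UL
#define SKETCH_MAP_PRIVATE   0x02UL
#define SKETCH_MAP_ANONYMOUS 0x20UL
#define SKETCH_MAP_DIRECT    0x80000UL

/*
 * Model of the documented MAP_DIRECT argument checks.
 * Returns 0 when the request would be accepted, or a negative errno.
 */
int sketch_map_direct_check(unsigned long prot, unsigned long flags,
			    bool has_cap_linux_immutable)
{
	/* Requests without MAP_DIRECT are outside the scope of this model. */
	if (!(flags & SKETCH_MAP_DIRECT))
		return 0;

	/* EINVAL: MAP_ANONYMOUS or MAP_PRIVATE combined with MAP_DIRECT. */
	if (flags & (SKETCH_MAP_ANONYMOUS | SKETCH_MAP_PRIVATE))
		return -EINVAL;

	/* Assumption of this model: MAP_DIRECT is only meaningful with MAP_SHARED. */
	if (!(flags & SKETCH_MAP_SHARED))
		return -EINVAL;

	/* EACCES: PROT_WRITE was not requested. */
	if (!(prot & SKETCH_PROT_WRITE))
		return -EACCES;

	/* EACCES: caller lacks CAP_LINUX_IMMUTABLE. */
	if (!has_cap_linux_immutable)
		return -EACCES;

	return 0;
}
```

For example, a writable MAP_SHARED|MAP_DIRECT request from a privileged caller
passes, while the same request with MAP_PRIVATE is refused with -EINVAL
regardless of privilege.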

Cc: Jan Kara <ja...@suse.cz>
Cc: Jeff Moyer <jmo...@redhat.com>
Cc: Christoph Hellwig <h...@lst.de>
Cc: Dave Chinner <da...@fromorbit.com>
Cc: Alexander Viro <vi...@zeniv.linux.org.uk>
Cc: "Darrick J. Wong" <darric...@oracle.com>
Cc: Ross Zwisler <ross.z...@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.w...@intel.com>
---
fs/xfs/xfs_file.c | 115 ++++++++++++++++++++++++++++++++++++++-
fs/xfs/xfs_inode.h | 1
fs/xfs/xfs_super.c | 1
include/linux/mman.h | 13 +---
include/uapi/asm-generic/mman.h | 1
mm/mmap.c | 23 ++++++++
6 files changed, 139 insertions(+), 15 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index cacc0162a41a..9e21ae3777dd 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -40,6 +40,7 @@
#include "xfs_iomap.h"
#include "xfs_reflink.h"

+#include <linux/mman.h>
#include <linux/dcache.h>
#include <linux/falloc.h>
#include <linux/pagevec.h>
@@ -1001,6 +1002,25 @@ xfs_file_llseek(
return vfs_setpos(file, offset, inode->i_sb->s_maxbytes);
}

+static const struct vm_operations_struct xfs_file_vm_direct_ops;
+
+STATIC int
+xfs_vma_checks(
+ struct vm_area_struct *vma,
+ struct inode *inode)
+{
+ if (vma->vm_ops != &xfs_file_vm_direct_ops)
+ return 0;
+
+ if (xfs_is_reflink_inode(XFS_I(inode)))
+ return VM_FAULT_SIGBUS;
+
+ if (!IS_DAX(inode))
+ return VM_FAULT_SIGBUS;
+
+ return 0;
+}
+
/*
* Locking for serialisation of IO during page faults. This results in a lock
* ordering of:
@@ -1031,6 +1051,10 @@ xfs_filemap_page_mkwrite(
file_update_time(vmf->vma->vm_file);
xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);

+ ret = xfs_vma_checks(vmf->vma, inode);
+ if (ret)
+ goto out_unlock;
+
if (IS_DAX(inode)) {
ret = dax_iomap_fault(vmf, PE_SIZE_PTE, &xfs_iomap_ops);
} else {
@@ -1038,6 +1062,7 @@ xfs_filemap_page_mkwrite(
ret = block_page_mkwrite_return(ret);
}

+out_unlock:
xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
sb_end_pagefault(inode->i_sb);

@@ -1058,10 +1083,15 @@ xfs_filemap_fault(
return xfs_filemap_page_mkwrite(vmf);

xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
+ ret = xfs_vma_checks(vmf->vma, inode);
+ if (ret)
+ goto out_unlock;
+
if (IS_DAX(inode))
ret = dax_iomap_fault(vmf, PE_SIZE_PTE, &xfs_iomap_ops);
else
ret = filemap_fault(vmf);
+out_unlock:
xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);

return ret;
@@ -1094,7 +1124,9 @@ xfs_filemap_huge_fault(
}

xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
- ret = dax_iomap_fault(vmf, pe_size, &xfs_iomap_ops);
+ ret = xfs_vma_checks(vmf->vma, inode);
+ if (ret == 0)
+ ret = dax_iomap_fault(vmf, pe_size, &xfs_iomap_ops);
xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);

if (vmf->flags & FAULT_FLAG_WRITE)
@@ -1137,6 +1169,61 @@ xfs_filemap_pfn_mkwrite(

}

+STATIC void
+xfs_filemap_direct_open(
+ struct vm_area_struct *vma)
+{
+ struct file *filp = vma->vm_file;
+ struct inode *inode = file_inode(filp);
+ struct xfs_inode *ip = XFS_I(inode);
+
+ atomic_inc(&ip->i_mapdcount);
+}
+
+STATIC int
+atomic_dec_and_xfs_ilock(
+ atomic_t *atomic,
+ struct xfs_inode *ip,
+ uint lock_flags)
+{
+ /* Subtract 1 from counter unless that drops it to 0 (ie. it was 1) */
+ if (atomic_add_unless(atomic, -1, 1))
+ return 0;
+
+ /* Otherwise do it the slow way */
+ xfs_ilock(ip, lock_flags);
+ if (atomic_dec_and_test(atomic))
+ return 1;
+ xfs_iunlock(ip, lock_flags);
+ return 0;
+}
+
+STATIC void
+xfs_filemap_direct_close(
+ struct vm_area_struct *vma)
+{
+ struct file *filp = vma->vm_file;
+ struct inode *inode = file_inode(filp);
+ struct xfs_inode *ip = XFS_I(inode);
+
+ if (!atomic_dec_and_xfs_ilock(&ip->i_mapdcount, ip,
+ XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL))
+ return;
+ inode->i_flags &= ~S_IOMAP_SEALED;
+ xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
+}
+
+static const struct vm_operations_struct xfs_file_vm_direct_ops = {
+ .fault = xfs_filemap_fault,
+ .huge_fault = xfs_filemap_huge_fault,
+ .map_pages = filemap_map_pages,
+ .page_mkwrite = xfs_filemap_page_mkwrite,
+ .pfn_mkwrite = xfs_filemap_pfn_mkwrite,
+
+ .open = xfs_filemap_direct_open,
+ .close = xfs_filemap_direct_close,
+};
+
static const struct vm_operations_struct xfs_file_vm_ops = {
.fault = xfs_filemap_fault,
.huge_fault = xfs_filemap_huge_fault,
@@ -1145,14 +1232,33 @@ static const struct vm_operations_struct xfs_file_vm_ops = {
.pfn_mkwrite = xfs_filemap_pfn_mkwrite,
};

+#define XFS_MAP_SUPPORTED (LEGACY_MAP_SUPPORTED_MASK | MAP_DIRECT)
+
STATIC int
-xfs_file_mmap(struct file *filp, struct vm_area_struct *vma,
- unsigned long map_flags)
+xfs_file_mmap(
+ struct file *filp,
+ struct vm_area_struct *vma,
+ unsigned long map_flags)
{
+ struct inode *inode = file_inode(filp);
+ struct xfs_inode *ip = XFS_I(inode);
+
+ if (map_flags & ~(XFS_MAP_SUPPORTED))
+ return -EOPNOTSUPP;
+
file_accessed(filp);
- vma->vm_ops = &xfs_file_vm_ops;
if (IS_DAX(file_inode(filp)))
vma->vm_flags |= VM_MIXEDMAP | VM_HUGEPAGE;
+
+ xfs_ilock(ip, XFS_MMAPLOCK_EXCL|XFS_IOLOCK_EXCL);
+ if (map_flags & MAP_DIRECT) {
+ vma->vm_ops = &xfs_file_vm_direct_ops;
+ inode->i_flags |= S_IOMAP_SEALED;
+ atomic_inc(&ip->i_mapdcount);
+ } else
+ vma->vm_ops = &xfs_file_vm_ops;
+ xfs_iunlock(ip, XFS_MMAPLOCK_EXCL|XFS_IOLOCK_EXCL);
+
return 0;
}

@@ -1174,6 +1280,7 @@ const struct file_operations xfs_file_operations = {
.fallocate = xfs_file_fallocate,
.clone_file_range = xfs_file_clone_range,
.dedupe_file_range = xfs_file_dedupe_range,
+ .mmap_supported_mask = XFS_MAP_SUPPORTED,
};

const struct file_operations xfs_dir_file_operations = {
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 0ee453de239a..50d3e1bca1a9 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -58,6 +58,7 @@ typedef struct xfs_inode {
mrlock_t i_lock; /* inode lock */
mrlock_t i_mmaplock; /* inode mmap IO lock */
atomic_t i_pincount; /* inode pin count */
+ atomic_t i_mapdcount; /* inode MAP_DIRECT count */
spinlock_t i_flags_lock; /* inode i_flags lock */
/* Miscellaneous state. */
unsigned long i_flags; /* see defined flags below */
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 664db709cd1a..2604568354db 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1011,6 +1011,7 @@ xfs_fs_inode_init_once(

/* xfs inode */
atomic_set(&ip->i_pincount, 0);
+ atomic_set(&ip->i_mapdcount, 0);
spin_lock_init(&ip->i_flags_lock);

mrlock_init(&ip->i_mmaplock, MRLOCK_ALLOW_EQUAL_PRI|MRLOCK_BARRIER,
diff --git a/include/linux/mman.h b/include/linux/mman.h
index 0e1de42c836f..7c9e3d11027f 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -7,16 +7,6 @@
#include <linux/atomic.h>
#include <uapi/linux/mman.h>

-#ifndef MAP_32BIT
-#define MAP_32BIT 0
-#endif
-#ifndef MAP_HUGE_2MB
-#define MAP_HUGE_2MB 0
-#endif
-#ifndef MAP_HUGE_1GB
-#define MAP_HUGE_1GB 0
-#endif
-
/*
* The historical set of flags that all mmap implementations implicitly
* support when file_operations.mmap_supported_mask is zero.
@@ -39,7 +29,8 @@
| MAP_HUGE_2MB \
| MAP_HUGE_1GB)

-#define MAP_SUPPORTED_MASK (LEGACY_MAP_SUPPORTED_MASK)
+#define MAP_SUPPORTED_MASK (LEGACY_MAP_SUPPORTED_MASK \
+ | MAP_DIRECT)

extern int sysctl_overcommit_memory;
extern int sysctl_overcommit_ratio;
diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h
index 7162cd4cca73..1e7dda3bc56a 100644
--- a/include/uapi/asm-generic/mman.h
+++ b/include/uapi/asm-generic/mman.h
@@ -12,6 +12,7 @@
#define MAP_NONBLOCK 0x10000 /* do not block on IO */
#define MAP_STACK 0x20000 /* give out an address that is best suited for process/thread stacks */
#define MAP_HUGETLB 0x40000 /* create a huge page mapping */
+#define MAP_DIRECT 0x80000 /* shared, sealed, and no page cache */

/* Bits [26:31] are reserved, see mman-common.h for MAP_HUGETLB usage */

diff --git a/mm/mmap.c b/mm/mmap.c
index 386706831d67..32417b2a668c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1393,6 +1393,17 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
return -EACCES;

/*
+ * Require write access and the immutable
+ * capability for MAP_DIRECT mappings
+ */
+ if (flags & MAP_DIRECT) {
+ if (!(prot & PROT_WRITE))
+ return -EACCES;
+ if (!capable(CAP_LINUX_IMMUTABLE))
+ return -EACCES;
+ }
+
+ /*
* Make sure we don't allow writing to an append-only
* file..
*/
@@ -1411,6 +1422,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,

/* fall through */
case MAP_PRIVATE:
+ if ((flags & (MAP_PRIVATE|MAP_DIRECT))
+ == (MAP_PRIVATE|MAP_DIRECT))
+ return -EINVAL;
if (!(file->f_mode & FMODE_READ))
return -EACCES;
if (path_noexec(&file->f_path)) {
@@ -1448,6 +1462,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
default:
return -EINVAL;
}
+
+ if (flags & MAP_DIRECT)
+ return -EINVAL;
}

/*
@@ -1525,6 +1542,12 @@ SYSCALL_DEFINE6(mmap_pgoff_strict, unsigned long, addr, unsigned long, len,
unsigned long, prot, unsigned long, flags,
unsigned long, fd, unsigned long, pgoff)
{
+ /*
+ * since mmap flag definitions are spread over several files,
+ * sanity check new definitions here.
+ */
+ BUILD_BUG_ON((MAP_DIRECT & ~LEGACY_MAP_SUPPORTED_MASK) != MAP_DIRECT);
+
if (flags & ~(MAP_SUPPORTED_MASK))
return -EOPNOTSUPP;
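
For readers unfamiliar with the "decrement, and take the lock only on the final
put" idiom used by atomic_dec_and_xfs_ilock() in this patch, here is a minimal
userspace sketch. It assumes C11 atomics and a pthread mutex standing in for
the XFS inode locks; all sketch_ names are hypothetical and nothing here is the
XFS implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_inode {
	pthread_mutex_t lock;      /* stands in for the XFS mmap/IO locks */
	atomic_int      mapdcount; /* live MAP_DIRECT mappings */
	bool            sealed;    /* stands in for S_IOMAP_SEALED */
};

void sketch_inode_init(struct sketch_inode *ip)
{
	pthread_mutex_init(&ip->lock, NULL);
	atomic_init(&ip->mapdcount, 0);
	ip->sealed = false;
}

/* Decrement *v unless it equals 'unless'; returns true if it decremented. */
static bool sketch_dec_unless(atomic_int *v, int unless)
{
	int cur = atomic_load(v);

	while (cur != unless) {
		/* On failure 'cur' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &cur, cur - 1))
			return true;
	}
	return false;
}

/* Returns true, with the lock held, only when the count dropped to zero. */
static bool sketch_dec_and_lock(struct sketch_inode *ip)
{
	/* Fast path: not the last reference, no lock needed. */
	if (sketch_dec_unless(&ip->mapdcount, 1))
		return false;

	/* Slow path: we may be the last reference, so decide under the lock. */
	pthread_mutex_lock(&ip->lock);
	if (atomic_fetch_sub(&ip->mapdcount, 1) == 1)
		return true;
	pthread_mutex_unlock(&ip->lock);
	return false;
}

/* Establish a mapping: seal and count it under the lock. */
void sketch_map_direct(struct sketch_inode *ip)
{
	pthread_mutex_lock(&ip->lock);
	ip->sealed = true;
	atomic_fetch_add(&ip->mapdcount, 1);
	pthread_mutex_unlock(&ip->lock);
}

/* Tear down a mapping: the last one out clears the seal. */
void sketch_unmap_direct(struct sketch_inode *ip)
{
	assert(atomic_load(&ip->mapdcount) > 0);
	if (!sketch_dec_and_lock(ip))
		return;
	ip->sealed = false;
	pthread_mutex_unlock(&ip->lock);
}
```

The point of the slow path is that clearing the seal and a concurrent
sketch_map_direct() setting it are serialized by the same lock, so a new
mapping cannot have its seal cleared by a racing final unmap.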

Dan Williams

Aug 16, 2017, 4:00:08 AM
When a filesystem sees this flag set it will not allow changes to the
file-offset to physical-block-offset relationship of any extent in the
file. How much of the file the inode-wide S_IOMAP_SEALED flag covers is
filesystem specific. In other words it is similar to the inode-wide
XFS_DIFLAG2_REFLINK flag where we make the distinction apply globally to
the inode even though we could theoretically limit that effect to a
sub-range of the file.

The interface that sets this flag (mmap(..., MAP_DIRECT, ...)) will be
careful to document that it is implementation specific whether the
'sealed' restrictions apply to a sub-range or the whole file.
Applications should be prepared for unrelated ranges in the file to be
affected.

The term 'sealed' is used instead of 'immutable' to better indicate that
this is a file property that is temporary and can be undone.
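
To make the sealed semantics concrete, here is a small userspace model of two
of the generic checks: refusing writes that would extend the file, and refusing
any size change. It is a sketch under simplifying assumptions, not the VFS
code; the sketch_ names and the struct are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_file {
	int64_t size;   /* current file size in bytes */
	bool    sealed; /* stands in for S_IOMAP_SEALED */
};

/*
 * Model of the write-side check: a write that would end beyond the
 * current size must allocate blocks, so it is refused while sealed.
 * Returns the permitted byte count or -ETXTBSY.
 */
int64_t sketch_write_check(const struct sketch_file *f, int64_t pos,
			   int64_t count)
{
	assert(pos >= 0 && count >= 0);
	if (f->sealed && pos + count > f->size)
		return -ETXTBSY;
	return count;
}

/*
 * Model of the size-change check: growing may dirty metadata the
 * application will not sync, and shrinking may free blocks that are
 * still the target of in-flight DMA, so both are refused while sealed.
 */
int sketch_resize_check(const struct sketch_file *f, int64_t new_size)
{
	(void)new_size; /* every size change is refused while sealed */
	return f->sealed ? -ETXTBSY : 0;
}
```

Note that an overwrite entirely inside the current size passes this model even
when sealed; in the patch, in-place writes that still need a block-map update
are refused further down, in xfs_bmapi_write().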

Cc: Jan Kara <ja...@suse.cz>
Cc: Jeff Moyer <jmo...@redhat.com>
Cc: Christoph Hellwig <h...@lst.de>
Cc: Dave Chinner <da...@fromorbit.com>
Cc: Alexander Viro <vi...@zeniv.linux.org.uk>
Cc: "Darrick J. Wong" <darric...@oracle.com>
Cc: Ross Zwisler <ross.z...@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.w...@intel.com>
---
fs/attr.c | 10 ++++++++++
fs/open.c | 6 ++++++
fs/read_write.c | 3 +++
fs/xfs/libxfs/xfs_bmap.c | 5 +++++
fs/xfs/xfs_bmap_util.c | 3 +++
fs/xfs/xfs_ioctl.c | 6 ++++++
include/linux/fs.h | 2 ++
mm/filemap.c | 5 +++++
8 files changed, 40 insertions(+)

diff --git a/fs/attr.c b/fs/attr.c
index 135304146120..d940386e0ca9 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -112,6 +112,16 @@ EXPORT_SYMBOL(setattr_prepare);
*/
int inode_newsize_ok(const struct inode *inode, loff_t offset)
{
+ if (IS_IOMAP_SEALED(inode)) {
+ /*
+ * Any size change is disallowed. Size increases may
+ * dirty metadata that an application is not prepared to
+ * sync, and a size decrease may expose free blocks to
+ * in-flight DMA.
+ */
+ return -ETXTBSY;
+ }
+
if (inode->i_size < offset) {
unsigned long limit;

diff --git a/fs/open.c b/fs/open.c
index 35bb784763a4..92d89ec2d6b3 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -292,6 +292,12 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
return -ETXTBSY;

/*
+ * We cannot allow any allocation changes on an iomap sealed file
+ */
+ if (IS_IOMAP_SEALED(inode))
+ return -ETXTBSY;
+
+ /*
* Revalidate the write permissions, in case security policy has
* changed since the files were opened.
*/
diff --git a/fs/read_write.c b/fs/read_write.c
index 0cc7033aa413..55700ca85f7e 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1706,6 +1706,9 @@ int vfs_clone_file_prep_inodes(struct inode *inode_in, loff_t pos_in,
if (IS_SWAPFILE(inode_in) || IS_SWAPFILE(inode_out))
return -ETXTBSY;

+ if (IS_IOMAP_SEALED(inode_in) || IS_IOMAP_SEALED(inode_out))
+ return -ETXTBSY;
+
/* Don't reflink dirs, pipes, sockets... */
if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
return -EISDIR;
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index a2d64666cdd4..84d8ee9f414c 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4481,6 +4481,11 @@ xfs_bmapi_write(
if (XFS_FORCED_SHUTDOWN(mp))
return -EIO;

+ /* fail any attempts to mutate data extents */
+ if (IS_IOMAP_SEALED(VFS_I(ip))
+ && !(flags & (XFS_BMAPI_METADATA | XFS_BMAPI_ATTRFORK)))
+ return -ETXTBSY;
+
ifp = XFS_IFORK_PTR(ip, whichfork);

XFS_STATS_INC(mp, xs_blk_mapw);
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 93e955262d07..ef4c4e8b0f58 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1294,6 +1294,9 @@ xfs_free_file_space(

trace_xfs_free_file_space(ip);

+ if (IS_IOMAP_SEALED(VFS_I(ip)))
+ return -ETXTBSY;
+
error = xfs_qm_dqattach(ip, 0);
if (error)
return error;
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index e75c40a47b7d..b716d184ae9a 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1755,6 +1755,12 @@ xfs_ioc_swapext(
goto out_put_tmp_file;
}

+ if (IS_IOMAP_SEALED(file_inode(f.file)) ||
+ IS_IOMAP_SEALED(file_inode(tmp.file))) {
+ error = -EINVAL;
+ goto out_put_tmp_file;
+ }
+
/*
* We need to ensure that the fds passed in point to XFS inodes
* before we cast and access them as XFS structures as we have no
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4c6d0d9db8e3..405976022752 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1830,6 +1830,7 @@ struct super_operations {
#else
#define S_DAX 0 /* Make all the DAX code disappear */
#endif
+#define S_IOMAP_SEALED 16384 /* logical-to-physical extent map is fixed */

/*
* Note that nosuid etc flags are inode-specific: setting some file-system
@@ -1868,6 +1869,7 @@ struct super_operations {
#define IS_AUTOMOUNT(inode) ((inode)->i_flags & S_AUTOMOUNT)
#define IS_NOSEC(inode) ((inode)->i_flags & S_NOSEC)
#define IS_DAX(inode) ((inode)->i_flags & S_DAX)
+#define IS_IOMAP_SEALED(inode) ((inode)->i_flags & S_IOMAP_SEALED)

#define IS_WHITEOUT(inode) (S_ISCHR(inode->i_mode) && \
(inode)->i_rdev == WHITEOUT_DEV)
diff --git a/mm/filemap.c b/mm/filemap.c
index 2457e34d10e0..4cbcf9d589fa 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2810,6 +2810,11 @@ inline ssize_t generic_write_checks(struct kiocb *iocb, struct iov_iter *from)
if (unlikely(pos >= inode->i_sb->s_maxbytes))
return -EFBIG;

+ /* Are we about to mutate the block map on a sealed file? */
+ if (IS_IOMAP_SEALED(inode)
+ && (pos + iov_iter_count(from) > i_size_read(inode)))
+ return -ETXTBSY;
+
iov_iter_truncate(from, inode->i_sb->s_maxbytes - pos);
return iov_iter_count(from);
}

Dan Williams

Aug 16, 2017, 4:00:08 AM
The mmap(2) syscall suffers from the ABI anti-pattern of not validating
unknown flags. However, proposals like MAP_SYNC and MAP_DIRECT need a
mechanism to define new behavior that is known to fail on older kernels
without the support. Define a new mmap3 syscall that checks for
unsupported flags at syscall entry and add a 'mmap_supported_mask' to
'struct file_operations' so generic code can validate that the ->mmap()
handler knows about the specified flags. This also arranges for the
flags to be passed to the handler so it can do further local validation
of whether the requested behavior can be fulfilled.
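
The two-level validation described above can be sketched in userspace C as
follows. The mask values are illustrative placeholders (the real
LEGACY_MAP_SUPPORTED_MASK is built from the individual MAP_* flags), the
sketch_ names are hypothetical, and the model assumes a kernel that knows
exactly one new flag.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative masks: pretend all legacy flags live below bit 19. */
#define SKETCH_LEGACY_MASK        0x7ffffUL
#define SKETCH_MAP_DIRECT         0x80000UL
#define SKETCH_MAP_SUPPORTED_MASK (SKETCH_LEGACY_MASK | SKETCH_MAP_DIRECT)

/* A new flag must not alias any legacy flag bit. */
_Static_assert((SKETCH_MAP_DIRECT & SKETCH_LEGACY_MASK) == 0,
	       "new flag overlaps the legacy mask");

/* Stand-in for the one field of struct file_operations that matters here. */
struct sketch_fops {
	unsigned long mmap_supported_mask; /* 0 means "legacy flags only" */
};

/*
 * Model of the strict entry check. Pass fops == NULL for an anonymous
 * mapping. Returns 0 if the flags may proceed, else -EOPNOTSUPP.
 */
int sketch_mmap_strict_check(unsigned long flags,
			     const struct sketch_fops *fops)
{
	unsigned long f_supported;

	/* Level 1: does this kernel know every requested flag? */
	if (flags & ~SKETCH_MAP_SUPPORTED_MASK)
		return -EOPNOTSUPP;

	if (fops == NULL)
		return 0;

	/* Level 2: does this file's ->mmap() handler know them? */
	f_supported = fops->mmap_supported_mask;
	if (!f_supported)
		f_supported = SKETCH_LEGACY_MASK;
	if (flags & ~f_supported)
		return -EOPNOTSUPP;

	return 0;
}
```

A handler that never sets the mask keeps working for legacy flags but rejects
the new flag, which is the "known to fail" property the commit message asks
for.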

Cc: Jan Kara <ja...@suse.cz>
Cc: Arnd Bergmann <ar...@arndb.de>
Cc: Andrew Morton <ak...@linux-foundation.org>
Suggested-by: Andy Lutomirski <lu...@kernel.org>
Signed-off-by: Dan Williams <dan.j.w...@intel.com>
---
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
include/linux/fs.h | 3 ++-
include/linux/mm.h | 2 +-
include/linux/mman.h | 34 ++++++++++++++++++++++++++++++++
include/linux/syscalls.h | 3 +++
mm/mmap.c | 32 +++++++++++++++++++++++++++---
7 files changed, 71 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 448ac2161112..0618b5b38b45 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -391,3 +391,4 @@
382 i386 pkey_free sys_pkey_free
383 i386 statx sys_statx
384 i386 arch_prctl sys_arch_prctl compat_sys_arch_prctl
+385 i386 mmap3 sys_mmap_pgoff_strict
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 5aef183e2f85..e204c736d7e9 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -339,6 +339,7 @@
330 common pkey_alloc sys_pkey_alloc
331 common pkey_free sys_pkey_free
332 common statx sys_statx
+333 common mmap3 sys_mmap_pgoff_strict

#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 405976022752..db42da9f98c4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1674,6 +1674,7 @@ struct file_operations {
long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
int (*mmap) (struct file *, struct vm_area_struct *, unsigned long);
+ unsigned long mmap_supported_mask;
int (*open) (struct inode *, struct file *);
int (*flush) (struct file *, fl_owner_t id);
int (*release) (struct inode *, struct file *);
@@ -1746,7 +1747,7 @@ static inline ssize_t call_write_iter(struct file *file, struct kiocb *kio,
static inline int call_mmap(struct file *file, struct vm_area_struct *vma,
unsigned long flags)
{
- return file->f_op->mmap(file, vma, 0);
+ return file->f_op->mmap(file, vma, flags);
}

ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 46b9ac5e8569..49eef48da4b7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2090,7 +2090,7 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo

extern unsigned long mmap_region(struct file *file, unsigned long addr,
unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
- struct list_head *uf);
+ struct list_head *uf, unsigned long flags);
extern unsigned long do_mmap(struct file *file, unsigned long addr,
unsigned long len, unsigned long prot, unsigned long flags,
vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate,
diff --git a/include/linux/mman.h b/include/linux/mman.h
index c8367041fafd..0e1de42c836f 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -7,6 +7,40 @@
#include <linux/atomic.h>
#include <uapi/linux/mman.h>

+#ifndef MAP_32BIT
+#define MAP_32BIT 0
+#endif
+#ifndef MAP_HUGE_2MB
+#define MAP_HUGE_2MB 0
+#endif
+#ifndef MAP_HUGE_1GB
+#define MAP_HUGE_1GB 0
+#endif
+
+/*
+ * The historical set of flags that all mmap implementations implicitly
+ * support when file_operations.mmap_supported_mask is zero.
+ */
+#define LEGACY_MAP_SUPPORTED_MASK (MAP_SHARED \
+ | MAP_PRIVATE \
+ | MAP_FIXED \
+ | MAP_ANONYMOUS \
+ | MAP_UNINITIALIZED \
+ | MAP_GROWSDOWN \
+ | MAP_DENYWRITE \
+ | MAP_EXECUTABLE \
+ | MAP_LOCKED \
+ | MAP_NORESERVE \
+ | MAP_POPULATE \
+ | MAP_NONBLOCK \
+ | MAP_STACK \
+ | MAP_HUGETLB \
+ | MAP_32BIT \
+ | MAP_HUGE_2MB \
+ | MAP_HUGE_1GB)
+
+#define MAP_SUPPORTED_MASK (LEGACY_MAP_SUPPORTED_MASK)
+
extern int sysctl_overcommit_memory;
extern int sysctl_overcommit_ratio;
extern unsigned long sysctl_overcommit_kbytes;
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 3cb15ea48aee..c0e0c99cf4ad 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -858,6 +858,9 @@ asmlinkage long sys_perf_event_open(
asmlinkage long sys_mmap_pgoff(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags,
unsigned long fd, unsigned long pgoff);
+asmlinkage long sys_mmap_pgoff_strict(unsigned long addr, unsigned long len,
+ unsigned long prot, unsigned long flags,
+ unsigned long fd, unsigned long pgoff);
asmlinkage long sys_old_mmap(struct mmap_arg_struct __user *arg);
asmlinkage long sys_name_to_handle_at(int dfd, const char __user *name,
struct file_handle __user *handle,
diff --git a/mm/mmap.c b/mm/mmap.c
index 744faae86781..386706831d67 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1464,7 +1464,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
vm_flags |= VM_NORESERVE;
}

- addr = mmap_region(file, addr, len, vm_flags, pgoff, uf);
+ addr = mmap_region(file, addr, len, vm_flags, pgoff, uf, flags);
if (!IS_ERR_VALUE(addr) &&
((vm_flags & VM_LOCKED) ||
(flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE))
@@ -1521,6 +1521,32 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
return retval;
}

+SYSCALL_DEFINE6(mmap_pgoff_strict, unsigned long, addr, unsigned long, len,
+ unsigned long, prot, unsigned long, flags,
+ unsigned long, fd, unsigned long, pgoff)
+{
+ if (flags & ~(MAP_SUPPORTED_MASK))
+ return -EOPNOTSUPP;
+
+ if (!(flags & MAP_ANONYMOUS)) {
+ unsigned long f_supported;
+ struct file *file;
+
+ audit_mmap_fd(fd, flags);
+ file = fget(fd);
+ if (!file)
+ return -EBADF;
+ f_supported = file->f_op->mmap_supported_mask;
+ fput(file);
+ if (!f_supported)
+ f_supported = LEGACY_MAP_SUPPORTED_MASK;
+ if (flags & ~f_supported)
+ return -EOPNOTSUPP;
+ }
+
+ return sys_mmap_pgoff(addr, len, prot, flags, fd, pgoff);
+}
+
#ifdef __ARCH_WANT_SYS_OLD_MMAP
struct mmap_arg_struct {
unsigned long addr;
@@ -1601,7 +1627,7 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags)

unsigned long mmap_region(struct file *file, unsigned long addr,
unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
- struct list_head *uf)
+ struct list_head *uf, unsigned long flags)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma, *prev;
@@ -1686,7 +1712,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
* new file must not have been exposed to user-space, yet.
*/
vma->vm_file = get_file(file);
- error = call_mmap(file, vma, 0);
+ error = call_mmap(file, vma, flags);
if (error)
goto unmap_and_free_vma;
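
The two-level flag check in the mmap_pgoff_strict hunk above can be modelled in userspace. This is only a sketch: the X_* bit values, struct model_file and strict_check_flags() are hypothetical stand-ins rather than kernel definitions, and only the control flow follows the hunk (a global mask first, then a per-file mask that falls back to a legacy mask when the file does not set one).

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag bits for illustration; not the kernel's values. */
#define X_MAP_SHARED    0x01UL
#define X_MAP_PRIVATE   0x02UL
#define X_MAP_ANONYMOUS 0x20UL
#define X_MAP_DIRECT    0x100UL

/* Every flag the modelled core knows about. */
#define X_MAP_SUPPORTED_MASK \
	(X_MAP_SHARED | X_MAP_PRIVATE | X_MAP_ANONYMOUS | X_MAP_DIRECT)
/* What a file gets when it does not advertise its own mask. */
#define X_LEGACY_MAP_SUPPORTED_MASK \
	(X_MAP_SHARED | X_MAP_PRIVATE | X_MAP_ANONYMOUS)

/* Stand-in for the one file_operations field the check consults. */
struct model_file {
	unsigned long mmap_supported_mask; /* 0 means "not set" */
};

/* Return 0 if flags pass both checks, a negative errno otherwise. */
static int strict_check_flags(const struct model_file *file,
			      unsigned long flags)
{
	/* Level 1: reject anything outside the global mask. */
	if (flags & ~X_MAP_SUPPORTED_MASK)
		return -EOPNOTSUPP;

	/* Level 2: file-backed mappings must also satisfy the file. */
	if (!(flags & X_MAP_ANONYMOUS)) {
		unsigned long f_supported;

		if (!file)
			return -EBADF;
		f_supported = file->mmap_supported_mask;
		if (!f_supported)
			f_supported = X_LEGACY_MAP_SUPPORTED_MASK;
		if (flags & ~f_supported)
			return -EOPNOTSUPP;
	}
	return 0;
}
```

In this model a file that leaves its mask at 0 accepts the legacy flags but rejects X_MAP_DIRECT, which is the behavior the strict variant is meant to give callers that ask for a new flag.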

Dan Williams

Aug 16, 2017, 4:00:09 AM
There is no room for new vm_flags without requiring a 64-bit host
architecture. Also, the flags being proposed for new mmap behavior,
like MAP_SYNC and MAP_DIRECT, are only relevant to the filesystem; the
core mm can mostly ignore them. So, arrange for the mmap(2) flags to be
passed all the way through to ->mmap() so that the leaf implementation
can handle them.
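
As a rough illustration of what passing the flags through to the leaf means, here is a userspace model. All names (the model_* types, X_MAP_DIRECT, fs_mmap, drv_mmap) are hypothetical and only mimic the shape of the converted ->mmap() prototype: a handler that cares about a flag inspects map_flags, and every other handler ignores the new argument.

```c
/* Hypothetical flag bit for illustration; not the kernel's value. */
#define X_MAP_DIRECT 0x100UL

struct model_vma {
	int direct; /* set by a handler that honors X_MAP_DIRECT */
};

struct model_file;

/* Mimics the converted ->mmap() prototype with the extra argument. */
struct model_file_operations {
	int (*mmap)(struct model_file *file, struct model_vma *vma,
		    unsigned long map_flags);
};

struct model_file {
	const struct model_file_operations *f_op;
};

/* The generic layer does not interpret the flags; it forwards them. */
static int model_call_mmap(struct model_file *file, struct model_vma *vma,
			   unsigned long map_flags)
{
	return file->f_op->mmap(file, vma, map_flags);
}

/* A filesystem-style handler that acts on the flag it understands. */
static int fs_mmap(struct model_file *file, struct model_vma *vma,
		   unsigned long map_flags)
{
	(void)file;
	vma->direct = (map_flags & X_MAP_DIRECT) != 0;
	return 0;
}

/* A driver-style handler: the new argument is accepted and ignored. */
static int drv_mmap(struct model_file *file, struct model_vma *vma,
		    unsigned long map_flags)
{
	(void)file;
	(void)vma;
	(void)map_flags;
	return 0;
}

static const struct model_file_operations fs_fops = { .mmap = fs_mmap };
static const struct model_file_operations drv_fops = { .mmap = drv_mmap };
```

The point of the design is that no per-vma state is needed for the flag to reach the handler: it travels as a call argument, which is why no new vm_flags bit has to be found.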

This conversion was performed by the following semantic patch, as well
as manual edits to handle the non-local / shared mmap handlers
(drm_gem_mmap, radeon_mmap, vmw_mmap, nfs_file_mmap, generic_file_mmap,
generic_file_readonly_mmap, ast_mmap, mgag200_mmap), and the oddity that
is proc_reg_mmap.

// mmap_flags.cocci: add a 'map_flags' argument to 'mmap' in 'struct file_operations'
// usage: make coccicheck COCCI=mmap_flags.cocci MODE=patch

@ a @
identifier fn;
identifier ops;
@@

struct file_operations ops = { ..., .mmap = fn, ...};

@@
identifier a.fn;
identifier x, y;
@@

int
- fn(struct file *x, struct vm_area_struct *y)
+ fn(struct file *x, struct vm_area_struct *y, unsigned long map_flags)
{
...
}

@@
expression E1, E2;
@@

- generic_file_mmap(E1, E2)
+ generic_file_mmap(E1, E2, 0)

@@
expression E1, E2;
@@

- generic_file_readonly_mmap(E1, E2)
+ generic_file_readonly_mmap(E1, E2, 0)

@@
expression E1, E2;
@@

- call_mmap(E1, E2)
+ call_mmap(E1, E2, 0)

@@
expression E1, E2;
@@

- drm_gem_mmap(E1, E2)
+ drm_gem_mmap(E1, E2, 0)

@@
expression E1, E2;
@@

- radeon_mmap(E1, E2)
+ radeon_mmap(E1, E2, 0)

@@
identifier a.fn;
expression E1, E2;
@@

- fn(E1, E2)
+ fn(E1, E2, 0)
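
To make the effect of the call-site rules concrete, here is a small userspace model of a handler after conversion. The names (model_generic_mmap, model_wrapper_mmap, the model_* structs) are hypothetical; the sketch only shows that a wrapper converted this way grows the new parameter but hands a literal 0 to the helper it calls, so the helper sees no flags until the call site is changed by hand to forward map_flags.

```c
struct model_file {
	int unused;
};

struct model_vma {
	unsigned long seen_flags; /* records what the helper received */
};

/* Shared helper after conversion: it now takes map_flags. */
static int model_generic_mmap(struct model_file *file,
			      struct model_vma *vma,
			      unsigned long map_flags)
{
	(void)file;
	vma->seen_flags = map_flags;
	return 0;
}

/*
 * A wrapper as the mechanical conversion leaves it: the signature
 * gains map_flags, and the inner call is rewritten to pass 0.
 */
static int model_wrapper_mmap(struct model_file *file,
			      struct model_vma *vma,
			      unsigned long map_flags)
{
	(void)map_flags;
	return model_generic_mmap(file, vma, 0);
}
```

Calling model_wrapper_mmap() with a non-zero map_flags leaves seen_flags at 0 in this model, while calling model_generic_mmap() directly records the value passed in.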

Suggested-by: Jan Kara <ja...@suse.cz>
Cc: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.w...@intel.com>
---
arch/arc/kernel/arc_hostlink.c | 3 ++-
arch/powerpc/kernel/proc_powerpc.c | 3 ++-
arch/powerpc/kvm/book3s_64_vio.c | 3 ++-
arch/powerpc/platforms/cell/spufs/file.c | 21 ++++++++++++++-------
arch/powerpc/platforms/powernv/opal-prd.c | 3 ++-
arch/um/drivers/mmapper_kern.c | 3 ++-
drivers/android/binder.c | 3 ++-
drivers/char/agp/frontend.c | 3 ++-
drivers/char/bsr.c | 3 ++-
drivers/char/hpet.c | 6 ++++--
drivers/char/mbcs.c | 3 ++-
drivers/char/mem.c | 11 +++++++----
drivers/char/mspec.c | 9 ++++++---
drivers/char/uv_mmtimer.c | 6 ++++--
drivers/dax/device.c | 3 ++-
drivers/dma-buf/dma-buf.c | 4 +++-
drivers/firewire/core-cdev.c | 3 ++-
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 5 +++--
drivers/gpu/drm/arc/arcpgu_drv.c | 5 +++--
drivers/gpu/drm/ast/ast_drv.h | 3 ++-
drivers/gpu/drm/ast/ast_ttm.c | 3 ++-
drivers/gpu/drm/drm_gem.c | 3 ++-
drivers/gpu/drm/drm_gem_cma_helper.c | 2 +-
drivers/gpu/drm/etnaviv/etnaviv_gem.c | 2 +-
drivers/gpu/drm/exynos/exynos_drm_gem.c | 2 +-
drivers/gpu/drm/i810/i810_dma.c | 3 ++-
drivers/gpu/drm/i915/i915_gem_dmabuf.c | 2 +-
drivers/gpu/drm/mediatek/mtk_drm_gem.c | 2 +-
drivers/gpu/drm/mgag200/mgag200_drv.h | 3 ++-
drivers/gpu/drm/mgag200/mgag200_ttm.c | 3 ++-
drivers/gpu/drm/msm/msm_gem.c | 2 +-
drivers/gpu/drm/omapdrm/omap_gem.c | 2 +-
drivers/gpu/drm/radeon/radeon_drv.c | 3 ++-
drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 2 +-
drivers/gpu/drm/tegra/gem.c | 2 +-
drivers/gpu/drm/udl/udl_gem.c | 2 +-
drivers/gpu/drm/vc4/vc4_bo.c | 2 +-
drivers/gpu/drm/vgem/vgem_drv.c | 7 ++++---
drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 3 ++-
drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c | 3 ++-
drivers/hsi/clients/cmt_speech.c | 3 ++-
drivers/hwtracing/intel_th/msu.c | 3 ++-
drivers/hwtracing/stm/core.c | 3 ++-
drivers/infiniband/core/uverbs_main.c | 3 ++-
drivers/infiniband/hw/hfi1/file_ops.c | 6 ++++--
drivers/infiniband/hw/qib/qib_file_ops.c | 5 +++--
drivers/media/v4l2-core/v4l2-dev.c | 3 ++-
drivers/misc/aspeed-lpc-ctrl.c | 3 ++-
drivers/misc/cxl/file.c | 3 ++-
drivers/misc/genwqe/card_dev.c | 3 ++-
drivers/misc/mic/scif/scif_fd.c | 3 ++-
drivers/misc/mic/vop/vop_vringh.c | 3 ++-
drivers/misc/sgi-gru/grufile.c | 3 ++-
drivers/mtd/mtdchar.c | 3 ++-
drivers/pci/proc.c | 3 ++-
drivers/rapidio/devices/rio_mport_cdev.c | 3 ++-
drivers/sbus/char/flash.c | 3 ++-
drivers/sbus/char/jsflash.c | 3 ++-
drivers/scsi/cxlflash/superpipe.c | 3 ++-
drivers/scsi/sg.c | 3 ++-
drivers/staging/android/ashmem.c | 3 ++-
drivers/staging/comedi/comedi_fops.c | 3 ++-
drivers/staging/lustre/lustre/llite/llite_mmap.c | 2 +-
drivers/staging/vme/devices/vme_user.c | 3 ++-
drivers/uio/uio.c | 3 ++-
drivers/usb/core/devio.c | 3 ++-
drivers/usb/mon/mon_bin.c | 3 ++-
drivers/vfio/vfio.c | 7 +++++--
drivers/video/fbdev/core/fbmem.c | 3 ++-
drivers/video/fbdev/pxa3xx-gcu.c | 3 ++-
drivers/xen/gntalloc.c | 3 ++-
drivers/xen/gntdev.c | 3 ++-
drivers/xen/privcmd.c | 3 ++-
drivers/xen/xenbus/xenbus_dev_backend.c | 3 ++-
drivers/xen/xenfs/xenstored.c | 3 ++-
fs/9p/vfs_file.c | 10 ++++++----
fs/aio.c | 3 ++-
fs/btrfs/file.c | 3 ++-
fs/cifs/file.c | 4 ++--
fs/coda/file.c | 5 +++--
fs/ecryptfs/file.c | 5 +++--
fs/ext2/file.c | 5 +++--
fs/ext4/file.c | 3 ++-
fs/f2fs/file.c | 3 ++-
fs/fuse/file.c | 8 +++++---
fs/gfs2/file.c | 3 ++-
fs/hugetlbfs/inode.c | 3 ++-
fs/kernfs/file.c | 3 ++-
fs/nfs/file.c | 5 +++--
fs/nfs/internal.h | 2 +-
fs/nilfs2/file.c | 3 ++-
fs/orangefs/file.c | 5 +++--
fs/proc/inode.c | 7 ++++---
fs/proc/vmcore.c | 6 ++++--
fs/ramfs/file-nommu.c | 6 ++++--
fs/romfs/mmap-nommu.c | 3 ++-
fs/ubifs/file.c | 5 +++--
fs/xfs/xfs_file.c | 5 ++---
include/drm/drm_gem.h | 3 ++-
include/linux/fs.h | 13 ++++++++-----
ipc/shm.c | 5 +++--
kernel/events/core.c | 3 ++-
kernel/kcov.c | 3 ++-
kernel/relay.c | 3 ++-
mm/filemap.c | 14 +++++++++-----
mm/mmap.c | 2 +-
mm/nommu.c | 4 ++--
mm/shmem.c | 3 ++-
net/socket.c | 6 ++++--
security/selinux/selinuxfs.c | 6 ++++--
sound/core/compress_offload.c | 3 ++-
sound/core/hwdep.c | 3 ++-
sound/core/info.c | 3 ++-
sound/core/init.c | 3 ++-
sound/core/oss/pcm_oss.c | 3 ++-
sound/oss/soundcard.c | 3 ++-
sound/oss/swarm_cs4297a.c | 3 ++-
virt/kvm/kvm_main.c | 3 ++-
118 files changed, 295 insertions(+), 168 deletions(-)

diff --git a/arch/arc/kernel/arc_hostlink.c b/arch/arc/kernel/arc_hostlink.c
index 47b2a17cc52a..09398a953cca 100644
--- a/arch/arc/kernel/arc_hostlink.c
+++ b/arch/arc/kernel/arc_hostlink.c
@@ -18,7 +18,8 @@

static unsigned char __HOSTLINK__[4 * PAGE_SIZE] __aligned(PAGE_SIZE);

-static int arc_hl_mmap(struct file *fp, struct vm_area_struct *vma)
+static int arc_hl_mmap(struct file *fp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

diff --git a/arch/powerpc/kernel/proc_powerpc.c b/arch/powerpc/kernel/proc_powerpc.c
index 56548bf6231f..77ba2cc4be66 100644
--- a/arch/powerpc/kernel/proc_powerpc.c
+++ b/arch/powerpc/kernel/proc_powerpc.c
@@ -41,7 +41,8 @@ static ssize_t page_map_read( struct file *file, char __user *buf, size_t nbytes
PDE_DATA(file_inode(file)), PAGE_SIZE);
}

-static int page_map_mmap( struct file *file, struct vm_area_struct *vma )
+static int page_map_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
if ((vma->vm_end - vma->vm_start) > PAGE_SIZE)
return -EINVAL;
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index a160c14304eb..79147b5b014c 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -255,7 +255,8 @@ static const struct vm_operations_struct kvm_spapr_tce_vm_ops = {
.fault = kvm_spapr_tce_fault,
};

-static int kvm_spapr_tce_mmap(struct file *file, struct vm_area_struct *vma)
+static int kvm_spapr_tce_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
vma->vm_ops = &kvm_spapr_tce_vm_ops;
return 0;
diff --git a/arch/powerpc/platforms/cell/spufs/file.c b/arch/powerpc/platforms/cell/spufs/file.c
index ae2f740a82f1..e785e96707cf 100644
--- a/arch/powerpc/platforms/cell/spufs/file.c
+++ b/arch/powerpc/platforms/cell/spufs/file.c
@@ -291,7 +291,8 @@ static const struct vm_operations_struct spufs_mem_mmap_vmops = {
.access = spufs_mem_mmap_access,
};

-static int spufs_mem_mmap(struct file *file, struct vm_area_struct *vma)
+static int spufs_mem_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
if (!(vma->vm_flags & VM_SHARED))
return -EINVAL;
@@ -379,7 +380,8 @@ static const struct vm_operations_struct spufs_cntl_mmap_vmops = {
/*
* mmap support for problem state control area [0x4000 - 0x4fff].
*/
-static int spufs_cntl_mmap(struct file *file, struct vm_area_struct *vma)
+static int spufs_cntl_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
if (!(vma->vm_flags & VM_SHARED))
return -EINVAL;
@@ -1059,7 +1061,8 @@ static const struct vm_operations_struct spufs_signal1_mmap_vmops = {
.fault = spufs_signal1_mmap_fault,
};

-static int spufs_signal1_mmap(struct file *file, struct vm_area_struct *vma)
+static int spufs_signal1_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
if (!(vma->vm_flags & VM_SHARED))
return -EINVAL;
@@ -1197,7 +1200,8 @@ static const struct vm_operations_struct spufs_signal2_mmap_vmops = {
.fault = spufs_signal2_mmap_fault,
};

-static int spufs_signal2_mmap(struct file *file, struct vm_area_struct *vma)
+static int spufs_signal2_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
if (!(vma->vm_flags & VM_SHARED))
return -EINVAL;
@@ -1320,7 +1324,8 @@ static const struct vm_operations_struct spufs_mss_mmap_vmops = {
/*
* mmap support for problem state MFC DMA area [0x0000 - 0x0fff].
*/
-static int spufs_mss_mmap(struct file *file, struct vm_area_struct *vma)
+static int spufs_mss_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
if (!(vma->vm_flags & VM_SHARED))
return -EINVAL;
@@ -1382,7 +1387,8 @@ static const struct vm_operations_struct spufs_psmap_mmap_vmops = {
/*
* mmap support for full problem state area [0x00000 - 0x1ffff].
*/
-static int spufs_psmap_mmap(struct file *file, struct vm_area_struct *vma)
+static int spufs_psmap_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
if (!(vma->vm_flags & VM_SHARED))
return -EINVAL;
@@ -1442,7 +1448,8 @@ static const struct vm_operations_struct spufs_mfc_mmap_vmops = {
/*
* mmap support for problem state MFC DMA area [0x0000 - 0x0fff].
*/
-static int spufs_mfc_mmap(struct file *file, struct vm_area_struct *vma)
+static int spufs_mfc_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
if (!(vma->vm_flags & VM_SHARED))
return -EINVAL;
diff --git a/arch/powerpc/platforms/powernv/opal-prd.c b/arch/powerpc/platforms/powernv/opal-prd.c
index 2d6ee1c5ad85..5a4ee5d6f223 100644
--- a/arch/powerpc/platforms/powernv/opal-prd.c
+++ b/arch/powerpc/platforms/powernv/opal-prd.c
@@ -109,7 +109,8 @@ static int opal_prd_open(struct inode *inode, struct file *file)
* @vma: VMA to map the registers into
*/

-static int opal_prd_mmap(struct file *file, struct vm_area_struct *vma)
+static int opal_prd_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
size_t addr, size;
pgprot_t page_prot;
diff --git a/arch/um/drivers/mmapper_kern.c b/arch/um/drivers/mmapper_kern.c
index 3645fcb2a787..046eb23602a2 100644
--- a/arch/um/drivers/mmapper_kern.c
+++ b/arch/um/drivers/mmapper_kern.c
@@ -45,7 +45,8 @@ static long mmapper_ioctl(struct file *file, unsigned int cmd, unsigned long arg
return -ENOIOCTLCMD;
}

-static int mmapper_mmap(struct file *file, struct vm_area_struct *vma)
+static int mmapper_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
int ret = -EINVAL;
int size;
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index f7665c31feca..f105e2a9d39b 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -3354,7 +3354,8 @@ static const struct vm_operations_struct binder_vm_ops = {
.fault = binder_vm_fault,
};

-static int binder_mmap(struct file *filp, struct vm_area_struct *vma)
+static int binder_mmap(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
int ret;
struct vm_struct *area;
diff --git a/drivers/char/agp/frontend.c b/drivers/char/agp/frontend.c
index f6955888e676..c39b90e26c76 100644
--- a/drivers/char/agp/frontend.c
+++ b/drivers/char/agp/frontend.c
@@ -562,7 +562,8 @@ int agp_remove_client(pid_t id)

/* File Operations */

-static int agp_mmap(struct file *file, struct vm_area_struct *vma)
+static int agp_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
unsigned int size, current_size;
unsigned long offset;
diff --git a/drivers/char/bsr.c b/drivers/char/bsr.c
index a6cef548e01e..93ec4c6f029e 100644
--- a/drivers/char/bsr.c
+++ b/drivers/char/bsr.c
@@ -122,7 +122,8 @@ static struct attribute *bsr_dev_attrs[] = {
};
ATTRIBUTE_GROUPS(bsr_dev);

-static int bsr_mmap(struct file *filp, struct vm_area_struct *vma)
+static int bsr_mmap(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
unsigned long size = vma->vm_end - vma->vm_start;
struct bsr_dev *dev = filp->private_data;
diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c
index b941e6d59fd6..e817c1b6c52d 100644
--- a/drivers/char/hpet.c
+++ b/drivers/char/hpet.c
@@ -379,7 +379,8 @@ static __init int hpet_mmap_enable(char *str)
}
__setup("hpet_mmap", hpet_mmap_enable);

-static int hpet_mmap(struct file *file, struct vm_area_struct *vma)
+static int hpet_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct hpet_dev *devp;
unsigned long addr;
@@ -397,7 +398,8 @@ static int hpet_mmap(struct file *file, struct vm_area_struct *vma)
return vm_iomap_memory(vma, addr, PAGE_SIZE);
}
#else
-static int hpet_mmap(struct file *file, struct vm_area_struct *vma)
+static int hpet_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
return -ENOSYS;
}
diff --git a/drivers/char/mbcs.c b/drivers/char/mbcs.c
index 8c9216a0f62e..2cd165571039 100644
--- a/drivers/char/mbcs.c
+++ b/drivers/char/mbcs.c
@@ -475,7 +475,8 @@ static void mbcs_gscr_pioaddr_set(struct mbcs_soft *soft)
soft->gscr_addr = mbcs_pioaddr(soft, MBCS_GSCR_START);
}

-static int mbcs_gscr_mmap(struct file *fp, struct vm_area_struct *vma)
+static int mbcs_gscr_mmap(struct file *fp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct cx_dev *cx_dev = fp->private_data;
struct mbcs_soft *soft = cx_dev->soft;
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 593a8818aca9..e786e1920f3a 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -337,7 +337,8 @@ static const struct vm_operations_struct mmap_mem_ops = {
#endif
};

-static int mmap_mem(struct file *file, struct vm_area_struct *vma)
+static int mmap_mem(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
size_t size = vma->vm_end - vma->vm_start;
phys_addr_t offset = (phys_addr_t)vma->vm_pgoff << PAGE_SHIFT;
@@ -376,7 +377,8 @@ static int mmap_mem(struct file *file, struct vm_area_struct *vma)
return 0;
}

-static int mmap_kmem(struct file *file, struct vm_area_struct *vma)
+static int mmap_kmem(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
unsigned long pfn;

@@ -394,7 +396,7 @@ static int mmap_kmem(struct file *file, struct vm_area_struct *vma)
return -EIO;

vma->vm_pgoff = pfn;
- return mmap_mem(file, vma);
+ return mmap_mem(file, vma, 0);
}

/*
@@ -679,7 +681,8 @@ static ssize_t read_iter_zero(struct kiocb *iocb, struct iov_iter *iter)
return written;
}

-static int mmap_zero(struct file *file, struct vm_area_struct *vma)
+static int mmap_zero(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
#ifndef CONFIG_MMU
return -ENOSYS;
diff --git a/drivers/char/mspec.c b/drivers/char/mspec.c
index 7b75669d3670..a3496304c4ef 100644
--- a/drivers/char/mspec.c
+++ b/drivers/char/mspec.c
@@ -287,19 +287,22 @@ mspec_mmap(struct file *file, struct vm_area_struct *vma,
}

static int
-fetchop_mmap(struct file *file, struct vm_area_struct *vma)
+fetchop_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
return mspec_mmap(file, vma, MSPEC_FETCHOP);
}

static int
-cached_mmap(struct file *file, struct vm_area_struct *vma)
+cached_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
return mspec_mmap(file, vma, MSPEC_CACHED);
}

static int
-uncached_mmap(struct file *file, struct vm_area_struct *vma)
+uncached_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
return mspec_mmap(file, vma, MSPEC_UNCACHED);
}
diff --git a/drivers/char/uv_mmtimer.c b/drivers/char/uv_mmtimer.c
index 956ebe2080a5..c95e68ec2ca2 100644
--- a/drivers/char/uv_mmtimer.c
+++ b/drivers/char/uv_mmtimer.c
@@ -40,7 +40,8 @@ MODULE_LICENSE("GPL");

static long uv_mmtimer_ioctl(struct file *file, unsigned int cmd,
unsigned long arg);
-static int uv_mmtimer_mmap(struct file *file, struct vm_area_struct *vma);
+static int uv_mmtimer_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags);

/*
* Period in femtoseconds (10^-15 s)
@@ -144,7 +145,8 @@ static long uv_mmtimer_ioctl(struct file *file, unsigned int cmd,
* Calls remap_pfn_range() to map the clock's registers into
* the calling process' address space.
*/
-static int uv_mmtimer_mmap(struct file *file, struct vm_area_struct *vma)
+static int uv_mmtimer_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
unsigned long uv_mmtimer_addr;

diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index e9f3b3e4bbf4..52aa8c80f786 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -432,7 +432,8 @@ static const struct vm_operations_struct dax_vm_ops = {
.huge_fault = dev_dax_huge_fault,
};

-static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
+static int dax_mmap(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct dev_dax *dev_dax = filp->private_data;
int rc, id;
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 4a038dcf5361..41aab156fc18 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -81,7 +81,9 @@ static int dma_buf_release(struct inode *inode, struct file *file)
return 0;
}

-static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma)
+static int dma_buf_mmap_internal(struct file *file,
+ struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct dma_buf *dmabuf;

diff --git a/drivers/firewire/core-cdev.c b/drivers/firewire/core-cdev.c
index a301fcf46e88..07b8983d31ff 100644
--- a/drivers/firewire/core-cdev.c
+++ b/drivers/firewire/core-cdev.c
@@ -1667,7 +1667,8 @@ static long fw_device_op_compat_ioctl(struct file *file,
}
#endif

-static int fw_device_op_mmap(struct file *file, struct vm_area_struct *vma)
+static int fw_device_op_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct client *client = file->private_data;
unsigned long size;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 6316aad43a73..483a11e530f9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -39,7 +39,7 @@

static long kfd_ioctl(struct file *, unsigned int, unsigned long);
static int kfd_open(struct inode *, struct file *);
-static int kfd_mmap(struct file *, struct vm_area_struct *);
+static int kfd_mmap(struct file *, struct vm_area_struct *, unsigned long);

static const char kfd_dev_name[] = "kfd";

@@ -991,7 +991,8 @@ static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
return retcode;
}

-static int kfd_mmap(struct file *filp, struct vm_area_struct *vma)
+static int kfd_mmap(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct kfd_process *process;

diff --git a/drivers/gpu/drm/arc/arcpgu_drv.c b/drivers/gpu/drm/arc/arcpgu_drv.c
index 3e43a5d4fb09..e816e53c95ec 100644
--- a/drivers/gpu/drm/arc/arcpgu_drv.c
+++ b/drivers/gpu/drm/arc/arcpgu_drv.c
@@ -48,11 +48,12 @@ static void arcpgu_setup_mode_config(struct drm_device *drm)
drm->mode_config.funcs = &arcpgu_drm_modecfg_funcs;
}

-static int arcpgu_gem_mmap(struct file *filp, struct vm_area_struct *vma)
+static int arcpgu_gem_mmap(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
int ret;

- ret = drm_gem_mmap(filp, vma);
+ ret = drm_gem_mmap(filp, vma, 0);
if (ret)
return ret;

diff --git a/drivers/gpu/drm/ast/ast_drv.h b/drivers/gpu/drm/ast/ast_drv.h
index 8880f0b62e9c..b9b4e16a196b 100644
--- a/drivers/gpu/drm/ast/ast_drv.h
+++ b/drivers/gpu/drm/ast/ast_drv.h
@@ -391,7 +391,8 @@ static inline void ast_bo_unreserve(struct ast_bo *bo)

void ast_ttm_placement(struct ast_bo *bo, int domain);
int ast_bo_push_sysram(struct ast_bo *bo);
-int ast_mmap(struct file *filp, struct vm_area_struct *vma);
+int ast_mmap(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags);

/* ast post */
void ast_enable_vga(struct drm_device *dev);
diff --git a/drivers/gpu/drm/ast/ast_ttm.c b/drivers/gpu/drm/ast/ast_ttm.c
index 58084985e6cf..11d8be1af8e3 100644
--- a/drivers/gpu/drm/ast/ast_ttm.c
+++ b/drivers/gpu/drm/ast/ast_ttm.c
@@ -420,7 +420,8 @@ int ast_bo_push_sysram(struct ast_bo *bo)
return 0;
}

-int ast_mmap(struct file *filp, struct vm_area_struct *vma)
+int ast_mmap(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct drm_file *file_priv;
struct ast_private *ast;
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 8dc11064253d..18cf11227af7 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -956,7 +956,8 @@ EXPORT_SYMBOL(drm_gem_mmap_obj);
* If the caller is not granted access to the buffer object, the mmap will fail
* with EACCES. Please see the vma manager for more information.
*/
-int drm_gem_mmap(struct file *filp, struct vm_area_struct *vma)
+int drm_gem_mmap(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct drm_file *priv = filp->private_data;
struct drm_device *dev = priv->minor->dev;
diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
index bc28e7575254..f432d76b6cf8 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -350,7 +350,7 @@ int drm_gem_cma_mmap(struct file *filp, struct vm_area_struct *vma)
struct drm_gem_object *gem_obj;
int ret;

- ret = drm_gem_mmap(filp, vma);
+ ret = drm_gem_mmap(filp, vma, 0);
if (ret)
return ret;

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
index 9a3bea738330..e612ff948cde 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -167,7 +167,7 @@ int etnaviv_gem_mmap(struct file *filp, struct vm_area_struct *vma)
struct etnaviv_gem_object *obj;
int ret;

- ret = drm_gem_mmap(filp, vma);
+ ret = drm_gem_mmap(filp, vma, 0);
if (ret) {
DBG("mmap failed: %d", ret);
return ret;
diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c b/drivers/gpu/drm/exynos/exynos_drm_gem.c
index c23479be4850..73359382f218 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_gem.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c
@@ -517,7 +517,7 @@ int exynos_drm_gem_mmap(struct file *filp, struct vm_area_struct *vma)
int ret;

/* set vm_area_struct. */
- ret = drm_gem_mmap(filp, vma);
+ ret = drm_gem_mmap(filp, vma, 0);
if (ret < 0) {
DRM_ERROR("failed to mmap.\n");
return ret;
diff --git a/drivers/gpu/drm/i810/i810_dma.c b/drivers/gpu/drm/i810/i810_dma.c
index 576a417690d4..c7ff2e7072ca 100644
--- a/drivers/gpu/drm/i810/i810_dma.c
+++ b/drivers/gpu/drm/i810/i810_dma.c
@@ -84,7 +84,8 @@ static int i810_freelist_put(struct drm_device *dev, struct drm_buf *buf)
return 0;
}

-static int i810_mmap_buffers(struct file *filp, struct vm_area_struct *vma)
+static int i810_mmap_buffers(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct drm_file *priv = filp->private_data;
struct drm_device *dev;
diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index 6176e589cf09..296cd09dd3aa 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -165,7 +165,7 @@ static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *
if (!obj->base.filp)
return -ENODEV;

- ret = call_mmap(obj->base.filp, vma);
+ ret = call_mmap(obj->base.filp, vma, 0);
if (ret)
return ret;

diff --git a/drivers/gpu/drm/mediatek/mtk_drm_gem.c b/drivers/gpu/drm/mediatek/mtk_drm_gem.c
index 7abc550ebc00..77667f2ef4e3 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_gem.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_gem.c
@@ -195,7 +195,7 @@ int mtk_drm_gem_mmap(struct file *filp, struct vm_area_struct *vma)
struct drm_gem_object *obj;
int ret;

- ret = drm_gem_mmap(filp, vma);
+ ret = drm_gem_mmap(filp, vma, 0);
if (ret)
return ret;

diff --git a/drivers/gpu/drm/mgag200/mgag200_drv.h b/drivers/gpu/drm/mgag200/mgag200_drv.h
index c88b6ec88dd2..42317765c982 100644
--- a/drivers/gpu/drm/mgag200/mgag200_drv.h
+++ b/drivers/gpu/drm/mgag200/mgag200_drv.h
@@ -301,7 +301,8 @@ int mgag200_bo_create(struct drm_device *dev, int size, int align,
uint32_t flags, struct mgag200_bo **pastbo);
int mgag200_mm_init(struct mga_device *mdev);
void mgag200_mm_fini(struct mga_device *mdev);
-int mgag200_mmap(struct file *filp, struct vm_area_struct *vma);
+int mgag200_mmap(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags);
int mgag200_bo_pin(struct mgag200_bo *bo, u32 pl_flag, u64 *gpu_addr);
int mgag200_bo_unpin(struct mgag200_bo *bo);
int mgag200_bo_push_sysram(struct mgag200_bo *bo);
diff --git a/drivers/gpu/drm/mgag200/mgag200_ttm.c b/drivers/gpu/drm/mgag200/mgag200_ttm.c
index 3e7e1cd31395..8d850fb0bbe3 100644
--- a/drivers/gpu/drm/mgag200/mgag200_ttm.c
+++ b/drivers/gpu/drm/mgag200/mgag200_ttm.c
@@ -418,7 +418,8 @@ int mgag200_bo_push_sysram(struct mgag200_bo *bo)
return 0;
}

-int mgag200_mmap(struct file *filp, struct vm_area_struct *vma)
+int mgag200_mmap(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct drm_file *file_priv;
struct mga_device *mdev;
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 65f35544c1ec..5ff750e28a95 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -202,7 +202,7 @@ int msm_gem_mmap(struct file *filp, struct vm_area_struct *vma)
{
int ret;

- ret = drm_gem_mmap(filp, vma);
+ ret = drm_gem_mmap(filp, vma, 0);
if (ret) {
DBG("mmap failed: %d", ret);
return ret;
diff --git a/drivers/gpu/drm/omapdrm/omap_gem.c b/drivers/gpu/drm/omapdrm/omap_gem.c
index 5c5c86ddd6f4..a975b186668f 100644
--- a/drivers/gpu/drm/omapdrm/omap_gem.c
+++ b/drivers/gpu/drm/omapdrm/omap_gem.c
@@ -565,7 +565,7 @@ int omap_gem_mmap(struct file *filp, struct vm_area_struct *vma)
{
int ret;

- ret = drm_gem_mmap(filp, vma);
+ ret = drm_gem_mmap(filp, vma, 0);
if (ret) {
DBG("mmap failed: %d", ret);
return ret;
diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
index 74abd161237b..4e1f63f36bff 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -135,7 +135,8 @@ extern int radeon_get_crtc_scanoutpos(struct drm_device *dev, unsigned int crtc,
extern bool radeon_is_px(struct drm_device *dev);
extern const struct drm_ioctl_desc radeon_ioctls_kms[];
extern int radeon_max_kms_ioctl;
-int radeon_mmap(struct file *filp, struct vm_area_struct *vma);
+int radeon_mmap(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags);
int radeon_mode_dumb_mmap(struct drm_file *filp,
struct drm_device *dev,
uint32_t handle, uint64_t *offset_p);
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
index b74ac717e56a..aae52d73ebee 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
@@ -293,7 +293,7 @@ int rockchip_gem_mmap(struct file *filp, struct vm_area_struct *vma)
struct drm_gem_object *obj;
int ret;

- ret = drm_gem_mmap(filp, vma);
+ ret = drm_gem_mmap(filp, vma, 0);
if (ret)
return ret;

diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c
index 7a39a355678a..143880e16273 100644
--- a/drivers/gpu/drm/tegra/gem.c
+++ b/drivers/gpu/drm/tegra/gem.c
@@ -487,7 +487,7 @@ int tegra_drm_mmap(struct file *file, struct vm_area_struct *vma)
struct tegra_bo *bo;
int ret;

- ret = drm_gem_mmap(file, vma);
+ ret = drm_gem_mmap(file, vma, 0);
if (ret)
return ret;

diff --git a/drivers/gpu/drm/udl/udl_gem.c b/drivers/gpu/drm/udl/udl_gem.c
index db9ceceba30e..07a3511ae3ef 100644
--- a/drivers/gpu/drm/udl/udl_gem.c
+++ b/drivers/gpu/drm/udl/udl_gem.c
@@ -88,7 +88,7 @@ int udl_drm_gem_mmap(struct file *filp, struct vm_area_struct *vma)
{
int ret;

- ret = drm_gem_mmap(filp, vma);
+ ret = drm_gem_mmap(filp, vma, 0);
if (ret)
return ret;

diff --git a/drivers/gpu/drm/vc4/vc4_bo.c b/drivers/gpu/drm/vc4/vc4_bo.c
index 487f96412d35..38b365db5f89 100644
--- a/drivers/gpu/drm/vc4/vc4_bo.c
+++ b/drivers/gpu/drm/vc4/vc4_bo.c
@@ -402,7 +402,7 @@ int vc4_mmap(struct file *filp, struct vm_area_struct *vma)
struct vc4_bo *bo;
int ret;

- ret = drm_gem_mmap(filp, vma);
+ ret = drm_gem_mmap(filp, vma, 0);
if (ret)
return ret;

diff --git a/drivers/gpu/drm/vgem/vgem_drv.c b/drivers/gpu/drm/vgem/vgem_drv.c
index 18f401b442c2..b0221f2a2ada 100644
--- a/drivers/gpu/drm/vgem/vgem_drv.c
+++ b/drivers/gpu/drm/vgem/vgem_drv.c
@@ -248,12 +248,13 @@ static struct drm_ioctl_desc vgem_ioctls[] = {
DRM_IOCTL_DEF_DRV(VGEM_FENCE_SIGNAL, vgem_fence_signal_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
};

-static int vgem_mmap(struct file *filp, struct vm_area_struct *vma)
+static int vgem_mmap(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
unsigned long flags = vma->vm_flags;
int ret;

- ret = drm_gem_mmap(filp, vma);
+ ret = drm_gem_mmap(filp, vma, 0);
if (ret)
return ret;

@@ -370,7 +371,7 @@ static int vgem_prime_mmap(struct drm_gem_object *obj,
if (!obj->filp)
return -ENODEV;

- ret = call_mmap(obj->filp, vma);
+ ret = call_mmap(obj->filp, vma, 0);
if (ret)
return ret;

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
index 4b948fba9eec..5e0216ac6a5a 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
@@ -742,7 +742,8 @@ extern int vmw_fifo_flush(struct vmw_private *dev_priv,

extern int vmw_ttm_global_init(struct vmw_private *dev_priv);
extern void vmw_ttm_global_release(struct vmw_private *dev_priv);
-extern int vmw_mmap(struct file *filp, struct vm_area_struct *vma);
+extern int vmw_mmap(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags);

/**
* TTM buffer object driver - vmwgfx_buffer.c
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c
index e771091d2cd3..0bb831de1f33 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c
@@ -28,7 +28,8 @@
#include <drm/drmP.h>
#include "vmwgfx_drv.h"

-int vmw_mmap(struct file *filp, struct vm_area_struct *vma)
+int vmw_mmap(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct drm_file *file_priv;
struct vmw_private *dev_priv;
diff --git a/drivers/hsi/clients/cmt_speech.c b/drivers/hsi/clients/cmt_speech.c
index 727f968ac1cb..507499044727 100644
--- a/drivers/hsi/clients/cmt_speech.c
+++ b/drivers/hsi/clients/cmt_speech.c
@@ -1270,7 +1270,8 @@ static long cs_char_ioctl(struct file *file, unsigned int cmd,
return r;
}

-static int cs_char_mmap(struct file *file, struct vm_area_struct *vma)
+static int cs_char_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
if (vma->vm_end < vma->vm_start)
return -EINVAL;
diff --git a/drivers/hwtracing/intel_th/msu.c b/drivers/hwtracing/intel_th/msu.c
index dbbe31df74df..cb943291eb8c 100644
--- a/drivers/hwtracing/intel_th/msu.c
+++ b/drivers/hwtracing/intel_th/msu.c
@@ -1212,7 +1212,8 @@ static const struct vm_operations_struct msc_mmap_ops = {
.fault = msc_mmap_fault,
};

-static int intel_th_msc_mmap(struct file *file, struct vm_area_struct *vma)
+static int intel_th_msc_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
unsigned long size = vma->vm_end - vma->vm_start;
struct msc_iter *iter = vma->vm_file->private_data;
diff --git a/drivers/hwtracing/stm/core.c b/drivers/hwtracing/stm/core.c
index 0e731143f6a4..4b35b3dfc82e 100644
--- a/drivers/hwtracing/stm/core.c
+++ b/drivers/hwtracing/stm/core.c
@@ -519,7 +519,8 @@ static const struct vm_operations_struct stm_mmap_vmops = {
.close = stm_mmap_close,
};

-static int stm_char_mmap(struct file *file, struct vm_area_struct *vma)
+static int stm_char_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct stm_file *stmf = file->private_data;
struct stm_device *stm = stmf->stm;
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 3d2609608f58..49899bc31b39 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -808,7 +808,8 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
return ret;
}

-static int ib_uverbs_mmap(struct file *filp, struct vm_area_struct *vma)
+static int ib_uverbs_mmap(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct ib_uverbs_file *file = filp->private_data;
struct ib_device *ib_dev;
diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c
index 3158128d57e8..1868b3558d51 100644
--- a/drivers/infiniband/hw/hfi1/file_ops.c
+++ b/drivers/infiniband/hw/hfi1/file_ops.c
@@ -75,7 +75,8 @@ static int hfi1_file_open(struct inode *inode, struct file *fp);
static int hfi1_file_close(struct inode *inode, struct file *fp);
static ssize_t hfi1_write_iter(struct kiocb *kiocb, struct iov_iter *from);
static unsigned int hfi1_poll(struct file *fp, struct poll_table_struct *pt);
-static int hfi1_file_mmap(struct file *fp, struct vm_area_struct *vma);
+static int hfi1_file_mmap(struct file *fp, struct vm_area_struct *vma,
+ unsigned long map_flags);

static u64 kvirt_to_phys(void *addr);
static int assign_ctxt(struct hfi1_filedata *fd, struct hfi1_user_info *uinfo);
@@ -450,7 +451,8 @@ static ssize_t hfi1_write_iter(struct kiocb *kiocb, struct iov_iter *from)
return reqs;
}

-static int hfi1_file_mmap(struct file *fp, struct vm_area_struct *vma)
+static int hfi1_file_mmap(struct file *fp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct hfi1_filedata *fd = fp->private_data;
struct hfi1_ctxtdata *uctxt = fd->uctxt;
diff --git a/drivers/infiniband/hw/qib/qib_file_ops.c b/drivers/infiniband/hw/qib/qib_file_ops.c
index 9396c1807cc3..2482d0fc2a77 100644
--- a/drivers/infiniband/hw/qib/qib_file_ops.c
+++ b/drivers/infiniband/hw/qib/qib_file_ops.c
@@ -59,7 +59,7 @@ static int qib_close(struct inode *, struct file *);
static ssize_t qib_write(struct file *, const char __user *, size_t, loff_t *);
static ssize_t qib_write_iter(struct kiocb *, struct iov_iter *);
static unsigned int qib_poll(struct file *, struct poll_table_struct *);
-static int qib_mmapf(struct file *, struct vm_area_struct *);
+static int qib_mmapf(struct file *, struct vm_area_struct *, unsigned long);

/*
* This is really, really weird shit - write() and writev() here
@@ -993,7 +993,8 @@ static int mmap_kvaddr(struct vm_area_struct *vma, u64 pgaddr,
* buffers in the chip. We have the open and close entries so we can bump
* the ref count and keep the driver from being unloaded while still mapped.
*/
-static int qib_mmapf(struct file *fp, struct vm_area_struct *vma)
+static int qib_mmapf(struct file *fp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct qib_ctxtdata *rcd;
struct qib_devdata *dd;
diff --git a/drivers/media/v4l2-core/v4l2-dev.c b/drivers/media/v4l2-core/v4l2-dev.c
index c647ba648805..1c2980e51708 100644
--- a/drivers/media/v4l2-core/v4l2-dev.c
+++ b/drivers/media/v4l2-core/v4l2-dev.c
@@ -388,7 +388,8 @@ static unsigned long v4l2_get_unmapped_area(struct file *filp,
}
#endif

-static int v4l2_mmap(struct file *filp, struct vm_area_struct *vm)
+static int v4l2_mmap(struct file *filp, struct vm_area_struct *vm,
+ unsigned long map_flags)
{
struct video_device *vdev = video_devdata(filp);
int ret = -ENODEV;
diff --git a/drivers/misc/aspeed-lpc-ctrl.c b/drivers/misc/aspeed-lpc-ctrl.c
index b5439643f54b..c79564d544c3 100644
--- a/drivers/misc/aspeed-lpc-ctrl.c
+++ b/drivers/misc/aspeed-lpc-ctrl.c
@@ -38,7 +38,8 @@ static struct aspeed_lpc_ctrl *file_aspeed_lpc_ctrl(struct file *file)
miscdev);
}

-static int aspeed_lpc_ctrl_mmap(struct file *file, struct vm_area_struct *vma)
+static int aspeed_lpc_ctrl_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct aspeed_lpc_ctrl *lpc_ctrl = file_aspeed_lpc_ctrl(file);
unsigned long vsize = vma->vm_end - vma->vm_start;
diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c
index 0761271d68c5..47059fb264c9 100644
--- a/drivers/misc/cxl/file.c
+++ b/drivers/misc/cxl/file.c
@@ -303,7 +303,8 @@ static long afu_compat_ioctl(struct file *file, unsigned int cmd,
return afu_ioctl(file, cmd, arg);
}

-int afu_mmap(struct file *file, struct vm_area_struct *vm)
+int afu_mmap(struct file *file, struct vm_area_struct *vm,
+ unsigned long map_flags)
{
struct cxl_context *ctx = file->private_data;

diff --git a/drivers/misc/genwqe/card_dev.c b/drivers/misc/genwqe/card_dev.c
index dd4617764f14..82a58da65756 100644
--- a/drivers/misc/genwqe/card_dev.c
+++ b/drivers/misc/genwqe/card_dev.c
@@ -435,7 +435,8 @@ static const struct vm_operations_struct genwqe_vma_ops = {
* plain buffer, we lookup our dma_mapping list to find the
* corresponding DMA address for the associated user-space address.
*/
-static int genwqe_mmap(struct file *filp, struct vm_area_struct *vma)
+static int genwqe_mmap(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
int rc;
unsigned long pfn, vsize = vma->vm_end - vma->vm_start;
diff --git a/drivers/misc/mic/scif/scif_fd.c b/drivers/misc/mic/scif/scif_fd.c
index f7e826142a72..5dfbaa681d2d 100644
--- a/drivers/misc/mic/scif/scif_fd.c
+++ b/drivers/misc/mic/scif/scif_fd.c
@@ -34,7 +34,8 @@ static int scif_fdclose(struct inode *inode, struct file *f)
return scif_close(priv);
}

-static int scif_fdmmap(struct file *f, struct vm_area_struct *vma)
+static int scif_fdmmap(struct file *f, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct scif_endpt *priv = f->private_data;

diff --git a/drivers/misc/mic/vop/vop_vringh.c b/drivers/misc/mic/vop/vop_vringh.c
index fed992e2c258..d80418f503b3 100644
--- a/drivers/misc/mic/vop/vop_vringh.c
+++ b/drivers/misc/mic/vop/vop_vringh.c
@@ -1083,7 +1083,8 @@ vop_query_offset(struct vop_vdev *vdev, unsigned long offset,
/*
* Maps the device page and virtio rings to user space for readonly access.
*/
-static int vop_mmap(struct file *f, struct vm_area_struct *vma)
+static int vop_mmap(struct file *f, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct vop_vdev *vdev = f->private_data;
unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
diff --git a/drivers/misc/sgi-gru/grufile.c b/drivers/misc/sgi-gru/grufile.c
index 104a05f6b738..2751d82a259f 100644
--- a/drivers/misc/sgi-gru/grufile.c
+++ b/drivers/misc/sgi-gru/grufile.c
@@ -104,7 +104,8 @@ static void gru_vma_close(struct vm_area_struct *vma)
* and private data structure necessary to allocate, track, and free the
* underlying pages.
*/
-static int gru_file_mmap(struct file *file, struct vm_area_struct *vma)
+static int gru_file_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
if ((vma->vm_flags & (VM_SHARED | VM_WRITE)) != (VM_SHARED | VM_WRITE))
return -EPERM;
diff --git a/drivers/mtd/mtdchar.c b/drivers/mtd/mtdchar.c
index 3568294d4854..7aa296edd4ff 100644
--- a/drivers/mtd/mtdchar.c
+++ b/drivers/mtd/mtdchar.c
@@ -1192,7 +1192,8 @@ static unsigned mtdchar_mmap_capabilities(struct file *file)
/*
* set up a mapping for shared memory segments
*/
-static int mtdchar_mmap(struct file *file, struct vm_area_struct *vma)
+static int mtdchar_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
#ifdef CONFIG_MMU
struct mtd_file_info *mfi = file->private_data;
diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index 098360d7ff81..4e77aad084d1 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -230,7 +230,8 @@ static long proc_bus_pci_ioctl(struct file *file, unsigned int cmd,
}

#ifdef HAVE_PCI_MMAP
-static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma)
+static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct pci_dev *dev = PDE_DATA(file_inode(file));
struct pci_filp_private *fpriv = file->private_data;
diff --git a/drivers/rapidio/devices/rio_mport_cdev.c b/drivers/rapidio/devices/rio_mport_cdev.c
index 5beb0c361076..a3dfd8ea6580 100644
--- a/drivers/rapidio/devices/rio_mport_cdev.c
+++ b/drivers/rapidio/devices/rio_mport_cdev.c
@@ -2261,7 +2261,8 @@ static const struct vm_operations_struct vm_ops = {
.close = mport_mm_close,
};

-static int mport_cdev_mmap(struct file *filp, struct vm_area_struct *vma)
+static int mport_cdev_mmap(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct mport_cdev_priv *priv = filp->private_data;
struct mport_dev *md;
diff --git a/drivers/sbus/char/flash.c b/drivers/sbus/char/flash.c
index 216f923161d1..b3dbbc195753 100644
--- a/drivers/sbus/char/flash.c
+++ b/drivers/sbus/char/flash.c
@@ -33,7 +33,8 @@ static struct {
#define FLASH_MINOR 152

static int
-flash_mmap(struct file *file, struct vm_area_struct *vma)
+flash_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
unsigned long addr;
unsigned long size;
diff --git a/drivers/sbus/char/jsflash.c b/drivers/sbus/char/jsflash.c
index 14f377ac1280..e497152ce317 100644
--- a/drivers/sbus/char/jsflash.c
+++ b/drivers/sbus/char/jsflash.c
@@ -440,7 +440,8 @@ static long jsf_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
return error;
}

-static int jsf_mmap(struct file * file, struct vm_area_struct * vma)
+static int jsf_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
return -ENXIO;
}
diff --git a/drivers/scsi/cxlflash/superpipe.c b/drivers/scsi/cxlflash/superpipe.c
index ad0f9968ccfb..90886c3e9424 100644
--- a/drivers/scsi/cxlflash/superpipe.c
+++ b/drivers/scsi/cxlflash/superpipe.c
@@ -1160,7 +1160,8 @@ static const struct vm_operations_struct cxlflash_mmap_vmops = {
*
* Return: 0 on success, -errno on failure
*/
-static int cxlflash_cxl_mmap(struct file *file, struct vm_area_struct *vma)
+static int cxlflash_cxl_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct cxl_context *ctx = cxl_fops_get_context(file);
struct cxlflash_cfg *cfg = container_of(file->f_op, struct cxlflash_cfg,
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 1e82d4128a84..5006c7010e19 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1253,7 +1253,8 @@ static const struct vm_operations_struct sg_mmap_vm_ops = {
};

static int
-sg_mmap(struct file *filp, struct vm_area_struct *vma)
+sg_mmap(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
Sg_fd *sfp;
unsigned long req_sz, len, sa;
diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
index 6ba270e0494d..ad4f863cdb8e 100644
--- a/drivers/staging/android/ashmem.c
+++ b/drivers/staging/android/ashmem.c
@@ -375,7 +375,8 @@ static inline vm_flags_t calc_vm_may_flags(unsigned long prot)
_calc_vm_trans(prot, PROT_EXEC, VM_MAYEXEC);
}

-static int ashmem_mmap(struct file *file, struct vm_area_struct *vma)
+static int ashmem_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct ashmem_area *asma = file->private_data;
int ret = 0;
diff --git a/drivers/staging/comedi/comedi_fops.c b/drivers/staging/comedi/comedi_fops.c
index ca11be21f64b..2fbd860f74e4 100644
--- a/drivers/staging/comedi/comedi_fops.c
+++ b/drivers/staging/comedi/comedi_fops.c
@@ -2185,7 +2185,8 @@ static const struct vm_operations_struct comedi_vm_ops = {
.access = comedi_vm_access,
};

-static int comedi_mmap(struct file *file, struct vm_area_struct *vma)
+static int comedi_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct comedi_file *cfp = file->private_data;
struct comedi_device *dev = cfp->dev;
diff --git a/drivers/staging/lustre/lustre/llite/llite_mmap.c b/drivers/staging/lustre/lustre/llite/llite_mmap.c
index ccc7ae15a943..51a804d3e6fb 100644
--- a/drivers/staging/lustre/lustre/llite/llite_mmap.c
+++ b/drivers/staging/lustre/lustre/llite/llite_mmap.c
@@ -464,7 +464,7 @@ int ll_file_mmap(struct file *file, struct vm_area_struct *vma)
return -EOPNOTSUPP;

ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_MAP, 1);
- rc = generic_file_mmap(file, vma);
+ rc = generic_file_mmap(file, vma, 0);
if (rc == 0) {
vma->vm_ops = &ll_file_vm_ops;
vma->vm_ops->open(vma);
diff --git a/drivers/staging/vme/devices/vme_user.c b/drivers/staging/vme/devices/vme_user.c
index a3d4610fbdbe..4edf846529d7 100644
--- a/drivers/staging/vme/devices/vme_user.c
+++ b/drivers/staging/vme/devices/vme_user.c
@@ -484,7 +484,8 @@ static int vme_user_master_mmap(unsigned int minor, struct vm_area_struct *vma)
return 0;
}

-static int vme_user_mmap(struct file *file, struct vm_area_struct *vma)
+static int vme_user_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
unsigned int minor = MINOR(file_inode(file)->i_rdev);

diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index ff04b7f8549f..1ddd3f901127 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -674,7 +674,8 @@ static int uio_mmap_physical(struct vm_area_struct *vma)
vma->vm_page_prot);
}

-static int uio_mmap(struct file *filep, struct vm_area_struct *vma)
+static int uio_mmap(struct file *filep, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct uio_listener *listener = filep->private_data;
struct uio_device *idev = listener->dev;
diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index ebe27595c4af..36b0ff19531a 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -215,7 +215,8 @@ static struct vm_operations_struct usbdev_vm_ops = {
.close = usbdev_vm_close
};

-static int usbdev_mmap(struct file *file, struct vm_area_struct *vma)
+static int usbdev_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct usb_memory *usbm = NULL;
struct usb_dev_state *ps = file->private_data;
diff --git a/drivers/usb/mon/mon_bin.c b/drivers/usb/mon/mon_bin.c
index b6d8bf475c92..69aec6194772 100644
--- a/drivers/usb/mon/mon_bin.c
+++ b/drivers/usb/mon/mon_bin.c
@@ -1246,7 +1246,8 @@ static const struct vm_operations_struct mon_bin_vm_ops = {
.fault = mon_bin_vma_fault,
};

-static int mon_bin_mmap(struct file *filp, struct vm_area_struct *vma)
+static int mon_bin_mmap(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
/* don't do anything here: "fault" will set up page table entries */
vma->vm_ops = &mon_bin_vm_ops;
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 330d50582f40..e972a2de79f6 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1256,7 +1256,8 @@ static ssize_t vfio_fops_write(struct file *filep, const char __user *buf,
return ret;
}

-static int vfio_fops_mmap(struct file *filep, struct vm_area_struct *vma)
+static int vfio_fops_mmap(struct file *filep, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct vfio_container *container = filep->private_data;
struct vfio_iommu_driver *driver;
@@ -1677,7 +1678,9 @@ static ssize_t vfio_device_fops_write(struct file *filep,
return device->ops->write(device->device_data, buf, count, ppos);
}

-static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
+static int vfio_device_fops_mmap(struct file *filep,
+ struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct vfio_device *device = filep->private_data;

diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
index 7a42238db446..ba675464cc27 100644
--- a/drivers/video/fbdev/core/fbmem.c
+++ b/drivers/video/fbdev/core/fbmem.c
@@ -1380,7 +1380,8 @@ static long fb_compat_ioctl(struct file *file, unsigned int cmd,
#endif

static int
-fb_mmap(struct file *file, struct vm_area_struct * vma)
+fb_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct fb_info *info = file_fb_info(file);
struct fb_ops *fb;
diff --git a/drivers/video/fbdev/pxa3xx-gcu.c b/drivers/video/fbdev/pxa3xx-gcu.c
index 50bce45e7f3d..bed61712616e 100644
--- a/drivers/video/fbdev/pxa3xx-gcu.c
+++ b/drivers/video/fbdev/pxa3xx-gcu.c
@@ -479,7 +479,8 @@ pxa3xx_gcu_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
}

static int
-pxa3xx_gcu_mmap(struct file *file, struct vm_area_struct *vma)
+pxa3xx_gcu_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
unsigned int size = vma->vm_end - vma->vm_start;
struct pxa3xx_gcu_priv *priv = to_pxa3xx_gcu_priv(file);
diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c
index 1bf55a32a4b3..35ded2a8bba6 100644
--- a/drivers/xen/gntalloc.c
+++ b/drivers/xen/gntalloc.c
@@ -502,7 +502,8 @@ static const struct vm_operations_struct gntalloc_vmops = {
.close = gntalloc_vma_close,
};

-static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma)
+static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct gntalloc_file_private_data *priv = filp->private_data;
struct gntalloc_vma_private_data *vm_priv;
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index f3bf8f4e2d6c..2b3971ce0062 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -980,7 +980,8 @@ static long gntdev_ioctl(struct file *flip,
return 0;
}

-static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
+static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct gntdev_priv *priv = flip->private_data;
int index = vma->vm_pgoff;
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index feca75b07fdd..3a8278d72375 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -818,7 +818,8 @@ static const struct vm_operations_struct privcmd_vm_ops = {
.fault = privcmd_fault
};

-static int privcmd_mmap(struct file *file, struct vm_area_struct *vma)
+static int privcmd_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
/* DONTCOPY is essential for Xen because copy_page_range doesn't know
* how to recreate these mappings */
diff --git a/drivers/xen/xenbus/xenbus_dev_backend.c b/drivers/xen/xenbus/xenbus_dev_backend.c
index 1126701e212e..ed7e81ae167a 100644
--- a/drivers/xen/xenbus/xenbus_dev_backend.c
+++ b/drivers/xen/xenbus/xenbus_dev_backend.c
@@ -88,7 +88,8 @@ static long xenbus_backend_ioctl(struct file *file, unsigned int cmd,
}
}

-static int xenbus_backend_mmap(struct file *file, struct vm_area_struct *vma)
+static int xenbus_backend_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
size_t size = vma->vm_end - vma->vm_start;

diff --git a/drivers/xen/xenfs/xenstored.c b/drivers/xen/xenfs/xenstored.c
index 82fd2a396d96..259ad78834a4 100644
--- a/drivers/xen/xenfs/xenstored.c
+++ b/drivers/xen/xenfs/xenstored.c
@@ -30,7 +30,8 @@ static int xsd_kva_open(struct inode *inode, struct file *file)
return 0;
}

-static int xsd_kva_mmap(struct file *file, struct vm_area_struct *vma)
+static int xsd_kva_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
size_t size = vma->vm_end - vma->vm_start;

diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index 3de3b4a89d89..c8b2fdd53411 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -484,12 +484,13 @@ int v9fs_file_fsync_dotl(struct file *filp, loff_t start, loff_t end,
}

static int
-v9fs_file_mmap(struct file *filp, struct vm_area_struct *vma)
+v9fs_file_mmap(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
int retval;


- retval = generic_file_mmap(filp, vma);
+ retval = generic_file_mmap(filp, vma, 0);
if (!retval)
vma->vm_ops = &v9fs_file_vm_ops;

@@ -497,7 +498,8 @@ v9fs_file_mmap(struct file *filp, struct vm_area_struct *vma)
}

static int
-v9fs_mmap_file_mmap(struct file *filp, struct vm_area_struct *vma)
+v9fs_mmap_file_mmap(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
int retval;
struct inode *inode;
@@ -526,7 +528,7 @@ v9fs_mmap_file_mmap(struct file *filp, struct vm_area_struct *vma)
}
mutex_unlock(&v9inode->v_mutex);

- retval = generic_file_mmap(filp, vma);
+ retval = generic_file_mmap(filp, vma, 0);
if (!retval)
vma->vm_ops = &v9fs_mmap_file_vm_ops;

diff --git a/fs/aio.c b/fs/aio.c
index dcad3a66748c..e07cabf73093 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -353,7 +353,8 @@ static const struct vm_operations_struct aio_ring_vm_ops = {
#endif
};

-static int aio_ring_mmap(struct file *file, struct vm_area_struct *vma)
+static int aio_ring_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
vma->vm_flags |= VM_DONTEXPAND;
vma->vm_ops = &aio_ring_vm_ops;
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 9e75d8a39aac..fee72875f075 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2262,7 +2262,8 @@ static const struct vm_operations_struct btrfs_file_vm_ops = {
.page_mkwrite = btrfs_page_mkwrite,
};

-static int btrfs_file_mmap(struct file *filp, struct vm_area_struct *vma)
+static int btrfs_file_mmap(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct address_space *mapping = filp->f_mapping;

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index bc09df6b473a..9e4738511d20 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3488,7 +3488,7 @@ int cifs_file_strict_mmap(struct file *file, struct vm_area_struct *vma)
return rc;
}

- rc = generic_file_mmap(file, vma);
+ rc = generic_file_mmap(file, vma, 0);
if (rc == 0)
vma->vm_ops = &cifs_file_vm_ops;
free_xid(xid);
@@ -3507,7 +3507,7 @@ int cifs_file_mmap(struct file *file, struct vm_area_struct *vma)
free_xid(xid);
return rc;
}
- rc = generic_file_mmap(file, vma);
+ rc = generic_file_mmap(file, vma, 0);
if (rc == 0)
vma->vm_ops = &cifs_file_vm_ops;
free_xid(xid);
diff --git a/fs/coda/file.c b/fs/coda/file.c
index 363402fcb3ed..80076b07c32a 100644
--- a/fs/coda/file.c
+++ b/fs/coda/file.c
@@ -61,7 +61,8 @@ coda_file_write_iter(struct kiocb *iocb, struct iov_iter *to)
}

static int
-coda_file_mmap(struct file *coda_file, struct vm_area_struct *vma)
+coda_file_mmap(struct file *coda_file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct coda_file_info *cfi;
struct coda_inode_info *cii;
@@ -96,7 +97,7 @@ coda_file_mmap(struct file *coda_file, struct vm_area_struct *vma)
cfi->cfi_mapcount++;
spin_unlock(&cii->c_lock);

- return call_mmap(host_file, vma);
+ return call_mmap(host_file, vma, 0);
}

int coda_open(struct inode *coda_inode, struct file *coda_file)
diff --git a/fs/ecryptfs/file.c b/fs/ecryptfs/file.c
index ca4e83750214..6a2ae381f16a 100644
--- a/fs/ecryptfs/file.c
+++ b/fs/ecryptfs/file.c
@@ -169,7 +169,8 @@ static int read_or_initialize_metadata(struct dentry *dentry)
return rc;
}

-static int ecryptfs_mmap(struct file *file, struct vm_area_struct *vma)
+static int ecryptfs_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct file *lower_file = ecryptfs_file_to_lower(file);
/*
@@ -179,7 +180,7 @@ static int ecryptfs_mmap(struct file *file, struct vm_area_struct *vma)
*/
if (!lower_file->f_op->mmap)
return -ENODEV;
- return generic_file_mmap(file, vma);
+ return generic_file_mmap(file, vma, 0);
}

/**
diff --git a/fs/ext2/file.c b/fs/ext2/file.c
index d34d32bdc944..ffcec18bc332 100644
--- a/fs/ext2/file.c
+++ b/fs/ext2/file.c
@@ -141,10 +141,11 @@ static const struct vm_operations_struct ext2_dax_vm_ops = {
.pfn_mkwrite = ext2_dax_pfn_mkwrite,
};

-static int ext2_file_mmap(struct file *file, struct vm_area_struct *vma)
+static int ext2_file_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
if (!IS_DAX(file_inode(file)))
- return generic_file_mmap(file, vma);
+ return generic_file_mmap(file, vma, 0);

file_accessed(file);
vma->vm_ops = &ext2_dax_vm_ops;
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 58294c9a7e1d..a78537c80ed0 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -357,7 +357,8 @@ static const struct vm_operations_struct ext4_file_vm_ops = {
.page_mkwrite = ext4_page_mkwrite,
};

-static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
+static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct inode *inode = file->f_mapping->host;

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 2706130c261b..47ba41af9b94 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -425,7 +425,8 @@ static loff_t f2fs_llseek(struct file *file, loff_t offset, int whence)
return -EINVAL;
}

-static int f2fs_file_mmap(struct file *file, struct vm_area_struct *vma)
+static int f2fs_file_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct inode *inode = file_inode(file);
int err;
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 3ee4fdc3da9e..d095b4992293 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2062,7 +2062,8 @@ static const struct vm_operations_struct fuse_file_vm_ops = {
.page_mkwrite = fuse_page_mkwrite,
};

-static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
+static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_MAYWRITE))
fuse_link_write_file(file);
@@ -2072,7 +2073,8 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
return 0;
}

-static int fuse_direct_mmap(struct file *file, struct vm_area_struct *vma)
+static int fuse_direct_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
/* Can't provide the coherency needed for MAP_SHARED */
if (vma->vm_flags & VM_MAYSHARE)
@@ -2080,7 +2082,7 @@ static int fuse_direct_mmap(struct file *file, struct vm_area_struct *vma)

invalidate_inode_pages2(file->f_mapping);

- return generic_file_mmap(file, vma);
+ return generic_file_mmap(file, vma, 0);
}

static int convert_fuse_file_lock(struct fuse_conn *fc,
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index c2062a108d19..50ea01d6c33e 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -506,7 +506,8 @@ static const struct vm_operations_struct gfs2_vm_ops = {
* Returns: 0
*/

-static int gfs2_mmap(struct file *file, struct vm_area_struct *vma)
+static int gfs2_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct gfs2_inode *ip = GFS2_I(file->f_mapping->host);

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 28d2753be094..5bf5a3ec4818 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -118,7 +118,8 @@ static void huge_pagevec_release(struct pagevec *pvec)
pagevec_reinit(pvec);
}

-static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
+static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct inode *inode = file_inode(file);
loff_t len, vma_len;
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index ac2dfe0c5a9c..58a85c61c657 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -467,7 +467,8 @@ static const struct vm_operations_struct kernfs_vm_ops = {
#endif
};

-static int kernfs_fop_mmap(struct file *file, struct vm_area_struct *vma)
+static int kernfs_fop_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct kernfs_open_file *of = kernfs_of(file);
const struct kernfs_ops *ops;
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 5713eb32a45e..afa6b89e864f 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -176,7 +176,8 @@ nfs_file_read(struct kiocb *iocb, struct iov_iter *to)
EXPORT_SYMBOL_GPL(nfs_file_read);

int
-nfs_file_mmap(struct file * file, struct vm_area_struct * vma)
+nfs_file_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct inode *inode = file_inode(file);
int status;
@@ -186,7 +187,7 @@ nfs_file_mmap(struct file * file, struct vm_area_struct * vma)
/* Note: generic_file_mmap() returns ENOSYS on nommu systems
* so we call that before revalidating the mapping
*/
- status = generic_file_mmap(file, vma);
+ status = generic_file_mmap(file, vma, 0);
if (!status) {
vma->vm_ops = &nfs_file_vm_ops;
status = nfs_revalidate_mapping(inode, file->f_mapping);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index dc456416d2be..8b913079684d 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -370,7 +370,7 @@ int nfs_rename(struct inode *, struct dentry *,
int nfs_file_fsync(struct file *file, loff_t start, loff_t end, int datasync);
loff_t nfs_file_llseek(struct file *, loff_t, int);
ssize_t nfs_file_read(struct kiocb *, struct iov_iter *);
-int nfs_file_mmap(struct file *, struct vm_area_struct *);
+int nfs_file_mmap(struct file *, struct vm_area_struct *, unsigned long);
ssize_t nfs_file_write(struct kiocb *, struct iov_iter *);
int nfs_file_release(struct inode *, struct file *);
int nfs_lock(struct file *, int, struct file_lock *);
diff --git a/fs/nilfs2/file.c b/fs/nilfs2/file.c
index c5fa3dee72fc..71c5a24d78ce 100644
--- a/fs/nilfs2/file.c
+++ b/fs/nilfs2/file.c
@@ -126,7 +126,8 @@ static const struct vm_operations_struct nilfs_file_vm_ops = {
.page_mkwrite = nilfs_page_mkwrite,
};

-static int nilfs_file_mmap(struct file *file, struct vm_area_struct *vma)
+static int nilfs_file_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
file_accessed(file);
vma->vm_ops = &nilfs_file_vm_ops;
diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index 28f38d813ad2..9b8fda1279e9 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -584,7 +584,8 @@ static long orangefs_ioctl(struct file *file, unsigned int cmd, unsigned long ar
/*
* Memory map a region of a file.
*/
-static int orangefs_file_mmap(struct file *file, struct vm_area_struct *vma)
+static int orangefs_file_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
gossip_debug(GOSSIP_FILE_DEBUG,
"orangefs_file_mmap: called on %s\n",
@@ -597,7 +598,7 @@ static int orangefs_file_mmap(struct file *file, struct vm_area_struct *vma)
vma->vm_flags &= ~VM_RAND_READ;

/* Use readonly mmap since we cannot support writable maps. */
- return generic_file_readonly_mmap(file, vma);
+ return generic_file_readonly_mmap(file, vma, 0);
}

#define mapping_nrpages(idata) ((idata)->nrpages)
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index e250910cffc8..4b7d31616985 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -277,15 +277,16 @@ static long proc_reg_compat_ioctl(struct file *file, unsigned int cmd, unsigned
}
#endif

-static int proc_reg_mmap(struct file *file, struct vm_area_struct *vma)
+static int proc_reg_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct proc_dir_entry *pde = PDE(file_inode(file));
int rv = -EIO;
- int (*mmap)(struct file *, struct vm_area_struct *);
+ int (*mmap)(struct file *, struct vm_area_struct *, unsigned long);
if (use_pde(pde)) {
mmap = pde->proc_fops->mmap;
if (mmap)
- rv = mmap(file, vma);
+ rv = mmap(file, vma, map_flags);
unuse_pde(pde);
}
return rv;
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 885d445afa0d..36463814ffc1 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -406,7 +406,8 @@ static int vmcore_remap_oldmem_pfn(struct vm_area_struct *vma,
return remap_oldmem_pfn_range(vma, from, pfn, size, prot);
}

-static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
+static int mmap_vmcore(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
size_t size = vma->vm_end - vma->vm_start;
u64 start, end, len, tsz;
@@ -485,7 +486,8 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
return -EAGAIN;
}
#else
-static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
+static int mmap_vmcore(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
return -ENOSYS;
}
diff --git a/fs/ramfs/file-nommu.c b/fs/ramfs/file-nommu.c
index 2ef7ce75c062..a41eba2c5ff9 100644
--- a/fs/ramfs/file-nommu.c
+++ b/fs/ramfs/file-nommu.c
@@ -32,7 +32,8 @@ static unsigned long ramfs_nommu_get_unmapped_area(struct file *file,
unsigned long len,
unsigned long pgoff,
unsigned long flags);
-static int ramfs_nommu_mmap(struct file *file, struct vm_area_struct *vma);
+static int ramfs_nommu_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags);

static unsigned ramfs_mmap_capabilities(struct file *file)
{
@@ -257,7 +258,8 @@ static unsigned long ramfs_nommu_get_unmapped_area(struct file *file,
/*
* set up a mapping for shared memory segments
*/
-static int ramfs_nommu_mmap(struct file *file, struct vm_area_struct *vma)
+static int ramfs_nommu_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
if (!(vma->vm_flags & (VM_SHARED | VM_MAYSHARE)))
return -ENOSYS;
diff --git a/fs/romfs/mmap-nommu.c b/fs/romfs/mmap-nommu.c
index 1118a0dc6b45..60a893b5e864 100644
--- a/fs/romfs/mmap-nommu.c
+++ b/fs/romfs/mmap-nommu.c
@@ -65,7 +65,8 @@ static unsigned long romfs_get_unmapped_area(struct file *file,
* permit a R/O mapping to be made directly through onto an MTD device if
* possible
*/
-static int romfs_mmap(struct file *file, struct vm_area_struct *vma)
+static int romfs_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
return vma->vm_flags & (VM_SHARED | VM_MAYSHARE) ? 0 : -ENOSYS;
}
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 8cad0b19b404..9edcd7f68c0b 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1612,11 +1612,12 @@ static const struct vm_operations_struct ubifs_file_vm_ops = {
.page_mkwrite = ubifs_vm_page_mkwrite,
};

-static int ubifs_file_mmap(struct file *file, struct vm_area_struct *vma)
+static int ubifs_file_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
int err;

- err = generic_file_mmap(file, vma);
+ err = generic_file_mmap(file, vma, 0);
if (err)
return err;
vma->vm_ops = &ubifs_file_vm_ops;
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index c4893e226fd8..cacc0162a41a 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1146,9 +1146,8 @@ static const struct vm_operations_struct xfs_file_vm_ops = {
};

STATIC int
-xfs_file_mmap(
- struct file *filp,
- struct vm_area_struct *vma)
+xfs_file_mmap(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
file_accessed(filp);
vma->vm_ops = &xfs_file_vm_ops;
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 663d80358057..3de33c5e374e 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -214,7 +214,8 @@ void drm_gem_vm_open(struct vm_area_struct *vma);
void drm_gem_vm_close(struct vm_area_struct *vma);
int drm_gem_mmap_obj(struct drm_gem_object *obj, unsigned long obj_size,
struct vm_area_struct *vma);
-int drm_gem_mmap(struct file *filp, struct vm_area_struct *vma);
+int drm_gem_mmap(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags);

/**
* drm_gem_object_get - acquire a GEM buffer object reference
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6e1fd5d21248..4c6d0d9db8e3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1673,7 +1673,7 @@ struct file_operations {
unsigned int (*poll) (struct file *, struct poll_table_struct *);
long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
- int (*mmap) (struct file *, struct vm_area_struct *);
+ int (*mmap) (struct file *, struct vm_area_struct *, unsigned long);
int (*open) (struct inode *, struct file *);
int (*flush) (struct file *, fl_owner_t id);
int (*release) (struct inode *, struct file *);
@@ -1743,9 +1743,10 @@ static inline ssize_t call_write_iter(struct file *file, struct kiocb *kio,
return file->f_op->write_iter(kio, iter);
}

-static inline int call_mmap(struct file *file, struct vm_area_struct *vma)
+static inline int call_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long flags)
{
- return file->f_op->mmap(file, vma);
+ return file->f_op->mmap(file, vma, 0);
}

ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
@@ -2864,8 +2865,10 @@ extern int set_blocksize(struct block_device *, int);
extern int sb_set_blocksize(struct super_block *, int);
extern int sb_min_blocksize(struct super_block *, int);

-extern int generic_file_mmap(struct file *, struct vm_area_struct *);
-extern int generic_file_readonly_mmap(struct file *, struct vm_area_struct *);
+extern int generic_file_mmap(struct file *, struct vm_area_struct *,
+ unsigned long);
+extern int generic_file_readonly_mmap(struct file *, struct vm_area_struct *,
+ unsigned long);
extern ssize_t generic_write_checks(struct kiocb *, struct iov_iter *);
extern ssize_t generic_file_read_iter(struct kiocb *, struct iov_iter *);
extern ssize_t __generic_file_write_iter(struct kiocb *, struct iov_iter *);
diff --git a/ipc/shm.c b/ipc/shm.c
index 28a444861a8f..89d2e92971f5 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -411,7 +411,8 @@ static struct mempolicy *shm_get_policy(struct vm_area_struct *vma,
}
#endif

-static int shm_mmap(struct file *file, struct vm_area_struct *vma)
+static int shm_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct shm_file_data *sfd = shm_file_data(file);
int ret;
@@ -424,7 +425,7 @@ static int shm_mmap(struct file *file, struct vm_area_struct *vma)
if (ret)
return ret;

- ret = call_mmap(sfd->file, vma);
+ ret = call_mmap(sfd->file, vma, 0);
if (ret) {
shm_close(vma);
return ret;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 426c2ffba16d..1a32d165db88 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5219,7 +5219,8 @@ static const struct vm_operations_struct perf_mmap_vmops = {
.page_mkwrite = perf_mmap_fault,
};

-static int perf_mmap(struct file *file, struct vm_area_struct *vma)
+static int perf_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct perf_event *event = file->private_data;
unsigned long user_locked, user_lock_limit;
diff --git a/kernel/kcov.c b/kernel/kcov.c
index cd771993f96f..453c484ac00a 100644
--- a/kernel/kcov.c
+++ b/kernel/kcov.c
@@ -132,7 +132,8 @@ void kcov_task_exit(struct task_struct *t)
kcov_put(kcov);
}

-static int kcov_mmap(struct file *filep, struct vm_area_struct *vma)
+static int kcov_mmap(struct file *filep, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
int res = 0;
void *area;
diff --git a/kernel/relay.c b/kernel/relay.c
index 39a9dfc69486..58dee7ee8dbb 100644
--- a/kernel/relay.c
+++ b/kernel/relay.c
@@ -906,7 +906,8 @@ static int relay_file_open(struct inode *inode, struct file *filp)
*
* Calls upon relay_mmap_buf() to map the file into user space.
*/
-static int relay_file_mmap(struct file *filp, struct vm_area_struct *vma)
+static int relay_file_mmap(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct rchan_buf *buf = filp->private_data;
return relay_mmap_buf(buf, vma);
diff --git a/mm/filemap.c b/mm/filemap.c
index a49702445ce0..2457e34d10e0 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2569,7 +2569,8 @@ const struct vm_operations_struct generic_file_vm_ops = {

/* This is used for a general mmap of a disk file */

-int generic_file_mmap(struct file * file, struct vm_area_struct * vma)
+int generic_file_mmap(struct file * file, struct vm_area_struct * vma,
+ unsigned long map_flags)
{
struct address_space *mapping = file->f_mapping;

@@ -2583,18 +2584,21 @@ int generic_file_mmap(struct file * file, struct vm_area_struct * vma)
/*
* This is for filesystems which do not implement ->writepage.
*/
-int generic_file_readonly_mmap(struct file *file, struct vm_area_struct *vma)
+int generic_file_readonly_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_MAYWRITE))
return -EINVAL;
- return generic_file_mmap(file, vma);
+ return generic_file_mmap(file, vma, 0);
}
#else
-int generic_file_mmap(struct file * file, struct vm_area_struct * vma)
+int generic_file_mmap(struct file * file, struct vm_area_struct * vma,
+ unsigned long map_flags)
{
return -ENOSYS;
}
-int generic_file_readonly_mmap(struct file * file, struct vm_area_struct * vma)
+int generic_file_readonly_mmap(struct file * file, struct vm_area_struct * vma,
+ unsigned long map_flags)
{
return -ENOSYS;
}
diff --git a/mm/mmap.c b/mm/mmap.c
index f19efcf75418..744faae86781 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1686,7 +1686,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
* new file must not have been exposed to user-space, yet.
*/
vma->vm_file = get_file(file);
- error = call_mmap(file, vma);
+ error = call_mmap(file, vma, 0);
if (error)
goto unmap_and_free_vma;

diff --git a/mm/nommu.c b/mm/nommu.c
index fc184f597d59..3eb3bd76c405 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1089,7 +1089,7 @@ static int do_mmap_shared_file(struct vm_area_struct *vma)
{
int ret;

- ret = call_mmap(vma->vm_file, vma);
+ ret = call_mmap(vma->vm_file, vma, 0);
if (ret == 0) {
vma->vm_region->vm_top = vma->vm_region->vm_end;
return 0;
@@ -1120,7 +1120,7 @@ static int do_mmap_private(struct vm_area_struct *vma,
* - VM_MAYSHARE will be set if it may attempt to share
*/
if (capabilities & NOMMU_MAP_DIRECT) {
- ret = call_mmap(vma->vm_file, vma);
+ ret = call_mmap(vma->vm_file, vma, 0);
if (ret == 0) {
/* shouldn't return success if we're not sharing */
BUG_ON(!(vma->vm_flags & VM_MAYSHARE));
diff --git a/mm/shmem.c b/mm/shmem.c
index b0aa6075d164..f3f9509c7486 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2122,7 +2122,8 @@ int shmem_lock(struct file *file, int lock, struct user_struct *user)
return retval;
}

-static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
+static int shmem_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
file_accessed(file);
vma->vm_ops = &shmem_vm_ops;
diff --git a/net/socket.c b/net/socket.c
index bf2122691fba..7708f87f1ab6 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -115,7 +115,8 @@ unsigned int sysctl_net_busy_poll __read_mostly;

static ssize_t sock_read_iter(struct kiocb *iocb, struct iov_iter *to);
static ssize_t sock_write_iter(struct kiocb *iocb, struct iov_iter *from);
-static int sock_mmap(struct file *file, struct vm_area_struct *vma);
+static int sock_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags);

static int sock_close(struct inode *inode, struct file *file);
static unsigned int sock_poll(struct file *file,
@@ -1100,7 +1101,8 @@ static unsigned int sock_poll(struct file *file, poll_table *wait)
return busy_flag | sock->ops->poll(file, sock, wait);
}

-static int sock_mmap(struct file *file, struct vm_area_struct *vma)
+static int sock_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct socket *sock = file->private_data;

diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index 00eed842c491..802c801a38dd 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -215,7 +215,8 @@ static ssize_t sel_read_handle_status(struct file *filp, char __user *buf,
}

static int sel_mmap_handle_status(struct file *filp,
- struct vm_area_struct *vma)
+ struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct page *status = filp->private_data;
unsigned long size = vma->vm_end - vma->vm_start;
@@ -444,7 +445,8 @@ static const struct vm_operations_struct sel_mmap_policy_ops = {
.page_mkwrite = sel_mmap_policy_fault,
};

-static int sel_mmap_policy(struct file *filp, struct vm_area_struct *vma)
+static int sel_mmap_policy(struct file *filp, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
if (vma->vm_flags & VM_SHARED) {
/* do not allow mprotect to make mapping writable */
diff --git a/sound/core/compress_offload.c b/sound/core/compress_offload.c
index fec1dfdb14ad..884cefaf906e 100644
--- a/sound/core/compress_offload.c
+++ b/sound/core/compress_offload.c
@@ -391,7 +391,8 @@ static ssize_t snd_compr_read(struct file *f, char __user *buf,
return retval;
}

-static int snd_compr_mmap(struct file *f, struct vm_area_struct *vma)
+static int snd_compr_mmap(struct file *f, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
return -ENXIO;
}
diff --git a/sound/core/hwdep.c b/sound/core/hwdep.c
index a73baa1242be..070b83091c60 100644
--- a/sound/core/hwdep.c
+++ b/sound/core/hwdep.c
@@ -260,7 +260,8 @@ static long snd_hwdep_ioctl(struct file * file, unsigned int cmd,
return -ENOTTY;
}

-static int snd_hwdep_mmap(struct file * file, struct vm_area_struct * vma)
+static int snd_hwdep_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct snd_hwdep *hw = file->private_data;
if (hw->ops.mmap)
diff --git a/sound/core/info.c b/sound/core/info.c
index bcf6a48cc70d..6551d90aac2c 100644
--- a/sound/core/info.c
+++ b/sound/core/info.c
@@ -232,7 +232,8 @@ static long snd_info_entry_ioctl(struct file *file, unsigned int cmd,
file, cmd, arg);
}

-static int snd_info_entry_mmap(struct file *file, struct vm_area_struct *vma)
+static int snd_info_entry_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
struct inode *inode = file_inode(file);
struct snd_info_private_data *data;
diff --git a/sound/core/init.c b/sound/core/init.c
index b4365bcf28a7..b83ca6424fae 100644
--- a/sound/core/init.c
+++ b/sound/core/init.c
@@ -356,7 +356,8 @@ static long snd_disconnect_ioctl(struct file *file,
return -ENODEV;
}

-static int snd_disconnect_mmap(struct file *file, struct vm_area_struct *vma)
+static int snd_disconnect_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
return -ENODEV;
}
diff --git a/sound/core/oss/pcm_oss.c b/sound/core/oss/pcm_oss.c
index e49f448ee04f..abcace0d7234 100644
--- a/sound/core/oss/pcm_oss.c
+++ b/sound/core/oss/pcm_oss.c
@@ -2716,7 +2716,8 @@ static unsigned int snd_pcm_oss_poll(struct file *file, poll_table * wait)
return mask;
}

-static int snd_pcm_oss_mmap(struct file *file, struct vm_area_struct *area)
+static int snd_pcm_oss_mmap(struct file *file, struct vm_area_struct *area,
+ unsigned long map_flags)
{
struct snd_pcm_oss_file *pcm_oss_file;
struct snd_pcm_substream *substream = NULL;
diff --git a/sound/oss/soundcard.c b/sound/oss/soundcard.c
index b70c7c8f9c5d..b6e8ba2ec452 100644
--- a/sound/oss/soundcard.c
+++ b/sound/oss/soundcard.c
@@ -420,7 +420,8 @@ static unsigned int sound_poll(struct file *file, poll_table * wait)
return 0;
}

-static int sound_mmap(struct file *file, struct vm_area_struct *vma)
+static int sound_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
int dev_class;
unsigned long size;
diff --git a/sound/oss/swarm_cs4297a.c b/sound/oss/swarm_cs4297a.c
index 97899352b15f..4a020d2a53ab 100644
--- a/sound/oss/swarm_cs4297a.c
+++ b/sound/oss/swarm_cs4297a.c
@@ -1962,7 +1962,8 @@ static unsigned int cs4297a_poll(struct file *file,
}


-static int cs4297a_mmap(struct file *file, struct vm_area_struct *vma)
+static int cs4297a_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
/* XXXKW currently no mmap support */
return -EINVAL;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 82987d457b8b..1b95e925e8e2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2398,7 +2398,8 @@ static const struct vm_operations_struct kvm_vcpu_vm_ops = {
.fault = kvm_vcpu_fault,
};

-static int kvm_vcpu_mmap(struct file *file, struct vm_area_struct *vma)
+static int kvm_vcpu_mmap(struct file *file, struct vm_area_struct *vma,
+ unsigned long map_flags)
{
vma->vm_ops = &kvm_vcpu_vm_ops;
return 0;
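
As a rough illustration of the conversion above (not part of the patch), the sketch below models a wrapper ->mmap() handler that forwards the new third argument to the handler it wraps, the way proc_reg_mmap() does in the hunk above. All demo_* names are hypothetical stand-ins for the kernel types, and the flag value in the usage note is arbitrary.

```c
#include <errno.h>

/* Hypothetical stand-ins for struct vm_area_struct and struct file. */
struct demo_vma {
	unsigned long vm_flags;
};

struct demo_file;

/* Handler type after the conversion: a third 'map_flags' argument. */
typedef int (*demo_mmap_fn)(struct demo_file *, struct demo_vma *,
			    unsigned long);

struct demo_file {
	demo_mmap_fn inner_mmap;	/* wrapped handler, may be unset */
	unsigned long seen_flags;	/* records what the inner handler saw */
};

/* Inner handler: only records the flags it was given. */
static int demo_inner_mmap(struct demo_file *file, struct demo_vma *vma,
			   unsigned long map_flags)
{
	(void)vma;
	file->seen_flags = map_flags;
	return 0;
}

/* Wrapper handler: forwards map_flags instead of dropping it. */
static int demo_wrapper_mmap(struct demo_file *file, struct demo_vma *vma,
			     unsigned long map_flags)
{
	int rv = -EIO;
	demo_mmap_fn mmap = file->inner_mmap;

	if (mmap)
		rv = mmap(file, vma, map_flags);
	return rv;
}
```

Calling demo_wrapper_mmap() with some flag value should leave that same value in seen_flags, while a file with no inner handler gets -EIO back.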

Kirill A. Shutemov

Aug 16, 2017, 7:20:07 AM
Since we are looking into the mmap(2) ABI, maybe we should consider redefining
MAP_DENYWRITE and MAP_EXECUTABLE as 0, in the hope that we will be able to
reuse these bits in the future? These flags are ignored now anyway.

--
Kirill A. Shutemov

Kirill A. Shutemov

Aug 16, 2017, 7:20:09 AM
On Wed, Aug 16, 2017 at 12:44:28AM -0700, Dan Williams wrote:
> @@ -1411,6 +1422,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>
> /* fall through */
> case MAP_PRIVATE:
> + if ((flags & (MAP_PRIVATE|MAP_DIRECT))
> + == (MAP_PRIVATE|MAP_DIRECT))
> + return -EINVAL;

We've already checked for MAP_PRIVATE in this codepath. A simple (flags &
MAP_DIRECT) check would be enough.

--
Kirill A. Shutemov

Dan Williams

Aug 16, 2017, 12:30:09 PM
True, will fix.

Dan Williams

Aug 16, 2017, 12:40:08 PM
Actually, no, because of the fallthrough we need to check MAP_SHARED
or MAP_PRIVATE along with MAP_DIRECT.
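
To make the fallthrough point concrete, here is a small userspace sketch. It assumes the switch has the shape suggested by the quoted hunk, where the MAP_SHARED case falls through into the MAP_PRIVATE case. The DEMO_* values are illustrative only and DEMO_MAP_DIRECT is a hypothetical bit, not the value from the patch.

```c
#include <errno.h>

/* Illustrative values only; not taken from the kernel headers. */
#define DEMO_MAP_SHARED		0x01UL
#define DEMO_MAP_PRIVATE	0x02UL
#define DEMO_MAP_TYPE		0x0fUL
#define DEMO_MAP_DIRECT		0x80000UL	/* hypothetical bit */

/* Check as posted: reject only MAP_PRIVATE combined with MAP_DIRECT. */
static int demo_check_as_posted(unsigned long flags)
{
	switch (flags & DEMO_MAP_TYPE) {
	case DEMO_MAP_SHARED:
		/* shared-only checks would run here */
		/* fall through */
	case DEMO_MAP_PRIVATE:
		if ((flags & (DEMO_MAP_PRIVATE | DEMO_MAP_DIRECT))
				== (DEMO_MAP_PRIVATE | DEMO_MAP_DIRECT))
			return -EINVAL;
		return 0;
	default:
		return -EINVAL;
	}
}

/* Simplified check: also trips for MAP_SHARED because of the fallthrough. */
static int demo_check_simplified(unsigned long flags)
{
	switch (flags & DEMO_MAP_TYPE) {
	case DEMO_MAP_SHARED:
		/* fall through */
	case DEMO_MAP_PRIVATE:
		if (flags & DEMO_MAP_DIRECT)
			return -EINVAL;
		return 0;
	default:
		return -EINVAL;
	}
}
```

With the simplified test, a MAP_SHARED|MAP_DIRECT request reaches the same statement and is rejected too, which is why both bits are compared.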

Dan Williams

Aug 16, 2017, 12:40:08 PM
Yes, we can make these -EOPNOTSUPP in the new syscall.

Kirill A. Shutemov

Aug 16, 2017, 12:50:08 PM
You cannot detect them if we redefine them as 0. :)

--
Kirill A. Shutemov

Dan Williams

Aug 16, 2017, 1:00:10 PM
On Wed, Aug 16, 2017 at 9:47 AM, Kirill A. Shutemov
Yes, we can, there will now be missing bits in
LEGACY_MAP_SUPPORTED_MASK that will fail those bit values until we
re-define them. Everything else is an exercise for libc about what
it wants to do when it sees those values.
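
A minimal sketch of both sides of this exchange, under the assumption that the old bit value of the flag was 0x0800 and that the flag name is then redefined as 0. A direct test of the redefined flag can never fire, but a check against the complement of the supported mask still catches the old bit value. All DEMO_* names and values are illustrative, not the kernel's definitions.

```c
#include <errno.h>

/* Illustrative values only. */
#define DEMO_MAP_SHARED		0x01UL
#define DEMO_MAP_PRIVATE	0x02UL
#define DEMO_MAP_FIXED		0x10UL

/* Assumed old bit value of the deprecated flag. */
#define DEMO_OLD_DENYWRITE_BIT	0x0800UL
/* The flag name redefined as 0, as suggested above. */
#define DEMO_MAP_DENYWRITE	0UL

/* OR-ing in a zero-valued flag contributes no bits to the mask. */
#define DEMO_SUPPORTED_MASK	(DEMO_MAP_SHARED | DEMO_MAP_PRIVATE \
				 | DEMO_MAP_FIXED | DEMO_MAP_DENYWRITE)

/* Direct flag test: always 0 once the flag is defined as 0. */
static int demo_flag_test(unsigned long flags)
{
	return (flags & DEMO_MAP_DENYWRITE) != 0;
}

/* Mask test: any bit outside the supported mask is refused. */
static int demo_mask_check(unsigned long flags)
{
	return (flags & ~DEMO_SUPPORTED_MASK) ? -EOPNOTSUPP : 0;
}
```

An old binary that still passes the 0x0800 bit is invisible to demo_flag_test() but is refused by demo_mask_check().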

Dan Williams

Aug 16, 2017, 1:30:15 PM
[..]
> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index 0e1de42c836f..7c9e3d11027f 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -7,16 +7,6 @@
> #include <linux/atomic.h>
> #include <uapi/linux/mman.h>
>
> -#ifndef MAP_32BIT
> -#define MAP_32BIT 0
> -#endif
> -#ifndef MAP_HUGE_2MB
> -#define MAP_HUGE_2MB 0
> -#endif
> -#ifndef MAP_HUGE_1GB
> -#define MAP_HUGE_1GB 0
> -#endif

This was inadvertent. We need this to build on non-x86 archs, will fix.

Dan Williams

Aug 16, 2017, 7:50:10 PM
On Wed, Aug 16, 2017 at 12:44 AM, Dan Williams <dan.j.w...@intel.com> wrote:
For easier testing / evaluation of these patches I went ahead and
rebased them to v4.13-rc5, fixed up 0-day reports from the ->mmap()
conversion, and published a for-4.14/map-direct branch here:

https://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git/log/?h=for-4.14/map-direct

Dan Williams

Aug 23, 2017, 8:00:06 PM
The mmap(2) syscall suffers from the ABI anti-pattern of not validating
unknown flags. However, proposals like MAP_SYNC and MAP_DIRECT need a
mechanism to define new behavior that is known to fail on older kernels
without the support. Define a new mmap3 syscall that checks for
unsupported flags at syscall entry and add a 'mmap_supported_mask' to
'struct file_operations' so generic code can validate that the ->mmap()
handler knows about the specified flags. This also arranges for the
flags to be passed to the handler so it can do further local validation
of whether the requested behavior can be fulfilled.
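
The two-level check described above might look roughly like the following userspace sketch. It is an assumption about the shape of the logic, not the code from the mm/mmap.c hunk: demo_validate(), the DEMO_* masks, the DEMO_MAP_NEWFLAG bit, and the choice of -EOPNOTSUPP for the handler-level failure are all hypothetical.

```c
#include <errno.h>

/* Illustrative values only. */
#define DEMO_MAP_SHARED		0x01UL
#define DEMO_MAP_PRIVATE	0x02UL
#define DEMO_MAP_FIXED		0x10UL
#define DEMO_MAP_ANONYMOUS	0x20UL
#define DEMO_MAP_NEWFLAG	0x80000UL	/* hypothetical new flag */

#define DEMO_LEGACY_MASK	(DEMO_MAP_SHARED | DEMO_MAP_PRIVATE \
				 | DEMO_MAP_FIXED | DEMO_MAP_ANONYMOUS)
#define DEMO_SYSCALL_MASK	(DEMO_LEGACY_MASK | DEMO_MAP_NEWFLAG)

/* Stand-in for the new field in struct file_operations. */
struct demo_fops {
	unsigned long mmap_supported_mask;
};

static int demo_validate(const struct demo_fops *fops, unsigned long flags)
{
	unsigned long handler_mask = fops->mmap_supported_mask;

	/* Step 1: the syscall itself refuses bits it does not know. */
	if (flags & ~DEMO_SYSCALL_MASK)
		return -EOPNOTSUPP;

	/* Step 2: a zero handler mask means "legacy flags only". */
	if (!handler_mask)
		handler_mask = DEMO_LEGACY_MASK;
	if (flags & ~handler_mask)
		return -EOPNOTSUPP;

	return 0;
}
```

A handler that never sets mmap_supported_mask keeps working for the historical flags, while a new flag is only let through for handlers that opt in.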

Cc: Jan Kara <ja...@suse.cz>
Cc: Arnd Bergmann <ar...@arndb.de>
Cc: Andrew Morton <ak...@linux-foundation.org>
Suggested-by: Andy Lutomirski <lu...@kernel.org>
Signed-off-by: Dan Williams <dan.j.w...@intel.com>
---
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
include/linux/fs.h | 1 +
include/linux/mm.h | 2 +-
include/linux/mman.h | 42 ++++++++++++++++++++++++++++++++
include/linux/syscalls.h | 3 ++
mm/mmap.c | 32 ++++++++++++++++++++++--
7 files changed, 78 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 448ac2161112..0618b5b38b45 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -391,3 +391,4 @@
382 i386 pkey_free sys_pkey_free
383 i386 statx sys_statx
384 i386 arch_prctl sys_arch_prctl compat_sys_arch_prctl
+385 i386 mmap3 sys_mmap_pgoff_strict
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 5aef183e2f85..e204c736d7e9 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -339,6 +339,7 @@
330 common pkey_alloc sys_pkey_alloc
331 common pkey_free sys_pkey_free
332 common statx sys_statx
+333 common mmap3 sys_mmap_pgoff_strict

#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 33d1ee8f51be..db42da9f98c4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1674,6 +1674,7 @@ struct file_operations {
long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
int (*mmap) (struct file *, struct vm_area_struct *, unsigned long);
+ unsigned long mmap_supported_mask;
int (*open) (struct inode *, struct file *);
int (*flush) (struct file *, fl_owner_t id);
int (*release) (struct inode *, struct file *);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 46b9ac5e8569..49eef48da4b7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2090,7 +2090,7 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo

extern unsigned long mmap_region(struct file *file, unsigned long addr,
unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
- struct list_head *uf);
+ struct list_head *uf, unsigned long flags);
extern unsigned long do_mmap(struct file *file, unsigned long addr,
unsigned long len, unsigned long prot, unsigned long flags,
vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate,
diff --git a/include/linux/mman.h b/include/linux/mman.h
index c8367041fafd..64b6cb3dec70 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -7,6 +7,48 @@
#include <linux/atomic.h>
#include <uapi/linux/mman.h>

+/*
+ * Arrange for undefined architecture specific flags to be rejected by
+ * default.
+ */
+#ifndef MAP_32BIT
+#define MAP_32BIT 0
+#endif
+#ifndef MAP_HUGE_2MB
+#define MAP_HUGE_2MB 0
+#endif
+#ifndef MAP_HUGE_1GB
+#define MAP_HUGE_1GB 0
+#endif
+#ifndef MAP_UNINITIALIZED
+#define MAP_UNINITIALIZED 0
+#endif
+
+/*
+ * The historical set of flags that all mmap implementations implicitly
+ * support when file_operations.mmap_supported_mask is zero. With the
+ * mmap3 syscall the deprecated MAP_DENYWRITE and MAP_EXECUTABLE bit
+ * values are explicitly rejected with EOPNOTSUPP rather than being
+ * silently accepted.
+ */
+#define LEGACY_MAP_SUPPORTED_MASK (MAP_SHARED \
+ | MAP_PRIVATE \
+ | MAP_FIXED \
+ | MAP_ANONYMOUS \
+ | MAP_UNINITIALIZED \
+ | MAP_GROWSDOWN \
+ | MAP_LOCKED \
+ | MAP_NORESERVE \
+ | MAP_POPULATE \
+ | MAP_NONBLOCK \
+ | MAP_STACK \
+ | MAP_HUGETLB \
+ | MAP_32BIT \
+ | MAP_HUGE_2MB \
+ | MAP_HUGE_1GB)
+
+#define MAP_SUPPORTED_MASK (LEGACY_MAP_SUPPORTED_MASK)
+
extern int sysctl_overcommit_memory;
extern int sysctl_overcommit_ratio;
extern unsigned long sysctl_overcommit_kbytes;
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 3cb15ea48aee..c0e0c99cf4ad 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -858,6 +858,9 @@ asmlinkage long sys_perf_event_open(
asmlinkage long sys_mmap_pgoff(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags,
unsigned long fd, unsigned long pgoff);
+asmlinkage long sys_mmap_pgoff_strict(unsigned long addr, unsigned long len,
+ unsigned long prot, unsigned long flags,
+ unsigned long fd, unsigned long pgoff);
asmlinkage long sys_old_mmap(struct mmap_arg_struct __user *arg);
asmlinkage long sys_name_to_handle_at(int dfd, const char __user *name,
struct file_handle __user *handle,
diff --git a/mm/mmap.c b/mm/mmap.c
index 744faae86781..386706831d67 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1686,7 +1712,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
* new file must not have been exposed to user-space, yet.
*/
vma->vm_file = get_file(file);

Dan Williams

Aug 23, 2017, 8:00:06 PM
When a filesystem sees this flag set it will not allow changes to the
file-offset to physical-block-offset relationship of any extent in the
file. How many of the file's extents the global S_IOMAP_SEALED flag covers is
filesystem specific. In other words, it is similar to the inode-wide
XFS_DIFLAG2_REFLINK flag where we make the distinction apply globally to
the inode even though we could theoretically limit that effect to a
sub-range of the file.

The interface that sets this flag (mmap(..., MAP_DIRECT, ...)) will be
careful to document that it is implementation specific whether the
'sealed' restrictions apply to a sub-range or the whole file.
Applications should be prepared for unrelated ranges in the file to be
affected.

The term 'sealed' is used instead of 'immutable' to better indicate that
this is a file property that is temporary and can be undone.
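
As a rough model of how a block-map-changing operation would honor the flag, the sketch below mirrors the xfs_ioc_swapext() check from this patch in plain userspace C. struct demo_inode and demo_swap_extents() are hypothetical stand-ins, and only the flag value (16384) is taken from the patch.

```c
#include <errno.h>

/* Same value as S_IOMAP_SEALED in the patch. */
#define DEMO_S_IOMAP_SEALED	16384U

/* Hypothetical stand-in for struct inode. */
struct demo_inode {
	unsigned int i_flags;
};

#define DEMO_IS_IOMAP_SEALED(inode) \
	((inode)->i_flags & DEMO_S_IOMAP_SEALED)

/* Refuse to exchange extents if either inode is sealed. */
static int demo_swap_extents(struct demo_inode *a, struct demo_inode *b)
{
	if (DEMO_IS_IOMAP_SEALED(a) || DEMO_IS_IOMAP_SEALED(b))
		return -EINVAL;

	/* a real filesystem would exchange the extent maps here */
	return 0;
}
```

Because the flag is inode-wide in this model, a seal on either file blocks the whole operation, including ranges the mapping never touched.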

Cc: Jan Kara <ja...@suse.cz>
Cc: Jeff Moyer <jmo...@redhat.com>
Cc: Christoph Hellwig <h...@lst.de>
Cc: Dave Chinner <da...@fromorbit.com>
Cc: Alexander Viro <vi...@zeniv.linux.org.uk>
Cc: "Darrick J. Wong" <darric...@oracle.com>
Cc: Ross Zwisler <ross.z...@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.w...@intel.com>
---
index c09c16b1ad3b..241f3a272f49 100644
index 9c0c7a920304..845587e6928b 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1730,6 +1730,12 @@ xfs_ioc_swapext(
goto out_put_tmp_file;
}

+ if (IS_IOMAP_SEALED(file_inode(f.file)) ||
+ IS_IOMAP_SEALED(file_inode(tmp.file))) {
+ error = -EINVAL;
+ goto out_put_tmp_file;
+ }
+
/*
* We need to ensure that the fds passed in point to XFS inodes
* before we cast and access them as XFS structures as we have no
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 47249bbe973c..33d1ee8f51be 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1830,6 +1830,7 @@ struct super_operations {
#else
#define S_DAX 0 /* Make all the DAX code disappear */
#endif
+#define S_IOMAP_SEALED 16384 /* logical-to-physical extent map is fixed */

/*
* Note that nosuid etc flags are inode-specific: setting some file-system
@@ -1868,6 +1869,7 @@ struct super_operations {
#define IS_AUTOMOUNT(inode) ((inode)->i_flags & S_AUTOMOUNT)
#define IS_NOSEC(inode) ((inode)->i_flags & S_NOSEC)
#define IS_DAX(inode) ((inode)->i_flags & S_DAX)
+#define IS_IOMAP_SEALED(inode) ((inode)->i_flags & S_IOMAP_SEALED)

#define IS_WHITEOUT(inode) (S_ISCHR(inode->i_mode) && \
(inode)->i_rdev == WHITEOUT_DEV)
diff --git a/mm/filemap.c b/mm/filemap.c
index 2457e34d10e0..4cbcf9d589fa 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c

Dan Williams

Aug 23, 2017, 8:00:06 PM
A common pattern for granting a privilege to an unprivileged process is
to pass it an established file descriptor over a unix domain socket.
This enables fine grained access to the MAP_DIRECT mechanism instead of
requiring the mapping process have CAP_LINUX_IMMUTABLE.
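
The resulting permission model can be sketched in userspace as follows. It follows the fcntl_map_direct() and do_mmap() hunks in this patch, but struct demo_file, the demo_* function names, and the boolean standing in for capable(CAP_LINUX_IMMUTABLE) are hypothetical, and locking is left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the new bit in struct file. */
struct demo_file {
	unsigned int f_map_direct:1;
};

/* F_MAP_DIRECT: only a capable caller may mark the open file. */
static int demo_fcntl_map_direct(struct demo_file *filp, bool caller_capable)
{
	if (!caller_capable)
		return -EACCES;
	filp->f_map_direct = 1;
	return 0;
}

/* MAP_DIRECT gate: needs PROT_WRITE, plus a marked file or a capable caller. */
static int demo_may_map_direct(const struct demo_file *filp,
			       bool caller_capable, bool prot_write)
{
	if (!prot_write)
		return -EACCES;
	if (!filp->f_map_direct && !caller_capable)
		return -EACCES;
	return 0;
}
```

In this model the privileged process calls demo_fcntl_map_direct() once, and any process that later receives the same open file (for example over a unix domain socket) passes the gate without holding the capability itself.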

Cc: Jan Kara <ja...@suse.cz>
Cc: Jeff Moyer <jmo...@redhat.com>
Cc: Christoph Hellwig <h...@lst.de>
Cc: Dave Chinner <da...@fromorbit.com>
Cc: Alexander Viro <vi...@zeniv.linux.org.uk>
Cc: "Darrick J. Wong" <darric...@oracle.com>
Cc: Ross Zwisler <ross.z...@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.w...@intel.com>
---
fs/fcntl.c | 15 +++++++++++++++
include/linux/fs.h | 5 +++--
include/uapi/linux/fcntl.h | 5 +++++
mm/mmap.c | 3 ++-
4 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index 3b01b646e528..f2375c406e6f 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -318,6 +318,18 @@ static long fcntl_rw_hint(struct file *file, unsigned int cmd,
}
}

+static int fcntl_map_direct(struct file *filp)
+{
+ if (!capable(CAP_LINUX_IMMUTABLE))
+ return -EACCES;
+
+ spin_lock(&filp->f_lock);
+ filp->f_map_direct = 1;
+ spin_unlock(&filp->f_lock);
+
+ return 0;
+}
+
static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
struct file *filp)
{
@@ -425,6 +437,9 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
case F_SET_FILE_RW_HINT:
err = fcntl_rw_hint(filp, cmd, arg);
break;
+ case F_MAP_DIRECT:
+ err = fcntl_map_direct(filp);
+ break;
default:
break;
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index db42da9f98c4..ec2e1d6bf22c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -855,11 +855,12 @@ struct file {
const struct file_operations *f_op;

/*
- * Protects f_ep_links, f_flags.
+ * Protects f_ep_links, f_flags, f_write_hint, and f_map_direct.
* Must not be taken from IRQ context.
*/
spinlock_t f_lock;
- enum rw_hint f_write_hint;
+ enum rw_hint f_write_hint:3;
+ unsigned int f_map_direct:1;
atomic_long_t f_count;
unsigned int f_flags;
fmode_t f_mode;
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index ec69d55bcec7..2a57a503174e 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -53,6 +53,11 @@
#define F_SET_FILE_RW_HINT (F_LINUX_SPECIFIC_BASE + 14)

/*
+ * Enable MAP_DIRECT on the file without CAP_LINUX_IMMUTABLE
+ */
+#define F_MAP_DIRECT (F_LINUX_SPECIFIC_BASE + 15)
+
+/*
* Valid hint values for F_{GET,SET}_RW_HINT. 0 is "not set", or can be
* used to clear any hints previously set.
*/
diff --git a/mm/mmap.c b/mm/mmap.c
index 32417b2a668c..cf5e0cb7d0e3 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1399,7 +1399,8 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
if (flags & MAP_DIRECT) {
if (!(prot & PROT_WRITE))
return -EACCES;
- if (!capable(CAP_LINUX_IMMUTABLE))
+ if (!file->f_map_direct
+ && !capable(CAP_LINUX_IMMUTABLE))
return -EACCES;
}

Dan Williams

Aug 23, 2017, 8:00:06 PM
Changes since v5 [1]:
* Compile fixes from a much improved coccinelle semantic patch (thanks
Julia!) that adds a 'flags' argument to all the ->mmap()
implementations in the kernel. (0day-kbuild-robot)

* Make the deprecated MAP_DENYWRITE and MAP_EXECUTABLE flags return
EOPNOTSUPP with the new mmap3() syscall. (Kirill)

* Minor changelog updates.

* Updated cover letter with a clarified summary and checklist of
questions to answer before proceeding further.

---

MAP_DIRECT is a mechanism to ask the kernel to atomically manage the
file-offset to physical-address block relationship of a mapping relative
to any memory-mapped access. It is complementary to the proposed
MAP_SYNC mechanism which makes the same guarantee relative to cpu
faults. MAP_DIRECT goes a step further and makes this guarantee for
agents that may not generate mmu faults, but at the cost of restricting
the kernel's ability to mutate the block-map at will.

MAP_SYNC is preferred for scenarios that want full filesystem feature
support while avoiding fsync/msync overhead, but also do not need to
contend with hypervisors or RDMA agents that do not give the kernel an
mmu fault. In other words, the need for MAP_DIRECT is driven by the
scarcity of SVM capable hardware (Shared Virtual Memory, where hardware
generates mmu faults), hypervisors like Xen that need to interrogate the
physical address layout of a file to maintain their own physical-address
mapping metadata outside the kernel, and peer-to-peer DMA use cases that
always bypass the mmu.

The MAP_DIRECT mechanism allows a filesystem to be used for capacity
provisioning and access control where these aforementioned applications
would otherwise be forced to roll a custom solution on top of a raw
device-file.

Questions:
1/ Is the definition of MAP_DIRECT constrained enough to allow us to
make the restrictions it imposes on the kernel finer grained over time?

2/ Do the XFS changes look sane? They attempt to avoid adding any
overhead to the non-MAP_DIRECT case at the expense of the new
i_mapdcount atomic counter in the XFS inode.

3/ While the generic MAP_DIRECT description warns that the block-map may
not actually be immutable for the lifetime of the mapping it also
does not preclude a filesystem from making that guarantee. In fact,
Dave wants to be able to get a stable view of the physical mapping
[2], and Xen has a need to do the same [3]. Do we want userspace to
start making "XFS + MAP_DIRECT == Immutable" assumptions, or do we
need a separate mechanism for that guarantee?
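
The counter mentioned in question 2 can be pictured with a small userspace model. This is only a sketch under assumptions: the teardown path is not shown in this posting, so clearing the seal when the count drops to zero is assumed, the locking the patch takes around these updates is omitted, and every `sk_` name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the per-inode MAP_DIRECT state. */
struct sk_inode {
	atomic_int mapdcount;	/* live MAP_DIRECT mappings */
	atomic_bool sealed;	/* stands in for S_IOMAP_SEALED */
};

static void sk_inode_init(struct sk_inode *ip)
{
	atomic_init(&ip->mapdcount, 0);
	atomic_init(&ip->sealed, false);
}

/* Establishing a MAP_DIRECT mapping seals the block map. */
static void sk_map_direct(struct sk_inode *ip)
{
	atomic_store(&ip->sealed, true);
	atomic_fetch_add(&ip->mapdcount, 1);
}

/* Assumed teardown: the last unmap lifts the seal. */
static void sk_unmap_direct(struct sk_inode *ip)
{
	int old = atomic_fetch_sub(&ip->mapdcount, 1);

	assert(old > 0);
	if (old == 1)
		atomic_store(&ip->sealed, false);
}

/* Any operation that would mutate the block map checks the seal. */
static int sk_alloc_blocks(struct sk_inode *ip)
{
	return atomic_load(&ip->sealed) ? -ETXTBSY : 0;
}
```

Files that never see a MAP_DIRECT mapping only pay for the seal check in this model, which matches the stated goal of keeping overhead out of the common case.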

[1]: https://lkml.org/lkml/2017/8/16/114
[2]: https://www.mail-archive.com/linux-...@vger.kernel.org/msg1467677.html
[3]: https://lists.xen.org/archives/html/xen-devel/2017-04/msg00419.html

---

Dan Williams (5):
vfs: add flags parameter to ->mmap() in 'struct file_operations'
fs, xfs: introduce S_IOMAP_SEALED
mm: introduce mmap3 for safely defining new mmap flags
fs, xfs: introduce MAP_DIRECT for creating block-map-atomic file ranges
fs, fcntl: add F_MAP_DIRECT


arch/arc/kernel/arc_hostlink.c | 3 -
arch/mips/kernel/vdso.c | 2
arch/powerpc/kernel/proc_powerpc.c | 3 -
arch/powerpc/kvm/book3s_64_vio.c | 3 -
arch/powerpc/platforms/cell/spufs/file.c | 21 ++--
arch/powerpc/platforms/powernv/opal-prd.c | 3 -
arch/um/drivers/mmapper_kern.c | 3 -
arch/x86/entry/syscalls/syscall_32.tbl | 1
arch/x86/entry/syscalls/syscall_64.tbl | 1
drivers/android/binder.c | 3 -
drivers/char/agp/frontend.c | 3 -
drivers/char/bsr.c | 3 -
drivers/char/hpet.c | 6 +
drivers/char/mbcs.c | 3 -
drivers/char/mbcs.h | 3 -
drivers/char/mem.c | 11 +-
drivers/char/mspec.c | 9 +-
drivers/char/uv_mmtimer.c | 6 +
drivers/dax/device.c | 3 -
drivers/dma-buf/dma-buf.c | 4 +
drivers/firewire/core-cdev.c | 3 -
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 -
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 3 -
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 5 +
drivers/gpu/drm/arc/arcpgu_drv.c | 5 +
drivers/gpu/drm/ast/ast_drv.h | 3 -
drivers/gpu/drm/ast/ast_ttm.c | 3 -
drivers/gpu/drm/bochs/bochs.h | 3 -
drivers/gpu/drm/bochs/bochs_mm.c | 3 -
drivers/gpu/drm/cirrus/cirrus_drv.h | 3 -
drivers/gpu/drm/cirrus/cirrus_ttm.c | 3 -
drivers/gpu/drm/drm_gem.c | 3 -
drivers/gpu/drm/drm_gem_cma_helper.c | 6 +
drivers/gpu/drm/drm_vm.c | 3 -
drivers/gpu/drm/etnaviv/etnaviv_drv.h | 3 -
drivers/gpu/drm/etnaviv/etnaviv_gem.c | 5 +
drivers/gpu/drm/exynos/exynos_drm_gem.c | 5 +
drivers/gpu/drm/exynos/exynos_drm_gem.h | 3 -
drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.h | 3 -
drivers/gpu/drm/hisilicon/hibmc/hibmc_ttm.c | 3 -
drivers/gpu/drm/i810/i810_dma.c | 3 -
drivers/gpu/drm/i915/i915_gem_dmabuf.c | 2
drivers/gpu/drm/mediatek/mtk_drm_gem.c | 5 +
drivers/gpu/drm/mediatek/mtk_drm_gem.h | 3 -
drivers/gpu/drm/mgag200/mgag200_drv.h | 3 -
drivers/gpu/drm/mgag200/mgag200_ttm.c | 3 -
drivers/gpu/drm/msm/msm_drv.h | 3 -
drivers/gpu/drm/msm/msm_gem.c | 5 +
drivers/gpu/drm/nouveau/nouveau_ttm.c | 5 +
drivers/gpu/drm/nouveau/nouveau_ttm.h | 2
drivers/gpu/drm/omapdrm/omap_drv.h | 3 -
drivers/gpu/drm/omapdrm/omap_gem.c | 5 +
drivers/gpu/drm/qxl/qxl_drv.h | 3 -
drivers/gpu/drm/qxl/qxl_ttm.c | 3 -
drivers/gpu/drm/radeon/radeon_drv.c | 3 -
drivers/gpu/drm/radeon/radeon_ttm.c | 3 -
drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 5 +
drivers/gpu/drm/rockchip/rockchip_drm_gem.h | 3 -
drivers/gpu/drm/tegra/gem.c | 5 +
drivers/gpu/drm/tegra/gem.h | 3 -
drivers/gpu/drm/udl/udl_drv.h | 3 -
drivers/gpu/drm/udl/udl_gem.c | 5 +
drivers/gpu/drm/vc4/vc4_bo.c | 5 +
drivers/gpu/drm/vc4/vc4_drv.h | 3 -
drivers/gpu/drm/vgem/vgem_drv.c | 7 +
drivers/gpu/drm/virtio/virtgpu_drv.h | 3 -
drivers/gpu/drm/virtio/virtgpu_ttm.c | 3 -
drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 3 -
drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c | 3 -
drivers/hsi/clients/cmt_speech.c | 3 -
drivers/hwtracing/intel_th/msu.c | 3 -
drivers/hwtracing/stm/core.c | 3 -
drivers/infiniband/core/uverbs_main.c | 3 -
drivers/infiniband/hw/hfi1/file_ops.c | 6 +
drivers/infiniband/hw/qib/qib_file_ops.c | 5 +
drivers/media/v4l2-core/v4l2-dev.c | 3 -
drivers/misc/aspeed-lpc-ctrl.c | 3 -
drivers/misc/cxl/api.c | 5 +
drivers/misc/cxl/cxl.h | 3 -
drivers/misc/cxl/file.c | 3 -
drivers/misc/genwqe/card_dev.c | 3 -
drivers/misc/mic/scif/scif_fd.c | 3 -
drivers/misc/mic/vop/vop_vringh.c | 3 -
drivers/misc/sgi-gru/grufile.c | 3 -
drivers/mtd/mtdchar.c | 3 -
drivers/pci/proc.c | 3 -
drivers/rapidio/devices/rio_mport_cdev.c | 3 -
drivers/sbus/char/flash.c | 3 -
drivers/sbus/char/jsflash.c | 3 -
drivers/scsi/cxlflash/superpipe.c | 5 +
drivers/scsi/sg.c | 3 -
drivers/staging/android/ashmem.c | 3 -
drivers/staging/comedi/comedi_fops.c | 3 -
.../staging/lustre/lustre/llite/llite_internal.h | 3 -
drivers/staging/lustre/lustre/llite/llite_mmap.c | 5 +
drivers/staging/vboxvideo/vbox_drv.h | 3 -
drivers/staging/vboxvideo/vbox_ttm.c | 3 -
drivers/staging/vme/devices/vme_user.c | 3 -
drivers/uio/uio.c | 3 -
drivers/usb/core/devio.c | 3 -
drivers/usb/mon/mon_bin.c | 3 -
drivers/vfio/vfio.c | 7 +
drivers/video/fbdev/core/fbmem.c | 3 -
drivers/video/fbdev/pxa3xx-gcu.c | 3 -
drivers/xen/gntalloc.c | 3 -
drivers/xen/gntdev.c | 3 -
drivers/xen/privcmd.c | 3 -
drivers/xen/xenbus/xenbus_dev_backend.c | 3 -
drivers/xen/xenfs/xenstored.c | 3 -
fs/9p/vfs_file.c | 10 +-
fs/aio.c | 3 -
fs/attr.c | 10 ++
fs/btrfs/file.c | 3 -
fs/ceph/addr.c | 3 -
fs/ceph/super.h | 3 -
fs/cifs/cifsfs.h | 6 +
fs/cifs/file.c | 10 +-
fs/coda/file.c | 5 +
fs/ecryptfs/file.c | 5 +
fs/ext2/file.c | 5 +
fs/ext4/file.c | 3 -
fs/f2fs/file.c | 3 -
fs/fcntl.c | 15 +++
fs/fuse/file.c | 8 +
fs/gfs2/file.c | 3 -
fs/hugetlbfs/inode.c | 3 -
fs/kernfs/file.c | 3 -
fs/ncpfs/mmap.c | 3 -
fs/ncpfs/ncp_fs.h | 2
fs/nfs/file.c | 5 +
fs/nfs/internal.h | 2
fs/nilfs2/file.c | 3 -
fs/ocfs2/mmap.c | 3 -
fs/ocfs2/mmap.h | 3 -
fs/open.c | 6 +
fs/orangefs/file.c | 5 +
fs/proc/inode.c | 7 +
fs/proc/vmcore.c | 6 +
fs/ramfs/file-nommu.c | 6 +
fs/read_write.c | 3 +
fs/romfs/mmap-nommu.c | 3 -
fs/ubifs/file.c | 5 +
fs/xfs/libxfs/xfs_bmap.c | 5 +
fs/xfs/xfs_bmap_util.c | 3 +
fs/xfs/xfs_file.c | 114 +++++++++++++++++++-
fs/xfs/xfs_inode.h | 1
fs/xfs/xfs_ioctl.c | 6 +
fs/xfs/xfs_super.c | 1
include/drm/drm_gem.h | 3 -
include/drm/drm_gem_cma_helper.h | 3 -
include/drm/drm_legacy.h | 3 -
include/linux/fs.h | 21 ++--
include/linux/mm.h | 2
include/linux/mman.h | 46 ++++++++
include/linux/syscalls.h | 3 +
include/misc/cxl.h | 3 -
include/uapi/asm-generic/mman.h | 1
include/uapi/linux/fcntl.h | 5 +
ipc/shm.c | 5 +
kernel/events/core.c | 3 -
kernel/kcov.c | 3 -
kernel/relay.c | 3 -
mm/filemap.c | 19 ++-
mm/mmap.c | 56 +++++++++-
mm/nommu.c | 4 -
mm/shmem.c | 3 -
net/socket.c | 6 +
security/selinux/selinuxfs.c | 6 +
sound/core/compress_offload.c | 3 -
sound/core/hwdep.c | 3 -
sound/core/info.c | 3 -
sound/core/init.c | 3 -
sound/core/oss/pcm_oss.c | 3 -
sound/core/pcm_native.c | 3 -
sound/oss/soundcard.c | 3 -
sound/oss/swarm_cs4297a.c | 3 -
virt/kvm/kvm_main.c | 3 -
177 files changed, 689 insertions(+), 234 deletions(-)

Dan Williams

Aug 23, 2017, 8:00:06 PM8/23/17
to
MAP_DIRECT is an mmap(2) flag with the following semantics:

MAP_DIRECT
When specified with MAP_SHARED a successful fault in this range
indicates that the kernel is maintaining the block map (user linear
address to file offset to physical address relationship) in a manner
that no external agent can observe any inconsistent changes. In other
words, the block map of the mapping is effectively pinned, or the kernel
is otherwise able to exchange a new physical extent atomically with
respect to any hardware / software agent. As implied by this definition
a successful fault in a MAP_DIRECT range bypasses kernel indirections
like the page-cache, and all updates are carried directly through to the
underlying file physical-address blocks (modulo cpu cache effects).

ETXTBSY may be returned to any third party operation on the file that
attempts to update the block map (allocate blocks / convert unwritten
extents / break shared extents). However, whether a filesystem returns
ETXTBSY for a certain state of the block relative to a MAP_DIRECT mapping
is filesystem and kernel version dependent.

Some filesystems may extend these operation restrictions outside the
mapped range and return ETXTBSY to any file operations that might mutate
the block map. MAP_DIRECT faults may fail with a SIGBUS if the
filesystem needs to write the block map to satisfy the fault. For
example, if the mapping was established over a hole in a sparse file.

ERRORS
EACCES A MAP_DIRECT mapping was requested and PROT_WRITE was not set,
or the requesting process is missing CAP_LINUX_IMMUTABLE.

EINVAL MAP_ANONYMOUS or MAP_PRIVATE was specified with MAP_DIRECT.

EOPNOTSUPP The filesystem explicitly does not support the flag.

SIGBUS Attempted to write a MAP_DIRECT mapping at a file offset that
might require block-map updates.

Cc: Jan Kara <ja...@suse.cz>
Cc: Jeff Moyer <jmo...@redhat.com>
Cc: Christoph Hellwig <h...@lst.de>
Cc: Dave Chinner <da...@fromorbit.com>
Cc: Alexander Viro <vi...@zeniv.linux.org.uk>
Cc: "Darrick J. Wong" <darric...@oracle.com>
Cc: Ross Zwisler <ross.z...@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.w...@intel.com>
---
fs/xfs/xfs_file.c | 115 ++++++++++++++++++++++++++++++++++++++-
fs/xfs/xfs_inode.h | 1
fs/xfs/xfs_super.c | 1
include/linux/mman.h | 6 ++
include/uapi/asm-generic/mman.h | 1
mm/mmap.c | 23 ++++++++
6 files changed, 142 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index cacc0162a41a..f82bf9416200 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -40,6 +40,7 @@
#include "xfs_iomap.h"
#include "xfs_reflink.h"

+#include <linux/mman.h>
#include <linux/dcache.h>
#include <linux/falloc.h>
#include <linux/pagevec.h>
@@ -1001,6 +1002,25 @@ xfs_file_llseek(
return vfs_setpos(file, offset, inode->i_sb->s_maxbytes);
}

+static const struct vm_operations_struct xfs_file_vm_direct_ops;
+
+STATIC int
+xfs_vma_checks(
+ struct vm_area_struct *vma,
+ struct inode *inode)
+{
+ if (vma->vm_ops != &xfs_file_vm_direct_ops)
+ return 0;
+
+ if (xfs_is_reflink_inode(XFS_I(inode)))
+ return VM_FAULT_SIGBUS;
+
+ if (!IS_DAX(inode))
+ return VM_FAULT_SIGBUS;
+
+ return 0;
+}
+
+ return 0;
+
+ /* Otherwise do it the slow way */
+ xfs_ilock(ip, lock_flags);
+ if (atomic_dec_and_test(atomic))
+ return 1;
+ xfs_iunlock(ip, lock_flags);
+ return 0;
+}
+
static const struct vm_operations_struct xfs_file_vm_ops = {
.fault = xfs_filemap_fault,
.huge_fault = xfs_filemap_huge_fault,
@@ -1145,14 +1232,33 @@ static const struct vm_operations_struct xfs_file_vm_ops = {
.pfn_mkwrite = xfs_filemap_pfn_mkwrite,
};

+#define XFS_MAP_SUPPORTED (LEGACY_MAP_SUPPORTED_MASK | MAP_DIRECT)
+
STATIC int
-xfs_file_mmap(struct file *filp, struct vm_area_struct *vma,
- unsigned long map_flags)
+xfs_file_mmap(
+ struct file *filp,
+ struct vm_area_struct *vma,
+ unsigned long map_flags)
{
+ struct inode *inode = file_inode(filp);
+ struct xfs_inode *ip = XFS_I(inode);
+
+ if (map_flags & ~(XFS_MAP_SUPPORTED))
+ return -EOPNOTSUPP;
+
file_accessed(filp);
- vma->vm_ops = &xfs_file_vm_ops;
if (IS_DAX(file_inode(filp)))
vma->vm_flags |= VM_MIXEDMAP | VM_HUGEPAGE;
+
+ if (map_flags & MAP_DIRECT) {
+ xfs_ilock(ip, XFS_MMAPLOCK_EXCL|XFS_IOLOCK_EXCL);
+ vma->vm_ops = &xfs_file_vm_direct_ops;
+ inode->i_flags |= S_IOMAP_SEALED;
+ atomic_inc(&ip->i_mapdcount);
+ xfs_iunlock(ip, XFS_MMAPLOCK_EXCL|XFS_IOLOCK_EXCL);
+ } else
+ vma->vm_ops = &xfs_file_vm_ops;
+
return 0;
}

@@ -1174,6 +1280,7 @@ const struct file_operations xfs_file_operations = {
.fallocate = xfs_file_fallocate,
.clone_file_range = xfs_file_clone_range,
.dedupe_file_range = xfs_file_dedupe_range,
+ .mmap_supported_mask = XFS_MAP_SUPPORTED,
};

const struct file_operations xfs_dir_file_operations = {
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 0ee453de239a..50d3e1bca1a9 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -58,6 +58,7 @@ typedef struct xfs_inode {
mrlock_t i_lock; /* inode lock */
mrlock_t i_mmaplock; /* inode mmap IO lock */
atomic_t i_pincount; /* inode pin count */
+ atomic_t i_mapdcount; /* inode MAP_DIRECT count */
spinlock_t i_flags_lock; /* inode i_flags lock */
/* Miscellaneous state. */
unsigned long i_flags; /* see defined flags below */
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 38aaacdbb8b3..88711e01e504 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1011,6 +1011,7 @@ xfs_fs_inode_init_once(

/* xfs inode */
atomic_set(&ip->i_pincount, 0);
+ atomic_set(&ip->i_mapdcount, 0);
spin_lock_init(&ip->i_flags_lock);

mrlock_init(&ip->i_mmaplock, MRLOCK_ALLOW_EQUAL_PRI|MRLOCK_BARRIER,
diff --git a/include/linux/mman.h b/include/linux/mman.h
index 64b6cb3dec70..4bebb4ca0f7b 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -20,6 +20,9 @@
#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB 0
#endif
+#ifndef MAP_DIRECT
+#define MAP_DIRECT 0
+#endif
#ifndef MAP_UNINITIALIZED
#define MAP_UNINITIALIZED 0
#endif
@@ -47,7 +50,8 @@
| MAP_HUGE_2MB \
| MAP_HUGE_1GB)

-#define MAP_SUPPORTED_MASK (LEGACY_MAP_SUPPORTED_MASK)
+#define MAP_SUPPORTED_MASK (LEGACY_MAP_SUPPORTED_MASK \
+ | MAP_DIRECT)

extern int sysctl_overcommit_memory;
extern int sysctl_overcommit_ratio;
diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h
index 7162cd4cca73..1e7dda3bc56a 100644
--- a/include/uapi/asm-generic/mman.h
+++ b/include/uapi/asm-generic/mman.h
@@ -12,6 +12,7 @@
#define MAP_NONBLOCK 0x10000 /* do not block on IO */
#define MAP_STACK 0x20000 /* give out an address that is best suited for process/thread stacks */
#define MAP_HUGETLB 0x40000 /* create a huge page mapping */
+#define MAP_DIRECT 0x80000 /* shared, sealed, and no page cache */

/* Bits [26:31] are reserved, see mman-common.h for MAP_HUGETLB usage */

diff --git a/mm/mmap.c b/mm/mmap.c
index 386706831d67..32417b2a668c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1393,6 +1393,17 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
return -EACCES;

/*
+ * Require write access and the immutable
+ * capability for MAP_DIRECT mappings
+ */
+ if (flags & MAP_DIRECT) {
+ if (!(prot & PROT_WRITE))
+ return -EACCES;
+ if (!capable(CAP_LINUX_IMMUTABLE))
+ return -EACCES;
+ }
+
+ /*
* Make sure we don't allow writing to an append-only
* file..
*/
@@ -1411,6 +1422,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,

/* fall through */
case MAP_PRIVATE:
+ if ((flags & (MAP_PRIVATE|MAP_DIRECT))
+ == (MAP_PRIVATE|MAP_DIRECT))
+ return -EINVAL;
if (!(file->f_mode & FMODE_READ))
return -EACCES;
if (path_noexec(&file->f_path)) {
@@ -1448,6 +1462,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
default:
return -EINVAL;
}
+
+ if (flags & MAP_DIRECT)
+ return -EINVAL;
}

/*
@@ -1525,6 +1542,12 @@ SYSCALL_DEFINE6(mmap_pgoff_strict, unsigned long, addr, unsigned long, len,
unsigned long, prot, unsigned long, flags,
unsigned long, fd, unsigned long, pgoff)
{
+ /*
+ * since mmap flag definitions are spread over several files,
+ * sanity check new definitions here.
+ */
+ BUILD_BUG_ON((MAP_DIRECT & ~LEGACY_MAP_SUPPORTED_MASK) != MAP_DIRECT);
+
if (flags & ~(MAP_SUPPORTED_MASK))
return -EOPNOTSUPP;
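
The mapping-type rules added to do_mmap() in this patch can be summarized in a small model: MAP_DIRECT is refused with EINVAL for anonymous mappings and whenever MAP_PRIVATE is set. The sketch below is not the kernel code. The `SK_` constants and `sk_map_direct_type_check()` are hypothetical names, with values copied from the asm-generic definitions quoted in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SK_MAP_SHARED  0x01UL
#define SK_MAP_PRIVATE 0x02UL
#define SK_MAP_TYPE    0x0fUL
#define SK_MAP_DIRECT  0x80000UL

/* Same spirit as the BUILD_BUG_ON: the new flag must not alias old bits. */
static_assert((SK_MAP_DIRECT & SK_MAP_TYPE) == 0,
	      "MAP_DIRECT must not overlap the mapping-type bits");

/* Returns 0 if the flag combination is acceptable, -EINVAL otherwise. */
static int sk_map_direct_type_check(bool has_file, unsigned long flags)
{
	if (!(flags & SK_MAP_DIRECT))
		return 0;
	if (!has_file)
		return -EINVAL;	/* anonymous mappings have no block map */
	if (flags & SK_MAP_PRIVATE)
		return -EINVAL;	/* no silent fallback to shared semantics */
	return 0;
}
```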

Jan Kara

Aug 24, 2017, 9:10:18 AM8/24/17
to
On Wed 23-08-17 16:48:51, Dan Williams wrote:
> The mmap(2) syscall suffers from the ABI anti-pattern of not validating
> unknown flags. However, proposals like MAP_SYNC and MAP_DIRECT need a
> mechanism to define new behavior that is known to fail on older kernels
> without the support. Define a new mmap3 syscall that checks for
> unsupported flags at syscall entry and add a 'mmap_supported_mask' to
> 'struct file_operations' so generic code can validate the ->mmap()
> handler knows about the specified flags. This also arranges for the
> flags to be passed to the handler so it can do further local validation
> if the requested behavior can be fulfilled.
>
> Cc: Jan Kara <ja...@suse.cz>
> Cc: Arnd Bergmann <ar...@arndb.de>
> Cc: Andrew Morton <ak...@linux-foundation.org>
> Suggested-by: Andy Lutomirski <lu...@kernel.org>
> Signed-off-by: Dan Williams <dan.j.w...@intel.com>

OK, are we sold on this approach to introducing new mmap flags? I'm
asking because a working API for mmap flags is basically the only thing that's
missing from my MAP_SYNC patches so I'd like to rebase my patches onto
something that is working...

Honza
--
Jan Kara <ja...@suse.com>
SUSE Labs, CR

Christoph Hellwig

Aug 24, 2017, 12:10:09 PM8/24/17
to
This seems to be missing patches 1 and 3.

Christoph Hellwig

Aug 24, 2017, 12:20:09 PM8/24/17
to
I still can't make any sense of this description. What is an external
agent? Userspace obviously can't ever see a change in the extent
map, so it can't be meant.

It would help a lot if you could come up with a concrete user for this,
including example code.

Christoph Hellwig

Aug 24, 2017, 12:20:09 PM8/24/17
to
I'm still very unhappy about the get/set flag state. What is the
reason you can't use/extend leases? (take a look at the fcntl
man page and look for Leases). A variant of the concept is what
the pNFS block server uses.

Dan Williams

Aug 24, 2017, 12:30:08 PM8/24/17
to
On Thu, Aug 24, 2017 at 9:08 AM, Christoph Hellwig <h...@lst.de> wrote:
> This seems to be missing patches 1 and 3.

Sorry, I didn't cc you directly on those. They're on the list:

https://patchwork.kernel.org/patch/9918657/
https://patchwork.kernel.org/patch/9918663/

Christoph Hellwig

Aug 24, 2017, 12:40:17 PM8/24/17
to
On Thu, Aug 24, 2017 at 09:31:17AM -0700, Dan Williams wrote:
> External agent is a DMA device, or a hypervisor like Xen. In the DMA
> case perhaps we can use the fcntl lease mechanism, I'll investigate.
> In the Xen case it actually would need to use fiemap() to discover the
> physical addresses that back the file to set up their M2P tables.
> Here's the discussion where we discovered that physical address
> dependency:
>
> https://lists.xen.org/archives/html/xen-devel/2017-04/msg00419.html

fiemap does not work to discover physical addresses. If they want
to do anything involving physical address they will need a kernel
driver.

Dan Williams

Aug 24, 2017, 12:40:17 PM8/24/17
to
[ adding Xen ]

On Thu, Aug 24, 2017 at 9:11 AM, Christoph Hellwig <h...@lst.de> wrote:
> I still can't make any sense of this description. What is an external
> agent? Userspace obviously can't ever see a change in the extent
> map, so it can't be meant.

External agent is a DMA device, or a hypervisor like Xen. In the DMA
case perhaps we can use the fcntl lease mechanism, I'll investigate.
In the Xen case it actually would need to use fiemap() to discover the
physical addresses that back the file to set up their M2P tables.
Here's the discussion where we discovered that physical address
dependency:

https://lists.xen.org/archives/html/xen-devel/2017-04/msg00419.html

> It would help a lot if you could come up with a concrete user for this,
> including example code.

Will do.

Christoph Hellwig

Aug 24, 2017, 1:00:09 PM8/24/17
to
On Wed, Aug 23, 2017 at 04:48:40PM -0700, Dan Williams wrote:
> We are running short of vma->vm_flags. We can avoid needing a
> new VM_* flag in some cases if the original @flags submitted to mmap(2)
> is made available to the ->mmap() 'struct file_operations'
> implementation. For example, the proposed addition of MAP_DIRECT can be
> implemented without taking up a new vm_flags bit. Another motivation to
> avoid vm_flags is that they appear in /proc/$pid/smaps, and we have seen
> software that tries to dangerously (TOCTOU) read smaps to infer the
> behavior of a virtual address range.
>
> This conversion was performed by the following semantic patch. There
> were a few manual edits for oddities like proc_reg_mmap.
>
> Thanks to Julia for helping me with coccinelle iteration to cover cases
> where the mmap routine is defined in a separate file from the 'struct
> file_operations' instance that consumes it.

How are we going to check that an instance actually supports any
of those flags?

Christoph Hellwig

Aug 24, 2017, 1:00:16 PM8/24/17
to
On Wed, Aug 23, 2017 at 04:48:51PM -0700, Dan Williams wrote:
> The mmap(2) syscall suffers from the ABI anti-pattern of not validating
> unknown flags. However, proposals like MAP_SYNC and MAP_DIRECT need a
> mechanism to define new behavior that is known to fail on older kernels
> without the support. Define a new mmap3 syscall that checks for
> unsupported flags at syscall entry and add a 'mmap_supported_mask' to
> 'struct file_operations' so generic code can validate the ->mmap()
> handler knows about the specified flags. This also arranges for the
> flags to be passed to the handler so it can do further local validation
> if the requested behavior can be fulfilled.

What is the reason to not go with the __MAP_VALID hack? Adding new
syscalls is extremely painful, it will take forever to trickle this
through all architectures (especially with the various 32-bit
architectures having all kinds of different granularities for the
offset) and then the various C libraries, never mind applications.

Dan Williams

Aug 24, 2017, 1:40:09 PM8/24/17
to
I'll let Andy and Kirill restate their concerns, but one of the
arguments that swayed me is that any new mmap flag with this hack must
be documented to only work with MAP_SHARED and that MAP_PRIVATE is
silently ignored. I agree with the mess and delays it causes for other
archs and libc, but at the same time this is for new applications and
libraries that know to look for the new flag, so they need to do the
extra work to check for the new syscall.

However, if the fcntl lease approach works for the DMA cases then we
only have the one mmap flag to add for now, so maybe the weird
MAP_{SHARED|PRIVATE} semantics are tolerable.

Dan Williams

Aug 24, 2017, 1:50:09 PM8/24/17
to
In patch 3 I validate the flags by introducing an
"mmap_supported_mask" field to 'struct file_operations'. It will be
zero by default for almost all implementations and zero means "support
the legacy mmap flags".
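
A rough model of the two-stage validation described here: the syscall first rejects flags the kernel does not know, then the provider's mask is consulted, with zero treated as "legacy flags only". The mask values and every `sk_` name are placeholders, and returning EOPNOTSUPP from the second stage is an assumption, since patch 3 itself is not quoted in this message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder masks; only the structure of the check matters. */
#define SK_LEGACY_MASK  0x0007ffffUL
#define SK_MAP_DIRECT   0x00080000UL
#define SK_SYSCALL_MASK (SK_LEGACY_MASK | SK_MAP_DIRECT)

/* Hypothetical slice of 'struct file_operations'. */
struct sk_file_ops {
	unsigned long mmap_supported_mask;	/* 0: legacy flags only */
};

static int sk_validate_flags(const struct sk_file_ops *fops,
			     unsigned long flags)
{
	unsigned long fs_mask;

	assert(fops != NULL);

	/* Stage 1: flags unknown to this kernel fail at syscall entry. */
	if (flags & ~SK_SYSCALL_MASK)
		return -EOPNOTSUPP;

	/* Stage 2: the ->mmap() provider must also know the flags. */
	fs_mask = fops->mmap_supported_mask ?
		  fops->mmap_supported_mask : SK_LEGACY_MASK;
	if (flags & ~fs_mask)
		return -EOPNOTSUPP;

	return 0;
}
```

A provider that leaves the field at zero keeps working for every legacy flag and automatically refuses anything newer.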

Dan Williams

Aug 24, 2017, 4:30:08 PM8/24/17
to
True, it's broken with respect to multi-device filesystems and these
patches do nothing to fix that problem. Ok, I'm fine to let that use
case depend on a kernel driver and just focus on fixing the DMA case.

Dan Williams

Aug 25, 2017, 2:10:05 AM8/25/17
to
So I think leases could potentially be extended to replace the inode
flag. A MAP_DIRECT operation would take out a lease that is broken by
break_layouts(). However, like the pNFS case the lease break would
need to be held off while any DMA might be in-flight. We can use an
elevated page count as that indication as ZONE_DEVICE pages only ever
have an elevated page count in response to get_user_pages().

However, I think the only practical difference is turning an immediate
ETXTBSY response that S_IOMAP_SEALED provides into an indefinite
blocking wait for break_layouts() to complete. Can pNFS run
break_layouts() in bounded time?

As far as I can see a lease and S_IOMAP_SEALED have the same DMA
cancelling problem, so a lease is not better in that regard. Absent an
overlaying protocol like pNFS, I think S_IOMAP_SEALED is cleaner
because it fails incompatible operations outright rather than stalls
them in break_layouts(). Were there other benefits to a lease over an
inode flag that you had in mind for this case where the protocol is
userspace defined? Maybe I'm thinking too small on the ways a lease
might be extended.

Christoph Hellwig

Aug 25, 2017, 9:10:09 AM8/25/17
to
On Thu, Aug 24, 2017 at 10:36:02AM -0700, Dan Williams wrote:
> I'll let Andy and Kirill restate their concerns, but one of the
> arguments that swayed me is that any new mmap flag with this hack must
> be documented to only work with MAP_SHARED and that MAP_PRIVATE is
> silently ignored. I agree with the mess and delays it causes for other
> archs and libc, but at the same time this is for new applications and
> libraries that know to look for the new flag, so they need to do the
> extra work to check for the new syscall.

True. That is for the original hack, but I spent some more time
looking at the mmap code, and there is one thing I noticed:

include/uapi/asm-generic/mman-common.h:

#define MAP_SHARED 0x01 /* Share changes */
#define MAP_PRIVATE 0x02 /* Changes are private */
#define MAP_TYPE 0x0f /* Mask for type of mapping */

mm/mmap.c:

if (file) {
struct inode *inode = file_inode(file);

switch (flags & MAP_TYPE) {
case MAP_SHARED:
...
case MAP_PRIVATE:
...
default:
return -EINVAL;
}

and very similar for the anonymous and nommu cases.

So if we pick e.g. 0x4 as the valid bit we don't even need to overload
the MAP_SHARED and MAP_PRIVATE meaning.

>
> However, if the fcntl lease approach works for the DMA cases then we
> only have the one mmap flag to add for now, so maybe the weird
> MAP_{SHARED|PRIVATE} semantics are tolerable.
---end quoted text---
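
The MAP_TYPE observation in this message can be checked with a small model of the type switch. It is a sketch only: SK_MAP_VALID at 0x4 is the hypothetical validity bit floated above, and the function covers nothing but the switch statement.

```c
#include <assert.h>
#include <errno.h>

#define SK_MAP_SHARED  0x01UL
#define SK_MAP_PRIVATE 0x02UL
#define SK_MAP_VALID   0x04UL	/* hypothetical validity bit */

/* Models the 'switch (flags & MAP_TYPE)' of a kernel without the new flag. */
static int sk_old_kernel_type_check(unsigned long map_type_mask,
				    unsigned long flags)
{
	/* The mask must at least cover the two classic type bits. */
	assert((map_type_mask & 0x03UL) == 0x03UL);

	switch (flags & map_type_mask) {
	case SK_MAP_SHARED:
	case SK_MAP_PRIVATE:
		return 0;
	default:
		return -EINVAL;	/* unknown type value */
	}
}
```

With a mask of 0x0f, adding the 0x4 bit turns the type value into 0x5 or 0x6, so a kernel that predates the flag fails the call instead of ignoring it.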

Kirill A. Shutemov

Aug 25, 2017, 12:00:08 PM8/25/17
to
Not all archs are ready for this:

arch/parisc/include/uapi/asm/mman.h:#define MAP_TYPE 0x03 /* Mask for type of mapping */
arch/parisc/include/uapi/asm/mman.h:#define MAP_FIXED 0x04 /* Interpret addr exactly */

--
Kirill A. Shutemov

Christoph Hellwig

Aug 25, 2017, 12:10:08 PM8/25/17
to
On Fri, Aug 25, 2017 at 06:58:03PM +0300, Kirill A. Shutemov wrote:
> Not all archs are ready for this:
>
> arch/parisc/include/uapi/asm/mman.h:#define MAP_TYPE 0x03 /* Mask for type of mapping */
> arch/parisc/include/uapi/asm/mman.h:#define MAP_FIXED 0x04 /* Interpret addr exactly */

I'd be happy to say that we should not care about parisc for
persistent memory. We'll just have to find a way to exclude
parisc without making life too ugly.

Kirill A. Shutemov

Aug 25, 2017, 12:20:11 PM8/25/17
to
I don't think crippling the mmap() interface for one arch is the right way to
go. I think the interface should be universal.

I can imagine MAP_DIRECT being useful not only for persistent memory.
For tmpfs instead of mlock()?

--
Kirill A. Shutemov

Helge Deller

Aug 25, 2017, 12:30:09 PM8/25/17
to
On parisc we have
#define MAP_SHARED 0x01 /* Share changes */
#define MAP_PRIVATE 0x02 /* Changes are private */
#define MAP_TYPE 0x03 /* Mask for type of mapping */
#define MAP_FIXED 0x04 /* Interpret addr exactly */
#define MAP_ANONYMOUS 0x10 /* don't use a file */

So, if you need a MAP_DIRECT, wouldn't e.g.
#define MAP_DIRECT 0x08
be possible (0x08 for parisc, 0x04 for the others)?
And if MAP_TYPE needs to include this flag on parisc:
#define MAP_TYPE (0x03 | 0x08) /* Mask for type of mapping */

Helge

Kirill A. Shutemov

Aug 25, 2017, 1:00:09 PM8/25/17
to
I guess it's better to re-define MAP_TYPE as 0x3 everywhere and make
MAP_DIRECT a normal flag. It's not a new type of mapping anyway.

--
Kirill A. Shutemov

Dan Williams

Aug 25, 2017, 3:50:11 PM8/25/17
to
At a minimum I can at least use a new lease type as an indication of
when to bail out of a block-map operation with ETXTBSY, and reuse the
lease security model. That way we at least start to converge the
in-kernel lease machinery for pinning blocks with this userspace
mechanism.

Dan Williams

Aug 25, 2017, 4:30:09 PM8/25/17
to
The problem here is that to support the new mmap flags the arch needs
to find a flag that is guaranteed to fail on older kernels. Defining
MAP_DIRECT to 0x8 on parisc doesn't work because it will simply be
ignored on older parisc kernels.

However, it's already the case that several archs have their own
sys_mmap entry points. Those archs that can't follow the common scheme
(only parisc it seems) will need to add a new mmap syscall. I think
that's a reasonable tradeoff to allow every other architecture to add
this support with their existing mmap syscall paths.

That means MAP_DIRECT should be defined to MAP_TYPE on parisc until it
later defines an opt-in mechanism to a new syscall that honors
MAP_DIRECT as a valid flag.
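
The parisc argument can be made concrete with the constants Helge quoted. The sketch below models only the mapping-type check of a parisc kernel that predates MAP_DIRECT, and `parisc_old_kernel_mmap()` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>

#define PARISC_MAP_SHARED  0x01UL
#define PARISC_MAP_PRIVATE 0x02UL
#define PARISC_MAP_TYPE    0x03UL
#define PARISC_MAP_FIXED   0x04UL

/* Models only the type check of a pre-MAP_DIRECT parisc kernel. */
static int parisc_old_kernel_mmap(unsigned long flags)
{
	unsigned long type = flags & PARISC_MAP_TYPE;

	assert(PARISC_MAP_TYPE == (PARISC_MAP_SHARED | PARISC_MAP_PRIVATE));

	if (type == PARISC_MAP_SHARED || type == PARISC_MAP_PRIVATE)
		return 0;	/* other bits are ignored or mean something else */
	return -EINVAL;
}
```

Here `PARISC_MAP_SHARED | 0x08` is accepted, so a flag at 0x8 would be silently dropped by older kernels, while a request whose type bits equal MAP_TYPE (0x03) is rejected. That is why defining MAP_DIRECT to MAP_TYPE gives the guaranteed failure on parisc.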

Helge Deller

Aug 26, 2017, 3:50:09 AM8/26/17
to
* Dan Williams <dan.j.w...@intel.com>:
I don't want other architectures to suffer just because of parisc.
But adding a new syscall just for usage on parisc won't work either,
because nobody will add code to call it then.

> That means MAP_DIRECT should be defined to MAP_TYPE on parisc until it
> later defines an opt-in mechanism to a new syscall that honors
> MAP_DIRECT as a valid flag.

I'd instead propose to introduce an ABI breakage for parisc users
(which aren't many). Most parisc users update their kernel regularly
anyway, because we fixed so many bugs in the latest kernel.

With the following patch pushed down to the stable kernel series,
MAP_DIRECT will fail as expected on those kernels, while we can
keep parisc up with current developments regarding MAP_DIRECT.

diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index 9a9c2fe..43b9a1e 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -13,6 +13,7 @@
#define MAP_PRIVATE 0x02 /* Changes are private */
#define MAP_TYPE 0x03 /* Mask for type of mapping */
#define MAP_FIXED 0x04 /* Interpret addr exactly */
+#define MAP_DIRECT 0x08 /* shared, sealed, and no page cache */
#define MAP_ANONYMOUS 0x10 /* don't use a file */

#define MAP_DENYWRITE 0x0800 /* ETXTBSY */
diff --git a/arch/parisc/kernel/sys_parisc.c b/arch/parisc/kernel/sys_parisc.c
index 378a754..0499f87 100644
--- a/arch/parisc/kernel/sys_parisc.c
+++ b/arch/parisc/kernel/sys_parisc.c
@@ -270,6 +270,10 @@ asmlinkage unsigned long sys_mmap2(unsigned long addr, unsigned long len,
{
/* Make sure the shift for mmap2 is constant (12), no matter what PAGE_SIZE
we have. */
+#if !defined(CONFIG_HAVE_MAP_DIRECT_SUPPORT)
+ if (flags & MAP_DIRECT)
+ return -EINVAL;
+#endif
return sys_mmap_pgoff(addr, len, prot, flags, fd,
pgoff >> (PAGE_SHIFT - 12));
}
@@ -278,6 +282,10 @@ asmlinkage unsigned long sys_mmap(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags, unsigned long fd,
unsigned long offset)
{
+#if !defined(CONFIG_HAVE_MAP_DIRECT_SUPPORT)
+ if (flags & MAP_DIRECT)
+ return -EINVAL;
+#endif
if (!(offset & ~PAGE_MASK)) {
return sys_mmap_pgoff(addr, len, prot, flags, fd,
offset >> PAGE_SHIFT);


Helge

Dan Williams

Aug 26, 2017, 11:20:09 AM8/26/17
to
I don't understand this comment. If / when parisc gets around to
adding pmem and dax support, why wouldn't libc grow support for the new
parisc mmap variant? Also, it's not just MAP_DIRECT; you would also
need space for a MAP_SYNC flag.

>> That means MAP_DIRECT should be defined to MAP_TYPE on parisc until it
>> later defines an opt-in mechanism to a new syscall that honors
>> MAP_DIRECT as a valid flag.
>
> I'd instead propose to introduce an ABI breakage for parisc users
> (which aren't many). Most parisc users update their kernel regularly
> anyway, because we fixed so many bugs in the latest kernel.
>
> With the following patch pushed down to the stable kernel series,
> MAP_DIRECT will fail as expected on those kernels, while we can
> keep parisc up with current developments regarding MAP_DIRECT.

The whole point is to avoid an ABI regression and the chance for false
positive results. We're immediately stuck if some application was
expecting 0x8 to be ignored, or conversely an application that
absolutely needs to rely on MAP_SYNC/MAP_DIRECT semantics assumes the
wrong result on a parisc kernel where they are ignored.
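
To make the two failure modes concrete, here is a minimal userspace
sketch of "ignore unknown flags" versus "reject unknown flags". It is
not kernel code: the SKETCH_* values and both helper functions are
hypothetical, and the -EINVAL return simply mirrors the check in the
parisc patch quoted above.

```c
#include <errno.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define SKETCH_MAP_SHARED   0x01UL
#define SKETCH_MAP_PRIVATE  0x02UL
#define SKETCH_MAP_DIRECT   0x08UL

/* Flags that a kernel without MAP_DIRECT support would recognize. */
#define SKETCH_KNOWN_FLAGS  (SKETCH_MAP_SHARED | SKETCH_MAP_PRIVATE)

/* Legacy behaviour: unknown bits are masked off and the call succeeds. */
static long legacy_mmap_check(unsigned long flags, unsigned long *effective)
{
    *effective = flags & SKETCH_KNOWN_FLAGS;
    return 0;
}

/* Strict behaviour: any unknown bit fails the call with -EINVAL. */
static long strict_mmap_check(unsigned long flags, unsigned long *effective)
{
    if (flags & ~SKETCH_KNOWN_FLAGS)
        return -EINVAL;
    *effective = flags;
    return 0;
}
```

With the legacy check, a caller passing SKETCH_MAP_SHARED |
SKETCH_MAP_DIRECT gets success and a plain shared mapping, which is
exactly the false positive described above. With the strict check the
same call fails, so the caller can tell that the semantics it asked for
are not available.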

I have not seen any patches for parisc pmem+dax enabling so it seems
too early to worry about these "last mile" enabling features of
MAP_DIRECT and MAP_SYNC. In particular parisc doesn't appear to have
ARCH_ENABLE_MEMORY_HOTPLUG, so as far as I can see it can't yet
support the ZONE_DEVICE scheme that is a pre-requisite for MAP_DIRECT.

Helge Deller

Aug 26, 2017, 4:00:10 PM
I see, but then it's probably best not to define any MAP_DIRECT or
MAP_SYNC at all in the headers of those arches which don't support
pmem+dax (parisc, m68k, alpha, and probably quite a few others).
That way applications can detect at configure time if the platform
supports that, and can leave out the functionality completely.
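
A rough sketch of what that detection could look like on the
application side, assuming MAP_DIRECT is simply absent from
<sys/mman.h> on unsupported arches. The helper name pick_mmap_flags()
and its parameters are made up for illustration.

```c
#include <sys/mman.h>

/*
 * Returns the mmap flags this build will request. want_direct is the
 * application's preference; *got_direct reports whether the headers
 * allowed MAP_DIRECT to be added.
 */
static int pick_mmap_flags(int want_direct, int *got_direct)
{
    int flags = MAP_SHARED;

    *got_direct = 0;
#ifdef MAP_DIRECT
    if (want_direct) {
        flags |= MAP_DIRECT;
        *got_direct = 1;
    }
#else
    (void)want_direct; /* headers lack MAP_DIRECT: feature compiled out */
#endif
    return flags;
}
```

On an arch whose headers do not define the flag, the MAP_DIRECT branch
is never compiled, so the application cannot pass a bit that the kernel
would interpret differently or ignore.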

Helge

Dan Williams

Aug 26, 2017, 6:50:09 PM
On Sat, Aug 26, 2017 at 12:50 PM, Helge Deller <del...@gmx.de> wrote:
> On 26.08.2017 17:15, Dan Williams wrote:
[..]
>> I have not seen any patches for parisc pmem+dax enabling so it seems
>> too early to worry about these "last mile" enabling features of
>> MAP_DIRECT and MAP_SYNC. In particular parisc doesn't appear to have
>> ARCH_ENABLE_MEMORY_HOTPLUG, so as far as I can see it can't yet
>> support the ZONE_DEVICE scheme that is a pre-requisite for MAP_DIRECT.
>
> I see, but then it's probably best not to define any MAP_DIRECT or
> MAP_SYNC at all in the headers of those arches which don't support
> pmem+dax (parisc, m68k, alpha, and probably quite a few others).
> That way applications can detect at configure time if the platform
> supports that, and can leave out the functionality completely.

Yes, that's a good idea; we can handle this similarly to
CONFIG_MMAP_ALLOW_UNINITIALIZED. These patches will also modify
'struct file_operations' so that do_mmap() can validate whether a flag
is supported on a per-architecture basis. Also, the plan is to plumb the
flags passed to the syscall all the way down to the individual mmap
implementations. The ext4 and xfs ->mmap() operations will be able to
return -EOPNOTSUPP based on runtime variables.
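
As a rough userspace model of that two-level check (not the actual
patches): the struct, the supported_map_flags field, the dax_enabled
flag and both functions below are hypothetical names, and returning
-EINVAL from the generic layer is an assumption made for this sketch.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_MAP_DIRECT 0x08UL /* hypothetical value for this sketch */

/* Stand-in for the per-file state that the two layers might consult. */
struct sketch_file {
    bool dax_enabled;                  /* runtime state, e.g. mount options */
    unsigned long supported_map_flags; /* mask advertised by the file ops */
};

/* Generic layer: reject flags the file's operations never advertised. */
static int sketch_do_mmap(const struct sketch_file *f, unsigned long map_flags,
                          int (*fs_mmap)(const struct sketch_file *,
                                         unsigned long))
{
    if (map_flags & ~f->supported_map_flags)
        return -EINVAL;
    return fs_mmap(f, map_flags);
}

/* Filesystem layer: an advertised flag can still fail on runtime state. */
static int sketch_fs_mmap(const struct sketch_file *f, unsigned long map_flags)
{
    if ((map_flags & SKETCH_MAP_DIRECT) && !f->dax_enabled)
        return -EOPNOTSUPP;
    return 0;
}
```

The split keeps the static question ("does this implementation know the
flag at all?") in one place, while the filesystem answers the runtime
question ("can this particular file honor it right now?") because the
flags reach its ->mmap() handler.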

Kirill A. Shutemov

Aug 26, 2017, 8:00:09 PM
BTW, we may be able to reuse the bit used for MAP_UNINITIALIZED -- it's
only used on !MMU machines.

--
Kirill A. Shutemov