[PATCH] mm: mmap system call does not return EOVERFLOW

Naotaka Hamaguchi

Dec 22, 2011, 4:40:01 AM
In the system call mmap(), if the value of "offset" plus "length"
exceeds the offset maximum of "off_t", the error EOVERFLOW should be
returned.

------------------------------------------------------------------------
void *mmap(void *addr, size_t length, int prot, int flags,
int fd, off_t offset)
------------------------------------------------------------------------

Here is how the EOVERFLOW path is reached in detail:

The argument "offset" is shifted right by PAGE_SHIFT bits
in sys_mmap() (the mmap system call).

------------------------------------------------------------------------
sys_mmap(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags,
unsigned long fd, unsigned long off)
{
error = sys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
}
------------------------------------------------------------------------

In sys_mmap_pgoff(addr, len, prot, flags, fd, pgoff), do_mmap_pgoff()
is called as follows:

------------------------------------------------------------------------
sys_mmap_pgoff(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags,
unsigned long fd, unsigned long pgoff)
{
retval = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
}
------------------------------------------------------------------------

In do_mmap_pgoff(file, addr, len, prot, flags, pgoff),
a code path that returns the error EOVERFLOW already exists.

------------------------------------------------------------------------
do_mmap_pgoff(struct file *file, unsigned long addr,
unsigned long len, unsigned long prot,
unsigned long flags, unsigned long pgoff)
{
if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
return -EOVERFLOW;
}
------------------------------------------------------------------------

However, when off=0xfffffffffffff000 and len=0xfffffffffffff000 are
given on x86_64, EOVERFLOW is not returned. This is because both "off"
and "len" are shifted right by PAGE_SHIFT bits before they are added,
so the sum cannot wrap and the condition
"(pgoff + (len >> PAGE_SHIFT)) < pgoff" never becomes true.
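
The two comparisons can be tried in user space with the values above. This is only a sketch: it assumes 4 KiB pages (PAGE_SHIFT of 12), models the kernel's unsigned long with uint64_t, and uses hypothetical helper names that do not exist in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages, as on x86_64 */

/* The existing check, done in page units after both values are shifted. */
bool pgoff_check_overflows(uint64_t off, uint64_t len)
{
    uint64_t pgoff = off >> PAGE_SHIFT;

    return (pgoff + (len >> PAGE_SHIFT)) < pgoff;
}

/* The same comparison done in byte units, before any shifting. */
bool byte_check_overflows(uint64_t off, uint64_t len)
{
    return (off + len) < off;
}
```

With off and len both set to 0xfffffffffffff000, the page-unit check reports no overflow while the byte-unit check does.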

To fix this bug, it is necessary to compare "off" plus "len"
with "off" by units of "off_t". The patch is here:

Signed-off-by: Naotaka Hamaguchi <n.ham...@jp.fujitsu.com>
---
mm/mmap.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index eae90af..e74e736 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -948,6 +948,7 @@ unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
        vm_flags_t vm_flags;
        int error;
        unsigned long reqprot = prot;
+       off_t off = pgoff << PAGE_SHIFT;

        /*
         * Does the application expect PROT_READ to imply PROT_EXEC?
@@ -971,7 +972,7 @@ unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
                return -ENOMEM;

        /* offset overflow? */
-       if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
+       if ((off + len) < off)
                return -EOVERFLOW;

        /* Too many mappings? */
--
1.7.7.4
---


KOSAKI Motohiro

Dec 22, 2011, 12:00:03 PM
> To fix this bug, it is necessary to compare "off" plus "len"
> with "off" by units of "off_t". The patch is here:
>
> Signed-off-by: Naotaka Hamaguchi <n.ham...@jp.fujitsu.com>
> ---
>  mm/mmap.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index eae90af..e74e736 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -948,6 +948,7 @@ unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
>        vm_flags_t vm_flags;
>        int error;
>        unsigned long reqprot = prot;
> +       off_t off = pgoff << PAGE_SHIFT;
>
>        /*
>         * Does the application expect PROT_READ to imply PROT_EXEC?
> @@ -971,7 +972,7 @@ unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
>                return -ENOMEM;
>
>        /* offset overflow? */
> -       if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
> +       if ((off + len) < off)
>                return -EOVERFLOW;

Hmm...
pgoff does not actually overflow here. do_mmap_pgoff() can handle large
values, and we have no reason to impose an artificial limit. Why don't
you make the overflow check in sys_mmap()?

KOSAKI Motohiro

Dec 22, 2011, 12:50:01 PM
> The argument "offset" is shifted right by PAGE_SHIFT bits
> in sys_mmap(mmap systemcall).
>
> ------------------------------------------------------------------------
> sys_mmap(unsigned long addr, unsigned long len,
> unsigned long prot, unsigned long flags,
> unsigned long fd, unsigned long off)
> {
> error = sys_mmap_pgoff(addr, len, prot, flags, fd, off>> PAGE_SHIFT);
> }
> ------------------------------------------------------------------------

Hm.
Which version are you looking at? The current code does not seem to
have sys_mmap().

Naotaka Hamaguchi

Dec 27, 2011, 1:30:01 AM
Hi, Kosaki-san

> Which version are you looking at? The current code does not seem to
> have sys_mmap().

Here sys_mmap() means the entry point of the mmap system call on x86_64.

----------------------------------------------------------------------
arch/x86/kernel/sys_x86_64.c:
SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
                unsigned long, prot, unsigned long, flags,
                unsigned long, fd, unsigned long, off)
{
        long error;
        error = -EINVAL;
        if (off & ~PAGE_MASK)
                goto out;

        error = sys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
out:
        return error;
}
----------------------------------------------------------------------

This function calls sys_mmap_pgoff with the argument
"off >> PAGE_SHIFT". That is, sys_mmap_pgoff does not receive the "off"
argument of sys_mmap unchanged; it receives the value obtained by
shifting "off" right by PAGE_SHIFT bits.

For the mmap system call on 32-bit x86, the following sys_mmap_pgoff is
the entry point into the kernel.

----------------------------------------------------------------------
arch/x86/kernel/syscall_table_32.S:
...
        .long sys_mmap_pgoff
...

mm/mmap.c:
SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
                unsigned long, prot, unsigned long, flags,
                unsigned long, fd, unsigned long, pgoff)
...
        down_write(&current->mm->mmap_sem);
        retval = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
        up_write(&current->mm->mmap_sem);
----------------------------------------------------------------------

> value. We have
> no reason to make artificial limit. Why don't you make an overflow
> check in sys_mmap()?

I think it is better to make the overflow check in do_mmap_pgoff.
There are two reasons:

1. If we make the overflow check at the system call entry point, we
have to check in sys_mmap for x86_64 and in sys_mmap_pgoff for
x86. That is, we have to add the check to each architecture
individually. It is therefore more efficient to make the
overflow check in do_mmap_pgoff, because both sys_mmap and
sys_mmap_pgoff end up in do_mmap_pgoff.

2. Because the argument "offset" of sys_mmap is a multiple
of the page size (otherwise EINVAL is returned), no information
is lost by shifting right by PAGE_SHIFT bits. Therefore,
making the overflow check in do_mmap_pgoff is equivalent
to making it in sys_mmap.
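
The second reason can be illustrated with a small user-space sketch. It assumes 4 KiB pages and a 64-bit unsigned long (modeled here with uint64_t), and the helper names are hypothetical, not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define PAGE_MASK (~((UINT64_C(1) << PAGE_SHIFT) - 1))

/* Mirrors the "off & ~PAGE_MASK" test that makes sys_mmap return EINVAL. */
bool is_page_aligned(uint64_t off)
{
    return (off & ~PAGE_MASK) == 0;
}

/* True when shifting right and back left reproduces the original offset. */
bool shift_round_trips(uint64_t off)
{
    return ((off >> PAGE_SHIFT) << PAGE_SHIFT) == off;
}
```

For any page-aligned offset the round trip is exact at this width; an unaligned offset such as 0x1234 loses its low 12 bits.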

Best Regards,
Naotaka Hamaguchi

KOSAKI Motohiro

Dec 27, 2011, 9:10:02 PM
arch/x86/include/asm/posix_types_32.h
---------------------------------------------
typedef long __kernel_off_t;


So your patch introduces a 2GB limitation on 32-bit architectures. It makes no sense.
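
A rough user-space sketch of this objection, assuming 4 KiB pages and modeling the 32-bit unsigned long and long with uint32_t and int32_t. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Byte offset that a page offset actually denotes, computed in 64 bits. */
uint64_t true_byte_offset(uint32_t pgoff)
{
    return (uint64_t)pgoff << PAGE_SHIFT;
}

/* Result of "pgoff << PAGE_SHIFT" when unsigned long is only 32 bits wide. */
uint32_t truncated_byte_offset(uint32_t pgoff)
{
    return (uint32_t)(pgoff << PAGE_SHIFT);
}

/* Whether the true byte offset fits in a signed 32-bit off_t. */
bool fits_in_32bit_off_t(uint32_t pgoff)
{
    return true_byte_offset(pgoff) <= (uint64_t)INT32_MAX;
}
```

A page offset of 0x100000 denotes byte offset 4 GiB, but the 32-bit shift yields 0, and any page offset of 0x80000 or more (2 GiB and up) no longer fits in a signed 32-bit off_t.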



> 2. Because the argument "offset" of sys_mmap is a multiple
>   of the page size(otherwise, EINVAL is returned.), no information
>   is lost after shifting right by PAGE_SHIFT bits. Therefore
>   to make an overflow check in do_mmap_pgoff is equivalent
>   to check in sys_mmap.