[PATCH] vfs: fix IMA lockdep circular locking dependency

Mimi Zohar

unread,

May 13, 2012, 10:48:02 PM5/13/12

to linux-secu...@vger.kernel.org, Mimi Zohar, linux-...@vger.kernel.org, Al Viro, Linus Torvalds, Mimi Zohar

From: Mimi Zohar <zo...@linux.vnet.ibm.com>

This patch has been updated to move the ima_file_mmap() call from
security_file_mmap() to the new vm_mmap() function.

---

The circular lockdep is caused by allocating the 'iint' for mmapped
files. Originally when an 'iint' was allocated for every inode
in inode_alloc_security(), before the inode was accessible, no
locking was necessary. Commits bc7d2a3e and 196f518 changed this
behavior and allocated the 'iint' on a per need basis, resulting in
the mmap_sem being taken before the i_mutex for mmapped files.

Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&mm->mmap_sem);
lock(&sb->s_type->i_mutex_key);
lock(&mm->mmap_sem);
lock(&sb->s_type->i_mutex_key);

The existing security_file_mmap() hook is after the mmap_sem is taken.
This patch moves the ima_file_mmap() call from security_file_mmap() to
prior to the mmap_sem being taken.

Changelog v2:
- With commit "6be5ceb VM: add "vm_mmap()" helper function", moving the
ima_file_mmap() call is simplified. This patch moves the ima_file_mmap()
call to vm_mmap(), and to binfmt_elf.c: elf_map().

Changelog v1:
- Instead of just pre-allocating the iint in a new hook, do ALL of the
work in the new/moved ima_file_mmap() hook. (Based on Eric Paris' suggestion.)
- Removed do_mmap_with_sem() helper function.
- export ima_file_mmap()

Signed-off-by: Mimi Zohar <zo...@us.ibm.com>
Acked-by: Eric Paris <epa...@redhat.com>
---
fs/binfmt_elf.c | 6 ++++++
mm/mmap.c | 11 +++++++++++
mm/nommu.c | 11 +++++++++++
security/integrity/ima/ima_main.c | 1 +
security/security.c | 7 +------
5 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 16f7354..0d0514f 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -28,6 +28,7 @@
#include <linux/highmem.h>
#include <linux/pagemap.h>
#include <linux/security.h>
+#include <linux/ima.h>
#include <linux/random.h>
#include <linux/elf.h>
#include <linux/utsname.h>
@@ -321,6 +322,7 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
unsigned long map_addr;
unsigned long size = eppnt->p_filesz + ELF_PAGEOFFSET(eppnt->p_vaddr);
unsigned long off = eppnt->p_offset - ELF_PAGEOFFSET(eppnt->p_vaddr);
+ unsigned long ret;
addr = ELF_PAGESTART(addr);
size = ELF_PAGEALIGN(size);

@@ -329,6 +331,10 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
if (!size)
return addr;

+ ret = ima_file_mmap(filep, prot);
+ if (ret < 0)
+ return ret;
+
down_write(&current->mm->mmap_sem);
/*
* total_size is the size of the ELF (interpreter) image.
diff --git a/mm/mmap.c b/mm/mmap.c
index 848ef52..87e6948 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -20,6 +20,7 @@
#include <linux/fs.h>
#include <linux/personality.h>
#include <linux/security.h>
+#include <linux/ima.h>
#include <linux/hugetlb.h>
#include <linux/profile.h>
#include <linux/export.h>
@@ -1109,9 +1110,14 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
unsigned long ret;
struct mm_struct *mm = current->mm;

+ ret = ima_file_mmap(file, prot);
+ if (ret < 0)
+ goto err;
+
down_write(&mm->mmap_sem);
ret = do_mmap(file, addr, len, prot, flag, offset);
up_write(&mm->mmap_sem);
+err:
return ret;
}
EXPORT_SYMBOL(vm_mmap);
@@ -1147,10 +1153,15 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,

flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);

+ retval = ima_file_mmap(file, prot);
+ if (retval < 0)
+ goto err_out;
+
down_write(&current->mm->mmap_sem);
retval = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
up_write(&current->mm->mmap_sem);

+err_out:
if (file)
fput(file);
out:
diff --git a/mm/nommu.c b/mm/nommu.c
index bb8f4f0..2b13bd3 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -27,6 +27,7 @@
#include <linux/mount.h>
#include <linux/personality.h>
#include <linux/security.h>
+#include <linux/ima.h>
#include <linux/syscalls.h>
#include <linux/audit.h>

@@ -1490,9 +1491,14 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
unsigned long ret;
struct mm_struct *mm = current->mm;

+ ret = ima_file_mmap(file, prot);
+ if (ret < 0)
+ goto err;
+
down_write(&mm->mmap_sem);
ret = do_mmap(file, addr, len, prot, flag, offset);
up_write(&mm->mmap_sem);
+err:
return ret;
}
EXPORT_SYMBOL(vm_mmap);
@@ -1513,10 +1519,15 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,

flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);

+ retval = ima_file_mmap(file, prot);
+ if (retval < 0)
+ goto err_out;
+
down_write(&current->mm->mmap_sem);
retval = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
up_write(&current->mm->mmap_sem);

+err_out:
if (file)
fput(file);
out:
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index b17be79..91eda7f 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -176,6 +176,7 @@ int ima_file_mmap(struct file *file, unsigned long prot)
MAY_EXEC, FILE_MMAP);
return 0;
}
+EXPORT_SYMBOL_GPL(ima_file_mmap);

/**
* ima_bprm_check - based on policy, collect/store measurement.
diff --git a/security/security.c b/security/security.c
index bf619ff..e50bbf4 100644
--- a/security/security.c
+++ b/security/security.c
@@ -661,12 +661,7 @@ int security_file_mmap(struct file *file, unsigned long reqprot,
unsigned long prot, unsigned long flags,
unsigned long addr, unsigned long addr_only)
{
- int ret;
-
- ret = security_ops->file_mmap(file, reqprot, prot, flags, addr, addr_only);
- if (ret)
- return ret;
- return ima_file_mmap(file, prot);
+ return security_ops->file_mmap(file, reqprot, prot, flags, addr, addr_only);
}

int security_file_mprotect(struct vm_area_struct *vma, unsigned long reqprot,
--
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Mimi Zohar

unread,

May 14, 2012, 8:52:33 PM5/14/12

to James Morris, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org, Al Viro, Linus Torvalds

On Tue, 2012-05-15 at 10:29 +1000, James Morris wrote:

> On Sun, 13 May 2012, Mimi Zohar wrote:
>
> > From: Mimi Zohar <zo...@linux.vnet.ibm.com>
> >
> > This patch has been updated to move the ima_file_mmap() call from
> > security_file_mmap() to the new vm_mmap() function.
>

> Which kernel is this for?

Commit #6be5ceb(VM: add "vm_mmap()" helper function) is in Linus' master
tree, but the posted patch is against linux-security/next.

Mimi

James Morris

unread,

May 15, 2012, 11:23:20 AM5/15/12

to Mimi Zohar, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org, Al Viro, Linus Torvalds

On Mon, 14 May 2012, Mimi Zohar wrote:

> On Tue, 2012-05-15 at 10:29 +1000, James Morris wrote:
> > On Sun, 13 May 2012, Mimi Zohar wrote:
> >
> > > From: Mimi Zohar <zo...@linux.vnet.ibm.com>
> > >
> > > This patch has been updated to move the ima_file_mmap() call from
> > > security_file_mmap() to the new vm_mmap() function.
> >
> > Which kernel is this for?
>
> Commit #6be5ceb(VM: add "vm_mmap()" helper function) is in Linus' master
> tree, but the posted patch is against linux-security/next.

I'm still not sure what the answer is :-)

Can this wait until 3.5-rc1?

--
James Morris
<jmo...@namei.org>

Mimi Zohar

unread,

May 15, 2012, 12:08:56 PM5/15/12

to James Morris, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org, Al Viro, Linus Torvalds

On Wed, 2012-05-16 at 01:14 +1000, James Morris wrote:
> On Mon, 14 May 2012, Mimi Zohar wrote:
>
> > On Tue, 2012-05-15 at 10:29 +1000, James Morris wrote:
> > > On Sun, 13 May 2012, Mimi Zohar wrote:
> > >
> > > > From: Mimi Zohar <zo...@linux.vnet.ibm.com>
> > > >
> > > > This patch has been updated to move the ima_file_mmap() call from
> > > > security_file_mmap() to the new vm_mmap() function.
> > >
> > > Which kernel is this for?
> >
> > Commit #6be5ceb(VM: add "vm_mmap()" helper function) is in Linus' master
> > tree, but the posted patch is against linux-security/next.
>
> I'm still not sure what the answer is :-)
>
> Can this wait until 3.5-rc1?

The patch is a bug fix for the mmap_sem/i_mutex locking problem that has
existed for a quite a while. (The previous version of the patch has
been waiting since January to be Acked.) The current version of the
patch applies cleanly to both your #master and #next trees. As it
hasn't been in linux-next, I'll leave it to your discretion how/when to
upstream it.

thanks,

Mimi

Linus Torvalds

unread,

May 15, 2012, 1:20:16 PM5/15/12

to Mimi Zohar, linux-secu...@vger.kernel.org, Mimi Zohar, linux-...@vger.kernel.org, Al Viro

On Sun, May 13, 2012 at 7:47 PM, Mimi Zohar <zo...@us.ibm.com> wrote:
> From: Mimi Zohar <zo...@linux.vnet.ibm.com>
>
> This patch has been updated to move the ima_file_mmap() call from
> security_file_mmap() to the new vm_mmap() function.

Ugh. I really have to admit to hating this.

Quite frankly, I'd much rather apply a patch that moved the call to
security_file_mmap() entirely outside the mmap_sem instead.

You did basically that, but you only moved the ima_file_mmap()
portion. Why not move it *all*?

Sure, that changes semantics. But does the "security_ops->file_mmap()"
function really need mmap_sem protection?

As far as I can tell, the *only* thing that the security layer tends
to care about is that address (ie the whole "dac_mmap_min_addr"
thing), or the file.

And the *file* doesn't need or want any mmap_sem protection.

The actual final address does "need" the mmap_sem, but in fact none of
the security models really care, except for the obvious NULL mapping
thing. And that can only happen with MAP_FIXED *or* when a person
gives an explicit suggested address.

So I would suggest:

- never test the default mmap address case (ie the case of a NULL
'addr' without PROT_FIXED set).

- move the whole call to security_file_mmap() to outside the
mmap_sem, and test the *suggested* address (which is not the same as
the final address)

Yes, this makes the assumption that arch_get_unmapped_area() will not
return a bad address. We'd have to think about that. I already found
one possible worry, where the default arch_get_unmapped_area() does
this:

if (addr) {
addr = PAGE_ALIGN(addr);

and maybe we need to make sure that that PAGE_ALIGN() does not
overflow into 0. Things like that (probably right thing to do: make
sure that 'addr' is already aligned when checking security), but the
point being that it looks like a *really* bad idea to require us
holding the mmap_sem() for this silly security check, when we could
just do it up-front because nobody really cares.

Ok?

I would also actually suggest that we move the "cap_file_mmap()" call
from the security model ->file_mmap() function into
"security_file_mmap()", so that all the security models don't even
have to remember to do that. Because a security model that *doesn't*
do the dac_mmap_min_addr comparison really is broken (we used to have
that bug in SELinux).

What do you think?

Linus

Mimi Zohar

unread,

May 15, 2012, 2:37:28 PM5/15/12

to Linus Torvalds, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org, Al Viro, Stephen Smalley, Eric Paris, Serge Hallyn, John Johansen, Casey Schaufler

On Tue, 2012-05-15 at 10:19 -0700, Linus Torvalds wrote:
> On Sun, May 13, 2012 at 7:47 PM, Mimi Zohar <zo...@us.ibm.com> wrote:
> > From: Mimi Zohar <zo...@linux.vnet.ibm.com>
> >
> > This patch has been updated to move the ima_file_mmap() call from
> > security_file_mmap() to the new vm_mmap() function.
>
> Ugh. I really have to admit to hating this.
>
> Quite frankly, I'd much rather apply a patch that moved the call to
> security_file_mmap() entirely outside the mmap_sem instead.
>
> You did basically that, but you only moved the ima_file_mmap()
> portion. Why not move it *all*?

As moving the ima_file_mmap() before taking the mmap_sem is not optimal,
I never even considered moving the security_file_mmap() hook itself, nor
was this suggestion ever made during the initial RFC discussions last
fall.

Unlike execve, which locks the file before calling the security hook,
here the file locking occurs afterwards. So, instead of moving the
ima_file_mmap() hook closer to where the file is locked, this patch
moves it even farther away.

> Sure, that changes semantics. But does the "security_ops->file_mmap()"
> function really need mmap_sem protection?

Having this sort of discussion should probably include the LSM hook
users. With the Subject line above, not sure how many of them are
reading this. Am cc'ing others...

Sorry, need to review usage, before commenting ...

Mimi

Linus Torvalds

unread,

May 15, 2012, 2:41:37 PM5/15/12

to Mimi Zohar, linux-secu...@vger.kernel.org, Mimi Zohar, linux-...@vger.kernel.org, Al Viro

On Tue, May 15, 2012 at 10:19 AM, Linus Torvalds
<torv...@linux-foundation.org> wrote:
>
> - move the whole call to security_file_mmap() to outside the
> mmap_sem, and test the *suggested* address (which is not the same as
> the final address)

Actually, I think I have a simpler approach.

We already actually have two *different* security_file_mmap() calls:
it's just that currently the difference is shown by the last argument
to the function ("addr_only").

Which is totally static by call-site.

And the ones that have addr_only set are all inside the mmap_sem region.

So here's my new suggestion, which avoids the semantic changes: let's
just split that "security_file_mmap()" call into two. One that checks
the file, and one that checks the address.

And we'll continue to check the address within the mmap_sem region,
but nobody will care, because that one doesn't need any other locking.

So we'll just split the file check - along with the ima_file_mmap() -
into the "security_mmap_file()" function (no "addr_only" argument),
and move that outside the lock.

And then the address-only check (let's call it "security_mmap_addr()")
we leave at the place we *currently* do our security_file_mmap()
check, along with (obviously) the calls that currently have
"addr_only=1" and only give the address anyway).

That cleans up the whole nasty piece of crap calling convention too,
getting rid of idiotic unreadable code like

ret = security_file_mmap(NULL, 0, 0, 0, new_addr, 1);

and replacing it with

ret = security_mmap_addr(new_addr);

which actually shows what it is doing.

Hmm? That looks like a fairly mechanical conversion, and we could
start with Mimi's patch that already splits it up that way (except it
calls "security_mmap_file()" ima_file_mmap() and doesn't call the LSM
function pointer there)

Eric Paris

unread,

May 15, 2012, 3:43:10 PM5/15/12

to Linus Torvalds, Mimi Zohar, linux-secu...@vger.kernel.org, Mimi Zohar, linux-...@vger.kernel.org, Al Viro

On Tue, May 15, 2012 at 2:41 PM, Linus Torvalds
<torv...@linux-foundation.org> wrote:
> On Tue, May 15, 2012 at 10:19 AM, Linus Torvalds
> <torv...@linux-foundation.org> wrote:
>>
>> - move the whole call to security_file_mmap() to outside the
>> mmap_sem, and test the *suggested* address (which is not the same as
>> the final address)
>
> Actually, I think I have a simpler approach.
>
> We already actually have two *different* security_file_mmap() calls:
> it's just that currently the difference is shown by the last argument
> to the function ("addr_only").

I'm the one who introduced that bit of horrific. I originally did it
the way you describe and someone (it was a long time ago, and I think
it was Ted Tso, but I am probably very very wrong on that) ask me to
tack it on the end like this. I'd be very happy with the split you
describe.

I'd rather not, however, move the address call site like you described
above, as I don't want to allow NULL + ~MAP_FIXED to be tested until
it has been resolved to a real address. I don't want someone to find
a way to get the kernel to choose 4096 and avoid the check....

Mimi, would you like to do this (slightly) larger change? Should I?

Mimi Zohar

unread,

May 15, 2012, 4:12:50 PM5/15/12

to Eric Paris, Linus Torvalds, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org, Al Viro

On Tue, 2012-05-15 at 15:42 -0400, Eric Paris wrote:
> On Tue, May 15, 2012 at 2:41 PM, Linus Torvalds
> <torv...@linux-foundation.org> wrote:
> > On Tue, May 15, 2012 at 10:19 AM, Linus Torvalds
> > <torv...@linux-foundation.org> wrote:
> >>
> >> - move the whole call to security_file_mmap() to outside the
> >> mmap_sem, and test the *suggested* address (which is not the same as
> >> the final address)
> >
> > Actually, I think I have a simpler approach.
> >
> > We already actually have two *different* security_file_mmap() calls:
> > it's just that currently the difference is shown by the last argument
> > to the function ("addr_only").
>
> I'm the one who introduced that bit of horrific. I originally did it
> the way you describe and someone (it was a long time ago, and I think
> it was Ted Tso, but I am probably very very wrong on that) ask me to
> tack it on the end like this. I'd be very happy with the split you
> describe.
>
> I'd rather not, however, move the address call site like you described
> above, as I don't want to allow NULL + ~MAP_FIXED to be tested until
> it has been resolved to a real address. I don't want someone to find
> a way to get the kernel to choose 4096 and avoid the check....

If the security_file_mmap() moves to before taking the mmap_sem and the
new security_mmap_addr() hook would be at the current
security_file_mmap() location, I don't see the problem. The addr test
would remain in the same place.

> Mimi, would you like to do this (slightly) larger change? Should I?

np, I'll make the changes.

Mimi

Linus Torvalds

unread,

May 15, 2012, 8:38:15 PM5/15/12

to Mimi Zohar, Eric Paris, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org, Al Viro

So here's a COMPLETELY UNTESTED patch that does what I think might be
the right thing.

To simplify the calling convention, I'm just making the security layer
compute the whole "requested protections" vs "actual protections". It
might be a good idea to have a helper function to do that, of course,
but that sounds like an independent cleanup (the mprotect() code also
has this same logic, and does it incompletely).

It does change some things, like say that "->mmap_file()" is only ever
called for actual files, not for anonymous mappings. It doesn't seem
to be sensible to have a security model for anonymous mappings -
there's nothing there to really target. Whatever.

It also makes security_mmap_addr() always call the standard capability
check (*after* having called the security model version), so the
security model no longer needs to care. All of them did seem to do so.

And again: this is totally untested. I'm not committing this, it's
more of a "hey, I tried it out, might as well send it out for
comments" thing.

Linus

Linus Torvalds

unread,

May 15, 2012, 9:54:02 PM5/15/12

to Al Viro, Mimi Zohar, Eric Paris, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

On Tue, May 15, 2012 at 5:45 PM, Linus Torvalds
<torv...@linux-foundation.org> wrote:
>
> Oops. Err.. I meant to do that. Right. To check that you're awake.
>
> Here.

Btw, a few notes:

- this adds the new security_mmap_file() calls to the same places
that Mimi added the ima_file_mmap() call.

HOWEVER. There are other callers that call do_mmap[_pgoff].
Notably, do_shmat(). IMA probably doesn't care, but other security
models might - even if the file in question there will be one just
allocated by the shm layer.

So it does change semantics a bit, in that not only does the
->mmap_file() function no longer get called when 'file' is NULL, it
also doesn't get called with a couple of special files (the other one
I noticed was the i810 driver doing it's do_mmap() dance)

- the fact that we don't call the security function for a NULL-file
mmap() is actually consistent with what do_brk() has always done (it
used to do "addr_only=1", now it obviously does just
security_mmap_addr() instead). So that part looks sane, and no
security model can possibly care.

- I booted the patch. Not that I actually *tested* it any real way,
but at least it boots and doesn't seem to break anything obvious.

Hmm?

Al Viro

unread,

May 15, 2012, 10:18:49 PM5/15/12

to Linus Torvalds, Mimi Zohar, Eric Paris, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

On Tue, May 15, 2012 at 05:45:44PM -0700, Linus Torvalds wrote:
> On Tue, May 15, 2012 at 5:42 PM, Al Viro <vi...@zeniv.linux.org.uk> wrote:
> >
> > Er... ?Looks like you forgot to actually put the patch in there ;-)

>
> Oops. Err.. I meant to do that. Right. To check that you're awake.

Frankly, I would split it in two - one introducing security_mmap_addr()
and converting the callers, and another doing the rest of it.

Said that, I'm not sure I like the resulting picture.

1) caller in __bprm_mm_init() is simply ridiculous - note that
arguments are bleeding *constants*, so it might very well have
been a BUG_ON(). If it fails, you'll have every execve() fail.

2) get_unmapped_area() probably ought to grow such a caller and
I really suspect that it would've killed quite a few of them.

3) expand_downwards() seems to be missing the basic sanity checks on the
validity of VMA range (arch_mmap_check(), that is). itanic opencodes
the equivalent before calling expand_stack(); arm and mn10300 do not
bother, which might or might not be legitimate - depends on whether
one can get a fault in the first page *and* reach the check_stack:
in e.g. arm __do_page_fault(). Which just might be possible, if attacker
maps something just above said first page with MAP_GROWSDOWN and
tries to write at very small address - IIRC, the first page on arm
contains the stuff that shouldn't be world-writable... s390 doesn't
care and I'm not sure about sparc32/sparc64 - it looks like that shouldn't
be possible to hit, but...

4) i810_dma.c ought to be switched to vm_mmap() - as discussed in that
thread back then, magical mystery wank with ->f_op reassignments does
not rely on ->mmap_sem for protection and thus can be taken out of
under ->mmap_sem.

James Morris

unread,

May 16, 2012, 7:39:00 AM5/16/12

to Linus Torvalds, Al Viro, Mimi Zohar, Eric Paris, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

On Tue, 15 May 2012, Linus Torvalds wrote:

> Hmm?

diff --git a/security/capability.c b/security/capability.c
index 5bb21b1c448c..9a19c6a54e12 100644
--- a/security/capability.c
+++ b/security/capability.c
@@ -949,7 +949,6 @@ void __init security_fixup_ops(struct
security_operations *ops)
set_to_cap_if_null(ops, file_alloc_security);
set_to_cap_if_null(ops, file_free_security);
set_to_cap_if_null(ops, file_ioctl);
- set_to_cap_if_null(ops, file_mmap);
set_to_cap_if_null(ops, file_mprotect);
set_to_cap_if_null(ops, file_lock);
set_to_cap_if_null(ops, file_fcntl);

Do we need to add addr_map to the fixup ops?

--
James Morris
<jmo...@namei.org>

James Morris

unread,

May 16, 2012, 7:39:23 AM5/16/12

to Linus Torvalds, Al Viro, Mimi Zohar, Eric Paris, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

On Wed, 16 May 2012, James Morris wrote:

> On Tue, 15 May 2012, Linus Torvalds wrote:
>
> > Hmm?
>
> diff --git a/security/capability.c b/security/capability.c
> index 5bb21b1c448c..9a19c6a54e12 100644
> --- a/security/capability.c
> +++ b/security/capability.c
> @@ -949,7 +949,6 @@ void __init security_fixup_ops(struct
> security_operations *ops)
> set_to_cap_if_null(ops, file_alloc_security);
> set_to_cap_if_null(ops, file_free_security);
> set_to_cap_if_null(ops, file_ioctl);
> - set_to_cap_if_null(ops, file_mmap);
> set_to_cap_if_null(ops, file_mprotect);
> set_to_cap_if_null(ops, file_lock);
> set_to_cap_if_null(ops, file_fcntl);
>
>
> Do we need to add addr_map to the fixup ops?

mmap_addr I mean.

Mimi Zohar

unread,

May 16, 2012, 9:28:13 AM5/16/12

to James Morris, Linus Torvalds, Al Viro, Eric Paris, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

On Wed, 2012-05-16 at 21:38 +1000, James Morris wrote:
> On Wed, 16 May 2012, James Morris wrote:
>
> > On Tue, 15 May 2012, Linus Torvalds wrote:
> >
> > > Hmm?
> >
> > diff --git a/security/capability.c b/security/capability.c
> > index 5bb21b1c448c..9a19c6a54e12 100644
> > --- a/security/capability.c
> > +++ b/security/capability.c
> > @@ -949,7 +949,6 @@ void __init security_fixup_ops(struct
> > security_operations *ops)
> > set_to_cap_if_null(ops, file_alloc_security);
> > set_to_cap_if_null(ops, file_free_security);
> > set_to_cap_if_null(ops, file_ioctl);
> > - set_to_cap_if_null(ops, file_mmap);
> > set_to_cap_if_null(ops, file_mprotect);
> > set_to_cap_if_null(ops, file_lock);
> > set_to_cap_if_null(ops, file_fcntl);
> >
> >
> > Do we need to add addr_map to the fixup ops?
>
> mmap_addr I mean.

I would think so.

Eric Paris

unread,

May 16, 2012, 9:42:24 AM5/16/12

to James Morris, Linus Torvalds, Al Viro, Mimi Zohar, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

On Wed, 2012-05-16 at 21:37 +1000, James Morris wrote:
> On Tue, 15 May 2012, Linus Torvalds wrote:
>
> > Hmm?
>
> diff --git a/security/capability.c b/security/capability.c
> index 5bb21b1c448c..9a19c6a54e12 100644
> --- a/security/capability.c
> +++ b/security/capability.c
> @@ -949,7 +949,6 @@ void __init security_fixup_ops(struct
> security_operations *ops)
> set_to_cap_if_null(ops, file_alloc_security);
> set_to_cap_if_null(ops, file_free_security);
> set_to_cap_if_null(ops, file_ioctl);
> - set_to_cap_if_null(ops, file_mmap);
> set_to_cap_if_null(ops, file_mprotect);
> set_to_cap_if_null(ops, file_lock);
> set_to_cap_if_null(ops, file_fcntl);
>
>
> Do we need to add addr_map to the fixup ops?

No. His patch works just fine without it. If you look he uses:

+ if (security_ops->mmap_file) {

Which means since we didn't set an explicit .mmap_file, even with no
other LSM loaded we would be fine.

At the moment I'd rather stick with our usual notation of forcing
capabilities to define every option even if all it does it return 0. If
Linus thinks it's a good idea to do
if (security_ops->function)
security_ops->funtion(args);
In the security server we should do that cleanup separately...

-Eric

Mimi Zohar

unread,

May 16, 2012, 9:53:58 AM5/16/12

to Eric Paris, James Morris, Linus Torvalds, Al Viro, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

James was pointing out that security_fixup_ops was not set for
mmap_addr, not mmap_file(), which should be initialized by
security_fixup_ops().

Mimi

Eric Paris

unread,

May 16, 2012, 10:06:54 AM5/16/12

to Mimi Zohar, James Morris, Linus Torvalds, Al Viro, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

But it would be flat wrong to put it there. True historically we did
things differently, but this patch is right given Linus is doing it a
different way.

Had Linus done what you two are suggesting:
+int security_mmap_addr(unsigned long addr)
+{
+ if (security_ops->mmap_addr) { <--- This would call cap_mmap_addr
+ int ret = security_ops->mmap_addr(addr);
+ if (ret)
+ return ret;
+ }
+ return cap_mmap_addr(addr); <--- This would also call
cap_mmap_addr
+}

See the problem?

Like I said, we can do a full cleanup removing all of the useless
capabilities functions and turning them into an inline branch instead
of unconditional jump and return 0. We could possibly even move all
of the cap stacking into this layer instead of pushing it into the LSM
layer. But you shouldn't really do #2 without #1. (Notice here Linus
is doing both of those things, but only with this one hook)

-Eric

Eric Paris

unread,

May 16, 2012, 10:13:45 AM5/16/12

to Linus Torvalds, Mimi Zohar, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org, Al Viro, Stephen Smalley

On Tue, May 15, 2012 at 8:37 PM, Linus Torvalds
<torv...@linux-foundation.org> wrote:

> It does change some things, like say that "->mmap_file()" is only ever
> called for actual files, not for anonymous mappings. It doesn't seem
> to be sensible to have a security model for anonymous mappings -
> there's nothing there to really target. Whatever.

So we would have no checks for anonymous mappings? We actually do
have some controls around them today

http://www.akkadia.org/drepper/selinux-mem.html

It's mostly around W+X memory. (or was W now X memory)

Admittedly with the growing prevalence of JiT stuff we are using those
protections less and less and less....

Not certain how happy some will be to see them completely disappear....

Linus Torvalds

unread,

May 16, 2012, 11:14:31 AM5/16/12

to Eric Paris, Mimi Zohar, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org, Al Viro, Stephen Smalley

On Wed, May 16, 2012 at 7:13 AM, Eric Paris <epa...@parisplace.org> wrote:
>
> So we would have no checks for anonymous mappings? We actually do
> have some controls around them today
>
> http://www.akkadia.org/drepper/selinux-mem.html
>
> It's mostly around W+X memory. (or was W now X memory)

Ahh, ok. So I guess that won't work.

That said, I think do_brk() can already today be used to avoid those
checks, since it does a mmap with VM_DATA_DEFAULT_FLAGS, which
includes exec if the current personality includes READ_IMPLIES_EXEC -
which is trivial yo do.

I wonder if the rwx checks could be split up too - the access
protection from the *file* is really a separate issue from the access
protection of the *mapping*, if you see what I mean.. Then we could do
it at do_brk() time too.

Linus

Linus Torvalds

unread,

May 16, 2012, 11:23:48 AM5/16/12

to Eric Paris, Mimi Zohar, James Morris, Al Viro, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

On Wed, May 16, 2012 at 7:06 AM, Eric Paris <epa...@parisplace.org> wrote:
>
> But it would be flat wrong to put it there. True historically we did
> things differently, but this patch is right given Linus is doing it a
> different way.

Indeed. As Eric points out, I'm forcing the address capability checks
on everybody, so defaulting to them is actively the wrong thing to do.

I think the NULL pointer games are from back when we inlined the
security hooks into the code directly, rather than have out-of-line
helper functions. At that point, the code expansion overhead of
loading the pointer, checking it for NULL, and then calling it was
quite noticeable.

But we uninlined them in 2007 (commit 20510f2f4e2d: "security: Convert
LSM into a static interface"), so I think the "set to default so that
you can do an unconditional jump" is not all that useful any more.

Of course, if all security models do export the fn, not having the
conditional is still useful. But defaulting to the capability one is
not in this case, since we will call the capability version later
anyway. So it should be defaulted to something that just returns 0.

Linus

Mimi Zohar

unread,

May 16, 2012, 11:48:46 AM5/16/12

to Eric Paris, James Morris, Linus Torvalds, Al Viro, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

Sigh, I was actually thinking back to when LSM was not enabled, and the
default to enable caps was via the security_fixup_ops(). The default
today, without LSM enabled, is to call the cap functions directly from
the security stub functions, like how Linus included the call to
cap_mmap_addr():

static inline int security_mmap_addr(unsigned long addr)
{
return cap_mmap_addr(addr);
}

Mimi

Linus Torvalds

unread,

May 16, 2012, 12:10:27 PM5/16/12

to Mimi Zohar, Eric Paris, James Morris, Al Viro, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

On Wed, May 16, 2012 at 8:47 AM, Mimi Zohar <zo...@linux.vnet.ibm.com> wrote:
>
> The default
> today, without LSM enabled, is to call the cap functions directly from
> the security stub functions,

. I wouldn't call that the default actually: the default today is
pretty much whatever "makes sense".

A *lot* of them just do "return 0" (or, for the ones without a return
value, nothing at all).

In fact, I think it's most of them by far.

Some of them return errors (-EOPNOTSUPP).

So really only for a small subset of the simple cases do the security
wrapper inlines call the cap functions directly. Those simple cases
tend to happen to be at the top of the list, though, so it may seem
more common at first glance than it actually is.

Linus

Mimi Zohar

unread,

May 23, 2012, 5:20:08 PM5/23/12

to Al Viro, Linus Torvalds, Eric Paris, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

On Wed, 2012-05-16 at 03:18 +0100, Al Viro wrote:
> On Tue, May 15, 2012 at 05:45:44PM -0700, Linus Torvalds wrote:
> > On Tue, May 15, 2012 at 5:42 PM, Al Viro <vi...@zeniv.linux.org.uk> wrote:
> > >

> Frankly, I would split it in two - one introducing security_mmap_addr()
> and converting the callers, and another doing the rest of it.

Ok, I split the patch. Hopefully it is bisect safe. The results of which
are available from
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
#next-vfs-changes. But before posting them, I'd like to understand what
should be done regarding the issues you raised.

> Said that, I'm not sure I like the resulting picture.
>
> 1) caller in __bprm_mm_init() is simply ridiculous - note that
> arguments are bleeding *constants*, so it might very well have
> been a BUG_ON(). If it fails, you'll have every execve() fail.

ok, checking the addr based on the same constants doesn't make sense.
Replace it with a BUG_ON() as you suggested?

> 2) get_unmapped_area() probably ought to grow such a caller and
> I really suspect that it would've killed quite a few of them.

?

> 3) expand_downwards() seems to be missing the basic sanity checks on the
> validity of VMA range (arch_mmap_check(), that is). itanic opencodes
> the equivalent before calling expand_stack(); arm and mn10300 do not
> bother, which might or might not be legitimate - depends on whether
> one can get a fault in the first page *and* reach the check_stack:
> in e.g. arm __do_page_fault(). Which just might be possible, if attacker
> maps something just above said first page with MAP_GROWSDOWN and
> tries to write at very small address - IIRC, the first page on arm
> contains the stuff that shouldn't be world-writable... s390 doesn't
> care and I'm not sure about sparc32/sparc64 - it looks like that shouldn't
> be possible to hit, but...

?

> 4) i810_dma.c ought to be switched to vm_mmap() - as discussed in that
> thread back then, magical mystery wank with ->f_op reassignments does
> not rely on ->mmap_sem for protection and thus can be taken out of
> under ->mmap_sem.

Ok, replacing the do_mmap() with vm_mmap() would be a separate patch,
but it still leaves the existing f_op reassignment with locking issues.

thanks,

Mimi

Al Viro

unread,

May 30, 2012, 12:35:55 AM5/30/12

to Mimi Zohar, Linus Torvalds, Eric Paris, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

On Wed, May 23, 2012 at 05:18:19PM -0400, Mimi Zohar wrote:

> > 2) get_unmapped_area() probably ought to grow such a caller and
> > I really suspect that it would've killed quite a few of them.
>
> ?

This belong at the same level as arch_mmap_check(). We insist on not
having VMAs with address range that has certain properties. E.g. extends
beyond the maximal user process address (TASK_SIZE, all architectures),
overlaps a range forbidden by MMU (like on sparc boxen) *or* page at
fixed address that must not be unmapped (e.g. page 0 at old arm systems,
with their "IRQ vectors live at fixed small virtual address" lossage).
Or hugepage VMA in range where huge pages are not allowed. Or page 0
on systems where it's forbidden by security policy. Same kind of
restriction, as far as the rest of the system is concerned.

The obvious place for enforcing such restrictions is get_unmapped_area().
That's what produces such VMA address ranges and validates ones that
are explicitly given by userland (when called with MAP_FIXED).

> > 3) expand_downwards() seems to be missing the basic sanity checks on the
> > validity of VMA range (arch_mmap_check(), that is). itanic opencodes
> > the equivalent before calling expand_stack(); arm and mn10300 do not
> > bother, which might or might not be legitimate - depends on whether
> > one can get a fault in the first page *and* reach the check_stack:
> > in e.g. arm __do_page_fault(). Which just might be possible, if attacker
> > maps something just above said first page with MAP_GROWSDOWN and
> > tries to write at very small address - IIRC, the first page on arm
> > contains the stuff that shouldn't be world-writable... s390 doesn't
> > care and I'm not sure about sparc32/sparc64 - it looks like that shouldn't
> > be possible to hit, but...
>
> ?

Any place that does this check without the rest of restriction enforcement
is very suspicious. expand_downwards() is one such place. I'm not saying
that we need a full-blown get_unmapped_area() in there, but we *do* want
arch_mmap_check(). Itanic does it in caller of expand_stack() (which is
where expand_downwards() come from). So does arm, now that rmk has fixed
the bug there. The rest either has arch_mmap_check() returning "no special
restrictions for this architecture" or seems to be guaranteed that if an
area used to satisfy those restrictions then extending its lower end towards
0 can't violate them. Probably. Frankly, I would rather just do
arch_mmap_check() in expand_downwards() (and remove it from callers in
arch/{arm,ia64}). Dumber is better in this case...

Another place is __bprm_mm_init() and there the check doesn't make any sense.

Yet another is install_special_mapping(). And I do not believe that the
check is needed there either. Address comes either from get_unmapped_area()
(and that's where the checks should be done) or it's cast in stone (e.g.
unicore32 with special vma for its IRQ vectors, fixed at virtual address
0xffff000 and if somebody doesn't like a page being there (r/o, etc.), they
can take that with hardware designers; the damn thing will be there, or the
box won't work).

And the rest of security_file_mmap() callers has the address come through
get_unmapped_area(), AFAICS. It would bloody better, because otherwise
we are very likely to have a serious hole on a bunch of architectures...

Al Viro

unread,

May 30, 2012, 12:36:20 PM5/30/12

to Mimi Zohar, Linus Torvalds, Eric Paris, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

On Wed, May 30, 2012 at 05:34:43AM +0100, Al Viro wrote:
> On Wed, May 23, 2012 at 05:18:19PM -0400, Mimi Zohar wrote:
>
> > > 2) get_unmapped_area() probably ought to grow such a caller and
> > > I really suspect that it would've killed quite a few of them.
> >
> > ?
>
> This belong at the same level as arch_mmap_check(). We insist on not
> having VMAs with address range that has certain properties. E.g. extends
> beyond the maximal user process address (TASK_SIZE, all architectures),
> overlaps a range forbidden by MMU (like on sparc boxen) *or* page at
> fixed address that must not be unmapped (e.g. page 0 at old arm systems,
> with their "IRQ vectors live at fixed small virtual address" lossage).
> Or hugepage VMA in range where huge pages are not allowed. Or page 0
> on systems where it's forbidden by security policy. Same kind of
> restriction, as far as the rest of the system is concerned.
>
> The obvious place for enforcing such restrictions is get_unmapped_area().
> That's what produces such VMA address ranges and validates ones that
> are explicitly given by userland (when called with MAP_FIXED).

BTW, here's what we have for callers of get_unmapped_area():
[hexagon,mips,powerpc,s390,sh,x86] arch_setup_additional_pages() - passes to
install_special_mapping(), where we'll feed it to that check.
[x86] setup_additional_pages() - ditto.
[uprobes] xol_add_vma() - ditto.
sys_mremap() - we explicitly call security_file_mmap() just downstream from
that call.
mremap_to() - we have called security_file_mmap() a bit earlier; might as
well fold it into get_unmapped_area(); note that we are calling it with
MAP_FIXED here, so get_unmapped_area() can't change the address, just accept
or reject it.
do_brk() - ditto.
do_mmap_pgoff() - we have security_file_mmap() (with non-NULL file, this
time) done shortly after get_unmapped_area()
vma_expandable() - on this codepath we don't do check, since starting
address can't change here. Might as well have done it, though.
[ia64 perfmon] a pointless wrapper around get_unmapped_area(); incidentally,
it gets calling conventions wrong - get_unmapped_area() return -E... on
failure, not 0. No check made, and AFAICS it's a bug. I've fixed the
calling conventions there (see vfs.git#for-next tip).

So we have one caller of get_unmapped_area() that legitimately doesn't have
the return value fed through security_file_mmap(). And that's expanding
mremap() trying to decide if it's allowed to grow an old vma in place.
Not the hottest path in the kernel, that...

As far as I'm concerned, the sane plan here would be
* split the damn hook into file-related and address-related parts.
* take the former outside - up from do_mmap_pgoff(), through
do_mmap() into vm_mmap()/do_shmat() and into sys_mmap_pgoff(). Caller of
do_mmap() in fs/aio.c doesn't need that - we have file == NULL there.
That's 5 callers, 3 on MMU side of things and 2 on !MMU.
* take the latter into get_unmapped_area() (on success exits).
As for the existing callers:
one in __bprm_mm_init() can go
what's left of the callers in do_mmap_pgoff() and !MMU
validate_mmap_request() can go (we'd taken all file-related checks upstream
and we'd done get_unmapped_area() already)
one in expand_downwards() should stay
ones in mremap_to(), sys_mremap() and do_brk() can go -
get_unmapped_area() nearby takes care of that
one in install_special_mapping() can go; most of the
callers of that one have address coming out of get_unmapped_area() and
the rest (tile and uml/x86 arch_setup_additional_pages(), uml_setup_stubs(),
unicore32 vectors_user_mapping(), compat && !compat_uses_vdso case in
x86 arch_setup_additional_pages()) are passing a fixed address in there and
any security policy wishing to deny that can just as well refuse to boot -
same effect. Incidentally, unicore32 probably should do an analog of
commit f9d4861 (ARM: 7294/1: vectors: use gate_vma for vectors user mapping)
and stop playing with install_special_mapping() completely.
* probably take arch_mmap_check() into expand_downwards(); we don't
want a full-blow get_unmapped_area() there (it's a fairly hot path and most
of get_unmapped_area() would be redundant here, even with MAP_FIXED), but
arch_mmap_check() would probably be more in place here than in the callers
of expand_stack().

The only question is what do we want passed to resulting two hooks. LSM
folks?

Eric Paris

unread,

May 30, 2012, 3:43:08 PM5/30/12

to Al Viro, Mimi Zohar, Linus Torvalds, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

On Wed, 2012-05-30 at 17:36 +0100, Al Viro wrote:

> The only question is what do we want passed to resulting two hooks. LSM
> folks?

Current hook:
int security_file_mmap(struct file *file, unsigned long reqprot,
unsigned long prot, unsigned long flags,
unsigned long addr, unsigned long addr_only)

Obvious easy split:
int security_file_mmap(struct file *file, unsigned long reqprot,
unsigned long prot, unsigned long flags)

int security_addr_mmap(unsigned long addr)

security_addr_mmap would be used as you described. Which means
security_file_mmap() would pretty much only be used in do_mmap_pgoff()
(or validate_mmap_request)

file:
capabilities: does not use
apparmor/smack/selinux: used to get security blobs from
file/dentry/inode

reqprot: the PROT_* requested by userspace.
prot: the actual PROT_* which will be applied (read-implies-exec is the
difference)

capabilities: does not use *prot
SMACK: does not use *prot
apparmor: only uses prot (not reqprot)
SELinux: uses prot or reqprot based on a kernel build/selinuxfs/cmdline
value. Fedora/RHEL uses reqprot, not prot. This seems dumb, but it's
what we are doing.

These are basically used to check permission to read/write/execute the
file based on PROT_READ/PROT_WRITE/PROT_EXECUTE etc. If you move this
up we won't have reqprot and prot, we'll only have reqprot. So we would
need a helper in the mm code which allow us to easily calculate the
read-implies-exec behavior. for apparmor (and less common selinux)

***flags
capabilities: does not use
SMACK: does not use
apparmor: if (!(flags & MAP_PRIVATE))
SELinux: if ((flags & MAP_TYPE) == MAP_SHARED)

So both apparmor and SELinux only use flags to know if PROT_WRITE will
actually change the backing file. PROT_WRITE is ignored if MAP_PRIVATE.
So this could be a bool called "shared" or the LSMs can just parse the
flags. Doesn't matter to me.

Al Viro

unread,

May 30, 2012, 4:24:49 PM5/30/12

to Eric Paris, Mimi Zohar, Linus Torvalds, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

On Wed, May 30, 2012 at 03:42:47PM -0400, Eric Paris wrote:
> security_file_mmap() would pretty much only be used in do_mmap_pgoff()
> (or validate_mmap_request)

Callers, actually - the whole point is to lift it out of under ->mmap_sem.
The tricky part is reqprot vs. prot mess.

In all cases, we need current to be READ_IMPLIES_EXEC and prot to contain
PROT_READ for any changes to happen. Assuming that both conditions are
met, we have the following:
file == NULL:
add PROT_EXEC in !MMU case, don't do that for MMU.
file != NULL, lives on fs mounted noexec: don't modify prot
file != NULL, lives on fs mounted without noexec:
MMU - add PROT_EXEC; !MMU - add it only if we have BDI_CAP_EXEC_MAP on
that file.

AFAICS, the least painful way of dealing with that would be to have
something like
mangle_prot(struct file *file, unsigned long *prot, unsigned long flags)

that would handle that logics and call security_file_mmap(). With separate
instances in MMU and !MMU cases ;-/ However, there's an extra fun in there -
consider fs/aio.c caller of do_mmap(). For MMU case we don't really care.
But what about !MMU? Should that guy get PROT_EXEC slapped on it?

Linus Torvalds

unread,

May 30, 2012, 4:28:37 PM5/30/12

to Al Viro, Eric Paris, Mimi Zohar, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

On Wed, May 30, 2012 at 1:24 PM, Al Viro <vi...@zeniv.linux.org.uk> wrote:
> On Wed, May 30, 2012 at 03:42:47PM -0400, Eric Paris wrote:
>> security_file_mmap() would pretty much only be used in do_mmap_pgoff()
>> (or validate_mmap_request)
>
> Callers, actually - the whole point is to lift it out of under ->mmap_sem.
> The tricky part is reqprot vs. prot mess.

See how I solved reqprot vs prot in my suggested original patch. You
have it somewhere in your inbox, I know, because you called me out on
the fact that my original email forgot to attach it ;)

It actually cleaned things up, and made the calling conventions
simpler. Just always pass in "reqprot", and have the security layer do
the trivial "calculate final prot".

Anyway, my original patch worked, but it (incorrectly) optimized away
the call to the security layer if file was NULL. But that's just a
matter of removing that check.

Linus

Mimi Zohar

unread,

May 30, 2012, 4:35:35 PM5/30/12

to Al Viro, Eric Paris, Linus Torvalds, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

On Wed, 2012-05-30 at 21:24 +0100, Al Viro wrote:
> On Wed, May 30, 2012 at 03:42:47PM -0400, Eric Paris wrote:
> > security_file_mmap() would pretty much only be used in do_mmap_pgoff()
> > (or validate_mmap_request)
>
> Callers, actually - the whole point is to lift it out of under ->mmap_sem.
> The tricky part is reqprot vs. prot mess.

Linus already addressed it in the original patch, which I split into two
as you suggested:
733559f - "security: move security_file_mmap() and rename to security_mmap_file()"
b3649e9 security: define and use the new security_mmap_addr() hook

git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.

Mimi

Al Viro

unread,

May 30, 2012, 4:53:30 PM5/30/12

to Mimi Zohar, Eric Paris, Linus Torvalds, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

On Wed, May 30, 2012 at 04:33:36PM -0400, Mimi Zohar wrote:
> On Wed, 2012-05-30 at 21:24 +0100, Al Viro wrote:
> > On Wed, May 30, 2012 at 03:42:47PM -0400, Eric Paris wrote:
> > > security_file_mmap() would pretty much only be used in do_mmap_pgoff()
> > > (or validate_mmap_request)
> >
> > Callers, actually - the whole point is to lift it out of under ->mmap_sem.
> > The tricky part is reqprot vs. prot mess.
>
> Linus already addressed it in the original patch, which I split into two
> as you suggested:
> 733559f - "security: move security_file_mmap() and rename to security_mmap_file()"
> b3649e9 security: define and use the new security_mmap_addr() hook

Ehh...

* elf_map() should just be using vm_map/vm_unmap (see vfs.git#for-next)
* that goto out; in sys_mmap_pgooff() is an instant leak - you miss
fput() that way
* again, prot handling for !MMU case differs from MMU one. Sure, we
can duplicate both, but it's getting seriously ugly ;-/
* while we are at it, checking address in validate_mmap_request() is
really odd, seeing that we have the only call to validate_mmap_request()
followed by
/* we ignore the address hint */
addr = 0;
len = PAGE_ALIGN(len);

Al Viro

unread,

May 30, 2012, 4:56:24 PM5/30/12

to Linus Torvalds, Eric Paris, Mimi Zohar, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

On Wed, May 30, 2012 at 01:28:09PM -0700, Linus Torvalds wrote:
> On Wed, May 30, 2012 at 1:24 PM, Al Viro <vi...@zeniv.linux.org.uk> wrote:
> > On Wed, May 30, 2012 at 03:42:47PM -0400, Eric Paris wrote:
> >> security_file_mmap() would pretty much only be used in do_mmap_pgoff()
> >> (or validate_mmap_request)
> >
> > Callers, actually - the whole point is to lift it out of under ->mmap_sem.
> > The tricky part is reqprot vs. prot mess.
>
> See how I solved reqprot vs prot in my suggested original patch. You
> have it somewhere in your inbox, I know, because you called me out on
> the fact that my original email forgot to attach it ;)
>
> It actually cleaned things up, and made the calling conventions
> simpler. Just always pass in "reqprot", and have the security layer do
> the trivial "calculate final prot".

If only it would be trivial ;-/ Take a look at !MMU case (or at the
description in the posting upthread if you want to avoid seeing your
breakfast one more time - the code in validate_mmap_request() is
really ugly).

Linus Torvalds

unread,

May 30, 2012, 5:04:55 PM5/30/12

to Al Viro, Eric Paris, Mimi Zohar, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

On Wed, May 30, 2012 at 1:56 PM, Al Viro <vi...@zeniv.linux.org.uk> wrote:
>>
>> It actually cleaned things up, and made the calling conventions
>> simpler. Just always pass in "reqprot", and have the security layer do
>> the trivial "calculate final prot".
>
> If only it would be trivial ;-/ Take a look at !MMU case (or at the
> description in the posting upthread if you want to avoid seeing your
> breakfast one more time - the code in validate_mmap_request() is
> really ugly).

Don't bother with validate_mmap_request() for nommu. It's ugly, but it
does the same thing, and if it does something else, it's buggy anyway.

Generating 'prot' from 'reqprot' really *should* be as simple as what
I did in my patch. The fact that some places f*ck it up is their
problem - see for example mprotect (I think) that didn't take
MNT_NOEXEC into account.

Don't try to emulate those broken semantics. Just fix them.

Linus

Al Viro

unread,

May 30, 2012, 5:36:54 PM5/30/12

to Linus Torvalds, Eric Paris, Mimi Zohar, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

On Wed, May 30, 2012 at 02:04:23PM -0700, Linus Torvalds wrote:
> Generating 'prot' from 'reqprot' really *should* be as simple as what
> I did in my patch. The fact that some places f*ck it up is their
> problem - see for example mprotect (I think) that didn't take
> MNT_NOEXEC into account.
>
> Don't try to emulate those broken semantics. Just fix them.

Actually, it's better than I thought, but not as simple as you say.
I've misread what's going on in !file case; mea culpa, they are
actually acting the same way there.

The only difference is that for file-backed ones !MMU wants
VM_MAYEXEC in that file's bdi flags (BDI_CAP_EXEC_MAP). And
that actually sounds reasonable in !MMU case.

Anyway, I've dumped the variant I've got into vfs.git@security_file_mmap;
it should be at commit f12a0fd062b1d259a0b6bc6442019e6d4c45e9f5.

Comments?

Linus Torvalds

unread,

May 30, 2012, 6:52:48 PM5/30/12

to Al Viro, Eric Paris, Mimi Zohar, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

On Wed, May 30, 2012 at 2:36 PM, Al Viro <vi...@zeniv.linux.org.uk> wrote:
>
> The only difference is that for file-backed ones !MMU wants
> VM_MAYEXEC in that file's bdi flags (BDI_CAP_EXEC_MAP). And
> that actually sounds reasonable in !MMU case.

Ok, I don't think it should be strictly necessary, but I guess I don't
mind either.

> Anyway, I've dumped the variant I've got into vfs.git@security_file_mmap;
> it should be at commit f12a0fd062b1d259a0b6bc6442019e6d4c45e9f5.
>
> Comments?

Two small ones:

- I really don't think you should use "goto out" in
security_mmap_file(). That implies that you're exiting the function,
but in fact you're jumping to the very *meat* of the function.

So I think you should rename "out" as "no_added_exec" or something.

And a small question: This code:

+ ret = security_mmap_file(file, prot, flags);
+ if (!ret) {
+ down_write(&current->mm->mmap_sem);
+ retval = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
+ up_write(&current->mm->mmap_sem);
+ }

now seems to exist in four places. And in fact, that pretty much seems
to *be* what vm_mmap() is, at this point. Why isn't there just one
single vm_mmap() implementation, and then the callers of that?

Linus

Al Viro

unread,

May 30, 2012, 8:28:33 PM5/30/12

to Linus Torvalds, Eric Paris, Mimi Zohar, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

On Wed, May 30, 2012 at 03:51:27PM -0700, Linus Torvalds wrote:
> On Wed, May 30, 2012 at 2:36 PM, Al Viro <vi...@zeniv.linux.org.uk> wrote:
> >
> > The only difference is that for file-backed ones !MMU wants

> > VM_MAYEXEC in that file's bdi flags (BDI_CAP_EXEC_MAP). ?And

> > that actually sounds reasonable in !MMU case.
>
> Ok, I don't think it should be strictly necessary, but I guess I don't
> mind either.
>
> > Anyway, I've dumped the variant I've got into vfs.git@security_file_mmap;
> > it should be at commit f12a0fd062b1d259a0b6bc6442019e6d4c45e9f5.
> >
> > Comments?
>
> Two small ones:
>
> - I really don't think you should use "goto out" in
> security_mmap_file(). That implies that you're exiting the function,
> but in fact you're jumping to the very *meat* of the function.
>
> So I think you should rename "out" as "no_added_exec" or something.

FWIW, I think it's cleaner to take the whole thing into an inlined helper.

> And a small question: This code:
>
> + ret = security_mmap_file(file, prot, flags);
> + if (!ret) {
> + down_write(&current->mm->mmap_sem);
> + retval = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
> + up_write(&current->mm->mmap_sem);
> + }
>
> now seems to exist in four places. And in fact, that pretty much seems
> to *be* what vm_mmap() is, at this point. Why isn't there just one
> single vm_mmap() implementation, and then the callers of that?

Umm... Not quite. The difference is that vm_mmap() takes its argument as
offset in bytes, while sys_mmap_pgoff() - in pages.

It can be reorganized a bit, though. vm_mmap() aside, there are only two
callers of do_mmap(), both passing it 0 as the last argument. So let's
lift these checks on offset into vm_mmap() and kill do_mmap() completely -
all that remains of it would be a call of do_mmap_pgoff(). And there's no
reason to put those sanity checks (now in vm_mmap()) under ->mmap_sem,
of course. At that point we *do* get 4 identical pieces of code. Let's
call that vm_mmap_pgoff() and put it (and vm_mmap()) to mm/util.c. Voila...

I've pushed that to the same place (vfs.git#security_file_mmap). Should
propagate to git.kernel.org in a few... Guys, does anybody have objections
about the way it looks?

Linus Torvalds

unread,

May 30, 2012, 8:41:22 PM5/30/12

to Al Viro, Eric Paris, Mimi Zohar, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

On Wed, May 30, 2012 at 5:28 PM, Al Viro <vi...@zeniv.linux.org.uk> wrote:
>
> FWIW, I think it's cleaner to take the whole thing into an inlined helper.

Even better.

I notice that your inlined helper doesn't do what I did: if PROT_EXEC
is already set, stop all the stupid games. IOW, the first test in that
function could as well be

if (prot & (PROT_READ | PROT_EXEC) != PROT_READ)
return prot;

because if PROT_EXEC is already set, or if PROT_READ wasn't set, none
of the rest of the checks make any sense at all.

But that's just me being anal. It doesn't really *matter* if we end up
setting PROT_EXEC again.

> It can be reorganized a bit, though. vm_mmap() aside, there are only two
> callers of do_mmap(), both passing it 0 as the last argument. So let's
> lift these checks on offset into vm_mmap() and kill do_mmap() completely -
> all that remains of it would be a call of do_mmap_pgoff(). And there's no
> reason to put those sanity checks (now in vm_mmap()) under ->mmap_sem,
> of course. At that point we *do* get 4 identical pieces of code. Let's
> call that vm_mmap_pgoff() and put it (and vm_mmap()) to mm/util.c. Voila...

Good.

End result looks fine to me.

Linus

Al Viro

unread,

May 30, 2012, 8:56:21 PM5/30/12

to Linus Torvalds, Eric Paris, Mimi Zohar, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

On Wed, May 30, 2012 at 05:40:54PM -0700, Linus Torvalds wrote:
> On Wed, May 30, 2012 at 5:28 PM, Al Viro <vi...@zeniv.linux.org.uk> wrote:
> >
> > FWIW, I think it's cleaner to take the whole thing into an inlined helper.
>
> Even better.
>
> I notice that your inlined helper doesn't do what I did: if PROT_EXEC
> is already set, stop all the stupid games. IOW, the first test in that
> function could as well be
>
> if (prot & (PROT_READ | PROT_EXEC) != PROT_READ)
> return prot;
>
> because if PROT_EXEC is already set, or if PROT_READ wasn't set, none
> of the rest of the checks make any sense at all.

Point... OK, done, pushed and the whole thing thrown into #for-next.
Probably too late for today's linux-next, but tomorrow one should pick
that. BTW, tomorrow there'll be a signal.git pull request as well,
with task_work_add() series in it.

Now for the Miklos' stuff...

Mimi Zohar

unread,

May 30, 2012, 11:56:37 PM5/30/12

to Al Viro, Linus Torvalds, Eric Paris, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

On Thu, 2012-05-31 at 01:56 +0100, Al Viro wrote:
> On Wed, May 30, 2012 at 05:40:54PM -0700, Linus Torvalds wrote:
> > On Wed, May 30, 2012 at 5:28 PM, Al Viro <vi...@zeniv.linux.org.uk> wrote:
> > >
> > > FWIW, I think it's cleaner to take the whole thing into an inlined helper.
> >
> > Even better.
> >
> > I notice that your inlined helper doesn't do what I did: if PROT_EXEC
> > is already set, stop all the stupid games. IOW, the first test in that
> > function could as well be
> >
> > if (prot & (PROT_READ | PROT_EXEC) != PROT_READ)
> > return prot;
> >
> > because if PROT_EXEC is already set, or if PROT_READ wasn't set, none
> > of the rest of the checks make any sense at all.
>
> Point... OK, done, pushed and the whole thing thrown into #for-next.

Patch "split cap_mmap_addr() out of cap_file_mmap()" contains the typo
'cat_mmap_addr', which is then removed in
"split ->file_mmap() into ->mmap_addr()/->mmap_file()".

James Morris

unread,

May 31, 2012, 12:21:15 AM5/31/12

to Al Viro, Linus Torvalds, Eric Paris, Mimi Zohar, Mimi Zohar, linux-secu...@vger.kernel.org, linux-...@vger.kernel.org

On Thu, 31 May 2012, Al Viro wrote:

> I've pushed that to the same place (vfs.git#security_file_mmap). Should
> propagate to git.kernel.org in a few... Guys, does anybody have objections
> about the way it looks?

Nope.

Acked-by: James Morris <james.l...@oracle.com>