noexec workarounds (Was: Yet another Linux for Android app)


Cédric VINCENT

Apr 23, 2014, 3:55:32 PM
to proo...@googlegroups.com, Corbin Champion
> Thanks for the feedback.  I plan on spending time soon looking into
> using the sdcard for storage of most of a rootfs.  It remains a
> strongly desired feature.  Now that so many other things are
> working, the next sprint of time I can dedicate toward writing
> extensions for proot will be focused on this (I will also spend time
> testing the example you are creating that shows how to run a native
> android binary without proot attaching to it).

Good news!


> Can you outline what the future work is that you think would be
> necessary to make this more robust and properly test it? Just
> whatever you already have in mind, I can start from there.

As of now the mmap-noexec extension is stateless, that is, it doesn't
keep records of changes to a process's memory map.  As a consequence,
when the ELF interpreter changes *part* of a mmaped segment (using
mmap, munmap, or mprotect), some of the modifications the mmap-noexec
extension has made are no longer valid.

For instance, the ELF interpreter typically creates a single mmaped
segment when loading an executable:

    +----------+
    |          |
    |          |
    |    RWX   |
    |          |
    |          |
    +----------+

Then, it uses mmap, munmap, or mprotect on *part* of this single
segment to create sub-segments with different attributes:

    +----------+
    |    RWX   |
    +----------+
    |    RW    |
    +----------+
    | unmmaped |
    +----------+

On ARM, the call to "mprotect(sub-segment, RW)" fails because the
single segment creation was modified by mmap-noexec (mmap anonymous +
RW permissions + copy).  It is not clear to me why it fails, however I
think a stateful mmap-noexec could make this issue disappear and I'm
sure it will be way faster (no need to copy the whole single segment,
only the RWX sub-segment needs to be copied).
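
To make the sequence above concrete, here is a minimal sketch in C of the
substitution described for mmap-noexec (anonymous mapping, RW permissions,
then a copy), followed by the kind of partial mprotect and munmap the ELF
interpreter performs.  The function names are hypothetical and this is not
PRoot code.  On a typical x86 Linux system the partial mprotect succeeds, so
the sketch illustrates the steps and does not reproduce the ARM failure.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: stand in for a file-backed mapping by using
 * anonymous RW memory filled with a copy of the content. */
void *map_anonymous_copy(const void *content, size_t length)
{
    assert(content != NULL && length > 0);
    void *segment = mmap(NULL, length, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (segment == MAP_FAILED)
        return NULL;
    memcpy(segment, content, length);
    return segment;
}

/* Build a three-page segment, then change only parts of it:
 * page 0 is left alone, page 1 gets mprotect(RW), page 2 is unmapped.
 * Returns 0 when every step succeeds. */
int split_copied_segment(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t length = 3 * page;

    unsigned char *image = malloc(length);
    if (image == NULL)
        return -1;
    memset(image, 0xAB, length);

    unsigned char *segment = map_anonymous_copy(image, length);
    free(image);
    if (segment == NULL)
        return -1;

    int status = mprotect(segment + page, page, PROT_READ | PROT_WRITE);
    if (status == 0)
        status = munmap(segment + 2 * page, page);
    if (status == 0 && segment[0] != 0xAB)
        status = -1;  /* the copied content must still be visible */

    munmap(segment, 2 * page);
    return status;
}
```

A stateful extension would additionally have to remember which sub-ranges
were changed; that bookkeeping is not shown here.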


> Also, I had an alternate idea that should work, but would also have
> a performance penalty.  What if when an executable was going to be
> exec'd or mapped, it was copied off of the noexec partition first.
> Then it could either A) be deleted, or B) be deleted only when more
> space is needed.  B) requires checking whether the file has changed
> since last being touched but B) has less of a performance penalty as
> a user only uses a small fraction of the executables at a time and
> probably only uses a small portion of a linux distro over and over
> in their normal use. This would only be desirable if the noexec
> extension cannot be made to work or if this had a noticeably smaller
> performance penalty.

I really like this idea.  I think it will be easier to implement and
it could be faster than mmap-noexec.


> Another alternative would be to intercept a lot of system calls and
> make sure any file that might need to run is stored on the internal
> storage and anything else is stored on the sdcard.  That would get
> more than half of a rootfs on the sdcard and would have a very small
> performance penalty.  The negative here is this would be a more
> complicated extension with more test cases to try.

This one doesn't sound as appealing to me ;)


Regards,
Cédric.

Corbin Champion

Apr 27, 2014, 2:26:20 AM
to proo...@googlegroups.com, Corbin Champion
Alright, started looking at this tonight.  Here is my first question...

When handling PR_execve, I modify where it is pointing by doing something like this:

status = write_data(tracee, peek_reg(tracee, CURRENT, SYSARG_1), modified, sizeof(modified));
which seems to work fine.

When handling PR_mmap or PR_mmap2, I modify where they are pointing by doing something like this:

status = write_data(tracee, peek_reg(tracee, CURRENT, SYSARG_4), &fp, sizeof(fp));

But, this results in:

proot warning: ptrace(POKEDATA): Input/output error.

So, what is the correct way to replace argument 4, which is the file pointer?

Thanks,
Corbin

Corbin Champion

Apr 27, 2014, 2:44:36 AM
to proo...@googlegroups.com, Corbin Champion
Not sure why, but the fopen that is assigning fp is returning NULL.  Need to figure that out and see if it solves this.

Corbin Champion

Apr 27, 2014, 2:59:47 AM
to proo...@googlegroups.com, Corbin Champion
Well, correcting my path such that fopen returned something useful doesn't fix the other error.  I must be replacing the file pointer used by mmap2 incorrectly.  Done for tonight.

Corbin

Cédric VINCENT

Apr 27, 2014, 5:19:27 AM
to proo...@googlegroups.com, Corbin Champion
Hello Corbin,


> When handling PR_execve, I modify where it is pointing by doing
> something like this:
>
> status = write_data(tracee, peek_reg(tracee, CURRENT, SYSARG_1), modified, sizeof(modified));
>
> which seems to work fine.

Even if it works fine, be careful with "sizeof(modified)", because it
is valid only if "modified" is a literal string, for instance:

    #define modified "/tmp/foo"

or if it is a fixed length buffer, for instance:

    char modified[PATH_MAX];

otherwise "sizeof(modified)" is likely equivalent to the size of a
pointer (4 or 8 bytes).  I mention this only because it is a
known dangerous code pattern (one I use too, I must confess).
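
The difference can be checked with a few lines of C.  This is only an
illustrative sketch; the function names are made up for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* sizeof on a fixed-length buffer gives the size of the whole array. */
size_t size_of_buffer(void)
{
    char modified[PATH_MAX];
    return sizeof(modified);
}

/* sizeof on a string literal counts the characters plus the final NUL. */
size_t size_of_literal(void)
{
    return sizeof("/tmp/foo");
}

/* sizeof on a pointer gives the size of the pointer, not of the string. */
size_t size_of_pointer(void)
{
    const char *modified = "/tmp/foo";
    return sizeof(modified);
}

/* When only a pointer is available, compute the size explicitly. */
size_t string_size(const char *string)
{
    assert(string != NULL);
    return strlen(string) + 1;
}
```

With a pointer, passing "strlen(modified) + 1" as the size is the usual way
to write the whole string including its terminator.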



> When handling PR_mmap or PR_mmap2, I modify where they are pointing
> by doing something like this:
>
> status = write_data(tracee, peek_reg(tracee, CURRENT, SYSARG_4), &fp, sizeof(fp));
>
> [...]

>
> Well, correcting my path such that fopen returned something useful
> doesn't fix the other error.  I must be replacing the file pointer
> used by mmap2 incorrectly.  Done for tonight.

I'm not sure I understand what you want to achieve.  I assume you
want to change the "fd" argument of mmap.  If this assumption is
right, four things come to my mind:

1. this is the 5th argument, not the 4th.

2. this argument is passed directly by register -- not by memory -- so
   you can replace it this way (code from "src/syscall/heap.c"):

       poke_reg(tracee, SYSARG_5 /* fd */, -1);

3. the type of this argument is "int" not "FILE *", as a consequence
   you have to convert it with fileno(3), or use open(2) instead.

4. file descriptors are not shared between the tracer and the tracee,
   thus you have to make the tracee create this file descriptor on its
   side.  I can explain how to achieve this, but first I'd like to
   be sure my assumption is right.
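
Point 3 can be illustrated outside of PRoot with a small C sketch: mmap(2)
takes an "int" descriptor, obtained either from a "FILE *" with fileno(3)
or directly from open(2).  The helper names and the scratch file are
assumptions made for the example, and everything here runs in a single
process, so it says nothing about the tracer/tracee split of point 4.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `length` bytes of an open descriptor read-only; the descriptor is
 * the 5th argument of mmap(2). */
void *map_fd(int fd, size_t length)
{
    void *mem = mmap(NULL, length, PROT_READ, MAP_PRIVATE, fd, 0);
    return mem == MAP_FAILED ? NULL : mem;
}

/* A FILE * cannot be passed to mmap(2); convert it with fileno(3). */
void *map_stream(FILE *fp, size_t length)
{
    assert(fp != NULL);
    return map_fd(fileno(fp), length);
}

/* Write 5 bytes to a scratch file and map it through both routes.
 * Returns 0 when both mappings show the expected content. */
int demo_map(const char *scratch_path)
{
    FILE *out = fopen(scratch_path, "w");
    if (out == NULL)
        return -1;
    fputs("hello", out);
    fclose(out);

    FILE *fp = fopen(scratch_path, "r");
    int fd = open(scratch_path, O_RDONLY);
    char *a = fp != NULL ? map_stream(fp, 5) : NULL;
    char *b = fd >= 0 ? map_fd(fd, 5) : NULL;

    /* The mappings stay valid after the descriptors are closed. */
    if (fp != NULL)
        fclose(fp);
    if (fd >= 0)
        close(fd);

    int ok = a != NULL && b != NULL && a[0] == 'h' && b[4] == 'o';
    if (a != NULL)
        munmap(a, 5);
    if (b != NULL)
        munmap(b, 5);
    unlink(scratch_path);
    return ok ? 0 : -1;
}
```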

Cédric.

Corbin Champion

Apr 27, 2014, 12:14:02 PM
to proo...@googlegroups.com, Corbin Champion

Cedric,

For the first topic, I am using:

char modified[PATH_MAX];

As you have pointed out that danger before.

For the second issue,

>1. this is the 5th argument, not the 4th.
>

Ouch.  I looked at this man page: http://linux.die.net/man/2/mmap2 .  For some reason they have one argument separate from the others the way the page renders, so my eyes must have skipped it (or insert funny joke about US schools, or insert funny joke about logic designers always counting base 0).

>2. this argument is passed directly by register -- not by memory -- so
>   you can replace it this way (code from "src/syscall/heap.c"):
>
>       poke_reg(tracee, SYSARG_5 /* fd */, -1);

I tried poke_reg, but targeting the wrong argument with it is probably what caused a different error.


>3. the type of this argument is "int" not "FILE *", as a consequence
>   you have to convert it with fileno(3), or use open(2) instead.

Ah, yes.  Should have seen that one too.  Obviously struggling last night.


>4. file descriptors are not shared between the tracer and the tracee,
>   thus you have to make the tracee create this file descriptor on its
>   side.  I can explain how to achieve this, but first I'd like to
>   be sure my assumption is right.

Yes, that is what I want to do.  In both cases, I want to copy the file to the exec partition and then point at it, instead of the one on the noexec partition.  Just trying something quick to get an idea of performance.

Thank you as always!
Corbin

Cédric VINCENT

Apr 28, 2014, 7:48:33 AM
to proo...@googlegroups.com, Corbin Champion
> Ouch.  I looked at this man page: http://linux.die.net/man/2/mmap2 .
> For some reason they have one argument separate from the others the
> way the page renders, so my eyes must have skipped it (or insert
> funny joke about US schools, or insert funny joke about logic
> designers always counting base 0).

I got the joke about logic designers, but didn't get the one about US
schools.  I'm curious about it now, could you help me ;)



> >4. file descriptors are not shared between the tracer and the tracee,
> >   thus you have to make the tracee create this file descriptor on its
> >   side.  I can explain how to achieve this, but first I'd like to
> >   be sure my assumption is right.
>
> Yes, that is what I want to do.  In both cases, I want to copy the
> file to the exec partition and then point at it, instead of the one
> on the noexec partition.

Then you have to make the tracee call open(2) before mmap(2), to make
it create the file descriptor to the copied file:

<pseudo code>
when(extension_event == SYSCALL_ENTER_START
     && get_sysnum(tracee, CURRENT) == PR_mmap) {

     /* Replace mmap(2) with open(2).  */
     set_sysnum(tracee, PR_open)
     set_sysarg_path(tracee, path, SYSARG_1)
     poke_reg(tracee, SYSARG_2, flags)
     poke_reg(tracee, SYSARG_3, mode)

     /* Chain this substituted syscall with mmap(2).  Use -1 for fd
      * since its value is not known as of now (this will be the
      * result of the preceding open(2)).  */
     register_chained_syscall(tracee, PR_mmap, addr, length, prot, flags, -1, offset)

     /* Don't leak the temporary file descriptor (same remark about
      * fd == -1).  */
     register_chained_syscall(tracee, PR_close, -1, 0, 0, 0, 0, 0)
}

when(extension_event == SYSCALL_CHAINED_ENTER
     && get_sysnum(tracee, ORIGINAL) == PR_mmap) {

     /* Replace the fd placeholder with the result of the preceding
      * open(2).  */
     temporary_fd = peek_reg(tracee, CURRENT, SYSARG_RESULT)
     poke_reg(tracee, SYSARG_5, temporary_fd)
}

when(extension_event == SYSCALL_CHAINED_EXIT
     && get_sysnum(tracee, ORIGINAL) == PR_mmap) {

     /* The return value of this chain of syscall is now known.  */
     tracee->chain.final_result = peek_reg(tracee, CURRENT, SYSARG_RESULT)
}

when(extension_event == SYSCALL_CHAINED_ENTER
     && get_sysnum(tracee, ORIGINAL) == PR_close) {

     /* Replace the fd placeholder with the result of the preceding
      * open(2).  */
     poke_reg(tracee, SYSARG_1, temporary_fd)
}
</pseudo code>

As you can see, this is not straightforward at all.  Moreover, this
might create unexpected behaviors since the file descriptor used by
the modified mmap(2) is not the one the tracee uses for other
purposes.  To me, a safer approach would be to hook every open(2) that
could be used to map an executable segment (either using mmap(2), or
mprotect(2)).  This would keep file descriptor usage consistent.
In the worst case, more files than necessary are copied from the
noexec partition to the exec one.  As far as I know,
only ELF files are used to create executable segments, so it could be
as simple as:

<pseudo code>
when(extension_event == SYSCALL_ENTER_END
     && get_sysnum(tracee, CURRENT) == PR_open) { /* TODO: openat(2) */

     get_sysarg_path(tracee, path, SYSARG_1)
     status = open_elf(path)  /* exported by "execve/elf.h" */
     if (status < 0)
       return;
     close(status)

     /* Remember: only guest rootfs is noexec.  */
     if (!belongs_to_guestfs(tracee, path))
       return;

     new_path = get_copy_of(path)
     set_sysarg_path(tracee, new_path, SYSARG_1)
}
</pseudo code>

Note: it is assumed get_copy_of() is based on PRoot binding mechanism,
      otherwise belongs_to_guestfs() can't be used.
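
The "does this path belong to the noexec rootfs" test in the pseudo code can
be approximated by a lexical prefix check.  The sketch below is a
stand-alone illustration with a hypothetical name; it is not PRoot's
belongs_to_guestfs(), it does not resolve symlinks or bindings, and it
assumes both paths are already canonical and absolute.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 when `path` is `root` itself or lies
 * below it, 0 otherwise.  The comparison stops at a path component
 * boundary so that "/a/rootfs2" is not taken for a child of "/a/rootfs". */
int path_is_under(const char *root, const char *path)
{
    assert(root != NULL && path != NULL);

    size_t length = strlen(root);

    /* Ignore trailing slashes, but keep a lone "/". */
    while (length > 1 && root[length - 1] == '/')
        length--;

    if (strncmp(path, root, length) != 0)
        return 0;

    /* Every absolute path is under "/". */
    if (length == 1 && root[0] == '/')
        return 1;

    return path[length] == '\0' || path[length] == '/';
}
```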

By the way, I don't think you have to hook execve(2) -- at least at
first -- since PRoot always replaces an executed program with its ELF
interpreter.  That means you just have to bind copies of the ELF
interpreters (there's likely only one in your case) from an exec
partition into the noexec rootfs:

    exec/proot -b exec/ld-linux.so.3:/lib/ld-linux.so.3 [...] -r no-exec/rootfs

Cédric.

Corbin Champion

Apr 28, 2014, 2:14:47 PM
to proo...@googlegroups.com, Corbin Champion
Cedric,


>I got the joke about logic designers, but didn't get the one about US
>schools.  I'm curious about it now, could you help me ;)

I am not a funny man and, sadly, there is nothing really funny about not funding education properly. 

>As you can see, this is not straightforward at all.

I understand what you are doing, but yes, it is more complicated than I had hoped.


>To me, a safer approach would be to hook every open(2) that
>could be used to map an executable segment (either using mmap(2), or
>mprotect(2)).  This would keep file descriptor usage consistent.
>In the worst case, more files than necessary are copied from the
>noexec partition to the exec one.  As far as I know,
>only ELF files are used to create executable segments, so it could be
>as simple as

I like this idea, especially for my early experiment. 

I will try that idea first, as it is simple and robust, but I am curious about what the consequences of another technique would be.  What if I did something like:
set_sysnum(tracee, PR_void);
mmap2( using my fd )

So, kill off what is being done and do it here.  What are the flaws in that? 

>By the way, I don't think you have to hook execve(2)

I see the first call is always the interpreter.  Is this handled similarly for scripts?  Is the shebang extracted, then the interpreter of that, and then that is what gets called?  I want to be able to handle statically linked executables too.  How are those handled?

Thanks,
Corbin

Corbin Champion

Apr 16, 2015, 12:34:52 PM
to proo...@googlegroups.com, corb...@gmail.com
Some interesting learnings:

When doing the simpler approach:

><pseudo code>
>when(extension_event == SYSCALL_ENTER_END
>     && get_sysnum(tracee, CURRENT) == PR_open) { /* TODO: openat(2) */
>
>     get_sysarg_path(tracee, path, SYSARG_1)
>     status = open_elf(path)  /* exported by "execve/elf.h" */
>     if (status < 0)
>       return;
>     close(status)
>
>     /* Remember: only guest rootfs is noexec.  */
>     if (!belongs_to_guestfs(tracee, path))
>       return;
>
>     new_path = get_copy_of(path)
>     set_sysarg_path(tracee, new_path, SYSARG_1)
>}
></pseudo code>

There are sadly some key problems with this.  Primarily, knowing what needs to be copied down is much harder than simply checking whether a file is an ELF or not.  You need to copy down anything that might be executed (so anything that starts with #!).  You also need to copy down files that are going to be run through translate_and_check_exec (I handled this by notifying the extensions of the host path that was going to be used in that function and modifying it).  Worst of all, though, some programs mmap files that are not in any easily detectable format.  For example, apt-get mmaps data files.  So, what to do about that?
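
To show why header sniffing only goes so far, here is a small C sketch that
classifies the first bytes of a file as ELF, script (#!), or other.  The
enum and function names are invented for this example.  Anything mmaped by
a program as plain data, such as the apt-get files mentioned above, ends up
in the "other" bucket and cannot be picked out this way.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum header_kind { HEADER_OTHER, HEADER_ELF, HEADER_SCRIPT };

/* Hypothetical helper: look only at the leading bytes of a file. */
enum header_kind classify_header(const unsigned char *buffer, size_t length)
{
    assert(buffer != NULL || length == 0);

    /* ELF files start with 0x7f 'E' 'L' 'F'. */
    if (length >= 4 && memcmp(buffer, "\177ELF", 4) == 0)
        return HEADER_ELF;

    /* Scripts handed to an interpreter start with "#!". */
    if (length >= 2 && buffer[0] == '#' && buffer[1] == '!')
        return HEADER_SCRIPT;

    /* Everything else, including data files that get mmaped. */
    return HEADER_OTHER;
}
```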

So, I started looking at this to try to copy down anything that was going to be used by mmap...
First, I have worked through the code and it probably should look like this (I know yours was pseudo code, but there are some key differences beyond that):

static int handle_mmap(Tracee *tracee, ExtensionEvent event)
{
    int status;
    word_t flags;
    char path[PATH_MAX];
    char final_path[PATH_MAX];

    switch (event) {
    case SYSCALL_ENTER_START: {
        switch (get_sysnum(tracee, ORIGINAL)) {
        case PR_mmap:
        case PR_mmap2:
            flags = peek_reg(tracee, ORIGINAL, SYSARG_4);

            if ((flags & MAP_ANONYMOUS) != 0)
                break;

            status = readlink_proc_pid_fd(tracee->pid, peek_reg(tracee, ORIGINAL, SYSARG_5), path);
            if (status < 0)
                return status;

            status = copy_executable(tracee, path, final_path);
            if (status <= 0)
                return status;

            /* Replace mmap(2) with open(2).  */
            set_sysnum(tracee, PR_open);
            set_sysarg_path(tracee, final_path, SYSARG_1);
            poke_reg(tracee, SYSARG_2, O_RDWR);
            poke_reg(tracee, SYSARG_3, 0);

            /* Chain this substituted syscall with mmap(2).  Use -1 for fd
             * since its value is not known as of now (this will be the
             * result of the preceding open(2).  */
            register_chained_syscall(tracee, PR_mmap2,
                                     peek_reg(tracee, ORIGINAL, SYSARG_1),
                                     peek_reg(tracee, ORIGINAL, SYSARG_2),
                                     peek_reg(tracee, ORIGINAL, SYSARG_3),
                                     peek_reg(tracee, ORIGINAL, SYSARG_4),
                                     -1,
                                     peek_reg(tracee, ORIGINAL, SYSARG_6));

            /* Don't leak the temporary file descriptor (same remark about
             * fd == -1).  */
            register_chained_syscall(tracee, PR_close, -1, 0, 0, 0, 0, 0);
            break;

        default:
            break;
        }
        return 0;
    }

    case SYSCALL_CHAINED_ENTER: {
        switch (get_sysnum(tracee, CURRENT)) {
        case PR_mmap:
        case PR_mmap2:
            /* Replace the fd placeholder with the result of the preceding
             * open(2).  */
            poke_reg(tracee, SYSARG_5, temporary_fd);
            break;

        case PR_close:
            /* Replace the fd placeholder with the result of the preceding
             * open(2).  */
            poke_reg(tracee, SYSARG_1, temporary_fd);
            break;

        default:
            break;
        }
        return 0;
    }

    case SYSCALL_CHAINED_EXIT: {
        switch (get_sysnum(tracee, CURRENT)) {
        case PR_open:
            //save off result of open
            temporary_fd = peek_reg(tracee, CURRENT, SYSARG_RESULT);
            break;

        case PR_mmap:
        case PR_mmap2:
            /* The return value of this chain of syscall is now known.  */
            force_chain_final_result(tracee, peek_reg(tracee, CURRENT, SYSARG_RESULT));
            break;

        default:
            break;
        }
        return 0;
    }

    default:
        return 0;
    }
}


But there is an interesting problem with doing this.  On my x86 device all is well, but on my arm (android) device the following happens:

proot info: pid 7892: sysenter start: mmap2(0x0, 0x20880, 0x5, 0x802, 0x3, 0x0) = 0x0 [0xbe9f0ed8, 0]
handle_mmap path: /storage/emulated/0/GNURoot/debian/2231/6139
copy_executable:
proot info: pid 7892: translate("/" + "/noexec/")
proot info: pid 7892:          -> "/storage/emulated/0/GNURoot/debian/"
proot info: pid 7892: translate("/" + "/meta/2231/6139")
proot info: pid 7892:          -> "/data/data/com.gnuroot.debian/debian/meta/2231/6139"
copy_executable meta_path: /data/data/com.gnuroot.debian/debian/meta/2231/6139
proot info: pid 7892: translate("/" + "/data/data/com.gnuroot.debian/debian/lib/arm-linux-gnueabihf/libtinfo.so.5.9")
proot info: pid 7892:          -> "/data/data/com.gnuroot.debian/debian/lib/arm-linux-gnueabihf/libtinfo.so.5.9"
copy_executable final_path: /data/data/com.gnuroot.debian/debian/lib/arm-linux-gnueabihf/libtinfo.so.5.9
copy_executable done copying
proot info: pid 7892: sysenter end: open(0xbe9f0e8b, 0x2, 0x0, 0x802, 0x3, 0x0) = 0xbe9f0e8b [0xbe9f0e8b, 0]
proot info: pid 7892: sysexit start: open(0x7, 0x2, 0x0, 0x802, 0x3, 0x0) = 0x7 [0xbe9f0e8b, 0]
chained file handle path: /data/data/com.gnuroot.debian/debian/lib/arm-linux-gnueabihf/libtinfo.so.5.9
proot info: pid 7892: sysexit end: mmap2(0x0, 0x20880, 0x5, 0x802, 0xffffffff, 0x0) = 0x0 [0xbe9f0e8b, 0]
proot info: pid 7892: sysenter start: mmap2(0x0, 0x20880, 0x5, 0x802, 0xffffffff, 0x0) = 0x0 [0xbe9f0e8b, 0]
proot info: pid 7892: sysenter end: mmap2(0x0, 0x20880, 0x5, 0x802, 0x7, 0x0) = 0x0 [0xbe9f0e8b, 0]
proot info: pid 7892: sysexit start: mmap2(0xfffffff7, 0x20880, 0x5, 0x802, 0x7, 0x0) = 0xfffffff7 [0xbe9f0e8b, 0]
proot info: pid 7892: sysexit end: close(0xffffffff, 0x0, 0x0, 0x0, 0x0, 0x0) = 0xffffffff [0xbe9f0e8b, 0]
proot info: pid 7892: sysenter start: close(0xffffffff, 0x0, 0x0, 0x0, 0x0, 0x0) = 0xffffffff [0xbe9f0e8b, 0]
proot info: pid 7892: sysenter end: close(0x7, 0x0, 0x0, 0x0, 0x0, 0x0) = 0x7 [0xbe9f0e8b, 0]
proot info: pid 7892: sysexit start: close(0x0, 0x0, 0x0, 0x0, 0x0, 0x0) = 0x0 [0xbe9f0e8b, 0]
proot info: pid 7892: sysexit end: mmap2(0xfffffff7, 0x20880, 0x5, 0x802, 0x3, 0x0) = 0xfffffff7 [0xbe9f0ed8, 0]

You can see the mmap2 being replaced by open, mmap2 and close, with the result of the open being used as the fd for the mmap2 and the close.  But the mmap2 is failing: the log shows it returning 0xfffffff7, that is -9, and error code 9 means bad file number.  This is odd, since that file number was just spit out by the preceding open.  It also chose a file number of 7, which seems odd too: you can see it was going to use 3, and if I scan up through the logs (not all included here for brevity), I believe there are lower numbers available for this process than 7.  Is it maybe opening the file from the wrong process, or something else?
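
For reading such logs, the raw 32-bit return value can be turned back into
an error number with a few lines of C.  This is a sketch with made-up
function names; it relies on the Linux convention that a failing syscall
returns a value between -4095 and -1, and it assumes a 32-bit tracee as in
the ARM log above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: return the errno encoded in a raw 32-bit syscall
 * result, or 0 when the value does not encode an error. */
int errno_from_raw32(uint32_t raw)
{
    int32_t value = (int32_t)raw;

    if (value < 0 && value >= -4095)
        return -value;
    return 0;
}

/* Convenience check for the error discussed in this message. */
int raw32_is_ebadf(uint32_t raw)
{
    return errno_from_raw32(raw) == EBADF;
}
```

Applied to the log, 0xfffffff7 decodes to 9 (EBADF on Linux), while the
open result 0x7 decodes to 0, meaning no error.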

So, those are my adventures so far trying to get things to live on the sdcard.

Take care,
Corbin

Corbin Champion

Apr 16, 2015, 2:47:06 PM
to proo...@googlegroups.com, corb...@gmail.com
Even though the logs show the fd used by mmap being different at sysenter start vs sysenter end, it appears that when the mmap takes place, it is acting like it is still using the bogus fd -1.  I proved this by changing the -1 to the original fd that was going to be used by the original mmap2, and I get a different result, so either I am not replacing the fd early enough or it is my usual issue of using ORIGINAL, CURRENT and MODIFIED wrong.

Corbin

Corbin Champion

Apr 16, 2015, 6:33:58 PM
to proo...@googlegroups.com, corb...@gmail.com
Based on that piece of data, if I delay registering the chained mmap and close until after I know the result of the open, things work well on ARM and x86.  Not sure if there is a proot bug there or not.

Corbin