Suggested by: Eugene Syromyatnikov
The Direct Rendering Manager (DRM) subsystem has a fairly elaborate ioctl interface, and it would be useful to support decoding it.
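On Wednesday, March 14, 2018 at 12:32:01 AM UTC+8, 赖伟峰 wrote:
1. In most cases strace does not have the CAP_SYS_ADMIN capability, and mounting /proc requires root privileges, so repeatedly mounting /proc is not an option.
2. Adding a new system call is very unlikely to be accepted by the upstream or downstream kernel, so the mentor essentially rejected the earlier idea.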
Hello,
I'm interested in working on the project "Adding
support for alternative tracing backends".
I have consulted the strace mailing list archive and other
articles. I understand the basic approach to what needs
to be done; please correct me if I'm heading in the wrong direction.
Any suggestions would be much appreciated.
Abstract:
1. Add a backend interface to allow for multiple alternative backends.
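As a rough illustration of what such a backend interface could look like, here is a minimal sketch: a table of function pointers selected by name. The struct, the function names, and the stub implementations are all hypothetical and are not taken from the strace sources.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical set of operations every tracing backend would provide. */
struct tracing_backend {
	const char *name;
	int (*attach)(int pid);	/* start tracing a process */
	int (*detach)(int pid);	/* stop tracing a process */
};

/* Stubs standing in for real ptrace or gdbserver code. */
static int stub_attach(int pid) { return pid > 0 ? 0 : -1; }
static int stub_detach(int pid) { return pid > 0 ? 0 : -1; }

static const struct tracing_backend backends[] = {
	{ "ptrace",    stub_attach, stub_detach },
	{ "gdbserver", stub_attach, stub_detach },
};

/* Look up a backend by name; returns NULL if it is not registered. */
const struct tracing_backend *find_backend(const char *name)
{
	for (size_t i = 0; i < sizeof(backends) / sizeof(backends[0]); i++)
		if (strcmp(backends[i].name, name) == 0)
			return &backends[i];
	return NULL;
}
```

The rest of the tracer would then call find_backend("gdbserver")->attach(pid) and never touch ptrace directly, which is what lets the backends be swapped.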
2. Using gdbserver:
gdbserver implements syscall catching: it adds a new
QCatchSyscalls packet to enable 'catch syscall', and new stop reasons
"syscall_entry" and "syscall_return" for those events.
GDB can catch some or all of the syscalls issued by the debuggee, and
show the related information for each syscall. If no argument is
specified, calls to and returns from all system calls will be caught.
Basic implementation idea:
strace talks to gdbserver via the gdbserver backend:
strace sends packet: $vCont;c (continue)
strace receives packet: T05syscall_entry:16;06:b0e2ffffff7f0000;07:68e2ffffff7f0000;10:27a9b0f7ff7f0000;thread:p2162.2162;core:5;
strace sends packet: $g (get registers)
strace receives packet: daffffffffffffff0000000000000000...
I plan to build on the previous patches [1], [2], [3].
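To make the exchange above more concrete, here is a minimal sketch of two pieces a gdbserver backend would need: framing an outgoing packet and finding the syscall number in a stop reply. It relies on the remote protocol's "$payload#checksum" framing, where the checksum is the sum of the payload bytes modulo 256 written as two hex digits, and on the syscall number being hex-encoded in the syscall_entry and syscall_return fields. The function and enum names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Frame a payload as "$payload#xx"; returns the length or -1 if out is too small. */
int rsp_frame(const char *payload, char *out, size_t out_size)
{
	unsigned char sum = 0;

	for (const char *p = payload; *p; p++)
		sum += (unsigned char) *p;

	int n = snprintf(out, out_size, "$%s#%02x", payload, sum);
	return (n < 0 || (size_t) n >= out_size) ? -1 : n;
}

enum stop_kind { STOP_OTHER, STOP_SYSCALL_ENTRY, STOP_SYSCALL_RETURN };

/*
 * Scan a "T" stop reply (framing already removed) for a syscall_entry or
 * syscall_return field and store its syscall number in *sysno.
 */
enum stop_kind rsp_parse_stop(const char *reply, long *sysno)
{
	if (reply[0] != 'T' || strlen(reply) < 3)
		return STOP_OTHER;

	/* Skip 'T' and the two hex digits of the signal number. */
	for (const char *p = reply + 3; *p; ) {
		const char *colon = strchr(p, ':');
		if (!colon)
			break;

		size_t klen = (size_t) (colon - p);
		enum stop_kind kind = STOP_OTHER;

		if (klen == 13 && !strncmp(p, "syscall_entry", klen))
			kind = STOP_SYSCALL_ENTRY;
		else if (klen == 14 && !strncmp(p, "syscall_return", klen))
			kind = STOP_SYSCALL_RETURN;

		if (kind != STOP_OTHER) {
			*sysno = strtol(colon + 1, NULL, 16);
			return kind;
		}

		/* Not a syscall field: move on to the next "key:value;" pair. */
		const char *semi = strchr(colon, ';');
		if (!semi)
			break;
		p = semi + 1;
	}
	return STOP_OTHER;
}
```

With these helpers, rsp_frame("vCont;c", ...) produces "$vCont;c#a8", and parsing the stop reply shown above yields a syscall entry with number 0x16.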
3. Using ftrace:
kprobe, uprobe, and kernel tracepoint scripts make use of ftrace, which
allows tracing kernel functions, collecting stack traces, and debugging
crashes. The backend would write to control files in
/sys/kernel/debug/tracing and read the output from the same directory.
Detailed description: [4].
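A minimal sketch of the file-based control flow described for ftrace, assuming the tracing directory is mounted at /sys/kernel/debug/tracing and the caller is privileged enough to write there. The helper names are hypothetical; the base directory is a parameter so that the helpers can be exercised against an ordinary directory as well.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build "<base>/<rel>" in path; returns 0 on success, -1 if it does not fit. */
static int build_path(char *path, size_t size, const char *base, const char *rel)
{
	int n = snprintf(path, size, "%s/%s", base, rel);
	return (n < 0 || (size_t) n >= size) ? -1 : 0;
}

/* Write a string to an existing control file; returns 0 on success, -1 on error. */
int tracefs_write(const char *base, const char *rel, const char *value)
{
	char path[4096];

	if (build_path(path, sizeof(path), base, rel) < 0)
		return -1;

	int fd = open(path, O_WRONLY | O_TRUNC);
	if (fd < 0)
		return -1;

	size_t len = strlen(value);
	ssize_t written = write(fd, value, len);
	close(fd);
	return written == (ssize_t) len ? 0 : -1;
}

/* Read up to size - 1 bytes into buf and NUL-terminate; returns the byte count or -1. */
ssize_t tracefs_read(const char *base, const char *rel, char *buf, size_t size)
{
	char path[4096];

	if (size == 0 || build_path(path, sizeof(path), base, rel) < 0)
		return -1;

	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	ssize_t n = read(fd, buf, size - 1);
	close(fd);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return n;
}
```

For syscall tracing, a backend could call tracefs_write(dir, "events/raw_syscalls/sys_enter/enable", "1") and then read events from "trace_pipe" in the same directory.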
4. Perf events:
Call the perf_event_open syscall; the kernel writes events to a ring
buffer mapped into user space, and tracepoint events are read from that
ring buffer. ioctls such as PERF_EVENT_IOC_ENABLE and
PERF_EVENT_IOC_DISABLE act on perf_event_open() file descriptors,
enabling and disabling, respectively, the individual counter or event
group specified by the file descriptor argument. [5]
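A minimal sketch of the perf events setup, assuming a tracepoint ID has already been read from the corresponding events/<subsystem>/<event>/id file in the tracing directory. glibc provides no wrapper for perf_event_open, so it is invoked through syscall(). The helper names are hypothetical, and mapping and reading the ring buffer is left out.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Describe one tracepoint event; the event is created in the disabled state. */
void tracepoint_attr_init(struct perf_event_attr *attr, uint64_t tracepoint_id)
{
	memset(attr, 0, sizeof(*attr));
	attr->type = PERF_TYPE_TRACEPOINT;
	attr->size = sizeof(*attr);
	attr->config = tracepoint_id;
	attr->sample_period = 1;	/* record every hit */
	attr->sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_RAW;
	attr->disabled = 1;
}

/* Thin wrapper around the raw syscall. */
int sys_perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
			int group_fd, unsigned long flags)
{
	return (int) syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/*
 * Open the tracepoint for one process on any CPU and enable it.
 * Returns the event file descriptor, or -1 on error.
 */
int trace_pid_tracepoint(pid_t pid, uint64_t tracepoint_id)
{
	struct perf_event_attr attr;

	tracepoint_attr_init(&attr, tracepoint_id);

	int fd = sys_perf_event_open(&attr, pid, -1, -1, 0);
	if (fd < 0)
		return -1;

	if (ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

The returned descriptor would then be passed to mmap() to obtain the ring buffer, and PERF_EVENT_IOC_DISABLE on the same descriptor stops event collection again.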
[1]. https://lists.strace.io/pipermail/strace-devel/2017-February/005985.html
[2]. https://lists.strace.io/pipermail/strace-devel/2017-February/005986.html
[3]. https://lists.strace.io/pipermail/strace-devel/2017-February/005987.html
[4]. http://www.linuxjournal.com/article/6100
[5]. http://man7.org/linux/man-pages/man2/perf_event_open.2.html
Thanks for your time.
Best Regards,
Harsha Sharma
Yes, a PID NS tree should be built (at least to the point the desired
information is obtained) in order to perform the translation. As I said,
the endeavor can be complicated by the fact that /proc can be mounted from
the alien PID namespace, but in that case we can just bail out early, as
it is not a normal setup (although it is quite possible).
Note that since this involves quite a lot of syscalls, some form of
caching should be implemented. It is also complicated by the fact that
processes can come and go between queries, so we should account for that
somehow ({i,fa}notify?).
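One possible shape for such a cache, as a minimal sketch: a small direct-mapped table keyed by the PID namespace inode and the PID inside that namespace, with an invalidation hook to call when a process is known to have exited. All names and the table size are hypothetical, and a colliding entry simply overwrites the previous one.

```c
#include <stddef.h>
#include <stdint.h>

#define PID_CACHE_SLOTS 256

struct pid_cache_entry {
	uint64_t ns_inode;	/* inode of the PID namespace */
	int ns_pid;		/* PID as seen inside that namespace */
	int our_pid;		/* PID as seen in our own namespace */
	int valid;
};

static struct pid_cache_entry pid_cache[PID_CACHE_SLOTS];

static size_t pid_cache_slot(uint64_t ns_inode, int ns_pid)
{
	return (size_t) ((ns_inode * 31 + (uint64_t) ns_pid) % PID_CACHE_SLOTS);
}

/* Remember a translation, replacing whatever occupied the slot before. */
void pid_cache_store(uint64_t ns_inode, int ns_pid, int our_pid)
{
	struct pid_cache_entry *e = &pid_cache[pid_cache_slot(ns_inode, ns_pid)];

	e->ns_inode = ns_inode;
	e->ns_pid = ns_pid;
	e->our_pid = our_pid;
	e->valid = 1;
}

/* Returns the cached PID in our namespace, or -1 on a miss. */
int pid_cache_lookup(uint64_t ns_inode, int ns_pid)
{
	const struct pid_cache_entry *e = &pid_cache[pid_cache_slot(ns_inode, ns_pid)];

	if (e->valid && e->ns_inode == ns_inode && e->ns_pid == ns_pid)
		return e->our_pid;
	return -1;
}

/* Drop every entry that refers to a process that has gone away. */
void pid_cache_invalidate(int our_pid)
{
	for (size_t i = 0; i < PID_CACHE_SLOTS; i++)
		if (pid_cache[i].valid && pid_cache[i].our_pid == our_pid)
			pid_cache[i].valid = 0;
}
```

On a miss the caller would fall back to the slow /proc walk and store the result; pid_cache_invalidate() is where an exit notification, however it is obtained, would plug in.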
> BTW, there is another problem I don't know how to solve: CAP_SYS_ADMIN
> is needed when the system checks the contents of /proc/[pid]/ns/*.
> That means strace still needs CAP_SYS_ADMIN privileges. Is there a
> better way to solve this problem?
Why are you saying that CAP_SYS_ADMIN is needed? It works perfectly well without it:
pts/15, esyr@asgard: /tmp % sudo unshare -p --fork su - esyr -c 'sleep 100' &
[2] 18281
pts/15, esyr@asgard: /tmp % cat ns.c
#define _GNU_SOURCE	/* for asprintf() */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/ptrace.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define NSIO 0xb7
#define NS_GET_PARENT _IO(NSIO, 0x2)

int main(int argc, char **argv)
{
	int target_pid;
	char *path;
	struct stat st;
	int pidns_fd;
	int pidns_fd_parent;

	assert(argc == 2);
	target_pid = strtol(argv[1], NULL, 0);
	assert(asprintf(&path, "/proc/%d/ns/pid", target_pid) > 0);

	/* Attach as an ordinary tracer; addr and data are passed as 0. */
	assert(!ptrace(PTRACE_SEIZE, target_pid, 0, 0));

	/* Open the target's PID namespace and print its inode number. */
	pidns_fd = open(path, O_RDONLY);
	assert(pidns_fd >= 0);
	printf("pidns_fd = %d\n", pidns_fd);
	assert(!fstat(pidns_fd, &st));
	printf("pid ns inode: %llu\n", (unsigned long long) st.st_ino);

	/* Ask the kernel for the parent PID namespace and print its inode. */
	pidns_fd_parent = ioctl(pidns_fd, NS_GET_PARENT);
	assert(pidns_fd_parent >= 0);
	printf("pidns_fd_parent = %d\n", pidns_fd_parent);
	assert(!fstat(pidns_fd_parent, &st));
	printf("parent pid ns inode: %llu\n", (unsigned long long) st.st_ino);

	return 0;
}
pts/15, esyr@asgard: /tmp % gcc ns.c -o ns
pts/15, esyr@asgard: /tmp % ./ns $(pgrep -f '^sleep 100$')
pidns_fd = 3
pid ns inode: 4026532513
pidns_fd_parent = 4
parent pid ns inode: 4026531836
pts/15, esyr@asgard: /tmp % ls -la /proc/$(pgrep -f '^sleep 100$')/ns/pid
lrwxrwxrwx 1 esyr esyr 0 Mar 26 13:54 /proc/18284/ns/pid -> pid:[4026532513]
pts/15, esyr@asgard: /tmp % ls -la /proc/self/ns/pid
lrwxrwxrwx 1 esyr esyr 0 Mar 26 13:55 /proc/self/ns/pid -> pid:[4026531836]
pts/15, esyr@asgard: /tmp %