Asking for advice on the GSoC open source project strace


赖伟峰

Mar 13, 2018, 12:32:01 PM
to 西邮Linux兴趣小组
Message has been deleted

赖伟峰

Mar 15, 2018, 7:21:31 AM
to 西邮Linux兴趣小组
Lately I have mainly been thinking about the fourth idea, namespace support, quoted below:
namespace support
Suggested by: Eugene Syromyatnikov
  It could be useful to be able to show PIDs of processes in different PID namespaces when they are shown up in syscall arguments and results. As of now, this is quite complicated by the fact that there's no way to easily derive PID of the target process in the strace's namespace.
  RH bug
  LKML thread with a patch that introduces a syscall that helps translating PIDs between PID namespaces
  Some WiP
  The other thing is the preservation of correctness of various strace features (path filtering, fd decoding, thread enumeration, ...) that rely on /proc when the traced process is in different namespace.

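As I understand it, the requirement is to derive the target process's PID as seen from strace's own namespace. For example, strace -fp can show the target's PID in the current namespace, but it cannot show the target's PID in other namespaces. man pid_namespaces (older systems do not have a pid_namespaces page, so man may not find it) contains the following passage: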


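So one could repeatedly mount /proc so that strace's namespace can see the PIDs of other namespaces. However, you cannot type commands while strace is running, so a system call would have to be added to the kernel so that strace can preload the PIDs of other namespaces before it consults /proc.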

However, one of the project's regular contributors said the following on the strace IRC channel ( http://webchat.freenode.net/ #strace ):
[03:39] <eSyr-ng_> well, it's awfully broken. It's better to follow LKML discussion, there was a code example that is suitable for some specific case.
[03:40] <eSyr-ng_> strace's complication, however, is that it should translate pids of processes in different namespaces, and, strictly speaking, /proc can be mounted in a different PID namespace that strace has.
[03:41] <eSyr-ng_> if you really want to work on it, I can try to update my attempt on it, but it could take some time.
[03:43] <eSyr-ng_> btw, there was also an update on the translate_pid syscall proposal, let me check.
[03:46] <eSyr-ng_> generally, it's better to have the functionality in kernel, one way or another (I have a weak objection against yet another syscall, but nsfs ioctl interface is just the worst and extremely cumbersome), but, on the other hand, strace supports…
[03:46] <eSyr-ng_> … pretty old kernels, and distributions tend to be pretty conservative regarding version of packages being included, so some userspace solution might be also useful for the several years after the possible kernel API accepted mainstream.

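In other words, even if this patch is added to future kernel versions, strace running on older kernels may still be unable to support namespaces.
I hope senior members who see this email can give me some advice. Thank you very much!
Finally, thanks to my senior 建军 for the advice he gave me after the Alibaba Cloud recruiting talk this afternoon.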



Dizhi

Mar 16, 2018, 9:29:46 PM
to 西邮Linux兴趣小组
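Hi 赖伟峰, I am a mentor for another GSoC project this year, but I am not familiar with strace, so I can only offer some general advice based on your question and your analysis.

1. For any technical question about a GSoC idea, the best place to ask is the organization's mailing list or online discussion group. That is where you will get expert advice most quickly.

2. However rough your thoughts on the idea are, GSoC mentors generally want you to send them your proposal as early as possible (either publicly on the mailing list or privately to a few key mentors). The earlier you send it, the earlier you get feedback, and the better your chances of being accepted.

If you have already done both and are only looking for additional help here, please ignore this message :)

Good luck!

周迪之
Enrolled 2003, Computer Network Engineering
Mentor, GSoC ns-3 project
Canada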

赖伟峰

Mar 16, 2018, 11:18:32 PM
to 西邮Linux兴趣小组
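Thank you, Senior 周. I have already contacted the main mentors through IRC and the mailing list; I just wanted to discuss a few doubts here. People on IRC stay quiet, perhaps because everyone is competing with each other, so I get little helpful advice there.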


赖伟峰

Mar 17, 2018, 4:51:31 AM
to 西邮Linux兴趣小组
I asked the mentor about my idea. The mentor's reply is below, for everyone's reference:
  On Sat, Mar 17, 2018 at 10:52:39AM +0800, WeiDeng Lai wrote:
> mounting /proc whenever we enter the new name space.

How do you expect to do this, taking into account the fact that strace
process doesn't normally have CAP_SYS_ADMIN?

> To complete this requirement,we can make a try to add a
> new kernel API for trans_pid between different pid_namespaces,such as patch
> in link: https://lkml.org/lkml/2018/3/6/593 .

Note Eric Biederman's comments there[1]. Please also refer to the
discussion related to the previous version of the patch[2]. How do you
expect to address the objections raised there in order to have the API
accepted in the kernel's upstream?

> A few days ago I talked with senior members of my community, and we agreed
> that adding a new kernel API may be a good idea. We can apply the patch to later
> kernel versions and modify it so that it applies to kernels from 3.x up to now. If it
> makes sense, I'll do this.

Note that stable upstream kernels do not normally accept new features.
And downstream kernels are also quite hesitant in doing so.

> I have not come up with other methods. Can someone provide some information or
> documents for my reference?

There are NSFS_* ioctls present that can be used for (PID) namespace
tree traversal[3]. Along with inspection of *id fields in
/proc/<pid>/status, the available information is sufficient
for deriving the needed PID in strace's PID NS (having /proc mounted
with different PID NS quite complicates things but still manageable).

[1] https://lkml.org/lkml/2018/3/13/1544
[2] https://lkml.org/lkml/2017/10/13/177
[3] http://blog.man7.org/2016/12/introspecting-namespace-relationships.html
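
To make the mentor's last point concrete, here is a minimal sketch of reading the "*id fields" for PID namespaces. It assumes the kernel exposes the NSpid line in /proc/<pid>/status (Linux 4.1 and later), which lists the process's PID in each nested PID namespace, starting from the namespace of the process that mounted that /proc. The helper names parse_nspid_line and read_nspid are made up for illustration and are not strace functions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse one line of /proc/<pid>/status.
 * If it is the "NSpid:" line, store up to max PIDs in pids[]
 * (outermost namespace first) and return how many were stored.
 * Return -1 for any other line.
 */
static int parse_nspid_line(const char *line, long *pids, int max)
{
        assert(line != NULL && pids != NULL);
        if (strncmp(line, "NSpid:", 6) != 0)
                return -1;

        const char *p = line + 6;
        int n = 0;
        while (n < max) {
                char *end;
                long v = strtol(p, &end, 10);
                if (end == p)           /* no more numbers on the line */
                        break;
                pids[n++] = v;
                p = end;
        }
        return n;
}

/*
 * Hypothetical helper: read the NSpid list of a process from the
 * /proc instance visible to the caller. Returns the number of
 * entries, or -1 on error or when there is no NSpid line.
 */
static int read_nspid(long pid, long *pids, int max)
{
        char path[64], line[256];
        int n = -1;

        snprintf(path, sizeof(path), "/proc/%ld/status", pid);
        FILE *f = fopen(path, "r");
        if (!f)
                return -1;
        while (n < 0 && fgets(line, sizeof(line), f))
                n = parse_nspid_line(line, pids, max);
        fclose(f);
        return n;
}
```

A line such as "NSpid: 18284 1" would mean the process is PID 18284 in the namespace that /proc was mounted from and PID 1 in a nested namespace. This only helps when /proc belongs to strace's own PID namespace, which is exactly the complication the mentor points out.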


Dizhi

Mar 17, 2018, 6:54:39 AM
to 西邮Linux兴趣小组
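Great! Note that the key to getting a proposal selected is convincing people that you can finish the project within three months, that is, the deliverables and timeline you plan in your proposal. For the solution itself, a rough outline that looks feasible is usually enough; it does not need to be very detailed.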

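This mentor's reply is quite detailed, which suggests he or she takes your interest seriously. Mentors are generally happy whenever a student asks about a GSoC idea they proposed. So keep asking and discussing, and do not worry about whether your questions are advanced or deep. Haha.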

赖伟峰

Mar 18, 2018, 1:04:11 AM
to 西邮Linux兴趣小组
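I received the mentor's reply and finished reading the documents the mentor pointed to. First, a summary of my earlier misconceptions:
1. strace usually does not have CAP_SYS_ADMIN, and mounting /proc requires root privileges, so repeatedly mounting /proc is not viable.
2. Adding a system call is unlikely to be accepted by upstream or downstream kernels, so the mentor essentially ruled out my earlier idea.
Fortunately, the mentor suggested a new direction: start from /proc/[pid]/ns/*. After reading a lot of material on the structure and operation of namespaces and on the new features in kernel 4.9, I found several things that can be used to show PIDs in different namespaces:
In this article (link: http://blog.man7.org/2016/12/introspecting-namespace-relationships.html), Michael Kerrisk explains the relationships between namespaces and describes an important feature of Linux kernel 4.9: given a file descriptor that refers to a namespace, new ioctl operations return a file descriptor that refers to a related namespace (its parent namespace or its owning user namespace). Using this feature, we can examine the /proc/[pid]/ns/* files of every process and build a hierarchical map of the PID namespaces together with the processes that belong to them. With this map, the hierarchy of PID and user namespaces on the system can be discovered at run time.
Based on the Go code in the article linked above, I think what I can do for now is apply what I learned from "Introduction to Algorithms" and use some more advanced data structures to optimize the lookup. Is this idea off track? Are there any better suggestions?
There is one more point I have not figured out: examining the /proc/[pid]/ns/* files may still require CAP_SYS_ADMIN. Is there a good way around this?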
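
As a small illustration of what such a map could be keyed on, here is a sketch that identifies a PID namespace by the device ID and inode number of /proc/<pid>/ns/pid, which is the comparison described in namespaces(7). The struct ns_id and the helpers get_pidns_id and same_pidns are hypothetical names for this example only, and the sketch assumes the caller is allowed to examine the target's ns files.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical key type: a namespace is identified by (device, inode). */
struct ns_id {
        dev_t dev;
        ino_t ino;
};

/* Fill *id with the identity of the PID namespace of `pid`.
 * Returns 0 on success, -1 if the ns file cannot be examined. */
static int get_pidns_id(pid_t pid, struct ns_id *id)
{
        char path[64];
        struct stat st;

        assert(id != NULL);
        snprintf(path, sizeof(path), "/proc/%ld/ns/pid", (long) pid);
        if (stat(path, &st) != 0)       /* stat() follows the magic link */
                return -1;
        id->dev = st.st_dev;
        id->ino = st.st_ino;
        return 0;
}

/* Returns 1 if both processes are in the same PID namespace,
 * 0 if they are not, and -1 if either one cannot be examined. */
static int same_pidns(pid_t a, pid_t b)
{
        struct ns_id ia, ib;

        if (get_pidns_id(a, &ia) != 0 || get_pidns_id(b, &ib) != 0)
                return -1;
        return ia.dev == ib.dev && ia.ino == ib.ino;
}
```

A tracer could call same_pidns(getpid(), tracee_pid) first and skip any translation when the result is 1. Whether a hash table or a tree keyed on struct ns_id is the better container is left open here.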


赖伟峰

Mar 22, 2018, 10:01:49 AM
to 西邮Linux兴趣小组
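It seems a lot of people are applying for the strace project, and many of them have good ideas. Here is one well-written application email, for my own reference and for those who come later: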

Hello,
Greetings from my side, I'm interested in working on project "Adding
support for alternative tracing backends".
I have taken references from strace mailing list archive and other
articles. I understand the basic underlying approach about what needs
to be done, please correct me if I'm getting into the wrong direction.
Any suggestions will be really appreciated.
Abstract:
1. Add backend interface to allow for multiple alternative backends.
2. Using gdbserver:
With implementation of catch syscalls in gdbserver which adds a new
QCatchSyscalls packet to enable 'catch syscall', and new stop reasons
"syscall_entry" and "syscall_return" for those events.
GDB can catch some or all of the syscalls issued by the debuggee, and
show the related information for each syscall. If no argument is
specified, calls to and returns from all system calls will be caught.
Basic implementation idea:
strace talks to gdbserver via gdbserver backend
strace sends packet: $vCont;c  (continue)
strace receives packet:T05syscall_entry:
16;06:b0e2ffffff7f0000;07:68e2ffffff7f0000;10:27a9b0f7ff7f0000;thread:p2162.2162;core:5;
strace sends packet: $g (get registers)
strace receives packet: daffffffffffffff0000000000000000...
I plan to use previous patches:[1], [2], [3]
3. Using ftrace:
kprobe, uprobe and kernel tracepoint scripts make use of ftrace - This
allows tracing kernel functions, stack tracing and debugging crash.
write to files in /sys/kernel/debug/tracing and reading output from
/sys/kernel/debug/tracing.
Detailed description:  [4].
4. Perf events:
Call  perf_event_open syscalls , kernel writes events to ring buffer
in user-space, read tracepoints from ring buffer.
Various ioctl like PERF_EVENT_IOC_ENABLE and PERF_EVENT_IOC_DISABLE
also act on perf_event_open() file descriptors, allowing enabling and
disabling the individual counter or event group specified by the file
descriptor argument respectively. [5]

[1]. https://lists.strace.io/pipermail/strace-devel/2017-February/005985.html
[2]. https://lists.strace.io/pipermail/strace-devel/2017-February/005986.html
[3]. https://lists.strace.io/pipermail/strace-devel/2017-February/005987.html
[4]. http://www.linuxjournal.com/article/6100
[5]. http://man7.org/linux/man-pages/man2/perf_event_open.2.html
Thanks for your time.
Best Regards,
Harsha Sharma
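
The packet exchange in item 2 of the email above relies on the framing of the GDB Remote Serial Protocol, where a payload such as vCont;c travels as $payload#checksum and the checksum is the sum of the payload bytes modulo 256, written as two hex digits. The following is a minimal sketch of that framing only; the function name rsp_frame is made up for illustration, and it assumes the payload contains none of the characters that the protocol requires to be escaped.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: wrap `payload` as "$payload#xx" in `out`.
 * Returns the length of the framed packet, or -1 if `out` is too
 * small. Assumes the payload needs no escaping.
 */
static int rsp_frame(const char *payload, char *out, size_t outsz)
{
        assert(payload != NULL && out != NULL);

        size_t len = strlen(payload);
        if (outsz < len + 5)            /* '$' + '#' + 2 digits + NUL */
                return -1;

        unsigned sum = 0;
        for (size_t i = 0; i < len; i++)
                sum += (unsigned char) payload[i];

        /* Checksum is the byte sum modulo 256, as two lowercase hex digits. */
        return snprintf(out, outsz, "$%s#%02x", payload, sum & 0xffu);
}
```

With this helper, rsp_frame("vCont;c", buf, sizeof(buf)) produces $vCont;c#a8 and rsp_frame("g", buf, sizeof(buf)) produces $g#67. A real backend would also have to handle acknowledgements and escaped bytes, which this sketch leaves out.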

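I think its strengths are:
  1. It abstracts the requirements of the problem, which makes them easier to describe.
  2. It does not work in isolation: it outlines the general direction on the basis of patches from previous contributors.
  3. It thinks some parts through in depth and tries them in practice: show thoughts with code.
  4. The tone is friendly.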

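GSoC really is full of experts.
But so far I have not seen anyone submit a correct patch, so I still have a chance. Cheering myself on. (Purely to console myself ;-)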


赖伟峰

Mar 27, 2018, 4:24:00 AM
to 西邮Linux兴趣小组
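The mentor replied again and resolved my doubt about the permissions needed to inspect namespace information. Phew, very enlightening! As I understand it, I can access the /proc/[pid]/ns/* files of a child namespace by mapping a uid in the parent namespace to a uid in the child namespace. The principle is that unshare lets a user be an ordinary user in the parent namespace and a superuser in the child namespace, while in every other namespace the user remains an ordinary user and cannot access resources reserved for the superuser.
The mentor's reply is as follows: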
Yes, a PID NS tree should be built (at least to the point the desired
information is obtained) in order to perform the translation. As I said,
the endeavor can be complicated by the fact that /proc can be mounted from
the alien PID namespace, but in that case we can just bail out early, as
it is not a normal setup (however, pretty much possible).

Note that since this involves quite a lot of syscalls, some form of
caching should be implemented. It is also complicated by the fact that
processes can come and go between queries, so we should account for that
somehow ({i,fa}notify?).

> BTW, there is another problem I don't know how to solve. It needs
> CAP_SYS_ADMIN when the system checks the contents of /proc/[pid]/ns/*.
> That means strace still needs CAP_SYS_ADMIN privileges. Is there some
> better way to solve this problem?

Why are you saying that CAP_SYS_ADMIN is needed? It perfectly works without it.

pts/15, esyr@asgard: /tmp % sudo unshare -p --fork su - esyr -c 'sleep 100' &
[2] 18281
pts/15, esyr@asgard: /tmp % cat ns.c
#define _GNU_SOURCE             /* for asprintf() */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/ptrace.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#define NSIO    0xb7
#define NS_GET_PARENT           _IO(NSIO, 0x2)
int main(int argc, char **argv)
{
        assert(argc > 1);       /* usage: ./ns PID */
        int target_pid = strtol(argv[1], NULL, 0);
        char *path;
        struct stat st;
        int pidns_fd;
        int pidns_fd_parent;
        assert(asprintf(&path, "/proc/%d/ns/pid", target_pid) >= 0);
        /* Attach as a tracer would; addr and data are passed explicitly. */
        assert(!ptrace(PTRACE_SEIZE, target_pid, 0, 0));
        /* Open the target's PID namespace and print its inode number. */
        pidns_fd = open(path, O_RDONLY);
        assert(pidns_fd >= 0);
        printf("pidns_fd = %d\n", pidns_fd);
        assert(!fstat(pidns_fd, &st));
        printf("pid ns inode: %llu\n", (unsigned long long) st.st_ino);
        /* Ask the kernel for a descriptor of the parent PID namespace. */
        pidns_fd_parent = ioctl(pidns_fd, NS_GET_PARENT);
        assert(pidns_fd_parent >= 0);
        printf("pidns_fd_parent = %d\n", pidns_fd_parent);
        assert(!fstat(pidns_fd_parent, &st));
        printf("parent pid ns inode: %llu\n", (unsigned long long) st.st_ino);
        return 0;
}
pts/15, esyr@asgard: /tmp % gcc ns.c -o ns
pts/15, esyr@asgard: /tmp % ./ns $(pgrep -f '^sleep 100$')
pidns_fd = 3
pid ns inode: 4026532513
pidns_fd_parent = 4
parent pid ns inode: 4026531836
pts/15, esyr@asgard: /tmp % ls -la /proc/$(pgrep -f '^sleep 100$')/ns/pid
lrwxrwxrwx 1 esyr esyr 0 Mar 26 13:54 /proc/18284/ns/pid -> pid:[4026532513]
pts/15, esyr@asgard: /tmp % ls -la /proc/self/ns/pid
lrwxrwxrwx 1 esyr esyr 0 Mar 26 13:55 /proc/self/ns/pid -> pid:[4026531836]
pts/15, esyr@asgard: /tmp %
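
The mentor's program looks one level up. Building the "PID NS tree" from the reply would mean repeating that step, so here is a minimal sketch that keeps calling NS_GET_PARENT until the kernel refuses, collecting the inode number of each namespace on the way. The function name pidns_chain is hypothetical, the ioctl number is defined the same way as in ns.c above in case the system headers lack it, and caching and races with exiting processes are not handled.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef NS_GET_PARENT
# define NSIO          0xb7
# define NS_GET_PARENT _IO(NSIO, 0x2)
#endif

/*
 * Hypothetical helper: store the inode numbers of the PID namespace
 * of `pid` and of each ancestor the kernel lets us see, innermost
 * first. Returns the number of entries stored, or -1 if the
 * namespace file of `pid` cannot be opened or examined.
 */
static int pidns_chain(pid_t pid, ino_t *inos, int max)
{
        char path[64];
        struct stat st;
        int n = 0;

        assert(inos != NULL);
        snprintf(path, sizeof(path), "/proc/%ld/ns/pid", (long) pid);
        int fd = open(path, O_RDONLY);
        if (fd < 0)
                return -1;

        while (n < max) {
                if (fstat(fd, &st) != 0) {
                        close(fd);
                        return -1;
                }
                inos[n++] = st.st_ino;

                /* Fails at the top of the visible hierarchy
                 * (or on kernels without this ioctl). */
                int parent = ioctl(fd, NS_GET_PARENT);
                close(fd);
                fd = parent;
                if (fd < 0)
                        break;
        }
        if (fd >= 0)
                close(fd);
        return n;
}
```

For the sleep process in the session above, this would be expected to yield 4026532513 followed by 4026531836, matching the two inode numbers the mentor's program printed.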

Seven hours left until the proposal submission deadline. I hope I am lucky enough to be selected.
