Thank you for pointing this out. I had been using -F for exactly the
reason that you mentioned. I failed to include it in the command that
I sent along. Very sorry for the confusion. Here is an updated version
of the command that I issued:
sudo ./trace-cmd record -e exceptions:page_fault_user -T --profile -l
handle_mm_fault -F ../one_page_play/page
and I generated a report with
./trace-cmd report --profile
and I see the following (among some other output):
Event: page_fault_user:0x7f094f7dd104 (1)
Event: page_fault_user:0x4000e0 (1)
Event: page_fault_user:0x7f094f7eae4a (1)
Event: page_fault_user:0x7f094f860d40 (1)
Event: page_fault_user:0x7f094f7db560 (1)
Event: page_fault_user:0x4040cb (1)
Event: page_fault_user:0x401825 (1)
Event: page_fault_user:0x401473 (1)
Event: page_fault_user:0x7f094f7e64c4 (1)
Event: page_fault_user:0x7f094f7f1212 (1)
That output comes from under the task: page-<pid> heading, so it seems
like those faults are being attributed to the page task.
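As a quick cross-check, I can count the events per task column in the
plain (non-profile) report with a shell one-liner; this just parses the
default report text, so the exact field layout is an assumption on my
part:
./trace-cmd report | grep page_fault_user | awk '{print $1}' | sort | uniq -c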
This command seems to show something interesting:
sudo ./trace-cmd record -e exceptions:page_fault_user -p
function_graph -g __do_fault -F ../one_page_play/page
and the relevant output from
./trace-cmd report --profile
is
task: page-4032
Event: func: __do_fault() (4) Total: 6685 Avg: 1671 Max: 2398(ts:170150.060916) Min: 855(ts:170150.054713)
Event: page_fault_user:0x7ffad3143d40 (1)
Event: page_fault_user:0x4000e0 (1)
Event: page_fault_user:0x401473 (1)
Event: page_fault_user:0x7ffad30c94c4 (1)
This is closer to what I would expect. Of the two 0x4... addresses, the
first is the entry point and the second is the target; that is exactly
what I expect. The other two are the "suspicious" entries: neither
matches the copy_user_enhanced_fast_string symbol location, and neither
is mapped in the binary (according to gdb).
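To pin those two down, my plan is to disable ASLR so that the 0x7f...
addresses stay stable across runs, then compare them against the
process mappings. A sketch (setarch -R turns off address-space
randomization for the child, and 0x4000e0 is the entry point from
above):
setarch $(uname -m) -R gdb ../one_page_play/page
(gdb) break *0x4000e0
(gdb) run
(gdb) info proc mappings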
It is odd to me that the output from the previous command includes
information about the trace-cmd process, since I specified the -F
option.
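If I am reading the trace-cmd-report man page correctly, I can at
least filter the report down to the page task after the fact; the
event-filter syntax here is my best reading of the docs, so treat it
as an assumption:
./trace-cmd report -F 'page_fault_user: common_pid == 4032'
Still, I would like to understand why -F at record time did not
already exclude trace-cmd.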
But back to exactly what you asked. Here is the result of running
perf again with the -a option. I ran this command:
sudo perf record --call-graph fp -e page-faults -a ../one_page_play/page
50.41%  page     ld-2.19.so          [.] do_lookup_x
|
--- do_lookup_x
44.21% perf [kernel.kallsyms] [k] iov_iter_fault_in_readable
|
--- iov_iter_fault_in_readable
generic_file_buffered_write
__generic_file_aio_write
generic_file_aio_write
ext4_file_write
do_sync_write
vfs_write
sys_write
system_call_fastpath
__write_nocancel
0x4081a5
0x407a40
__libc_start_main
4.13% perf perf [.] 0x0000000000015b54
|
--- 0x415b54
0x4081a5
0x407a40
__libc_start_main
0.41% page page [.] _start
|
--- _start
0.41% page page [.] target
|
--- target
0.41% page [kernel.kallsyms] [k] copy_user_enhanced_fast_string
|
--- copy_user_enhanced_fast_string
load_elf_binary
search_binary_handler
do_execve_common.isra.23
sys_execve
stub_execve
__execve
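(As an aside: when I drop the -a and let perf record follow just the
program, I expect perf's own faults to stay out of the profile, since
only the child is sampled:
sudo perf record --call-graph fp -e page-faults ../one_page_play/page
which is how I had been running it before.)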
What is interesting is that the output differs based on whether I've
dropped the kernel caches before I run the perf record. When I do
that, there are no page faults attributed to the entry point or the
target of the program. I would imagine that, after dropping caches,
the readahead handler picks up those pages when the binary is loaded
and negates the need for a page fault. That, indeed, seems to be the
case. I can see that when I run perf with an additional
block:block_rq_issue event. Immediately after dropping the caches,
there is a block request event. On subsequent executions, there is no
such event but there are the page faults that I expect.
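For reference, the cache-dropping sequence between runs looks roughly
like this (echoing 3 into drop_caches evicts both the page cache and
reclaimable slab objects; the perf command just adds the block
tracepoint alongside the fault event):
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches
sudo perf record -e page-faults -e block:block_rq_issue -a ../one_page_play/page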
What I did notice is that some of the page faults come from the task
before execve is called. From what I've seen, this looks like the
kernel reclaiming pages from the spawning process before it is
replaced with the new binary (during the call to execve). After the
execve, there seem to be two page faults:
page-4613 [006] 171795.748310: funcgraph_entry:        0.151 us |  mutex_unlock();
page-4613 [006] 171795.748313: funcgraph_entry:        0.166 us |  __fsnotify_parent();
page-4613 [006] 171795.748313: funcgraph_entry:        0.321 us |  fsnotify();
page-4613 [006] 171795.748314: funcgraph_entry:        0.090 us |  __sb_end_write();
page-4613 [006] 171795.748317: funcgraph_entry:                 |  trace_do_page_fault() {
page-4613 [006] 171795.748317: page_fault_user:       address=__per_cpu_end ip=__per_cpu_end error_code=0x4
page-4613 [006] 171795.748319: funcgraph_exit:         2.254 us |  }
page-4613 [006] 171795.748321: funcgraph_entry:                 |  trace_do_page_fault() {
page-4613 [006] 171795.748322: page_fault_user:       address=__per_cpu_end ip=__per_cpu_end error_code=0x14
page-4613 [006] 171795.748323: funcgraph_exit:         1.144 us |  }
NOTICE THIS:
page-4613 [006] 171795.748324: funcgraph_entry:                 |  sys_execve() {
page-4613 [007] 171795.748391: block_rq_issue:        8,0 R 0 () 764812912 + 16 [trace-cmd]
page-4613 [005] 171795.759476: funcgraph_exit:  # 11152.111 us  |  }
page-4613 [005] 171795.759477: funcgraph_entry:        3.745 us |  do_notify_resume();
page-4613 [005] 171795.759481: funcgraph_entry:                 |  trace_do_page_fault() {
page-4613 [005] 171795.759482: page_fault_user:       address=__per_cpu_end ip=__per_cpu_end error_code=0x14
page-4613 [005] 171795.759487: funcgraph_exit:         5.833 us |  }
page-4613 [005] 171795.759488: funcgraph_entry:                 |  trace_do_page_fault() {
page-4613 [005] 171795.759489: page_fault_user:       address=__per_cpu_end ip=__per_cpu_end error_code=0x14
page-4613 [005] 171795.759490: funcgraph_exit:         2.003 us |  }
page-4613 [005] 171795.759492: funcgraph_entry:                 |  sys_exit() {
I wish that I could get something "better" than "
address=__per_cpu_end ip=__per_cpu_end error_code=0x14" for those page
faults. That would really tell me more about whether this is the
"correct" behavior.
As ever, thank you very much for your help! Using these tools has been
an incredible learning experience. I still think that I am just
missing something stupid, but I really appreciate your patience.
Thanks again!
Will