Hello,
I am currently developing a swap backend for OSv, but its performance is not ideal, so I am profiling to find the bottleneck. However, I have run into the following problems:
1) I first used the Timed Tracepoints, as described on the Wiki. They are useful, apart from having to add tracepoints to my code manually. The real problem is that the Timed Tracepoints sometimes seem to lose samples. For example, I have a function A that calls function B in a loop. In the first run, I put Timed Tracepoints only at the beginning and end of function A. In the second run, I put Timed Tracepoints on both A and B, and the number of samples for A was significantly lower than in the first run, as was A's total time. The total number of tracepoint samples was similar in both runs. I wonder whether there is a precision limit, or a limit on the number of samples the Timed Tracepoints can collect.
2) I then tried the Sampler Tracepoints, but ./trace summary --timed tells me that 60% of the time is spent in sched::cpu::idle and another 39% in sched::cpu::do_idle, which contradicts the result from the Timed Tracepoints. I would expect function A to at least show up in the backtrace. Is there any guide on how to use the Sampler properly, besides the Wiki?
3) Finally, I tried to use perf: perf kvm --guestvmlinux=build/release/loader.elf record -p pid but I got this error: "Couldn't record guest kernel [0]'s reference relocation symbol.". Although perf is woken up to write data, no samples are recorded. Has anyone else run into this error?
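To make the setup in 1) concrete, here is a minimal C++ sketch of one hypothesis that would explain the symptom: a fixed-capacity trace buffer that overwrites its oldest records once it is full. This is only an assumption on my part, not the actual OSv tracing code. TraceBuffer, function_a, function_b and surviving_a_samples are hypothetical names made up for illustration, and the records are plain strings rather than real tracepoints.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical stand-in for a trace buffer; NOT the OSv implementation.
// Assumption: the buffer has a fixed capacity and drops the oldest record
// when a new one arrives and it is already full.
struct TraceBuffer {
    std::size_t capacity;
    std::deque<std::string> records;

    explicit TraceBuffer(std::size_t cap) : capacity(cap) { assert(cap > 0); }

    void record(const std::string& name) {
        if (records.size() == capacity) {
            records.pop_front();  // overwrite the oldest record
        }
        records.push_back(name);
    }

    std::size_t count(const std::string& name) const {
        std::size_t n = 0;
        for (const auto& r : records) {
            if (r == name) ++n;
        }
        return n;
    }
};

// Function B: optionally emits an entry/exit pair of records.
void function_b(TraceBuffer& buf, bool trace_b) {
    if (trace_b) buf.record("B");
    // ... the real work of B would go here ...
    if (trace_b) buf.record("B_ret");
}

// Function A: always emits an entry/exit pair and calls B in a loop.
void function_a(TraceBuffer& buf, bool trace_b, int inner_iters) {
    buf.record("A");
    for (int i = 0; i < inner_iters; ++i) {
        function_b(buf, trace_b);
    }
    buf.record("A_ret");
}

// Runs A `calls` times and returns how many "A" entry records are still
// in the buffer afterwards.
std::size_t surviving_a_samples(std::size_t capacity, int calls,
                                int inner_iters, bool trace_b) {
    TraceBuffer buf(capacity);
    for (int i = 0; i < calls; ++i) {
        function_a(buf, trace_b, inner_iters);
    }
    return buf.count("A");
}
```

With a capacity of 1000 records, 100 calls to A and 10 calls to B per A, all 100 A entries survive when only A is traced, but only 45 survive once B is traced as well, because the B records push the older A records out. That pattern looks like what I observe, but I do not know whether this is what actually happens inside OSv.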
If anyone has more advice on how to profile performance on OSv beyond what the Wiki covers, I would be glad to hear it. Thanks very much!
Best Wishes
Pan