I've been trying to profile a CLI tool I use. I used both node's --prof/--prof-process and using v8-profiler + chromium to inspect the results. Both techniques give me the same result, which I can't fully understand. Because the latter technique is GUI only (!!!), I will just paste (part of) the result from the first one:
[Summary]:
ticks total nonlib name
11097 60.1% 65.2% C++
5911 32.0% 34.7% JavaScript
1453 7.9% Shared libraries
163 0.9% 1.0% GC
7 0.0% Unaccounted
[C++ entry points]:
ticks cpp total name
3493 28.5% 18.9% v8::internal::Runtime_StringReplaceGlobalRegExpWithString(int, v8::internal::Object**, v8::internal::Isolate*)
2054 16.7% 11.1% v8::internal::Runtime_Apply(int, v8::internal::Object**, v8::internal::Isolate*)
1900 15.5% 10.3% v8::internal::Runtime_KeyedGetProperty(int, v8::internal::Object**, v8::internal::Isolate*)
892 7.3% 4.8% v8::internal::Runtime_BoundFunctionGetBindings(int, v8::internal::Object**, v8::internal::Isolate*)
[JavaScript]:
ticks total nonlib name
484 2.6% 2.8% LazyCompile: *replace native string.js:146:23
399 2.2% 2.3% Stub: CEntryStub
295 1.6% 1.7% Stub: StringAddStub_CheckNone_NotTenured
269 1.5% 1.6% LazyCompile: ~InnerArrayFilter native array.js:891:26
209 1.1% 1.2% LazyCompile: ~<anonymous> native v8natives.js:1199:16
205 1.1% 1.2% Builtin: ArgumentsAdaptorTrampoline
163 0.9% 1.0% LazyCompile: ~tree.Filterset.cloneWith /var/lib/data/mdione/src/system/osm/carto/lib/carto/tree/filterset.js:59:46
[Shared libraries]:
ticks total nonlib name
1241 6.7% /usr/bin/nodejs
205 1.1% /lib/x86_64-linux-gnu/libc-2.24.so
3 0.0% /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.24
2 0.0% [vdso]
2 0.0% /lib/x86_64-linux-gnu/libm-2.24.so
[Bottom up (heavy) profile]:
Note: percentage shows a share of a particular caller in the total
amount of its parent calls.
Callers occupying less than 2.0% are not shown.
ticks parent name
1241 6.7% /usr/bin/nodejs
537 43.3% v8::internal::Runtime_Apply(int, v8::internal::Object**, v8::internal::Isolate*)
534 99.4% LazyCompile: ~<anonymous> native v8natives.js:1199:16
534 100.0% LazyCompile: ~InnerArrayFilter native array.js:891:26
534 100.0% LazyCompile: *filter native array.js:915:21
464 86.9% LazyCompile: *tree.Variable.ev /var/lib/data/mdione/src/system/osm/carto/lib/carto/tree/variable.js:16:17
55 10.3% LazyCompile: *tree.Operation.ev /var/lib/data/mdione/src/system/osm/carto/lib/carto/tree/operation.js:16:39
15 2.8% LazyCompile: ~tree.Variable.ev /var/lib/data/mdione/src/system/osm/carto/lib/carto/tree/variable.js:16:17
355 28.6% v8::internal::Runtime_StringReplaceGlobalRegExpWithString(int, v8::internal::Object**, v8::internal::Isolate*)
355 100.0% LazyCompile: *replace native string.js:146:23
55 15.5% LazyCompile: *tree.Filterset.addable /var/lib/data/mdione/src/system/osm/carto/lib/carto/tree/filterset.js:105:44
55 100.0% LazyCompile: *tree.Filterset.cloneWith /var/lib/data/mdione/src/system/osm/carto/lib/carto/tree/filterset.js:59:46
41 74.5% LazyCompile: *addRules /var/lib/data/mdione/src/system/osm/carto/lib/carto/renderer.js:473:18
12 21.8% LazyCompile: *foldStyle /var/lib/data/mdione/src/system/osm/carto/lib/carto/renderer.js:611:19
2 3.6% LazyCompile: ~foldStyle /var/lib/data/mdione/src/system/osm/carto/lib/carto/renderer.js:611:19
36 10.1% Handler: 67092480#__default__#line-width {5}
31 86.1% LazyCompile: ~tree.Filterset.cloneWith /var/lib/data/mdione/src/system/osm/carto/lib/carto/tree/filterset.js:59:46
22 71.0% LazyCompile: ~addRules /var/lib/data/mdione/src/system/osm/carto/lib/carto/renderer.js:473:18
9 29.0% LazyCompile: *foldStyle /var/lib/data/mdione/src/system/osm/carto/lib/carto/renderer.js:611:19
5 13.9% LazyCompile: *tree.Filterset.cloneWith /var/lib/data/mdione/src/system/osm/carto/lib/carto/tree/filterset.js:59:46
3 60.0% LazyCompile: *foldStyle /var/lib/data/mdione/src/system/osm/carto/lib/carto/renderer.js:611:19
2 40.0% LazyCompile: ~addRules /var/lib/data/mdione/src/system/osm/carto/lib/carto/renderer.js:473:18
36 10.1% Handler: 67043328#__default__#line-color
35 97.2% LazyCompile: ~tree.Filterset.cloneWith /var/lib/data/mdione/src/system/osm/carto/lib/carto/tree/filterset.js:59:46
24 68.6% LazyCompile: ~addRules /var/lib/data/mdione/src/system/osm/carto/lib/carto/renderer.js:473:18
11 31.4% LazyCompile: *foldStyle /var/lib/data/mdione/src/system/osm/carto/lib/carto/renderer.js:611:19
1 2.8% LazyCompile: *tree.Filterset.cloneWith /var/lib/data/mdione/src/system/osm/carto/lib/carto/tree/filterset.js:59:46
1 100.0% LazyCompile: *foldStyle /var/lib/data/mdione/src/system/osm/carto/lib/carto/renderer.js:611:19
[...]
You can find a complete tree here:
The first thing that strikes me is that the tree is reversed to what one would expect, but I can live with that. I fact, with the second method and chromium I can revert the tree.
The second thing is that the line numbers (the first ones just after the filename; what does the second one mean?) point to the line where the definition of the function in that output line begins, and not the source code line where the parent in this tree (not in the call stack) was called. So, for
1 2.8% LazyCompile: *tree.Filterset.cloneWith /var/lib/data/mdione/src/system/osm/carto/lib/carto/tree/filterset.js:59:46
1 100.0% LazyCompile: *foldStyle /var/lib/data/mdione/src/system/osm/carto/lib/carto/renderer.js:611:19
tree.Filterset.cloneWith() has been called from foldStyle(), but renderer.js:611 is the line where foldStyle() is defined, not where cloneWith() was called. This means that I can find the function where the calls happens (I already could, since I have its name), but not the call itself. That means that, if there were no JIT, and I had the classic, naïve, recursive implementation of Fibonacci(), and then called it twice from the same main() function, once for 5 and once for 100, I wouldn't know which *call* took more time[1].
But what completely baffles me is the fact that I don't know who Runtime_Apply() is or what it does; and more important, that there is no call to anything resembling a replace in tree.Filterset.addable(). Here,
take a look by yourself:
More questions: why does the tree stop there? Why doesn't it keep climbing up, showing who called tree.Variable.ev() or addRules()? And what are those Handlers? Why do tick counts not match between the Bottom Up tree and the C++ / Javascript summaries? The first one shows 355 ticks for Runtime_StringReplaceGlobalRegExpWithString and the latter ~3500!
And just to make this post complete, is there any line based profiler, even one that had a huge impact on the excution time, but that would provide me with more actionable data? Can I turn off the JIT to
profile the original code? I noticed that if I added some memoizing code in several of the methods (toString() and toObject() for many of the classes in lib/carto/tree) I get even slightly *worse* runtimes, so i guess the JIT is very clever with these methods. Maybe it's implementing memoizing itself, more efficiently that I could ever do.
I hope you can help me to get at least some of these questions answered
--
[1] To be fair, most probably in the implementation of such function there would be a line that reads:
return fib(n-1) + fib(n-2);
and the profiler would not be able to distinguish one call from the other.