Bear in mind that a profiler may under-report the cost of object allocations (for example allocating Object[] arrays, but also the Integer and Double objects created by boxing). The cost shows up indirectly, as greater GC load and as slower memory accesses due to fragmentation.
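To make the boxing point concrete, here is a minimal sketch rather than code from Optiq: the class and method names are hypothetical. Both methods compute the same sum, but the boxed version goes through `Integer.valueOf` for every element and so may allocate one object per value outside the small-integer cache, while the primitive version allocates a single array. A JIT compiler can sometimes remove such allocations, so treat this as an illustration of where the garbage comes from, not as a benchmark.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of Optiq.
class BoxingSketch {
    // Boxed version: each add() autoboxes via Integer.valueOf(i), which
    // allocates a new Integer for values outside the cached range
    // (-128..127 by default).
    static long sumBoxed(int n) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            values.add(i);
        }
        long sum = 0;
        for (Integer v : values) {
            sum += v; // unboxing via intValue()
        }
        return sum;
    }

    // Primitive version: one int[] allocation, no per-element objects.
    static long sumPrimitive(int n) {
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = i;
        }
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        // Same result either way; the difference is in the garbage created.
        System.out.println(sumBoxed(n) == sumPrimitive(n));
    }
}
```

Neither method looks expensive line by line, which is why this kind of cost is easy to miss in a profile.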
I run FoodBench, generate .csv files containing a line per query, then use Optiq to analyze those files. (Eating my own dogfood!) I’m interested in finding queries whose planning or execution time has gotten significantly better or worse.
$ head ../share/foodbench/optiq.03.csv
ID:int,ROWS:int,TOTAL:long,PREP:long,EXEC:long
1,2,1646939000,1447087000,199852000
2,1,118903000,89740000,29163000
3,1,1013595000,41812000,971783000
4,1,52523000,37076000,15447000
5,1,461499000,408529000,52970000
6,1,66348000,29690000,36658000
7,1,61776000,21744000,40032000
8,236,1079647000,492799000,586848000
9,1,54929000,13741000,41188000
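For readers who want to see the shape of the analysis without a SQL engine, here is a rough Java sketch of a "top N by PREP, then ordered by ID" selection over the sample lines above. It is not how Optiq evaluates the query; the class, field, and method names are hypothetical, and it assumes Java 8 streams and that every line after the header has five numeric fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration; not part of Optiq or FoodBench.
class TopPrepSketch {
    // The lines printed by "head" above, header included.
    static final List<String> SAMPLE = Arrays.asList(
        "ID:int,ROWS:int,TOTAL:long,PREP:long,EXEC:long",
        "1,2,1646939000,1447087000,199852000",
        "2,1,118903000,89740000,29163000",
        "3,1,1013595000,41812000,971783000",
        "4,1,52523000,37076000,15447000",
        "5,1,461499000,408529000,52970000",
        "6,1,66348000,29690000,36658000",
        "7,1,61776000,21744000,40032000",
        "8,236,1079647000,492799000,586848000",
        "9,1,54929000,13741000,41188000");

    // One line of the file, with the columns named as in the header.
    static final class Row {
        final int id;
        final int rows;
        final long total;
        final long prep;
        final long exec;

        Row(int id, int rows, long total, long prep, long exec) {
            this.id = id;
            this.rows = rows;
            this.total = total;
            this.prep = prep;
            this.exec = exec;
        }
    }

    // Skips the header line and parses the five comma-separated fields.
    static List<Row> parse(List<String> lines) {
        List<Row> result = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] f = line.split(",");
            result.add(new Row(Integer.parseInt(f[0]), Integer.parseInt(f[1]),
                Long.parseLong(f[2]), Long.parseLong(f[3]), Long.parseLong(f[4])));
        }
        return result;
    }

    // Keeps the n rows with the largest PREP, then orders them by ID.
    static List<Row> topByPrep(List<Row> rows, int n) {
        return rows.stream()
            .sorted(Comparator.comparingLong((Row r) -> r.prep).reversed())
            .limit(n)
            .sorted(Comparator.comparingInt((Row r) -> r.id))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (Row r : topByPrep(parse(SAMPLE), 3)) {
            System.out.println("id=" + r.id + " prep=" + r.prep);
        }
    }
}
```

On these nine sample lines with n = 3, the rows selected are IDs 1, 5 and 8. The SQL version that follows does the same thing over the whole file, which is the point of letting Optiq read the .csv files directly.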
0: jdbc:optiq:model=foodbench.json> select * from (select * from "optiq.03" order by prep desc limit 10) order by id;
+-----+------+------------+------------+-----------+
| ID  | ROWS | TOTAL      | PREP       | EXEC      |
+-----+------+------------+------------+-----------+
| 1   | 2    | 1646939000 | 1447087000 | 199852000 |
| 26  | 23   | 1945489000 | 1480449000 | 465040000 |
| 79  | 143  | 2160628000 | 1819281000 | 341347000 |
| 119 | 39   | 4386637000 | 3488263000 | 898374000 |
| 124 | 3    | 2735263000 | 1906179000 | 829084000 |
| 147 | 12   | 2502544000 | 2007991000 | 494553000 |
| 193 | 1    | 2453153000 | 1994598000 | 458555000 |
| 194 | 13   | 2445132000 | 1968425000 | 476707000 |
| 195 | 3    | 2444807000 | 1983243000 | 461564000 |
| 196 | 13   | 2419266000 | 1980136000 | 439130000 |