It seems to me that the test instances are enough: the benchmark has 12 problems. In the last iteration, the metrics come out as shown below. Can this be classified as a heterogeneous scenario?
+-+-----------+-----------+-----------+---------------+-----------+--------+-----+----+------+
| | Instance| Alive| Best| Mean best| Exp so far| W time| rho|KenW| Qvar|
+-+-----------+-----------+-----------+---------------+-----------+--------+-----+----+------+
|x| 15| 9| 116| -4.000000000| 9|00:01:05| NA| NA| NA|
|x| 1| 9| 116| 6.500000000| 13|00:00:27|+0.00|0.50|0.4444|
|x| 2| 9| 116| 4.666666667| 17|00:00:27|+0.00|0.33|0.2963|
|x| 14| 9| 116| 3.750000000| 21|00:00:27|+0.00|0.25|0.2222|
|=| 10| 9| 75| 2223.000000| 25|00:00:29|-0.06|0.15|0.3867|
|=| 4| 9| 75| 1851.500000| 29|00:00:27|-0.05|0.13|0.3171|
|=| 6| 9| 75| 617.9168911| 33|00:00:27|-0.04|0.11|0.2688|
|=| 3| 9| 75| 540.1772797| 37|00:00:27|-0.03|0.10|0.2333|
|=| 5| 9| 75| 480.1853542| 41|00:00:27|-0.01|0.10|0.3076|
|=| 11| 9| 116| 843.6334197| 45|00:00:29|-0.03|0.08|0.3641|
|=| 7| 9| 116| 767.0303815| 49|00:00:27|-0.00|0.09|0.4163|
|=| 8| 9| 116| 1836.502267| 53|00:00:28|+0.02|0.10|0.4397|
|=| 12| 9| 116| 1710.115258| 57|00:00:29|+0.03|0.11|0.4585|
|=| 13| 9| 116| 1589.178454| 61|00:00:27|+0.03|0.10|0.4272|
|=| 9| 9| 116| 2716.581301| 65|00:00:28|+0.03|0.10|0.4513|
+-+-----------+-----------+-----------+---------------+-----------+--------+-----+----+------+